Sample records for multivariate outlier detection

  1. Outlier Detection in Hyperspectral Imagery Using Closest Distance to Center with Ellipsoidal Multivariate Trimming

    DTIC Science & Technology

    2011-01-01

    where r << P. The use of PCA for finding outliers in multivariate data is surveyed by Gnanadesikan and Kettenring [16] and Rao [17]. As alluded to earlier...1984. 16. Gnanadesikan R and Kettenring JR. Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 1972; 28: 81–124

  2. System and Method for Outlier Detection via Estimating Clusters

    NASA Technical Reports Server (NTRS)

    Iverson, David J. (Inventor)

    2016-01-01

    An efficient method and system for real-time or offline analysis of multivariate sensor data for use in anomaly detection, fault detection, and system health monitoring is provided. Models automatically derived from training data, typically nominal system data acquired from sensors in normally operating conditions or from detailed simulations, are used to identify unusual, out of family data samples (outliers) that indicate possible system failure or degradation. Outliers are determined through analyzing a degree of deviation of current system behavior from the models formed from the nominal system data. The deviation of current system behavior is presented as an easy to interpret numerical score along with a measure of the relative contribution of each system parameter to any off-nominal deviation. The techniques described herein may also be used to "clean" the training data.

  3. The effectiveness of robust RMCD control chart as outliers’ detector

    NASA Astrophysics Data System (ADS)

    Darmanto; Astutik, Suci

    2017-12-01

    A well-known control chart for monitoring a multivariate process is Hotelling's T², whose parameters are estimated classically; it is very sensitive to outliers and is marred by their masking and swamping effects. To overcome this, robust estimators are strongly recommended. One such estimator is the re-weighted minimum covariance determinant (RMCD), which has the same robustness characteristics as the MCD. In this paper, effectiveness means the accuracy of the RMCD control chart in flagging true outliers as outliers; in other words, how effectively the control chart can identify outliers and remove their masking and swamping effects. We assessed the effectiveness of the robust control chart by simulation under different scenarios: sample size n, proportion of outliers, and number of quality characteristics p. We found that in some scenarios the RMCD robust control chart works effectively.
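
    As an illustration of the statistic this abstract starts from, here is a minimal sketch of the classical Hotelling's T² for bivariate data, using the ordinary sample mean and covariance. It is not the RMCD chart from the paper: a robust chart would substitute robust location and scatter estimates for `mean` and `cov`. The function names and the toy data are assumptions made for this example.

```python
def mean_vector(rows):
    """Column-wise sample mean of a list of (x, y) pairs."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def covariance_2d(rows, mean):
    """Unbiased sample covariance of bivariate data as (sxx, sxy, syy)."""
    n = len(rows)
    sxx = sum((x - mean[0]) ** 2 for x, _ in rows) / (n - 1)
    syy = sum((y - mean[1]) ** 2 for _, y in rows) / (n - 1)
    sxy = sum((x - mean[0]) * (y - mean[1]) for x, y in rows) / (n - 1)
    return sxx, sxy, syy


def t2_statistic(point, mean, cov):
    """Squared Mahalanobis distance d' S^-1 d for a 2x2 covariance S."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Closed-form inverse of a symmetric 2x2 matrix applied to (dx, dy).
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# Toy in-control reference sample (hypothetical values).
reference = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
mean = mean_vector(reference)
cov = covariance_2d(reference, mean)
t2_new = t2_statistic((3.0, 1.0), mean, cov)
```

    Because `mean` and `cov` are computed from the same data that may contain outliers, a few extreme points can inflate the covariance and hide themselves, which is the masking effect the abstract refers to.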

  4. LSST Astroinformatics And Astrostatistics: Data-oriented Astronomical Research

    NASA Astrophysics Data System (ADS)

    Borne, Kirk D.; Stassun, K.; Brunner, R. J.; Djorgovski, S. G.; Graham, M.; Hakkila, J.; Mahabal, A.; Paegert, M.; Pesenson, M.; Ptak, A.; Scargle, J.; Informatics, LSST; Statistics Team

    2011-01-01

    The LSST Informatics and Statistics Science Collaboration (ISSC) focuses on research and scientific discovery challenges posed by the very large and complex data collection that LSST will generate. Application areas include astroinformatics, machine learning, data mining, astrostatistics, visualization, scientific data semantics, time series analysis, and advanced signal processing. Research problems to be addressed with these methodologies include transient event characterization and classification, rare class discovery, correlation mining, outlier/anomaly/surprise detection, improved estimators (e.g., for photometric redshift or early onset supernova classification), exploration of highly dimensional (multivariate) data catalogs, and more. We present sample science results from these data-oriented approaches to large-data astronomical research. We present results from LSST ISSC team members, including the EB (Eclipsing Binary) Factory, the environmental variations in the fundamental plane of elliptical galaxies, and outlier detection in multivariate catalogs.

  5. Detecting Outliers in Factor Analysis Using the Forward Search Algorithm

    ERIC Educational Resources Information Center

    Mavridis, Dimitris; Moustaki, Irini

    2008-01-01

    In this article we extend and implement the forward search algorithm for identifying atypical subjects/observations in factor analysis models. The forward search has been mainly developed for detecting aberrant observations in regression models (Atkinson, 1994) and in multivariate methods such as cluster and discriminant analysis (Atkinson, Riani,…

  6. Identification of Differential Item Functioning in Multiple-Group Settings: A Multivariate Outlier Detection Approach

    ERIC Educational Resources Information Center

    Magis, David; De Boeck, Paul

    2011-01-01

    We focus on the identification of differential item functioning (DIF) when more than two groups of examinees are considered. We propose to consider items as elements of a multivariate space, where DIF items are outlying elements. Following this approach, the situation of multiple groups is a quite natural case. A robust statistics technique is…

  7. Meteor localization via statistical analysis of spatially temporal fluctuations in image sequences

    NASA Astrophysics Data System (ADS)

    Kukal, Jaromír.; Klimt, Martin; Šihlík, Jan; Fliegel, Karel

    2015-09-01

    Meteor detection is one of the most important procedures in astronomical imaging. The meteor path in Earth's atmosphere is traditionally reconstructed from a double-station video observation system that generates 2D image sequences. However, atmospheric turbulence and other factors cause spatially-temporal fluctuations of the image background, which make localization of the meteor path more difficult. Our approach is based on nonlinear preprocessing of image intensity using the Box-Cox transform, with the logarithmic transform as a particular case. The transformed image sequences are then differentiated along discrete coordinates to obtain a statistical description of sky background fluctuations, which can be modeled by a multivariate normal distribution. After verification and hypothesis testing, we use the statistical model for outlier detection. While isolated outlier points are ignored, a compact cluster of outliers indicates the presence of a meteoroid after ignition.
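
    The preprocessing step named in this abstract can be sketched as follows: a Box-Cox transform of positive intensities (reducing to the logarithm when the parameter is zero), followed by first differences along one discrete coordinate. This is a one-dimensional illustration only; the function names, the parameter name `lam`, and the choice of a simple first difference are assumptions, not details taken from the paper.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform of a positive value; lam == 0 gives the natural log."""
    if x <= 0:
        raise ValueError("Box-Cox transform requires a positive input")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def transformed_differences(intensities, lam):
    """Transform a 1-D intensity sequence, then take first differences."""
    transformed = [box_cox(v, lam) for v in intensities]
    return [b - a for a, b in zip(transformed, transformed[1:])]
```

    The differences, rather than the raw intensities, are what a background-fluctuation model would then describe.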

  8. Power enhancement via multivariate outlier testing with gene expression arrays.

    PubMed

    Asare, Adam L; Gao, Zhong; Carey, Vincent J; Wang, Richard; Seyfert-Margolis, Vicki

    2009-01-01

    As the use of microarrays in human studies continues to increase, stringent quality assurance is necessary to ensure accurate experimental interpretation. We present a formal approach for microarray quality assessment that is based on dimension reduction of established measures of signal and noise components of expression followed by parametric multivariate outlier testing. We applied our approach to several data resources. First, as a negative control, we found that the Affymetrix and Illumina contributions to MAQC data were free from outliers at a nominal outlier flagging rate of alpha=0.01. Second, we created a tunable framework for artificially corrupting intensity data from the Affymetrix Latin Square spike-in experiment to allow investigation of sensitivity and specificity of quality assurance (QA) criteria. Third, we applied the procedure to 507 Affymetrix microarray GeneChips processed with RNA from human peripheral blood samples. We show that exclusion of arrays by this approach substantially increases inferential power, or the ability to detect differential expression, in large clinical studies. http://bioconductor.org/packages/2.3/bioc/html/arrayMvout.html and http://bioconductor.org/packages/2.3/bioc/html/affyContam.html affyContam (credentials: readonly/readonly)

  9. Detection of outliers in water quality monitoring samples using functional data analysis in San Esteban estuary (Northern Spain).

    PubMed

    Díaz Muñiz, C; García Nieto, P J; Alonso Fernández, J R; Martínez Torres, J; Taboada, J

    2012-11-15

    Water quality controls involve large number of variables and observations, often subject to some outliers. An outlier is an observation that is numerically distant from the rest of the data or that appears to deviate markedly from other members of the sample in which it occurs. An interesting analysis is to find those observations that produce measurements that are different from the pattern established in the sample. Therefore, identification of atypical observations is an important concern in water quality monitoring and a difficult task because of the multivariate nature of water quality data. Our study provides a new method for detecting outliers in water quality monitoring parameters, using oxygen and turbidity as indicator variables. Until now, methods were based on considering the different parameters as a vector whose components were their concentration values. Our approach lies in considering water quality monitoring through time as curves instead of vectors, that is to say, the data set of the problem is considered as a time-dependent function and not as a set of discrete values in different time instants. The methodology, which is based on the concept of functional depth, was applied to the detection of outliers in water quality monitoring samples in San Esteban estuary. Results were discussed in terms of origin, causes, etc., and compared with those obtained using the conventional method based on vector comparison. Finally, the advantages of the functional method are exposed. Copyright © 2012 Elsevier B.V. All rights reserved.

  10. DigOut: viewing differential expression genes as outliers.

    PubMed

    Yu, Hui; Tu, Kang; Xie, Lu; Li, Yuan-Yuan

    2010-12-01

    With regards to well-replicated two-conditional microarray datasets, the selection of differentially expressed (DE) genes is a well-studied computational topic, but for multi-conditional microarray datasets with limited or no replication, the same task is not properly addressed by previous studies. This paper adopts multivariate outlier analysis to analyze replication-lacking multi-conditional microarray datasets, finding that it performs significantly better than the widely used limit fold change (LFC) model in a simulated comparative experiment. Compared with the LFC model, the multivariate outlier analysis also demonstrates improved stability against sample variations in a series of manipulated real expression datasets. The reanalysis of a real non-replicated multi-conditional expression dataset series leads to satisfactory results. In conclusion, a multivariate outlier analysis algorithm, like DigOut, is particularly useful for selecting DE genes from non-replicated multi-conditional gene expression dataset.

  11. Evaluation of the Bitterness of Traditional Chinese Medicines using an E-Tongue Coupled with a Robust Partial Least Squares Regression Method.

    PubMed

    Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin

    2016-01-25

    To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods including multivariate linear regression, least squares support vector machine, and the plain partial least squares regression. Both R² and root-mean-squares error (RMSE) of cross-validation (CV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916 with bitterness values ranging from 0.63 to 4.78 was obtained for the RPLS model that was constructed based on the dataset including outliers. Meanwhile, the RMSECV, which was calculated using the models constructed by other methods, was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS model constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with the RPLS model constructed using e-tongue data.
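
    For readers unfamiliar with the Grubbs' test mentioned in this abstract, the sketch below computes only its test statistic, G = max|x_i - mean| / s, together with the index of the most extreme value. Deciding significance also requires a critical value derived from the t distribution, which is left out here. The function name and the sample values are assumptions for illustration, not data from the study.

```python
import statistics


def grubbs_statistic(values):
    """Return (G, index) for the value farthest from the sample mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    return abs(values[idx] - mean) / sd, idx


# Hypothetical readings with one suspicious value at the end.
readings = [2.0, 2.0, 2.0, 2.0, 7.0]
g_value, suspect_index = grubbs_statistic(readings)
```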

  12. Evaluation of statistical protocols for quality control of ecosystem carbon dioxide fluxes

    Treesearch

    Jorge F. Perez-Quezada; Nicanor Z. Saliendra; William E. Emmerich; Emilio A. Laca

    2007-01-01

    The process of quality control of micrometeorological and carbon dioxide (CO2) flux data can be subjective and may lack repeatability, which would undermine the results of many studies. Multivariate statistical methods and time series analysis were used together and independently to detect and replace outliers in CO2 flux...

  13. Open-Source Radiation Exposure Extraction Engine (RE3) with Patient-Specific Outlier Detection.

    PubMed

    Weisenthal, Samuel J; Folio, Les; Kovacs, William; Seff, Ari; Derderian, Vana; Summers, Ronald M; Yao, Jianhua

    2016-08-01

    We present an open-source, picture archiving and communication system (PACS)-integrated radiation exposure extraction engine (RE3) that provides study-, series-, and slice-specific data for automated monitoring of computed tomography (CT) radiation exposure. RE3 was built using open-source components and seamlessly integrates with the PACS. RE3 calculations of dose length product (DLP) from the Digital Imaging and Communications in Medicine (DICOM) headers showed high agreement (R² = 0.99) with the vendor dose pages. For study-specific outlier detection, RE3 constructs robust, automatically updating multivariable regression models to predict DLP in the context of patient gender and age, scan length, water-equivalent diameter (D_w), and scanned body volume (SBV). As proof of concept, the model was trained on 811 CT chest, abdomen + pelvis (CAP) exams and 29 outliers were detected. The continuous variables used in the outlier detection model were scan length (R² = 0.45), D_w (R² = 0.70), SBV (R² = 0.80), and age (R² = 0.01). The categorical variables were gender (male average 1182.7 ± 26.3 and female 1047.1 ± 26.9 mGy cm) and pediatric status (pediatric average 710.7 ± 73.6 mGy cm and adult 1134.5 ± 19.3 mGy cm).

  14. Evaluation of the Bitterness of Traditional Chinese Medicines using an E-Tongue Coupled with a Robust Partial Least Squares Regression Method

    PubMed Central

    Lin, Zhaozhou; Zhang, Qiao; Liu, Ruixin; Gao, Xiaojie; Zhang, Lu; Kang, Bingya; Shi, Junhan; Wu, Zidan; Gui, Xinjing; Li, Xuelin

    2016-01-01

    To accurately, safely, and efficiently evaluate the bitterness of Traditional Chinese Medicines (TCMs), a robust predictor was developed using robust partial least squares (RPLS) regression method based on data obtained from an electronic tongue (e-tongue) system. The data quality was verified by Grubbs' test. Moreover, potential outliers were detected based on both the standardized residual and score distance calculated for each sample. The performance of RPLS on the dataset before and after outlier detection was compared to other state-of-the-art methods including multivariate linear regression, least squares support vector machine, and the plain partial least squares regression. Both R² and root-mean-squares error (RMSE) of cross-validation (CV) were recorded for each model. With four latent variables, a robust RMSECV value of 0.3916 with bitterness values ranging from 0.63 to 4.78 was obtained for the RPLS model that was constructed based on the dataset including outliers. Meanwhile, the RMSECV, which was calculated using the models constructed by other methods, was larger than that of the RPLS model. After six outliers were excluded, the performance of all benchmark methods markedly improved, but the difference between the RPLS model constructed before and after outlier exclusion was negligible. In conclusion, the bitterness of TCM decoctions can be accurately evaluated with the RPLS model constructed using e-tongue data. PMID:26821026

  15. Model diagnostics in reduced-rank estimation

    PubMed Central

    Chen, Kun

    2016-01-01

    Reduced-rank methods are very popular in high-dimensional multivariate analysis for conducting simultaneous dimension reduction and model estimation. However, the commonly-used reduced-rank methods are not robust, as the underlying reduced-rank structure can be easily distorted by only a few data outliers. Anomalies are bound to exist in big data problems, and in some applications they themselves could be of the primary interest. While naive residual analysis is often inadequate for outlier detection due to potential masking and swamping, robust reduced-rank estimation approaches could be computationally demanding. Under Stein's unbiased risk estimation framework, we propose a set of tools, including leverage score and generalized information score, to perform model diagnostics and outlier detection in large-scale reduced-rank estimation. The leverage scores give an exact decomposition of the so-called model degrees of freedom to the observation level, which lead to exact decomposition of many commonly-used information criteria; the resulting quantities are thus named information scores of the observations. The proposed information score approach provides a principled way of combining the residuals and leverage scores for anomaly detection. Simulation studies confirm that the proposed diagnostic tools work well. A pattern recognition example with hand-writing digital images and a time series analysis example with monthly U.S. macroeconomic data further demonstrate the efficacy of the proposed approaches. PMID:28003860

  16. Model diagnostics in reduced-rank estimation.

    PubMed

    Chen, Kun

    2016-01-01

    Reduced-rank methods are very popular in high-dimensional multivariate analysis for conducting simultaneous dimension reduction and model estimation. However, the commonly-used reduced-rank methods are not robust, as the underlying reduced-rank structure can be easily distorted by only a few data outliers. Anomalies are bound to exist in big data problems, and in some applications they themselves could be of the primary interest. While naive residual analysis is often inadequate for outlier detection due to potential masking and swamping, robust reduced-rank estimation approaches could be computationally demanding. Under Stein's unbiased risk estimation framework, we propose a set of tools, including leverage score and generalized information score, to perform model diagnostics and outlier detection in large-scale reduced-rank estimation. The leverage scores give an exact decomposition of the so-called model degrees of freedom to the observation level, which lead to exact decomposition of many commonly-used information criteria; the resulting quantities are thus named information scores of the observations. The proposed information score approach provides a principled way of combining the residuals and leverage scores for anomaly detection. Simulation studies confirm that the proposed diagnostic tools work well. A pattern recognition example with hand-writing digital images and a time series analysis example with monthly U.S. macroeconomic data further demonstrate the efficacy of the proposed approaches.

  17. Bayesian methods for outliers detection in GNSS time series

    NASA Astrophysics Data System (ADS)

    Qianqian, Zhang; Qingming, Gui

    2013-07-01

    This article is concerned with the problem of detecting outliers in GNSS time series based on Bayesian statistical theory. Firstly, a new model is proposed to simultaneously detect different types of outliers based on the conception of introducing different types of classification variables corresponding to the different types of outliers; the problem of outlier detection is converted into the computation of the corresponding posterior probabilities, and the algorithm for computing the posterior probabilities based on standard Gibbs sampler is designed. Secondly, we analyze the reasons of masking and swamping about detecting patches of additive outliers intensively; an unmasking Bayesian method for detecting additive outlier patches is proposed based on an adaptive Gibbs sampler. Thirdly, the correctness of the theories and methods proposed above is illustrated by simulated data and then by analyzing real GNSS observations, such as cycle slips detection in carrier phase data. Examples illustrate that the Bayesian methods for outliers detection in GNSS time series proposed by this paper are not only capable of detecting isolated outliers but also capable of detecting additive outlier patches. Furthermore, it can be successfully used to process cycle slips in phase data, which solves the problem of small cycle slips.

  18. Development of a methodology for the detection of hospital financial outliers using information systems.

    PubMed

    Okada, Sachiko; Nagase, Keisuke; Ito, Ayako; Ando, Fumihiko; Nakagawa, Yoshiaki; Okamoto, Kazuya; Kume, Naoto; Takemura, Tadamasa; Kuroda, Tomohiro; Yoshihara, Hiroyuki

    2014-01-01

    Comparison of financial indices helps to illustrate differences in operations and efficiency among similar hospitals. Outlier data tend to influence statistical indices, and so detection of outliers is desirable. Development of a methodology for financial outlier detection using information systems will help to reduce the time and effort required, eliminate the subjective elements in detection of outlier data, and improve the efficiency and quality of analysis. The purpose of this research was to develop such a methodology. Financial outliers were defined based on a case model. An outlier-detection method using the distances between cases in multi-dimensional space is proposed. Experiments using three diagnosis groups indicated successful detection of cases for which the profitability and income structure differed from other cases. Therefore, the method proposed here can be used to detect outliers. Copyright © 2013 John Wiley & Sons, Ltd.

  19. Incremental Principal Component Analysis Based Outlier Detection Methods for Spatiotemporal Data Streams

    NASA Astrophysics Data System (ADS)

    Bhushan, A.; Sharker, M. H.; Karimi, H. A.

    2015-07-01

    In this paper, we address outliers in spatiotemporal data streams obtained from sensors placed across geographically distributed locations. Outliers may appear in such sensor data due to various reasons such as instrumental error and environmental change. Real-time detection of these outliers is essential to prevent propagation of errors in subsequent analyses and results. Incremental Principal Component Analysis (IPCA) is one possible approach for detecting outliers in such type of spatiotemporal data streams. IPCA has been widely used in many real-time applications such as credit card fraud detection, pattern recognition, and image analysis. However, the suitability of applying IPCA for outlier detection in spatiotemporal data streams is unknown and needs to be investigated. To fill this research gap, this paper contributes by presenting two new IPCA-based outlier detection methods and performing a comparative analysis with the existing IPCA-based outlier detection methods to assess their suitability for spatiotemporal sensor data streams.

  20. An integrated approach for identifying wrongly labelled samples when performing classification in microarray data.

    PubMed

    Leung, Yuk Yee; Chang, Chun Qi; Hung, Yeung Sam

    2012-01-01

    Using a hybrid approach for gene selection and classification is common, as the results obtained are generally better than performing the two tasks independently. Yet, for some microarray datasets, both classification accuracy and stability of the gene sets obtained still have room for improvement. This may be due to the presence of samples with wrong class labels (i.e. outliers). Outlier detection algorithms proposed so far are either not suitable for microarray data, or only solve the outlier detection problem on their own. We tackle the outlier detection problem based on a previously proposed Multiple-Filter-Multiple-Wrapper (MFMW) model, which was demonstrated to yield promising results when compared to other hybrid approaches (Leung and Hung, 2010). To incorporate outlier detection and overcome limitations of the existing MFMW model, three new features are introduced in our proposed MFMW-outlier approach: 1) an unbiased external Leave-One-Out Cross-Validation framework is developed to replace internal cross-validation in the previous MFMW model; 2) wrongly labeled samples are identified within the MFMW-outlier model; and 3) a stable set of genes is selected using an L1-norm SVM that removes any redundant genes present. Six binary-class microarray datasets were tested. Compared with outlier detection studies on the same datasets, MFMW-outlier could detect all the outliers found in the original paper (for which the data was provided for analysis), and the genes selected after outlier removal were proven to have biological relevance. We also compared MFMW-outlier with PRAPIV (Zhang et al., 2006) based on the same synthetic datasets. MFMW-outlier gave better average precision and recall values in three different settings. Lastly, artificially flipped microarray datasets were created by removing our detected outliers and flipping some of the remaining samples' labels. Almost all the 'wrong' (artificially flipped) samples were detected, suggesting that MFMW-outlier was sufficiently powerful to detect outliers in high-dimensional microarray datasets.

  1. Estimating multivariate response surface model with data outliers, case study in enhancing surface layer properties of an aircraft aluminium alloy

    NASA Astrophysics Data System (ADS)

    Widodo, Edy; Kariyam

    2017-03-01

    Response Surface Methodology (RSM) is used to determine the input variable settings that give the optimal compromise in the response variables. An RSM problem has three primary steps: data collection, modelling, and optimization. This study focuses on establishing response surface models, under the assumption that the data produced are correct. The response surface model parameters are usually estimated by OLS, but this method is highly sensitive to outliers. Outliers can generate large residuals and often affect the estimated model. The resulting estimates can be biased and can lead to errors in determining the optimal point, so that the main purpose of RSM is not achieved. Moreover, in practice the collected data often contain several response variables and a set of independent variables. Treating each response separately and applying single-response procedures can result in wrong interpretations, so a model for the multi-response case is needed: a multivariate response surface model that is resistant to outliers. As an alternative, this study discusses M-estimation as a parameter estimator in multivariate response surface models containing outliers. As an illustration, a case study is presented on experimental results for enhancing the surface layer of an aircraft aluminium alloy by shot peening.
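
    To give a feel for why M-estimation resists outliers, the sketch below computes a Huber M-estimate of location by iteratively reweighted averaging. It is a deliberately simplified, univariate stand-in: the study applies M-estimation to the parameters of a multivariate response surface model, which is not reproduced here. The tuning constant 1.345, the MAD-based scale, and the sample values are assumptions chosen for this example.

```python
import statistics


def huber_location(values, k=1.345, tol=1e-9, max_iter=100):
    """Huber M-estimate of location with a fixed MAD-based scale."""
    mu = statistics.median(values)
    mad = statistics.median(abs(v - mu) for v in values)
    scale = 1.4826 * mad  # makes the MAD consistent for normal data
    if scale == 0:
        return mu
    for _ in range(max_iter):
        # Full weight inside the threshold, down-weighted beyond it.
        weights = [
            1.0 if abs(v - mu) <= k * scale else k * scale / abs(v - mu)
            for v in values
        ]
        new_mu = sum(w * v for w, v in zip(weights, values)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical measurements with one gross outlier.
sample = [9.8, 10.1, 10.0, 9.9, 10.2, 50.0]
robust_center = huber_location(sample)
plain_mean = statistics.mean(sample)
```

    On this sample the ordinary mean is pulled far toward the outlier, while the Huber estimate stays close to the bulk of the data.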

  2. Aberrant Gene Expression in Humans

    PubMed Central

    Yang, Ence; Ji, Guoli; Brinkmeyer-Langford, Candice L.; Cai, James J.

    2015-01-01

    Gene expression as an intermediate molecular phenotype has been a focus of research interest. In particular, studies of expression quantitative trait loci (eQTL) have offered promise for understanding gene regulation through the discovery of genetic variants that explain variation in gene expression levels. Existing eQTL methods are designed for assessing the effects of common variants, but not rare variants. Here, we address the problem by establishing a novel analytical framework for evaluating the effects of rare or private variants on gene expression. Our method starts from the identification of outlier individuals that show markedly different gene expression from the majority of a population, and then reveals the contributions of private SNPs to the aberrant gene expression in these outliers. Using population-scale mRNA sequencing data, we identify outlier individuals using a multivariate approach. We find that outlier individuals are more readily detected with respect to gene sets that include genes involved in cellular regulation and signal transduction, and less likely to be detected with respect to the gene sets with genes involved in metabolic pathways and other fundamental molecular functions. Analysis of polymorphic data suggests that private SNPs of outlier individuals are enriched in the enhancer and promoter regions of corresponding aberrantly-expressed genes, suggesting a specific regulatory role of private SNPs, while the commonly-occurring regulatory genetic variants (i.e., eQTL SNPs) show little evidence of involvement. Additional data suggest that non-genetic factors may also underlie aberrant gene expression. Taken together, our findings advance a novel viewpoint relevant to situations wherein common eQTLs fail to predict gene expression when heritable, rare inter-individual variation exists. 
    The analytical framework we describe, taking into consideration the reality of differential phenotypic robustness, may be valuable for investigating complex traits and conditions. PMID:25617623

  3. Robust multivariate nonparametric tests for detection of two-sample location shift in clinical trials

    PubMed Central

    Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo

    2018-01-01

    This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555
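
    One of the robust location estimators named in this abstract, the Hodges-Lehmann estimator, has a simple two-sample form: the median of all pairwise differences between the groups. The sketch below shows that univariate form only; the paper's multivariate test statistics, scaling, and bootstrap or permutation p-values are not reproduced. The function name and the data are assumptions for illustration.

```python
import statistics


def hodges_lehmann_shift(x, y):
    """Median of all pairwise differences y_j - x_i (two-sample shift)."""
    return statistics.median(b - a for a in x for b in y)


# Hypothetical control and intervention samples; the last value is an outlier.
control = [1.0, 2.0, 3.0]
treated = [3.0, 4.0, 500.0]
hl_shift = hodges_lehmann_shift(control, treated)
mean_shift = statistics.mean(treated) - statistics.mean(control)
```

    Here the difference in means is dominated by the single extreme observation, whereas the Hodges-Lehmann shift matches the value obtained when the outlier is replaced by an ordinary observation.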

  4. Analysis and detection of functional outliers in water quality parameters from different automated monitoring stations in the Nalón river basin (Northern Spain).

    PubMed

    Piñeiro Di Blasi, J I; Martínez Torres, J; García Nieto, P J; Alonso Fernández, J R; Díaz Muñiz, C; Taboada, J

    2015-01-01

    The purposes and intent of the authorities in establishing water quality standards are to provide enhancement of water quality and prevention of pollution to protect the public health or welfare in accordance with the public interest for drinking water supplies, conservation of fish, wildlife and other beneficial aquatic life, and agricultural, industrial, recreational, and other reasonable and necessary uses as well as to maintain and improve the biological integrity of the waters. In this way, water quality controls involve a large number of variables and observations, often subject to some outliers. An outlier is an observation that is numerically distant from the rest of the data or that appears to deviate markedly from other members of the sample in which it occurs. An interesting analysis is to find those observations that produce measurements that are different from the pattern established in the sample. Therefore, identification of atypical observations is an important concern in water quality monitoring and a difficult task because of the multivariate nature of water quality data. Our study provides a new method for detecting outliers in water quality monitoring parameters, using turbidity, conductivity and ammonium ion as indicator variables. Until now, methods were based on considering the different parameters as a vector whose components were their concentration values. The innovation of this approach lies in considering water quality monitoring over time as continuous curves instead of discrete points; that is to say, the dataset is treated as a time-dependent function and not as a set of discrete values at different time instants. This new methodology, which is based on the concept of functional depth, was applied with success to the detection of outliers in water quality monitoring samples in the Nalón river basin. The results of this study are discussed in terms of origin, causes, etc. Finally, the conclusions as well as the advantages of the functional method are presented.

  5. Ranking Fragment Ions Based on Outlier Detection for Improved Label-Free Quantification in Data-Independent Acquisition LC-MS/MS

    PubMed Central

    Bilbao, Aivett; Zhang, Ying; Varesio, Emmanuel; Luban, Jeremy; Strambio-De-Castillia, Caterina; Lisacek, Frédérique; Hopfgartner, Gérard

    2016-01-01

    Data-independent acquisition LC-MS/MS techniques complement supervised methods for peptide quantification. However, due to the wide precursor isolation windows, these techniques are prone to interference at the fragment ion level, which in turn is detrimental for accurate quantification. The “non-outlier fragment ion” (NOFI) ranking algorithm has been developed to assign low priority to fragment ions affected by interference. By using the optimal subset of high priority fragment ions these interfered fragment ions are effectively excluded from quantification. NOFI represents each fragment ion as a vector of four dimensions related to chromatographic and MS fragmentation attributes and applies multivariate outlier detection techniques. Benchmarking conducted on a well-defined quantitative dataset (i.e. the SWATH Gold Standard), indicates that NOFI on average is able to accurately quantify 11-25% more peptides than the commonly used Top-N library intensity ranking method. The sum of the area of the Top3-5 NOFIs produces similar coefficients of variation as compared to the library intensity method but with more accurate quantification results. On a biologically relevant human dendritic cell digest dataset, NOFI properly assigns low priority ranks to 85% of annotated interferences, resulting in sensitivity values between 0.92 and 0.80 against 0.76 for the Spectronaut interference detection algorithm. PMID:26412574

  6. Outliers in Questionnaire Data: Can They Be Detected and Should They Be Removed?

    ERIC Educational Resources Information Center

    Zijlstra, Wobbe P.; van der Ark, L. Andries; Sijtsma, Klaas

    2011-01-01

    Outliers in questionnaire data are unusual observations, which may bias statistical results, and outlier statistics may be used to detect such outliers. The authors investigated the effect outliers have on the specificity and the sensitivity of each of six different outlier statistics. The Mahalanobis distance and the item-pair based outlier…
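
    The Mahalanobis distance named in this record can be sketched in a few lines. The following is a minimal illustration for two variables, assuming a classical (non-robust) sample mean and covariance; the function name and the sample points are hypothetical and are not taken from the study.

```python
from statistics import mean


def mahalanobis_sq_2d(points):
    """Squared Mahalanobis distance of each 2-D point from the sample mean."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    n = len(points)
    mx, my = mean(xs), mean(ys)
    # Entries of the sample covariance matrix (n - 1 denominator).
    sxx = sum((x - mx) ** 2 for x in xs) / (n - 1)
    syy = sum((y - my) ** 2 for y in ys) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the closed-form inverse of the 2x2 covariance.
        distances.append((syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det)
    return distances


# Hypothetical data: eight points close to the line y = x, plus one point
# whose coordinates are individually unremarkable but jointly unusual.
points = [(1, 1.1), (2, 1.9), (3, 3.2), (4, 3.8), (5, 5.1),
          (6, 5.9), (7, 7.2), (8, 7.8), (2, 8.0)]
d2 = mahalanobis_sq_2d(points)
most_extreme = d2.index(max(d2))  # index of the largest distance
```

    The last point is within the range of both variables taken separately, yet it receives the largest distance because it breaks their correlation. That is the case a univariate screen would miss.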

  7. Optoelectronic instrumentation enhancement using data mining feedback for a 3D measurement system

    NASA Astrophysics Data System (ADS)

    Flores-Fuentes, Wendy; Sergiyenko, Oleg; Gonzalez-Navarro, Félix F.; Rivas-López, Moisés; Hernandez-Balbuena, Daniel; Rodríguez-Quiñonez, Julio C.; Tyrsa, Vera; Lindner, Lars

    2016-12-01

    3D measurement by a cyber-physical system based on optoelectronic scanning instrumentation has been enhanced by outlier and regression data mining feedback. The prototype has applications in (1) industrial manufacturing systems that include robotic machinery, embedded vision, and motion control, (2) health care systems for measurement scanning, and (3) infrastructure, by providing structural health monitoring. This paper presents new research performed in data processing of a 3D measurement vision sensing database. Outliers in the multivariate data have been detected and removed to improve the results of an artificial intelligence regression algorithm. Physical measurement error regression data has been used for 3D measurement error correction. We conclude that joining physical phenomena, measurement and computation is an effective approach for feedback loops in the control of industrial, medical and civil tasks.

  8. Multivariate Quality Control Procedures

    DTIC Science & Technology

    1988-10-01

    CLASSIFICATION OF THIS PAGE PREFACE The mathematical modeling work described in this report was authorized under Project No. IC162706A553, CB Defense and...the sum of the measurements. A CUSUM of the first principal component would detect changes in the overall thickness of the sheet. A linear trend could...development of a unique outlier rule for the specific application. 28 LITERATURE CITED 1. Mood, A.M., Graybill, F.A., and Boes, D.C., Introduction to

  9. Outlier detection for particle image velocimetry data using a locally estimated noise variance

    NASA Astrophysics Data System (ADS)

    Lee, Yong; Yang, Hua; Yin, ZhouPing

    2017-03-01

    This work describes an adaptive spatial variable threshold outlier detection algorithm for raw gridded particle image velocimetry data using a locally estimated noise variance. This method is an iterative procedure, and each iteration is composed of a reference vector field reconstruction step and an outlier detection step. We construct the reference vector field using a weighted adaptive smoothing method (Garcia 2010 Comput. Stat. Data Anal. 54 1167-78), and the weights are determined in the outlier detection step using a modified outlier detector (Ma et al 2014 IEEE Trans. Image Process. 23 1706-21). A hard decision on the final weights of the iteration can produce outlier labels of the field. The technical contribution is that the spatial variable threshold motivation is embedded in the modified outlier detector with a locally estimated noise variance in an iterative framework for the first time. It turns out that a spatial variable threshold is preferable to a single spatial constant threshold in complicated flows such as vortex flows or turbulent flows. Synthetic cellular vortical flows with simulated scattered or clustered outliers are adopted to evaluate the performance of our proposed method in comparison with popular validation approaches. This method also turns out to be beneficial in a real PIV measurement of turbulent flow. The experimental results demonstrate that the proposed method yields competitive performance in terms of outlier under-detection count and over-detection count. In addition, the outlier detection method is computationally efficient and adaptive and requires no user-defined parameters; corresponding implementations are provided in the supplementary materials.

  10. Outlier Detection in Urban Air Quality Sensor Networks.

    PubMed

    van Zoest, V M; Stein, A; Hoek, G

    2018-01-01

    Low-cost urban air quality sensor networks are increasingly used to study the spatio-temporal variability in air pollutant concentrations. Recently installed low-cost urban sensors, however, are more prone to produce erroneous data than conventional monitors, e.g., leading to outliers. Commonly applied outlier detection methods are unsuitable for air pollutant measurements that have large spatial and temporal variations, as occur in urban areas. We present a novel outlier detection method based upon a spatio-temporal classification, focusing on hourly NO₂ concentrations. We divide a full year's observations into 16 spatio-temporal classes, reflecting urban background vs. urban traffic stations, weekdays vs. weekends, and four periods per day. For each spatio-temporal class, we detect outliers using the mean and standard deviation of the normal distribution underlying the truncated normal distribution of the NO₂ observations. Applying this method to a low-cost air quality sensor network in the city of Eindhoven, the Netherlands, we found 0.1-0.5% of outliers. Outliers could reflect measurement errors or unusually high air pollution events. Additional evaluation using expert knowledge is needed to decide on treatment of the identified outliers. We conclude that our method is able to detect outliers while maintaining the spatio-temporal variability of air pollutant concentrations in urban areas.
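
    The idea of judging each observation against its own spatio-temporal class can be sketched as below. This is a simplified illustration only: it uses the plain sample mean and standard deviation of each class rather than the truncated-normal fit described in the abstract, and the class labels, the threshold k and the concentration values are hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def flag_by_class(records, k=3.0):
    """records: list of (class_label, value). Returns one bool per record."""
    groups = defaultdict(list)
    for label, value in records:
        groups[label].append(value)
    # Per-class location and spread (plain sample estimates, an assumption).
    stats = {label: (mean(v), stdev(v)) for label, v in groups.items()}
    flags = []
    for label, value in records:
        m, s = stats[label]
        flags.append(s > 0 and abs(value - m) > k * s)
    return flags


# Hypothetical hourly concentrations for two of the classes.
records = (
    [("background", v) for v in [18.0, 20.0, 22.0, 19.0, 21.0] * 4]
    + [("background", 40.0)]
    + [("traffic", v) for v in [38.0, 40.0, 42.0, 39.0, 41.0] * 4]
    + [("traffic", 200.0)]
)
flags = flag_by_class(records)
flagged = [r for r, f in zip(records, flags) if f]
```

    In this toy data a value of 40 is flagged at a background station but is ordinary at a traffic station, which is the reason for classifying before thresholding.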

  11. Detecting measurement outliers: remeasure efficiently

    NASA Astrophysics Data System (ADS)

    Ullrich, Albrecht

    2010-09-01

    Shrinking structures, advanced optical proximity correction (OPC) and complex measurement strategies continually challenge critical dimension (CD) metrology tools and recipe creation processes. One important quality-assurance task is the control of measurement outlier behavior. Outliers could trigger false-positive alarms for specification violations, impacting cycle time or potentially yield. A constantly high level of outliers not only degrades cycle time but also puts unnecessary stress on tool operators, eventually leading to human errors. At tool level, the sources of outliers are natural variations (e.g. beam current), drifts, contrast conditions, focus determination or pattern recognition issues, etc. Some of these can result from suboptimal or even wrong recipe settings, like focus position or measurement box size. Such outliers, created by an automatic recipe creation process faced with more complicated structures, would manifest themselves as systematic variation of measurements rather than as variation caused by 'pure' tool variation. I analyzed several statistical methods to detect outliers. These range from classical outlier tests for extrema and robust metrics like the interquartile range (IQR) to methods evaluating the distribution of different populations of measurement sites, like the Cochran test. The latter is especially suited to the detection of systematic effects. The next level of outlier detection entwines additional information about the mask and the manufacturing process with the measurement results. The methods were reviewed for measured variations assumed to be normally distributed with zero mean but also for the presence of a statistically significant spatial process signature. I arrive at the conclusion that intelligent outlier detection can greatly influence the efficiency and cycle time of CD metrology.
In combination with process information like target, typical platform variation and signature, one can tailor the detection to the needs of the photomask at hand. By monitoring the outlier behavior carefully, weaknesses of the automatic recipe creation process can be spotted.
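
    Of the methods listed in this record, the interquartile-range rule is the simplest to illustrate. The sketch below applies the common 1.5 x IQR fences; the multiplier, the function name and the measurement values are assumptions for illustration and are not taken from the paper.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical CD measurements in nm; one site reads high.
cd_nm = [100.1, 99.8, 100.3, 99.9, 100.0, 100.2, 99.7, 100.1, 103.5]
suspects = iqr_outliers(cd_nm)  # -> [103.5]
```

    Because the quartiles are barely moved by a single extreme reading, the fences stay tight around the bulk of the data, which is what makes the rule robust compared with a mean and standard deviation test.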

  12. Stratification-Based Outlier Detection over the Deep Web.

    PubMed

    Xian, Xuefeng; Zhao, Pengpeng; Sheng, Victor S; Fang, Ligang; Gu, Caidong; Yang, Yuanfeng; Cui, Zhiming

    2016-01-01

    For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over deep web. In the context of deep web, users must submit queries through a query interface to retrieve corresponding data. Therefore, traditional data mining methods cannot be directly applied. The primary contribution of this paper is to develop a new data mining method for outlier detection over deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed in this paper with the goal of improving recall and precision based on stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in deep web.

  13. The effect of different distance measures in detecting outliers using clustering-based algorithm for circular regression model

    NASA Astrophysics Data System (ADS)

    Di, Nur Faraidah Muhammad; Satari, Siti Zanariah

    2017-05-01

    Outlier detection in linear data sets has been studied extensively, but only a small amount of work has been done on outlier detection in circular data. In this study, we propose a method for detecting multiple outliers in circular regression models based on a clustering algorithm. Clustering techniques use a distance measure to define the distance between data points. Here, we introduce a similarity distance based on Euclidean distance for the circular model and obtain a cluster tree using the single linkage clustering algorithm. Then, a stopping rule for the cluster tree based on the mean direction and circular standard deviation of the tree height is proposed. We classify the cluster group that exceeds the stopping rule as a potential outlier. Our aim is to demonstrate the effectiveness of the proposed algorithms with the similarity distances in detecting outliers. It is found that the proposed methods perform well and are applicable to circular regression models.

  14. Stratification-Based Outlier Detection over the Deep Web

    PubMed Central

    Xian, Xuefeng; Zhao, Pengpeng; Sheng, Victor S.; Fang, Ligang; Gu, Caidong; Yang, Yuanfeng; Cui, Zhiming

    2016-01-01

    For many applications, finding rare instances or outliers can be more interesting than finding common patterns. Existing work in outlier detection never considers the context of deep web. In this paper, we argue that, for many scenarios, it is more meaningful to detect outliers over deep web. In the context of deep web, users must submit queries through a query interface to retrieve corresponding data. Therefore, traditional data mining methods cannot be directly applied. The primary contribution of this paper is to develop a new data mining method for outlier detection over deep web. In our approach, the query space of a deep web data source is stratified based on a pilot sample. Neighborhood sampling and uncertainty sampling are developed in this paper with the goal of improving recall and precision based on stratification. Finally, a careful performance evaluation of our algorithm confirms that our approach can effectively detect outliers in deep web. PMID:27313603

  15. Comparison of outliers and novelty detection to identify ionospheric TEC irregularities during geomagnetic storm and substorm

    NASA Astrophysics Data System (ADS)

    Pattisahusiwa, Asis; Houw Liong, The; Purqon, Acep

    2016-08-01

    In this study, we compare two learning mechanisms, outlier detection and novelty detection, for detecting ionospheric TEC disturbances caused by the November 2004 geomagnetic storm and the January 2005 substorm. The mechanisms are applied using the ν-SVR learning algorithm, which is a regression version of SVM. Our results show that both mechanisms are quite accurate in learning TEC data. However, novelty detection is more accurate than outlier detection in extracting anomalies related to geomagnetic events. The anomalies detected by outlier detection are mostly related to the trend of the data, while those detected by novelty detection are associated with geomagnetic events. Novelty detection also shows evidence of LSTID during geomagnetic events.

  16. Ensemble Learning Method for Outlier Detection and its Application to Astronomical Light Curves

    NASA Astrophysics Data System (ADS)

    Nun, Isadora; Protopapas, Pavlos; Sim, Brandon; Chen, Wesley

    2016-09-01

    Outlier detection is necessary for automated data analysis, with specific applications spanning almost every domain from financial markets to epidemiology to fraud detection. We introduce a novel mixture of experts outlier detection model, which uses a dynamically trained, weighted network of five distinct outlier detection methods. After dimensionality reduction, individual outlier detection methods score each data point for “outlierness” in this new feature space. Our model then uses dynamically trained parameters to weigh the scores of each method, allowing for a finalized outlier score. We find that the mixture of experts model performs, on average, better than any single expert model in identifying both artificially and manually picked outliers. This mixture model is applied to a data set of astronomical light curves, after dimensionality reduction via time series feature extraction. Our model was tested using three fields from the MACHO catalog and generated a list of anomalous candidates. We confirm that the outliers detected using this method belong to rare classes, like Novae, He-burning, and red giant stars; other outlier light curves identified have no available information associated with them. To elucidate their nature, we created a website containing the light-curve data and information about these objects. Users can attempt to classify the light curves, give conjectures about their identities, and sign up for follow-up messages about the progress made on identifying these objects. These user-submitted data can be used to further train our mixture of experts model. Our code is publicly available to all who are interested.

  17. Detecting multiple outliers in linear functional relationship model for circular variables using clustering technique

    NASA Astrophysics Data System (ADS)

    Mokhtar, Nurkhairany Amyra; Zubairi, Yong Zulina; Hussin, Abdul Ghapor

    2017-05-01

    Outlier detection has been used extensively in data analysis to detect anomalous observations in data and has important applications in fraud detection and robust analysis. In this paper, we propose a method for detecting multiple outliers for circular variables in the linear functional relationship model. Using the residual values of the Caires and Wyatt model, we applied the hierarchical clustering procedure. With the use of a tree diagram, we illustrate the graphical approach to the detection of outliers. A simulation study is done to verify the accuracy of the proposed method. Also, an illustration using a real data set is given to show its practical applicability.

  18. Spatio-temporal Outlier Detection in Precipitation Data

    NASA Astrophysics Data System (ADS)

    Wu, Elizabeth; Liu, Wei; Chawla, Sanjay

    The detection of outliers from spatio-temporal data is an important task due to the increasing amount of spatio-temporal data available and the need to understand and interpret it. Due to the limitations of current data mining techniques, new techniques to handle this data need to be developed. We propose a spatio-temporal outlier detection algorithm called Outstretch, which discovers the outlier movement patterns of the top-k spatial outliers over several time periods. The top-k spatial outliers are found using the Exact-Grid Top-k and Approx-Grid Top-k algorithms, which are an extension of algorithms developed by Agarwal et al. [1]. Since they use the Kulldorff spatial scan statistic, they are capable of discovering all outliers, unaffected by neighbouring regions that may contain missing values. After generating the outlier sequences, we show one way they can be interpreted, by comparing them to the phases of the El Niño Southern Oscillation (ENSO) weather phenomenon to provide a meaningful analysis of the results.

  19. Methodology to assess clinical liver safety data.

    PubMed

    Merz, Michael; Lee, Kwan R; Kullak-Ublick, Gerd A; Brueckner, Andreas; Watkins, Paul B

    2014-11-01

    Analysis of liver safety data has to be multivariate by nature and needs to take into account time dependency of observations. Current standard tools for liver safety assessment such as summary tables, individual data listings, and narratives address these requirements to a limited extent only. Using graphics in the context of a systematic workflow including predefined graph templates is a valuable addition to standard instruments, helping to ensure completeness of evaluation, and supporting both hypothesis generation and testing. Employing graphical workflows interactively allows analysis in a team-based setting and facilitates identification of the most suitable graphics for publishing and regulatory reporting. Another important tool is statistical outlier detection, accounting for the fact that for assessment of Drug-Induced Liver Injury, identification and thorough evaluation of extreme values has much more relevance than measures of central tendency in the data. Taken together, systematic graphical data exploration and statistical outlier detection may have the potential to significantly improve assessment and interpretation of clinical liver safety data. A workshop was convened to discuss best practices for the assessment of drug-induced liver injury (DILI) in clinical trials.

  20. Detection of Outliers in Spatial-Temporal Data

    ERIC Educational Resources Information Center

    Rogers, James P.

    2010-01-01

    Outlier detection is an important data mining task that is focused on the discovery of objects that deviate significantly when compared with a set of observations that are considered typical. Outlier detection can reveal objects that behave anomalously with respect to other observations, and these objects may highlight current or future problems. …

  1. Trend-Residual Dual Modeling for Detection of Outliers in Low-Cost GPS Trajectories.

    PubMed

    Chen, Xiaojian; Cui, Tingting; Fu, Jianhong; Peng, Jianwei; Shan, Jie

    2016-12-01

    Low-cost GPS (receiver) has become a ubiquitous and integral part of our daily life. Despite noticeable advantages such as being cheap, small, light, and easy to use, its limited positioning accuracy devalues and hampers its wide application for reliable mapping and analysis. Two conventional techniques to remove outliers in a GPS trajectory are thresholding and Kalman-based methods, for which selecting appropriate thresholds and modeling the trajectories are difficult. Moreover, they are insensitive to medium and small outliers, especially for low-sample-rate trajectories. This paper proposes a model-based GPS trajectory cleaner. Rather than examining speed and acceleration or assuming a pre-determined trajectory model, we first use a cubic smoothing spline to adaptively model the trend of the trajectory. The residuals, i.e., the differences between the trend and GPS measurements, are then further modeled by a time series method. Outliers are detected by scoring the residuals at every GPS trajectory point. Compared with the conventional procedures, the trend-residual dual modeling approach has the following features: (a) it is able to model trajectories and detect outliers adaptively; (b) only one critical value for outlier scores needs to be set; (c) it is able to robustly detect unapparent outliers; and (d) it is effective in cleaning outliers for GPS trajectories with low sample rates. Tests are carried out on three real-world GPS trajectory datasets. The evaluation demonstrates an average of 9.27 times better performance in outlier detection for GPS trajectories than thresholding and Kalman-based techniques.
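
    The two-stage idea of fitting a trend and then scoring the residuals can be sketched as follows. This is not the authors' method: a running median stands in for the cubic smoothing spline, and a median-absolute-deviation (MAD) score stands in for the time series model of the residuals. The window size, the threshold and the sample track are hypothetical, and only one coordinate is treated.

```python
from statistics import median


def trend_residual_outliers(values, half_window=2, threshold=3.5):
    """Indices whose residual from a running-median trend has a large MAD score."""
    n = len(values)
    # Stage 1: trend estimate (running median, a stand-in for a spline fit).
    trend = []
    for i in range(n):
        lo, hi = max(0, i - half_window), min(n, i + half_window + 1)
        trend.append(median(values[lo:hi]))
    # Stage 2: score the residuals with a robust z-score.
    residuals = [v - t for v, t in zip(values, trend)]
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    if mad == 0:
        return [i for i, r in enumerate(residuals) if r != center]
    return [i for i, r in enumerate(residuals)
            if 0.6745 * abs(r - center) / mad > threshold]


# Hypothetical one-dimensional track sampled at equal intervals, with one jump.
track = [0.0, 1.1, 1.9, 3.0, 4.1, 15.0, 6.0, 6.9, 8.1, 9.0]
bad_points = trend_residual_outliers(track)  # -> [5]
```

    As in the abstract, only one critical value (here `threshold`) has to be chosen once the trend model is fixed.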

  2. Rejection of Multivariate Outliers.

    DTIC Science & Technology

    1983-05-01

    available in Gnanadesikan (1977). 2 The motivation for the present investigation lies in a recent paper of Schwager and Margolin (1982) who derive a... Gnanadesikan, R. (1977). Methods for Statistical Data Analysis of Multivariate Observations. Wiley, New York. [7] Hawkins, D.M. (1980). Identification of

  3. Trend-Residual Dual Modeling for Detection of Outliers in Low-Cost GPS Trajectories

    PubMed Central

    Chen, Xiaojian; Cui, Tingting; Fu, Jianhong; Peng, Jianwei; Shan, Jie

    2016-01-01

    Low-cost GPS (receiver) has become a ubiquitous and integral part of our daily life. Despite noticeable advantages such as being cheap, small, light, and easy to use, its limited positioning accuracy devalues and hampers its wide application for reliable mapping and analysis. Two conventional techniques to remove outliers in a GPS trajectory are thresholding and Kalman-based methods, for which selecting appropriate thresholds and modeling the trajectories are difficult. Moreover, they are insensitive to medium and small outliers, especially for low-sample-rate trajectories. This paper proposes a model-based GPS trajectory cleaner. Rather than examining speed and acceleration or assuming a pre-determined trajectory model, we first use a cubic smoothing spline to adaptively model the trend of the trajectory. The residuals, i.e., the differences between the trend and GPS measurements, are then further modeled by a time series method. Outliers are detected by scoring the residuals at every GPS trajectory point. Compared with the conventional procedures, the trend-residual dual modeling approach has the following features: (a) it is able to model trajectories and detect outliers adaptively; (b) only one critical value for outlier scores needs to be set; (c) it is able to robustly detect unapparent outliers; and (d) it is effective in cleaning outliers for GPS trajectories with low sample rates. Tests are carried out on three real-world GPS trajectory datasets. The evaluation demonstrates an average of 9.27 times better performance in outlier detection for GPS trajectories than thresholding and Kalman-based techniques. PMID:27916944

  4. Query-Based Outlier Detection in Heterogeneous Information Networks.

    PubMed

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-03-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user's search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks.

  5. Query-Based Outlier Detection in Heterogeneous Information Networks

    PubMed Central

    Kuck, Jonathan; Zhuang, Honglei; Yan, Xifeng; Cam, Hasan; Han, Jiawei

    2015-01-01

    Outlier or anomaly detection in large data sets is a fundamental task in data science, with broad applications. However, in real data sets with high-dimensional space, most outliers are hidden in certain dimensional combinations and are relative to a user’s search space and interest. It is often more effective to give power to users and allow them to specify outlier queries flexibly, and the system will then process such mining queries efficiently. In this study, we introduce the concept of query-based outlier in heterogeneous information networks, design a query language to facilitate users to specify such queries flexibly, define a good outlier measure in heterogeneous networks, and study how to process outlier queries efficiently in large data sets. Our experiments on real data sets show that following such a methodology, interesting outliers can be defined and uncovered flexibly and effectively in large heterogeneous networks. PMID:27064397

  6. The good, the bad and the outliers: automated detection of errors and outliers from groundwater hydrographs

    NASA Astrophysics Data System (ADS)

    Peterson, Tim J.; Western, Andrew W.; Cheng, Xiang

    2018-03-01

    Suspicious groundwater-level observations are common and can arise for many reasons ranging from an unforeseen biophysical process to bore failure and data management errors. Unforeseen observations may provide valuable insights that challenge existing expectations and can be deemed outliers, while monitoring and data handling failures can be deemed errors, and, if ignored, may compromise trend analysis and groundwater model calibration. Ideally, outliers and errors should be identified but to date this has been a subjective process that is not reproducible and is inefficient. This paper presents an approach to objectively and efficiently identify multiple types of errors and outliers. The approach requires only the observed groundwater hydrograph, requires no particular consideration of the hydrogeology, the drivers (e.g. pumping) or the monitoring frequency, and is freely available in the HydroSight toolbox. Herein, the algorithms and time-series model are detailed and applied to four observation bores with varying dynamics. The detection of outliers was most reliable when the observation data were acquired quarterly or more frequently. Outlier detection where the groundwater-level variance is nonstationary or the absolute trend increases rapidly was more challenging, with the former likely to result in an under-estimation of the number of outliers and the latter an overestimation in the number of outliers.

  7. Online Conditional Outlier Detection in Nonstationary Time Series

    PubMed Central

    Liu, Siqi; Wright, Adam; Hauskrecht, Milos

    2017-01-01

    The objective of this work is to develop methods for detecting outliers in time series data. Such methods can become the key component of various monitoring and alerting systems, where an outlier may be equal to some adverse condition that needs human attention. However, real-world time series are often affected by various sources of variability present in the environment that may influence the quality of detection; they may (1) explain some of the changes in the signal that would otherwise lead to false positive detections, as well as, (2) reduce the sensitivity of the detection algorithm leading to increase in false negatives. To alleviate these problems, we propose a new two-layer outlier detection approach that first tries to model and account for the nonstationarity and periodic variation in the time series, and then tries to use other observable variables in the environment to explain any additional signal variation. Our experiments on several data sets in different domains show that our method provides more accurate modeling of the time series, and that it is able to significantly improve outlier detection performance. PMID:29644345

  8. Online Conditional Outlier Detection in Nonstationary Time Series.

    PubMed

    Liu, Siqi; Wright, Adam; Hauskrecht, Milos

    2017-05-01

    The objective of this work is to develop methods for detecting outliers in time series data. Such methods can become the key component of various monitoring and alerting systems, where an outlier may be equal to some adverse condition that needs human attention. However, real-world time series are often affected by various sources of variability present in the environment that may influence the quality of detection; they may (1) explain some of the changes in the signal that would otherwise lead to false positive detections, as well as, (2) reduce the sensitivity of the detection algorithm leading to increase in false negatives. To alleviate these problems, we propose a new two-layer outlier detection approach that first tries to model and account for the nonstationarity and periodic variation in the time series, and then tries to use other observable variables in the environment to explain any additional signal variation. Our experiments on several data sets in different domains show that our method provides more accurate modeling of the time series, and that it is able to significantly improve outlier detection performance.

  9. Statistics and Machine Learning based Outlier Detection Techniques for Exoplanets

    NASA Astrophysics Data System (ADS)

    Goel, Amit; Montgomery, Michele

    2015-08-01

    Architectures of planetary systems are observable snapshots in time that can indicate formation and dynamic evolution of planets. The observable key parameters that we consider are planetary mass and orbital period. If planet masses are significantly less than their host star masses, then Keplerian motion is described by P^2 = a^3, where P is the orbital period in years and a is the semi-major axis of the orbit in Astronomical Units (AU). Keplerian motion works on small scales such as the size of the Solar System but not on large scales such as the size of the Milky Way Galaxy. In this work, for confirmed exoplanets of known stellar mass, planetary mass, orbital period, and stellar age, we analyze Keplerian motion of systems based on stellar age to determine whether Keplerian motion has an age dependency and to identify outliers. For detecting outliers, we apply several techniques based on statistical and machine learning methods such as probabilistic, linear, and proximity based models. In probabilistic and statistical models of outliers, the parameters of a closed-form probability distribution are learned in order to detect the outliers. Linear models use regression analysis based techniques for detecting outliers. Proximity based models use distance based algorithms such as k-nearest neighbour, clustering algorithms such as k-means, or density based algorithms such as kernel density estimation. In this work, we will use unsupervised learning algorithms with only the proximity based models. In addition, we explore the relative strengths and weaknesses of the various techniques by validating the outliers. The validation criterion for the outliers is whether the ratio of planetary mass to stellar mass is less than 0.001. In this work, we present our statistical analysis of the outliers thus detected.
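
    The relation P^2 = a^3 gives a simple consistency check, sketched below with a taken as the semi-major axis in AU and assuming a host star of one solar mass. The log-residual 2 ln P - 3 ln a is zero for an orbit that satisfies the relation exactly. The tolerance, the function names and the record labelled hypothetical are illustrative assumptions; the Earth, Mars and Jupiter values are the usual rounded textbook figures.

```python
import math


def kepler_log_residual(period_yr, semi_major_axis_au):
    """2 ln P - 3 ln a; zero when P^2 = a^3 holds exactly."""
    return 2.0 * math.log(period_yr) - 3.0 * math.log(semi_major_axis_au)


def flag_kepler_outliers(systems, tol=0.05):
    """Names of records whose log-residual exceeds tol in absolute value."""
    return [name for name, (p, a) in systems.items()
            if abs(kepler_log_residual(p, a)) > tol]


systems = {
    "Earth": (1.0, 1.0),
    "Mars": (1.881, 1.524),
    "Jupiter": (11.86, 5.203),
    # Hypothetical inconsistent record, included only to show a flagged case.
    "hypothetical-X": (3.0, 1.0),
}
flagged = flag_kepler_outliers(systems)
```

    For hosts that are not close to one solar mass, the stellar mass would have to enter the relation before a residual like this is meaningful, which is presumably why the abstract restricts itself to exoplanets of known stellar mass.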

  10. Identification of unusual events in multichannel bridge monitoring data using wavelet transform and outlier analysis

    NASA Astrophysics Data System (ADS)

    Omenzetter, Piotr; Brownjohn, James M. W.; Moyo, Pilate

    2003-08-01

    Continuously operating instrumented structural health monitoring (SHM) systems are becoming a practical alternative to replace visual inspection for assessment of condition and soundness of civil infrastructure. However, converting the large amount of data from an SHM system into usable information is a great challenge to which special signal processing techniques must be applied. This study is devoted to identification of abrupt, anomalous and potentially onerous events in the time histories of static, hourly sampled strains recorded by a multi-sensor SHM system installed in a major bridge structure in Singapore and operating continuously for a long time. Such events may result, among other causes, from sudden settlement of foundation, ground movement, excessive traffic load or failure of post-tensioning cables. A method of outlier detection in multivariate data has been applied to the problem of finding and localizing sudden events in the strain data. For sharp discrimination of abrupt strain changes from slowly varying ones, a wavelet transform has been used. The proposed method has been successfully tested using known events recorded during construction of the bridge, and later effectively used for detection of anomalous post-construction events.

  11. Outlier detection in a new half-circular distribution

    NASA Astrophysics Data System (ADS)

    Rambli, Adzhar; Mohamed, Ibrahim Bin; Shimizu, Kunio; Khalidin, Nurliza

    2015-10-01

    In this paper, we use a discordancy test based on spacing theory to detect outliers in half-circular data. Up to now, numerous discordancy tests have been proposed to detect outliers in circular distributions, which are defined in [0,2π). However, some circular data lie within just half of this range. Therefore, first we introduce a new half-circular distribution developed using the inverse stereographic projection technique on a gamma distributed variable. Then, we develop a new discordancy test to detect single or multiple outliers in the half-circular data based on the spacing theory. We show the practical value of the test by applying it to an eye data set obtained from a glaucoma clinic at the University of Malaya Medical Centre, Malaysia.

  12. Robust Mokken Scale Analysis by Means of the Forward Search Algorithm for Outlier Detection

    ERIC Educational Resources Information Center

    Zijlstra, Wobbe P.; van der Ark, L. Andries; Sijtsma, Klaas

    2011-01-01

    Exploratory Mokken scale analysis (MSA) is a popular method for identifying scales from larger sets of items. As with any statistical method, in MSA the presence of outliers in the data may result in biased results and wrong conclusions. The forward search algorithm is a robust diagnostic method for outlier detection, which we adapt here to…

  13. A framework for periodic outlier pattern detection in time-series sequences.

    PubMed

    Rasheed, Faraz; Alhajj, Reda

    2014-05-01

    Periodic pattern detection in time-ordered sequences is an important data mining task, which discovers in the time series all patterns that exhibit temporal regularities. Periodic pattern mining has a large number of applications in real life; it helps understanding the regular trend of the data along time, and enables the forecast and prediction of future events. An interesting related and vital problem that has not received enough attention is to discover outlier periodic patterns in a time series. Outlier patterns are defined as those which are different from the rest of the patterns; outliers are not noise. While noise does not belong to the data and it is mostly eliminated by preprocessing, outliers are actual instances in the data but have exceptional characteristics compared with the majority of the other instances. Outliers are unusual patterns that rarely occur, and, thus, have lesser support (frequency of appearance) in the data. Outlier patterns may hint toward discrepancy in the data such as fraudulent transactions, network intrusion, change in customer behavior, recession in the economy, epidemic and disease biomarkers, severe weather conditions like tornados, etc. We argue that detecting the periodicity of outlier patterns might be more important in many sequences than the periodicity of regular, more frequent patterns. In this paper, we present a robust and time efficient suffix tree-based algorithm capable of detecting the periodicity of outlier patterns in a time series by giving more significance to less frequent yet periodic patterns. Several experiments have been conducted using both real and synthetic data; all aspects of the proposed approach are compared with the existing algorithm InfoMiner; the reported results demonstrate the effectiveness and applicability of the proposed approach.

  14. Ellipsoids for anomaly detection in remote sensing imagery

    NASA Astrophysics Data System (ADS)

    Grosklos, Guenchik; Theiler, James

    2015-05-01

    For many target and anomaly detection algorithms, a key step is the estimation of a centroid (relatively easy) and a covariance matrix (somewhat harder) that characterize the background clutter. For a background that can be modeled as a multivariate Gaussian, the centroid and covariance lead to an explicit probability density function that can be used in likelihood ratio tests for optimal detection statistics. But ellipsoidal contours can characterize a much larger class of multivariate density function, and the ellipsoids that characterize the outer periphery of the distribution are most appropriate for detection in the low false alarm rate regime. Traditionally the sample mean and sample covariance are used to estimate ellipsoid location and shape, but these quantities are confounded both by large lever-arm outliers and non-Gaussian distributions within the ellipsoid of interest. This paper compares a variety of centroid and covariance estimation schemes with the aim of characterizing the periphery of the background distribution. In particular, we will consider a robust variant of the Khachiyan algorithm for minimum-volume enclosing ellipsoid. The performance of these different approaches is evaluated on multispectral and hyperspectral remote sensing imagery using coverage plots of ellipsoid volume versus false alarm rate.
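
    The centroid-and-covariance step described in this abstract can be sketched with the classical estimators, which the abstract notes are the traditional baseline. The example below assumes NumPy, uses synthetic three-band "pixels", and estimates the background statistics from uncontaminated data only; the function name and the data are illustrative assumptions, and no robust or minimum-volume ellipsoid variant is attempted.

```python
import numpy as np

def mahalanobis_sq(X, center, cov):
    """Squared Mahalanobis distance of each row of X from center."""
    diff = X - center
    # Solve cov @ z = diff.T rather than forming an explicit inverse.
    z = np.linalg.solve(cov, diff.T)
    return np.einsum("ij,ji->i", diff, z)

rng = np.random.default_rng(0)
background = rng.normal(size=(500, 3))            # synthetic background clutter
pixels = np.vstack([background, [[8.0, -8.0, 8.0]]])  # one planted anomaly

center = background.mean(axis=0)                  # sample centroid
cov = np.cov(background, rowvar=False)            # sample covariance
d2 = mahalanobis_sq(pixels, center, cov)
print(int(np.argmax(d2)))
```

    Contours of constant `d2` are the ellipsoids the abstract refers to; thresholding `d2` at a larger value corresponds to a larger ellipsoid and a lower false alarm rate.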

  15. Stacked Autoencoders for Outlier Detection in Over-the-Horizon Radar Signals

    PubMed Central

    Protopapadakis, Eftychios; Doulamis, Anastasios; Doulamis, Nikolaos; Dres, Dimitrios; Bimpas, Matthaios

    2017-01-01

    Detection of outliers in radar signals is a considerable challenge in maritime surveillance applications. High-Frequency Surface-Wave (HFSW) radars have attracted significant interest as potential tools for long-range target identification and outlier detection at over-the-horizon (OTH) distances. However, a number of disadvantages, such as their low spatial resolution and presence of clutter, have a negative impact on their accuracy. In this paper, we explore the applicability of deep learning techniques for detecting deviations from the norm in behavioral patterns of vessels (outliers) as they are tracked from an OTH radar. The proposed methodology exploits the nonlinear mapping capabilities of deep stacked autoencoders in combination with density-based clustering. A comparative experimental evaluation of the approach shows promising results in terms of the proposed methodology's performance. PMID:29312449

  16. Penalized unsupervised learning with outliers

    PubMed Central

    Witten, Daniela M.

    2013-01-01

    We consider the problem of performing unsupervised learning in the presence of outliers – that is, observations that do not come from the same distribution as the rest of the data. It is known that in this setting, standard approaches for unsupervised learning can yield unsatisfactory results. For instance, in the presence of severe outliers, K-means clustering will often assign each outlier to its own cluster, or alternatively may yield distorted clusters in order to accommodate the outliers. In this paper, we take a new approach to extending existing unsupervised learning techniques to accommodate outliers. Our approach is an extension of a recent proposal for outlier detection in the regression setting. We allow each observation to take on an “error” term, and we penalize the errors using a group lasso penalty in order to encourage most of the observations’ errors to exactly equal zero. We show that this approach can be used in order to develop extensions of K-means clustering and principal components analysis that result in accurate outlier detection, as well as improved performance in the presence of outliers. These methods are illustrated in a simulation study and on two gene expression data sets, and connections with M-estimation are explored. PMID:23875057
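
    The idea of giving each observation an "error" term with a group lasso penalty can be sketched for K-means as follows. This is a minimal sketch under stated assumptions, not the author's published algorithm: it assumes the objective sum_i ||x_i - mu_c(i) - e_i||^2 + lam * sum_i ||e_i||_2, alternates a standard K-means step with a row-wise soft-threshold update of the errors, and uses NumPy, synthetic data, and a hand-picked `lam`. All names are hypothetical.

```python
import numpy as np

def group_soft_threshold(R, lam):
    """Row-wise minimiser of ||r - e||^2 + lam * ||e||_2 over e."""
    norms = np.linalg.norm(R, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / (2.0 * np.maximum(norms, 1e-12)))
    return R * scale

def penalized_kmeans(X, init_centers, lam, n_iter=50):
    centers = np.array(init_centers, dtype=float)
    E = np.zeros_like(X, dtype=float)
    for _ in range(n_iter):
        Z = X - E  # data with the current error estimates removed
        dists = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for k in range(len(centers)):
            if np.any(labels == k):
                centers[k] = Z[labels == k].mean(axis=0)
        # Errors stay exactly zero unless the residual norm exceeds lam / 2.
        E = group_soft_threshold(X - centers[labels], lam)
    return labels, centers, E

rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(loc=[0, 0], scale=0.3, size=(30, 2)),
    rng.normal(loc=[5, 5], scale=0.3, size=(30, 2)),
    [[2.5, 12.0]],  # one planted outlier
])
labels, centers, E = penalized_kmeans(X, init_centers=X[[0, 30]], lam=4.0)
outliers = np.flatnonzero(np.linalg.norm(E, axis=1) > 0)
print(outliers)
```

    Observations whose error term is nonzero are the detected outliers; because their error absorbs most of the residual, they pull the cluster centers far less than they would in ordinary K-means.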

  17. Multivariate-$t$ nonlinear mixed models with application to censored multi-outcome AIDS studies.

    PubMed

    Lin, Tsung-I; Wang, Wan-Lun

    2017-10-01

    In multivariate longitudinal HIV/AIDS studies, multi-outcome repeated measures on each patient over time may contain outliers, and the viral loads are often subject to an upper or lower limit of detection depending on the quantification assays. In this article, we consider an extension of the multivariate nonlinear mixed-effects model by adopting a joint multivariate-$t$ distribution for random effects and within-subject errors and taking the censoring information of multiple responses into account. The proposed model is called the multivariate-$t$ nonlinear mixed-effects model with censored responses (MtNLMMC), allowing for analyzing multi-outcome longitudinal data exhibiting nonlinear growth patterns with censorship and fat-tailed behavior. Utilizing the Taylor-series linearization method, a pseudo-data version of expectation conditional maximization either (ECME) algorithm is developed for iteratively carrying out maximum likelihood estimation. We illustrate our techniques with two data examples from HIV/AIDS studies. Experimental results signify that the MtNLMMC performs favorably compared to its Gaussian analogue and some existing approaches.

  18. Slowing ash mortality: a potential strategy to slam emerald ash borer in outlier sites

    Treesearch

    Deborah G. McCullough; Nathan W. Siegert; John Bedford

    2009-01-01

    Several isolated outlier populations of emerald ash borer (Agrilus planipennis Fairmaire) were discovered in 2008 and additional outliers will likely be found as detection surveys and public outreach activities...

  19. Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution

    PubMed Central

    Lo, Kenneth

    2011-01-01

    Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components. PMID:22125375

  20. Flexible mixture modeling via the multivariate t distribution with the Box-Cox transformation: an alternative to the skew-t distribution.

    PubMed

    Lo, Kenneth; Gottardo, Raphael

    2012-01-01

    Cluster analysis is the automated search for groups of homogeneous observations in a data set. A popular modeling approach for clustering is based on finite normal mixture models, which assume that each cluster is modeled as a multivariate normal distribution. However, the normality assumption that each component is symmetric is often unrealistic. Furthermore, normal mixture models are not robust against outliers; they often require extra components for modeling outliers and/or give a poor representation of the data. To address these issues, we propose a new class of distributions, multivariate t distributions with the Box-Cox transformation, for mixture modeling. This class of distributions generalizes the normal distribution with the more heavy-tailed t distribution, and introduces skewness via the Box-Cox transformation. As a result, this provides a unified framework to simultaneously handle outlier identification and data transformation, two interrelated issues. We describe an Expectation-Maximization algorithm for parameter estimation along with transformation selection. We demonstrate the proposed methodology with three real data sets and simulation studies. Compared with a wealth of approaches including the skew-t mixture model, the proposed t mixture model with the Box-Cox transformation performs favorably in terms of accuracy in the assignment of observations, robustness against model misspecification, and selection of the number of components.

  1. Unsupervised Sequential Outlier Detection With Deep Architectures.

    PubMed

    Lu, Weining; Cheng, Yu; Xiao, Cao; Chang, Shiyu; Huang, Shuai; Liang, Bin; Huang, Thomas

    2017-09-01

    Unsupervised outlier detection is a vital task and has high impact on a wide variety of application domains, such as image analysis and video surveillance. It has also attracted long-standing attention and has been extensively studied in multiple research areas. Detecting and taking action on outliers as quickly as possible are imperative in order to protect network and related stakeholders or to maintain the reliability of critical systems. However, outlier detection is difficult due to the one-class nature and challenges in feature construction. Sequential anomaly detection is even harder with more challenges from temporal correlation in data, as well as the presence of noise and high dimensionality. In this paper, we introduce a novel deep structured framework to solve the challenging sequential outlier detection problem. We use autoencoder models to capture the intrinsic difference between outliers and normal instances and integrate the models with recurrent neural networks that allow the learning to make use of previous context as well as make the learners more robust to warp along the time axis. Furthermore, we propose to use a layerwise training procedure, which significantly simplifies the training procedure and hence helps achieve efficient and scalable training. In addition, we investigate a fine-tuning step to update all parameters set by incorporating the temporal correlation in the sequence. We further apply our proposed models to conduct systematic experiments on five real-world benchmark data sets. Experimental results demonstrate the effectiveness of our model, compared with other state-of-the-art approaches.

  2. Using Innovative Outliers to Detect Discrete Shifts in Dynamics in Group-Based State-Space Models

    ERIC Educational Resources Information Center

    Chow, Sy-Miin; Hamaker, Ellen L.; Allaire, Jason C.

    2009-01-01

    Outliers are typically regarded as data anomalies that should be discarded. However, dynamic or "innovative" outliers can be appropriately utilized to capture unusual but substantively meaningful shifts in a system's dynamics. We extend De Jong and Penzer's 1998 approach for representing outliers in single-subject state-space models to a…

  3. A tandem regression-outlier analysis of a ligand cellular system for key structural modifications around ligand binding.

    PubMed

    Lin, Ying-Ting

    2013-04-30

    A tandem technique of hard equipment is often used for the chemical analysis of a single cell to first isolate and then detect the wanted identities. The first part is the separation of wanted chemicals from the bulk of a cell; the second part is the actual detection of the important identities. To identify the key structural modifications around ligand binding, the present study aims to develop a counterpart of tandem technique for cheminformatics. A statistical regression and its outliers act as a computational technique for separation. A PPARγ (peroxisome proliferator-activated receptor gamma) agonist cellular system was subjected to such an investigation. Results show that this tandem regression-outlier analysis, or the prioritization of the context equations tagged with features of the outliers, is an effective regression technique of cheminformatics to detect key structural modifications, as well as their tendency of impact to ligand binding. The key structural modifications around ligand binding are effectively extracted or characterized out of cellular reactions. This is because molecular binding is the paramount factor in such ligand cellular system and key structural modifications around ligand binding are expected to create outliers. Therefore, such outliers can be captured by this tandem regression-outlier analysis.

  4. Outlier Detection in GNSS Pseudo-Range/Doppler Measurements for Robust Localization.

    PubMed

    Zair, Salim; Le Hégarat-Mascle, Sylvie; Seignez, Emmanuel

    2016-04-22

    In urban areas or space-constrained environments with obstacles, vehicle localization using Global Navigation Satellite System (GNSS) data is hindered by Non-Line Of Sight (NLOS) and multipath receptions. These phenomena induce faulty data that disrupt the precise localization of the GNSS receiver. In this study, we detect the outliers among the observations, Pseudo-Range (PR) and/or Doppler measurements, and we evaluate how discarding them improves the localization. We specify a contrario modeling for GNSS raw data to derive an algorithm that partitions the dataset between inliers and outliers. Then, only the inlier data are considered in the localization process performed either through a classical Particle Filter (PF) or a Rao-Blackwellization (RB) approach. Both localization algorithms exclusively use GNSS data, but they differ by the way Doppler measurements are processed. An experiment has been performed with a GPS receiver aboard a vehicle. Results show that the proposed algorithms are able to detect the 'outliers' in the raw data while being robust to non-Gaussian noise and to intermittent satellite blockage. We compare the performance results achieved either estimating only PR outliers or estimating both PR and Doppler outliers. The best localization is achieved using the RB approach coupled with PR-Doppler outlier estimation.

  5. [Research on outlier detection methods for determination of oil yield in oil shales using near-infrared spectroscopy].

    PubMed

    Zhang, Huai-zhu; Lin, Jun; Zhang, Huai-Zhu

    2014-06-01

    In the present paper, outlier detection methods for the determination of oil yield in oil shale using near-infrared (NIR) diffuse reflection spectroscopy were studied. During quantitative analysis with near-infrared spectroscopy, environmental change and operator error will both produce outliers. The presence of outliers will affect the overall distribution trend of samples and lead to a decrease in predictive capability. Thus, the detection of outliers is important for the construction of high-quality calibration models. Methods including principal component analysis-Mahalanobis distance (PCA-MD) and resampling by half-means (RHM) were applied to the discrimination and elimination of outliers in this work. The thresholds and confidences for MD and RHM were optimized using the performance of partial least squares (PLS) models constructed after the elimination of outliers, respectively. Compared with the model constructed with the data of the full spectrum, the values of RMSEP of the models constructed with the application of PCA-MD with a threshold equal to the sum of the mean and standard deviation of MD, RHM with a confidence level of 85%, and the combination of PCA-MD and RHM, were reduced by 48.3%, 27.5% and 44.8%, respectively. The predictive ability of the calibration model has been improved effectively.
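
    The PCA-MD rule mentioned in this abstract, with a threshold equal to the mean plus one standard deviation of the Mahalanobis distances, can be sketched as below. The sketch assumes NumPy and synthetic "spectra"; the number of retained components, the function name, and the data are illustrative assumptions rather than the settings used in the paper.

```python
import numpy as np

def pca_md_outliers(X, n_components):
    """Flag rows whose Mahalanobis distance in PCA score space exceeds mean + 1 SD."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred data gives the principal axes as the rows of Vt.
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T
    # Score columns are uncorrelated, so MD reduces to scaling by each score's SD.
    sd = scores.std(axis=0, ddof=1)
    md = np.sqrt(((scores / sd) ** 2).sum(axis=1))
    threshold = md.mean() + md.std(ddof=1)
    return md, threshold, md > threshold

rng = np.random.default_rng(2)
spectra = 1.0 + 0.1 * rng.normal(size=(40, 20))  # 40 synthetic spectra, 20 channels
bad = np.full((1, 20), 1.0)
bad[0, 5:10] += 3.0                              # one planted abnormal spectrum
X = np.vstack([spectra, bad])

md, threshold, flags = pca_md_outliers(X, n_components=3)
print(int(np.argmax(md)), bool(flags[-1]))
```

    A mean-plus-one-SD cutoff is fairly aggressive and will usually flag some ordinary samples as well, which is presumably why the abstract tunes the threshold against the performance of the downstream PLS model.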

  6. Biomarker profiling in reef corals of Tonga’s Ha’apai and Vava’u archipelagos

    PubMed Central

    Chen, Chii-Shiarng; Dempsey, Alexandra C.

    2017-01-01

    Given the significant threats towards Earth’s coral reefs, there is an urgent need to document the current physiological condition of the resident organisms, particularly the reef-building scleractinians themselves. Unfortunately, most of the planet’s reefs are understudied, and some have yet to be seen. For instance, the Kingdom of Tonga possesses an extensive reef system, with thousands of hectares of unobserved reefs; little is known about their ecology, nor is there any information on the health of the resident corals. Given such knowledge deficiencies, 59 reefs across three Tongan archipelagos were surveyed herein, and pocilloporid corals were sampled from approximately half of these surveyed sites; 10 molecular-scale response variables were assessed in 88 of the sampled colonies, and 12 colonies were found to be outliers based on employment of a multivariate statistics-based aberrancy detection system. These outliers differed from the statistically normally behaving colonies in having not only higher RNA/DNA ratios but also elevated expression levels of three genes: 1) Symbiodinium zinc-induced facilitator-like 1-like, 2) host coral copper-zinc superoxide dismutase, and 3) host green fluorescent protein-like chromoprotein. Outliers were also characterized by significantly higher variation amongst the molecular response variables assessed, and the response variables that contributed most significantly to colonies being delineated as outliers differed between the two predominant reef coral species sampled, Pocillopora damicornis and P. acuta. These closely related species also displayed dissimilar temporal fluctuation patterns in their molecular physiologies, an observation that may have been driven by differences in their feeding strategies. Future works should attempt to determine whether corals displaying statistically aberrant molecular physiology, such as the 12 Tongan outliers identified herein, are indeed characterized by a diminished capacity for acclimating to the rapid changes in their abiotic milieu occurring as a result of global climate change. PMID:29091723

  7. Method for outlier detection: a tool to assess the consistency between laboratory data and ultraviolet-visible absorbance spectra in wastewater samples.

    PubMed

    Zamora, D; Torres, A

    2014-01-01

    Reliable estimations of the evolution of water quality parameters by using in situ technologies make it possible to follow the operation of a wastewater treatment plant (WWTP), as well as improving the understanding and control of the operation, especially in the detection of disturbances. However, ultraviolet (UV)-Vis sensors have to be calibrated by means of a local fingerprint laboratory reference concentration-value data-set. The detection of outliers in these data-sets is therefore important. This paper presents a method for detecting outliers in UV-Vis absorbances coupled to water quality reference laboratory concentrations for samples used for calibration purposes. Application to samples from the influent of the San Fernando WWTP (Medellín, Colombia) is shown. After the removal of outliers, improvements in the predictability of the influent concentrations using absorbance spectra were found.

  8. Outlier detection in contamination control

    NASA Astrophysics Data System (ADS)

    Weintraub, Jeffrey; Warrick, Scott

    2018-03-01

    A machine-learning model is presented that effectively partitions historical process data into outlier and inlier subpopulations. This is necessary in order to avoid using outlier data to build a model for detecting process instability. Exact control limits are given without recourse to approximations and the error characteristics of the control model are derived. A worked example for contamination control is presented along with the machine learning algorithm used and all the programming statements needed for implementation.

  9. An online outlier identification and removal scheme for improving fault detection performance.

    PubMed

    Ferdowsi, Hasan; Jagannathan, Sarangapani; Zawodniok, Maciej

    2014-05-01

    Measured data or states for a nonlinear dynamic system are usually contaminated by outliers. Identifying and removing outliers will make the data (or system states) more trustworthy and reliable since outliers in the measured data (or states) can cause missed or false alarms during fault diagnosis. In addition, faults can make the system states nonstationary, requiring a novel analytical model-based fault detection (FD) framework. In this paper, an online outlier identification and removal (OIR) scheme is proposed for a nonlinear dynamic system. Since the dynamics of the system can experience unknown changes due to faults, traditional observer-based techniques cannot be used to remove the outliers. The OIR scheme uses a neural network (NN) to estimate the actual system states from measured system states involving outliers. With this method, the outlier detection is performed online at each time instant by finding the difference between the estimated and the measured states and comparing its median with its standard deviation over a moving time window. The NN weight update law in OIR is designed such that the detected outliers will have no effect on the state estimation, which is subsequently used for model-based fault diagnosis. In addition, since the OIR estimator cannot distinguish between the faulty and healthy operating conditions, a separate model-based observer is designed for fault diagnosis, which uses the OIR scheme as a preprocessing unit to improve the FD performance. The stability analysis of both the OIR and fault diagnosis schemes is presented. Finally, a three-tank benchmarking system and a simple linear system are used to verify the proposed scheme in simulations, and then the scheme is applied on an axial piston pump testbed. The scheme can be applied to nonlinear systems whose dynamics and underlying distribution of states are subject to change due to both unknown faults and operating conditions.

  10. Detecting outliers when fitting data with nonlinear regression – a new method based on robust nonlinear regression and the false discovery rate

    PubMed Central

    Motulsky, Harvey J; Brown, Ronald E

    2006-01-01

    Background: Nonlinear regression, like linear regression, assumes that the scatter of data around the ideal curve follows a Gaussian or normal distribution. This assumption leads to the familiar goal of regression: to minimize the sum of the squares of the vertical or Y-value distances between the points and the curve. Outliers can dominate the sum-of-the-squares calculation, and lead to misleading results. However, we know of no practical method for routinely identifying outliers when fitting curves with nonlinear regression. Results: We describe a new method for identifying outliers when fitting data with nonlinear regression. We first fit the data using a robust form of nonlinear regression, based on the assumption that scatter follows a Lorentzian distribution. We devised a new adaptive method that gradually becomes more robust as the method proceeds. To define outliers, we adapted the false discovery rate approach to handling multiple comparisons. We then remove the outliers, and analyze the data using ordinary least-squares regression. Because the method combines robust regression and outlier removal, we call it the ROUT method. When analyzing simulated data, where all scatter is Gaussian, our method detects (falsely) one or more outliers in only about 1–3% of experiments. When analyzing data contaminated with one or several outliers, the ROUT method performs well at outlier identification, with an average False Discovery Rate less than 1%. Conclusion: Our method, which combines a new method of robust nonlinear regression with a new method of outlier identification, identifies outliers from nonlinear curve fits with reasonable power and few false positives. PMID:16526949

  11. Influence assessment in censored mixed-effects models using the multivariate Student’s-t distribution

    PubMed Central

    Matos, Larissa A.; Bandyopadhyay, Dipankar; Castro, Luis M.; Lachos, Victor H.

    2015-01-01

    In biomedical studies on HIV RNA dynamics, viral loads generate repeated measures that are often subjected to upper and lower detection limits, and hence these responses are either left- or right-censored. Linear and non-linear mixed-effects censored (LMEC/NLMEC) models are routinely used to analyse these longitudinal data, with normality assumptions for the random effects and residual errors. However, the derived inference may not be robust when these underlying normality assumptions are questionable, especially in the presence of outliers and thick tails. Motivated by this, Matos et al. (2013b) recently proposed an exact EM-type algorithm for LMEC/NLMEC models using a multivariate Student’s-t distribution, with closed-form expressions at the E-step. In this paper, we develop influence diagnostics for LMEC/NLMEC models using the multivariate Student’s-t density, based on the conditional expectation of the complete data log-likelihood. This partially eliminates the complexity associated with the approach of Cook (1977, 1986) for censored mixed-effects models. The new methodology is illustrated via an application to a longitudinal HIV dataset. In addition, a simulation study explores the accuracy of the proposed measures in detecting possible influential observations for heavy-tailed censored data under different perturbation and censoring schemes. PMID:26190871

  12. Assessing Principal Component Regression Prediction of Neurochemicals Detected with Fast-Scan Cyclic Voltammetry

    PubMed Central

    2011-01-01

    Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook’s distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards. PMID:21966586

  13. Assessing principal component regression prediction of neurochemicals detected with fast-scan cyclic voltammetry.

    PubMed

    Keithley, Richard B; Wightman, R Mark

    2011-06-07

    Principal component regression is a multivariate data analysis approach routinely used to predict neurochemical concentrations from in vivo fast-scan cyclic voltammetry measurements. This mathematical procedure can rapidly be employed with present day computer programming languages. Here, we evaluate several methods that can be used to evaluate and improve multivariate concentration determination. The cyclic voltammetric representation of the calculated regression vector is shown to be a valuable tool in determining whether the calculated multivariate model is chemically appropriate. The use of Cook's distance successfully identified outliers contained within in vivo fast-scan cyclic voltammetry training sets. This work also presents the first direct interpretation of a residual color plot and demonstrated the effect of peak shifts on predicted dopamine concentrations. Finally, separate analyses of smaller increments of a single continuous measurement could not be concatenated without substantial error in the predicted neurochemical concentrations due to electrode drift. Taken together, these tools allow for the construction of more robust multivariate calibration models and provide the first approach to assess the predictive ability of a procedure that is inherently impossible to validate because of the lack of in vivo standards.
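
    As a rough illustration of the Cook's distance screening mentioned in this abstract, the sketch below computes Cook's distance for an ordinary one-predictor least-squares fit. This is an assumption made for brevity: the paper applies the diagnostic to principal component regression training sets, not to a single-predictor model. The function name `cooks_distance`, the sample data, and the 4/n cutoff (a common rule of thumb) are all illustrative and not taken from the source.

```python
def cooks_distance(x, y):
    """Cook's distance for each point of a one-predictor least-squares fit."""
    n = len(x)
    p = 2  # fitted parameters: intercept and slope
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y)) / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)
    # Leverage of each point in simple linear regression.
    leverage = [1 / n + (xi - x_mean) ** 2 / sxx for xi in x]
    return [
        (e * e / (p * mse)) * (h / (1 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]


# Made-up calibration data; the last response is deliberately off-trend.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 7.1, 12.0]

d = cooks_distance(x, y)
# Rule-of-thumb cutoff (an assumption, not from the abstract): D > 4 / n.
flagged = [i for i, di in enumerate(d) if di > 4 / len(x)]
print(flagged)
```

    Points with a large distance combine a large residual with high leverage, so they are the ones whose removal would change the fitted model most.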

  14. [Outlier sample discriminating methods for building calibration model in melons quality detecting using NIR spectra].

    PubMed

    Tian, Hai-Qing; Wang, Chun-Guang; Zhang, Hai-Jun; Yu, Zhi-Hong; Li, Jian-Kang

    2012-11-01

    Outlier samples strongly influence the precision of the calibration model for measuring the soluble solids content of melons using NIR spectra. According to the possible sources of outlier samples, three methods (predicted concentration residual test; Chauvenet test; leverage and studentized residual test) were used to identify these outliers. Nine suspicious outliers were detected in the calibration set, which included 85 fruit samples. Because the 9 suspicious samples might include some non-outliers, they were returned to the model one by one to see whether they affected the model and its prediction precision. In this way, 5 samples that were helpful to the model rejoined the calibration set, and a new model was developed with a correlation coefficient (r) of 0.889 and a root mean square error of calibration (RMSEC) of 0.6010 degrees Brix. For 35 unknown samples, the root mean square error of prediction (RMSEP) was 0.854 degrees Brix. This model performed better than the one developed with no outliers eliminated from the calibration set (r = 0.797, RMSEC = 0.849 degrees Brix, RMSEP = 1.19 degrees Brix), and was more representative and stable than the one developed with all 9 samples eliminated from the calibration set (r = 0.892, RMSEC = 0.605 degrees Brix, RMSEP = 0.862 degrees Brix).
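
    Of the three tests named in this abstract, the Chauvenet test is the simplest to sketch. The version below is the textbook criterion, assuming normally distributed values: a value is suspect when the expected number of equally extreme values in a sample of size n falls below 0.5. The function name and the Brix readings are made up for illustration; the abstract does not say how the authors applied the test to their residuals.

```python
import math
import statistics


def chauvenet_outliers(values):
    """Return indices of values rejected by Chauvenet's criterion."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    flagged = []
    for i, v in enumerate(values):
        z = abs(v - mean) / sd
        # Two-sided tail probability of a deviation this large (normal model).
        tail = math.erfc(z / math.sqrt(2))
        if n * tail < 0.5:
            flagged.append(i)
    return flagged


# Hypothetical soluble solids readings in degrees Brix.
brix = [10.1, 10.4, 9.8, 10.0, 10.3, 9.9, 10.2, 10.0, 10.1, 14.5]
suspects = chauvenet_outliers(brix)
print(suspects)
```

    As in the abstract, a flagged sample is only a suspect: it can be returned to the calibration set and kept if the model does not degrade.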

  15. Outlier Detection in GNSS Pseudo-Range/Doppler Measurements for Robust Localization

    PubMed Central

    Zair, Salim; Le Hégarat-Mascle, Sylvie; Seignez, Emmanuel

    2016-01-01

    In urban areas or space-constrained environments with obstacles, vehicle localization using Global Navigation Satellite System (GNSS) data is hindered by Non-Line Of Sight (NLOS) and multipath receptions. These phenomena induce faulty data that disrupt the precise localization of the GNSS receiver. In this study, we detect the outliers among the observations, Pseudo-Range (PR) and/or Doppler measurements, and we evaluate how discarding them improves the localization. We specify a contrario modeling for GNSS raw data to derive an algorithm that partitions the dataset between inliers and outliers. Then, only the inlier data are considered in the localization process performed either through a classical Particle Filter (PF) or a Rao-Blackwellization (RB) approach. Both localization algorithms exclusively use GNSS data, but they differ by the way Doppler measurements are processed. An experiment has been performed with a GPS receiver aboard a vehicle. Results show that the proposed algorithms are able to detect the ‘outliers’ in the raw data while being robust to non-Gaussian noise and to intermittent satellite blockage. We compare the performance results achieved either estimating only PR outliers or estimating both PR and Doppler outliers. The best localization is achieved using the RB approach coupled with PR-Doppler outlier estimation. PMID:27110796

  16. Integration of genome and phenotypic scanning gives evidence of genetic structure in Mesoamerican common bean (Phaseolus vulgaris L.) landraces from the southwest of Europe.

    PubMed

    Santalla, M; De Ron, A M; De La Fuente, M

    2010-05-01

    Southwestern Europe has been considered as a secondary centre of genetic diversity for the common bean. The dispersal of domesticated materials from their centres of origin provides an experimental system that reveals how human selection during cultivation and adaptation to novel environments affects the genetic composition. In this paper, our goal was to elucidate how distinct events could modify the structure and level of genetic diversity in the common bean. The genome-wide genetic composition was analysed at 42 microsatellite loci in individuals of 22 landraces of domesticated common bean from the Mesoamerican gene pool. The accessions were also characterised for phaseolin seed protein and for nine allozyme polymorphisms and phenotypic traits. One of this study's important findings was the complementary information obtained from all the polymorphisms examined. Most of the markers found to be potentially under the influence of selection were located in the proximity of previously mapped genes and quantitative trait loci (QTLs) related to important agronomic traits, which indicates that population genomics approaches are very efficient in detecting QTLs. As it was revealed by outlier simple sequence repeats, loci analysis with STRUCTURE software and multivariate analysis of phenotypic data, the landraces were grouped into three clusters according to seed size and shape, vegetative growth habit and genetic resistance. A total of 151 alleles were detected with an average of 4 alleles per locus and an average polymorphism information content of 0.31. Using a model-based approach, on the basis of neutral markers implemented in the software STRUCTURE, three clusters were inferred, which were in good agreement with multivariate analysis. Geographic and genetic distances were congruent with the exception of a few putative hybrids identified in this study, suggesting a predominant effect of isolation by distance. Genomic scans using both markers linked to genes affected by selection (outlier) and neutral markers showed advantages relative to other approaches, since they help to create a more complete picture of how adaptation to environmental conditions has sculpted the common bean genomes in southern Europe. The use of outlier loci also gives a clue about what selective forces gave rise to the actual phenotypes of the analysed landraces.

  17. Sparsity-weighted outlier FLOODing (OFLOOD) method: Efficient rare event sampling method using sparsity of distribution.

    PubMed

    Harada, Ryuhei; Nakamura, Tomotake; Shigeta, Yasuteru

    2016-03-30

    As an extension of the Outlier FLOODing (OFLOOD) method [Harada et al., J. Comput. Chem. 2015, 36, 763], the sparsity of the outliers defined by a hierarchical clustering algorithm, FlexDice, was considered to achieve an efficient conformational search as sparsity-weighted "OFLOOD." In OFLOOD, FlexDice detects areas of sparse distribution as outliers. The outliers are regarded as candidates that have high potential to promote conformational transitions and are employed as initial structures for conformational resampling by restarting molecular dynamics simulations. When detecting outliers, FlexDice defines a rank in the hierarchy for each outlier, which relates to sparsity in the distribution. In this study, we define lower-rank (first-ranked), medium-rank (second-ranked), and highest-rank (third-ranked) outliers. For instance, the first-ranked outliers are located in a given conformational space away from the clusters (highly sparse distribution), whereas the third-ranked outliers are near the clusters (a moderately sparse distribution). To achieve the conformational search efficiently, resampling from the outliers with a given rank is performed. As demonstrations, this method was applied to several model systems: Alanine dipeptide, Met-enkephalin, Trp-cage, T4 lysozyme, and glutamine binding protein. In each demonstration, the present method successfully reproduced transitions among metastable states. In particular, the first-ranked OFLOOD highly accelerated the exploration of conformational space by expanding the edges. In contrast, the third-ranked OFLOOD reproduced local transitions among neighboring metastable states intensively. For quantitative evaluation of the sampled snapshots, free energy calculations were performed with a combination of umbrella samplings, providing rigorous landscapes of the biomolecules. © 2015 Wiley Periodicals, Inc.

  18. Adaptive vector validation in image velocimetry to minimise the influence of outlier clusters

    NASA Astrophysics Data System (ADS)

    Masullo, Alessandro; Theunissen, Raf

    2016-03-01

    The universal outlier detection scheme (Westerweel and Scarano in Exp Fluids 39:1096-1100, 2005) and the distance-weighted universal outlier detection scheme for unstructured data (Duncan et al. in Meas Sci Technol 21:057002, 2010) are the most common PIV data validation routines. However, such techniques rely on a spatial comparison of each vector with those in a fixed-size neighbourhood and their performance subsequently suffers in the presence of clusters of outliers. This paper proposes an advancement to render outlier detection more robust while reducing the probability of mistakenly invalidating correct vectors. Velocity fields undergo a preliminary evaluation in terms of local coherency, which parametrises the extent of the neighbourhood with which each vector will be compared subsequently. Such adaptivity is shown to reduce the number of undetected outliers, even when implemented in the aforementioned validation schemes. In addition, the authors present an alternative residual definition considering vector magnitude and angle adopting a modified Gaussian-weighted distance-based averaging median. This procedure is able to adapt the degree of acceptable background fluctuations in velocity to the local displacement magnitude. The traditional, extended and recommended validation methods are numerically assessed on the basis of flow fields from an isolated vortex, a turbulent channel flow and a DNS simulation of forced isotropic turbulence. The resulting validation method is adaptive, requires no user-defined parameters and is demonstrated to yield the best performances in terms of outlier under- and over-detection. Finally, the novel validation routine is applied to the PIV analysis of experimental studies focused on the near wake behind a porous disc and on a supersonic jet, illustrating the potential gains in spatial resolution and accuracy.
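
    For orientation, the fixed-neighbourhood baseline that this paper improves on can be sketched as a normalized median test in the style of Westerweel and Scarano: each value is compared with the median of its 3x3 neighbours, and the deviation is scaled by the median neighbour residual plus a small noise floor. The sketch below handles a single velocity component on interior grid points only. The function name, the test field, and the default values `eps=0.1` and `threshold=2.0` are illustrative assumptions, and the adaptive neighbourhood proposed in the paper is not implemented.

```python
import statistics


def normalized_median_test(field, eps=0.1, threshold=2.0):
    """Flag interior grid points whose value disagrees with their 3x3 neighbours."""
    rows, cols = len(field), len(field[0])
    flagged = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            neighbours = [
                field[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            ]
            med = statistics.median(neighbours)
            # Median absolute deviation of the neighbours from their median.
            resid_med = statistics.median([abs(v - med) for v in neighbours])
            score = abs(field[r][c] - med) / (resid_med + eps)
            if score > threshold:
                flagged.append((r, c))
    return flagged


# Made-up smooth velocity component with one spurious vector at (2, 2).
u = [
    [1.0, 1.1, 1.2, 1.3, 1.4],
    [1.0, 1.1, 1.2, 1.3, 1.4],
    [1.0, 1.1, 5.0, 1.3, 1.4],
    [1.0, 1.1, 1.2, 1.3, 1.4],
    [1.0, 1.1, 1.2, 1.3, 1.4],
]
bad_vectors = normalized_median_test(u)
print(bad_vectors)
```

    Because the neighbourhood is fixed at 3x3, a cluster of adjacent spurious vectors would shift the neighbour median itself, which is the weakness the abstract describes.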

  19. Factors influencing hospital high length of stay outliers

    PubMed Central

    2012-01-01

    Background The study of length of stay (LOS) outliers is important for the management and financing of hospitals. Our aim was to study variables associated with high LOS outliers and their evolution over time. Methods We used hospital administrative data from inpatient episodes in public acute care hospitals in the Portuguese National Health Service (NHS), with discharges between years 2000 and 2009, together with some hospital characteristics. The dependent variable, LOS outliers, was calculated for each diagnosis related group (DRG) using a trim point defined for each year by the geometric mean plus two standard deviations. Hospitals were classified on the basis of administrative, economic and teaching characteristics. We also studied the influence of comorbidities and readmissions. Logistic regression models, including a multivariable logistic regression, were used in the analysis. All the logistic regressions were fitted using generalized estimating equations (GEE). Results In nearly nine million inpatient episodes analysed we found a proportion of 3.9% high LOS outliers, accounting for 19.2% of total inpatient days. The number of hospital patient discharges increased between years 2000 and 2005 and slightly decreased after that. The proportion of outliers ranged between the lowest value of 3.6% (in years 2001 and 2002) and the highest value of 4.3% in 2009. Teaching hospitals with over 1,000 beds have significantly more outliers than other hospitals, even after adjustment for readmissions and several patient characteristics. Conclusions In recent years both average LOS and high LOS outliers have been increasing in Portuguese NHS hospitals. As high LOS outliers represent an important proportion of the total inpatient days, this should be seen as an important alert for the management of hospitals and for national health policies. As expected, age, type of admission, and hospital type were significantly associated with high LOS outliers. The proportion of high outliers does not seem to be related to their financial coverage; they should be studied in order to highlight areas for further investigation. The increasing complexity of both hospitals and patients may be the single most important determinant of high LOS outliers and must therefore be taken into account by health managers when considering hospital costs. PMID:22906386
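
    The trim point used in this study (geometric mean plus two standard deviations, computed per DRG and year) can be illustrated with a few lines of code. The abstract does not say whether the standard deviation is taken on the raw or the log scale; the sketch below assumes the raw-scale sample standard deviation, and the function name and the lengths of stay are invented for the example.

```python
import statistics


def los_trim_point(stays):
    """Trim point: geometric mean of the stays plus two standard deviations.

    Assumption: the standard deviation is the raw-scale sample SD.
    """
    return statistics.geometric_mean(stays) + 2 * statistics.stdev(stays)


# Hypothetical lengths of stay (days) for one DRG in one year.
stays = [2, 3, 3, 4, 4, 5, 5, 6, 7, 40]
trim = los_trim_point(stays)
high_outliers = [d for d in stays if d > trim]
print(round(trim, 1), high_outliers)
```

    In practice the calculation would be repeated for every DRG and year, since the abstract defines the trim point separately for each.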

  20. A Content-Adaptive Analysis and Representation Framework for Audio Event Discovery from "Unscripted" Multimedia

    NASA Astrophysics Data System (ADS)

    Radhakrishnan, Regunathan; Divakaran, Ajay; Xiong, Ziyou; Otsuka, Isao

    2006-12-01

    We propose a content-adaptive analysis and representation framework to discover events using audio features from "unscripted" multimedia such as sports and surveillance for summarization. The proposed analysis framework performs an inlier/outlier-based temporal segmentation of the content. It is motivated by the observation that "interesting" events in unscripted multimedia occur sparsely in a background of usual or "uninteresting" events. We treat the sequence of low/mid-level features extracted from the audio as a time series and identify subsequences that are outliers. The outlier detection is based on eigenvector analysis of the affinity matrix constructed from statistical models estimated from the subsequences of the time series. We define the confidence measure on each of the detected outliers as the probability that it is an outlier. Then, we establish a relationship between the parameters of the proposed framework and the confidence measure. Furthermore, we use the confidence measure to rank the detected outliers in terms of their departures from the background process. Our experimental results with sequences of low- and mid-level audio features extracted from sports video show that "highlight" events can be extracted effectively as outliers from a background process using the proposed framework. We proceed to show the effectiveness of the proposed framework in bringing out suspicious events from surveillance videos without any a priori knowledge. We show that such temporal segmentation into background and outliers, along with the ranking based on the departure from the background, can be used to generate content summaries of any desired length. Finally, we also show that the proposed framework can be used to systematically select "key audio classes" that are indicative of events of interest in the chosen domain.

  1. Using State Estimation Residuals to Detect Abnormal SCADA Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ma, Jian; Chen, Yousu; Huang, Zhenyu

    2010-04-30

    Detection of abnormal supervisory control and data acquisition (SCADA) data is critically important for safe and secure operation of modern power systems. In this paper, a methodology of abnormal SCADA data detection based on state estimation residuals is presented. Preceded with a brief overview of outlier detection methods and bad SCADA data detection for state estimation, the framework of the proposed methodology is described. Instead of using original SCADA measurements as the bad data sources, the residuals calculated based on the results of the state estimator are used as the input for the outlier detection algorithm. The BACON algorithm is applied to the outlier detection task. The IEEE 118-bus system is used as a test base to evaluate the effectiveness of the proposed methodology. The accuracy of the BACON method is compared with that of the 3-σ method for the simulated SCADA measurements and residuals.
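
    The 3-σ method used as the comparison baseline here is easy to sketch: a residual is flagged when it lies more than three standard deviations from the mean of all residuals. The code below is a generic version with invented residual values; it is not the BACON algorithm, and nothing in it comes from the IEEE 118-bus experiments.

```python
import statistics


def three_sigma_outliers(residuals):
    """Return indices of residuals more than 3 standard deviations from the mean."""
    mean = statistics.mean(residuals)
    sd = statistics.pstdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r - mean) > 3 * sd]


# Made-up state-estimation residuals with one bad measurement at index 7.
residuals = [0.01, -0.02, 0.0, 0.02, -0.01] * 4
residuals[7] = 0.9
bad = three_sigma_outliers(residuals)
print(bad)
```

    One known limitation of this rule is that the mean and standard deviation are themselves inflated by the outliers, so several bad values can mask one another; robust procedures are designed to reduce that effect.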

  2. A simple transformation independent method for outlier definition.

    PubMed

    Johansen, Martin Berg; Christensen, Peter Astrup

    2018-04-10

    Definition and elimination of outliers is a key element for medical laboratories establishing or verifying reference intervals (RIs), especially as inclusion of just a few outlying observations may seriously affect the determination of the reference limits. Many methods have been developed for definition of outliers. Several of these methods are developed for the normal distribution and often data require transformation before outlier elimination. We have developed a non-parametric transformation independent outlier definition. The new method relies on drawing reproducible histograms. This is done by using defined bin sizes above and below the median. The method is compared to the method recommended by CLSI/IFCC, which uses Box-Cox transformation (BCT) and Tukey's fences for outlier definition. The comparison is done on eight simulated distributions and an indirect clinical dataset. The comparison on simulated distributions shows that without outliers added the recommended method in general defines fewer outliers. However, when outliers are added on one side the proposed method often produces better results. With outliers on both sides the methods are equally good. Furthermore, it is found that the presence of outliers affects the BCT, and subsequently affects the determined limits of current recommended methods. This is especially seen in skewed distributions. The proposed outlier definition reproduced current RI limits on clinical data containing outliers. We find our simple transformation independent outlier detection method as good as or better than the currently recommended methods.
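
    Tukey's fences, the second half of the recommended method referred to above, can be sketched as follows. The Box-Cox transformation step is deliberately omitted, so this is only the fence rule applied to raw values; the multiplier k = 1.5 is the conventional choice, and the function name and analyte values are hypothetical.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) fences at k interquartile ranges beyond the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Hypothetical reference-interval measurements with one high value.
analyte = [4.1, 4.3, 4.4, 4.6, 4.7, 4.8, 5.0, 5.1, 5.3, 9.8]
lower, upper = tukey_fences(analyte)
outside = [v for v in analyte if v < lower or v > upper]
print(outside)
```

    Quartile conventions differ between software packages, so fence positions can vary slightly for small samples.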

  3. The detection and correction of outlying determinations that may occur during geochemical analysis

    USGS Publications Warehouse

    Harvey, P.K.

    1974-01-01

    'Wild', 'rogue' or outlying determinations occur periodically during geochemical analysis. Existing tests in the literature for the detection of such determinations within a set of replicate measurements are often misleading. This account describes the chances of detecting outliers and the extent to which correction may be made for their presence in sample sizes of three to seven replicate measurements. A systematic procedure for monitoring data for outliers is outlined. The problem of outliers becomes more important as instrumental methods of analysis become faster and more highly automated; a state in which it becomes increasingly difficult for the analyst to examine every determination. The recommended procedure is easily adapted to such analytical systems.

  4. Detection of outliers in the response and explanatory variables of the simple circular regression model

    NASA Astrophysics Data System (ADS)

    Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah

    2016-06-01

    The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The existence of these data points ("outliers") in the data set causes many problems in the research results and conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is justified by three robust measurements: the proportion of detected outliers, and the masking and swamping rates.

  5. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sheng, Y; Ge, Y; Yuan, L

    Purpose: To investigate the impact of outliers on knowledge modeling in radiation therapy, and develop a systematic workflow for identifying and analyzing geometric and dosimetric outliers using pelvic cases. Methods: Four groups (G1-G4) of pelvic plans were included: G1 (37 prostate cases), G2 (37 prostate plus lymph node cases), and G3 (37 prostate bed cases) are all clinical IMRT cases. G4 are 10 plans outside G1 re-planned with dynamic-arc to simulate dosimetric outliers. The workflow involves 2 steps: 1. identify geometric outliers, assess impact and clean up; 2. identify dosimetric outliers, assess impact and clean up. 1. A baseline model was trained with all G1 cases. G2/G3 cases were then individually added to the baseline model as geometric outliers. The impact on the model was assessed by comparing leverage statistic of inliers (G1) and outliers (G2/G3). Receiver-operating-characteristics (ROC) analysis was performed to determine optimal threshold. 2. A separate baseline model was trained with 32 G1 cases. Each G4 case (dosimetric outliers) was then progressively added to perturb this model. DVH predictions were performed using these perturbed models for remaining 5 G1 cases. Normal tissue complication probability (NTCP) calculated from predicted DVH were used to evaluate dosimetric outliers’ impact. Results: The leverage of inliers and outliers was significantly different. The Area-Under-Curve (AUC) for differentiating G2 from G1 was 0.94 (threshold: 0.22) for bladder; and 0.80 (threshold: 0.10) for rectum. For differentiating G3 from G1, the AUC (threshold) was 0.68 (0.09) for bladder, 0.76 (0.08) for rectum. Significant increase in NTCP started from models with 4 dosimetric outliers for bladder (p<0.05), and with only 1 dosimetric outlier for rectum (p<0.05). Conclusion: We established a systematic workflow for identifying and analyzing geometric and dosimetric outliers, and investigated statistical metrics for detecting. Results validated the necessity for outlier detection and clean-up to enhance model quality in clinical practice. Research Grant: Varian master research grant.

  6. Data quality enhancement and knowledge discovery from relevant signals in acoustic emission

    NASA Astrophysics Data System (ADS)

    Mejia, Felipe; Shyu, Mei-Ling; Nanni, Antonio

    2015-10-01

    The increasing popularity of structural health monitoring has brought with it a growing need for automated data management and data analysis tools. Of great importance are filters that can systematically detect unwanted signals in acoustic emission datasets. This study presents a semi-supervised data mining scheme that detects data belonging to unfamiliar distributions. This type of outlier detection scheme is useful for detecting the presence of new acoustic emission sources, given a training dataset of unwanted signals. In addition to classifying new observations (herein referred to as "outliers") within a dataset, the scheme generates a decision tree that classifies sub-clusters within the outlier context set. The obtained tree can be interpreted as a series of characterization rules for newly-observed data, and they can potentially describe the basic structure of different modes within the outlier distribution. The data mining scheme is first validated on a synthetic dataset, and an attempt is made to confirm the algorithms' ability to discriminate outlier acoustic emission sources from a controlled pencil-lead-break experiment. Finally, the scheme is applied to data from two fatigue crack-growth steel specimens, where it is shown that extracted rules can adequately describe crack-growth related acoustic emission sources while filtering out background "noise." Results show promising performance in filter generation, thereby allowing analysts to extract, characterize, and focus only on meaningful signals.

  7. A Finite Mixture Method for Outlier Detection and Robustness in Meta-Analysis

    ERIC Educational Resources Information Center

    Beath, Ken J.

    2014-01-01

    When performing a meta-analysis unexplained variation above that predicted by within study variation is usually modeled by a random effect. However, in some cases, this is not sufficient to explain all the variation because of outlier or unusual studies. A previously described method is to define an outlier as a study requiring a higher random…

  8. Case-Deletion Diagnostics for Maximum Likelihood Multipoint Quantitative Trait Locus Linkage Analysis

    PubMed Central

    Mendoza, Maria C.B.; Burns, Trudy L.; Jones, Michael P.

    2009-01-01

    Objectives Case-deletion diagnostic methods are tools that allow identification of influential observations that may affect parameter estimates and model fitting conclusions. The goal of this paper was to develop two case-deletion diagnostics, the exact case deletion (ECD) and the empirical influence function (EIF), for detecting outliers that can affect results of sib-pair maximum likelihood quantitative trait locus (QTL) linkage analysis. Methods Subroutines to compute the ECD and EIF were incorporated into the maximum likelihood QTL variance estimation components of the linkage analysis program MAPMAKER/SIBS. Performance of the diagnostics was compared in simulation studies that evaluated the proportion of outliers correctly identified (sensitivity), and the proportion of non-outliers correctly identified (specificity). Results Simulations involving nuclear family data sets with one outlier showed EIF sensitivities approximated ECD sensitivities well for outlier-affected parameters. Sensitivities were high, indicating the outlier was identified a high proportion of the time. Simulations also showed the enormous computational time advantage of the EIF. Diagnostics applied to body mass index in nuclear families detected observations influential on the lod score and model parameter estimates. Conclusions The EIF is a practical diagnostic tool that has the advantages of high sensitivity and quick computation. PMID:19172086

  9. Some Integrated Squared Error Procedures for Multivariate Normal Data,

    DTIC Science & Technology

    1986-01-01

    a linear regression or experimental design model). Our procedures have also been used ... on non-linear models but we do not address non-linear...of fit, outliers, influence functions, experimental design, cluster analysis, robustness...structured data such as multivariate experimental designs. Several illustrations are provided.

  10. Mixture-Tuned, Clutter Matched Filter for Remote Detection of Subpixel Spectral Signals

    NASA Technical Reports Server (NTRS)

    Thompson, David R.; Mandrake, Lukas; Green, Robert O.

    2013-01-01

    Mapping localized spectral features in large images demands sensitive and robust detection algorithms. Two aspects of large images that can harm matched-filter detection performance are addressed simultaneously. First, multimodal backgrounds may thwart the typical Gaussian model. Second, outlier features can trigger false detections from large projections onto the target vector. Two state-of-the-art approaches are combined that independently address outlier false positives and multimodal backgrounds. The background clustering models multimodal backgrounds, and the mixture tuned matched filter (MT-MF) addresses outliers. Combining the two methods captures significant additional performance benefits. The resulting mixture tuned clutter matched filter (MT-CMF) shows effective performance on simulated and airborne datasets. The classical MNF transform was applied, followed by k-means clustering. Then, each cluster's mean, covariance, and the corresponding eigenvalues were estimated. This yields a cluster-specific matched filter estimate as well as a cluster-specific feasibility score to flag outlier false positives. The technology described is a proof of concept that may be employed in future target detection and mapping applications for remote imaging spectrometers. It is of most direct relevance to JPL proposals for airborne and orbital hyperspectral instruments. Applications include subpixel target detection in hyperspectral scenes for military surveillance. Earth science applications include mineralogical mapping, species discrimination for ecosystem health monitoring, and land use classification.

  11. Visualizing Big Data Outliers through Distributed Aggregation.

    PubMed

    Wilkinson, Leland

    2017-08-29

    Visualizing outliers in massive datasets requires statistical pre-processing in order to reduce the scale of the problem to a size amenable to rendering systems like D3, Plotly or analytic systems like R or SAS. This paper presents a new algorithm, called hdoutliers, for detecting multidimensional outliers. It is unique for a) dealing with a mixture of categorical and continuous variables, b) dealing with big-p (many columns of data), c) dealing with big-n (many rows of data), d) dealing with outliers that mask other outliers, and e) dealing consistently with unidimensional and multidimensional datasets. Unlike ad hoc methods found in many machine learning papers, hdoutliers is based on a distributional model that allows outliers to be tagged with a probability. This critical feature reduces the likelihood of false discoveries.

  12. Development of a multivariate calibration model for the determination of dry extract content in Brazilian commercial bee propolis extracts through UV-Vis spectroscopy

    NASA Astrophysics Data System (ADS)

    Barbeira, Paulo J. S.; Paganotti, Rosilene S. N.; Ássimos, Ariane A.

    2013-10-01

    This study had the objective of determining the dry extract content of commercial alcoholic extracts of bee propolis through Partial Least Squares (PLS) multivariate calibration and electronic spectroscopy. The PLS model provided a good prediction of dry extract content in commercial alcoholic extracts of bee propolis in the range of 2.7 to 16.8% (m/v), presenting the advantage of being less laborious and faster than the traditional gravimetric methodology. The PLS model was optimized with outlier detection tests according to ASTM E 1655-05. In this study it was possible to verify that a centrifugation stage is extremely important in order to avoid the presence of waxes, resulting in a more accurate model. Around 50% of the analyzed samples presented a dry extract content lower than the value established by Brazilian legislation; in most cases, the values found were different from the values claimed on the product's label.

  13. The Outlier Detection for Ordinal Data Using Scalling Technique of Regression Coefficients

    NASA Astrophysics Data System (ADS)

    Adnan, Arisman; Sugiarto, Sigit

    2017-06-01

    The aim of this study is to detect outliers by using the coefficients of Ordinal Logistic Regression (OLR) for the case of k category responses where the scores range from 1 (the best) to 8 (the worst). We detect them by using the sum of moduli of the ordinal regression coefficients calculated by the jackknife technique. This technique is improved by scaling the regression coefficients to their means. The R language has been used on a set of ordinal data from a reference distribution. Furthermore, we compare this approach with studentised residual plots of the jackknife technique for ANOVA (Analysis of Variance) and OLR. This study shows that the jackknifing technique along with proper scaling may lead us to reveal outliers in ordinal regression reasonably well.

  14. Supervised Outlier Detection in Large-Scale Mvs Point Clouds for 3d City Modeling Applications

    NASA Astrophysics Data System (ADS)

    Stucker, C.; Richard, A.; Wegner, J. D.; Schindler, K.

    2018-05-01

    We propose to use a discriminative classifier for outlier detection in large-scale point clouds of cities generated via multi-view stereo (MVS) from densely acquired images. What makes outlier removal hard are varying distributions of inliers and outliers across a scene. Heuristic outlier removal using a specific feature that encodes point distribution often delivers unsatisfying results. Although most outliers can be identified correctly (high recall), many inliers are erroneously removed (low precision), too. This aggravates object 3D reconstruction due to missing data. We thus propose to discriminatively learn class-specific distributions directly from the data to achieve high precision. We apply a standard Random Forest classifier that infers a binary label (inlier or outlier) for each 3D point in the raw, unfiltered point cloud and test two approaches for training. In the first, non-semantic approach, features are extracted without considering the semantic interpretation of the 3D points. The trained model approximates the average distribution of inliers and outliers across all semantic classes. Second, semantic interpretation is incorporated into the learning process, i.e. we train separate inlier-outlier classifiers per semantic class (building facades, roof, ground, vegetation, fields, and water). Performance of learned filtering is evaluated on several large SfM point clouds of cities. We find that results confirm our underlying assumption that discriminatively learning inlier-outlier distributions does improve precision over global heuristics by up to ≈ 12 percentage points. Moreover, semantically informed filtering that models class-specific distributions further improves precision by up to ≈ 10 percentage points, being able to remove very isolated building, roof, and water points while preserving inliers on building facades and vegetation.

  15. Observed to expected or logistic regression to identify hospitals with high or low 30-day mortality?

    PubMed Central

    Helgeland, Jon; Clench-Aas, Jocelyne; Laake, Petter; Veierød, Marit B.

    2018-01-01

    Introduction A common quality indicator for monitoring and comparing hospitals is based on death within 30 days of admission. An important use is to determine whether a hospital has higher or lower mortality than other hospitals. Thus, the ability to identify such outliers correctly is essential. Two approaches for detection are: 1) calculating the ratio of observed to expected number of deaths (OE) per hospital and 2) including all hospitals in a logistic regression (LR) comparing each hospital to a form of average over all hospitals. The aim of this study was to compare OE and LR with respect to correctly identifying 30-day mortality outliers. Modifications of the methods, i.e., variance corrected approach of OE (OE-Faris), bias corrected LR (LR-Firth), and trimmed mean variants of LR and LR-Firth were also studied. Materials and methods To study the properties of OE and LR and their variants, we performed a simulation study by generating patient data from hospitals with known outlier status (low mortality, high mortality, non-outlier). Data from simulated scenarios with varying number of hospitals, hospital volume, and mortality outlier status, were analysed by the different methods and compared by level of significance (ability to falsely claim an outlier) and power (ability to reveal an outlier). Moreover, administrative data for patients with acute myocardial infarction (AMI), stroke, and hip fracture from Norwegian hospitals for 2012–2014 were analysed. Results None of the methods achieved the nominal (test) level of significance for both low and high mortality outliers. For low mortality outliers, the levels of significance were increased four- to fivefold for OE and OE-Faris. For high mortality outliers, OE and OE-Faris, LR 25% trimmed and LR-Firth 10% and 25% trimmed maintained approximately the nominal level. The methods agreed with respect to outlier status for 94.1% of the AMI hospitals, 98.0% of the stroke, and 97.8% of the hip fracture hospitals. 
Conclusion We recommend, on the balance, LR-Firth 10% or 25% trimmed for detection of both low and high mortality outliers. PMID:29652941
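
    The observed-to-expected (OE) approach described in the abstract above can be illustrated with a minimal Python sketch. It assumes that each patient already has a predicted risk of death from some case-mix model; the function name `oe_summary` and the normal-approximation z-score are illustrative assumptions, not the OE-Faris or LR-Firth procedures evaluated in the study.

```python
from math import sqrt


def oe_summary(deaths, predicted_risks):
    """Return the O/E ratio and an approximate z-score for one hospital.

    deaths          -- 0/1 outcome per patient (1 = died within 30 days)
    predicted_risks -- model-based death probability per patient (assumed given)
    """
    observed = sum(deaths)
    expected = sum(predicted_risks)
    # Variance of the death count if outcomes are independent Bernoulli trials.
    variance = sum(p * (1.0 - p) for p in predicted_risks)
    ratio = observed / expected
    # Illustrative normal approximation; not the variance correction in the paper.
    z = (observed - expected) / sqrt(variance)
    return ratio, z


ratio, z = oe_summary([1, 1, 1, 1, 0, 0, 0, 0], [0.25] * 8)
print(f"O/E = {ratio:.2f}, z = {z:.2f}")
```

    A ratio above 1 points towards higher-than-expected mortality; whether a hospital is declared an outlier then depends on the chosen test and significance level, which is exactly where the compared methods differ.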

  16. Outlier Detection in Infrared Signatures

    DTIC Science & Technology

    1992-01-01

    for model identification. Gnanadesikan (1977) pointed out that Hampel's influence function (Hampel (1974)) can be used to estimate the effect...individual outliers have on sample estimates of parameters. Chernick noted that the influence function for parameters of interest to the users of a data...important outliers, while those with small estimated influence are not). In this way the influence function provides a "distance" measure for multi

  17. Multiple approaches to detect outliers in a genome scan for selection in ocellated lizards (Lacerta lepida) along an environmental gradient.

    PubMed

    Nunes, Vera L; Beaumont, Mark A; Butlin, Roger K; Paulo, Octávio S

    2011-01-01

    Identification of loci with adaptive importance is a key step to understand the speciation process in natural populations, because those loci are responsible for phenotypic variation that affects fitness in different environments. We conducted an AFLP genome scan in populations of ocellated lizards (Lacerta lepida) to search for candidate loci influenced by selection along an environmental gradient in the Iberian Peninsula. This gradient is strongly influenced by climatic variables, and two subspecies can be recognized at the opposite extremes: L. lepida iberica in the northwest and L. lepida nevadensis in the southeast. Both subspecies show substantial morphological differences that may be involved in their local adaptation to the climatic extremes. To investigate how the use of a particular outlier detection method can influence the results, a frequentist method, DFDIST, and a Bayesian method, BayeScan, were used to search for outliers influenced by selection. Additionally, the spatial analysis method was used to test for associations of AFLP marker band frequencies with 54 climatic variables by logistic regression. Results obtained with each method highlight differences in their sensitivity. DFDIST and BayeScan detected a similar proportion of outliers (3-4%), but only a few loci were simultaneously detected by both methods. Several loci detected as outliers were also associated with temperature, insolation or precipitation according to spatial analysis method. These results are in accordance with reported data in the literature about morphological and life-history variation of L. lepida subspecies along the environmental gradient. © 2010 Blackwell Publishing Ltd.

  18. Unsupervised Scalable Statistical Method for Identifying Influential Users in Online Social Networks.

    PubMed

    Azcorra, A; Chiroque, L F; Cuevas, R; Fernández Anta, A; Laniado, H; Lillo, R E; Romo, J; Sguera, C

    2018-05-03

    Billions of users interact intensively every day via Online Social Networks (OSNs) such as Facebook, Twitter, or Google+. This makes OSNs an invaluable source of information, and channel of actuation, for sectors like advertising, marketing, or politics. To get the most out of OSNs, analysts need to identify influential users that can be leveraged for promoting products, distributing messages, or improving the image of companies. In this report we propose a new unsupervised method, Massive Unsupervised Outlier Detection (MUOD), based on outlier detection, for providing support in the identification of influential users. MUOD is scalable, and can hence be used in large OSNs. Moreover, it labels the outliers as shape, magnitude, or amplitude outliers, depending on their features. This allows classifying the outlier users in multiple different classes, which are likely to include different types of influential users. Applied to a subset of roughly 400 million Google+ users, MUOD automatically identified and discriminated sets of outlier users that present features associated with different definitions of influential users, such as capacity to attract engagement, capacity to attract a large number of followers, or high infection capacity.

  19. Identifying Outliers of Non-Gaussian Groundwater State Data Based on Ensemble Estimation for Long-Term Trends

    NASA Astrophysics Data System (ADS)

    Park, E.; Jeong, J.; Choi, J.; Han, W. S.; Yun, S. T.

    2016-12-01

    Three modified outlier identification methods: the three sigma rule (3s), inter quantile range (IQR) and median absolute deviation (MAD), which take advantage of the ensemble regression method are proposed. For validation purposes, the performance of the methods is compared using simulated and actual groundwater data with a few hypothetical conditions. In the validations using simulated data, all of the proposed methods reasonably identify outliers at a 5% outlier level; whereas, only the IQR method performs well for identifying outliers at a 30% outlier level. When applying the methods to real groundwater data, the outlier identification performance of the IQR method is found to be superior to the other two methods. However, the IQR method is found to have a limitation in the false identification of excessive outliers, which may be supplemented by joint applications with the other methods (i.e., the 3s rule and MAD methods). The proposed methods can be also applied as a potential tool for future anomaly detection by model training based on currently available data.

  20. A new algorithm for automatic Outlier Detection in GPS Time Series

    NASA Astrophysics Data System (ADS)

    Cannavo', Flavio; Mattia, Mario; Rossi, Massimo; Palano, Mimmo; Bruno, Valentina

    2010-05-01

    Nowadays continuous GPS time series are considered a crucial product of GPS permanent networks, useful in many geo-science fields, such as active tectonics, seismology, crustal deformation and volcano monitoring (Altamimi et al. 2002, Elósegui et al. 2006, Aloisi et al. 2009). Although GPS data processing software has increased in reliability, the time series are still affected by different kinds of noise, from intrinsic noise (e.g. tropospheric delay) to un-modeled noise (e.g. cycle slips, satellite faults, parameter changes). Typically GPS time series present characteristic noise that is a linear combination of white noise and correlated colored noise, and this characteristic is fractal in the sense that it is evident for every considered time scale or sampling rate. The un-modeled noise sources result in spikes, outliers and steps. These kinds of errors can appreciably influence the estimation of velocities of the monitored sites. Outlier detection in generic time series is a widely treated problem in the literature (Wei, 2005), while it is not fully developed for the specific case of GPS series. We propose a robust automatic procedure for cleaning the GPS time series of outliers and, especially for long daily series, of steps due to strong seismic or volcanic events or merely to instrumentation changes such as antenna and receiver upgrades. The procedure is basically divided into two steps: a first step for colored noise reduction and a second step for outlier detection through adaptive series segmentation. Both algorithms present novel ideas and are nearly unsupervised. In particular, we propose an algorithm to estimate an autoregressive model for colored noise in GPS time series in order to subtract the effect of non-Gaussian noise on the series. This step is useful for the subsequent step (i.e. adaptive segmentation), which requires the hypothesis of Gaussian noise. 
The proposed algorithms are tested in a benchmark case study and the results confirm that the algorithms are effective and reasonable. Bibliography - Aloisi M., A. Bonaccorso, F. Cannavò, S. Gambino, M. Mattia, G. Puglisi, E. Boschi, A new dyke intrusion style for the Mount Etna May 2008 eruption modelled through continuous tilt and GPS data, Terra Nova, Volume 21 Issue 4 , Pages 316 - 321, doi: 10.1111/j.1365-3121.2009.00889.x (August 2009) - Altamimi Z., Sillard P., Boucher C., ITRF2000: A new release of the International Terrestrial Reference frame for earth science applications, J Geophys Res-Solid Earth, 107 (B10): art. no.-2214, (Oct 2002) - Elósegui, P., J. L. Davis, D. Oberlander, R. Baena, and G. Ekström , Accuracy of high-rate GPS for seismology, Geophys. Res. Lett., 33, L11308, doi:10.1029/2006GL026065 (2006) - Wei W. S., Time Series Analysis: Univariate and Multivariate Methods, Addison Wesley (2 edition), ISBN-10: 0321322169 (July, 2005)

  1. Detection of Outliers in TWSTFT Data Used in TAI

    DTIC Science & Technology

    2009-11-01

    41st Annual Precise Time and Time Interval (PTTI) Meeting 421 DETECTION OF OUTLIERS IN TWSTFT DATA USED IN TAI A...data in two-way satellite time and frequency transfer (TWSTFT) time links. In the case of TWSTFT data used to calculate International Atomic Time...data; that TWSTFT links can show an underlying slope which renders the standard treatment more difficult. Using phase and frequency filtering

  2. PCA leverage: outlier detection for high-dimensional functional magnetic resonance imaging data.

    PubMed

    Mejia, Amanda F; Nebel, Mary Beth; Eloyan, Ani; Caffo, Brian; Lindquist, Martin A

    2017-07-01

    Outlier detection for high-dimensional (HD) data is a popular topic in modern statistical research. However, one source of HD data that has received relatively little attention is functional magnetic resonance images (fMRI), which consist of hundreds of thousands of measurements sampled at hundreds of time points. At a time when the availability of fMRI data is rapidly growing (primarily through large, publicly available grassroots datasets), automated quality control and outlier detection methods are greatly needed. We propose principal components analysis (PCA) leverage and demonstrate how it can be used to identify outlying time points in an fMRI run. Furthermore, PCA leverage is a measure of the influence of each observation on the estimation of principal components, which are often of interest in fMRI data. We also propose an alternative measure, PCA robust distance, which is less sensitive to outliers and has controllable statistical properties. The proposed methods are validated through simulation studies and are shown to be highly accurate. We also conduct a reliability study using resting-state fMRI data from the Autism Brain Imaging Data Exchange and find that removal of outliers using the proposed methods results in more reliable estimation of subject-level resting-state networks using independent components analysis. © The Author 2017. Published by Oxford University Press.

  3. Applying robust variant of Principal Component Analysis as a damage detector in the presence of outliers

    NASA Astrophysics Data System (ADS)

    Gharibnezhad, Fahit; Mujica, Luis E.; Rodellar, José

    2015-01-01

    Using Principal Component Analysis (PCA) for Structural Health Monitoring (SHM) has received considerable attention over the past few years. PCA has been used not only as a direct method to identify, classify and localize damage but also as a significant primary step for other methods. Despite its several positive properties, PCA is very sensitive to outliers. Outliers are anomalous observations that can affect the variance and the covariance, which are vital parts of the PCA method. Therefore, the results based on PCA in the presence of outliers are not fully satisfactory. As its main contribution, this work suggests the use of a robust variant of PCA that is not sensitive to outliers as an effective way to deal with this problem in the SHM field. In addition, robust PCA is compared with classical PCA in terms of detecting probable damage. The comparison shows that robust PCA can distinguish damage much better than the classical method, and in many cases even allows detection where classical PCA is not able to discern between damaged and non-damaged structures. Moreover, different types of robust PCA are compared with each other as well as with their classical counterpart in terms of damage detection. All the results are obtained through experiments with an aircraft turbine blade using piezoelectric transducers as sensors and actuators and adding simulated damage.

  4. Inference on the Ranks of the Canonical Correlation Matrices for Elliptically Symmetric Populations.

    DTIC Science & Technology

    1985-05-01

    robust estimates of the covariance matrix, the reader is referred to Devlin, Gnanadesikan and Kettenring (1975) and Maronna (1976). Muirhead and...contoured distributions. J. Multivariate Anal. 11, 368-385. 6. DEVLIN, S.J., GNANADESIKAN, R. and KETTENRING, J. (1975). Robust estimation and outlier

  5. Identifying outliers of non-Gaussian groundwater state data based on ensemble estimation for long-term trends

    NASA Astrophysics Data System (ADS)

    Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kueyoung; Choung, Sungwook; Chung, Il Moon

    2017-05-01

    A hydrogeological dataset often includes substantial deviations that need to be inspected. In the present study, three outlier identification methods - the three sigma rule (3σ), inter quantile range (IQR), and median absolute deviation (MAD) - that take advantage of the ensemble regression method are proposed by considering non-Gaussian characteristics of groundwater data. For validation purposes, the performance of the methods is compared using simulated and actual groundwater data with a few hypothetical conditions. In the validations using simulated data, all of the proposed methods reasonably identify outliers at a 5% outlier level; whereas, only the IQR method performs well for identifying outliers at a 30% outlier level. When applying the methods to real groundwater data, the outlier identification performance of the IQR method is found to be superior to the other two methods. However, the IQR method shows limitation by identifying excessive false outliers, which may be overcome by its joint application with other methods (for example, the 3σ rule and MAD methods). The proposed methods can be also applied as potential tools for the detection of future anomalies by model training based on currently available data.
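
    The three rules named in the abstract above (3σ, IQR and MAD) can be sketched in plain Python as follows. The sketch assumes the rules are applied to residuals from some fitted trend; the ensemble regression step of the study is not reproduced, and the thresholds (3, 1.5 and 3) are common defaults chosen here for illustration rather than values taken from the paper.

```python
from statistics import mean, median, quantiles, stdev


def three_sigma(residuals, k=3.0):
    """Flag values more than k sample standard deviations from the mean."""
    m, s = mean(residuals), stdev(residuals)
    return [abs(r - m) > k * s for r in residuals]


def iqr_rule(residuals, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(residuals, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [r < low or r > high for r in residuals]


def mad_rule(residuals, k=3.0):
    """Flag values more than k scaled MADs from the median."""
    med = median(residuals)
    # 1.4826 makes the MAD comparable to a standard deviation for normal data.
    mad = 1.4826 * median(abs(r - med) for r in residuals)
    return [abs(r - med) > k * mad for r in residuals]


# Hypothetical residuals: small fluctuations plus one gross deviation at the end.
residuals = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1, 0.0, -0.1] * 2 + [8.0]
for rule in (three_sigma, iqr_rule, mad_rule):
    flagged = [r for r, bad in zip(residuals, rule(residuals)) if bad]
    print(rule.__name__, flagged)
```

    Because the mean and standard deviation are themselves pulled by extreme values, the 3σ rule tends to lose sensitivity as the outlier fraction grows, while the quantile-based and median-based rules are less affected.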

  6. Micro- and macro-geographic scale effect on the molecular imprint of selection and adaptation in Norway spruce.

    PubMed

    Scalfi, Marta; Mosca, Elena; Di Pierro, Erica Adele; Troggio, Michela; Vendramin, Giovanni Giuseppe; Sperisen, Christoph; La Porta, Nicola; Neale, David B

    2014-01-01

    Forest tree species of temperate and boreal regions have undergone a long history of demographic changes and evolutionary adaptations. The main objective of this study was to detect signals of selection in Norway spruce (Picea abies [L.] Karst), at different sampling-scales and to investigate, accounting for population structure, the effect of environment on species genetic diversity. A total of 384 single nucleotide polymorphisms (SNPs) representing 290 genes were genotyped at two geographic scales: across 12 populations distributed along two altitudinal-transects in the Alps (micro-geographic scale), and across 27 populations belonging to the range of Norway spruce in central and south-east Europe (macro-geographic scale). At the macrogeographic scale, principal component analysis combined with Bayesian clustering revealed three major clusters, corresponding to the main areas of southern spruce occurrence, i.e. the Alps, Carpathians, and Hercynia. The populations along the altitudinal transects were not differentiated. To assess the role of selection in structuring genetic variation, we applied a Bayesian and coalescent-based F(ST)-outlier method and tested for correlations between allele frequencies and climatic variables using regression analyses. At the macro-geographic scale, the F(ST)-outlier methods detected together 11 F(ST)-outliers. Six outliers were detected when the same analyses were carried out taking into account the genetic structure. Regression analyses with population structure correction resulted in the identification of two (micro-geographic scale) and 38 SNPs (macro-geographic scale) significantly correlated with temperature and/or precipitation. Six of these loci overlapped with F(ST)-outliers, among them two loci encoding an enzyme involved in riboflavin biosynthesis and a sucrose synthase. The results of this study indicate a strong relationship between genetic and environmental variation at both geographic scales. 
It also suggests that an integrative approach combining different outlier detection methods and population sampling at different geographic scales is useful to identify loci potentially involved in adaptation.

  7. Micro- and Macro-Geographic Scale Effect on the Molecular Imprint of Selection and Adaptation in Norway Spruce

    PubMed Central

    Scalfi, Marta; Mosca, Elena; Di Pierro, Erica Adele; Troggio, Michela; Vendramin, Giovanni Giuseppe; Sperisen, Christoph; La Porta, Nicola; Neale, David B.

    2014-01-01

    Forest tree species of temperate and boreal regions have undergone a long history of demographic changes and evolutionary adaptations. The main objective of this study was to detect signals of selection in Norway spruce (Picea abies [L.] Karst), at different sampling-scales and to investigate, accounting for population structure, the effect of environment on species genetic diversity. A total of 384 single nucleotide polymorphisms (SNPs) representing 290 genes were genotyped at two geographic scales: across 12 populations distributed along two altitudinal-transects in the Alps (micro-geographic scale), and across 27 populations belonging to the range of Norway spruce in central and south-east Europe (macro-geographic scale). At the macrogeographic scale, principal component analysis combined with Bayesian clustering revealed three major clusters, corresponding to the main areas of southern spruce occurrence, i.e. the Alps, Carpathians, and Hercynia. The populations along the altitudinal transects were not differentiated. To assess the role of selection in structuring genetic variation, we applied a Bayesian and coalescent-based F ST-outlier method and tested for correlations between allele frequencies and climatic variables using regression analyses. At the macro-geographic scale, the F ST-outlier methods detected together 11 F ST-outliers. Six outliers were detected when the same analyses were carried out taking into account the genetic structure. Regression analyses with population structure correction resulted in the identification of two (micro-geographic scale) and 38 SNPs (macro-geographic scale) significantly correlated with temperature and/or precipitation. Six of these loci overlapped with F ST-outliers, among them two loci encoding an enzyme involved in riboflavin biosynthesis and a sucrose synthase. The results of this study indicate a strong relationship between genetic and environmental variation at both geographic scales. 
It also suggests that an integrative approach combining different outlier detection methods and population sampling at different geographic scales is useful to identify loci potentially involved in adaptation. PMID:25551624

  8. Iterative outlier removal: A method for identifying outliers in laboratory recalibration studies

    PubMed Central

    Parrinello, Christina M.; Grams, Morgan E.; Sang, Yingying; Couper, David; Wruck, Lisa M.; Li, Danni; Eckfeldt, John H.; Selvin, Elizabeth; Coresh, Josef

    2016-01-01

    Background Extreme values that arise for any reason, including through non-laboratory measurement procedure-related processes (inadequate mixing, evaporation, mislabeling), lead to outliers and inflate errors in recalibration studies. We present an approach termed iterative outlier removal (IOR) for identifying such outliers. Methods We previously identified substantial laboratory drift in uric acid measurements in the Atherosclerosis Risk in Communities (ARIC) Study over time. Serum uric acid was originally measured in 1990–92 on a Coulter DACOS instrument using an uricase-based measurement procedure. To recalibrate previous measured concentrations to a newer enzymatic colorimetric measurement procedure, uric acid was re-measured in 200 participants from stored plasma in 2011–13 on a Beckman Olympus 480 autoanalyzer. To conduct IOR, we excluded data points >3 standard deviations (SDs) from the mean difference. We continued this process using the resulting data until no outliers remained. Results IOR detected more outliers and yielded greater precision in simulation. The original mean difference (SD) in uric acid was 1.25 (0.62) mg/dL. After four iterations, 9 outliers were excluded, and the mean difference (SD) was 1.23 (0.45) mg/dL. Conducting only one round of outlier removal (standard approach) would have excluded 4 outliers (mean difference [SD] = 1.22 [0.51] mg/dL). Applying the recalibration (derived from Deming regression) from each approach to the original measurements, the prevalence of hyperuricemia (>7 mg/dL) was 28.5% before IOR and 8.5% after IOR. Conclusion IOR is a useful method for removal of extreme outliers irrelevant to recalibrating laboratory measurements, and identifies more extraneous outliers than the standard approach. PMID:27197675
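
    The iterative outlier removal (IOR) procedure described in the abstract above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the input is a list of paired measurement differences, the function name and the `max_rounds` parameter are hypothetical, and the data are invented for the example rather than taken from the ARIC study.

```python
from statistics import mean, stdev


def iterative_outlier_removal(diffs, k=3.0, max_rounds=None):
    """Drop values more than k SDs from the mean, recomputing until none remain.

    max_rounds=1 gives a single round of removal; None iterates until no
    value exceeds the threshold.
    """
    kept = list(diffs)
    removed = []
    rounds = 0
    while len(kept) > 2 and (max_rounds is None or rounds < max_rounds):
        m, s = mean(kept), stdev(kept)
        flagged = [x for x in kept if abs(x - m) > k * s]
        if not flagged:
            break
        removed.extend(flagged)
        kept = [x for x in kept if abs(x - m) <= k * s]
        rounds += 1
    return kept, removed


# Hypothetical differences: 30 well-behaved values and two extreme ones.
diffs = [1.0, 1.1, 0.9, 1.2, 0.8] * 6 + [10.0, 2.5]
_, single_pass = iterative_outlier_removal(diffs, max_rounds=1)
_, all_passes = iterative_outlier_removal(diffs)
print("one round:", single_pass)
print("iterated: ", all_passes)
```

    In this invented example the largest value inflates the standard deviation enough to hide the second extreme value in the first round; once it is removed, the recomputed threshold exposes the second one, which is the masking effect that iteration is meant to address.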

  9. Moving object detection via low-rank total variation regularization

    NASA Astrophysics Data System (ADS)

    Wang, Pengcheng; Chen, Qian; Shao, Na

    2016-09-01

    Moving object detection is a challenging task in video surveillance. Recently proposed Robust Principal Component Analysis (RPCA) can recover the outlier patterns from the low-rank data under some mild conditions. However, the ℓ1-penalty in RPCA doesn't work well in moving object detection because the irrepresentable condition is often not satisfied. In this paper, a method based on total variation (TV) regularization scheme is proposed. In our model, image sequences captured with a static camera are highly related, which can be described using a low-rank matrix. Meanwhile, the low-rank matrix can absorb background motion, e.g. periodic and random perturbation. The foreground objects in the sequence are usually sparsely distributed and drifting continuously, and can be treated as group outliers from the highly-related background scenes. Instead of the ℓ1-penalty, we exploit the total variation of the foreground. By minimizing the total variation energy, the outliers tend to collapse and finally converge to be the exact moving objects. The TV-penalty is superior to the ℓ1-penalty especially when the outlier is in the majority for some pixels, and our method can estimate the outlier explicitly with less bias but higher variance. To solve the problem, a joint optimization function is formulated and can be effectively solved through the inexact Augmented Lagrange Multiplier (ALM) method. We evaluate our method along with several state-of-the-art approaches in MATLAB. Both qualitative and quantitative results demonstrate that our proposed method works effectively on a large range of complex scenarios.

  10. Detecting isotopic ratio outliers

    NASA Astrophysics Data System (ADS)

    Bayne, C. K.; Smith, D. H.

    An alternative method is proposed for improving isotopic ratio estimates. This method mathematically models pulse-count data and uses iterative reweighted Poisson regression to estimate model parameters to calculate the isotopic ratios. This computer-oriented approach provides theoretically better methods than conventional techniques to establish error limits and to identify outliers.

  11. Confirmatory Factor Analysis on the Professional Suitability Scale for Social Work Practice

    ERIC Educational Resources Information Center

    Tam, Dora M. Y.; Twigg, Robert C.; Boey, Kam-Wing; Kwok, Siu-Ming

    2013-01-01

    Objective: This article presents a validation study to examine the factor structure of an instrument designed to measure professional suitability for social work practice. Method: Data were collected from registered social workers in a provincial mailed survey. The response rate was 23.2%. After eliminating five cases with multivariate outliers,…

  12. A feasibility study in adapting Shamos Bickel and Hodges Lehman estimator into T-Method for normalization

    NASA Astrophysics Data System (ADS)

    Harudin, N.; Jamaludin, K. R.; Muhtazaruddin, M. Nabil; Ramlie, F.; Muhamad, Wan Zuki Azman Wan

    2018-03-01

    The T-Method is one of the techniques under the Mahalanobis Taguchi System and was developed specifically for multivariate data prediction. Prediction using the T-Method is always possible, even with a very limited sample size. Users of the T-Method need a clear understanding of the trend in the population data, since the method does not consider the effect of outliers within it. Outliers may cause apparent non-normality and the breakdown of classical methods. There exist robust parameter estimates that provide satisfactory results both when the data contain outliers and when the data are free of them. Among these are the robust estimates of location and scale, Hodges Lehman (HL) and Shamos Bickel (SB) respectively, which serve as counterparts to the classical mean and standard deviation. Embedding these into the normalization stage of the T-Method may help to enhance its accuracy and allows the robustness of the T-Method itself to be analysed. However, the results of the case study with the larger sample size show that the T-Method has the lowest average error percentage (3.09%) on data with extreme outliers. HL and SB have the lowest error percentage (4.67%) for data without extreme outliers, with minimal differences in error compared to the T-Method. The trend in prediction error percentages is reversed for the case study with the smaller sample size. The results show that with a minimal sample size, where the risk of outliers is always low, the T-Method performs much better, and that with a larger sample size containing extreme outliers the T-Method also predicts better than the alternatives. For the case studies conducted in this research, the normalization of the T-Method gives satisfactory results, and it is not feasible to adapt HL and SB, or the ordinary mean and standard deviation, into it, since they have only a minimal effect on the error percentages. Normalization using the T-Method is still considered to carry a lower risk from the effect of outliers.
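
    For readers unfamiliar with the HL and SB estimators named in the abstract above, a minimal Python sketch is given below. It uses common textbook definitions as an assumption: the Hodges-Lehmann location estimate as the median of pairwise averages (including each value paired with itself) and the Shamos scale estimate as the median of pairwise absolute differences, left unscaled. How the study embeds them in the T-Method normalization is not shown, and the data are invented.

```python
from itertools import combinations
from statistics import mean, median, stdev


def hodges_lehmann(x):
    """Robust location: median of all pairwise averages (x[i] + x[j]) / 2, i <= j."""
    n = len(x)
    return median((x[i] + x[j]) / 2 for i in range(n) for j in range(i, n))


def shamos(x):
    """Robust scale: median of all pairwise absolute differences, i < j.

    Returned unscaled; a consistency factor is usually applied before
    comparing it with a standard deviation.
    """
    return median(abs(a - b) for a, b in combinations(x, 2))


# Hypothetical sample with one extreme value.
sample = [1.0, 2.0, 3.0, 4.0, 100.0]
print("mean:", mean(sample), " Hodges-Lehmann:", hodges_lehmann(sample))
print("SD:  ", round(stdev(sample), 2), " Shamos:", shamos(sample))
```

    The pairwise construction costs O(n²) evaluations, which is acceptable for the small samples discussed here but would need a faster algorithm for large n.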

  13. Latent Space Tracking from Heterogeneous Data with an Application for Anomaly Detection

    DTIC Science & Technology

    2015-11-01

    specific, if the anomaly behaves as a sudden outlier after which the data stream goes back to normal state, then the anomalous data point should be...introduced three types of anomalies, all of them are sudden outliers. 438 J. Huang and X. Ning Table 2. Synthetic dataset: AUC and parameters method...Latent Space Tracking from Heterogeneous Data with an Application for Anomaly Detection Jiaji Huang1(B) and Xia Ning2 1 Department of Electrical

  14. Optimum outlier model for potential improvement of environmental cleaning and disinfection.

    PubMed

    Rupp, Mark E; Huerta, Tomas; Cavalieri, R J; Lyden, Elizabeth; Van Schooneveld, Trevor; Carling, Philip; Smith, Philip W

    2014-06-01

    The effectiveness and efficiency of 17 housekeepers in terminal cleaning 292 hospital rooms was evaluated through adenosine triphosphate detection. A subgroup of housekeepers was identified who were significantly more effective and efficient than their coworkers. These optimum outliers may be used in performance improvement to optimize environmental cleaning.

  15. Novel Hyperspectral Anomaly Detection Methods Based on Unsupervised Nearest Regularized Subspace

    NASA Astrophysics Data System (ADS)

    Hou, Z.; Chen, Y.; Tan, K.; Du, P.

    2018-04-01

    Anomaly detection has been of great interest in hyperspectral imagery analysis. Most conventional anomaly detectors merely take advantage of spectral and spatial information within neighboring pixels. In this paper, two methods, the Unsupervised Nearest Regularized Subspace-based with Outlier Removal Anomaly Detector (UNRSORAD) and Local Summation UNRSORAD (LSUNRSORAD), are proposed, based on the concept that each pixel in the background can be approximately represented by its spatial neighborhood, while anomalies cannot. Using a dual window, each testing pixel is approximated by a linear combination of the surrounding data. The existence of outliers in the dual window affects detection accuracy, so the proposed detectors remove outlier pixels that differ significantly from the majority of pixels. To make full use of the local spatial distribution information in the pixels neighboring the pixel under test, we adopt a local summation dual-window sliding strategy. The residual image is formed by subtracting the predicted background from the original hyperspectral imagery, and anomalies can be detected in the residual image. Experimental results show that the proposed methods greatly improve detection accuracy compared with other traditional detection methods.

  16. Oximeter for reliable clinical determination of blood oxygen saturation in a fetus

    DOEpatents

    Robinson, Mark R.; Haaland, David M.; Ward, Kenneth J.

    1996-01-01

    With the crude instrumentation now in use to continuously monitor the status of the fetus at delivery, the obstetrician and labor room staff not only over-recognize the possibility of fetal distress with the resultant rise in operative deliveries, but at times do not identify fetal distress which may result in preventable fetal neurological harm. The invention, which addresses these two basic problems, comprises a method and apparatus for non-invasive determination of blood oxygen saturation in the fetus. The apparatus includes a multiple frequency light source which is coupled to an optical fiber. The output of the fiber is used to illuminate blood containing tissue of the fetus. In the preferred embodiment, the reflected light is transmitted back to the apparatus where the light intensities are simultaneously detected at multiple frequencies. The resulting spectrum is then analyzed for determination of oxygen saturation. The analysis method uses multivariate calibration techniques that compensate for nonlinear spectral response, model interfering spectral responses and detect outlier data with high sensitivity.

  17. Analysis of lightning outliers in the EUCLID network

    NASA Astrophysics Data System (ADS)

    Poelman, Dieter R.; Schulz, Wolfgang; Kaltenboeck, Rudolf; Delobbe, Laurent

    2017-11-01

    Lightning data as observed by the European Cooperation for Lightning Detection (EUCLID) network are used in combination with radar data to retrieve the temporal and spatial behavior of lightning outliers, i.e., discharges located in a wrong place, over a 5-year period from 2011 to 2016. Cloud-to-ground (CG) stroke and intracloud (IC) pulse data are superimposed on corresponding 5 min radar precipitation fields in two topographically different areas, Belgium and Austria, in order to extract lightning outliers based on the distance between each lightning event and the nearest precipitation. It is shown that the percentage of outliers is sensitive to changes in the network and to the location algorithm itself. The total percentage of outliers for both regions varies over the years between 0.8 and 1.7 % for a distance to the nearest precipitation of 2 km, with an average of approximately 1.2 % in Belgium and Austria. Outside the European summer thunderstorm season, the percentage of outliers tends to increase somewhat. The majority of all the outliers are low peak current events with absolute values falling between 0 and 10 kA. More specifically, positive cloud-to-ground strokes are more likely to be classified as outliers compared to all other types of discharges. Furthermore, it turns out that the number of sensors participating in locating a lightning discharge is different for outliers versus correctly located events, with outliers having the lowest amount of sensors participating. In addition, it is shown that in most cases the semi-major axis (SMA) assigned to a lightning discharge as a confidence indicator in the location accuracy (LA) is smaller for correctly located events compared to the semi-major axis of outliers.

  18. Generating an Empirical Probability Distribution for the Andrews-Pregibon Statistic.

    ERIC Educational Resources Information Center

    Jarrell, Michele G.

    A probability distribution was developed for the Andrews-Pregibon (AP) statistic. The statistic, developed by D. F. Andrews and D. Pregibon (1978), identifies multivariate outliers. It is a ratio of the determinant of the data matrix with an observation deleted to the determinant of the entire data matrix. Although the AP statistic has been used…
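
    The determinant ratio described in this record can be sketched with NumPy. This is a minimal illustration, assuming the determinants are taken of the cross-product matrix Z'Z of the data matrix Z; that choice, the function name andrews_pregibon and the row-by-row deletion loop are assumptions made for the example, not details given in the record.

```python
import numpy as np


def andrews_pregibon(Z):
    """Ratio det(Z_(i)' Z_(i)) / det(Z' Z) for each row i of the data matrix Z.

    Z_(i) is Z with row i deleted. Values close to 0 mark observations whose
    removal shrinks the determinant the most, i.e. candidate outliers.
    """
    Z = np.asarray(Z, dtype=float)
    full = np.linalg.det(Z.T @ Z)
    ratios = np.empty(len(Z))
    for i in range(len(Z)):
        Zi = np.delete(Z, i, axis=0)  # drop observation i
        ratios[i] = np.linalg.det(Zi.T @ Zi) / full
    return ratios
```

    For single-row deletion the ratio equals 1 minus the corresponding diagonal element of the hat matrix Z (Z'Z)^-1 Z', which gives a cheap way to cross-check the loop.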

  19. Detecting Anomalies from End-to-End Internet Performance Measurements (PingER) Using Cluster Based Local Outlier Factor

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ali, Saqib; Wang, Guojun; Cottrell, Roger Leslie

    PingER (Ping End-to-End Reporting) is a worldwide end-to-end Internet performance measurement framework. It was developed by the SLAC National Accelerator Laboratory, Stanford, USA, and has been running for the last 20 years. It has more than 700 monitoring agents and remote sites which monitor the performance of Internet links in around 170 countries of the world. At present, the size of the compressed PingER data set is about 60 GB, comprising 100,000 flat files. The data is publicly available for valuable Internet performance analyses. However, the data sets suffer from missing values and anomalies due to congestion, bottleneck links, queuing overflow, network software misconfiguration, hardware failure, cable cuts, and social upheavals. Therefore, the objective of this paper is to detect such performance drops or spikes, labeled as anomalies or outliers, in the PingER data set. In the proposed approach, the raw text files of the data set are transformed into a PingER dimensional model. The missing values are imputed using the k-NN algorithm. The data is partitioned into similar instances using the k-means clustering algorithm. Afterward, clustering is integrated with the Local Outlier Factor (LOF) using the Cluster Based Local Outlier Factor (CBLOF) algorithm to detect the anomalies or outliers in the PingER data. Lastly, anomalies are further analyzed to identify the time frame and location of the hosts generating the major percentage of the anomalies in the PingER data set ranging from 1998 to 2016.

  20. Detecting Anomalies from End-to-End Internet Performance Measurements (PingER) Using Cluster Based Local Outlier Factor

    DOE PAGES

    Ali, Saqib; Wang, Guojun; Cottrell, Roger Leslie; ...

    2018-05-28

    PingER (Ping End-to-End Reporting) is a worldwide end-to-end Internet performance measurement framework. It was developed by the SLAC National Accelerator Laboratory, Stanford, USA, and has been running for the last 20 years. It has more than 700 monitoring agents and remote sites which monitor the performance of Internet links in around 170 countries of the world. At present, the size of the compressed PingER data set is about 60 GB, comprising 100,000 flat files. The data is publicly available for valuable Internet performance analyses. However, the data sets suffer from missing values and anomalies due to congestion, bottleneck links, queuing overflow, network software misconfiguration, hardware failure, cable cuts, and social upheavals. Therefore, the objective of this paper is to detect such performance drops or spikes, labeled as anomalies or outliers, in the PingER data set. In the proposed approach, the raw text files of the data set are transformed into a PingER dimensional model. The missing values are imputed using the k-NN algorithm. The data is partitioned into similar instances using the k-means clustering algorithm. Afterward, clustering is integrated with the Local Outlier Factor (LOF) using the Cluster Based Local Outlier Factor (CBLOF) algorithm to detect the anomalies or outliers in the PingER data. Lastly, anomalies are further analyzed to identify the time frame and location of the hosts generating the major percentage of the anomalies in the PingER data set ranging from 1998 to 2016.

  1. Anomaly Detection in Large Sets of High-Dimensional Symbol Sequences

    NASA Technical Reports Server (NTRS)

    Budalakoti, Suratna; Srivastava, Ashok N.; Akella, Ram; Turkov, Eugene

    2006-01-01

    This paper addresses the problem of detecting and describing anomalies in large sets of high-dimensional symbol sequences. The approach taken uses unsupervised clustering of sequences using the normalized longest common subsequence (LCS) as a similarity measure, followed by detailed analysis of outliers to detect anomalies. As the LCS measure is expensive to compute, the first part of the paper discusses existing algorithms, such as the Hunt-Szymanski algorithm, that have low time-complexity. We then discuss why these algorithms often do not work well in practice and present a new hybrid algorithm for computing the LCS that, in our tests, outperforms the Hunt-Szymanski algorithm by a factor of five. The second part of the paper presents new algorithms for outlier analysis that provide comprehensible indicators as to why a particular sequence was deemed to be an outlier. The algorithms provide a coherent description to an analyst of the anomalies in the sequence, compared to more normal sequences. The algorithms we present are general and domain-independent, so we discuss applications in related areas such as anomaly detection.
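
    The similarity measure in this record can be illustrated with the textbook dynamic-programming LCS. This is only a sketch: it is the plain O(mn) algorithm, not the Hunt-Szymanski algorithm or the authors' hybrid, and the normalization by the geometric mean of the two lengths is an assumption, since the record does not state the formula.

```python
from math import sqrt


def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    prev = [0] * (len(b) + 1)  # DP row for the previous symbol of a
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            if x == y:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def normalized_lcs(a, b):
    """LCS length divided by sqrt(len(a) * len(b)); the normalization is assumed."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / sqrt(len(a) * len(b))
```

    With this normalization identical sequences score 1 and sequences sharing no symbols score 0, so the value can feed directly into a clustering routine as a similarity.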

  2. Modeling Data Containing Outliers using ARIMA Additive Outlier (ARIMA-AO)

    NASA Astrophysics Data System (ADS)

    Saleh Ahmar, Ansari; Guritno, Suryo; Abdurakhman; Rahman, Abdul; Awi; Alimuddin; Minggi, Ilham; Arif Tiro, M.; Kasim Aidid, M.; Annas, Suwardi; Utami Sutiksno, Dian; Ahmar, Dewi S.; Ahmar, Kurniawan H.; Abqary Ahmar, A.; Zaki, Ahmad; Abdullah, Dahlan; Rahim, Robbi; Nurdiyanto, Heri; Hidayat, Rahmat; Napitupulu, Darmawan; Simarmata, Janner; Kurniasih, Nuning; Andretti Abdillah, Leon; Pranolo, Andri; Haviluddin; Albra, Wahyudin; Arifin, A. Nurani M.

    2018-01-01

    This study discusses the detection and correction of data containing additive outliers (AO) in the ARIMA (p, d, q) model. Detection and correction use an iterative procedure popularized by Box, Jenkins, and Reinsel (1994). With this method we obtain an ARIMA model fitted to the data containing AO; the model adds to the original ARIMA model the coefficients obtained from the iteration process using regression methods. For the simulated data containing AO, the initial model is ARIMA (2,0,0) with MSE = 36,780; after detection and correction by iteration, the model is ARIMA (2,0,0) with coefficients obtained from the regression Z_t = 0,106 + 0,204 Z_{t-1} + 0,401 Z_{t-2} - 329 X_1(t) + 115 X_2(t) + 35,9 X_3(t) and MSE = 19,365. This shows an improvement in the forecasting error.

  3. Clustering analysis of line indices for LAMOST spectra with AstroStat

    NASA Astrophysics Data System (ADS)

    Chen, Shu-Xin; Sun, Wei-Min; Yan, Qi

    2018-06-01

    The application of data mining in astronomical surveys, such as the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) survey, provides an effective approach to automatically analyze a large amount of complex survey data. Unsupervised clustering could help astronomers find the associations and outliers in a big data set. In this paper, we employ the k-means method to perform clustering for the line index of LAMOST spectra with the powerful software AstroStat. Implementing the line index approach for analyzing astronomical spectra is an effective way to extract spectral features for low resolution spectra, which can represent the main spectral characteristics of stars. A total of 144 340 line indices for A type stars is analyzed through calculating their intra and inter distances between pairs of stars. For intra distance, we use the definition of Mahalanobis distance to explore the degree of clustering for each class, while for outlier detection, we define a local outlier factor for each spectrum. AstroStat furnishes a set of visualization tools for illustrating the analysis results. Checking the spectra detected as outliers, we find that most of them are problematic data and only a few correspond to rare astronomical objects. We show two examples of these outliers, a spectrum with abnormal continuum and a spectrum with emission lines. Our work demonstrates that line index clustering is a good method for examining data quality and identifying rare objects.

  4. A New Quality Control Method base on IRMCD for Wind Profiler Observation towards Future Assimilation Application

    NASA Astrophysics Data System (ADS)

    Chen, Min; Zhang, Yu

    2017-04-01

    A wind profiler network with a total of 65 profiling radars was operated by the MOC/CMA in China until July 2015. In this study, a quality control procedure is constructed to incorporate the profiler data from the wind-profiling network into the local data assimilation and forecasting system (BJRUC). The procedure applies a blacklisting check that removes stations with gross errors and an outlier check that rejects data with large deviations from the background. Instead of the bi-weighting method, which has been commonly implemented in outlier elimination for one-dimensional scalar observations, an outlier elimination method is developed based on the iterated reweighted minimum covariance determinant (IRMCD) for multi-variate observations such as wind profiler data. A quality control experiment is separately performed for subsets containing profiler data tagged in parallel with/without rain flags at every 00UTC/12UTC from 20 June to 30 Sep 2015. From the results, we find that with the quality control, the frequency distributions of the differences between the observations and model background become more Gaussian-like and meet the requirements of a Gaussian distribution for data assimilation. Further intensive assessment for each quality control step reveals that the stations rejected by blacklisting contain poor data quality, and the IRMCD rejects outliers in a robust and physically reasonable manner.

  5. Accuracy of GIPSY PPP from version 6.2: a robust method to remove outliers

    NASA Astrophysics Data System (ADS)

    Hayal, Adem G.; Ugur Sanli, D.

    2014-05-01

    In this paper, we assess the accuracy of GIPSY PPP from the latest version, version 6.2. As the research community prepares for real-time PPP, it is interesting to revisit the accuracy of static GPS from the latest version of this well-established research software, the first of its kind. Although the results do not differ significantly from the previous version, version 6.1.1, we still observe a slight improvement in the vertical component due to the enhanced second-order ionospheric modeling introduced with the latest version. In this study, however, we turned our attention to outlier detection. Outliers usually occur among the solutions from shorter observation sessions and degrade the quality of the accuracy modeling. In our previous analysis of version 6.1.1, we argued that eliminating outliers with the traditional method was cumbersome because repeated trials were needed, and that subjectivity, which could affect the statistical significance of the solutions, might have existed in the results (Hayal and Sanli, 2013). Here we overcome this problem using a robust outlier elimination method. The median is perhaps the simplest of the robust outlier detection methods in terms of applicability. At the same time, it might be considered the most efficient one, given that it has the highest breakdown point. In our analysis, we used a slightly different version of the median, as introduced in Tut et al. (2013). Hence, we were able to remove suspected outliers in a single run; with the traditional methods, these were more problematic to remove this time from the solutions produced using the latest version of the software. References: Hayal, A.G., Sanli, D.U., Accuracy of GIPSY PPP from version 6, GNSS Precise Point Positioning Workshop: Reaching Full Potential, Vol. 1, pp. 41-42 (2013); Tut, İ., Sanli, D.U., Erdogan, B., Hekimoglu, S., Efficiency of BERNESE single baseline rapid static positioning solutions with SEARCH strategy, Survey Review, Vol. 45, Issue 331, pp. 296-304 (2013).
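
    A median-based outlier screen of the general kind this record refers to can be sketched as follows. The record does not specify the modified median it uses, so this is a generic median and MAD (median absolute deviation) rule, not the authors' method; the function name, the 1.4826 normal-consistency factor and the threshold of 3 are assumptions chosen for illustration.

```python
from statistics import median


def mad_outliers(values, threshold=3.0):
    """Flag values whose distance from the median exceeds threshold * scaled MAD.

    All candidates are flagged in one pass, with no repeated trials.
    """
    med = median(values)
    deviations = [abs(v - med) for v in values]
    scale = 1.4826 * median(deviations)  # assumed normal-consistency factor
    if scale == 0:
        return [False] * len(values)  # degenerate case: no spread to judge by
    return [d / scale > threshold for d in deviations]
```

    Because both the centre and the spread come from medians, a few gross errors in short-session solutions do not inflate the rejection limits the way they would with a mean and standard deviation.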

  6. A generalized Grubbs-Beck test statistic for detecting multiple potentially influential low outliers in flood series

    USGS Publications Warehouse

    Cohn, T.A.; England, J.F.; Berenbrock, C.E.; Mason, R.R.; Stedinger, J.R.; Lamontagne, J.R.

    2013-01-01

    The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as “less-than” values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.

  7. Outlier Detection for Patient Monitoring and Alerting

    PubMed Central

    Hauskrecht, Milos; Batal, Iyad; Valko, Michal; Visweswaran, Shyam; Cooper, Gregory F.; Clermont, Gilles

    2012-01-01

    We develop and evaluate a data-driven approach for detecting unusual (anomalous) patient-management decisions using past patient cases stored in electronic health records (EHRs). Our hypothesis is that a patient-management decision that is unusual with respect to past patient care may be due to an error and that it is worthwhile to generate an alert if such a decision is encountered. We evaluate this hypothesis using data obtained from EHRs of 4,486 post-cardiac surgical patients and a subset of 222 alerts generated from the data. We base the evaluation on the opinions of a panel of experts. The results of the study support our hypothesis that the outlier-based alerting can lead to promising true alert rates. We observed true alert rates that ranged from 25% to 66% for a variety of patient-management actions, with 66% corresponding to the strongest outliers. PMID:22944172

  8. A generalized Grubbs-Beck test statistic for detecting multiple potentially influential low outliers in flood series

    NASA Astrophysics Data System (ADS)

    Cohn, T. A.; England, J. F.; Berenbrock, C. E.; Mason, R. R.; Stedinger, J. R.; Lamontagne, J. R.

    2013-08-01

    The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as "less-than" values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.

  9. Rumbling Orchids: How To Assess Divergent Evolution Between Chloroplast Endosymbionts and the Nuclear Host.

    PubMed

    Pérez-Escobar, Oscar Alejandro; Balbuena, Juan Antonio; Gottschling, Marc

    2016-01-01

    Phylogenetic relationships inferred from multilocus organellar and nuclear DNA data are often difficult to resolve because of evolutionary conflicts among gene trees. However, conflicting or "outlier" associations (i.e., linked pairs of "operational terminal units" in two phylogenies) among these data sets often provide valuable information on evolutionary processes such as chloroplast capture following hybridization, incomplete lineage sorting, and horizontal gene transfer. Statistical tools that to date have been used in cophylogenetic studies only also have the potential to test for the degree of topological congruence between organellar and nuclear data sets and reliably detect outlier associations. Two distance-based methods, namely ParaFit and Procrustean Approach to Cophylogeny (PACo), were used in conjunction to detect those outliers contributing to conflicting phylogenies independently derived from chloroplast and nuclear sequence data. We explored their efficiency of retrieving outlier associations, and the impact of input data (unit branch length and additive trees) between data sets, by using several simulation approaches. To test their performance using real data sets, we additionally inferred the phylogenetic relationships within Neotropical Catasetinae (Epidendroideae, Orchidaceae), which is a suitable group to investigate phylogenetic incongruence because of hybridization processes between some of its constituent species. A comparison between trees derived from chloroplast and nuclear sequence data reflected strong, well-supported incongruence within Catasetum, Cycnoches, and Mormodes. As a result, outliers among chloroplast and nuclear data sets, and in experimental simulations, were successfully detected by PACo when using patristic distance matrices obtained from phylograms, but not from unit branch length trees. The performance of ParaFit was overall inferior compared to PACo, using either phylograms or unit branch lengths as input data. 
Because workflows for applying cophylogenetic analyses are not standardized yet, we provide a pipeline for executing PACo and ParaFit as well as displaying outlier associations in plots and trees by using the software R. The pipeline renders a method to identify outliers with high reliability and to assess the combinability of the independently derived data sets by means of statistical analyses. © The Author(s) 2015. Published by Oxford University Press, on behalf of the Society of Systematic Biologists. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  10. Comparison of outlier identification methods in hospital surgical quality improvement programs.

    PubMed

    Bilimoria, Karl Y; Cohen, Mark E; Merkow, Ryan P; Wang, Xue; Bentrem, David J; Ingraham, Angela M; Richards, Karen; Hall, Bruce L; Ko, Clifford Y

    2010-10-01

    Surgeons and hospitals are being increasingly assessed by third parties regarding surgical quality and outcomes, and much of this information is reported publicly. Our objective was to compare various methods used to classify hospitals as outliers in established surgical quality assessment programs by applying each approach to a single data set. Using American College of Surgeons National Surgical Quality Improvement Program data (7/2008-6/2009), hospital risk-adjusted 30-day morbidity and mortality were assessed for general surgery at 231 hospitals (cases = 217,630) and for colorectal surgery at 109 hospitals (cases = 17,251). The number of outliers (poor performers) identified using different methods and criteria were compared. The overall morbidity was 10.3% for general surgery and 25.3% for colorectal surgery. The mortality was 1.6% for general surgery and 4.0% for colorectal surgery. Programs used different methods (logistic regression, hierarchical modeling, partitioning) and criteria (P < 0.01, P < 0.05, P < 0.10) to identify outliers. Depending on outlier identification methods and criteria employed, when each approach was applied to this single dataset, the number of outliers ranged from 7 to 57 hospitals for general surgery morbidity, 1 to 57 hospitals for general surgery mortality, 4 to 27 hospitals for colorectal morbidity, and 0 to 27 hospitals for colorectal mortality. There was considerable variation in the number of outliers identified using different detection approaches. Quality programs seem to be utilizing outlier identification methods contrary to what might be expected, thus they should justify their methodology based on the intent of the program (i.e., quality improvement vs. reimbursement). Surgeons and hospitals should be aware of variability in methods used to assess their performance as these outlier designations will likely have referral and reimbursement consequences.

  11. Patient classification as an outlier detection problem: An application of the One-Class Support Vector Machine

    PubMed Central

    Mourão-Miranda, Janaina; Hardoon, David R.; Hahn, Tim; Marquand, Andre F.; Williams, Steve C.R.; Shawe-Taylor, John; Brammer, Michael

    2011-01-01

    Pattern recognition approaches, such as the Support Vector Machine (SVM), have been successfully used to classify groups of individuals based on their patterns of brain activity or structure. However these approaches focus on finding group differences and are not applicable to situations where one is interested in accessing deviations from a specific class or population. In the present work we propose an application of the one-class SVM (OC-SVM) to investigate if patterns of fMRI response to sad facial expressions in depressed patients would be classified as outliers in relation to patterns of healthy control subjects. We defined features based on whole brain voxels and anatomical regions. In both cases we found a significant correlation between the OC-SVM predictions and the patients' Hamilton Rating Scale for Depression (HRSD), i.e. the more depressed the patients were the more of an outlier they were. In addition the OC-SVM split the patient groups into two subgroups whose membership was associated with future response to treatment. When applied to region-based features the OC-SVM classified 52% of patients as outliers. However among the patients classified as outliers 70% did not respond to treatment and among those classified as non-outliers 89% responded to treatment. In addition 89% of the healthy controls were classified as non-outliers. PMID:21723950

  12. Outlier Analysis Defines Zinc Finger Gene Family DNA Methylation in Tumors and Saliva of Head and Neck Cancer Patients.

    PubMed

    Gaykalova, Daria A; Vatapalli, Rajita; Wei, Yingying; Tsai, Hua-Ling; Wang, Hao; Zhang, Chi; Hennessey, Patrick T; Guo, Theresa; Tan, Marietta; Li, Ryan; Ahn, Julie; Khan, Zubair; Westra, William H; Bishop, Justin A; Zaboli, David; Koch, Wayne M; Khan, Tanbir; Ochs, Michael F; Califano, Joseph A

    2015-01-01

    Head and Neck Squamous Cell Carcinoma (HNSCC) is the fifth most common cancer, annually affecting over half a million people worldwide. Presently, there are no accepted biomarkers for clinical detection and surveillance of HNSCC. In this work, a comprehensive genome-wide analysis of epigenetic alterations in primary HNSCC tumors was employed in conjunction with cancer-specific outlier statistics to define novel biomarker genes which are differentially methylated in HNSCC. The 37 identified biomarker candidates were top-scoring outlier genes with prominent differential methylation in tumors, but with no signal in normal tissues. These putative candidates were validated in independent HNSCC cohorts from our institution and TCGA (The Cancer Genome Atlas). Using the top candidates, ZNF14, ZNF160, and ZNF420, an assay was developed for detection of HNSCC cancer in primary tissue and saliva samples with 100% specificity when compared to normal control samples. Given the high detection specificity, the analysis of ZNF DNA methylation in combination with other DNA methylation biomarkers may be useful in the clinical setting for HNSCC detection and surveillance, particularly in high-risk patients. Several additional candidates identified through this work can be further investigated toward future development of a multi-gene panel of biomarkers for the surveillance and detection of HNSCC.

  13. Application of two tests of multivariate discordancy to fisheries data sets

    USGS Publications Warehouse

    Stapanian, M.A.; Kocovsky, P.M.; Garner, F.C.

    2008-01-01

    The generalized (Mahalanobis) distance and multivariate kurtosis are two powerful tests of multivariate discordancies (outliers). Unlike the generalized distance test, the multivariate kurtosis test has not been applied as a test of discordancy to fisheries data heretofore. We applied both tests, along with published algorithms for identifying suspected causal variable(s) of discordant observations, to two fisheries data sets from Lake Erie: total length, mass, and age from 1,234 burbot, Lota lota; and 22 combinations of unique subsets of 10 morphometrics taken from 119 yellow perch, Perca flavescens. For the burbot data set, the generalized distance test identified six discordant observations and the multivariate kurtosis test identified 24 discordant observations. In contrast with the multivariate tests, the univariate generalized distance test identified no discordancies when applied separately to each variable. Removing discordancies had a substantial effect on length-versus-mass regression equations. For 500-mm burbot, the percent difference in estimated mass after removing discordancies in our study was greater than the percent difference in masses estimated for burbot of the same length in lakes that differed substantially in productivity. The number of discordant yellow perch detected ranged from 0 to 2 with the multivariate generalized distance test and from 6 to 11 with the multivariate kurtosis test. With the kurtosis test, 108 yellow perch (90.7%) were identified as discordant in zero to two combinations, and five (4.2%) were identified as discordant in either all or 21 of the 22 combinations. The relationship among the variables included in each combination determined which variables were identified as causal. The generalized distance test identified between zero and six discordancies when applied separately to each variable. 
Removing the discordancies found in at least one-half of the combinations (k=5) had a marked effect on a principal components analysis. In particular, the percent of the total variation explained by second and third principal components, which explain shape, increased by 52 and 44% respectively when the discordancies were removed. Multivariate applications of the tests have numerous ecological advantages over univariate applications, including improved management of fish stocks and interpretation of multivariate morphometric data. © 2007 Springer Science+Business Media B.V.
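
    The two quantities behind the tests in this record can be sketched with NumPy. This is a minimal illustration, not the published discordancy tests: it computes squared generalized (Mahalanobis) distances and Mardia's multivariate kurtosis coefficient, and leaves out the critical values needed to declare an observation discordant. The function names are hypothetical, and using the maximum-likelihood covariance for the kurtosis is an assumption based on Mardia's usual definition.

```python
import numpy as np


def _squared_distances(X, cov):
    """Squared Mahalanobis distance of each row of X from the column means."""
    centered = X - X.mean(axis=0)
    inv = np.linalg.inv(np.atleast_2d(cov))
    return np.einsum("ij,jk,ik->i", centered, inv, centered)


def mahalanobis_sq(X):
    """Squared generalized distances using the unbiased sample covariance."""
    X = np.asarray(X, dtype=float)
    return _squared_distances(X, np.cov(X, rowvar=False))


def mardia_kurtosis(X):
    """Mean of the squared distances raised to the power 2 (ML covariance).

    For p-variate normal data the value is close to p * (p + 2).
    """
    X = np.asarray(X, dtype=float)
    d2 = _squared_distances(X, np.cov(X, rowvar=False, bias=True))
    return float(np.mean(d2 ** 2))
```

    A point can sit within two standard deviations on every variable and still have a very large generalized distance if it breaks the correlation between variables, which is consistent with the record's finding that the univariate tests missed observations the multivariate tests flagged.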

  14. Spatiotemporal evolution of the completeness magnitude of the Icelandic earthquake catalogue from 1991 to 2013

    NASA Astrophysics Data System (ADS)

    Panzera, Francesco; Mignan, Arnaud; Vogfjörð, Kristin S.

    2017-07-01

    In 1991, a digital seismic monitoring network was installed in Iceland with a digital seismic system and automatic operation. After 20 years of operation, we explore for the first time its nationwide performance by analysing the spatiotemporal variations of the completeness magnitude. We use the Bayesian magnitude of completeness (BMC) method that combines local completeness magnitude observations with prior information based on the density of seismic stations. Additionally, we test the impact of earthquake location uncertainties on the BMC results, by filtering the catalogue using a multivariate analysis that identifies outliers in the hypocentre error distribution. We find that the entire North-to-South active rift zone shows a relatively low magnitude of completeness Mc in the range 0.5-1.0, highlighting the ability of the Icelandic network to detect small earthquakes. This work also demonstrates the influence of earthquake location uncertainties on the spatiotemporal magnitude of completeness analysis.

  15. Assessing signal-to-noise in quantitative proteomics: multivariate statistical analysis in DIGE experiments.

    PubMed

    Friedman, David B

    2012-01-01

    All quantitative proteomics experiments measure variation between samples. When performing large-scale experiments that involve multiple conditions or treatments, the experimental design should include the appropriate number of individual biological replicates from each condition to enable a relevant biological signal to be distinguished from technical noise. Multivariate statistical analyses, such as principal component analysis (PCA), provide a global perspective on experimental variation, thereby enabling the assessment of whether the variation describes the expected biological signal or the unanticipated technical/biological noise inherent in the system. Examples will be shown from high-resolution multivariable DIGE experiments where PCA was instrumental in demonstrating biologically significant variation as well as sample outliers, fouled samples, and overriding technical variation that would not be readily observed using standard univariate tests.

  16. DEFINITION OF MULTIVARIATE GEOCHEMICAL ASSOCIATIONS WITH POLYMETALLIC MINERAL OCCURRENCES USING A SPATIALLY DEPENDENT CLUSTERING TECHNIQUE AND RASTERIZED STREAM SEDIMENT DATA - AN ALASKAN EXAMPLE.

    USGS Publications Warehouse

    Jenson, Susan K.; Trautwein, C.M.

    1984-01-01

    The application of an unsupervised, spatially dependent clustering technique (AMOEBA) to interpolated raster arrays of stream sediment data has been found to provide useful multivariate geochemical associations for modeling regional polymetallic resource potential. The technique is based on three assumptions regarding the compositional and spatial relationships of stream sediment data and their regional significance. These assumptions are: (1) compositionally separable classes exist and can be statistically distinguished; (2) the classification of multivariate data should minimize the pair probability of misclustering to establish useful compositional associations; and (3) a compositionally defined class represented by three or more contiguous cells within an array is a more important descriptor of a terrane than a class represented by spatial outliers.

  17. Intelligent agent-based intrusion detection system using enhanced multiclass SVM.

    PubMed

    Ganapathy, S; Yogesh, P; Kannan, A

    2012-01-01

    Intrusion detection systems were used in the past along with various techniques to detect intrusions in networks effectively. However, most of these systems are able to detect the intruders only with high false alarm rate. In this paper, we propose a new intelligent agent-based intrusion detection model for mobile ad hoc networks using a combination of attribute selection, outlier detection, and enhanced multiclass SVM classification methods. For this purpose, an effective preprocessing technique is proposed that improves the detection accuracy and reduces the processing time. Moreover, two new algorithms, namely, an Intelligent Agent Weighted Distance Outlier Detection algorithm and an Intelligent Agent-based Enhanced Multiclass Support Vector Machine algorithm are proposed for detecting the intruders in a distributed database environment that uses intelligent agents for trust management and coordination in transaction processing. The experimental results of the proposed model show that this system detects anomalies with low false alarm rate and high-detection rate when tested with KDD Cup 99 data set.

  18. Intelligent Agent-Based Intrusion Detection System Using Enhanced Multiclass SVM

    PubMed Central

    Ganapathy, S.; Yogesh, P.; Kannan, A.

    2012-01-01

    Intrusion detection systems were used in the past along with various techniques to detect intrusions in networks effectively. However, most of these systems are able to detect the intruders only with high false alarm rate. In this paper, we propose a new intelligent agent-based intrusion detection model for mobile ad hoc networks using a combination of attribute selection, outlier detection, and enhanced multiclass SVM classification methods. For this purpose, an effective preprocessing technique is proposed that improves the detection accuracy and reduces the processing time. Moreover, two new algorithms, namely, an Intelligent Agent Weighted Distance Outlier Detection algorithm and an Intelligent Agent-based Enhanced Multiclass Support Vector Machine algorithm are proposed for detecting the intruders in a distributed database environment that uses intelligent agents for trust management and coordination in transaction processing. The experimental results of the proposed model show that this system detects anomalies with low false alarm rate and high-detection rate when tested with KDD Cup 99 data set. PMID:23056036

  19. Locally adaptive decision in detection of clustered microcalcifications in mammograms.

    PubMed

    Sainz de Cea, María V; Nishikawa, Robert M; Yang, Yongyi

    2018-02-15

    In computer-aided detection or diagnosis of clustered microcalcifications (MCs) in mammograms, the performance often suffers from not only the presence of false positives (FPs) among the detected individual MCs but also large variability in detection accuracy among different cases. To address this issue, we investigate a locally adaptive decision scheme in MC detection by exploiting the noise characteristics in a lesion area. Instead of developing a new MC detector, we propose a decision scheme on how to best decide whether a detected object is an MC or not in the detector output. We formulate the individual MCs as statistical outliers compared to the many noisy detections in a lesion area so as to account for the local image characteristics. To identify the MCs, we first consider a parametric method for outlier detection, the Mahalanobis distance detector, which is based on a multi-dimensional Gaussian distribution on the noisy detections. We also consider a non-parametric method which is based on a stochastic neighbor graph model of the detected objects. We demonstrated the proposed decision approach with two existing MC detectors on a set of 188 full-field digital mammograms (95 cases). The results, evaluated using free response operating characteristic (FROC) analysis, showed a significant improvement in detection accuracy by the proposed outlier decision approach over traditional thresholding (the partial area under the FROC curve increased from 3.95 to 4.25, p-value < 10^-4). There was also a reduction in case-to-case variability in detected FPs at a given sensitivity level. The proposed adaptive decision approach could not only reduce the number of FPs in detected MCs but also improve case-to-case consistency in detection.

  20. Locally adaptive decision in detection of clustered microcalcifications in mammograms

    NASA Astrophysics Data System (ADS)

    Sainz de Cea, María V.; Nishikawa, Robert M.; Yang, Yongyi

    2018-02-01

    In computer-aided detection or diagnosis of clustered microcalcifications (MCs) in mammograms, the performance often suffers from not only the presence of false positives (FPs) among the detected individual MCs but also large variability in detection accuracy among different cases. To address this issue, we investigate a locally adaptive decision scheme in MC detection by exploiting the noise characteristics in a lesion area. Instead of developing a new MC detector, we propose a decision scheme on how to best decide whether a detected object is an MC or not in the detector output. We formulate the individual MCs as statistical outliers compared to the many noisy detections in a lesion area so as to account for the local image characteristics. To identify the MCs, we first consider a parametric method for outlier detection, the Mahalanobis distance detector, which is based on a multi-dimensional Gaussian distribution on the noisy detections. We also consider a non-parametric method which is based on a stochastic neighbor graph model of the detected objects. We demonstrated the proposed decision approach with two existing MC detectors on a set of 188 full-field digital mammograms (95 cases). The results, evaluated using free response operating characteristic (FROC) analysis, showed a significant improvement in detection accuracy by the proposed outlier decision approach over traditional thresholding (the partial area under the FROC curve increased from 3.95 to 4.25, p-value < 10^-4). There was also a reduction in case-to-case variability in detected FPs at a given sensitivity level. The proposed adaptive decision approach could not only reduce the number of FPs in detected MCs but also improve case-to-case consistency in detection.

  1. Using State Estimation Residuals to Detect Abnormal SCADA Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ma, Jian; Chen, Yousu; Huang, Zhenyu

    2010-06-14

    Detection of manipulated supervisory control and data acquisition (SCADA) data is critically important for the safe and secure operation of modern power systems. In this paper, a methodology of detecting manipulated SCADA data based on state estimation residuals is presented. A framework of the proposed methodology is described. Instead of using original SCADA measurements as the bad data sources, the residuals calculated based on the results of the state estimator are used as the input for the outlier detection process. The BACON algorithm is applied to detect outliers in the state estimation residuals. The IEEE 118-bus system is used as a test case to evaluate the effectiveness of the proposed methodology. The accuracy of the BACON method is compared with that of the 3-σ method for the simulated SCADA measurements and residuals.
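
    The 3-σ baseline mentioned in this abstract can be sketched in a few lines: a residual is flagged when it lies more than three sample standard deviations from the sample mean. The residual values and the function name three_sigma_outliers below are invented for illustration. The sketch covers only the 3-σ rule, not the BACON algorithm, and is not taken from the paper.

```python
import statistics

def three_sigma_outliers(residuals):
    """Indices of residuals more than 3 sample standard deviations from the mean."""
    mean = statistics.fmean(residuals)
    sd = statistics.stdev(residuals)
    return [i for i, r in enumerate(residuals) if abs(r - mean) > 3 * sd]

# Hypothetical state-estimation residuals: small noise plus one manipulated
# measurement at index 7.
residuals = [
    0.010, -0.020, 0.000, 0.015, -0.010, 0.020, -0.015, 0.500,
    0.005, -0.005, 0.010, -0.010, 0.020, -0.020, 0.000, 0.010,
    -0.010, 0.015, -0.015, 0.005, -0.005,
]

flagged = three_sigma_outliers(residuals)
print(flagged)
```

    Because the mean and standard deviation are themselves computed from the contaminated sample, this rule can miss outliers when several are present or when the sample is small, which is one reason to compare it with a robust method.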

  2. Identification of unusual events in multi-channel bridge monitoring data

    NASA Astrophysics Data System (ADS)

    Omenzetter, Piotr; Brownjohn, James Mark William; Moyo, Pilate

    2004-03-01

    Continuously operating instrumented structural health monitoring (SHM) systems are becoming a practical alternative to replace visual inspection for assessment of condition and soundness of civil infrastructure such as bridges. However, converting large amounts of data from an SHM system into usable information is a great challenge to which special signal processing techniques must be applied. This study is devoted to identification of abrupt, anomalous and potentially onerous events in the time histories of static, hourly sampled strains recorded by a multi-sensor SHM system installed in a major bridge structure and operating continuously for a long time. Such events may result, among other causes, from sudden settlement of foundation, ground movement, excessive traffic load or failure of post-tensioning cables. A method of outlier detection in multivariate data has been applied to the problem of finding and localising sudden events in the strain data. For sharp discrimination of abrupt strain changes from slowly varying ones wavelet transform has been used. The proposed method has been successfully tested using known events recorded during construction of the bridge, and later effectively used for detection of anomalous post-construction events.

  3. The stopping rules for winsorized tree

    NASA Astrophysics Data System (ADS)

    Ch'ng, Chee Keong; Mahat, Nor Idayu

    2017-11-01

    The winsorized tree is a modified tree-based classifier that is able to investigate and handle all outliers in all nodes during construction of the tree. It overcomes the tedious process of constructing a classical tree, where the splitting of branches and pruning go concurrently so that the constructed tree does not grow bushy. This mechanism is controlled by the proposed algorithm. In a winsorized tree, data are screened to identify outliers. If an outlier is detected, its value is neutralized by winsorizing. Both outlier identification and value neutralization are executed recursively in every node until a predetermined stopping criterion is met. The aim of this paper is to search for a significant stopping criterion that stops the tree from further splitting before overfitting. The result obtained from the conducted experiment on the Pima Indian dataset showed that the node could produce the final successor nodes (leaves) when it has achieved the range of 70% in information gain.
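
    As a rough illustration of the winsorizing step described in this abstract, the sketch below clamps the most extreme values at each end of a sample to the nearest retained value. The symmetric trimming fraction, the function name winsorize, and the data are assumptions; the record does not state the exact rule used inside each tree node.

```python
def winsorize(values, fraction=0.1):
    """Clamp the lowest and highest `fraction` of values to the nearest kept value."""
    n = len(values)
    k = int(n * fraction)  # number of values to neutralize at each end
    if k == 0:
        return list(values)
    ordered = sorted(values)
    low, high = ordered[k], ordered[n - 1 - k]
    # Values outside [low, high] are pulled in; the original order is preserved.
    return [min(max(v, low), high) for v in values]

# Hypothetical node data with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
cleaned = winsorize(data, fraction=0.1)
print(cleaned)
```

    Unlike trimming, winsorizing keeps the sample size unchanged, so every observation still reaches the split criterion with a bounded value.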

  4. Detection of Cell Wall Chemical Variation in Zea Mays Mutants Using Near-Infrared Spectroscopy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Buyck, N.; Thomas, S.

    Corn stover is regarded as the prime candidate feedstock material for commercial biomass conversion in the United States. Variations in chemical composition of Zea mays cell walls can affect biomass conversion process yields and economics. Mutant lines were constructed by activating a Mu transposon system. The cell wall chemical composition of 48 mutant families was characterized using near-infrared (NIR) spectroscopy. NIR data were analyzed using a multivariate statistical analysis technique called Principal Component Analysis (PCA). PCA of the NIR data from 349 maize leaf samples reveals 57 individuals as outliers on one or more of six Principal Components (PCs) at the 95% confidence interval. Of these, 19 individuals from 16 families are outliers on either PC3 (9% of the variation) or PC6 (1% of the variation), the two PCs that contain information about cell wall polymers. Those individuals for which altered cell wall chemistry is confirmed with wet chemical analysis will then be subjected to fermentation analysis to determine whether or not biomass conversion process kinetics, yields and/or economics are significantly affected. Those mutants that provide indications for a decrease in process cost will be pursued further to identify the gene(s) responsible for the observed changes in cell wall composition and associated changes in process economics. These genes will eventually be incorporated into maize breeding programs directed at the development of a truly dual use crop.

  5. Image Corruption Detection in Diffusion Tensor Imaging for Post-Processing and Real-Time Monitoring

    PubMed Central

    Li, Yue; Shea, Steven M.; Lorenz, Christine H.; Jiang, Hangyi; Chou, Ming-Chung; Mori, Susumu

    2013-01-01

    Due to the high sensitivity of diffusion tensor imaging (DTI) to physiological motion, clinical DTI scans often suffer a significant amount of artifacts. Tensor-fitting-based, post-processing outlier rejection is often used to reduce the influence of motion artifacts. Although it is an effective approach, when there are multiple corrupted data, this method may no longer correctly identify and reject the corrupted data. In this paper, we introduce a new criterion called “corrected Inter-Slice Intensity Discontinuity” (cISID) to detect motion-induced artifacts. We compared the performance of algorithms using cISID and other existing methods with regard to artifact detection. The experimental results show that the integration of cISID into fitting-based methods significantly improves the retrospective detection performance at post-processing analysis. The performance of the cISID criterion, if used alone, was inferior to the fitting-based methods, but cISID could effectively identify severely corrupted images with a rapid calculation time. In the second part of this paper, an outlier rejection scheme was implemented on a scanner for real-time monitoring of image quality and reacquisition of the corrupted data. The real-time monitoring, based on cISID and followed by post-processing, fitting-based outlier rejection, could provide a robust environment for routine DTI studies. PMID:24204551

  6. Sources of Artefacts in Synthetic Aperture Radar Interferometry Data Sets

    NASA Astrophysics Data System (ADS)

    Becek, K.; Borkowski, A.

    2012-07-01

    In recent years, much attention has been devoted to digital elevation models (DEMs) produced using Synthetic Aperture Radar Interferometry (InSAR). This has been triggered by the relative novelty of the InSAR method and its world-famous product—the Shuttle Radar Topography Mission (SRTM) DEM. However, much less attention, if at all, has been paid to sources of artefacts in SRTM. In this work, we focus not on the missing pixels (null pixels) due to shadows or the layover effect, but rather on outliers that were undetected by the SRTM validation process. The aim of this study is to identify some of the causes of the elevation outliers in SRTM. Such knowledge may be helpful to mitigate similar problems in future InSAR DEMs, notably the ones currently being developed from data acquired by the TanDEM-X mission. We analysed many cross-sections derived from SRTM. These cross-sections were extracted over the elevation test areas, which are available from the Global Elevation Data Testing Facility (GEDTF) whose database contains about 8,500 runways with known vertical profiles. Whenever a significant discrepancy between the known runway profile and the SRTM cross-section was detected, a visual interpretation of the high-resolution satellite image was carried out to identify the objects causing the irregularities. A distance and a bearing from the outlier to the object were recorded. Moreover, we considered the SRTM look direction parameter. A comprehensive analysis of the acquired data allows us to establish that large metallic structures, such as hangars or car parking lots, are causing the outliers. Water areas or plain wet terrains may also cause an InSAR outlier. The look direction and the depression angle of the InSAR system in relation to the suspected objects influence the magnitude of the outliers. We hope that these findings will be helpful in designing the error detection routines of future InSAR or, in fact, any microwave aerial- or space-based survey. 
The presence of outliers in SRTM was first reported in Becek, K. (2008). Investigating error structure of shuttle radar topography mission elevation data product, Geophys. Res. Lett., 35, L15403.

  7. Smartphone-Based Indoor Localization with Bluetooth Low Energy Beacons

    PubMed Central

    Zhuang, Yuan; Yang, Jun; Li, You; Qi, Longning; El-Sheimy, Naser

    2016-01-01

    Indoor wireless localization using Bluetooth Low Energy (BLE) beacons has attracted considerable attention after the release of the BLE protocol. In this paper, we propose an algorithm that uses the combination of channel-separate polynomial regression model (PRM), channel-separate fingerprinting (FP), outlier detection and extended Kalman filtering (EKF) for smartphone-based indoor localization with BLE beacons. The proposed algorithm uses FP and PRM to estimate the target’s location and the distances between the target and BLE beacons respectively. We compare the performance of distance estimation that uses separate PRM for three advertisement channels (i.e., the separate strategy) with that of an aggregate PRM generated through the combination of information from all channels (i.e., the aggregate strategy). The performance of FP-based location estimation results of the separate strategy and the aggregate strategy are also compared. It was found that the separate strategy can provide higher accuracy; thus, it is preferred to adopt PRM and FP for each BLE advertisement channel separately. Furthermore, to enhance the robustness of the algorithm, a two-level outlier detection mechanism is designed. Distance and location estimates obtained from PRM and FP are passed to the first outlier detection to generate improved distance estimates for the EKF. After the EKF process, the second outlier detection algorithm based on statistical testing is further performed to remove the outliers. The proposed algorithm was evaluated by various field experiments. Results show that the proposed algorithm achieved an accuracy of <2.56 m 90% of the time with dense deployment of BLE beacons (1 beacon per 9 m), which is 35.82% better than the <3.99 m from the Propagation Model (PM) + EKF algorithm and 15.77% more accurate than the <3.04 m from the FP + EKF algorithm. 
With sparse deployment (1 beacon per 18 m), the proposed algorithm achieves an accuracy of <3.88 m 90% of the time, which is 49.58% more accurate than the <8.00 m from the PM + EKF algorithm and 21.41% better than the <4.94 m from the FP + EKF algorithm. Therefore, the proposed algorithm is especially useful to improve the localization accuracy in environments with sparse beacon deployment. PMID:27128917

  8. Smartphone-Based Indoor Localization with Bluetooth Low Energy Beacons.

    PubMed

    Zhuang, Yuan; Yang, Jun; Li, You; Qi, Longning; El-Sheimy, Naser

    2016-04-26

    Indoor wireless localization using Bluetooth Low Energy (BLE) beacons has attracted considerable attention after the release of the BLE protocol. In this paper, we propose an algorithm that uses the combination of channel-separate polynomial regression model (PRM), channel-separate fingerprinting (FP), outlier detection and extended Kalman filtering (EKF) for smartphone-based indoor localization with BLE beacons. The proposed algorithm uses FP and PRM to estimate the target's location and the distances between the target and BLE beacons respectively. We compare the performance of distance estimation that uses separate PRM for three advertisement channels (i.e., the separate strategy) with that of an aggregate PRM generated through the combination of information from all channels (i.e., the aggregate strategy). The performance of FP-based location estimation results of the separate strategy and the aggregate strategy are also compared. It was found that the separate strategy can provide higher accuracy; thus, it is preferred to adopt PRM and FP for each BLE advertisement channel separately. Furthermore, to enhance the robustness of the algorithm, a two-level outlier detection mechanism is designed. Distance and location estimates obtained from PRM and FP are passed to the first outlier detection to generate improved distance estimates for the EKF. After the EKF process, the second outlier detection algorithm based on statistical testing is further performed to remove the outliers. The proposed algorithm was evaluated by various field experiments. Results show that the proposed algorithm achieved an accuracy of <2.56 m 90% of the time with dense deployment of BLE beacons (1 beacon per 9 m), which is 35.82% better than the <3.99 m from the Propagation Model (PM) + EKF algorithm and 15.77% more accurate than the <3.04 m from the FP + EKF algorithm. 
With sparse deployment (1 beacon per 18 m), the proposed algorithm achieves an accuracy of <3.88 m 90% of the time, which is 49.58% more accurate than the <8.00 m from the PM + EKF algorithm and 21.41% better than the <4.94 m from the FP + EKF algorithm. Therefore, the proposed algorithm is especially useful to improve the localization accuracy in environments with sparse beacon deployment.

  9. Outlier and target detection in aerial hyperspectral imagery: a comparison of traditional and percentage occupancy hit or miss transform techniques

    NASA Astrophysics Data System (ADS)

    Young, Andrew; Marshall, Stephen; Gray, Alison

    2016-05-01

    The use of aerial hyperspectral imagery for the purpose of remote sensing is a rapidly growing research area. Currently, targets are generally detected by looking for distinct spectral features of the objects under surveillance. For example, a camouflaged vehicle, deliberately designed to blend into background trees and grass in the visible spectrum, can be revealed using spectral features in the near-infrared spectrum. This work aims to develop improved target detection methods, using a two-stage approach, firstly by development of a physics-based atmospheric correction algorithm to convert radiance into reflectance hyperspectral image data and secondly by use of improved outlier detection techniques. In this paper the use of the Percentage Occupancy Hit or Miss Transform is explored to provide an automated method for target detection in aerial hyperspectral imagery.

  10. Visual cues for data mining

    NASA Astrophysics Data System (ADS)

    Rogowitz, Bernice E.; Rabenhorst, David A.; Gerth, John A.; Kalin, Edward B.

    1996-04-01

    This paper describes a set of visual techniques, based on principles of human perception and cognition, which can help users analyze and develop intuitions about tabular data. Collections of tabular data are widely available, including, for example, multivariate time series data, customer satisfaction data, stock market performance data, multivariate profiles of companies and individuals, and scientific measurements. In our approach, we show how visual cues can help users perform a number of data mining tasks, including identifying correlations and interaction effects, finding clusters and understanding the semantics of cluster membership, identifying anomalies and outliers, and discovering multivariate relationships among variables. These cues are derived from psychological studies on perceptual organization, visual search, perceptual scaling, and color perception. These visual techniques are presented as a complement to the statistical and algorithmic methods more commonly associated with these tasks, and provide an interactive interface for the human analyst.

  11. Nonlinear optimization-based device-free localization with outlier link rejection.

    PubMed

    Xiao, Wendong; Song, Biao; Yu, Xiting; Chen, Peiyuan

    2015-04-07

    Device-free localization (DFL) is an emerging wireless technique for estimating the location of target that does not have any attached electronic device. It has found extensive use in Smart City applications such as healthcare at home and hospitals, location-based services at smart spaces, city emergency response and infrastructure security. In DFL, wireless devices are used as sensors that can sense the target by transmitting and receiving wireless signals collaboratively. Many DFL systems are implemented based on received signal strength (RSS) measurements and the location of the target is estimated by detecting the changes of the RSS measurements of the wireless links. Due to the uncertainty of the wireless channel, certain links may be seriously polluted and result in erroneous detection. In this paper, we propose a novel nonlinear optimization approach with outlier link rejection (NOOLR) for RSS-based DFL. It consists of three key strategies, including: (1) affected link identification by differential RSS detection; (2) outlier link rejection via geometrical positional relationship among links; (3) target location estimation by formulating and solving a nonlinear optimization problem. Experimental results demonstrate that NOOLR is robust to the fluctuation of the wireless signals with superior localization accuracy compared with the existing Radio Tomographic Imaging (RTI) approach.

  12. Detecting outliers and learning complex structures with large spectroscopic surveys - a case study with APOGEE stars

    NASA Astrophysics Data System (ADS)

    Reis, Itamar; Poznanski, Dovi; Baron, Dalya; Zasowski, Gail; Shahaf, Sahar

    2018-05-01

    In this work, we apply and expand on a recently introduced outlier detection algorithm that is based on an unsupervised random forest. We use the algorithm to calculate a similarity measure for stellar spectra from the Apache Point Observatory Galactic Evolution Experiment (APOGEE). We show that the similarity measure traces non-trivial physical properties and contains information about complex structures in the data. We use it for visualization and clustering of the data set, and discuss its ability to find groups of highly similar objects, including spectroscopic twins. Using the similarity matrix to search the data set for objects allows us to find objects that are impossible to find using their best-fitting model parameters. This includes extreme objects for which the models fail, and rare objects that are outside the scope of the model. We use the similarity measure to detect outliers in the data set, and find a number of previously unknown Be-type stars, spectroscopic binaries, carbon rich stars, young stars, and a few that we cannot interpret. Our work further demonstrates the potential for scientific discovery when combining machine learning methods with modern survey data.

  13. Short-term change detection for UAV video

    NASA Astrophysics Data System (ADS)

    Saur, Günter; Krüger, Wolfgang

    2012-11-01

    In recent years, there has been an increased use of unmanned aerial vehicles (UAV) for video reconnaissance and surveillance. An important application in this context is change detection in UAV video data. Here we address short-term change detection, in which the time between observations ranges from several minutes to a few hours. We distinguish this task from video motion detection (shorter time scale) and from long-term change detection, based on time series of still images taken between several days, weeks, or even years. Examples of relevant changes are recently parked or moved vehicles. As a pre-requisite, a precise image-to-image registration is needed. Images are selected on the basis of the geo-coordinates of the sensor's footprint and with respect to a certain minimal overlap. The automatic image-based fine-registration adjusts the image pair to a common geometry by using a robust matching approach to handle outliers. The change detection algorithm has to distinguish between relevant and non-relevant changes. Examples of non-relevant changes are stereo disparity at 3D structures of the scene, changed length of shadows, and compression or transmission artifacts. To detect changes in image pairs we analyzed image differencing, local image correlation, and a transformation-based approach (multivariate alteration detection). As input we used color and gradient magnitude images. To cope with local misalignment of image structures we extended the approaches by a local neighborhood search. The algorithms are applied to several examples covering both urban and rural scenes. The local neighborhood search in combination with intensity and gradient magnitude differencing clearly improved the results. Extended image differencing performed better than both the correlation-based approach and the multivariate alteration detection. 
The algorithms are adapted to be used in semi-automatic workflows for the ABUL video exploitation system of Fraunhofer IOSB, see Heinze et al. (2010). In a further step we plan to incorporate more information from the video sequences to the change detection input images, e.g., by image enhancement or by along-track stereo which are available in the ABUL system.

  14. Detecting short spatial scale local adaptation and epistatic selection in climate-related candidate genes in European beech (Fagus sylvatica) populations.

    PubMed

    Csilléry, Katalin; Lalagüe, Hadrien; Vendramin, Giovanni G; González-Martínez, Santiago C; Fady, Bruno; Oddou-Muratorio, Sylvie

    2014-10-01

    Detecting signatures of selection in tree populations threatened by climate change is currently a major research priority. Here, we investigated the signature of local adaptation over a short spatial scale using 96 European beech (Fagus sylvatica L.) individuals originating from two pairs of populations on the northern and southern slopes of Mont Ventoux (south-eastern France). We performed both single-locus and multilocus analyses of selection based on 53 climate-related candidate genes containing 546 SNPs. FST outlier methods at the SNP level revealed a weak signal of selection, with three marginally significant outliers in the northern populations. At the gene level, considering haplotypes as alleles, two additional marginally significant outliers were detected, one on each slope. To account for the uncertainty of haplotype inference, we averaged the Bayes factors over many possible phase reconstructions. Epistatic selection offers a realistic multilocus model of selection in natural populations. Here, we used a test suggested by Ohta based on the decomposition of the variance of linkage disequilibrium. Over all populations, 0.23% of the SNP pairs (haplotypes) showed evidence of epistatic selection, with nearly 80% of them being within genes. One of the between-gene epistatic selection signals arose between an FST outlier and a nonsynonymous mutation in a drought response gene. Additionally, we identified haplotypes containing selectively advantageous allele combinations which were unique to high or low elevations and northern or southern populations. Several haplotypes contained nonsynonymous mutations situated in genes with known functional importance for adaptation to climatic factors. © 2014 John Wiley & Sons Ltd.

  15. Outlier analyses to test for local adaptation to breeding grounds in a migratory arctic seabird.

    PubMed

    Tigano, Anna; Shultz, Allison J; Edwards, Scott V; Robertson, Gregory J; Friesen, Vicki L

    2017-04-01

    Investigating the extent (or the existence) of local adaptation is crucial to understanding how populations adapt. When experiments or fitness measurements are difficult or impossible to perform in natural populations, genomic techniques allow us to investigate local adaptation through the comparison of allele frequencies and outlier loci along environmental clines. The thick-billed murre (Uria lomvia) is a highly philopatric colonial arctic seabird that occupies a significant environmental gradient, shows marked phenotypic differences among colonies, and has large effective population sizes. To test whether thick-billed murres from five colonies along the eastern Canadian Arctic coast show genomic signatures of local adaptation to their breeding grounds, we analyzed geographic variation in genome-wide markers mapped to a newly assembled thick-billed murre reference genome. We used outlier analyses to detect loci putatively under selection, and clustering analyses to investigate patterns of differentiation based on 2220 genome-wide single nucleotide polymorphisms (SNPs) and 137 outlier SNPs. We found no evidence of population structure among colonies using all loci but found population structure based on outliers only, where birds from the two northernmost colonies (Minarets and Prince Leopold) grouped with birds from the southernmost colony (Gannet), and birds from Coats and Akpatok were distinct from all other colonies. Although results from our analyses did not support local adaptation along the latitudinal cline of breeding colonies, outlier loci grouped birds from different colonies according to their non-breeding distributions, suggesting that outliers may be informative about adaptation and/or demographic connectivity associated with their migration patterns or nonbreeding grounds.

  16. HacDivSel: Two new methods (haplotype-based and outlier-based) for the detection of divergent selection in pairs of populations

    PubMed Central

    2017-01-01

    The detection of genomic regions involved in local adaptation is an important topic in current population genetics. There are several detection strategies available depending on the kind of genetic and demographic information at hand. A common drawback is the high risk of false positives. In this study we introduce two complementary methods for the detection of divergent selection from populations connected by migration. Both methods have been developed with the aim of being robust to false positives. The first method combines haplotype information with inter-population differentiation (FST). Evidence of divergent selection is concluded only when both the haplotype pattern and the FST value support it. The second method is developed for independently segregating markers i.e. there is no haplotype information. In this case, the power to detect selection is attained by developing a new outlier test based on detecting a bimodal distribution. The test computes the FST outliers and then assumes that those of interest would have a different mode. We demonstrate the utility of the two methods through simulations and the analysis of real data. The simulation results showed power ranging from 60–95% in several of the scenarios whilst the false positive rate was controlled below the nominal level. The analysis of real samples consisted of phased data from the HapMap project and unphased data from intertidal marine snail ecotypes. The results illustrate that the proposed methods could be useful for detecting locally adapted polymorphisms. The software HacDivSel implements the methods explained in this manuscript. PMID:28423003
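
    For orientation, the inter-population differentiation measure (FST) that both methods rely on can be sketched for the simplest case. This is the textbook heterozygosity-based definition for one biallelic locus and two equally sized populations, not the estimator implemented in HacDivSel; the function name and inputs are assumptions made for illustration.

```python
def fst_two_populations(p1, p2):
    """F_ST = (H_T - H_S) / H_T for one biallelic locus.

    p1, p2: frequency of the same allele in population 1 and 2.
    Assumes equal population sizes (an illustrative simplification).
    """
    # Mean expected heterozygosity within the two populations.
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population.
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # locus is monomorphic overall; no differentiation defined
    return (h_t - h_s) / h_t


no_difference = fst_two_populations(0.5, 0.5)   # identical frequencies
fixed_difference = fst_two_populations(1.0, 0.0)  # alternative alleles fixed
intermediate = fst_two_populations(0.2, 0.8)
```

    Loci whose FST lies far in the upper tail of the genome-wide distribution are the "outliers" that such tests treat as candidates for divergent selection.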

  17. Comparison of tests for spatial heterogeneity on data with global clustering patterns and outliers

    PubMed Central

    Jackson, Monica C; Huang, Lan; Luo, Jun; Hachey, Mark; Feuer, Eric

    2009-01-01

    Background: The ability to evaluate geographic heterogeneity of cancer incidence and mortality is important in cancer surveillance. Many statistical methods for evaluating global clustering and local cluster patterns have been developed and examined in simulation studies. However, the performance of these methods on two extreme cases (global clustering evaluation and local anomaly (outlier) detection) has not been thoroughly investigated. Methods: We compare methods for global clustering evaluation including Tango's Index, Moran's I, and Oden's I*pop; and cluster detection methods such as local Moran's I and SaTScan elliptic version on simulated count data that mimic global clustering patterns and outliers for cancer cases in the continental United States. We examine the power and precision of the selected methods in the purely spatial analysis. We illustrate Tango's MEET and SaTScan elliptic version on 1987-2004 HIV and 1950-1969 lung cancer mortality data in the United States. Results: For simulated data with outlier patterns, Tango's MEET, Moran's I and I*pop had powers less than 0.2, and SaTScan had powers around 0.97. For simulated data with global clustering patterns, Tango's MEET and I*pop (with 50% of total population as the maximum search window) had powers close to 1. SaTScan had powers around 0.7-0.8 and Moran's I had powers around 0.2-0.3. In the real data example, Tango's MEET indicated the existence of global clustering patterns in both the HIV and lung cancer mortality data. SaTScan found a large cluster for HIV mortality rates, which is consistent with the finding from Tango's MEET. SaTScan also found clusters and outliers in the lung cancer mortality data. Conclusion: SaTScan elliptic version is more efficient for outlier detection compared with the other methods evaluated in this article. Tango's MEET and Oden's I*pop perform best in global clustering scenarios among the selected methods.
SaTScan should be used with caution on data with global clustering patterns, since it may reveal an incorrect spatial pattern even though it has enough power to reject a null hypothesis of homogeneous relative risk. Tango's method should be used for global clustering evaluation instead of SaTScan. PMID:19822013
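
    Of the global clustering statistics compared in this abstract, Moran's I is the simplest to write down. The sketch below uses the standard definition I = (n / W) * sum_ij w_ij (x_i - mean)(x_j - mean) / sum_i (x_i - mean)^2; the toy regions, the binary adjacency weights, and the function name are assumptions for illustration and are not taken from the study.

```python
def morans_i(values, weights):
    """Global Moran's I for values x_i and a spatial weight matrix w_ij."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    # Cross-products of deviations, weighted by spatial proximity.
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / w_total) * (num / den)


# Four regions on a line; neighbours share a border (binary weights).
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
clustered = morans_i([1, 1, 5, 5], chain)    # similar values are adjacent
alternating = morans_i([1, 5, 1, 5], chain)  # neighbours always differ
```

    Positive values indicate that similar rates tend to be adjacent (global clustering) and negative values indicate a checkerboard-like pattern. A single isolated outlier region moves the statistic only slightly, which is in line with the low power the abstract reports for Moran's I on outlier patterns.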

  18. Comparison of tests for spatial heterogeneity on data with global clustering patterns and outliers.

    PubMed

    Jackson, Monica C; Huang, Lan; Luo, Jun; Hachey, Mark; Feuer, Eric

    2009-10-12

    The ability to evaluate geographic heterogeneity of cancer incidence and mortality is important in cancer surveillance. Many statistical methods for evaluating global clustering and local cluster patterns have been developed and examined in simulation studies. However, the performance of these methods on two extreme cases (global clustering evaluation and local anomaly (outlier) detection) has not been thoroughly investigated. We compare methods for global clustering evaluation including Tango's Index, Moran's I, and Oden's I*(pop); and cluster detection methods such as local Moran's I and SaTScan elliptic version on simulated count data that mimic global clustering patterns and outliers for cancer cases in the continental United States. We examine the power and precision of the selected methods in the purely spatial analysis. We illustrate Tango's MEET and SaTScan elliptic version on 1987-2004 HIV and 1950-1969 lung cancer mortality data in the United States. For simulated data with outlier patterns, Tango's MEET, Moran's I and I*(pop) had powers less than 0.2, and SaTScan had powers around 0.97. For simulated data with global clustering patterns, Tango's MEET and I*(pop) (with 50% of total population as the maximum search window) had powers close to 1. SaTScan had powers around 0.7-0.8 and Moran's I had powers around 0.2-0.3. In the real data example, Tango's MEET indicated the existence of global clustering patterns in both the HIV and lung cancer mortality data. SaTScan found a large cluster for HIV mortality rates, which is consistent with the finding from Tango's MEET. SaTScan also found clusters and outliers in the lung cancer mortality data. SaTScan elliptic version is more efficient for outlier detection compared with the other methods evaluated in this article. Tango's MEET and Oden's I*(pop) perform best in global clustering scenarios among the selected methods.
SaTScan should be used with caution on data with global clustering patterns, since it may reveal an incorrect spatial pattern even though it has enough power to reject a null hypothesis of homogeneous relative risk. Tango's method should be used for global clustering evaluation instead of SaTScan.

  19. Correction of Dual-PRF Doppler Velocity Outliers in the Presence of Aliasing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Altube, Patricia; Bech, Joan; Argemí, Oriol

    In Doppler weather radars, the presence of unfolding errors or outliers is a well-known quality issue for radial velocity fields estimated using the dual–pulse repetition frequency (PRF) technique. Postprocessing methods have been developed to correct dual-PRF outliers, but these need prior application of a dealiasing algorithm for an adequate correction. Our paper presents an alternative procedure based on circular statistics that corrects dual-PRF errors in the presence of extended Nyquist aliasing. The correction potential of the proposed method is quantitatively tested by means of velocity field simulations and is exemplified in the application to real cases, including severe storm events. The comparison with two other existing correction methods indicates an improved performance in the correction of clustered outliers. The technique we propose is well suited for real-time applications requiring high-quality Doppler radar velocity fields, such as wind shear and mesocyclone detection algorithms, or assimilation in numerical weather prediction models.
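
    Why circular statistics help in the presence of aliasing can be shown with a small sketch. It is a generic circular mean, not the correction procedure of the paper: velocities are mapped to phase angles on the Nyquist interval, averaged as unit vectors, and mapped back. The function name and the example values are assumptions for illustration.

```python
import math


def circular_mean_velocity(velocities, v_nyquist):
    """Average radial velocities as angles on the interval [-v_nyquist, v_nyquist]."""
    phases = [math.pi * v / v_nyquist for v in velocities]
    s = sum(math.sin(p) for p in phases)
    c = sum(math.cos(p) for p in phases)
    # atan2 gives the direction of the summed unit vectors.
    return math.atan2(s, c) * v_nyquist / math.pi


# Two neighbouring gates near the Nyquist velocity; the second one is aliased.
neighbours = [9.0, -9.5]
arithmetic = sum(neighbours) / len(neighbours)
circular = circular_mean_velocity(neighbours, v_nyquist=10.0)
```

    The arithmetic mean of the two gates is close to zero, which is far from either measurement, whereas the circular mean stays near the Nyquist velocity. A local reference computed this way does not require the field to be dealiased first.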

  20. Robust w-Estimators for Cryo-EM Class Means

    PubMed Central

    Huang, Chenxi; Tagare, Hemant D.

    2016-01-01

    A critical step in cryogenic electron microscopy (cryo-EM) image analysis is to calculate the average of all images aligned to a projection direction. This average, called the “class mean”, improves the signal-to-noise ratio in single particle reconstruction (SPR). The averaging step is often compromised because of outlier images of ice, contaminants, and particle fragments. Outlier detection and rejection in the majority of current cryo-EM methods is done using cross-correlation with a manually determined threshold. Empirical assessment shows that the performance of these methods is very sensitive to the threshold. This paper proposes an alternative: a “w-estimator” of the average image, which is robust to outliers and which does not use a threshold. Various properties of the estimator, such as consistency and influence function are investigated. An extension of the estimator to images with different contrast transfer functions (CTFs) is also provided. Experiments with simulated and real cryo-EM images show that the proposed estimator performs quite well in the presence of outliers. PMID:26841397
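
    The general idea of a weighted, outlier-resistant average can be sketched as an iteratively reweighted mean. This is a generic W-estimator with Cauchy-type weights and is not the estimator derived in the paper; the weight function, the `scale` parameter, the iteration count, and the toy "images" (flat lists of pixel values) are all assumptions for illustration.

```python
def w_estimate_mean(images, scale=1.0, iterations=20):
    """Iteratively reweighted mean image.

    Each image gets a weight that shrinks smoothly with its squared
    distance to the current mean, so no hard rejection threshold is used.
    """
    n_pix = len(images[0])
    # Start from the ordinary pixel-wise mean.
    mean = [sum(img[k] for img in images) / len(images) for k in range(n_pix)]
    weights = [1.0] * len(images)
    for _ in range(iterations):
        weights = []
        for img in images:
            dist2 = sum((a - b) ** 2 for a, b in zip(img, mean))
            weights.append(1.0 / (1.0 + dist2 / scale ** 2))
        total = sum(weights)
        mean = [sum(w * img[k] for w, img in zip(weights, images)) / total
                for k in range(n_pix)]
    return mean, weights


# Four similar two-pixel "images" and one gross outlier.
stack = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.0], [50.0, 50.0]]
plain_mean = [sum(img[k] for img in stack) / len(stack) for k in range(2)]
robust_mean, final_weights = w_estimate_mean(stack)
```

    The plain mean is pulled far toward the outlier, while the reweighted mean stays close to the four consistent images because the outlier's weight shrinks toward zero over the iterations.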

  1. Robust w-Estimators for Cryo-EM Class Means.

    PubMed

    Huang, Chenxi; Tagare, Hemant D

    2016-02-01

    A critical step in cryogenic electron microscopy (cryo-EM) image analysis is to calculate the average of all images aligned to a projection direction. This average, called the class mean, improves the signal-to-noise ratio in single-particle reconstruction. The averaging step is often compromised because of the outlier images of ice, contaminants, and particle fragments. Outlier detection and rejection in the majority of current cryo-EM methods are done using cross-correlation with a manually determined threshold. Empirical assessment shows that the performance of these methods is very sensitive to the threshold. This paper proposes an alternative: a w-estimator of the average image, which is robust to outliers and which does not use a threshold. Various properties of the estimator, such as consistency and influence function are investigated. An extension of the estimator to images with different contrast transfer functions is also provided. Experiments with simulated and real cryo-EM images show that the proposed estimator performs quite well in the presence of outliers.

  2. Correction of Dual-PRF Doppler Velocity Outliers in the Presence of Aliasing

    DOE PAGES

    Altube, Patricia; Bech, Joan; Argemí, Oriol; ...

    2017-07-18

    In Doppler weather radars, the presence of unfolding errors or outliers is a well-known quality issue for radial velocity fields estimated using the dual–pulse repetition frequency (PRF) technique. Postprocessing methods have been developed to correct dual-PRF outliers, but these need prior application of a dealiasing algorithm for an adequate correction. Our paper presents an alternative procedure based on circular statistics that corrects dual-PRF errors in the presence of extended Nyquist aliasing. The correction potential of the proposed method is quantitatively tested by means of velocity field simulations and is exemplified in the application to real cases, including severe storm events. The comparison with two other existing correction methods indicates an improved performance in the correction of clustered outliers. The technique we propose is well suited for real-time applications requiring high-quality Doppler radar velocity fields, such as wind shear and mesocyclone detection algorithms, or assimilation in numerical weather prediction models.

  3. Adaptive distributed outlier detection for WSNs.

    PubMed

    De Paola, Alessandra; Gaglio, Salvatore; Lo Re, Giuseppe; Milazzo, Fabrizio; Ortolani, Marco

    2015-05-01

    The paradigm of pervasive computing is gaining more and more attention nowadays, thanks to the possibility of obtaining precise and continuous monitoring. Ease of deployment and adaptivity are typically implemented by adopting autonomous and cooperative sensory devices; however, for such systems to be of any practical use, reliability and fault tolerance must be guaranteed, for instance by detecting corrupted readings amidst the huge amount of gathered sensory data. This paper proposes an adaptive distributed Bayesian approach for detecting outliers in data collected by a wireless sensor network; our algorithm aims at optimizing classification accuracy, time complexity and communication complexity, and also considering externally imposed constraints on such conflicting goals. The performed experimental evaluation showed that our approach is able to improve the considered metrics for latency and energy consumption, with limited impact on classification accuracy.

  4. A coupled classification - evolutionary optimization model for contamination event detection in water distribution systems.

    PubMed

    Oliker, Nurit; Ostfeld, Avi

    2014-03-15

    This study describes a decision support system that issues alerts for contamination events in water distribution systems. The developed model comprises a weighted support vector machine (SVM) for the detection of outliers, followed by a sequence analysis for the classification of contamination events. The contribution of this study is an improved ability to detect contamination events and a multi-dimensional analysis of the data, differing from the parallel one-dimensional analysis conducted so far. The multivariate analysis examines the relationships between water quality parameters and detects changes in their mutual patterns. The weights of the SVM model accomplish two goals: blurring the difference between the sizes of the two classes' data sets (as there are many more normal/regular than event-time measurements), and incorporating the time factor through a time decay coefficient, ascribing higher importance to recent observations when classifying a time step measurement. All model parameters were determined by data-driven optimization, so the calibration of the model was completely autonomous. The model was trained and tested on a real water distribution system (WDS) data set with randomly simulated events superimposed on the original measurements. The model is notable for its ability to detect events that were only partly expressed in the data (i.e., affecting only some of the measured parameters). The model showed high accuracy and better detection ability as compared to previous modeling attempts of contamination event detection. Copyright © 2013 Elsevier Ltd. All rights reserved.
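
    The two roles of the SVM weights described in this abstract (class balancing and time decay) can be sketched as a per-sample weight. The exact weighting formula of the study is not given in the abstract, so the product form, the `decay` value, and the function name below are assumptions for illustration.

```python
def sample_weights(labels, decay=0.9):
    """Hypothetical per-sample weights: class balance times time decay.

    labels: class label per time step, oldest first.
    decay:  factor in (0, 1]; the newest sample has age 0 and factor 1.
    """
    n = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    weights = []
    for t, y in enumerate(labels):
        # Rare classes get larger weights so both classes contribute equally.
        class_w = n / (len(counts) * counts[y])
        age = (n - 1) - t
        weights.append(class_w * decay ** age)
    return weights


# Three normal measurements followed by one event-time measurement.
w = sample_weights([0, 0, 0, 1], decay=0.5)
```

    Weights of this kind could then be supplied to an SVM trainer that accepts per-sample weights, for example through the `sample_weight` argument of scikit-learn's `SVC.fit`.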

  5. Low income, community poverty and risk of end stage renal disease.

    PubMed

    Crews, Deidra C; Gutiérrez, Orlando M; Fedewa, Stacey A; Luthi, Jean-Christophe; Shoham, David; Judd, Suzanne E; Powe, Neil R; McClellan, William M

    2014-12-04

    The risk of end stage renal disease (ESRD) is increased among individuals with low income and in low income communities. However, few studies have examined the relation of both individual and community socioeconomic status (SES) with incident ESRD. Among 23,314 U.S. adults in the population-based Reasons for Geographic and Racial Differences in Stroke study, we assessed participant differences across geospatially-linked categories of county poverty [outlier poverty, extremely high poverty, very high poverty, high poverty, neither (reference), high affluence and outlier affluence]. Multivariable Cox proportional hazards models were used to examine associations of annual household income and geospatially-linked county poverty measures with incident ESRD, while accounting for death as a competing event using the Fine and Gray method. There were 158 ESRD cases during follow-up. Incident ESRD rates were 178.8 per 100,000 person-years (10^5 py) in high poverty outlier counties and were 76.3/10^5 py in affluent outlier counties, p trend = 0.06. In unadjusted competing risk models, persons residing in high poverty outlier counties had higher incidence of ESRD (which was not statistically significant) when compared to those persons residing in counties with neither high poverty nor affluence [hazard ratio (HR) 1.54, 95% Confidence Interval (CI) 0.75-3.20]. This association was markedly attenuated following adjustment for socio-demographic factors (age, sex, race, education, and income); HR 0.96, 95% CI 0.46-2.00. However, in the same adjusted model, income was independently associated with risk of ESRD [HR 3.75, 95% CI 1.62-8.64, comparing the <$20,000 income group to the >$75,000 group]. There were no statistically significant associations of county measures of poverty with incident ESRD, and no evidence of effect modification. In contrast to annual family income, geospatially-linked measures of county poverty have little relation with risk of ESRD.
Efforts to mitigate socioeconomic disparities in kidney disease may be best appropriated at the individual level.
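
    The hazard ratios and confidence intervals quoted in this abstract can be sanity-checked with a short calculation. The sketch assumes Wald-type 95% intervals that are symmetric on the log scale, which is common for Cox-type models but is not stated in the abstract; the function name is hypothetical.

```python
import math


def log_scale_summary(lower, upper, z=1.96):
    """Geometric midpoint and log-scale standard error implied by a CI."""
    midpoint = math.sqrt(lower * upper)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return midpoint, se


# Income effect reported in the abstract: HR 3.75, 95% CI 1.62-8.64.
mid_income, se_income = log_scale_summary(1.62, 8.64)
z_income = math.log(3.75) / se_income

# Unadjusted county poverty effect: HR 1.54, 95% CI 0.75-3.20.
mid_county, se_county = log_scale_summary(0.75, 3.20)
z_county = math.log(1.54) / se_county
```

    Under that assumption the geometric midpoints reproduce the reported hazard ratios to within rounding. The implied z statistic exceeds 1.96 for income but not for county poverty, matching the significance statements in the abstract.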

  6. The relationships between phenolic content, pollen diversity, physicochemical information and radical scavenging activity in honey.

    PubMed

    Giorgi, Annamaria; Madeo, Moira; Baumgartner, Johann; Lozzia, Giuseppe Carlo

    2011-01-07

    Honey is rich in different secondary plant metabolites acting as natural antioxidants and contributing to human health. Radical scavenging activity (RSA) is related to antioxidant activity, while the correlation between the phenolic content and RSA is often weak. Consequently, information on phenolics alone is often insufficient to qualify the RSA and the health-promoting effects of honey. The paper presents a case study of honey samples originating from the alpine areas of Italy's Lombardia and Veneto regions, carried out using standard physicochemical and statistical analytical methods. In pure honey, the total phenolic content and the RSA were measured in spectrophotometric tests with the Folin-Ciocalteu and 2,2-diphenyl-1-picrylhydrazyl (DPPH·) free radical assays, respectively. Melissopalynological data were used to qualify pollen diversity through rank-frequency curves separating the samples into two groups. On the basis of physicochemical data, the samples were analyzed through multivariate classification and ranking procedures, resulting in the identification of an outlier. Elimination of the outlier produced a high correlation between the total phenolic content and RSA in the two pollen diversity groups. The case study suggests that, after disregarding outliers, the RSA can be satisfactorily qualified on the basis of phenolics with pollen diversity as a covariate.

  7. Robust Surface Reconstruction via Laplace-Beltrami Eigen-Projection and Boundary Deformation

    PubMed Central

    Shi, Yonggang; Lai, Rongjie; Morra, Jonathan H.; Dinov, Ivo; Thompson, Paul M.; Toga, Arthur W.

    2010-01-01

    In medical shape analysis, a critical problem is reconstructing a smooth surface of correct topology from a binary mask that typically has spurious features due to segmentation artifacts. The challenge is the robust removal of these outliers without affecting the accuracy of other parts of the boundary. In this paper, we propose a novel approach for this problem based on the Laplace-Beltrami (LB) eigen-projection and properly designed boundary deformations. Using the metric distortion during the LB eigen-projection, our method automatically detects the location of outliers and feeds this information to a well-composed and topology-preserving deformation. By iterating between these two steps of outlier detection and boundary deformation, we can robustly filter out the outliers without moving the smooth part of the boundary. The final surface is the eigen-projection of the filtered mask boundary that has the correct topology, desired accuracy and smoothness. In our experiments, we illustrate the robustness of our method on different input masks of the same structure, and compare with the popular SPHARM tool and the topology preserving level set method to show that our method can reconstruct accurate surface representations without introducing artificial oscillations. We also successfully validate our method on a large data set of more than 900 hippocampal masks and demonstrate that the reconstructed surfaces retain volume information accurately. PMID:20624704

  8. How immunogenetically different are domestic pigs from wild boars: a perspective from single-nucleotide polymorphisms of 19 immunity-related candidate genes.

    PubMed

    Chen, Shanyuan; Gomes, Rui; Costa, Vânia; Santos, Pedro; Charneca, Rui; Zhang, Ya-ping; Liu, Xue-hong; Wang, Shao-qing; Bento, Pedro; Nunes, Jose-Luis; Buzgó, József; Varga, Gyula; Anton, István; Zsolnai, Attila; Beja-Pereira, Albano

    2013-10-01

    The coexistence of wild boars and domestic pigs across Eurasia makes it feasible to conduct comparative genetic or genomic analyses for addressing how genetically different a domestic species is from its wild ancestor. To test whether there are differences in patterns of genetic variability between wild and domestic pigs at immunity-related genes and to detect outlier loci putatively under selection that may underlie differences in immune responses, here we analyzed 54 single-nucleotide polymorphisms (SNPs) of 19 immunity-related candidate genes on 11 autosomes in three pairs of wild boar and domestic pig populations from China, the Iberian Peninsula, and Hungary. Our results showed no statistically significant differences in allele frequency and heterozygosity across SNPs between the three pairs of wild and domestic populations. This observation was most likely due to the widespread and long-lasting gene flow between wild boars and domestic pigs across Eurasia. In addition, we detected eight coding SNPs from six genes as outliers under selection, consistently across three outlier tests (BayeScan2.1, FDIST2, and Arlequin3.5). Among four non-synonymous outlier SNPs, one from the TLR4 gene was identified as being subject to positive (diversifying) selection and three, one each from the CD36, IFNW1, and IL1B genes, were suggested to be under balancing selection. All four of these non-synonymous variants were predicted to be benign by PolyPhen-2. Our results were supported by other independent lines of evidence for positive selection or balancing selection acting on these four immune genes (CD36, IFNW1, IL1B, and TLR4). Our study provides an example of applying a candidate gene approach to identify functionally important mutations (i.e., outlier loci) in wild and domestic pigs for subsequent functional experiments.

  9. An Unsupervised Anomalous Event Detection and Interactive Analysis Framework for Large-scale Satellite Data

    NASA Astrophysics Data System (ADS)

    Liu, Q.; Lv, Q.; Klucik, R.; Chen, C.; Gallaher, D. W.; Grant, G.; Shang, L.

    2016-12-01

    Due to the high volume and complexity of satellite data, computer-aided tools for fast quality assessments and scientific discovery are indispensable for scientists in the era of Big Data. In this work, we have developed a framework for automated anomalous event detection in massive satellite data. The framework consists of a clustering-based anomaly detection algorithm and a cloud-based tool for interactive analysis of detected anomalies. The algorithm is unsupervised and requires no prior knowledge of the data (e.g., expected normal pattern or known anomalies). As such, it works for diverse data sets, and performs well even in the presence of missing and noisy data. The cloud-based tool provides an intuitive mapping interface that allows users to interactively analyze anomalies using multiple features. As a whole, our framework can (1) identify outliers in a spatio-temporal context, (2) recognize and distinguish meaningful anomalous events from individual outliers, (3) rank those events based on "interestingness" (e.g., rareness or total number of outliers) defined by users, and (4) enable interactive querying, exploration, and analysis of those anomalous events. In this presentation, we will demonstrate the effectiveness and efficiency of our framework in the application of detecting data quality issues and unusual natural events using two satellite datasets. The techniques and tools developed in this project are applicable to a diverse set of satellite data and will be made publicly available for scientists in early 2017.

  10. Multivariate analysis of progressive thermal desorption coupled gas chromatography-mass spectrometry.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Van Benthem, Mark Hilary; Mowry, Curtis Dale; Kotula, Paul Gabriel

    Thermal decomposition of polydimethylsiloxane (PDMS) compounds, Sylgard® 184 and 186, was examined using thermal desorption coupled gas chromatography-mass spectrometry (TD/GC-MS) and multivariate analysis. This work describes a method of producing multiway data using a stepped thermal desorption. The technique involves sequentially heating a sample of the material of interest with subsequent analysis in a commercial GC/MS system. The decomposition chromatograms were analyzed using multivariate analysis tools including principal component analysis (PCA), factor rotation employing the varimax criterion, and multivariate curve resolution. The results of the analysis show seven components related to offgassing of various fractions of siloxanes that vary as a function of temperature. Thermal desorption coupled with gas chromatography-mass spectrometry (TD/GC-MS) is a powerful analytical technique for analyzing chemical mixtures. It has great potential in numerous analytic areas including materials analysis, sports medicine, the detection of designer drugs, and biological research for metabolomics. Data analysis is complicated, far from automated and can result in high false positive or false negative rates. We have demonstrated a step-wise TD/GC-MS technique that removes more volatile compounds from a sample before extracting the less volatile compounds. This creates an additional dimension of separation before the GC column, while simultaneously generating three-way data. Sandia's proven multivariate analysis methods, when applied to these data, have several advantages over current commercial options. It has also demonstrated potential for success in finding and enabling identification of trace compounds.
Several challenges remain, however, including understanding the sources of noise in the data, outlier detection, improving the data pretreatment and analysis methods, developing a software tool for ease of use by the chemist, and demonstrating our belief that this multivariate analysis will enable superior differentiation capabilities. In addition, noise and system artifacts challenge the analysis of GC-MS data collected on lower cost equipment, ubiquitous in commercial laboratories. This research has the potential to affect many areas of analytical chemistry including materials analysis, medical testing, and environmental surveillance. It could also provide a method to measure adsorption parameters for chemical interactions on various surfaces by measuring desorption as a function of temperature for mixtures. We have presented results of a novel method for examining offgas products of a common PDMS material. Our method involves utilizing a stepped TD/GC-MS data acquisition scheme that may be almost totally automated, coupled with multivariate analysis schemes. This method of data generation and analysis can be applied to a number of materials aging and thermal degradation studies.

  11. Risk adjustment in the American College of Surgeons National Surgical Quality Improvement Program: a comparison of logistic versus hierarchical modeling.

    PubMed

    Cohen, Mark E; Dimick, Justin B; Bilimoria, Karl Y; Ko, Clifford Y; Richards, Karen; Hall, Bruce Lee

    2009-12-01

    Although logistic regression has commonly been used to adjust for risk differences in patient and case mix to permit quality comparisons across hospitals, hierarchical modeling has been advocated as the preferred methodology, because it accounts for clustering of patients within hospitals. It is unclear whether hierarchical models would yield important differences in quality assessments compared with logistic models when applied to American College of Surgeons (ACS) National Surgical Quality Improvement Program (NSQIP) data. Our objective was to evaluate differences in logistic versus hierarchical modeling for identifying hospitals with outlying outcomes in the ACS-NSQIP. Data from ACS-NSQIP patients who underwent colorectal operations in 2008 at hospitals that reported at least 100 operations were used to generate logistic and hierarchical prediction models for 30-day morbidity and mortality. Differences in risk-adjusted performance (ratio of observed-to-expected events) and outlier detections from the two models were compared. Logistic and hierarchical models identified the same 25 hospitals as morbidity outliers (14 low and 11 high outliers), but the hierarchical model identified 2 additional high outliers. Both models identified the same eight hospitals as mortality outliers (five low and three high outliers). The values of observed-to-expected events ratios and p values from the two models were highly correlated. Results were similar when data were permitted from hospitals providing < 100 patients. When applied to ACS-NSQIP data, logistic and hierarchical models provided nearly identical results with respect to identification of hospitals' observed-to-expected events ratio outliers. As hierarchical models are prone to implementation problems, logistic regression will remain an accurate and efficient method for performing risk adjustment of hospital quality comparisons.

  12. Use of Mahalanobis Distance for Detecting Outliers and Outlier Clusters in Markedly Non-Normal Data: A Vehicular Traffic Example

    DTIC Science & Technology

    2011-06-01

    usually walking on the right of on-coming people, and cars discouraged from passing on the right of a car traveling in the same direction. “Usually...forces a loss of detail due to horizontal compression: Valleys or troughs are squeezed into oblivion . To enable valleys to be seen, Figures 20 and 21...Volume. Left Panel: North- bound Traffic. Right Panel: Southbound Traffic. Northbound and Southbound Volume Ranges are Different 5.5 Fractional

  13. Genomic Changes Associated with Reproductive and Migratory Ecotypes in Sockeye Salmon (Oncorhynchus nerka)

    PubMed Central

    Veale, Andrew J.

    2017-01-01

    Mechanisms underlying adaptive evolution can best be explored using paired populations displaying similar phenotypic divergence, illuminating the genomic changes associated with specific life history traits. Here, we used paired migratory [anadromous vs. resident (kokanee)] and reproductive [shore- vs. stream-spawning] ecotypes of sockeye salmon (Oncorhynchus nerka) sampled from seven lakes and two rivers spanning three catchments (Columbia, Fraser, and Skeena) in British Columbia, Canada to investigate the patterns and processes underlying their divergence. Restriction-site associated DNA sequencing was used to genotype this sampling at 7,347 single nucleotide polymorphisms, 334 of which were identified as outlier loci and candidates for divergent selection within at least one ecotype comparison. Sixty-eight of these outliers were present in two or more comparisons, with 33 detected across multiple catchments. Of particular note, one locus was detected as the most significant outlier between shore and stream-spawning ecotypes in multiple comparisons and across catchments (Columbia, Fraser, and Snake). We also detected several genomic islands of divergence, some shared among comparisons, potentially showing linked signals of differential selection. The single nucleotide polymorphisms and genomic regions identified in our study offer a range of mechanistic hypotheses associated with the genetic basis of O. nerka life history variation and provide novel tools for informing fisheries management. PMID:29045601

  14. New Quality Control Algorithm Based on GNSS Sensing Data for a Bridge Health Monitoring System

    PubMed Central

    Lee, Jae Kang; Lee, Jae One; Kim, Jung Ok

    2016-01-01

    This research introduces an improvement plan for the reliability of Global Navigation Satellite System (GNSS) positioning solutions. It should be considered the most suitable methodology in terms of the adjustment and positioning of GNSS in order to maximize the utilization of GNSS applications. Though various studies have been conducted with regard to Bridge Health Monitoring Systems (BHMS) based on GNSS, the outliers which depend on the signal reception environment could not be considered until now. Since these outliers may be connected to GNSS data collected from major bridge members, which can reduce the reliability of a whole monitoring system through the delivery of false information, they should be detected and eliminated in the previous adjustment stage. In this investigation, the Detection, Identification, Adaptation (DIA) technique was applied and implemented through an algorithm. Moreover, it can be directly applied to GNSS data collected from long span cable stayed bridges, and most of the outliers were efficiently detected and eliminated simultaneously. By these effects, the reliability of GNSS should be enormously improved. Improvement in GNSS positioning accuracy is directly linked to the safety of the bridges themselves, and at the same time, the reliability of monitoring systems in terms of the system operation can also be increased. PMID:27240375

  15. Robust High-dimensional Bioinformatics Data Streams Mining by ODR-ioVFDT

    PubMed Central

    Wang, Dantong; Fong, Simon; Wong, Raymond K.; Mohammed, Sabah; Fiaidhi, Jinan; Wong, Kelvin K. L.

    2017-01-01

    Outlier detection in bioinformatics data stream mining has received significant attention from research communities in recent years. The problems of how to distinguish noise from an exception, and of deciding whether to discard it or to devise an extra decision path for accommodating it, pose a dilemma. In this paper, we propose a novel algorithm called ODR with incrementally Optimized Very Fast Decision Tree (ODR-ioVFDT) for taking care of outliers in the course of continuous data learning. By using an adaptive interquartile-range based identification method, a tolerance threshold is set. It is then used to judge whether a data point of exceptional value should be included for training or not. This is different from the traditional outlier detection/removal approaches, which are two separate steps in processing the data. The proposed algorithm is tested using datasets of five bioinformatics scenarios, comparing the performance of our model with that of other models without ODR. The results show that ODR-ioVFDT has better performance in classification accuracy, kappa statistics, and time consumption. ODR-ioVFDT applied to bioinformatics streaming data processing for detecting and quantifying the information of life phenomena, states, characters, variables and components of the organism can help to diagnose and treat disease more effectively. PMID:28230161
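
    The interquartile-range tolerance idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the ODR-ioVFDT algorithm itself: the helper name `iqr_gate`, the sliding window, the warm-up length, and the multiplier k = 1.5 are all assumptions chosen for illustration, and only the general idea (an IQR-based fence deciding whether a streamed value is passed on for training) comes from the abstract.

```python
from collections import deque
from statistics import quantiles


def iqr_gate(stream, window=50, k=1.5, warmup=8):
    """Yield (value, accepted) pairs using a sliding-window IQR fence.

    Hypothetical helper: window, k and warmup are illustrative choices,
    not parameters taken from the ODR-ioVFDT paper.
    """
    recent = deque(maxlen=window)
    for x in stream:
        if len(recent) < warmup:
            accepted = True  # too little history to judge
        else:
            q1, _, q3 = quantiles(recent, n=4)
            spread = q3 - q1
            accepted = (q1 - k * spread) <= x <= (q3 + k * spread)
        if accepted:
            recent.append(x)  # only accepted values update the fence
        yield x, accepted


if __name__ == "__main__":
    readings = [10, 11, 9, 10, 12, 10, 11, 9, 10, 100, 11]
    for value, ok in iqr_gate(readings):
        print(value, "train" if ok else "hold back")
```

    Because rejected values never enter the window, a single extreme reading does not widen the fence for later readings; whether that is desirable depends on how quickly the underlying stream is expected to drift.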

  16. New Quality Control Algorithm Based on GNSS Sensing Data for a Bridge Health Monitoring System.

    PubMed

    Lee, Jae Kang; Lee, Jae One; Kim, Jung Ok

    2016-05-27

    This research introduces an improvement plan for the reliability of Global Navigation Satellite System (GNSS) positioning solutions. It should be considered the most suitable methodology in terms of the adjustment and positioning of GNSS in order to maximize the utilization of GNSS applications. Though various studies have been conducted with regard to Bridge Health Monitoring Systems (BHMS) based on GNSS, the outliers which depend on the signal reception environment could not be considered until now. Since these outliers may be connected to GNSS data collected from major bridge members, which can reduce the reliability of a whole monitoring system through the delivery of false information, they should be detected and eliminated in the previous adjustment stage. In this investigation, the Detection, Identification, Adaptation (DIA) technique was applied and implemented through an algorithm. Moreover, it can be directly applied to GNSS data collected from long span cable stayed bridges, and most of the outliers were efficiently detected and eliminated simultaneously. By these effects, the reliability of GNSS should be enormously improved. Improvement in GNSS positioning accuracy is directly linked to the safety of the bridges themselves, and at the same time, the reliability of monitoring systems in terms of the system operation can also be increased.

  17. Designing a risk-based surveillance program for Mycobacterium avium ssp. paratuberculosis in Norwegian dairy herds using multivariate statistical process control analysis.

    PubMed

    Whist, A C; Liland, K H; Jonsson, M E; Sæbø, S; Sviland, S; Østerås, O; Norström, M; Hopp, P

    2014-11-01

    Surveillance programs for animal diseases are critical to early disease detection and risk estimation and to documenting a population's disease status at a given time. The aim of this study was to describe a risk-based surveillance program for detecting Mycobacterium avium ssp. paratuberculosis (MAP) infection in Norwegian dairy cattle. The included risk factors for detecting MAP were purchase of cattle, combined cattle and goat farming, and location of the cattle farm in counties containing goats with MAP. The risk indicators included production data [culling of animals >3 yr of age, carcass conformation of animals >3 yr of age, milk production decrease in older lactating cows (lactations 3, 4, and 5)], and clinical data (diarrhea, enteritis, or both, in animals >3 yr of age). Except for combined cattle and goat farming and cattle farm location, all data were collected at the cow level and summarized at the herd level. Predefined risk factors and risk indicators were extracted from different national databases and combined in a multivariate statistical process control to obtain a risk assessment for each herd. The ordinary Hotelling's T(2) statistic was applied as a multivariate, standardized measure of difference between the current observed state and the average state of the risk factors for a given herd. To make the analysis more robust and adapt it to the slowly developing nature of MAP, monthly risk calculations were based on data accumulated during a 24-mo period. Monitoring of these variables was performed to identify outliers that may indicate deviance in one or more of the underlying processes. The highest-ranked herds were scattered all over Norway and clustered in high-density dairy cattle farm areas. 
The resulting rankings of herds are being used in the national surveillance program for MAP in 2014 to increase the sensitivity of the ongoing surveillance program in which 5 fecal samples for bacteriological examination are collected from 25 dairy herds. The use of multivariate statistical process control for selection of herds will be beneficial when a diagnostic test suitable for mass screening is available and validated on the Norwegian cattle population, thus making it possible to increase the number of sampled herds. Copyright © 2014 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
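
    The Hotelling's T² statistic described in this abstract, a standardized distance between a current observation and an average state, can be sketched as follows. This is a minimal illustration of the single-observation form d' S⁻¹ d under stated assumptions, not the surveillance program's actual implementation: the function names and the toy reference data are hypothetical, and the covariance is the ordinary (non-robust) sample estimate.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length observation rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows, mu):
    """Sample covariance matrix (divisor n - 1) of the rows around mu."""
    n, p = len(rows), len(mu)
    centered = [[x - m for x, m in zip(row, mu)] for row in rows]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def hotelling_t2(observation, reference_rows):
    """T^2 = d' S^-1 d, with d the deviation from the reference mean."""
    mu = mean_vector(reference_rows)
    s = covariance(reference_rows, mu)
    d = [x - m for x, m in zip(observation, mu)]
    return sum(di * wi for di, wi in zip(d, solve(s, d)))


if __name__ == "__main__":
    # Hypothetical reference data: four observations of two indicators.
    reference = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]
    print(hotelling_t2([3.0, 1.0], reference))
```

    Units could then be ranked by their T² value, with the largest values flagged for follow-up; the threshold that separates ordinary variation from an outlier is a separate modeling choice that this sketch does not address.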

  18. Visual saliency detection based on modeling the spatial Gaussianity

    NASA Astrophysics Data System (ADS)

    Ju, Hongbin

    2015-04-01

    In this paper, a novel salient object detection method based on modeling spatial anomalies is presented. The proposed framework is inspired by the biological mechanism whereby human eyes are sensitive to unusual and anomalous objects against a complex background. It is supposed that a natural image can be seen as a combination of some similar or dissimilar basic patches, and that there is a direct relationship between its saliency and anomaly. Some patches share a high degree of similarity and occur in vast numbers. They usually make up the background of an image. On the other hand, some patches present strong rarity and specificity. We name these patches "anomalies". Generally, an anomalous patch is a reflection of an edge or of some special colors and textures in an image, and these patterns cannot be well "explained" by their surroundings. Human eyes show great interest in these anomalous patterns, and will automatically pick out the anomalous parts of an image as the salient regions. To better evaluate the anomaly degree of the basic patches and exploit their nonlinear statistical characteristics, a multivariate Gaussian distribution saliency evaluation model is proposed. In this way, objects with anomalous patterns usually appear as outliers in the Gaussian distribution, and we identify these anomalous objects as salient ones. Experiments are conducted on the well-known MSRA saliency detection dataset. Compared with other recently developed visual saliency detection methods, our method shows significant advantages.

  19. LOSITAN: a workbench to detect molecular adaptation based on a Fst-outlier method.

    PubMed

    Antao, Tiago; Lopes, Ana; Lopes, Ricardo J; Beja-Pereira, Albano; Luikart, Gordon

    2008-07-28

    Testing for selection is becoming one of the most important steps in the analysis of multilocus population genetics data sets. Existing applications are difficult to use, leaving many non-trivial, error-prone tasks to the user. Here we present LOSITAN, a selection detection workbench based on a well evaluated Fst-outlier detection method. LOSITAN greatly facilitates correct approximation of model parameters (e.g., genome-wide average, neutral Fst), and provides data import and export functions, iterative contour smoothing and generation of graphics in an easy-to-use graphical user interface. LOSITAN is able to use modern multi-core processor architectures by locally parallelizing fdist, reducing computation time by half in current dual core machines and with almost linear performance gains in machines with more cores. LOSITAN makes selection detection feasible to a much wider range of users, even for large population genomic datasets, by providing both an easy-to-use interface and essential functionality to complete the whole selection detection process.

  20. Evaluation of two outlier-detection-based methods for detecting tissue-selective genes from microarray data.

    PubMed

    Kadota, Koji; Konishi, Tomokazu; Shimizu, Kentaro

    2007-05-01

    Large-scale expression profiling using DNA microarrays enables identification of tissue-selective genes for which expression is considerably higher and/or lower in some tissues than in others. Among numerous possible methods, only two outlier-detection-based methods (an AIC-based method and Sprent's non-parametric method) can treat equally various types of selective patterns, but they produce substantially different results. We investigated the performance of these two methods for different parameter settings and for a reduced number of samples. We focused on their ability to detect selective expression patterns robustly. We applied them to public microarray data collected from 36 normal human tissue samples and analyzed the effects of both changing the parameter settings and reducing the number of samples. The AIC-based method was more robust in both cases. The findings confirm that the use of the AIC-based method in the recently proposed ROKU method for detecting tissue-selective expression patterns is correct and that Sprent's method is not suitable for ROKU.

  1. A genome scan for selection signatures comparing farmed Atlantic salmon with two wild populations: Testing colocalization among outlier markers, candidate genes, and quantitative trait loci for production traits.

    PubMed

    Liu, Lei; Ang, Keng Pee; Elliott, J A K; Kent, Matthew Peter; Lien, Sigbjørn; MacDonald, Danielle; Boulding, Elizabeth Grace

    2017-03-01

    Comparative genome scans can be used to identify chromosome regions, but not traits, that are putatively under selection. Identification of targeted traits may be more likely in recently domesticated populations under strong artificial selection for increased production. We used a North American Atlantic salmon 6K SNP dataset to locate genome regions of an aquaculture strain (Saint John River) that were highly diverged from that of its putative wild founder population (Tobique River). First, admixed individuals with partial European ancestry were detected using STRUCTURE and removed from the dataset. Outlier loci were then identified as those showing extreme differentiation between the aquaculture population and the founder population. All Arlequin methods identified an overlapping subset of 17 outlier loci, three of which were also identified by BayeScan. Many outlier loci were near candidate genes and some were near published quantitative trait loci (QTLs) for growth, appetite, maturity, or disease resistance. Parallel comparisons using a wild, nonfounder population (Stewiacke River) yielded only one overlapping outlier locus as well as a known maturity QTL. We conclude that genome scans comparing a recently domesticated strain with its wild founder population can facilitate identification of candidate genes for traits known to have been under strong artificial selection.

  2. Fast clustering using adaptive density peak detection.

    PubMed

    Wang, Xiao-Feng; Xu, Yifan

    2017-12-01

    Common limitations of clustering methods include the slow algorithm convergence, the instability of the pre-specification on a number of intrinsic parameters, and the lack of robustness to outliers. A recent clustering approach proposed a fast search algorithm of cluster centers based on their local densities. However, the selection of the key intrinsic parameters in the algorithm was not systematically investigated. It is relatively difficult to estimate the "optimal" parameters since the original definition of the local density in the algorithm is based on a truncated counting measure. In this paper, we propose a clustering procedure with adaptive density peak detection, where the local density is estimated through the nonparametric multivariate kernel estimation. The model parameter is then able to be calculated from the equations with statistical theoretical justification. We also develop an automatic cluster centroid selection method through maximizing an average silhouette index. The advantage and flexibility of the proposed method are demonstrated through simulation studies and the analysis of a few benchmark gene expression data sets. The method only needs to perform in one single step without any iteration and thus is fast and has a great potential to apply on big data analysis. A user-friendly R package ADPclust is developed for public use.

  3. User Behavior Analytics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Turcotte, Melissa; Moore, Juston Shane

    User Behaviour Analytics is the tracking, collecting and assessing of user data and activities. The goal is to detect misuse of user credentials by developing models for the normal behaviour of user credentials within a computer network and detect outliers with respect to their baseline.

  4. "Contrasting patterns of selection at Pinus pinaster Ait. Drought stress candidate genes as revealed by genetic differentiation analyses".

    PubMed

    Eveno, Emmanuelle; Collada, Carmen; Guevara, M Angeles; Léger, Valérie; Soto, Alvaro; Díaz, Luis; Léger, Patrick; González-Martínez, Santiago C; Cervera, M Teresa; Plomion, Christophe; Garnier-Géré, Pauline H

    2008-02-01

    The importance of natural selection for shaping adaptive trait differentiation among natural populations of allogamous tree species has long been recognized. Determining the molecular basis of local adaptation remains largely unresolved, and the respective roles of selection and demography in shaping population structure are actively debated. Using a multilocus scan that aims to detect outliers from simulated neutral expectations, we analyzed patterns of nucleotide diversity and genetic differentiation at 11 polymorphic candidate genes for drought stress tolerance in phenotypically contrasted Pinus pinaster Ait. populations across its geographical range. We compared 3 coalescent-based methods: 2 frequentist-like, including 1 approach specifically developed for biallelic single nucleotide polymorphisms (SNPs) here and 1 Bayesian. Five genes showed outlier patterns that were robust across methods at the haplotype level for 2 of them. Two genes presented higher F(ST) values than expected (PR-AGP4 and erd3), suggesting that they could have been affected by the action of diversifying selection among populations. In contrast, 3 genes presented lower F(ST) values than expected (dhn-1, dhn2, and lp3-1), which could represent signatures of homogenizing selection among populations. A smaller proportion of outliers were detected at the SNP level suggesting the potential functional significance of particular combinations of sites in drought-response candidate genes. The Bayesian method appeared robust to low sample sizes, flexible to assumptions regarding migration rates, and powerful for detecting selection at the haplotype level, but the frequentist-like method adapted to SNPs was more efficient for the identification of outlier SNPs showing low differentiation. Population-specific effects estimated in the Bayesian method also revealed populations with lower immigration rates, which could have led to favorable situations for local adaptation. 
Outlier patterns are discussed in relation to the different genes' putative involvement in drought tolerance responses, from published results in transcriptomics and association mapping in P. pinaster and other related species. These genes clearly constitute relevant candidates for future association studies in P. pinaster.

  5. Automated artifact detection and removal for improved tensor estimation in motion-corrupted DTI data sets using the combination of local binary patterns and 2D partial least squares.

    PubMed

    Zhou, Zhenyu; Liu, Wei; Cui, Jiali; Wang, Xunheng; Arias, Diana; Wen, Ying; Bansal, Ravi; Hao, Xuejun; Wang, Zhishun; Peterson, Bradley S; Xu, Dongrong

    2011-02-01

    Signal variation in diffusion-weighted images (DWIs) is influenced both by thermal noise and by spatially and temporally varying artifacts, such as rigid-body motion and cardiac pulsation. Motion artifacts are particularly prevalent when scanning difficult patient populations, such as human infants. Although some motion during data acquisition can be corrected using image coregistration procedures, frequently individual DWIs are corrupted beyond repair by sudden, large amplitude motion either within or outside of the imaging plane. We propose a novel approach to identify and reject outlier images automatically using local binary patterns (LBP) and 2D partial least square (2D-PLS) to estimate diffusion tensors robustly. This method uses an enhanced LBP algorithm to extract texture features from a local texture feature of the image matrix from the DWI data. Because the images have been transformed to local texture matrices, we are able to extract discriminating information that identifies outliers in the data set by extending a traditional one-dimensional PLS algorithm to a two-dimension operator. The class-membership matrix in this 2D-PLS algorithm is adapted to process samples that are image matrix, and the membership matrix thus represents varying degrees of importance of local information within the images. We also derive the analytic form of the generalized inverse of the class-membership matrix. We show that this method can effectively extract local features from brain images obtained from a large sample of human infants to identify images that are outliers in their textural features, permitting their exclusion from further processing when estimating tensors using the DWIs. This technique is shown to be superior in performance when compared with visual inspection and other common methods to address motion-related artifacts in DWI data. 
This technique is applicable to correct motion artifact in other magnetic resonance imaging (MRI) techniques (e.g., the bootstrapping estimation) that use univariate or multivariate regression methods to fit MRI data to a pre-specified model. Copyright © 2011 Elsevier Inc. All rights reserved.

  6. Supervised Detection of Anomalous Light Curves in Massive Astronomical Catalogs

    NASA Astrophysics Data System (ADS)

    Nun, Isadora; Pichara, Karim; Protopapas, Pavlos; Kim, Dae-Won

    2014-09-01

    The development of synoptic sky surveys has led to a massive amount of data for which resources needed for analysis are beyond human capabilities. In order to process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new methodology to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each of the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered as an outlier insofar it has a low joint probability. By leaving out one of the classes on the training set, we perform a validity test and show that when the random forest classifier attempts to classify unknown light curves (the class left out), it votes with an unusual distribution among the classes. This rare voting is detected by the Bayesian network and expressed as a low joint probability. Our method is suitable for exploring massive data sets given that the training process is performed offline. We tested our algorithm on 20 million light curves from the MACHO catalog and generated a list of anomalous candidates. After analysis, we divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration, or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs. 
Within these candidates we identified certain known but rare objects such as eclipsing Cepheids, blue variables, cataclysmic variables, and X-ray sources. For some outliers there was no additional information. Among them we identified three unknown variability types and a few individual outliers that will be followed up in order to perform a deeper analysis.

  7. Outlier-resilient complexity analysis of heartbeat dynamics

    NASA Astrophysics Data System (ADS)

    Lo, Men-Tzung; Chang, Yi-Chung; Lin, Chen; Young, Hsu-Wen Vincent; Lin, Yen-Hung; Ho, Yi-Lwun; Peng, Chung-Kang; Hu, Kun

    2015-03-01

    Complexity in physiological outputs is believed to be a hallmark of healthy physiological control. How to accurately quantify the degree of complexity in physiological signals with outliers remains a major barrier to translating this novel concept of nonlinear dynamic theory to clinical practice. Here we propose a new approach to estimate the complexity in a signal by analyzing the irregularity of the sign time series of its coarse-grained time series at different time scales. Using surrogate data, we show that the method can reliably assess the complexity in noisy data while being highly resilient to outliers. We further apply this method to the analysis of human heartbeat recordings. Without removing any outliers due to ectopic beats, the method is able to detect a degradation of cardiac control in patients with congestive heart failure and a greater degradation in critically ill patients whose life continuation relies on extracorporeal membrane oxygenator (ECMO). Moreover, the derived complexity measures can predict the mortality of ECMO patients. These results indicate that the proposed method may serve as a promising tool for monitoring cardiac function of patients in clinical settings.

  8. Addressing the issue of insufficient information in data-based bridge health monitoring : final report.

    DOT National Transportation Integrated Search

    2015-11-01

    One of the most efficient ways to solve the damage detection problem using the statistical pattern recognition approach is that of exploiting the methods of outlier analysis. Cast within the pattern recognition framework, damage detection assesse...

  9. Lower reference limits of quantitative cord glucose-6-phosphate dehydrogenase estimated from healthy term neonates according to the clinical and laboratory standards institute guidelines: a cross sectional retrospective study

    PubMed Central

    2013-01-01

    Background Previous studies have reported the lower reference limit (LRL) of quantitative cord glucose-6-phosphate dehydrogenase (G6PD), but they have not used approved international statistical methodology. Using common standards is expected to yield more accurate findings. Therefore, we aimed to estimate the LRL of quantitative G6PD detection in healthy term neonates by using statistical analyses endorsed by the International Federation of Clinical Chemistry (IFCC) and the Clinical and Laboratory Standards Institute (CLSI) for reference interval estimation. Methods This cross sectional retrospective study was performed at King Abdulaziz Hospital, Saudi Arabia, between March 2010 and June 2012. The study monitored consecutive neonates born to mothers from one Arab Muslim tribe that was assumed to have a low prevalence of G6PD-deficiency. Neonates that satisfied the following criteria were included: full-term birth (37 weeks); no admission to the special care nursery; no phototherapy treatment; negative direct antiglobulin test; and fathers of female neonates were from the same tribe as the mothers. The G6PD activity (Units/gram Hemoglobin) was measured spectrophotometrically by an automated kit. This study used statistical analyses endorsed by IFCC and CLSI for reference interval estimation. The 2.5th percentiles and the corresponding 95% confidence intervals (CI) were estimated as LRLs, both in presence and absence of outliers. Results A total of 207 male and 188 female term neonates who had cord blood quantitative G6PD testing met the inclusion criteria. The method of Horn detected 20 G6PD values as outliers (8 males and 12 females). Distributions of quantitative cord G6PD values exhibited a normal distribution in absence of the outliers only. The Harris-Boyd method and proportion criteria revealed that combined gender LRLs were reliable.
The combined bootstrap LRL in presence of the outliers was 10.0 (95% CI: 7.5-10.7) and the combined parametric LRL in absence of the outliers was 11.0 (95% CI: 10.5-11.3). Conclusion These results contribute to the LRL of quantitative cord G6PD detection in full-term neonates. They are transferable to another laboratory when pre-analytical factors and testing methods are comparable and the IFCC-CLSI requirements of transference are satisfied. We are suggesting using estimated LRL in absence of the outliers as mislabeling G6PD-deficient neonates as normal is intolerable whereas mislabeling G6PD-normal neonates as deficient is tolerable. PMID:24016342
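
    The bootstrap estimate of a lower reference limit with a 95% confidence interval, as reported in this abstract, can be sketched in a few lines. This is a minimal illustration under stated assumptions and not the study's procedure: the function names, the linear-interpolation percentile rule, the number of resamples, and the synthetic input values are all hypothetical, and the sketch does not implement the outlier screening or the IFCC-CLSI partitioning checks mentioned above.

```python
import random


def percentile(values, pct):
    """Return the pct-th percentile (0 <= pct <= 100) by linear interpolation."""
    data = sorted(values)
    pos = (len(data) - 1) * pct / 100
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def bootstrap_lrl(values, n_boot=2000, seed=0):
    """Estimate the 2.5th percentile and a 95% percentile-bootstrap CI.

    Hypothetical helper: n_boot and seed are illustrative defaults.
    """
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and recompute the 2.5th percentile each time.
    estimates = [percentile(rng.choices(values, k=n), 2.5)
                 for _ in range(n_boot)]
    interval = (percentile(estimates, 2.5), percentile(estimates, 97.5))
    return percentile(values, 2.5), interval


if __name__ == "__main__":
    # Synthetic, made-up activity values purely for demonstration.
    demo = [float(v) for v in range(1, 201)]
    lrl, (ci_low, ci_high) = bootstrap_lrl(demo)
    print(lrl, ci_low, ci_high)
```

    Running the estimate once with and once without the flagged values reproduces the kind of comparison the abstract reports, where the limit shifts depending on whether outliers are retained.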

  10. A method for separating seismo-ionospheric TEC outliers from heliogeomagnetic disturbances by using nu-SVR

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pattisahusiwa, Asis; Liong, The Houw; Purqon, Acep

    Seismo-ionospheric research studies ionospheric disturbances associated with seismic activity. Previous research has shown that both heliogeomagnetic activity and strong earthquakes can cause disturbances in the ionosphere. However, it is difficult to separate these disturbances according to their sources. In this research, we propose a method to separate these disturbances (outliers) by using nu-SVR with worldwide GPS data. TEC data related to the 26 December 2004 Sumatra and the 11 March 2011 Honshu earthquakes were analyzed. After analyzing TEC data at several locations around the earthquake epicenters and comparing them with geomagnetic data, the method shows good results, on average, in detecting the source of these outliers. This method is promising for use in future research.

  11. Locality-constrained anomaly detection for hyperspectral imagery

    NASA Astrophysics Data System (ADS)

    Liu, Jiabin; Li, Wei; Du, Qian; Liu, Kui

    2015-12-01

    Detecting a target with low occurrence probability against an unknown background in a hyperspectral image, namely anomaly detection, is of practical significance. The Reed-Xiaoli (RX) algorithm is considered a classic anomaly detector; it calculates the Mahalanobis distance between the local background and the pixel under test. Local RX, as an adaptive RX detector, employs a dual-window strategy that treats pixels within the frame between the inner and outer windows as local background. However, the detector is sensitive to anomalous pixels (i.e., outliers) contained in such a local region. In this paper, a locality-constrained anomaly detector is proposed to remove outliers in the local background region before employing the RX algorithm. Specifically, a local linear representation is designed to exploit the internal relationship between linearly correlated pixels in the local background region and the pixel under test and its neighbors. Experimental results demonstrate that the proposed detector improves the original local RX algorithm.
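
    The RX score mentioned above is the squared Mahalanobis distance between the pixel under test and the background statistics. A minimal sketch, assuming pixels with only two spectral bands so that the 2x2 covariance inverse can be written out by hand (the function name and the toy data are hypothetical, and the dual-window selection and the paper's locality-constrained cleaning step are not shown):

```python
def rx_score(background, pixel):
    """Squared Mahalanobis distance of a 2-band pixel from background pixels."""
    n = len(background)
    mean_a = sum(p[0] for p in background) / n
    mean_b = sum(p[1] for p in background) / n
    # Sample covariance entries of the background pixels.
    s_aa = sum((p[0] - mean_a) ** 2 for p in background) / (n - 1)
    s_bb = sum((p[1] - mean_b) ** 2 for p in background) / (n - 1)
    s_ab = sum((p[0] - mean_a) * (p[1] - mean_b) for p in background) / (n - 1)
    det = s_aa * s_bb - s_ab * s_ab
    if det <= 0:
        raise ValueError("background covariance is singular")
    da, db = pixel[0] - mean_a, pixel[1] - mean_b
    # d^T S^{-1} d with the inverse of the 2x2 covariance expanded explicitly.
    return (s_bb * da * da - 2.0 * s_ab * da * db + s_aa * db * db) / det


# Toy background; in local RX this would be the pixels between the two windows.
toy_background = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
print(rx_score(toy_background, (0.0, 0.0)))
print(rx_score(toy_background, (2.0, 0.0)))
```

    Because the mean and covariance are ordinary (non-robust) estimates, a few anomalous pixels in the background inflate the covariance and lower the score of a true anomaly, which is the sensitivity the abstract refers to.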

  12. An anomaly detection approach for the identification of DME patients using spectral domain optical coherence tomography images.

    PubMed

    Sidibé, Désiré; Sankar, Shrinivasan; Lemaître, Guillaume; Rastgoo, Mojdeh; Massich, Joan; Cheung, Carol Y; Tan, Gavin S W; Milea, Dan; Lamoureux, Ecosse; Wong, Tien Y; Mériaudeau, Fabrice

    2017-02-01

    This paper proposes a method for automatic classification of spectral domain OCT data for the identification of patients with retinal diseases such as Diabetic Macular Edema (DME). We address this issue as an anomaly detection problem and propose a method that not only allows the classification of the OCT volume, but also allows the identification of the individual diseased B-scans inside the volume. Our approach is based on modeling the appearance of normal OCT images with a Gaussian Mixture Model (GMM) and detecting abnormal OCT images as outliers. The classification of an OCT volume is based on the number of detected outliers. Experimental results with two different datasets show that the proposed method achieves a sensitivity and a specificity of 80% and 93% on the first dataset, and 100% and 80% on the second one. Moreover, the experiments show that the proposed method achieves better classification performance than other recently published works. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
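
    The two-stage decision described above (flag individual B-scans as outliers under a model of normal appearance, then classify the volume from the number of flagged B-scans) could be sketched as below. This is a simplified illustration, assuming a one-dimensional feature per B-scan and an already fitted mixture; the component parameters, the log-likelihood threshold, and the minimum outlier count are hypothetical values, not those of the paper.

```python
import math


def gmm_log_likelihood(x, components):
    """Log density of x under a 1-D Gaussian mixture.

    components is a list of (weight, mean, std) tuples.
    """
    density = 0.0
    for weight, mean, std in components:
        z = (x - mean) / std
        density += weight * math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))
    return math.log(density) if density > 0.0 else float("-inf")


def classify_volume(features, components, ll_threshold, min_outliers):
    """Return (is_abnormal, outlier_indices) for one volume of B-scan features."""
    outliers = [
        i for i, x in enumerate(features)
        if gmm_log_likelihood(x, components) < ll_threshold
    ]
    return len(outliers) >= min_outliers, outliers


# Hypothetical mixture standing in for a model fitted on normal images.
normal_model = [(0.5, 0.0, 1.0), (0.5, 5.0, 1.0)]
volume = [0.1, 4.9, 20.0, -15.0, 2.5]
print(classify_volume(volume, normal_model, ll_threshold=-10.0, min_outliers=2))
```

    Returning the outlier indices alongside the volume label mirrors the abstract's point that the approach identifies the individual diseased B-scans as well as classifying the volume.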

  13. Spatial detection of tv channel logos as outliers from the content

    NASA Astrophysics Data System (ADS)

    Ekin, Ahmet; Braspenning, Ralph

    2006-01-01

    This paper proposes a purely image-based TV channel logo detection algorithm that can detect logos independently from their motion and transparency features. The proposed algorithm can robustly detect any type of logo, such as transparent and animated ones, without requiring any temporal constraints, whereas known methods have to wait for the occurrence of large motion in the scene and assume stationary logos. The algorithm models logo pixels as outliers from the actual scene content, which is represented by multiple 3-D histograms in the YCbCr space. We use four scene histograms corresponding to each of the four corners because the content characteristics change from one image corner to another. A further novelty of the proposed algorithm is that we define image corners and the areas where we compute the scene histograms by a cinematic technique called the Golden Section Rule that is used by professionals. The robustness of the proposed algorithm is demonstrated over a dataset of representative TV content.

  14. Genomic Changes Associated with Reproductive and Migratory Ecotypes in Sockeye Salmon (Oncorhynchus nerka).

    PubMed

    Veale, Andrew J; Russello, Michael A

    2017-10-01

    Mechanisms underlying adaptive evolution can best be explored using paired populations displaying similar phenotypic divergence, illuminating the genomic changes associated with specific life history traits. Here, we used paired migratory [anadromous vs. resident (kokanee)] and reproductive [shore- vs. stream-spawning] ecotypes of sockeye salmon (Oncorhynchus nerka) sampled from seven lakes and two rivers spanning three catchments (Columbia, Fraser, and Skeena) in British Columbia, Canada to investigate the patterns and processes underlying their divergence. Restriction-site associated DNA sequencing was used to genotype this sampling at 7,347 single nucleotide polymorphisms, 334 of which were identified as outlier loci and candidates for divergent selection within at least one ecotype comparison. Sixty-eight of these outliers were present in two or more comparisons, with 33 detected across multiple catchments. Of particular note, one locus was detected as the most significant outlier between shore and stream-spawning ecotypes in multiple comparisons and across catchments (Columbia, Fraser, and Snake). We also detected several genomic islands of divergence, some shared among comparisons, potentially showing linked signals of differential selection. The single nucleotide polymorphisms and genomic regions identified in our study offer a range of mechanistic hypotheses associated with the genetic basis of O. nerka life history variation and provide novel tools for informing fisheries management. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  15. Data Analytics for Smart Parking Applications.

    PubMed

    Piovesan, Nicola; Turi, Leo; Toigo, Enrico; Martinez, Borja; Rossi, Michele

    2016-09-23

    We consider real-life smart parking systems where parking lot occupancy data are collected from field sensor devices and sent to backend servers for further processing and usage for applications. Our objective is to make these data useful to end users, such as parking managers, and, ultimately, to citizens. To this end, we concoct and validate an automated classification algorithm having two objectives: (1) outlier detection: to detect sensors with anomalous behavioral patterns, i.e., outliers; and (2) clustering: to group the parking sensors exhibiting similar patterns into distinct clusters. We first analyze the statistics of real parking data, obtaining suitable simulation models for parking traces. We then consider a simple classification algorithm based on the empirical complementary distribution function of occupancy times and show its limitations. Hence, we design a more sophisticated algorithm exploiting unsupervised learning techniques (self-organizing maps). These are tuned following a supervised approach using our trace generator and are compared against other clustering schemes, namely expectation maximization, k-means clustering and DBSCAN, considering six months of data from a real sensor deployment. Our approach is found to be superior in terms of classification accuracy, while also being capable of identifying all of the outliers in the dataset.
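
    The "simple classification algorithm" based on the empirical complementary distribution function of occupancy times is not specified in the abstract. One plausible reading, given here only as a hedged sketch, compares each sensor's empirical complementary CDF with a reference curve and measures the largest gap; the function names, the evaluation grid, and the sample durations are hypothetical.

```python
def eccdf(samples, t):
    """Empirical complementary CDF: fraction of samples strictly greater than t."""
    return sum(1 for s in samples if s > t) / len(samples)


def max_eccdf_gap(sensor, reference, grid):
    """Largest absolute difference between two ECCDFs over a grid of times."""
    return max(abs(eccdf(sensor, t) - eccdf(reference, t)) for t in grid)


# Hypothetical occupancy durations in minutes.
reference_durations = [5, 12, 20, 35, 60, 90, 120, 45, 30, 15]
stuck_sensor = [600, 620, 580, 610]  # e.g. a sensor that reports very long stays
time_grid = [10, 30, 60, 120, 240]
print(max_eccdf_gap(stuck_sensor, reference_durations, time_grid))
```

    A single distance of this kind reduces each sensor to one number, which hints at why the abstract reports limitations of the simple approach and moves on to self-organizing maps.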

  16. Unsupervised universal steganalyzer for high-dimensional steganalytic features

    NASA Astrophysics Data System (ADS)

    Hou, Xiaodan; Zhang, Tao

    2016-11-01

    The research in developing steganalytic features has been highly successful. These features are extremely powerful when applied to supervised binary classification problems. However, they are incompatible with unsupervised universal steganalysis because the unsupervised method cannot distinguish embedding distortion from the varying levels of noise caused by cover variation. This study attempts to alleviate the problem by introducing similarity retrieval of image statistical properties (SRISP), with the specific aim of mitigating the effect of cover variation on the existing steganalytic features. First, cover images with some statistical properties similar to those of a given test image are retrieved from a cover database to establish an aided sample set. Then, unsupervised outlier detection is performed on a test set composed of the given test image and its aided sample set to determine the type (cover or stego) of the given test image. Our proposed framework, called SRISP-aided unsupervised outlier detection, requires no training. Thus, it does not suffer from model mismatch. Compared with prior unsupervised outlier detectors that do not consider SRISP, the proposed framework not only retains the universality but also exhibits superior performance when applied to high-dimensional steganalytic features.

  17. Data Analytics for Smart Parking Applications

    PubMed Central

    Piovesan, Nicola; Turi, Leo; Toigo, Enrico; Martinez, Borja; Rossi, Michele

    2016-01-01

    We consider real-life smart parking systems where parking lot occupancy data are collected from field sensor devices and sent to backend servers for further processing and usage for applications. Our objective is to make these data useful to end users, such as parking managers, and, ultimately, to citizens. To this end, we concoct and validate an automated classification algorithm having two objectives: (1) outlier detection: to detect sensors with anomalous behavioral patterns, i.e., outliers; and (2) clustering: to group the parking sensors exhibiting similar patterns into distinct clusters. We first analyze the statistics of real parking data, obtaining suitable simulation models for parking traces. We then consider a simple classification algorithm based on the empirical complementary distribution function of occupancy times and show its limitations. Hence, we design a more sophisticated algorithm exploiting unsupervised learning techniques (self-organizing maps). These are tuned following a supervised approach using our trace generator and are compared against other clustering schemes, namely expectation maximization, k-means clustering and DBSCAN, considering six months of data from a real sensor deployment. Our approach is found to be superior in terms of classification accuracy, while also being capable of identifying all of the outliers in the dataset. PMID:27669259

  18. A Comprehensive review of group level model performance in the presence of heteroscedasticity: Can a single model control Type I errors in the presence of outliers?

    PubMed Central

    Mumford, Jeanette A.

    2017-01-01

    Even after thorough preprocessing and a careful time series analysis of functional magnetic resonance imaging (fMRI) data, artifact and other issues can lead to violations of the assumption that the variance is constant across subjects in the group level model. This is especially concerning when modeling a continuous covariate at the group level, as the slope is easily biased by outliers. Various models have been proposed to deal with outliers including models that use the first level variance or that use the group level residual magnitude to differentially weight subjects. The most typically used robust regression, implementing a robust estimator of the regression slope, has been previously studied in the context of fMRI studies and was found to perform well in some scenarios, but a loss of Type I error control can occur for some outlier settings. A second type of robust regression using a heteroscedastic autocorrelation consistent (HAC) estimator, which produces robust slope and variance estimates has been shown to perform well, with better Type I error control, but with large sample sizes (500–1000 subjects). The Type I error control with smaller sample sizes has not been studied in this model and has not been compared to other modeling approaches that handle outliers such as FSL’s Flame 1 and FSL’s outlier de-weighting. Focusing on group level inference with a continuous covariate over a range of sample sizes and degree of heteroscedasticity, which can be driven either by the within- or between-subject variability, both styles of robust regression are compared to ordinary least squares (OLS), FSL’s Flame 1, Flame 1 with outlier de-weighting algorithm and Kendall’s Tau. Additionally, subject omission using the Cook’s Distance measure with OLS and nonparametric inference with the OLS statistic are studied. 
Pros and cons of these models as well as general strategies for detecting outliers in data and taking precaution to avoid inflated Type I error rates are discussed. PMID:28030782
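
    Subject omission using Cook's distance with OLS, one of the strategies listed above, can be illustrated for a single continuous covariate. The sketch below uses the standard formula D_i = e_i^2 / (p s^2) * h_i / (1 - h_i)^2 with p = 2 parameters; the function name and toy data are hypothetical, and the 4/n cutoff is a common rule of thumb assumed here rather than a value taken from the paper.

```python
def cooks_distances(x, y):
    """Cook's distance for each observation in a simple linear regression."""
    n = len(x)
    p = 2  # intercept and slope
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    mse = sum(e * e for e in residuals) / (n - p)  # residual variance estimate
    distances = []
    for xi, e in zip(x, residuals):
        leverage = 1.0 / n + (xi - x_bar) ** 2 / s_xx
        distances.append((e * e / (p * mse)) * leverage / (1.0 - leverage) ** 2)
    return distances


# Toy group-level data: the last subject departs from the linear trend.
covariate = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
effect = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
d = cooks_distances(covariate, effect)
cutoff = 4.0 / len(covariate)  # assumed rule-of-thumb threshold
print([i for i, v in enumerate(d) if v > cutoff])
```

    Subjects above the cutoff would be omitted and the model refit; the abstract evaluates how well this kind of procedure controls Type I error compared with robust alternatives.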

  19. Evaluation of Two Outlier-Detection-Based Methods for Detecting Tissue-Selective Genes from Microarray Data

    PubMed Central

    Kadota, Koji; Konishi, Tomokazu; Shimizu, Kentaro

    2007-01-01

    Large-scale expression profiling using DNA microarrays enables identification of tissue-selective genes for which expression is considerably higher and/or lower in some tissues than in others. Among numerous possible methods, only two outlier-detection-based methods (an AIC-based method and Sprent’s non-parametric method) can treat equally various types of selective patterns, but they produce substantially different results. We investigated the performance of these two methods for different parameter settings and for a reduced number of samples. We focused on their ability to detect selective expression patterns robustly. We applied them to public microarray data collected from 36 normal human tissue samples and analyzed the effects of both changing the parameter settings and reducing the number of samples. The AIC-based method was more robust in both cases. The findings confirm that the use of the AIC-based method in the recently proposed ROKU method for detecting tissue-selective expression patterns is correct and that Sprent’s method is not suitable for ROKU. PMID:19936074
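
    For orientation, a non-parametric outlier rule of the kind attributed to Sprent is usually stated in terms of the median and the median absolute deviation (MAD). The sketch below assumes that form, with a cutoff of 5 chosen as a commonly quoted default; the function name, the cutoff, and the toy expression values are assumptions and may not match the parameter settings examined in the paper.

```python
from statistics import median


def mad_outliers(values, cutoff=5.0):
    """Indices of values whose distance from the median exceeds cutoff * MAD."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        return []  # no spread around the median, so the rule is undefined
    return [i for i, v in enumerate(values) if abs(v - center) / mad > cutoff]


# Hypothetical expression levels of one gene across six tissues.
expression = [1.0, 2.0, 3.0, 4.0, 5.0, 100.0]
print(mad_outliers(expression))
```

    A tissue flagged this way would be a candidate for tissue-selective expression of the gene; the abstract's comparison concerns how stable such calls are when the cutoff or the number of samples changes.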

  20. Real-time detection of organic contamination events in water distribution systems by principal components analysis of ultraviolet spectral data.

    PubMed

    Zhang, Jian; Hou, Dibo; Wang, Ke; Huang, Pingjie; Zhang, Guangxin; Loáiciga, Hugo

    2017-05-01

    The detection of organic contaminants in water distribution systems is essential to protect public health from potentially harmful compounds resulting from accidental spills or intentional releases. Existing methods for detecting organic contaminants are based on quantitative analyses such as chemical testing and gas/liquid chromatography, which are time- and reagent-consuming and involve costly maintenance. This study proposes a novel procedure based on the discrete wavelet transform and principal component analysis for detecting organic contamination events from ultraviolet spectral data. First, the spectrum of each observation is transformed using a discrete wavelet transform with a coiflet mother wavelet to capture abrupt changes along the wavelength. Principal component analysis is then employed to approximate the spectra based on capture and fusion features. The significant value of Hotelling's T² statistic is calculated and used to detect outliers. A contamination event alarm is triggered by sequential Bayesian analysis when outliers appear continuously in several observations. The effectiveness of the proposed procedure is tested online using a pilot-scale setup and experimental data.
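
    The final step above, raising an alarm only when outliers persist over several observations, can be illustrated with a recursive Bayes update of the event probability. This is a generic sketch of sequential Bayesian updating, not the paper's exact formulation: the prior, the assumed hit and false-alarm rates of the outlier detector, the alarm threshold, and the clipping bounds are all hypothetical.

```python
def update_event_probability(prior, is_outlier, hit_rate=0.9, false_alarm_rate=0.1):
    """One Bayes update of P(contamination event) given an outlier flag."""
    if is_outlier:
        like_event, like_normal = hit_rate, false_alarm_rate
    else:
        like_event, like_normal = 1.0 - hit_rate, 1.0 - false_alarm_rate
    numerator = like_event * prior
    return numerator / (numerator + like_normal * (1.0 - prior))


def first_alarm(flags, prior=0.01, threshold=0.9, floor=0.001, ceiling=0.999):
    """Index of the first observation at which the alarm fires, or None."""
    probability = prior
    for index, flag in enumerate(flags):
        probability = update_event_probability(probability, flag)
        # Clip so the probability can recover after long quiet periods.
        probability = min(max(probability, floor), ceiling)
        if probability >= threshold:
            return index
    return None


print(first_alarm([True, True, True, True, True]))
print(first_alarm([True, False, False, True, False]))
```

    With these assumed rates, isolated outlier flags barely move the probability, while a run of consecutive flags pushes it over the threshold, which matches the behavior the abstract describes.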

  1. Automated rice leaf disease detection using color image analysis

    NASA Astrophysics Data System (ADS)

    Pugoy, Reinald Adrian D. L.; Mariano, Vladimir Y.

    2011-06-01

    In rice-related institutions such as the International Rice Research Institute, assessing the health condition of a rice plant through its leaves, which is usually done as a manual eyeball exercise, is important to come up with good nutrient and disease management strategies. In this paper, an automated system that can detect diseases present in a rice leaf using color image analysis is presented. In the system, the outlier region is first obtained from a rice leaf image to be tested using histogram intersection between the test and healthy rice leaf images. Upon obtaining the outlier, it is then subjected to a threshold-based K-means clustering algorithm to group related regions into clusters. Then, these clusters are subjected to further analysis to finally determine the suspected diseases of the rice leaf.
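
    Histogram intersection, which the system above uses to compare a test leaf image with healthy leaf images, sums the bin-wise minimum of two normalized histograms. A minimal sketch on single-channel intensity values follows; the bin count, the value range, and the pixel values are hypothetical, and the later thresholding and K-means stages are not shown.

```python
def normalized_histogram(values, bins, value_range=(0, 256)):
    """Histogram of values in [low, high) whose bin masses sum to 1."""
    low, high = value_range
    width = (high - low) / bins
    counts = [0] * bins
    for v in values:
        index = min(int((v - low) / width), bins - 1)
        counts[index] += 1
    total = len(values)
    return [c / total for c in counts]


def histogram_intersection(hist_a, hist_b):
    """Sum of bin-wise minima; 1.0 means identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(hist_a, hist_b))


# Hypothetical green-channel intensities for a healthy and a test leaf.
healthy_pixels = [120, 130, 125, 140, 135, 128, 132, 138]
test_pixels = [120, 130, 40, 45, 135, 50, 132, 42]
healthy_hist = normalized_histogram(healthy_pixels, bins=8)
test_hist = normalized_histogram(test_pixels, bins=8)
print(histogram_intersection(test_hist, healthy_hist))
```

    Bins where the test histogram holds mass that the healthy histogram lacks point to colors absent from healthy leaves, which is one way to read the abstract's "outlier region".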

  2. An efficient sampling algorithm for uncertain abnormal data detection in biomedical image processing and disease prediction.

    PubMed

    Liu, Fei; Zhang, Xi; Jia, Yan

    2015-01-01

    In this paper, we propose a computer information processing algorithm that can be used for biomedical image processing and disease prediction. A biomedical image is considered a data object in a multi-dimensional space. Each dimension is a feature that can be used for disease diagnosis. We introduce a new concept of the top (k1,k2) outlier. It can be used to detect abnormal data objects in the multi-dimensional space. This technique focuses on uncertain space, where each data object has several possible instances with distinct probabilities. We design an efficient sampling algorithm for the top (k1,k2) outlier in uncertain space. Some improvement techniques are used for acceleration. Experiments show our methods' high accuracy and high efficiency.

  3. Portraying the Expression Landscapes of B-Cell Lymphoma-Intuitive Detection of Outlier Samples and of Molecular Subtypes

    PubMed Central

    Hopp, Lydia; Lembcke, Kathrin; Binder, Hans; Wirth, Henry

    2013-01-01

    We present an analytic framework based on Self-Organizing Map (SOM) machine learning to study large-scale patient data sets. The potency of the approach is demonstrated in a case study using gene expression data of more than 200 mature aggressive B-cell lymphoma patients. The method portrays each sample with individual resolution, characterizes the subtypes, disentangles the expression patterns into distinct modules, extracts their functional context using enrichment techniques, and enables investigation of the similarity relations between the samples. The method also allows outliers caused by contamination to be detected and corrected. Based on our analysis, we propose a refined classification of B-cell lymphoma into four molecular subtypes, which are characterized by differential functional and clinical characteristics. PMID:24833231

  4. Just-in-Time Correntropy Soft Sensor with Noisy Data for Industrial Silicon Content Prediction.

    PubMed

    Chen, Kun; Liang, Yu; Gao, Zengliang; Liu, Yi

    2017-08-08

    Development of accurate data-driven quality prediction models for industrial blast furnaces encounters several challenges, mainly because the collected data are nonlinear, non-Gaussian, and unevenly distributed. In this work, a just-in-time correntropy-based local soft sensing approach is presented to predict the silicon content. Without cumbersome efforts for outlier detection, a correntropy support vector regression (CSVR) modeling framework is proposed to deal with the soft sensor development and outlier detection simultaneously. Moreover, with a continuously updated database and a clustering strategy, a just-in-time CSVR (JCSVR) method is developed. Consequently, more accurate prediction and efficient implementations of JCSVR can be achieved. The better prediction performance of JCSVR, compared with traditional soft sensors, is validated on online silicon content prediction.

  5. Just-in-Time Correntropy Soft Sensor with Noisy Data for Industrial Silicon Content Prediction

    PubMed Central

    Chen, Kun; Liang, Yu; Gao, Zengliang; Liu, Yi

    2017-01-01

    Development of accurate data-driven quality prediction models for industrial blast furnaces encounters several challenges, mainly because the collected data are nonlinear, non-Gaussian, and unevenly distributed. In this work, a just-in-time correntropy-based local soft sensing approach is presented to predict the silicon content. Without cumbersome efforts for outlier detection, a correntropy support vector regression (CSVR) modeling framework is proposed to deal with the soft sensor development and outlier detection simultaneously. Moreover, with a continuously updated database and a clustering strategy, a just-in-time CSVR (JCSVR) method is developed. Consequently, more accurate prediction and efficient implementations of JCSVR can be achieved. The better prediction performance of JCSVR, compared with traditional soft sensors, is validated on online silicon content prediction. PMID:28786957

  6. Genetic variation of loci potentially under selection confounds species-genetic diversity correlations in a fragmented habitat.

    PubMed

    Bertin, Angeline; Gouin, Nicolas; Baumel, Alex; Gianoli, Ernesto; Serratosa, Juan; Osorio, Rodomiro; Manel, Stephanie

    2017-01-01

    Positive species-genetic diversity correlations (SGDCs) are often thought to result from the parallel influence of neutral processes on genetic and species diversity. Yet, confounding effects of non-neutral mechanisms have not been explored. Here, we investigate the impact of non-neutral genetic diversity on SGDCs in high Andean wetlands. We compare correlations between plant species diversity and genetic diversity (GD) calculated with and without loci potentially under selection (outlier loci). The study system includes 2188 specimens from five species (three common aquatic macroinvertebrate and two dominant plant species) that were genotyped for 396 amplified fragment length polymorphism loci. We also appraise the importance of neutral processes on SGDCs by investigating the influence of habitat fragmentation features. Significant positive SGDCs were detected for all five species (mean SGDC = 0.52 ± 0.05). While only a few outlier loci were detected in each species, they resulted in significant decreases in GD and in SGDCs. This supports the hypothesis that neutral processes drive species-genetic diversity relationships in high Andean wetlands. Unexpectedly, the effects of the habitat fragmentation characteristics on GD increased with the presence of outlier loci in two species. Overall, our results reveal pitfalls in using habitat features to infer processes driving SGDCs and show that a few loci potentially under selection are enough to cause a significant downward bias in SGDC. Investigating confounding effects of outlier loci thus represents a useful approach to demonstrate the contribution of neutral processes to species-genetic diversity relationships. © 2016 John Wiley & Sons Ltd.

  7. Improving Electronic Sensor Reliability by Robust Outlier Screening

    PubMed Central

    Moreno-Lizaranzu, Manuel J.; Cuesta, Federico

    2013-01-01

    Electronic sensors are widely used in different application areas, and in some of them, such as automotive or medical equipment, they must perform with an extremely low defect rate. Increasing reliability is paramount. Outlier detection algorithms are a key component in screening latent defects and decreasing the number of customer quality incidents (CQIs). This paper focuses on new spatial algorithms (Good Die in a Bad Cluster with Statistical Bins (GDBC SB) and Bad Bin in a Bad Cluster (BBBC)) and an advanced outlier screening method, called Robust Dynamic Part Averaging Testing (RDPAT), as well as two practical improvements, which significantly enhance existing algorithms. Those methods have been used in production in Freescale® Semiconductor probe factories around the world for several years. Moreover, a study was conducted with production data of 289,080 dice with 26 CQIs to determine and compare the efficiency and effectiveness of all these algorithms in identifying CQIs. PMID:24113682

  8. Improving electronic sensor reliability by robust outlier screening.

    PubMed

    Moreno-Lizaranzu, Manuel J; Cuesta, Federico

    2013-10-09

    Electronic sensors are widely used in different application areas, and in some of them, such as automotive or medical equipment, they must perform with an extremely low defect rate. Increasing reliability is paramount. Outlier detection algorithms are a key component in screening latent defects and decreasing the number of customer quality incidents (CQIs). This paper focuses on new spatial algorithms (Good Die in a Bad Cluster with Statistical Bins (GDBC SB) and Bad Bin in a Bad Cluster (BBBC)) and an advanced outlier screening method, called Robust Dynamic Part Averaging Testing (RDPAT), as well as two practical improvements, which significantly enhance existing algorithms. Those methods have been used in production in Freescale® Semiconductor probe factories around the world for several years. Moreover, a study was conducted with production data of 289,080 dice with 26 CQIs to determine and compare the efficiency and effectiveness of all these algorithms in identifying CQIs.

  9. Genome scanning for detecting adaptive genes along environmental gradients in the Japanese conifer, Cryptomeria japonica.

    PubMed

    Tsumura, Y; Uchiyama, K; Moriguchi, Y; Ueno, S; Ihara-Ujino, T

    2012-12-01

    Local adaptation is important in evolutionary processes and speciation. We used multiple tests to identify several candidate genes that may be involved in local adaptation from 1026 loci in 14 natural populations of Cryptomeria japonica, the most economically important forestry tree in Japan. We also studied the relationships between genotypes and environmental variables to obtain information on the selective pressures acting on individual populations. Outlier loci were mapped onto a linkage map, and the positions of loci associated with specific environmental variables are considered. The outlier loci were not randomly distributed on the linkage map; linkage group 11 was identified as a genomic island of divergence. Three loci in this region were also associated with environmental variables such as mean annual temperature, daily maximum temperature, maximum snow depth, and so on. Outlier loci identified with high significance levels will be essential for conservation purposes and for future work on molecular breeding.

  10. Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling.

    PubMed

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2017-06-01

    Spike sorting is a fundamental preprocessing step for many neuroscience studies that rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace that is not necessarily the most discriminative one. Therefore, clusters that appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. The proposed algorithm uses discriminative subspace learning to extract low-dimensional, highly discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters that are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. By providing more accurate information about the activity of a larger number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.

  11. Noise-robust unsupervised spike sorting based on discriminative subspace learning with outlier handling

    NASA Astrophysics Data System (ADS)

    Keshtkaran, Mohammad Reza; Yang, Zhi

    2017-06-01

    Objective. Spike sorting is a fundamental preprocessing step for many neuroscience studies that rely on the analysis of spike trains. Most of the feature extraction and dimensionality reduction techniques that have been used for spike sorting give a projection subspace that is not necessarily the most discriminative one. Therefore, clusters that appear inherently separable in some discriminative subspace may overlap if projected using conventional feature extraction approaches, leading to poor sorting accuracy, especially when the noise level is high. In this paper, we propose a noise-robust and unsupervised spike sorting algorithm based on learning discriminative spike features for clustering. Approach. The proposed algorithm uses discriminative subspace learning to extract low-dimensional, highly discriminative features from the spike waveforms and performs clustering with automatic detection of the number of clusters. The core part of the algorithm involves iterative subspace selection using linear discriminant analysis and clustering using a Gaussian mixture model with outlier detection. A statistical test in the discriminative subspace is proposed to automatically detect the number of clusters. Main results. Comparative results on publicly available simulated and real in vivo datasets demonstrate that our algorithm achieves substantially improved cluster distinction, leading to higher sorting accuracy and more reliable detection of clusters that are highly overlapping and not detectable using conventional feature extraction techniques such as principal component analysis or wavelets. Significance. By providing more accurate information about the activity of a larger number of individual neurons with high robustness to neural noise and outliers, the proposed unsupervised spike sorting algorithm facilitates more detailed and accurate analysis of single- and multi-unit activities in neuroscience and brain machine interface studies.

  12. Community trees: Identifying codiversification in the Páramo dipteran community.

    PubMed

    Carstens, Bryan C; Gruenstaeudl, Michael; Reid, Noah M

    2016-05-01

    Groups of codistributed species that responded in a concerted manner to environmental events are expected to share patterns of evolutionary diversification. However, the identification of such groups has largely been based on qualitative, post hoc analyses. We develop here two methods (posterior predictive simulation [PPS], Kuhner-Felsenstein [K-F] analysis of variance [ANOVA]) for the analysis of codistributed species that, given a group of species with a shared pattern of diversification, allow empiricists to identify those taxa that do not codiversify (i.e., "outlier" species). The identification of outlier species makes it possible to jointly estimate the evolutionary history of co-diversifying taxa. To evaluate the approaches presented here, we collected data from Páramo dipterans, identified outlier species, and estimated a "community tree" from species that are identified as having codiversified. Our results demonstrate that dipteran communities from different Páramo habitats in the same mountain range are more closely related than communities in other ranges. We also conduct simulation testing to evaluate this approach. Results suggest that our approach provides a useful addition to comparative phylogeographic methods, while identifying aspects of the analysis that require careful interpretation. In particular, both the PPS and K-F ANOVA perform acceptably when there are one or two outlier species, but less so as the number of outliers increases. This is likely a function of the corresponding degradation of the signal of community divergence; without a strong signal from a codiversifying community, there is no dominant pattern from which to detect an outlier species. For this reason, both the magnitude of K-F distance distribution and outside knowledge about the phylogeographic history of each putative member of the community should be considered when interpreting the results. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.

  13. The Space-Time Variation of Global Crop Yields, Detecting Simultaneous Outliers and Identifying the Teleconnections with Climatic Patterns

    NASA Astrophysics Data System (ADS)

    Najafi, E.; Devineni, N.; Pal, I.; Khanbilvardi, R.

    2017-12-01

    An understanding of the climate factors that influence the space-time variability of crop yields is important for food security purposes and can help us predict global food availability. In this study, we address how the crop yield trends of countries globally were related to each other during the last several decades and the main climatic variables that triggered high/low crop yields simultaneously across the world. Robust Principal Component Analysis (rPCA) is used to identify the primary modes of variation in wheat, maize, sorghum, rice, soybeans, and barley yields. Relations between these modes of variability and important climatic variables, especially anomalous sea surface temperature (SSTa), are examined from 1964 to 2010. rPCA is also used to identify simultaneous outliers in each year, i.e. systematic high/low crop yields across the globe. The results demonstrated spatiotemporal patterns of these crop yields and the climate-related events that caused them as well as the connection of outliers with weather extremes. We find that among climatic variables, SST has had the most impact on creating simultaneous crop yields variability and yield outliers in many countries. An understanding of this phenomenon can benefit global crop trade networks.

  14. Genomic signatures of positive selection in humans and the limits of outlier approaches.

    PubMed

    Kelley, Joanna L; Madeoy, Jennifer; Calhoun, John C; Swanson, Willie; Akey, Joshua M

    2006-08-01

    Identifying regions of the human genome that have been targets of positive selection will provide important insights into recent human evolutionary history and may facilitate the search for complex disease genes. However, the confounding effects of population demographic history and selection on patterns of genetic variation complicate inferences of selection when a small number of loci are studied. To this end, identifying outlier loci from empirical genome-wide distributions of genetic variation is a promising strategy to detect targets of selection. Here, we evaluate the power and efficiency of a simple outlier approach and describe a genome-wide scan for positive selection using a dense catalog of 1.58 million SNPs that were genotyped in three human populations. In total, we analyzed 14,589 genes, 385 of which possess patterns of genetic variation consistent with the hypothesis of positive selection. Furthermore, several extended genomic regions were found, spanning >500 kb, that contained multiple contiguous candidate selection genes. More generally, these data provide important practical insights into the limits of outlier approaches in genome-wide scans for selection, provide strong candidate selection genes to study in greater detail, and may have important implications for disease related research.

  15. Outlier detection for groundwater data in France

    NASA Astrophysics Data System (ADS)

    Valmy, Larissa; de Fouquet, Chantal; Bourgine, Bernard

    2014-05-01

    The quality and quantity of water in France have been monitored increasingly closely since the 1970s. Moreover, in 2000, the EU Water Framework Directive established a framework for community action in the water policy field for the protection of inland surface waters (rivers and lakes), transitional waters (estuaries), coastal waters and groundwater. It is intended to ensure that all aquatic ecosystems and, with regard to their water needs, terrestrial ecosystems and wetlands meet 'good status' by 2015. The Directive requires Member States to establish river basin districts and, for each of these, a river basin management plan. In France, monitoring programs for water status have been implemented in each basin since 2007. The data collected through these programs feed into an information system that helps to check compliance with water environmental legislation, assess the status of water, guide management actions (programs of measures) and evaluate their effectiveness, and inform the public. Our work consists of studying groundwater quality and quantity data for some basins in France. We propose a specific mathematical approach to detect outliers and study trends in time series. In statistics, an outlier is an observation that lies outside the overall pattern of a distribution. Usually, the presence of an outlier indicates some sort of problem, so it is important to detect it in order to identify the cause. Techniques for temporal data analysis have been developed for several decades in parallel with geostatistical methods. However, compared to standard statistical methods, geostatistical analysis can handle incomplete or irregular time series. In addition, tests carried out by the BRGM showed the potential contribution of geostatistical methods to the characterization of environmental data time series. Our approach is to exploit this potential through the development of specific algorithms, tests and validation of methods. We will introduce and explain our method and approach by considering the Loire Bretagne basin case.

  16. Multivariate assessment of event-related potentials with the t-CWT method.

    PubMed

    Bostanov, Vladimir

    2015-11-05

    Event-related brain potentials (ERPs) are usually assessed with univariate statistical tests although they are essentially multivariate objects. Brain-computer interface applications are a notable exception to this practice, because they are based on multivariate classification of single-trial ERPs. Multivariate ERP assessment can be facilitated by feature extraction methods. One such method is t-CWT, a mathematical-statistical algorithm based on the continuous wavelet transform (CWT) and Student's t-test. This article begins with a geometric primer on some basic concepts of multivariate statistics as applied to ERP assessment in general and to the t-CWT method in particular. Further, it presents for the first time a detailed, step-by-step, formal mathematical description of the t-CWT algorithm. A new multivariate outlier rejection procedure based on principal component analysis in the frequency domain is presented as an important pre-processing step. The MATLAB and GNU Octave implementation of t-CWT is also made publicly available for the first time as free and open source code. The method is demonstrated on some example ERP data obtained in a passive oddball paradigm. Finally, some conceptually novel applications of the multivariate approach in general and of the t-CWT method in particular are suggested and discussed. Hopefully, the publication of both the t-CWT source code and its underlying mathematical algorithm along with a didactic geometric introduction to some basic concepts of multivariate statistics would make t-CWT more accessible to both users and developers in the field of neuroscience research.

  17. On impact damage detection and quantification for CFRP laminates using structural response data only

    NASA Astrophysics Data System (ADS)

    Sultan, M. T. H.; Worden, K.; Pierce, S. G.; Hickey, D.; Staszewski, W. J.; Dulieu-Barton, J. M.; Hodzic, A.

    2011-11-01

    The overall purpose of the research is to detect and attempt to quantify impact damage in structures made from composite materials. A study that uses simplified coupon specimens made from a Carbon Fibre-Reinforced Polymer (CFRP) prepreg with 11, 12 and 13 plies is presented. PZT sensors were placed at three separate locations in each test specimen to record the responses from impact events. To perform damaging impact tests, an instrumented drop-test machine was used and the impact energy was set to cover a range of 0.37-41.72 J. The response signals captured from each sensor were recorded by a data acquisition system for subsequent evaluation. The impacted specimens were examined with an X-ray technique to determine the extent of the damaged areas and it was found that the apparent damaged area grew monotonically with impact energy. A number of simple univariate and multivariate features were extracted from the sensor signals recorded during impact by computing their spectra and calculating frequency centroids. The concept of discordancy from the statistical discipline of outlier analysis is employed in order to separate the responses from non-damaging and damaging impacts. The results show that the potential damage indices introduced here provide a means of identifying damaging impacts from the response data alone.

  18. Anomaly detection of microstructural defects in continuous fiber reinforced composites

    NASA Astrophysics Data System (ADS)

    Bricker, Stephen; Simmons, J. P.; Przybyla, Craig; Hardie, Russell

    2015-03-01

    Ceramic matrix composites (CMC) with continuous fiber reinforcements have the potential to enable the next generation of high speed hypersonic vehicles and/or significant improvements in gas turbine engine performance due to their exhibited toughness when subjected to high mechanical loads at extreme temperatures (2200F+). Reinforced fiber composites (RFC) provide increased fracture toughness, crack growth resistance, and strength, though little is known about how stochastic variation and imperfections in the material affect material properties. In this work, tools are developed for quantifying anomalies within the microstructure at several scales. The detection and characterization of anomalous microstructure is a critical step in linking production techniques to properties, as well as in accurate material simulation and property prediction for the integrated computational materials engineering (ICME) of RFC based components. It is desired to find statistical outliers for any number of material characteristics such as fibers, fiber coatings, and pores. Here, fiber orientation, or 'velocity', and 'velocity' gradient are developed and examined for anomalous behavior. Categorizing anomalous behavior in the CMC is approached by multivariate Gaussian mixture modeling. A Gaussian mixture is employed to estimate the probability density function (PDF) of the features in question, and anomalies are classified by their likelihood of belonging to the statistical normal behavior for that feature.

  19. Robust detrending, rereferencing, outlier detection, and inpainting for multichannel data.

    PubMed

    de Cheveigné, Alain; Arzounian, Dorothée

    2018-05-15

    Electroencephalography (EEG), magnetoencephalography (MEG) and related techniques are prone to glitches, slow drift, steps, etc., that contaminate the data and interfere with the analysis and interpretation. These artifacts are usually addressed in a preprocessing phase that attempts to remove them or minimize their impact. This paper offers a set of useful techniques for this purpose: robust detrending, robust rereferencing, outlier detection, data interpolation (inpainting), step removal, and filter ringing artifact removal. These techniques provide a less wasteful alternative to discarding corrupted trials or channels, and they are relatively immune to artifacts that disrupt alternative approaches such as filtering. Robust detrending allows slow drifts and common mode signals to be factored out while avoiding the deleterious effects of glitches. Robust rereferencing reduces the impact of artifacts on the reference. Inpainting allows corrupt data to be interpolated from intact parts based on the correlation structure estimated over the intact parts. Outlier detection allows the corrupt parts to be identified. Step removal fixes the high-amplitude flux jump artifacts that are common with some MEG systems. Ringing removal allows the ringing response of the antialiasing filter to glitches (steps, pulses) to be suppressed. The performance of the methods is illustrated and evaluated using synthetic data and data from real EEG and MEG systems. These methods, which are mainly automatic and require little tuning, can greatly improve the quality of the data. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  20. Evaluating Effect of Albendazole on Trichuris trichiura Infection: A Systematic Review Article.

    PubMed

    Ahmadi Jouybari, Toraj; Najaf Ghobadi, Khadije; Lotfi, Bahare; Alavi Majd, Hamid; Ahmadi, Nayeb Ali; Rostami-Nejad, Mohammad; Aghaei, Abbas

    2016-01-01

    The aim of the study was to assess defaults and to conduct a meta-analysis of the efficacy of single-dose oral albendazole against T. trichiura infection. We searched PubMed, ISI Web of Science, Science Direct, the Cochrane Central Register of Controlled Trials, and WHO library databases between 1983 and 2014. Data from 13 clinical trial articles were used. Each article reported the effect of a single oral dose (400 mg) of albendazole and of placebo in two groups of patients with T. trichiura infection. For both groups in each article, sample size, the number of those with T. trichiura infection, and the number of those recovered following the intake of albendazole were identified and recorded. The relative risk and variance were computed. Funnel plot, Begg's and Egger's tests were used to assess publication bias. The random effect variance shift outlier model and likelihood ratio test were applied for detecting outliers. To detect influential studies, DFFITS values, Cook's distances and COVRATIO were used. Data were analyzed using STATA and R software. Article 13 was an outlier and article 9 was influential. An outlier is diagnosed by the variance shift of the target study in the inferential method and by the RR value in the graphical method. The funnel plot and Begg's test did not show publication bias (P = 0.272). However, Egger's test confirmed it (P = 0.034). Meta-analysis after removal of article 13 showed that the relative risk was 1.99 (95% CI 1.71-2.31). The estimated RR and our meta-analyses show that treatment of T. trichiura with single oral doses of albendazole is unsatisfactory. New anthelminthics are urgently needed.
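
    As an illustration of the "relative risk and variance" step in the record above, the following is a minimal sketch using the standard large-sample formula for the variance of log(RR). The function name and the counts are hypothetical and are not taken from the reviewed trials.

```python
import math

def relative_risk(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Relative risk, variance of log(RR), and an approximate 95% CI.

    All argument names are illustrative; the counts must be non-zero.
    """
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    # Standard large-sample variance of log(RR) for two independent groups.
    var_log_rr = 1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl
    half_width = z * math.sqrt(var_log_rr)
    log_rr = math.log(rr)
    ci = (math.exp(log_rr - half_width), math.exp(log_rr + half_width))
    return rr, var_log_rr, ci

# Hypothetical counts: 30/100 recovered on treatment, 15/100 on placebo.
rr, var_log_rr, ci = relative_risk(30, 100, 15, 100)
print(rr, var_log_rr, ci)
```

    The per-study log(RR) and its variance are the usual inputs to the pooling and outlier diagnostics that the abstract mentions; those later steps are not sketched here.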

  1. Multiple Hypothesis Testing for Experimental Gingivitis Based on Wilcoxon Signed Rank Statistics

    PubMed Central

    Preisser, John S.; Sen, Pranab K.; Offenbacher, Steven

    2011-01-01

    Dental research often involves repeated multivariate outcomes on a small number of subjects for which there is interest in identifying outcomes that exhibit change in their levels over time as well as to characterize the nature of that change. In particular, periodontal research often involves the analysis of molecular mediators of inflammation for which multivariate parametric methods are highly sensitive to outliers and deviations from Gaussian assumptions. In such settings, nonparametric methods may be favored over parametric ones. Additionally, there is a need for statistical methods that control an overall error rate for multiple hypothesis testing. We review univariate and multivariate nonparametric hypothesis tests and apply them to longitudinal data to assess changes over time in 31 biomarkers measured from the gingival crevicular fluid in 22 subjects whereby gingivitis was induced by temporarily withholding tooth brushing. To identify biomarkers that can be induced to change, multivariate Wilcoxon signed rank tests for a set of four summary measures based upon area under the curve are applied for each biomarker and compared to their univariate counterparts. Multiple hypothesis testing methods with choice of control of the false discovery rate or strong control of the family-wise error rate are examined. PMID:21984957

  2. Detection and monitoring of emerald ash borer populations: trap trees and the factors that may influence their effectiveness

    Treesearch

    Andrew J. Storer; Jessica A. Metzger; Ivich Fraser; Deborah G. McCullough; Therese M. Poland; Robert L. Heyd

    2007-01-01

    The exotic emerald ash borer (EAB), Agrilus planipennis Fairmaire, was first identified in Michigan in 2002, though it had likely been established there for a number of years prior to detection. A key to management of EAB populations is the ability to detect this insect in order to accurately describe its distribution and to locate new outlier...

  3. Metastatic Melanoma Induced Metabolic Changes in C57BL/6J Mouse Stomach Measured by 1H NMR Spectroscopy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hu, M; Wang, Xiliang

    Melanoma is a malignant tumor of melanocytes with high capability of invasion and rapid metastasis to other organs. Malignant melanoma is the most common metastatic malignancy found in gastrointestinal tract (GI). To the best of our knowledge, previous studies of melanoma in gastrointestinal tract are all clinical case reports. In this work, 1H NMR-based metabolomics approach is used to investigate the metabolite profiles differences of stomach tissue extracts of metastatic B16-F10 melanoma in C57BL/6J mouse and search for specific metabolite biomarker candidates. Principal Component Analysis (PCA), an unsupervised multivariate data analysis method, is used to detect possible outliers, while Orthogonal Projection to Latent Structure (OPLS), a supervised multivariate data analysis method, is employed to evaluate important metabolites responsible for discriminating the control and the melanoma groups. Both PCA and OPLS results reveal that the melanoma group can be well separated from its control group. Among the 50 identified metabolites, it is found that the concentrations of 19 metabolites are statistically and significantly changed with the levels of O-phosphocholine and hypoxanthine down-regulated while the levels of isoleucine, leucine, valine, isobutyrate, threonine, cadaverine, alanine, glutamate, glutamine, methionine, citrate, asparagine, tryptophan, glycine, serine, uracil, and formate up-regulated in the melanoma group. These significantly changed metabolites are associated with multiple biological pathways and may be potential biomarkers for metastatic melanoma in stomach.

  4. Metastatic Melanoma Induced Metabolic Changes in C57BL/6J Mouse Stomach Measured by 1H NMR Spectroscopy

    DOE PAGES

    Hu, M; Wang, Xiliang

    2014-12-05

    Melanoma is a malignant tumor of melanocytes with high capability of invasion and rapid metastasis to other organs. Malignant melanoma is the most common metastatic malignancy found in gastrointestinal tract (GI). To the best of our knowledge, previous studies of melanoma in gastrointestinal tract are all clinical case reports. In this work, 1H NMR-based metabolomics approach is used to investigate the metabolite profiles differences of stomach tissue extracts of metastatic B16-F10 melanoma in C57BL/6J mouse and search for specific metabolite biomarker candidates. Principal Component Analysis (PCA), an unsupervised multivariate data analysis method, is used to detect possible outliers, while Orthogonal Projection to Latent Structure (OPLS), a supervised multivariate data analysis method, is employed to evaluate important metabolites responsible for discriminating the control and the melanoma groups. Both PCA and OPLS results reveal that the melanoma group can be well separated from its control group. Among the 50 identified metabolites, it is found that the concentrations of 19 metabolites are statistically and significantly changed with the levels of O-phosphocholine and hypoxanthine down-regulated while the levels of isoleucine, leucine, valine, isobutyrate, threonine, cadaverine, alanine, glutamate, glutamine, methionine, citrate, asparagine, tryptophan, glycine, serine, uracil, and formate up-regulated in the melanoma group. These significantly changed metabolites are associated with multiple biological pathways and may be potential biomarkers for metastatic melanoma in stomach.

  5. SU-E-J-85: Leave-One-Out Perturbation (LOOP) Fitting Algorithm for Absolute Dose Film Calibration

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chu, A; Ahmad, M; Chen, Z

    2014-06-01

    Purpose: To introduce an outlier-recognition fitting routine for film dosimetry. It is not only flexible with any linear or non-linear regression but can also provide information on the minimal number of sampling points, critical sampling distributions and the evaluation of analytical functions for absolute film-dose calibration. Methods: Leave-one-out (LOO) cross validation is often used for statistical analyses of model performance. We used LOO analyses with perturbed bootstrap fitting, called leave-one-out perturbation (LOOP), for film-dose calibration. Given a threshold, the LOO process detects unfit points (“outliers”) compared to other cohorts, and a bootstrap fitting process follows to seek any possibilities of using perturbations for further improvement. The outliers were then reconfirmed by a traditional t-test and eliminated, and another LOOP feedback produced the final result. An over-sampled film-dose-calibration dataset was collected as a reference (dose range: 0-800 cGy), and various simulated conditions for outliers and sampling distributions were derived from the reference. Comparisons over the various conditions were made, and the performance of the fitting functions, polynomial and rational, was evaluated. Results: (1) LOOP demonstrates sensitive outlier recognition through its statistical correlation with an exceptionally better goodness-of-fit when outliers are left out. (2) With sufficient statistical information, LOOP can correct outliers under some low-sampling conditions where other “robust fits”, e.g. Least Absolute Residuals, cannot. (3) Complete cross-validated analyses of LOOP indicate that the rational function demonstrates much superior performance compared to the polynomial. Even with 5 data points including one outlier, using LOOP with a rational function can restore more than a 95% value back to its reference values, while the polynomial fit failed completely under the same conditions. Conclusion: LOOP can cooperate with any fitting routine, functioning as a “robust fit”. In addition, it can be set as a benchmark for film-dose calibration fitting performance.
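
    The leave-one-out step described in the record above can be sketched as follows. This is plain LOO screening with a straight-line least-squares fit, not the authors' LOOP routine: the bootstrap perturbation and t-test confirmation are omitted, and the function names, data and threshold are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def loo_residuals(xs, ys):
    """Residual of each point against a fit made without that point."""
    residuals = []
    for i in range(len(xs)):
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residuals.append(ys[i] - (slope * xs[i] + intercept))
    return residuals

def flag_unfit(xs, ys, threshold):
    """Indices whose leave-one-out residual exceeds the given threshold."""
    return [i for i, r in enumerate(loo_residuals(xs, ys)) if abs(r) > threshold]

# Made-up calibration points on y = 2x, with one corrupted reading at index 3.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 2.0, 4.0, 20.0, 8.0, 10.0]
residuals = loo_residuals(xs, ys)
flagged = flag_unfit(xs, ys, threshold=10.0)
print(flagged)
```

    The held-out residual is largest for the corrupted point because the fit that excludes it is not pulled toward it; any other fitting function could be substituted for `fit_line`.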

  6. Cross-visit tumor sub-segmentation and registration with outlier rejection for dynamic contrast-enhanced MRI time series data.

    PubMed

    Buonaccorsi, G A; Rose, C J; O'Connor, J P B; Roberts, C; Watson, Y; Jackson, A; Jayson, G C; Parker, G J M

    2010-01-01

    Clinical trials of anti-angiogenic and vascular-disrupting agents often use biomarkers derived from DCE-MRI, typically reporting whole-tumor summary statistics and so overlooking spatial parameter variations caused by tissue heterogeneity. We present a data-driven segmentation method comprising tracer-kinetic model-driven registration for motion correction, conversion from MR signal intensity to contrast agent concentration for cross-visit normalization, iterative principal components analysis for imputation of missing data and dimensionality reduction, and statistical outlier detection using the minimum covariance determinant to obtain a robust Mahalanobis distance. After applying these techniques we cluster in the principal components space using k-means. We present results from a clinical trial of a VEGF inhibitor, using time-series data selected because of problems due to motion and outlier time series. We obtained spatially-contiguous clusters that map to regions with distinct microvascular characteristics. This methodology has the potential to uncover localized effects in trials using DCE-MRI-based biomarkers.

  7. A computational study on outliers in world music.

    PubMed

    Panteli, Maria; Benetos, Emmanouil; Dixon, Simon

    2017-01-01

    The comparative analysis of world music cultures has been the focus of several ethnomusicological studies in the last century. With the advances of Music Information Retrieval and the increased accessibility of sound archives, large-scale analysis of world music with computational tools is today feasible. We investigate music similarity in a corpus of 8200 recordings of folk and traditional music from 137 countries around the world. In particular, we aim to identify music recordings that are most distinct compared to the rest of our corpus. We refer to these recordings as 'outliers'. We use signal processing tools to extract music information from audio recordings, data mining to quantify similarity and detect outliers, and spatial statistics to account for geographical correlation. Our findings suggest that Botswana is the country with the most distinct recordings in the corpus and China is the country with the most distinct recordings when considering spatial correlation. Our analysis includes a comparison of musical attributes and styles that contribute to the 'uniqueness' of the music of each country.

  8. Analyzing contentious relationships and outlier genes in phylogenomics.

    PubMed

    Walker, Joseph F; Brown, Joseph W; Smith, Stephen A

    2018-06-08

    Recent studies have demonstrated that conflict is common among gene trees in phylogenomic studies, and that less than one percent of genes may ultimately drive species tree inference in supermatrix analyses. Here, we examined two datasets where supermatrix and coalescent-based species trees conflict. We identified two highly influential "outlier" genes in each dataset. When removed from each dataset, the inferred supermatrix trees matched the topologies obtained from coalescent analyses. We also demonstrate that, while the outlier genes in the vertebrate dataset have been shown in a previous study to be the result of errors in orthology detection, the outlier genes from a plant dataset did not exhibit any obvious systematic error and therefore may be the result of some biological process yet to be determined. While topological comparisons among a small set of alternate topologies can be helpful in discovering outlier genes, they can be limited in several ways, such as assuming all genes share the same topology. Coalescent species tree methods relax this assumption but do not explicitly facilitate the examination of specific edges. Coalescent methods often also assume that conflict is the result of incomplete lineage sorting (ILS). Here we explored a framework that allows for quickly examining alternative edges and support for large phylogenomic datasets that does not assume a single topology for all genes. For both datasets, these analyses provided detailed results confirming the support for coalescent-based topologies. This framework suggests that we can improve our understanding of the underlying signal in phylogenomic datasets by asking more targeted edge-based questions.

  9. [Application of Stata software to test heterogeneity in meta-analysis method].

    PubMed

    Wang, Dan; Mou, Zhen-yun; Zhai, Jun-xia; Zong, Hong-xia; Zhao, Xiao-dong

    2008-07-01

    To introduce the application of Stata software to heterogeneity testing in meta-analysis. A data set was set up according to the example in the study, and the corresponding commands in Stata 9 were applied to test the example. The methods used were the Q-test and I² statistic attached to the fixed effect model forest plot, the H statistic and the Galbraith plot. The existence of heterogeneity among studies could be detected by the Q-test and H statistic, and the degree of heterogeneity could be detected by the I² statistic. The outliers that were the sources of heterogeneity could be spotted from the Galbraith plot. Heterogeneity testing in meta-analysis can be completed simply and quickly with these four methods in Stata. Of the four methods, the H and I² statistics are more robust, and the outliers responsible for heterogeneity can be clearly seen in the Galbraith plot.
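
    For readers without Stata, the Q, H and I² statistics named in the record above can be computed directly from per-study effect sizes and variances. The sketch below uses the usual inverse-variance definitions; the function name and the three example studies are hypothetical.

```python
import math

def heterogeneity(effects, variances):
    """Return (pooled effect, Cochran's Q, H, I-squared in percent)."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    h = math.sqrt(q / df)
    # I-squared is truncated at zero when Q is smaller than its degrees of freedom.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, h, i2

# Three made-up studies with equal variances.
pooled, q, h, i2 = heterogeneity([0.1, 0.3, 0.5], [0.01, 0.01, 0.01])
print(pooled, q, h, i2)
```

    Under the null hypothesis of homogeneity, Q is referred to a chi-squared distribution with k - 1 degrees of freedom; that p-value step is left out to keep the sketch dependency-free.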

  10. Reduction of ZTD outliers through improved GNSS data processing and screening strategies

    NASA Astrophysics Data System (ADS)

    Stepniak, Katarzyna; Bock, Olivier; Wielgosz, Pawel

    2018-03-01

    Though Global Navigation Satellite System (GNSS) data processing has been significantly improved over the years, it is still commonly observed that zenith tropospheric delay (ZTD) estimates contain many outliers which are detrimental to meteorological and climatological applications. In this paper, we show that ZTD outliers in double-difference processing are mostly caused by sub-daily data gaps at reference stations, which cause disconnections of clusters of stations from the reference network and common mode biases due to the strong correlation between stations in short baselines. They can reach a few centimetres in ZTD and usually coincide with a jump in formal errors. The magnitude and sign of these biases are impossible to predict because they depend on different errors in the observations and on the geometry of the baselines. We elaborate and test a new baseline strategy which solves this problem and significantly reduces the number of outliers compared to the standard strategy commonly used for positioning (e.g. determination of national reference frame) in which the pre-defined network is composed of a skeleton of reference stations to which secondary stations are connected in a star-like structure. The new strategy is also shown to perform better than the widely used strategy maximizing the number of observations available in many GNSS programs. The reason is that observations are maximized before processing, whereas the final number of used observations can be dramatically lower because of data rejection (screening) during the processing. The study relies on the analysis of 1 year of GPS (Global Positioning System) data from a regional network of 136 GNSS stations processed using Bernese GNSS Software v.5.2. A post-processing screening procedure is also proposed to detect and remove a few outliers which may still remain due to short data gaps. It is based on a combination of range checks and outlier checks of ZTD and formal errors. The accuracy of the final screened GPS ZTD estimates is assessed by comparison to ERA-Interim reanalysis.
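
    A screening of the kind described in the record above, combining range checks with an outlier check on ZTD and formal errors, might look like the sketch below. The limits, the median/MAD outlier rule and the function name are assumptions made for illustration; the abstract does not state the authors' actual thresholds or test.

```python
from statistics import median

def screen_ztd(ztd, sigma, ztd_range=(1.0, 3.0), sigma_max=0.01, k=5.0):
    """Return indices of epochs passing both checks (all values in metres).

    ztd_range, sigma_max and k are hypothetical limits, not published ones.
    """
    # Step 1: range checks on the ZTD estimate and on its formal error.
    in_range = [i for i in range(len(ztd))
                if ztd_range[0] <= ztd[i] <= ztd_range[1] and sigma[i] <= sigma_max]
    if not in_range:
        return []
    # Step 2: outlier check against the median, scaled by the MAD.
    values = [ztd[i] for i in in_range]
    med = median(values)
    mad = median([abs(v - med) for v in values])
    scale = 1.4826 * mad  # MAD rescaled to be comparable with a standard deviation
    # If scale is zero, only values equal to the median are kept.
    return [i for i in in_range if abs(ztd[i] - med) <= k * scale]

# Made-up series: index 5 is a ZTD spike, 6 is out of range, 7 has a large formal error.
ztd = [2.40, 2.41, 2.39, 2.42, 2.40, 2.55, 0.50, 2.41]
sigma = [0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.002, 0.050]
kept = screen_ztd(ztd, sigma)
print(kept)
```

    Running the range check first keeps gross errors from distorting the median and MAD used in the second step.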

  11. Prospective clinical validation of independent DVH prediction for plan QA in automatic treatment planning for prostate cancer patients.

    PubMed

    Wang, Yibing; Heijmen, Ben J M; Petit, Steven F

    2017-12-01

    To prospectively investigate the use of an independent DVH prediction tool to detect outliers in the quality of fully automatically generated treatment plans for prostate cancer patients. A plan QA tool was developed to predict rectum, anus and bladder DVHs, based on overlap volume histograms and principal component analysis (PCA). The tool was trained with 22 automatically generated, clinical plans, and independently validated with 21 plans. Its use was prospectively investigated for 50 new plans by replanning in case of detected outliers. For rectum Dmean, V65Gy, V75Gy, anus Dmean, and bladder Dmean, the difference between predicted and achieved was within 0.4 Gy or 0.3% (SD within 1.8 Gy or 1.3%). Thirteen detected outliers were re-planned, leading to moderate but statistically significant improvements (mean, max): rectum Dmean (1.3 Gy, 3.4 Gy), V65Gy (2.7%, 4.2%), anus Dmean (1.6 Gy, 6.9 Gy), and bladder Dmean (1.5 Gy, 5.1 Gy). The rectum V75Gy of the new plans slightly increased (0.2%, p = 0.087). A high accuracy DVH prediction tool was developed and used for independent QA of automatically generated plans. In 28% of plans, minor dosimetric deviations were observed that could be improved by plan adjustments. Larger gains are expected for manually generated plans. Copyright © 2017 Elsevier B.V. All rights reserved.

  12. Prospective casemix-based funding, analysis and financial impact of cost outliers in all-patient refined diagnosis related groups in three Belgian general hospitals.

    PubMed

    Pirson, Magali; Martins, Dimitri; Jackson, Terri; Dramaix, Michèle; Leclercq, Pol

    2006-03-01

    This study examined the impact of cost outliers in terms of hospital resource consumption, the financial impact of the outliers under the Belgian casemix-based system, and the validity of two "proxies" for costs: length of stay and charges. The costs of all hospital stays at three Belgian general hospitals were calculated for the year 2001. High resource use outliers were selected according to the following rule: 75th percentile + 1.5 × inter-quartile range. The frequency of cost outliers varied from 7% to 8% across hospitals. Explanatory factors were: major or extreme severity of illness, longer length of stay, and intensive care unit stay. Cost outliers account for 22-30% of hospital costs. One-third of length-of-stay outliers are not cost outliers, and nearly one-quarter of charges outliers are not cost outliers. The current funding system in Belgium does not penalize hospitals having a high percentage of outliers. The billing generated by these patients largely compensates for costs generated. Length of stay and charges are not good approximations for selecting cost outliers.
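
    The selection rule quoted above (75th percentile + 1.5 × inter-quartile range) can be sketched in a few lines. This is only a minimal illustration: the cost values are invented, and the quartile convention (Python's "inclusive" method) is an assumption, since the abstract does not say how the percentiles were computed.

```python
import statistics

def high_cost_outliers(costs, k=1.5):
    """Return (threshold, outliers) for the rule Q3 + k * IQR."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    # The "inclusive" convention is an assumption, not taken from the study.
    q1, _, q3 = statistics.quantiles(costs, n=4, method="inclusive")
    threshold = q3 + k * (q3 - q1)
    return threshold, [c for c in costs if c > threshold]

# Hypothetical stay costs (illustrative values, not data from the study).
stay_costs = [1200, 1500, 1800, 2100, 2400, 2700, 3000, 3300, 25000]
threshold, outliers = high_cost_outliers(stay_costs)
print(threshold, outliers)
```

    Only stays above the threshold are flagged, which matches the abstract's focus on high resource use; a two-sided rule would also need the lower fence Q1 - k * IQR.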

  13. Bayesian Local Contamination Models for Multivariate Outliers

    PubMed Central

    Page, Garritt L.; Dunson, David B.

    2013-01-01

    In studies where data are generated from multiple locations or sources it is common for there to exist observations that are quite unlike the majority. Motivated by the application of establishing a reference value in an inter-laboratory setting when outlying labs are present, we propose a local contamination model that is able to accommodate unusual multivariate realizations in a flexible way. The proposed method models the process level of a hierarchical model using a mixture with a parametric component and a possibly nonparametric contamination. Much of the flexibility in the methodology is achieved by allowing varying random subsets of the elements in the lab-specific mean vectors to be allocated to the contamination component. Computational methods are developed and the methodology is compared to three other possible approaches using a simulation study. We apply the proposed method to a NIST/NOAA sponsored inter-laboratory study which motivated the methodological development. PMID:24363465

  14. Anthropometry as a predictor of vertical jump heights derived from an instrumented platform.

    PubMed

    Caruso, John F; Daily, Jeremy S; Mason, Melissa L; Shepherd, Catherine M; McLagan, Jessica R; Marshall, Mallory R; Walker, Ron H; West, Jason O

    2012-01-01

    The purpose of the current study was to examine the relationship between vertical jump height and anthropometry, using jump data obtained from an instrumented platform. Our methods required college-aged (n = 177) subjects to make 3 visits to our laboratory to measure the following anthropometric variables: height, body mass, upper arm length (UAL), lower arm length, upper leg length, and lower leg length. Per jump, maximum height was measured in 3 ways: from the subjects' takeoff, hang times, and as they landed on the platform. Standard multivariate regression assessed how well anthropometry predicted the criterion variance per gender (men, women, pooled) and jump height method (takeoff, hang time, landing) combination. Z-scores indicated that small amounts of the total data were outliers. The results showed that the majority of outliers were from jump heights calculated as women landed on the platform. With the genders pooled, anthropometry predicted a significant (p < 0.05) amount of variance from jump heights calculated from both takeoff and hang time. The anthropometry-vertical jump relationship was not significant from heights calculated as subjects landed on the platform, likely due to the female outliers. Yet anthropometric data of men did predict a significant amount of variance from heights calculated when they landed on the platform; univariate correlations of men's data revealed that UAL was the best predictor. It was concluded that the large sample of men's data led to greater data heterogeneity and a higher univariate correlation. Because of our sample size and data heterogeneity, practical applications suggest that coaches may find our results best predict performance for a variety of college-aged athletes and vertical jump enthusiasts.

  15. Improving the estimation of zenith dry tropospheric delays using regional surface meteorological data

    NASA Astrophysics Data System (ADS)

    Luo, X.; Heck, B.; Awange, J. L.

    2013-12-01

    Global Navigation Satellite Systems (GNSS) are emerging as possible tools for remote sensing high-resolution atmospheric water vapour that improves weather forecasting through numerical weather prediction models. Nowadays, the GNSS-derived tropospheric zenith total delay (ZTD), comprising zenith dry delay (ZDD) and zenith wet delay (ZWD), is achievable with sub-centimetre accuracy. However, if no representative near-site meteorological information is available, the quality of the ZDD derived from tropospheric models is degraded, leading to inaccurate estimation of the water vapour component ZWD as the difference between ZTD and ZDD. On the basis of freely accessible regional surface meteorological data, this paper proposes a height-dependent linear correction model for a priori ZDD. By applying the ordinary least-squares estimation (OLSE), bootstrapping (BOOT), and leave-one-out cross-validation (CROS) methods, the model parameters are estimated and analysed with respect to outlier detection. The model validation is carried out using GNSS stations with near-site meteorological measurements. The results verify the efficiency of the proposed ZDD correction model, showing a significant reduction in the mean bias from several centimetres to about 5 mm. The OLSE method enables a fast computation, while the CROS procedure allows for outlier detection. All three methods produce consistent results after outlier elimination, which improves the regression quality by about 20% and the model accuracy by up to 30%.
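
    How leave-one-out cross-validation can expose an outlier in a height-dependent linear model may be sketched as follows. This is a hedged illustration only: the station heights and corrections are invented, and the scoring rule (prediction error of the left-out point divided by the residual standard deviation of the fit that excluded it, with a cut-off of 3) is an assumption, not the criterion used in the paper.

```python
import math
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b * x; returns (a, b)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def loo_scores(xs, ys):
    """Scaled prediction error of each point when it is left out of the fit."""
    scores = []
    for i in range(len(xs)):
        xt, yt = xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]
        a, b = fit_line(xt, yt)
        # Residual standard deviation of the fit that excluded point i.
        sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xt, yt))
        s = math.sqrt(sse / (len(xt) - 2))
        scores.append((ys[i] - (a + b * xs[i])) / s)
    return scores

# Hypothetical station heights (m) and ZDD corrections (mm); index 4 is a planted outlier.
heights = [100.0, 300.0, 500.0, 700.0, 900.0, 1100.0, 1300.0, 1500.0]
corrections = [3.0, 5.1, 6.9, 9.0, 31.0, 13.1, 14.9, 17.0]
flagged = [i for i, s in enumerate(loo_scores(heights, corrections)) if abs(s) > 3.0]
print(flagged)
```

    Scaling by the fit that excludes the tested point keeps the outlier from inflating its own yardstick, which is why the planted point stands out while its neighbours do not.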

  16. A New Secondary Structure Assignment Algorithm Using Cα Backbone Fragments

    PubMed Central

    Cao, Chen; Wang, Guishen; Liu, An; Xu, Shutan; Wang, Lincong; Zou, Shuxue

    2016-01-01

    The assignment of secondary structure elements in proteins is a key step in the analysis of their structures and functions. We have developed an algorithm, SACF (secondary structure assignment based on Cα fragments), for secondary structure element (SSE) assignment based on the alignment of Cα backbone fragments with central poses derived by clustering known SSE fragments. The assignment algorithm consists of three steps: First, the outlier fragments on known SSEs are detected. Next, the remaining fragments are clustered to obtain the central fragments for each cluster. Finally, the central fragments are used as a template to make assignments. Following a large-scale comparison of 11 secondary structure assignment methods, SACF, KAKSI and PROSS are found to have similar agreement with DSSP, while PCASSO agrees with DSSP best. SACF and PCASSO show a preference for reducing residues in N and C cap regions, whereas KAKSI, P-SEA and SEGNO tend to add residues to the termini when DSSP assignment is taken as standard. Moreover, our algorithm is able to assign subtle helices (3₁₀-helix, π-helix and left-handed helix) and make uniform assignments, as well as to detect rare SSEs in β-sheets or long helices as outlier fragments from other programs. The structural uniformity should be useful for protein structure classification and prediction, while outlier fragments underlie the structure–function relationship. PMID:26978354

  17. Evaluation schemes for video and image anomaly detection algorithms

    NASA Astrophysics Data System (ADS)

    Parameswaran, Shibin; Harguess, Josh; Barngrover, Christopher; Shafer, Scott; Reese, Michael

    2016-05-01

    Video anomaly detection is a critical research area in computer vision. It is a natural first step before applying object recognition algorithms. Many algorithms that detect anomalies (outliers) in videos and images have been introduced in recent years. However, these algorithms behave and perform differently based on differences in the domains and tasks to which they are subjected. In order to better understand the strengths and weaknesses of outlier algorithms and their applicability in a particular domain/task of interest, it is important to measure and quantify their performance using appropriate evaluation metrics. Many evaluation metrics have been used in the literature, such as precision curves, precision-recall curves, and receiver operating characteristic (ROC) curves. In order to construct these different metrics, it is also important to choose an appropriate evaluation scheme that decides when a proposed detection is considered a true or a false detection. Choosing the right evaluation metric and the right scheme is critical since the choice can introduce positive or negative bias in the measuring criterion and may favor (or work against) a particular algorithm or task. In this paper, we review evaluation metrics and popular evaluation schemes that are used to measure the performance of anomaly detection algorithms on videos and imagery with one or more anomalies. We analyze the biases introduced by these by measuring the performance of an existing anomaly detection algorithm.
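
    The dependence of a metric on the evaluation scheme can be illustrated with a small sketch. It assumes one common scheme, not necessarily one reviewed in the paper: a detection counts as true when its bounding box overlaps an unmatched ground-truth box with intersection over union (IoU) at or above a threshold. The boxes and thresholds are invented for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def precision_recall(detections, truths, min_iou=0.5):
    """Greedy one-to-one matching; a detection is true if IoU >= min_iou."""
    unmatched = list(truths)
    tp = 0
    for det in detections:
        best = max(unmatched, key=lambda t: iou(det, t), default=None)
        if best is not None and iou(det, best) >= min_iou:
            unmatched.remove(best)  # each ground-truth box is matched at most once
            tp += 1
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(truths) if truths else 0.0
    return precision, recall

# Hypothetical ground-truth anomalies and detector output.
truths = [(0, 0, 10, 10), (20, 20, 30, 30)]
detections = [(1, 1, 10, 10), (50, 50, 60, 60), (20, 20, 28, 30)]
lenient = precision_recall(detections, truths, min_iou=0.5)
strict = precision_recall(detections, truths, min_iou=0.9)
print(lenient, strict)
```

    The same detector output scores two of three detections correct under the lenient scheme and none under the strict one, which is the kind of scheme-induced bias the abstract warns about.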

  18. Identification of Outliers in Grace Data for Indo-Gangetic Plain Using Various Methods (Z-Score, Modified Z-score and Adjusted Boxplot) and Its Removal

    NASA Astrophysics Data System (ADS)

    Srivastava, S.

    2015-12-01

    Gravity Recovery and Climate Experiment (GRACE) data are widely used for hydrological studies of large-scale basins (≥100,000 sq km). GRACE data (Stokes Coefficients or Equivalent Water Height) used for hydrological studies are not direct observations but result from high level processing of raw data from the GRACE mission. Different partner agencies like CSR, GFZ and JPL implement their own methodology and their processing methods are independent from each other. The primary sources of error in GRACE data are measurement and modeling errors and the processing strategy of these agencies. Because of different processing methods, the final data from all the partner agencies are inconsistent with each other at some epochs. GRACE data provide spatio-temporal variations in Earth's gravity which are mainly attributed to the seasonal fluctuations in water level on Earth's surface and subsurface. During the quantification of error/uncertainties, several high positive and negative peaks were observed which do not correspond to any hydrological processes but may emanate from a combination of primary error sources, or some other geophysical processes (e.g. earthquakes, landslides, etc.) resulting in redistribution of Earth's mass. Such peaks can be considered as outliers for hydrological studies. In this work, an algorithm has been designed to extract outliers from the GRACE data for the Indo-Gangetic plain, which considers the seasonal variations and the trend in data. Different outlier detection methods have been used such as Z-score, modified Z-score and adjusted boxplot. For verification, assimilated hydrological (GLDAS) and hydro-meteorological data are used as the reference. The results have shown that the consistency amongst all data sets improved significantly after the removal of outliers.
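
    Two of the detection methods named above, the Z-score and the modified Z-score, can be sketched as follows. The input is assumed to be a residual series from which trend and seasonal signal have already been removed; the values are invented, and the cut-offs (3 for the Z-score, 3.5 for the modified Z-score) are common conventions rather than values taken from the study.

```python
import statistics

def z_scores(values):
    """Classical Z-score: distance from the mean in sample standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]

def modified_z_scores(values):
    """Modified Z-score: distance from the median scaled by the MAD."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    # 0.6745 makes the MAD comparable to a standard deviation for normal data.
    return [0.6745 * (v - med) / mad for v in values]

# Hypothetical detrended, deseasonalized residuals (cm of equivalent water height).
residuals = [1.2, -0.8, 0.5, -1.1, 0.9, -0.4, 0.3, 14.0, -0.6, 0.7, -0.9, 0.2]
z_flags = [i for i, z in enumerate(z_scores(residuals)) if abs(z) > 3.0]
mz_flags = [i for i, z in enumerate(modified_z_scores(residuals)) if abs(z) > 3.5]
print(z_flags, mz_flags)
```

    In this example the spike only just clears the classical cut-off (about 3.1) because it inflates the standard deviation it is measured against, whereas its modified Z-score is above 12; that robustness is the usual argument for the median-based variant.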

  19. Asynchronous P300 classification in a reactive brain-computer interface during an outlier detection task

    NASA Astrophysics Data System (ADS)

    Krumpe, Tanja; Walter, Carina; Rosenstiel, Wolfgang; Spüler, Martin

    2016-08-01

    Objective. In this study, the feasibility of detecting a P300 via an asynchronous classification mode in a reactive EEG-based brain-computer interface (BCI) was evaluated. The P300 is one of the most popular BCI control signals and therefore used in many applications, mostly for active communication purposes (e.g. P300 speller). As the majority of all systems work with a stimulus-locked mode of classification (synchronous), the field of applications is limited. A new approach needs to be applied in a setting in which a stimulus-locked classification cannot be used due to the fact that the presented stimuli cannot be controlled or predicted by the system. Approach. A continuous observation task requiring the detection of outliers was implemented to test such an approach. The study was divided into an offline and an online part. Main results. Both parts of the study revealed that an asynchronous detection of the P300 can successfully be used to detect single events with high specificity. It also revealed that no significant difference in performance was found between the synchronous and the asynchronous approach. Significance. The results encourage the use of an asynchronous classification approach in suitable applications without a potential loss in performance.

  20. Atomic absorption spectrophotometric determination of tin in canned foods, using nitric acid-hydrochloric acid digestion and nitrous oxide-acetylene flame: collaborative study.

    PubMed

    Dabeka, R W; McKenzie, A D; Albert, R H

    1985-01-01

    Twenty-six collaborators participated in a study to evaluate an atomic absorption spectrophotometric (AAS) method for the determination of tin in canned foods. The 5 foods evaluated were meat, pineapple juice, tomato paste, evaporated milk, and green beans, each spiked at 2 levels. The concentration range of tin in the samples was 10-450 micrograms/g, and each level was sent as a blind duplicate. Statistical treatment of results revealed no laboratory outliers and 6 individual or replicate-total outliers, accounting for 3.3% of the data. Repeatability (within-laboratory) coefficient of variation (CVo) ranged from 2.2 to 48%, depending on the tin level and food evaluated. For samples containing greater than or equal to 80 micrograms/g of tin, repeatability CV averaged 5.6% including outliers and 3.7% after their rejection. Overall among-laboratories coefficient of variation (CVx) varied from 3.3 to 58%; at levels greater than or equal to 80 micrograms/g, it averaged 7.3% with outliers and 5.3% after their rejection. Recovery of tin, based on spiking levels, ranged from 100.0 to 112.8% and averaged 105.4%. Detection limit range is 2-10 micrograms/g, and lower quantitation limit is 40 micrograms/g. This method has been adopted official first action.

  1. Signatures of positive selection and local adaptation to urbanization in white-footed mice (Peromyscus leucopus).

    PubMed

    Harris, Stephen E; Munshi-South, Jason

    2017-11-01

    Urbanization significantly alters natural ecosystems and has accelerated globally. Urban wildlife populations are often highly fragmented by human infrastructure, and isolated populations may adapt in response to local urban pressures. However, relatively few studies have identified genomic signatures of adaptation in urban animals. We used a landscape genomic approach to examine signatures of selection in urban populations of white-footed mice (Peromyscus leucopus) in New York City. We analysed 154,770 SNPs identified from transcriptome data from 48 P. leucopus individuals from three urban and three rural populations and used outlier tests to identify evidence of urban adaptation. We accounted for demography by simulating a neutral SNP data set under an inferred demographic history as a null model for outlier analysis. We also tested whether candidate genes were associated with environmental variables related to urbanization. In total, we detected 381 outlier loci and after stringent filtering, identified and annotated 19 candidate loci. Many of the candidate genes were involved in metabolic processes and have well-established roles in metabolizing lipids and carbohydrates. Our results indicate that white-footed mice in New York City are adapting at the biomolecular level to local selective pressures in urban habitats. Annotation of outlier loci suggests selection is acting on metabolic pathways in urban populations, likely related to novel diets in cities that differ from diets in less disturbed areas. © 2017 John Wiley & Sons Ltd.

  2. Accounting for regional background and population size in the detection of spatial clusters and outliers using geostatistical filtering and spatial neutral models: the case of lung cancer in Long Island, New York

    PubMed Central

    Goovaerts, Pierre; Jacquez, Geoffrey M

    2004-01-01

    Background: Complete Spatial Randomness (CSR) is the null hypothesis employed by many statistical tests for spatial pattern, such as local cluster or boundary analysis. CSR is however not a relevant null hypothesis for highly complex and organized systems such as those encountered in the environmental and health sciences in which underlying spatial pattern is present. This paper presents a geostatistical approach to filter the noise caused by spatially varying population size and to generate spatially correlated neutral models that account for regional background obtained by geostatistical smoothing of observed mortality rates. These neutral models were used in conjunction with the local Moran statistics to identify spatial clusters and outliers in the geographical distribution of male and female lung cancer in Nassau, Queens, and Suffolk counties, New York, USA. Results: We developed a typology of neutral models that progressively relaxes the assumptions of null hypotheses, allowing for the presence of spatial autocorrelation, non-uniform risk, and incorporation of spatially heterogeneous population sizes. Incorporation of spatial autocorrelation led to fewer significant ZIP codes than found in previous studies, confirming earlier claims that CSR can lead to over-identification of the number of significant spatial clusters or outliers. Accounting for population size through geostatistical filtering increased the size of clusters while removing most of the spatial outliers. Integration of regional background into the neutral models yielded substantially different spatial clusters and outliers, leading to the identification of ZIP codes where SMR values significantly depart from their regional background. Conclusion: The approach presented in this paper enables researchers to assess geographic relationships using appropriate null hypotheses that account for the background variation extant in real-world systems. In particular, this new methodology allows one to identify geographic pattern above and beyond background variation. The implementation of this approach in spatial statistical software will facilitate the detection of spatial disparities in mortality rates, establishing the rationale for targeted cancer control interventions, including consideration of health services needs, and resource allocation for screening and diagnostic testing. It will allow researchers to systematically evaluate how sensitive their results are to assumptions implicit under alternative null hypotheses. PMID:15272930
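
    The local Moran statistic mentioned above can be sketched for a toy layout. This is a minimal illustration under stated assumptions: row-standardized binary neighbour weights, invented rates for five areas arranged in a line, and no significance testing. The paper's contribution lies in the null (neutral) models used to judge significance, which this sketch does not attempt.

```python
import statistics

def local_morans_i(values, neighbors):
    """Local Moran's I per unit; neighbors[i] lists the indices adjacent to unit i."""
    mean = statistics.fmean(values)
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / len(z)  # second moment of the deviations
    result = []
    for i, nbrs in enumerate(neighbors):
        # Row-standardized weights: the spatial lag is the neighbours' mean deviation.
        lag = sum(z[j] for j in nbrs) / len(nbrs)
        result.append(z[i] * lag / m2)
    return result

# Hypothetical rates for five areas in a row; area 2 is high among low neighbours.
rates = [10.0, 11.0, 30.0, 9.0, 10.0]
adjacency = [[1], [0, 2], [1, 3], [2, 4], [3]]
local_i = local_morans_i(rates, adjacency)
print(local_i)
```

    A strongly negative value (area 2 here) marks a unit unlike its neighbours, a candidate spatial outlier, while positive values mark units similar to their neighbours, candidate cluster members.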

  3. Privacy Preserving Nearest Neighbor Search

    NASA Astrophysics Data System (ADS)

    Shaneck, Mark; Kim, Yongdae; Kumar, Vipin

    Data mining is frequently obstructed by privacy concerns. In many cases data is distributed, and bringing the data together in one place for analysis is not possible due to privacy laws (e.g. HIPAA) or policies. Privacy preserving data mining techniques have been developed to address this issue by providing mechanisms to mine the data while giving certain privacy guarantees. In this chapter we address the issue of privacy preserving nearest neighbor search, which forms the kernel of many data mining applications. To this end, we present a novel algorithm based on secure multiparty computation primitives to compute the nearest neighbors of records in horizontally distributed data. We show how this algorithm can be used in three important data mining algorithms, namely LOF outlier detection, SNN clustering, and kNN classification. We prove the security of these algorithms under the semi-honest adversarial model, and describe methods that can be used to optimize their performance. Keywords: Privacy Preserving Data Mining, Nearest Neighbor Search, Outlier Detection, Clustering, Classification, Secure Multiparty Computation

  4. Dysmorphometrics: the modelling of morphological abnormalities.

    PubMed

    Claes, Peter; Daniels, Katleen; Walters, Mark; Clement, John; Vandermeulen, Dirk; Suetens, Paul

    2012-02-06

    The study of typical morphological variations using quantitative, morphometric descriptors has always interested biologists in general. However, unusual examples of form, such as abnormalities are often encountered in biomedical sciences. Despite the long history of morphometrics, the means to identify and quantify such unusual form differences remains limited. A theoretical concept, called dysmorphometrics, is introduced augmenting current geometric morphometrics with a focus on identifying and modelling form abnormalities. Dysmorphometrics applies the paradigm of detecting form differences as outliers compared to an appropriate norm. To achieve this, the likelihood formulation of landmark superimpositions is extended with outlier processes explicitly introducing a latent variable coding for abnormalities. A tractable solution to this augmented superimposition problem is obtained using Expectation-Maximization. The topography of detected abnormalities is encoded in a dysmorphogram. We demonstrate the use of dysmorphometrics to measure abrupt changes in time, asymmetry and discordancy in a set of human faces presenting with facial abnormalities. The results clearly illustrate the unique power to reveal unusual form differences given only normative data with clear applications in both biomedical practice & research.

  5. Adaptive population divergence and directional gene flow across steep elevational gradients in a climate‐sensitive mammal

    USGS Publications Warehouse

    Waterhouse, Matthew D.; Erb, Liesl P.; Beever, Erik; Russello, Michael A.

    2018-01-01

    The American pika is a thermally sensitive, alpine lagomorph species. Recent climate-associated population extirpations and genetic signatures of reduced population sizes range-wide indicate the viability of this species is sensitive to climate change. To test for potential adaptive responses to climate stress, we sampled pikas along two elevational gradients (each ~470 to 1640 m) and employed three outlier detection methods, BAYESCAN, LFMM, and BAYPASS, to scan for genotype-environment associations in samples genotyped at 30,763 SNP loci. We resolved 173 loci with robust evidence of natural selection detected by either two independent analyses or replicated in both transects. A BLASTN search of these outlier loci revealed several genes associated with metabolic function and oxygen transport, indicating natural selection from thermal stress and hypoxia. We also found evidence of directional gene flow primarily downslope from large high-elevation populations and reduced gene flow at outlier loci, a pattern suggesting potential impediments to the upward elevational movement of adaptive alleles in response to contemporary climate change. Finally, we documented evidence of reduced genetic diversity associated with the south-facing transect and an increase in corticosterone stress levels associated with inbreeding. This study suggests the American pika is already undergoing climate-associated natural selection at multiple genomic regions. Further analysis is needed to determine if the rate of climate adaptation in the American pika and other thermally sensitive species will be able to keep pace with rapidly changing climate conditions.

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nun, Isadora; Pichara, Karim; Protopapas, Pavlos

    The development of synoptic sky surveys has led to a massive amount of data for which resources needed for analysis are beyond human capabilities. In order to process this information and to extract all possible knowledge, machine learning techniques become necessary. Here we present a new methodology to automatically discover unknown variable objects in large astronomical catalogs. With the aim of taking full advantage of all information we have about known objects, our method is based on a supervised algorithm. In particular, we train a random forest classifier using known variability classes of objects and obtain votes for each of the objects in the training set. We then model this voting distribution with a Bayesian network and obtain the joint voting distribution among the training objects. Consequently, an unknown object is considered as an outlier insofar as it has a low joint probability. By leaving out one of the classes on the training set, we perform a validity test and show that when the random forest classifier attempts to classify unknown light curves (the class left out), it votes with an unusual distribution among the classes. This rare voting is detected by the Bayesian network and expressed as a low joint probability. Our method is suitable for exploring massive data sets given that the training process is performed offline. We tested our algorithm on 20 million light curves from the MACHO catalog and generated a list of anomalous candidates. After analysis, we divided the candidates into two main classes of outliers: artifacts and intrinsic outliers. Artifacts were principally due to air mass variation, seasonal variation, bad calibration, or instrumental errors and were consequently removed from our outlier list and added to the training set. After retraining, we selected about 4000 objects, which we passed to a post-analysis stage by performing a cross-match with all publicly available catalogs. Within these candidates we identified certain known but rare objects such as eclipsing Cepheids, blue variables, cataclysmic variables, and X-ray sources. For some outliers there was no additional information. Among them we identified three unknown variability types and a few individual outliers that will be followed up in order to perform a deeper analysis.

  7. 42 CFR 413.237 - Outliers.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ...-only drugs effective January 1, 2014. (2) Adult predicted ESRD outlier services Medicare allowable... furnished to an adult beneficiary by an ESRD facility. (3) Pediatric predicted ESRD outlier services... outlier services furnished to a pediatric beneficiary by an ESRD facility. (4) Adult fixed dollar loss...

  8. Identification and influence of spatio-temporal outliers in urban air quality measurements.

    PubMed

    O'Leary, Brendan; Reiners, John J; Xu, Xiaohong; Lemke, Lawrence D

    2016-12-15

    Forty-eight potential outliers in air pollution measurements taken simultaneously in Detroit, Michigan, USA and Windsor, Ontario, Canada in 2008 and 2009 were identified using four independent methods: box plots, variogram clouds, difference maps, and the Local Moran's I statistic. These methods were subsequently used in combination to reduce and select a final set of 13 outliers for nitrogen dioxide (NO2), volatile organic compounds (VOCs), total benzene, toluene, ethyl benzene, and xylene (BTEX), and particulate matter in two size fractions (PM2.5 and PM10). The selected outliers were excluded from the measurement datasets and used to revise air pollution models. In addition, a set of temporally-scaled air pollution models was generated using time series measurements from community air quality monitors, with and without the selected outliers. The influence of outlier exclusion on associations with asthma exacerbation rates aggregated at a postal zone scale in both cities was evaluated. Results demonstrate that the inclusion or exclusion of outliers influences the strength of observed associations between intraurban air quality and asthma exacerbation in both cities. The box plot, variogram cloud, and difference map methods largely determined the final list of outliers, due to the high degree of conformity among their results. The Moran's I approach was not useful for outlier identification in the datasets studied. Removing outliers changed the spatial distribution of modeled concentration values and derivative exposure estimates averaged over postal zones. Overall, associations between air pollution and acute asthma exacerbation rates were weaker with outliers removed, but improved with the addition of temporal information. Decreases in statistically significant associations between air pollution and asthma resulted, in part, from smaller pollutant concentration ranges used for linear regression. Nevertheless, the practice of identifying outliers through congruence among multiple methods strengthens confidence in the analysis of outlier presence and influence in environmental datasets. Copyright © 2016 The Authors. Published by Elsevier B.V. All rights reserved.

  9. Treatment of Outliers via Interpolation Method with Neural Network Forecast Performances

    NASA Astrophysics Data System (ADS)

    Wahir, N. A.; Nor, M. E.; Rusiman, M. S.; Gopal, K.

    2018-04-01

    Outliers often lurk in many datasets, especially in real data. Such anomalous data can negatively affect statistical analyses, primarily normality, variance, and estimation aspects. Hence, handling the occurrences of outliers requires special attention. It is therefore important to determine suitable ways of treating outliers so as to ensure that the quality of the analyzed data is high. This paper discusses an alternative method of treating outliers via linear interpolation. Treating an outlier as a missing value in the dataset allows the interpolation method to be applied to the outliers, thus enabling the comparison of data series using forecast accuracy before and after outlier treatment. The monthly time series of Malaysian tourist arrivals from January 1998 until December 2015 was used to interpolate the new series. The results indicated that the series improved by linear interpolation gave better results than the original time series data in forecasting with both Box-Jenkins and neural network approaches.
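
    The treatment described above, in which a flagged outlier is handled as a missing value and filled by linear interpolation, can be sketched as follows. The function name, the flags, and the arrival counts are hypothetical; how the outliers are detected in the first place is outside this sketch.

```python
def interpolate_outliers(series, is_outlier):
    """Replace flagged points by linear interpolation between the nearest unflagged points."""
    good = [i for i, flag in enumerate(is_outlier) if not flag]
    if not good:
        raise ValueError("at least one unflagged observation is required")
    result = list(series)
    for i, flag in enumerate(is_outlier):
        if not flag:
            continue
        left = max((g for g in good if g < i), default=None)
        right = min((g for g in good if g > i), default=None)
        if left is None or right is None:
            # No good point on one side: fall back to the nearest good value.
            result[i] = series[right if left is None else left]
        else:
            w = (i - left) / (right - left)
            result[i] = series[left] + w * (series[right] - series[left])
    return result

# Hypothetical monthly arrivals (thousands) with one flagged spike.
arrivals = [100.0, 110.0, 500.0, 130.0, 140.0]
flags = [False, False, True, False, False]
treated = interpolate_outliers(arrivals, flags)
print(treated)
```

    The treated series can then be passed to the forecasting models in place of the original, which is the before-and-after comparison the abstract describes.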

  10. Applying knowledge engineering and representation methods to improve support vector machine and multivariate probabilistic neural network CAD performance

    NASA Astrophysics Data System (ADS)

    Land, Walker H., Jr.; Anderson, Frances; Smith, Tom; Fahlbusch, Stephen; Choma, Robert; Wong, Lut

    2005-04-01

    Achieving consistent and correct database cases is crucial to the correct evaluation of any computer-assisted diagnostic (CAD) paradigm. This paper describes the application of artificial intelligence (AI), knowledge engineering (KE) and knowledge representation (KR) to a data set of ~2500 cases from six separate hospitals, with the objective of removing/reducing inconsistent outlier data. Several support vector machine (SVM) kernels were used to measure diagnostic performance of the original and a "cleaned" data set. Specifically, KE and ER principles were applied to the two data sets which were re-examined with respect to the environment and agents. One data set was found to contain 25 non-characterizable sets. The other data set contained 180 non-characterizable sets. CAD system performance was measured with both the original and "cleaned" data sets using two SVM kernels as well as a multivariate probabilistic neural network (PNN). Results demonstrated: (i) a 10% average improvement in overall Az and (ii) approximately a 50% average improvement in partial Az.

  11. Chemical quality of bottom sediments in selected streams, Jefferson County, Kentucky, April-July 1992

    USGS Publications Warehouse

    Moore, B.L.; Evaldi, R.D.

    1995-01-01

    Bottom sediments from 25 stream sites in Jefferson County, Ky., were analyzed for percent volatile solids and concentrations of nutrients, major metals, trace elements, miscellaneous inorganic compounds, and selected organic compounds. Statistical high outliers of the constituent concentrations analyzed for in the bottom sediments were defined as a measure of possible elevated concentrations. Statistical high outliers were determined for at least 1 constituent at each of 12 sampling sites in Jefferson County. Of the 10 stream basins sampled in Jefferson County, the Middle Fork Beargrass Basin, Cedar Creek Basin, and Harrods Creek Basin were the only three basins where a statistical high outlier was not found for any of the measured constituents. In the Pennsylvania Run Basin, total volatile solids, nitrate plus nitrite, and endrin constituents were statistical high outliers. Pond Creek was the only basin where five constituents were statistical high outliers: barium, beryllium, cadmium, chromium, and silver. Nitrate plus nitrite and copper constituents were the only statistical high outliers found in the Mill Creek Basin. In the Floyds Fork Basin, nitrate plus nitrite, phosphorus, mercury, and silver constituents were the only statistical high outliers. Ammonia was the only statistical high outlier found in the South Fork Beargrass Basin. In the Goose Creek Basin, mercury and silver constituents were the only statistical high outliers. Cyanide was the only statistical high outlier in the Muddy Fork Basin.

  12. Influence of outliers on accuracy estimation in genomic prediction in plant breeding.

    PubMed

    Estaghvirou, Sidi Boubacar Ould; Ogutu, Joseph O; Piepho, Hans-Peter

    2014-10-01

    Outliers often pose problems in analyses of data in plant breeding, but their influence on the performance of methods for estimating predictive accuracy in genomic prediction studies has not yet been evaluated. Here, we evaluate the influence of outliers on the performance of methods for accuracy estimation in genomic prediction studies using simulation. We simulated 1000 datasets for each of 10 scenarios to evaluate the influence of outliers on the performance of seven methods for estimating accuracy. These scenarios are defined by the number of genotypes, marker effect variance, and magnitude of outliers. To mimic outliers, we added to one observation in each simulated dataset, in turn, 5-, 8-, and 10-times the error SD used to simulate small and large phenotypic datasets. The effect of outliers on accuracy estimation was evaluated by comparing deviations in the estimated and true accuracies for datasets with and without outliers. Outliers adversely influenced accuracy estimation, more so at small values of genetic variance or number of genotypes. A method for estimating heritability and predictive accuracy in plant breeding and another used to estimate accuracy in animal breeding were the most accurate and resistant to outliers across all scenarios and are therefore preferable for accuracy estimation in genomic prediction studies. The performances of the other five methods that use cross-validation were less consistent and varied widely across scenarios. The computing time for the methods increased as the size of outliers and sample size increased and the genetic variance decreased. Copyright © 2014 Ould Estaghvirou et al.

  13. Hybrid online sensor error detection and functional redundancy for systems with time-varying parameters.

    PubMed

    Feng, Jianyuan; Turksoy, Kamuran; Samadi, Sediqeh; Hajizadeh, Iman; Littlejohn, Elizabeth; Cinar, Ali

    2017-12-01

    Supervision and control systems rely on signals from sensors to receive information to monitor the operation of a system and adjust manipulated variables to achieve the control objective. However, sensor performance is often limited by working conditions, and sensors may also be subject to interference from other devices. Many different types of sensor errors such as outliers, missing values, drifts and corruption with noise may occur during process operation. A hybrid online sensor error detection and functional redundancy system is developed to detect errors in online signals, and replace erroneous or missing values detected with model-based estimates. The proposed hybrid system relies on two techniques, an outlier-robust Kalman filter (ORKF) and a locally-weighted partial least squares (LW-PLS) regression model, which leverage the advantages of automatic measurement error elimination with ORKF and data-driven prediction with LW-PLS. The system includes a nominal angle analysis (NAA) method to distinguish between signal faults and large changes in sensor values caused by real dynamic changes in process operation. The performance of the system is illustrated with clinical data from continuous glucose monitoring (CGM) sensors of people with type 1 diabetes. More than 50,000 CGM sensor errors were added to original CGM signals from 25 clinical experiments, and the performance of the error detection and functional redundancy algorithms was then analyzed. The results indicate that the proposed system can successfully detect most of the erroneous signals and substitute them with reasonable estimated values computed by the functional redundancy system.

  14. TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees.

    PubMed

    Mai, Uyen; Mirarab, Siavash

    2018-05-08

    Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically errors are detected at the sequence level, but when missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. We propose an automatic method to detect such errors. We build a phylogeny including all the data then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks to find k leaves that could be removed to maximally reduce the tree diameter. We present an algorithm to find the exact solution for this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phylogenomic biological datasets and an HIV dataset and show that the method successfully detects and removes long branches. TreeShrink removes sequences more conservatively than rogue taxon removal and often reduces gene tree discordance more than rogue taxon removal once the amount of filtering is controlled. TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees. The tool is publicly available at https://github.com/uym2/TreeShrink .

  15. On damage detection in wind turbine gearboxes using outlier analysis

    NASA Astrophysics Data System (ADS)

    Antoniadou, Ifigeneia; Manson, Graeme; Dervilis, Nikolaos; Staszewski, Wieslaw J.; Worden, Keith

    2012-04-01

    The proportion of worldwide installed wind power in power systems increases over the years as a result of the steadily growing interest in renewable energy sources. Still, the advantages offered by the use of wind power are overshadowed by the high operational and maintenance costs, resulting in the low competitiveness of wind power in the energy market. In order to reduce the costs of corrective maintenance, the application of condition monitoring to gearboxes becomes highly important, since gearboxes are among the wind turbine components with the most frequent failure observations. While condition monitoring of gearboxes in general is common practice, with various methods having been developed over the last few decades, wind turbine gearbox condition monitoring faces a major challenge: the detection of faults under the time-varying load conditions prevailing in wind turbine systems. Classical time and frequency domain methods fail to detect faults under variable load conditions, due to the temporary effect that these faults have on vibration signals. This paper uses the statistical discipline of outlier analysis for the damage detection of gearbox tooth faults. A simplified two-degree-of-freedom gearbox model considering nonlinear backlash, time-periodic mesh stiffness and static transmission error, simulates the vibration signals to be analysed. Local stiffness reduction is used for the simulation of tooth faults and statistical processes determine the existence of intermittencies. The lowest level of fault detection, the threshold value, is considered and the Mahalanobis squared-distance is calculated for the novelty detection problem.
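
    The Mahalanobis squared-distance used here for novelty detection can be illustrated with a small two-feature sketch. Everything concrete below is an assumption for illustration: the `baseline` feature pairs are invented, the code is restricted to two features so the covariance inverse can be written out by hand, and the threshold of 9.21 (the 99% chi-square quantile for two degrees of freedom) is a common textbook choice rather than the threshold procedure used in the paper.

```python
def mean_and_cov_2d(samples):
    """Sample mean and unbiased covariance of a list of (x, y) pairs."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    sxx = sum((x - mx) ** 2 for x, _ in samples) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in samples) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in samples) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance d^T S^-1 d for a 2x2 covariance S."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    dx = point[0] - mean[0]
    dy = point[1] - mean[1]
    # Closed-form inverse of a symmetric 2x2 matrix applied to (dx, dy).
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# Hypothetical features extracted from the undamaged (baseline) condition.
baseline = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0), (5.0, 5.0)]
mean, cov = mean_and_cov_2d(baseline)

THRESHOLD = 9.21  # assumed: chi-square 99% quantile, 2 degrees of freedom

for candidate in [(5.0, 5.0), (5.0, 1.0)]:
    d2 = mahalanobis_sq(candidate, mean, cov)
    print(candidate, round(d2, 3), "novel" if d2 > THRESHOLD else "normal")
```

    The two candidates lie at the same Euclidean distance from the baseline mean, but only the one that runs against the positive correlation of the baseline features is flagged, which is the property that makes the measure useful for multivariate damage features.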

  16. Patient-specific instrumentation improved mechanical alignment, while early clinical outcome was comparable to conventional instrumentation in TKA.

    PubMed

    Anderl, Werner; Pauzenberger, Leo; Kölblinger, Roman; Kiesselbach, Gabriele; Brandl, Georg; Laky, Brenda; Kriegleder, Bernhard; Heuberer, Philipp; Schwameis, Eva

    2016-01-01

    The aim of this prospective study was to compare early clinical outcome, radiological limb alignment, and three-dimensional (3D)-component positioning between conventional and computed tomography (CT)-based patient-specific instrumentation (PSI) in primary mobile-bearing total knee arthroplasty (TKA). Two hundred ninety consecutive patients (300 knees) with severe, debilitating osteoarthritis scheduled for TKA were included in this study using either conventional instrumentation (CVI, n = 150) or PSI (n = 150). Patients were clinically assessed before and 2 years after surgery according to the Knee-Society-Score (KSS) and the visual-analog-scale for pain (VAS). Additionally, the Western Ontario McMaster Universities Osteoarthritis Index (WOMAC) and the Oxford-Knee-Score (OKS) were collected at follow-up. To evaluate accuracy of CVI and PSI, hip-knee-ankle angle (HKA) and 3D-component positioning were assessed on postoperative radiographs and CT. Data of 222 knees (CVI: n = 108, PSI: n = 114) were available for analysis after a mean follow-up of 28.6 ± 5.2 months. At the early follow-up, clinical outcome (KSS, VAS, WOMAC, OKS) was comparable between the two groups. Mean HKA-deviation from the targeted neutral mechanical axis (CVI: 2.2° ± 1.7°; PSI: 1.5° ± 1.4°; p < 0.001), rates of outliers (CVI: 22.2%; PSI: 9.6%; p = 0.016), and 3D-component positioning outliers were significantly lower in the PSI group. Non-outliers (HKA: 180° ± 3°) showed better clinical results than outliers at the 2-year follow-up. CT-based PSI compared with CVI improves accuracy of mechanical alignment restoration and 3D-component positioning in primary TKA. While clinical outcome was comparable between the two instrumentation groups at early follow-up, significantly inferior outcome was detected in the subgroup of HKA-outliers. Prospective comparative study, Level II.

  17. Outlier identification in urban soils and its implications for identification of potential contaminated land

    NASA Astrophysics Data System (ADS)

    Zhang, Chaosheng

    2010-05-01

    Outliers in urban soil geochemical databases may imply potential contaminated land. Different methodologies which can be easily implemented for the identification of global and spatial outliers were applied for Pb concentrations in urban soils of Galway City in Ireland. Due to its strongly skewed probability feature, a Box-Cox transformation was performed prior to further analyses. The graphic methods of histogram and box-and-whisker plot were effective in identification of global outliers at the original scale of the dataset. Spatial outliers could be identified by a local indicator of spatial association of local Moran's I, cross-validation of kriging, and a geographically weighted regression. The spatial locations of outliers were visualised using a geographical information system. Different methods showed generally consistent results, but differences existed. It is suggested that outliers identified by statistical methods should be confirmed and justified using scientific knowledge before they are properly dealt with.
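
    Two of the steps named in this abstract, the Box-Cox transformation and box-and-whisker screening for global outliers, can be sketched as follows. This is only an illustration under stated assumptions: the `pb` concentrations are invented, the abstract gives no value of the Box-Cox parameter so `lam` is left to the caller, and the 1.5 x IQR whisker rule with linearly interpolated quartiles is the conventional choice rather than one confirmed by the source.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform: log(x) when lam == 0, else (x**lam - 1) / lam."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def quantile(sorted_values, q):
    """Quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (pos - lo) * (sorted_values[hi] - sorted_values[lo])


def boxplot_outliers(values, k=1.5):
    """Indices of points beyond the whiskers Q1 - k*IQR and Q3 + k*IQR."""
    ordered = sorted(values)
    q1, q3 = quantile(ordered, 0.25), quantile(ordered, 0.75)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical Pb concentrations (mg/kg) with one strongly elevated site.
pb = [20.0, 25.0, 30.0, 35.0, 40.0, 45.0, 50.0, 60.0, 400.0]

print("global outliers, original scale:", boxplot_outliers(pb))
print("global outliers, log scale:", boxplot_outliers(box_cox(pb, 0)))
```

    Screening on the original scale follows the abstract's description of the graphic methods; the transformed values would feed the later spatial analyses, which are not sketched here.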

  18. Outliers: A Potential Data Problem.

    ERIC Educational Resources Information Center

    Douzenis, Cordelia; Rakow, Ernest A.

    Outliers, extreme data values relative to others in a sample, may distort statistics that assume interval levels of measurement and normal distribution. The outlier may be a valid value or an error. Several procedures are available for identifying outliers, and each may be applied to errors of prediction from the regression lines for utility in a…

  19. Comparison of cardiac TnI outliers using a contemporary and a high-sensitivity assay on the Abbott Architect platform.

    PubMed

    Ryan, J B; Southby, S J; Stuart, L A; Mackay, R; Florkowski, C M; George, P M

    2014-07-01

    Assays for cardiac troponin (cTn) have undergone improvements in sensitivity and precision in recent years. Increased rates of outliers, however, have been reported on various cTn platforms, typically giving irreproducible, falsely higher results. We aimed to evaluate the outlier rate occurring in patients with elevated cTnI using a contemporary and a high-sensitivity assay. All elevated cTnI results (up to 300 ng/L) obtained over a 21-month period were assayed in duplicate. A contemporary assay (Abbott STAT Troponin-I) was used for the first part of the study and subsequently a high-sensitivity assay (Abbott STAT High-Sensitive Troponin-I) was used. Outliers exceeded a calculated critical difference (CD = z × √2 × SD_analytical, where z = 3.5 for a probability of 0.0005), and critical outliers were additionally on different sides of the decision level. The respective outlier and critical outlier rates were 0.22% and 0.10% for the contemporary assay (n = 4009) and 0.18% and 0.13% for the high-sensitivity assay (n = 3878). There was no significant reduction in outlier rate between the two assays (χ² = 0.034, P = 0.854). Fifty-six percent of outliers occurred in samples where cTn was an 'add-on' test (and was stored and refrigerated prior to assay). Despite recent improvements in cTn methods, outliers (including critical outliers) still occur at a low rate in both a contemporary and a high-sensitivity cTnI assay. Laboratory and clinical staff should be aware of this potential analytical error, particularly in samples with suboptimal sample handling such as add-on tests. © The Author(s) 2014.
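
    The critical-difference rule quoted in this abstract is simple enough to write out. The sketch below applies CD = z × √2 × SD_analytical with z = 3.5 to duplicate results and separates ordinary from critical outliers. The analytical SD of 2.0 ng/L, the decision level of 26 ng/L, and the duplicate pairs are hypothetical values chosen for illustration, and a single fixed SD is a simplification, since analytical imprecision normally varies with concentration.

```python
import math


def critical_difference(sd_analytical, z=3.5):
    """CD = z * sqrt(2) * SD_analytical, as stated in the abstract."""
    return z * math.sqrt(2.0) * sd_analytical


def classify_duplicate(first, second, sd_analytical, decision_level, z=3.5):
    """Classify a duplicate pair as concordant, outlier, or critical outlier.

    An outlier is a pair whose difference exceeds the CD; it is critical
    when the two results also fall on different sides of the decision level.
    """
    if abs(first - second) <= critical_difference(sd_analytical, z):
        return "concordant"
    opposite_sides = (first >= decision_level) != (second >= decision_level)
    return "critical outlier" if opposite_sides else "outlier"


SD_ANALYTICAL = 2.0    # ng/L, hypothetical
DECISION_LEVEL = 26.0  # ng/L, hypothetical

for pair in [(40.0, 45.0), (40.0, 60.0), (20.0, 45.0)]:
    print(pair, classify_duplicate(*pair, SD_ANALYTICAL, DECISION_LEVEL))
```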

  20. Assessing significance in a Markov chain without mixing.

    PubMed

    Chikina, Maria; Frieze, Alan; Pegden, Wesley

    2017-03-14

    We present a statistical test to detect that a presented state of a reversible Markov chain was not chosen from a stationary distribution. In particular, given a value function for the states of the Markov chain, we would like to show rigorously that the presented state is an outlier with respect to the values, by establishing a p value under the null hypothesis that it was chosen from a stationary distribution of the chain. A simple heuristic used in practice is to sample ranks of states from long random trajectories on the Markov chain and compare these with the rank of the presented state; if the presented state is a 0.1% outlier compared with the sampled ranks (its rank is in the bottom 0.1% of sampled ranks), then this observation should correspond to a p value of 0.001. This significance is not rigorous, however, without good bounds on the mixing time of the Markov chain. Our test is the following: Given the presented state in the Markov chain, take a random walk from the presented state for any number of steps. We prove that observing that the presented state is an ε-outlier on the walk is significant at p = √(2ε) under the null hypothesis that the state was chosen from a stationary distribution. We assume nothing about the Markov chain beyond reversibility and show that significance at p ≈ √ε is best possible in general. We illustrate the use of our test with a potential application to the rigorous detection of gerrymandering in Congressional districting.

  1. Adaptive population divergence and directional gene flow across steep elevational gradients in a climate-sensitive mammal.

    PubMed

    Waterhouse, Matthew D; Erb, Liesl P; Beever, Erik A; Russello, Michael A

    2018-06-01

    The ecological effects of climate change have been shown in most major taxonomic groups; however, the evolutionary consequences are less well-documented. Adaptation to new climatic conditions offers a potential long-term mechanism for species to maintain viability in rapidly changing environments, but mammalian examples remain scarce. The American pika (Ochotona princeps) has been impacted by recent climate-associated extirpations and range-wide reductions in population sizes, establishing it as a sentinel mammalian species for climate change. To investigate evidence for local adaptation and reconstruct patterns of genomic diversity and gene flow across rapidly changing environments, we used a space-for-time design and restriction site-associated DNA sequencing to genotype American pikas along two steep elevational gradients at 30,966 SNPs and employed independent outlier detection methods that scanned for genotype-environment associations. We identified 338 outlier SNPs detected by two separate analyses and/or replicated in both transects, several of which were annotated to genes involved in metabolic function and oxygen transport. Additionally, we found evidence of directional gene flow primarily downslope from high-elevation populations, along with reduced gene flow at outlier loci. If this trend continues, elevational range contractions in American pikas will likely be from local extirpation rather than upward movement of low-elevation individuals; this, in turn, could limit the potential for adaptation within this landscape. These findings are of particular relevance for future conservation and management of American pikas and other elevationally restricted, thermally sensitive species. © 2018 John Wiley & Sons Ltd.

  2. Assessing significance in a Markov chain without mixing

    PubMed Central

    Chikina, Maria; Frieze, Alan; Pegden, Wesley

    2017-01-01

    We present a statistical test to detect that a presented state of a reversible Markov chain was not chosen from a stationary distribution. In particular, given a value function for the states of the Markov chain, we would like to show rigorously that the presented state is an outlier with respect to the values, by establishing a p value under the null hypothesis that it was chosen from a stationary distribution of the chain. A simple heuristic used in practice is to sample ranks of states from long random trajectories on the Markov chain and compare these with the rank of the presented state; if the presented state is a 0.1% outlier compared with the sampled ranks (its rank is in the bottom 0.1% of sampled ranks), then this observation should correspond to a p value of 0.001. This significance is not rigorous, however, without good bounds on the mixing time of the Markov chain. Our test is the following: Given the presented state in the Markov chain, take a random walk from the presented state for any number of steps. We prove that observing that the presented state is an ε-outlier on the walk is significant at p = √(2ε) under the null hypothesis that the state was chosen from a stationary distribution. We assume nothing about the Markov chain beyond reversibility and show that significance at p ≈ √ε is best possible in general. We illustrate the use of our test with a potential application to the rigorous detection of gerrymandering in Congressional districting. PMID:28246331
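
    The test described in this abstract can be sketched directly: walk from the presented state, find the fraction ε of visited states (the presented state included) whose value is no larger than its own, and report the bound with p taken as √(2ε). The sketch assumes that reading of "ε-outlier" and that form of the bound. The toy chain, a symmetric random walk on the integers 0 to 49 that rejects moves off either end, and the identity value function are hypothetical stand-ins for a real application.

```python
import math
import random


def epsilon_outlier_test(start, step, value, n_steps, rng):
    """Run n_steps of the chain from `start` and return (epsilon, p_bound).

    epsilon is the fraction of the n_steps + 1 visited states whose value
    is <= value(start); p_bound is sqrt(2 * epsilon), capped at 1.
    """
    v0 = value(start)
    state = start
    at_or_below = 1  # the presented state always counts itself
    for _ in range(n_steps):
        state = step(state, rng)
        if value(state) <= v0:
            at_or_below += 1
    epsilon = at_or_below / (n_steps + 1)
    return epsilon, min(1.0, math.sqrt(2.0 * epsilon))


def reflecting_walk_step(state, rng, size=50):
    """Symmetric +/-1 proposal, rejected at the ends. The transition matrix
    is symmetric, so the chain is reversible with a uniform stationary law."""
    proposal = state + rng.choice((-1, 1))
    return proposal if 0 <= proposal < size else state


rng = random.Random(0)
eps, p_bound = epsilon_outlier_test(0, reflecting_walk_step, lambda s: s, 10_000, rng)
print(f"epsilon = {eps:.4f}, p <= {p_bound:.4f}")
```

    Note that the walk length is a free choice: the bound holds for any number of steps, so no mixing-time estimate is needed, at the cost of a p value on the order of √ε rather than ε.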

  3. 75 FR 42835 - Medicare Program; Inpatient Rehabilitation Facility Prospective Payment System for Federal Fiscal...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2010-07-22

    ... estimated cost of the case exceeds the adjusted outlier threshold. We calculate the adjusted outlier... to 80 percent of the difference between the estimated cost of the case and the outlier threshold. In... Federal Prospective Payment Rates VI. Update to Payments for High-Cost Outliers under the IRF PPS A...

  4. Outlier Detection in High-Stakes Certification Testing. Research Report.

    ERIC Educational Resources Information Center

    Meijer, Rob R.

    Recent developments of person-fit analysis in computerized adaptive testing (CAT) are discussed. Methods from statistical process control are presented that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory (IRT) model in a CAT. Most person-fit research in CAT is restricted to…

  5. Outlier Detection in High-Stakes Certification Testing.

    ERIC Educational Resources Information Center

    Meijer, Rob R.

    2002-01-01

    Used empirical data from a certification test to study methods from statistical process control that have been proposed to classify an item score pattern as fitting or misfitting the underlying item response theory model in computerized adaptive testing. Results for 1,392 examinees show that different types of misfit can be distinguished. (SLD)

  6. Lesion identification using unified segmentation-normalisation models and fuzzy clustering

    PubMed Central

    Seghier, Mohamed L.; Ramlackhansingh, Anil; Crinion, Jenny; Leff, Alexander P.; Price, Cathy J.

    2008-01-01

    In this paper, we propose a new automated procedure for lesion identification from single images based on the detection of outlier voxels. We demonstrate the utility of this procedure using artificial and real lesions. The scheme rests on two innovations: First, we augment the generative model used for combined segmentation and normalization of images, with an empirical prior for an atypical tissue class, which can be optimised iteratively. Second, we adopt a fuzzy clustering procedure to identify outlier voxels in normalised gray and white matter segments. These two advances suppress misclassification of voxels and restrict lesion identification to gray/white matter lesions respectively. Our analyses show a high sensitivity for detecting and delineating brain lesions with different sizes, locations, and textures. Our approach has important implications for the generation of lesion overlap maps of a given population and the assessment of lesion-deficit mappings. From a clinical perspective, our method should help to compute the total volume of lesion or to trace precisely lesion boundaries that might be pertinent for surgical or diagnostic purposes. PMID:18482850

  7. Cycle bases to the rescue

    NASA Astrophysics Data System (ADS)

    Tóbiás, Roland; Furtenbacher, Tibor; Császár, Attila G.

    2017-12-01

    Cycle bases of graph theory are introduced for the analysis of transition data deposited in line-by-line rovibronic spectroscopic databases. The principal advantage of using cycle bases is that outlier transitions, which are almost always present in spectroscopic databases built from experimental data originating from many different sources, can be detected and identified straightforwardly and automatically. The data available for six water isotopologues, H₂¹⁶O, H₂¹⁷O, H₂¹⁸O, HD¹⁶O, HD¹⁷O, and HD¹⁸O, in the HITRAN2012 and GEISA2015 databases are used to demonstrate the utility of cycle-basis-based outlier-detection approaches. The spectroscopic databases appear to be sufficiently complete so that the great majority of the entries of the minimum cycle basis have the minimum possible length of four. More than 2000 transition conflicts have been identified for the isotopologue H₂¹⁶O in the HITRAN2012 database; the seven common conflict types are discussed. It is recommended to employ cycle bases, and especially a minimum cycle basis, for the analysis of transitions deposited in high-resolution spectroscopic databases.

  8. Improvement of statistical methods for detecting anomalies in climate and environmental monitoring systems

    NASA Astrophysics Data System (ADS)

    Yakunin, A. G.; Hussein, H. M.

    2018-01-01

    The article shows how known statistical methods, which are widely used in finance and in a number of other fields of science and technology, can be applied effectively, after minor modification, to problems in climate and environmental monitoring systems such as the detection of anomalies in the form of abrupt changes in signal levels, the occurrence of positive and negative outliers, and the violation of the cycle form in periodic processes.

  9. Single nucleotide polymorphisms unravel hierarchical divergence and signatures of selection among Alaskan sockeye salmon (Oncorhynchus nerka) populations.

    PubMed

    Gomez-Uchida, Daniel; Seeb, James E; Smith, Matt J; Habicht, Christopher; Quinn, Thomas P; Seeb, Lisa W

    2011-02-18

    Disentangling the roles of geography and ecology driving population divergence and distinguishing adaptive from neutral evolution at the molecular level have been common goals among evolutionary and conservation biologists. Using single nucleotide polymorphism (SNP) multilocus genotypes for 31 sockeye salmon (Oncorhynchus nerka) populations from the Kvichak River, Alaska, we assessed the relative roles of geography (discrete boundaries or continuous distance) and ecology (spawning habitat and timing) driving genetic divergence in this species at varying spatial scales within the drainage. We also evaluated two outlier detection methods to characterize candidate SNPs responding to environmental selection, emphasizing which mechanism(s) may maintain the genetic variation of outlier loci. For the entire drainage, Mantel tests suggested a greater role of geographic distance on population divergence than differences in spawn timing when each variable was correlated with pairwise genetic distances. Clustering and hierarchical analyses of molecular variance indicated that the largest genetic differentiation occurred between populations from distinct lakes or subdrainages. Within one population-rich lake, however, Mantel tests suggested a greater role of spawn timing than geographic distance on population divergence when each variable was correlated with pairwise genetic distances. Variable spawn timing among populations was linked to specific spawning habitats as revealed by principal coordinate analyses. We additionally identified two outlier SNPs located in the major histocompatibility complex (MHC) class II that appeared robust to violations of demographic assumptions from an initial pool of eight candidates for selection. First, our results suggest that geography and ecology have influenced genetic divergence between Alaskan sockeye salmon populations in a hierarchical manner depending on the spatial scale. Second, we found consistent evidence for diversifying selection in two loci located in the MHC class II by means of outlier detection methods; yet, alternative scenarios for the evolution of these loci were also evaluated. Both conclusions argue that historical contingency and contemporary adaptation have likely driven differentiation between Kvichak River sockeye salmon populations, as revealed by a suite of SNPs. Our findings highlight the need for conservation of complex population structure, because it provides resilience in the face of environmental change, both natural and anthropogenic.

  10. Single nucleotide polymorphisms unravel hierarchical divergence and signatures of selection among Alaskan sockeye salmon (Oncorhynchus nerka) populations

    PubMed Central

    2011-01-01

    Background: Disentangling the roles of geography and ecology driving population divergence and distinguishing adaptive from neutral evolution at the molecular level have been common goals among evolutionary and conservation biologists. Using single nucleotide polymorphism (SNP) multilocus genotypes for 31 sockeye salmon (Oncorhynchus nerka) populations from the Kvichak River, Alaska, we assessed the relative roles of geography (discrete boundaries or continuous distance) and ecology (spawning habitat and timing) driving genetic divergence in this species at varying spatial scales within the drainage. We also evaluated two outlier detection methods to characterize candidate SNPs responding to environmental selection, emphasizing which mechanism(s) may maintain the genetic variation of outlier loci. Results: For the entire drainage, Mantel tests suggested a greater role of geographic distance on population divergence than differences in spawn timing when each variable was correlated with pairwise genetic distances. Clustering and hierarchical analyses of molecular variance indicated that the largest genetic differentiation occurred between populations from distinct lakes or subdrainages. Within one population-rich lake, however, Mantel tests suggested a greater role of spawn timing than geographic distance on population divergence when each variable was correlated with pairwise genetic distances. Variable spawn timing among populations was linked to specific spawning habitats as revealed by principal coordinate analyses. We additionally identified two outlier SNPs located in the major histocompatibility complex (MHC) class II that appeared robust to violations of demographic assumptions from an initial pool of eight candidates for selection. Conclusions: First, our results suggest that geography and ecology have influenced genetic divergence between Alaskan sockeye salmon populations in a hierarchical manner depending on the spatial scale. Second, we found consistent evidence for diversifying selection in two loci located in the MHC class II by means of outlier detection methods; yet, alternative scenarios for the evolution of these loci were also evaluated. Both conclusions argue that historical contingency and contemporary adaptation have likely driven differentiation between Kvichak River sockeye salmon populations, as revealed by a suite of SNPs. Our findings highlight the need for conservation of complex population structure, because it provides resilience in the face of environmental change, both natural and anthropogenic. PMID:21332997

  11. Development of a computerized monitoring program to identify narcotic diversion in a pediatric anesthesia practice.

    PubMed

    Brenn, B Randall; Kim, Margaret A; Hilmas, Elora

    2015-08-15

    Development of an operational reporting dashboard designed to correlate data from multiple sources to help detect potential drug diversion by automated dispensing cabinet (ADC) users is described. A commercial business intelligence platform was used to create a dashboard tool for rapid detection of unusual patterns of ADC transactions by anesthesia service providers at a large pediatric hospital. By linking information from the hospital's pharmacy information management system (PIMS) and anesthesia information management system (AIMS) in an associative data model, the "narcotic reconciliation dashboard" can generate various reports to help spot outlier activity associated with ADC dispensing of controlled substances and documentation of medication waste processing. The dashboard's utility was evaluated by "back-testing" the program with historical data on an actual episode of diversion by an anesthesia provider that had not been detected through traditional methods of PIMS and AIMS data monitoring. Dashboard-generated reports on key metrics (e.g., ADC transaction counts, discrepancies in dispensed versus reconciled amounts of narcotics, PIMS-AIMS documentation mismatches) over various time frames during the period of known diversion clearly indicated the diverter's outlier status relative to other authorized ADC users. A dashboard program for correlating ADC transaction data with pharmacy and patient care data may be an effective tool for detecting patterns of ADC use that suggest drug diversion. Copyright © 2015 by the American Society of Health-System Pharmacists, Inc. All rights reserved.

  12. Outlier identification and visualization for Pb concentrations in urban soils and its implications for identification of potential contaminated land.

    PubMed

    Zhang, Chaosheng; Tang, Ya; Luo, Lin; Xu, Weilin

    2009-11-01

    Outliers in urban soil geochemical databases may imply potential contaminated land. Different methodologies which can be easily implemented for the identification of global and spatial outliers were applied for Pb concentrations in urban soils of Galway City in Ireland. Due to its strongly skewed probability feature, a Box-Cox transformation was performed prior to further analyses. The graphic methods of histogram and box-and-whisker plot were effective in identification of global outliers at the original scale of the dataset. Spatial outliers could be identified by a local indicator of spatial association of local Moran's I, cross-validation of kriging, and a geographically weighted regression. The spatial locations of outliers were visualised using a geographical information system. Different methods showed generally consistent results, but differences existed. It is suggested that outliers identified by statistical methods should be confirmed and justified using scientific knowledge before they are properly dealt with.

  13. Outliers to the peak energy-isotropic energy relation in gamma-ray bursts

    NASA Astrophysics Data System (ADS)

    Nakar, Ehud; Piran, Tsvi

    2005-06-01

    The peak energy-isotropic energy (EpEi) relation is among the most intriguing recent discoveries concerning gamma-ray bursts (GRBs). It can have numerous implications for our understanding of the emission mechanism of the bursts and for the application of GRBs to cosmological studies. However, this relation has been verified only for a small sample of bursts with measured redshifts. We propose here a test of whether a burst with an unknown redshift can potentially satisfy the EpEi relation. Applying this test to a large sample of BATSE bursts, we find that a significant fraction of those bursts cannot satisfy this relation. Our test is sensitive only to dim and hard bursts, and therefore this relation might still hold as an inequality (i.e. there are no intrinsically bright and soft bursts). We conclude that the observed relation seen in the sample of bursts with known redshift might be influenced by observational biases and the inability to locate and to localize well hard and weak bursts that have only a small number of photons. In particular, we point out that the threshold for detection, localization and redshift measurement is essentially higher than the threshold for detection alone. We predict that Swift will detect some hard and weak bursts that would be outliers to the EpEi relation. However, we cannot quantify this prediction. We stress the importance of understanding the detection-localization-redshift threshold for the coming Swift detections.

  14. Robust Kriged Kalman Filtering

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Baingana, Brian; Dall'Anese, Emiliano; Mateos, Gonzalo

    2015-11-11

    Although the kriged Kalman filter (KKF) has well-documented merits for prediction of spatial-temporal processes, its performance degrades in the presence of outliers due to anomalous events, or measurement equipment failures. This paper proposes a robust KKF model that explicitly accounts for presence of measurement outliers. Exploiting outlier sparsity, a novel l1-regularized estimator that jointly predicts the spatial-temporal process at unmonitored locations, while identifying measurement outliers is put forth. Numerical tests are conducted on a synthetic Internet protocol (IP) network, and real transformer load data. Test results corroborate the effectiveness of the novel estimator in joint spatial prediction and outlier identification.
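
    The abstract does not spell out the estimator, so the following is only a minimal sketch of the standard building block behind l1-regularized outlier identification, not the paper's algorithm. It assumes the spatial-temporal prediction is held fixed; under that assumption, minimizing 0.5*(y - p - o)^2 + lam*|o| over each outlier term o is solved by soft-thresholding the residual y - p. The function names, the sample values and `lam` are hypothetical.

```python
def soft_threshold(residual, lam):
    """Minimizer of 0.5*(residual - o)**2 + lam*abs(o) over o."""
    if residual > lam:
        return residual - lam
    if residual < -lam:
        return residual + lam
    return 0.0


def estimate_outliers(measurements, predictions, lam):
    """Sparse outlier estimates for fixed predictions (illustrative only)."""
    return [soft_threshold(y - p, lam) for y, p in zip(measurements, predictions)]


# Hypothetical data: the third sensor reading is far from its prediction.
measurements = [1.0, 1.1, 9.0, 0.9]
predictions = [1.0, 1.0, 1.0, 1.0]
outliers = estimate_outliers(measurements, predictions, lam=0.5)
flagged = [i for i, o in enumerate(outliers) if o != 0.0]
print(outliers, flagged)
```

    Small residuals are shrunk exactly to zero, which is what makes the outlier vector sparse; a full estimator would alternate a step like this with an update of the predictions.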

  15. Universal Linear Fit Identification: A Method Independent of Data, Outliers and Noise Distribution Model and Free of Missing or Removed Data Imputation.

    PubMed

    Adikaram, K K L B; Hussein, M A; Effenberger, M; Becker, T

    2015-01-01

    Data processing requires a robust linear fit identification method. In this paper, we introduce a non-parametric robust linear fit identification method for time series. The method uses an indicator 2/n to identify linear fit, where n is the number of terms in a series. For a perfectly linear series, the ratios Rmax = (amax - amin)/(Sn - amin*n) and Rmin = (amax - amin)/(amax*n - Sn) are always equal to 2/n, where amax is the maximum element, amin is the minimum element and Sn is the sum of all elements. If a series expected to follow y = c contains data that do not agree with the y = c form, Rmax > 2/n and Rmin > 2/n imply that the maximum and minimum elements, respectively, do not agree with the linear fit. We define threshold values for outlier and noise detection as 2/n * (1 + k1) and 2/n * (1 + k2), respectively, where k1 > k2 and 0 ≤ k1 ≤ n/2 - 1. Given this relation and a transformation technique that transforms data into the form y = c, we show that removing all data that do not agree with the linear fit is possible. Furthermore, the method is independent of the number of data points, missing data, removed data points and nature of distribution (Gaussian or non-Gaussian) of outliers, noise and clean data. These are major advantages over the existing linear fit methods. Since having a perfect linear relation between two variables in the real world is impossible, we used artificial data sets with extreme conditions to verify the method. The method detects the correct linear fit when the percentage of data agreeing with linear fit is less than 50%, and the deviation of data that do not agree with linear fit is very small, of the order of ±10^-4%. The method results in incorrect detections only when numerical accuracy is insufficient in the calculation process.
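
    As a small numerical illustration of the 2/n indicator, the sketch below computes both ratios for an evenly spaced series and for the same series with its largest element replaced by a spike. The function name and the sample series are hypothetical, and the thresholds k1 and k2 and the transformation to the y = c form are not reproduced.

```python
def linear_fit_ratios(series):
    """Return (r_max, r_min, 2/n) as defined in the abstract.

    Raises ZeroDivisionError for a constant series, where both
    denominators vanish; that case is not handled in this sketch.
    """
    n = len(series)
    a_max, a_min, s_n = max(series), min(series), sum(series)
    spread = a_max - a_min
    r_max = spread / (s_n - a_min * n)
    r_min = spread / (a_max * n - s_n)
    return r_max, r_min, 2 / n


linear = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # evenly spaced: both ratios equal 2/n
spiked = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # maximum element breaks the pattern

for name, series in (("linear", linear), ("spiked", spiked)):
    r_max, r_min, ref = linear_fit_ratios(series)
    print(name, round(r_max, 4), round(r_min, 4), ref)
```

    For the spiked series only the ratio tied to the maximum rises above 2/n, which matches the abstract's statement that Rmax > 2/n points at the maximum element.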

  16. Simulation of a Geiger-Mode Imaging LADAR System for Performance Assessment

    PubMed Central

    Kim, Seongjoon; Lee, Impyeong; Kwon, Yong Joon

    2013-01-01

    As LADAR systems applications gradually become more diverse, new types of systems are being developed. When developing new systems, simulation studies are an essential prerequisite. A simulator enables performance predictions and optimal system parameters at the design level, as well as providing sample data for developing and validating application algorithms. The purpose of the study is to propose a method for simulating a Geiger-mode imaging LADAR system. We develop simulation software to assess system performance and generate sample data for the applications. The simulation is based on three aspects of modeling—the geometry, radiometry and detection. The geometric model computes the ranges to the reflection points of the laser pulses. The radiometric model generates the return signals, including the noises. The detection model determines the flight times of the laser pulses based on the nature of the Geiger-mode detector. We generated sample data using the simulator with the system parameters and analyzed the detection performance by comparing the simulated points to the reference points. The proportion of the outliers in the simulated points reached 25.53%, indicating the need for efficient outlier elimination algorithms. In addition, the false alarm rate and dropout rate of the designed system were computed as 1.76% and 1.06%, respectively. PMID:23823970

  17. Quality assurance using outlier detection on an automatic segmentation method for the cerebellar peduncles

    NASA Astrophysics Data System (ADS)

    Li, Ke; Ye, Chuyang; Yang, Zhen; Carass, Aaron; Ying, Sarah H.; Prince, Jerry L.

    2016-03-01

    Cerebellar peduncles (CPs) are white matter tracts connecting the cerebellum to other brain regions. Automatic segmentation methods of the CPs have been proposed for studying their structure and function. Usually the performance of these methods is evaluated by comparing segmentation results with manual delineations (ground truth). However, when a segmentation method is run on new data (for which no ground truth exists) it is highly desirable to efficiently detect and assess algorithm failures so that these cases can be excluded from scientific analysis. In this work, two outlier detection methods aimed to assess the performance of an automatic CP segmentation algorithm are presented. The first one is a univariate non-parametric method using a box-whisker plot. We first categorize automatic segmentation results of a dataset of diffusion tensor imaging (DTI) scans from 48 subjects as either a success or a failure. We then design three groups of features from the image data of nine categorized failures for failure detection. Results show that most of these features can efficiently detect the true failures. The second method—supervised classification—was employed on a larger DTI dataset of 249 manually categorized subjects. Four classifiers—linear discriminant analysis (LDA), logistic regression (LR), support vector machine (SVM), and random forest classification (RFC)—were trained using the designed features and evaluated using a leave-one-out cross validation. Results show that the LR performs worst among the four classifiers and the other three perform comparably, which demonstrates the feasibility of automatically detecting segmentation failures using classification methods.
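
    The univariate box-whisker method can be sketched with Tukey's fences: a value is flagged when it lies more than k interquartile ranges outside the quartiles. The feature values, the multiplier k = 1.5 and the quantile convention below are assumptions for illustration; the abstract does not state which settings the authors used.

```python
import statistics


def tukey_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical per-subject feature (for example, an overlap-like score);
# one segmentation has clearly failed.
scores = [0.81, 0.83, 0.80, 0.84, 0.82, 0.79, 0.85, 0.35]
print(tukey_outliers(scores))
```

    Because the fences depend only on quartiles, a single failed case does not inflate the threshold the way it would with a mean and standard deviation rule.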

  18. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data.

    PubMed

    Shen, Shihao; Park, Juw Won; Lu, Zhi-xiang; Lin, Lan; Henry, Michael D; Wu, Ying Nian; Zhou, Qing; Xing, Yi

    2014-12-23

    Ultra-deep RNA sequencing (RNA-Seq) has become a powerful approach for genome-wide analysis of pre-mRNA alternative splicing. We previously developed multivariate analysis of transcript splicing (MATS), a statistical method for detecting differential alternative splicing between two RNA-Seq samples. Here we describe a new statistical model and computer program, replicate MATS (rMATS), designed for detection of differential alternative splicing from replicate RNA-Seq data. rMATS uses a hierarchical model to simultaneously account for sampling uncertainty in individual replicates and variability among replicates. In addition to the analysis of unpaired replicates, rMATS also includes a model specifically designed for paired replicates between sample groups. The hypothesis-testing framework of rMATS is flexible and can assess the statistical significance over any user-defined magnitude of splicing change. The performance of rMATS is evaluated by the analysis of simulated and real RNA-Seq data. rMATS outperformed two existing methods for replicate RNA-Seq data in all simulation settings, and RT-PCR yielded a high validation rate (94%) in an RNA-Seq dataset of prostate cancer cell lines. Our data also provide guiding principles for designing RNA-Seq studies of alternative splicing. We demonstrate that it is essential to incorporate biological replicates in the study design. Of note, pooling RNAs or merging RNA-Seq data from multiple replicates is not an effective approach to account for variability, and the result is particularly sensitive to outliers. The rMATS source code is freely available at rnaseq-mats.sourceforge.net/. As the popularity of RNA-Seq continues to grow, we expect rMATS will be useful for studies of alternative splicing in diverse RNA-Seq projects.

  19. A Geometrical-Statistical Approach to Outlier Removal for TDOA Measurements

    NASA Astrophysics Data System (ADS)

    Compagnoni, Marco; Pini, Alessia; Canclini, Antonio; Bestagini, Paolo; Antonacci, Fabio; Tubaro, Stefano; Sarti, Augusto

    2017-08-01

    The curse of outlier measurements in estimation problems is a well known issue in a variety of fields. Therefore, outlier removal procedures, which enable the identification of spurious measurements within a set, have been developed for many different scenarios and applications. In this paper, we propose a statistically motivated outlier removal algorithm for time differences of arrival (TDOAs), or equivalently range differences (RD), acquired at sensor arrays. The method exploits the TDOA-space formalism and works by only knowing relative sensor positions. As the proposed method is completely independent from the application for which measurements are used, it can be reliably used to identify outliers within a set of TDOA/RD measurements in different fields (e.g. acoustic source localization, sensor synchronization, radar, remote sensing, etc.). The proposed outlier removal algorithm is validated by means of synthetic simulations and real experiments.

  20. Viewpoints: A High-Performance High-Dimensional Exploratory Data Analysis Tool

    NASA Astrophysics Data System (ADS)

    Gazis, P. R.; Levit, C.; Way, M. J.

    2010-12-01

    Scientific data sets continue to increase in both size and complexity. In the past, dedicated graphics systems at supercomputing centers were required to visualize large data sets, but as the price of commodity graphics hardware has dropped and its capability has increased, it is now possible, in principle, to view large complex data sets on a single workstation. To do this in practice, an investigator will need software that is written to take advantage of the relevant graphics hardware. The Viewpoints visualization package described herein is an example of such software. Viewpoints is an interactive tool for exploratory visual analysis of large high-dimensional (multivariate) data. It leverages the capabilities of modern graphics boards (GPUs) to run on a single workstation or laptop. Viewpoints is minimalist: it attempts to do a small set of useful things very well (or at least very quickly) in comparison with similar packages today. Its basic feature set includes linked scatter plots with brushing, dynamic histograms, normalization, and outlier detection/removal. Viewpoints was originally designed for astrophysicists, but it has since been used in a variety of fields that range from astronomy, quantum chemistry, fluid dynamics, machine learning, bioinformatics, and finance to information technology server log mining. In this article, we describe the Viewpoints package and show examples of its usage.

  1. Using a Linear Regression Method to Detect Outliers in IRT Common Item Equating

    ERIC Educational Resources Information Center

    He, Yong; Cui, Zhongmin; Fang, Yu; Chen, Hanwei

    2013-01-01

    Common test items play an important role in equating alternate test forms under the common item nonequivalent groups design. When the item response theory (IRT) method is applied in equating, inconsistent item parameter estimates among common items can lead to large bias in equated scores. It is prudent to evaluate inconsistency in parameter…

  2. Iterative Ellipsoidal Trimming.

    DTIC Science & Technology

    1980-02-11

    to above. Iterative ellipsoidal trimming has been investigated before by other statisticians, most notably by Gnanadesikan and his coworkers...J., Gnanadesikan R., and Kettenring, J. R. (1975). "Robust estimation and outlier detection with correlation coefficients." Biometrika. 62, 531-45. [6...Duda, Richard, and Hart, Peter (1973). Pattern Classification and Scene Analysis. Wiley, New York. [7] Gnanadesikan, R. (1977). Methods for

  3. Patterns of Care for Biologic-Dosing Outliers and Nonoutliers in Biologic-Naive Patients with Rheumatoid Arthritis.

    PubMed

    Delate, Thomas; Meyer, Roxanne; Jenkins, Daniel

    2017-08-01

    Although most biologic medications for patients with rheumatoid arthritis (RA) have recommended fixed dosing, actual biologic dosing may vary among real-world patients, since some patients can receive higher (high-dose outliers) or lower (low-dose outliers) doses than what is recommended in medication package inserts. To describe the patterns of care for biologic-dosing outliers and nonoutliers in biologic-naive patients with RA. This was a retrospective, longitudinal cohort study of patients with RA who were not pregnant and were aged ≥ 18 and < 90 years from an integrated health care delivery system. Patients were newly initiated on adalimumab (ADA), etanercept (ETN), or infliximab (IFX) as index biologic therapy between July 1, 2006, and February 28, 2014. Outlier status was defined as a patient having received at least 1 dose < 90% or > 110% of the approved dose in the package insert at any time during the study period. Baseline patient profiles, treatment exposures, and outcomes were collected during the 180 days before and up to 2 years after biologic initiation and compared across index biologic outlier groups. Patients were followed for at least 1 year, with a subanalysis of those patients who remained as members for 2 years. This study included 434 RA patients with 1 year of follow-up and 372 RA patients with 2 years of follow-up. Overall, the vast majority of patients were female (≈75%) and had similar baseline characteristics. Approximately 10% of patients were outliers in both follow-up cohorts. ETN patients were least likely to become outliers, and ADA patients were most likely to become outliers. Of all outliers during the 1-year follow-up, patients were more likely to be a high-dose outlier (55%) than a low-dose outlier (45%). Median 1- and 2-year adjusted total biologic costs (based on wholesale acquisition costs) were higher for ADA and ETN nonoutliers than for IFX nonoutliers. Biologic persistence was highest for IFX patients. Charlson Comorbidity Index score, ETN and IFX index biologic, and treatment with a nonbiologic disease-modifying antirheumatic drug (DMARD) before biologic initiation were associated with becoming high- or low-dose outliers (c-statistic = 0.79). Approximately 1 in 10 study patients with RA was identified as a biologic-dosing outlier. Dosing outliers did not appear to have better clinical outcomes compared with nonoutliers. Before initiating outlier biologic dosing, health care providers may better serve their RA patients by prescribing alternate DMARD therapy. This study was sponsored by Janssen Scientific Affairs. It is the policy of Janssen Scientific Affairs to publish all sponsored studies unless they are exploratory studies or are determined a priori for internal use only (e.g., to inform business decisions). Meyer is an employee of Janssen Scientific Affairs and a stockholder in Johnson and Johnson, its parent company. Delate and Jenkins have nothing to disclose. Study concept and design were contributed by Delate and Meyer. Delate took the lead in data collection, along with Jenkins. All authors participated in data analysis. The manuscript was written primarily by Delate, along with Meyer and Jenkins, and was revised by Meyer, along with Delate and Jenkins.
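
    The outlier definition in this abstract (at least 1 dose below 90% or above 110% of the approved dose) is simple enough to state as code. The sketch below is illustrative only: the function names and the dose values are hypothetical, and how the study classified a patient with both low and high doses is not stated, so the two flags are returned separately.

```python
def dosing_flags(doses, approved_dose):
    """Return (low, high): any dose < 90% or > 110% of the approved dose."""
    low = any(d < 0.9 * approved_dose for d in doses)
    high = any(d > 1.1 * approved_dose for d in doses)
    return low, high


def is_dosing_outlier(doses, approved_dose):
    """A patient is an outlier if at least one dose falls outside the band."""
    return any(dosing_flags(doses, approved_dose))


# Hypothetical dose histories in mg against a hypothetical 40 mg approved dose.
print(is_dosing_outlier([40, 40, 40], 40))
print(dosing_flags([40, 80, 40], 40))
print(dosing_flags([40, 20], 40))
```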

  4. Kfits: a software framework for fitting and cleaning outliers in kinetic measurements.

    PubMed

    Rimon, Oded; Reichmann, Dana

    2018-01-01

    Kinetic measurements have played an important role in elucidating biochemical and biophysical phenomena for over a century. While many tools for analysing kinetic measurements exist, most require low noise levels in the data, leaving outlier measurements to be cleaned manually. This is particularly true for protein misfolding and aggregation processes, which are extremely noisy and hence difficult to model. Understanding these processes is paramount, as they are associated with diverse physiological processes and disorders, most notably neurodegenerative diseases. Therefore, a better tool for analysing and cleaning protein aggregation traces is required. Here we introduce Kfits, an intuitive graphical tool for detecting and removing noise caused by outliers in protein aggregation kinetics data. Following its workflow allows the user to quickly and easily clean large quantities of data and receive kinetic parameters for assessment of the results. With minor adjustments, the software can be applied to any type of kinetic measurements, not restricted to protein aggregation. Kfits is implemented in Python and available online at http://kfits.reichmannlab.com, in source at https://github.com/odedrim/kfits/, or by direct installation from PyPI (`pip install kfits`). Contact: danare@mail.huji.ac.il. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  5. Robustly detecting differential expression in RNA sequencing data using observation weights

    PubMed Central

    Zhou, Xiaobei; Lindsay, Helen; Robinson, Mark D.

    2014-01-01

    A popular approach for comparing gene expression levels between (replicated) conditions of RNA sequencing data relies on counting reads that map to features of interest. Within such count-based methods, many flexible and advanced statistical approaches now exist and offer the ability to adjust for covariates (e.g. batch effects). Often, these methods include some sort of ‘sharing of information’ across features to improve inferences in small samples. It is important to achieve an appropriate tradeoff between statistical power and protection against outliers. Here, we study the robustness of existing approaches for count-based differential expression analysis and propose a new strategy based on observation weights that can be used within existing frameworks. The results suggest that outliers can have a global effect on differential analyses. We demonstrate the effectiveness of our new approach with real data and simulated data that reflects properties of real datasets (e.g. dispersion-mean trend) and develop an extensible framework for comprehensive testing of current and future methods. In addition, we explore the origin of such outliers, in some cases highlighting additional biological or technical factors within the experiment. Further details can be downloaded from the project website: http://imlspenticton.uzh.ch/robinson_lab/edgeR_robust/. PMID:24753412

  6. Simulated performance of an order statistic threshold strategy for detection of narrowband signals

    NASA Technical Reports Server (NTRS)

    Satorius, E.; Brady, R.; Deich, W.; Gulkis, S.; Olsen, E.

    1988-01-01

    The application of order statistics to signal detection is becoming an increasingly active area of research. This is due to the inherent robustness of rank estimators in the presence of large outliers that would significantly degrade more conventional mean-level-based detection systems. A detection strategy is presented in which the threshold estimate is obtained using order statistics. The performance of this algorithm in the presence of simulated interference and broadband noise is evaluated. In this way, the robustness of the proposed strategy in the presence of the interference can be fully assessed as a function of the interference, noise, and detector parameters.
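
    To illustrate why a rank-based threshold is less affected by large outliers than a mean-level one, the sketch below compares the two on a hypothetical power spectrum containing one very strong interferer. Using the median as the order statistic, the scale factor of 5 and the sample values are all assumptions; the abstract does not give the authors' rank or parameters.

```python
import statistics


def order_statistic_detect(power, factor=5.0):
    """Flag bins whose power exceeds factor * median(power)."""
    threshold = factor * statistics.median(power)
    return [i for i, p in enumerate(power) if p > threshold]


def mean_level_detect(power, factor=5.0):
    """Flag bins whose power exceeds factor * mean(power)."""
    threshold = factor * statistics.fmean(power)
    return [i for i, p in enumerate(power) if p > threshold]


# Hypothetical spectrum: noise near 1, two weaker signals (bins 4 and 7),
# and one very strong interferer (bin 9).
power = [1.0, 1.2, 0.9, 1.1, 50.0, 1.0, 0.8, 8.0, 1.1, 1000.0]
print(order_statistic_detect(power))
print(mean_level_detect(power))
```

    The interferer inflates the mean so much that the mean-level detector misses the weaker signals, while the median barely moves.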

  7. MIDAS robust trend estimator for accurate GPS station velocities without step detection

    PubMed Central

    Kreemer, Corné; Hammond, William C.; Gazeaux, Julien

    2016-01-01

    Automatic estimation of velocities from GPS coordinate time series is becoming required to cope with the exponentially increasing flood of available data, but problems detectable to the human eye are often overlooked. This motivates us to find an automatic and accurate estimator of trend that is resistant to common problems such as step discontinuities, outliers, seasonality, skewness, and heteroscedasticity. Developed here, Median Interannual Difference Adjusted for Skewness (MIDAS) is a variant of the Theil‐Sen median trend estimator, for which the ordinary version is the median of slopes vij = (xj–xi)/(tj–ti) computed between all data pairs i > j. For normally distributed data, Theil‐Sen and least squares trend estimates are statistically identical, but unlike least squares, Theil‐Sen is resistant to undetected data problems. To mitigate both seasonality and step discontinuities, MIDAS selects data pairs separated by 1 year. This condition is relaxed for time series with gaps so that all data are used. Slopes from data pairs spanning a step function produce one‐sided outliers that can bias the median. To reduce bias, MIDAS removes outliers and recomputes the median. MIDAS also computes a robust and realistic estimate of trend uncertainty. Statistical tests using GPS data in the rigid North American plate interior show ±0.23 mm/yr root‐mean‐square (RMS) accuracy in horizontal velocity. In blind tests using synthetic data, MIDAS velocities have an RMS accuracy of ±0.33 mm/yr horizontal, ±1.1 mm/yr up, with a 5th percentile range smaller than all 20 automatic estimators tested. Considering its general nature, MIDAS has the potential for broader application in the geosciences. PMID:27668140

  8. MIDAS robust trend estimator for accurate GPS station velocities without step detection.

    PubMed

    Blewitt, Geoffrey; Kreemer, Corné; Hammond, William C; Gazeaux, Julien

    2016-03-01

    Automatic estimation of velocities from GPS coordinate time series is becoming required to cope with the exponentially increasing flood of available data, but problems detectable to the human eye are often overlooked. This motivates us to find an automatic and accurate estimator of trend that is resistant to common problems such as step discontinuities, outliers, seasonality, skewness, and heteroscedasticity. Developed here, Median Interannual Difference Adjusted for Skewness (MIDAS) is a variant of the Theil-Sen median trend estimator, for which the ordinary version is the median of slopes vij = (xj-xi)/(tj-ti) computed between all data pairs i > j. For normally distributed data, Theil-Sen and least squares trend estimates are statistically identical, but unlike least squares, Theil-Sen is resistant to undetected data problems. To mitigate both seasonality and step discontinuities, MIDAS selects data pairs separated by 1 year. This condition is relaxed for time series with gaps so that all data are used. Slopes from data pairs spanning a step function produce one-sided outliers that can bias the median. To reduce bias, MIDAS removes outliers and recomputes the median. MIDAS also computes a robust and realistic estimate of trend uncertainty. Statistical tests using GPS data in the rigid North American plate interior show ±0.23 mm/yr root-mean-square (RMS) accuracy in horizontal velocity. In blind tests using synthetic data, MIDAS velocities have an RMS accuracy of ±0.33 mm/yr horizontal, ±1.1 mm/yr up, with a 5th percentile range smaller than all 20 automatic estimators tested. Considering its general nature, MIDAS has the potential for broader application in the geosciences.

  9. MIDAS robust trend estimator for accurate GPS station velocities without step detection

    NASA Astrophysics Data System (ADS)

    Blewitt, Geoffrey; Kreemer, Corné; Hammond, William C.; Gazeaux, Julien

    2016-03-01

    Automatic estimation of velocities from GPS coordinate time series is becoming required to cope with the exponentially increasing flood of available data, but problems detectable to the human eye are often overlooked. This motivates us to find an automatic and accurate estimator of trend that is resistant to common problems such as step discontinuities, outliers, seasonality, skewness, and heteroscedasticity. Developed here, Median Interannual Difference Adjusted for Skewness (MIDAS) is a variant of the Theil-Sen median trend estimator, for which the ordinary version is the median of slopes vij = (xj-xi)/(tj-ti) computed between all data pairs i > j. For normally distributed data, Theil-Sen and least squares trend estimates are statistically identical, but unlike least squares, Theil-Sen is resistant to undetected data problems. To mitigate both seasonality and step discontinuities, MIDAS selects data pairs separated by 1 year. This condition is relaxed for time series with gaps so that all data are used. Slopes from data pairs spanning a step function produce one-sided outliers that can bias the median. To reduce bias, MIDAS removes outliers and recomputes the median. MIDAS also computes a robust and realistic estimate of trend uncertainty. Statistical tests using GPS data in the rigid North American plate interior show ±0.23 mm/yr root-mean-square (RMS) accuracy in horizontal velocity. In blind tests using synthetic data, MIDAS velocities have an RMS accuracy of ±0.33 mm/yr horizontal, ±1.1 mm/yr up, with a 5th percentile range smaller than all 20 automatic estimators tested. Considering its general nature, MIDAS has the potential for broader application in the geosciences.
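
    The two ideas named in the MIDAS abstract, the ordinary Theil-Sen median of pairwise slopes and the restriction to data pairs separated by 1 year, can be sketched as follows. This is a simplified illustration rather than the MIDAS implementation: it omits the outlier removal and recomputation step, the relaxation for gaps and the uncertainty estimate, and the function names, tolerance and synthetic series are assumptions.

```python
import math
import statistics


def pairwise_slopes(t, x, keep):
    """Slopes (x[j] - x[i]) / (t[j] - t[i]) over pairs whose spacing passes keep()."""
    return [(x[j] - x[i]) / (t[j] - t[i])
            for i in range(len(t))
            for j in range(i + 1, len(t))
            if t[j] != t[i] and keep(t[j] - t[i])]


def theil_sen_slope(t, x):
    """Ordinary Theil-Sen estimate: median slope over all data pairs."""
    return statistics.median(pairwise_slopes(t, x, lambda dt: True))


def one_year_slope(t, x, tol=1e-6):
    """Median slope over pairs separated by one year (t in years, ascending).

    statistics.median raises StatisticsError if no such pair exists.
    """
    return statistics.median(
        pairwise_slopes(t, x, lambda dt: abs(dt - 1.0) <= tol))


# Synthetic monthly positions over 3 years: trend of 2 mm/yr, an annual
# cycle of 3 mm amplitude, and one gross outlier.
t = [k / 12 for k in range(36)]
x = [2.0 * ti + 3.0 * math.sin(2 * math.pi * ti) for ti in t]
x[20] += 500.0

print(one_year_slope(t, x))
```

    Differencing over exactly one year cancels the annual cycle in every pair, and the median ignores the two slopes contaminated by the outlier, so the one-year estimate stays at the underlying trend in this example.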

  10. The variance of length of stay and the optimal DRG outlier payments.

    PubMed

    Felder, Stefan

    2009-09-01

    Prospective payment schemes in health care often include supply-side insurance for cost outliers. In hospital reimbursement, prospective payments for patient discharges, based on their classification into diagnosis related groups (DRGs), are complemented by outlier payments for long stay patients. The outlier scheme fixes the length of stay (LOS) threshold, constraining the profit risk of the hospitals. In most DRG systems, this threshold increases with the standard deviation of the LOS distribution. The present paper addresses the adequacy of this DRG outlier threshold rule for risk-averse hospitals with preferences depending on the expected value and the variance of profits. It first shows that the optimal threshold solves the hospital's tradeoff between higher profit risk and lower premium loading payments. It then demonstrates for normally distributed truncated LOS that the optimal outlier threshold indeed decreases with an increase in the standard deviation.
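
    The conventional rule that the paper examines, a threshold that rises with the standard deviation of LOS, can be illustrated with a mean-plus-k-standard-deviations form. That specific form, the value k = 2 and the stay lengths below are assumptions for illustration; the abstract only says that the threshold increases with the standard deviation, and the paper's own result is that the optimal threshold moves the other way.

```python
import statistics


def los_outlier_threshold(los_days, k=2.0):
    """Illustrative threshold: mean LOS plus k sample standard deviations."""
    return statistics.fmean(los_days) + k * statistics.stdev(los_days)


def long_stay_outliers(los_days, k=2.0):
    """Stays longer than the illustrative threshold."""
    threshold = los_outlier_threshold(los_days, k)
    return [d for d in los_days if d > threshold]


# Two hypothetical DRGs with the same mean LOS (5 days) but different spread.
narrow = [4, 5, 5, 6, 5, 4, 6, 5]
wide = [1, 2, 5, 9, 5, 3, 10, 5]
print(los_outlier_threshold(narrow), los_outlier_threshold(wide))
```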

  11. Demographically-Based Evaluation of Genomic Regions under Selection in Domestic Dogs

    PubMed Central

    Freedman, Adam H.; Schweizer, Rena M.; Ortega-Del Vecchyo, Diego; Han, Eunjung; Davis, Brian W.; Gronau, Ilan; Silva, Pedro M.; Galaverni, Marco; Fan, Zhenxin; Marx, Peter; Lorente-Galdos, Belen; Ramirez, Oscar; Hormozdiari, Farhad; Alkan, Can; Vilà, Carles; Squire, Kevin; Geffen, Eli; Kusak, Josip; Boyko, Adam R.; Parker, Heidi G.; Lee, Clarence; Tadigotla, Vasisht; Siepel, Adam; Bustamante, Carlos D.; Harkins, Timothy T.; Nelson, Stanley F.; Marques-Bonet, Tomas; Ostrander, Elaine A.; Wayne, Robert K.; Novembre, John

    2016-01-01

    Controlling for background demographic effects is important for accurately identifying loci that have recently undergone positive selection. To date, the effects of demography have not yet been explicitly considered when identifying loci under selection during dog domestication. To investigate positive selection on the dog lineage early in domestication, we examined patterns of polymorphism in six canid genomes that were previously used to infer a demographic model of dog domestication. Using an inferred demographic model, we computed false discovery rates (FDR) and identified 349 outlier regions consistent with positive selection at a low FDR. The signals in the top 100 regions were frequently centered on candidate genes related to brain function and behavior, including LHFPL3, CADM2, GRIK3, SH3GL2, MBP, PDE7B, NTAN1, and GLRA1. These regions contained significant enrichments in behavioral ontology categories. The 3rd top hit, CCRN4L, plays a major role in lipid metabolism, which is supported by additional metabolism-related candidates revealed in our scan, including SCP2D1 and PDXC1. Comparing our method to an empirical outlier approach that does not directly account for demography, we found only modest overlaps between the two methods, with 60% of empirical outliers having no overlap with our demography-based outlier detection approach. Demography-aware approaches have lower rates of false discovery. Our top candidates for selection, in addition to expanding the set of neurobehavioral candidate genes, include genes related to lipid metabolism, suggesting a dietary target of selection that was important during the period when proto-dogs hunted and fed alongside hunter-gatherers. PMID:26943675

  12. Onboard Robust Visual Tracking for UAVs Using a Reliable Global-Local Object Model

    PubMed Central

    Fu, Changhong; Duan, Ran; Kircali, Dogan; Kayacan, Erdal

    2016-01-01

    In this paper, we present a novel onboard robust visual algorithm for long-term arbitrary 2D and 3D object tracking using a reliable global-local object model for unmanned aerial vehicle (UAV) applications, e.g., autonomous tracking and chasing a moving target. The first main approach in this novel algorithm is the use of a global matching and local tracking approach. In other words, the algorithm initially finds feature correspondences in a way that an improved binary descriptor is developed for global feature matching and an iterative Lucas–Kanade optical flow algorithm is employed for local feature tracking. The second main module is the use of an efficient local geometric filter (LGF), which handles outlier feature correspondences based on a new forward-backward pairwise dissimilarity measure, thereby maintaining pairwise geometric consistency. In the proposed LGF module, a hierarchical agglomerative clustering, i.e., bottom-up aggregation, is applied using an effective single-link method. The third proposed module is a heuristic local outlier factor (to the best of our knowledge, it is utilized for the first time to deal with outlier features in a visual tracking application), which further maximizes the representation of the target object in which we formulate outlier feature detection as a binary classification problem with the output features of the LGF module. Extensive UAV flight experiments show that the proposed visual tracker achieves real-time frame rates of more than thirty-five frames per second on an i7 processor with 640 × 512 image resolution and outperforms the most popular state-of-the-art trackers favorably in terms of robustness, efficiency and accuracy. PMID:27589769

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Delaney, Alexander R., E-mail: a.delaney@vumc.nl; Tol, Jim P.; Dahele, Max

    Purpose: RapidPlan, a commercial knowledge-based planning solution, uses a model library containing the geometry and associated dosimetry of existing plans. This model predicts achievable dosimetry for prospective patients that can be used to guide plan optimization. However, it is unknown how suboptimal model plans (outliers) influence the predictions or resulting plans. We investigated the effect of, first, removing outliers from the model (cleaning it) and subsequently adding deliberate dosimetric outliers. Methods and Materials: Clinical plans from 70 head and neck cancer patients comprised the uncleaned (UC) Model_UC, from which outliers were cleaned (C) to create Model_C. The last 5 to 40 patients of Model_C were replanned with no attempt to spare the salivary glands. These substantial dosimetric outliers were reintroduced to the model in increments of 5, creating Model_5 to Model_40 (Model_5-40). These models were used to create plans for a 10-patient evaluation group. Plans from Model_UC and Model_C, and Model_C and Model_5-40 were compared on the basis of boost (B) and elective (E) target volume homogeneity indexes (HI_B/HI_E) and mean doses to oral cavity, composite salivary glands (comp_sal) and swallowing (comp_swal) structures. Results: On average, outlier removal (Model_C vs Model_UC) had minimal effects on HI_B/HI_E (0%-0.4%) and sparing of organs at risk (mean dose difference to oral cavity and comp_sal/comp_swal were ≤0.4 Gy). Model_5-10 marginally improved comp_sal sparing, whereas adding a larger number of outliers (Model_20-40) led to deteriorations in comp_sal up to 3.9 Gy, on average. These increases are modest compared to the 14.9 Gy dose increases in the added outlier plans, due to the placement of optimization objectives below the inferior boundary of the dose-volume histogram-predicted range. Conclusions: Overall, dosimetric outlier removal from or addition of 5 to 10 outliers to a 70-patient model had marginal effects on resulting plan quality. Although the addition of >20 outliers deteriorated plan quality, the effect was modest. In this study, RapidPlan demonstrated robustness for moderate proportions of salivary gland dosimetric outliers.

  14. Bucking social norms: Examining anomalous fertility aspirations in the face of HIV in Lusaka, Zambia

    PubMed Central

    Moore, Ann M.; Keogh, Sarah; Kavanaugh, Megan; Bankole, Akinrinola; Mulambia, Chishimba; Mutombo, Namuunda

    2014-01-01

    In settings of high fertility and high HIV prevalence, individuals are making fertility decisions while simultaneously trying to avoid or manage HIV. We sought to increase our understanding of how individuals dually manage HIV risk while attempting to achieve their fertility goals as part of the project entitled HIV Status and Achieving Fertility Desires conducted in Zambia in 2011. Using multivariate regression to predict fertility patterns based on socio-demographic characteristics for respondents from facility-based and community-based surveys, we employed Anomalous Case Analysis (ACA) whereby in-depth interview respondents were selected from the groups of outliers amongst the survey respondents who reported lower or higher fertility preferences than predicted as well as those who adhered to predicted patterns, and lived in Lusaka (n=45). All of the facility-based respondents were HIV-positive. We utilize the Theory of Conjunctural Action (TCA) to categorize domains of influence on individuals’ preferences and behavior. Both community-based and facility-based right-tail respondents (outliers whose fertility intentions indicated that they wanted a/nother child when we predicted that they did not) expressed comparatively less control over their fertility and gave more weight to pressures from others to continue childbearing. Partner communication about fertility desires was greater among left-tail respondents (outliers whose fertility intentions indicated that they did not want a/nother child when we predicted that they did). HIV-positive right-tail respondents were more likely to see anti-retroviral therapies (ARTs) which prevent mother to child transmission of HIV as highly effective, mitigating inhibitions to further childbearing. Drug interactions between ARTs and contraceptives were identified as a limitation to HIV-positive individuals’ contraceptive options on both sides of the distribution. 
Factors that should be taken into account in the future to understand fertility behavior in high HIV-prevalent settings include couples’ communication around fertility and perception of the efficacy of ARTs. PMID:25150655

  15. Multivariate data analysis on historical IPV production data for better process understanding and future improvements.

    PubMed

    Thomassen, Yvonne E; van Sprang, Eric N M; van der Pol, Leo A; Bakker, Wilfried A M

    2010-09-01

    Historical manufacturing data can potentially harbor a wealth of information for process optimization and enhancement of efficiency and robustness. To extract useful data, multivariate data analysis (MVDA) using projection methods is often applied. In this contribution, the results obtained from applying MVDA to data from inactivated polio vaccine (IPV) production runs are described. Data from over 50 batches at two different production scales (700-L and 1,500-L) were available. The explorative analysis performed on single unit operations indicated consistent manufacturing. Known outliers (e.g., rejected batches) were identified using principal component analysis (PCA). The source of operational variation was pinpointed to variation of input such as media. Other relevant process parameters were in control and, using this manufacturing data, could not be correlated to product quality attributes. The gained knowledge of the IPV production process, not only from the MVDA, but also from digitalizing the available historical data, has proven to be useful for troubleshooting, understanding limitations of available data and seeing the opportunity for improvements. 2010 Wiley Periodicals, Inc.

  16. Investigating outliers to improve conceptual models of bedrock aquifers

    NASA Astrophysics Data System (ADS)

    Worthington, Stephen R. H.

    2018-06-01

    Numerical models play a prominent role in hydrogeology, with simplifying assumptions being inevitable when implementing these models. However, there is a risk of oversimplification, where important processes become neglected. Such processes may be associated with outliers, and consideration of outliers can lead to an improved scientific understanding of bedrock aquifers. Using rigorous logic to investigate outliers can help to explain fundamental scientific questions such as why there are large variations in permeability between different bedrock lithologies.

  17. Outlier removal, sum scores, and the inflation of the Type I error rate in independent samples t tests: the power of alternatives and recommendations.

    PubMed

    Bakker, Marjan; Wicherts, Jelte M

    2014-09-01

    In psychology, outliers are often excluded before running an independent samples t test, and data are often nonnormal because of the use of sum scores based on tests and questionnaires. This article concerns the handling of outliers in the context of independent samples t tests applied to nonnormal sum scores. After reviewing common practice, we present results of simulations of artificial and actual psychological data, which show that the removal of outliers based on commonly used Z value thresholds severely increases the Type I error rate. We found Type I error rates of above 20% after removing outliers with a threshold value of Z = 2 in a short and difficult test. Inflations of Type I error rates are particularly severe when researchers are given the freedom to alter threshold values of Z after having seen the effects thereof on outcomes. We recommend the use of nonparametric Mann-Whitney-Wilcoxon tests or robust Yuen-Welch tests without removing outliers. These alternatives to independent samples t tests are found to have nominal Type I error rates with a minimal loss of power when no outliers are present in the data and to have nominal Type I error rates and good power when outliers are present. PsycINFO Database Record (c) 2014 APA, all rights reserved.

  18. Privacy-preserving outlier detection through random nonlinear data distortion.

    PubMed

    Bhaduri, Kanishka; Stefanski, Mark D; Srivastava, Ashok N

    2011-02-01

    Consider a scenario in which the data owner has some private or sensitive data and wants a data miner to access them for studying important patterns without revealing the sensitive information. Privacy-preserving data mining aims to solve this problem by randomly transforming the data prior to their release to the data miners. Previous works only considered the case of linear data perturbations--additive, multiplicative, or a combination of both--for studying the usefulness of the perturbed output. In this paper, we discuss nonlinear data distortion using potentially nonlinear random data transformation and show how it can be useful for privacy-preserving anomaly detection from sensitive data sets. We develop bounds on the expected accuracy of the nonlinear distortion and also quantify privacy by using standard definitions. The highlight of this approach is to allow a user to control the amount of privacy by varying the degree of nonlinearity. We show how our general transformation can be used for anomaly detection in practice for two specific problem instances: a linear model and a popular nonlinear model using the sigmoid function. We also analyze the proposed nonlinear transformation in full generality and then show that, for specific cases, it is distance preserving. A main contribution of this paper is the discussion between the invertibility of a transformation and privacy preservation and the application of these techniques to outlier detection. The experiments conducted on real-life data sets demonstrate the effectiveness of the approach.

  19. The search for loci under selection: trends, biases and progress.

    PubMed

    Ahrens, Collin W; Rymer, Paul D; Stow, Adam; Bragg, Jason; Dillon, Shannon; Umbers, Kate D L; Dudaniec, Rachael Y

    2018-03-01

    Detecting genetic variants under selection using F_ST outlier analysis (OA) and environmental association analyses (EAAs) are popular approaches that provide insight into the genetic basis of local adaptation. Despite the frequent use of OA and EAA approaches and their increasing attractiveness for detecting signatures of selection, their application to field-based empirical data has not been synthesized. Here, we review 66 empirical studies that use Single Nucleotide Polymorphisms (SNPs) in OA and EAA. We report trends and biases across biological systems, sequencing methods, approaches, parameters, environmental variables and their influence on detecting signatures of selection. We found striking variability in both the use and reporting of environmental data and statistical parameters. For example, linkage disequilibrium among SNPs and numbers of unique SNP associations identified with EAA were rarely reported. The proportion of putatively adaptive SNPs detected varied widely among studies, and decreased with the number of SNPs analysed. We found that genomic sampling effort had a greater impact than biological sampling effort on the proportion of identified SNPs under selection. OA identified a higher proportion of outliers when more individuals were sampled, but this was not the case for EAA. To facilitate repeatability, interpretation and synthesis of studies detecting selection, we recommend that future studies consistently report geographical coordinates, environmental data, model parameters, linkage disequilibrium, and measures of genetic structure. Identifying standards for how OA and EAA studies are designed and reported will aid future transparency and comparability of SNP-based selection studies and help to progress landscape and evolutionary genomics. © 2018 John Wiley & Sons Ltd.

  20. 42 CFR 484.240 - Methodology used for the calculation of the outlier payment.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... for each case-mix group. (b) The outlier threshold for each case-mix group is the episode payment... the same for all case-mix groups. (c) The outlier payment is a proportion of the amount of estimated...

  1. Robust Averaging of Covariances for EEG Recordings Classification in Motor Imagery Brain-Computer Interfaces.

    PubMed

    Uehara, Takashi; Sartori, Matteo; Tanaka, Toshihisa; Fiori, Simone

    2017-06-01

    The estimation of covariance matrices is of prime importance to analyze the distribution of multivariate signals. In motor imagery-based brain-computer interfaces (MI-BCI), covariance matrices play a central role in the extraction of features from recorded electroencephalograms (EEGs); therefore, correctly estimating covariance is crucial for EEG classification. This letter discusses algorithms to average sample covariance matrices (SCMs) for the selection of the reference matrix in tangent space mapping (TSM)-based MI-BCI. Tangent space mapping is a powerful method of feature extraction and strongly depends on the selection of a reference covariance matrix. In general, the observed signals may include outliers; therefore, taking the geometric mean of SCMs as the reference matrix may not be the best choice. In order to deal with the effects of outliers, robust estimators have to be used. In particular, we discuss and test the use of geometric medians and trimmed averages (defined on the basis of several metrics) as robust estimators. The main idea behind trimmed averages is to eliminate data that exhibit the largest distance from the average covariance calculated on the basis of all available data. The results of the experiments show that while the geometric medians show little differences from conventional methods in terms of classification accuracy in the classification of electroencephalographic recordings, the trimmed averages show significant improvement for all subjects.
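
    The trimmed-average idea described above (discard the matrices farthest from the average of all data, then average the rest) can be sketched as follows. This is only an illustration under stated assumptions: it uses the arithmetic mean and the Frobenius (Euclidean) distance, whereas the abstract refers to several metrics and to geometric means; the function names and the `n_trim` parameter are hypothetical.

```python
import math


def mean_matrix(mats):
    """Element-wise arithmetic mean of equally sized square matrices."""
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
            for i in range(n)]


def frobenius_distance(a, b):
    """Euclidean distance between two matrices viewed as flat vectors."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def trimmed_average(mats, n_trim):
    """Average the matrices after dropping the n_trim farthest from the full mean."""
    center = mean_matrix(mats)
    ranked = sorted(mats, key=lambda m: frobenius_distance(m, center))
    kept = ranked[:len(mats) - n_trim]
    return mean_matrix(kept)


# Toy usage: nine well-behaved 2x2 covariances and one contaminated trial.
trials = [[[1, 0], [0, 1]] for _ in range(9)] + [[[100, 0], [0, 100]]]
reference = trimmed_average(trials, n_trim=1)
```

    With one trial trimmed, the contaminated matrix no longer pulls the reference matrix away from the bulk of the data.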

  2. Online gesture spotting from visual hull data.

    PubMed

    Peng, Bo; Qian, Gang

    2011-06-01

    This paper presents a robust framework for online full-body gesture spotting from visual hull data. Using view-invariant pose features as observations, hidden Markov models (HMMs) are trained for gesture spotting from continuous movement data streams. Two major contributions of this paper are 1) view-invariant pose feature extraction from visual hulls, and 2) a systematic approach to automatically detecting and modeling specific nongesture movement patterns and using their HMMs for outlier rejection in gesture spotting. The experimental results have shown the view-invariance property of the proposed pose features for both training poses and new poses unseen in training, as well as the efficacy of using specific nongesture models for outlier rejection. Using the IXMAS gesture data set, the proposed framework has been extensively tested and the gesture spotting results are superior to those reported on the same data set obtained using existing state-of-the-art gesture spotting methods.

  3. On Visualizing Mixed-Type Data: A Joint Metric Approach to Profile Construction and Outlier Detection

    ERIC Educational Resources Information Center

    Grané, Aurea; Romera, Rosario

    2018-01-01

    Survey data are usually of mixed type (quantitative, multistate categorical, and/or binary variables). Multidimensional scaling (MDS) is one of the most extended methodologies to visualize the profile structure of the data. Since the past 60s, MDS methods have been introduced in the literature, initially in publications in the psychometrics area.…

  4. BKG/DGFI Combination Center Annual Report 2012

    NASA Technical Reports Server (NTRS)

    Bachmann, Sabine; Loesler, Michael; Heinkelmann, Robert; Gerstl, Michael

    2013-01-01

    This report summarizes the activities of the Federal Agency for Cartography and Geodesy (Bundesamt fuer Kartographie und Geodaesie, BKG) and the German Geodetic Research Institute (Deutsches Geodaetisches Forschungsinstitut, DGFI) BKG/DGFI Combination Center in 2011 and outlines the planned activities for the year 2012. The main focus was to stabilize outlier detection and to update the Web presentation of the combined products.

  5. Extensive cross-talk and global regulators identified from an analysis of the integrated transcriptional and signaling network in Escherichia coli.

    PubMed

    Antiqueira, Lucas; Janga, Sarath Chandra; Costa, Luciano da Fontoura

    2012-11-01

    To understand the regulatory dynamics of transcription factors (TFs) and their interplay with other cellular components we have integrated transcriptional, protein-protein and the allosteric or equivalent interactions which mediate the physiological activity of TFs in Escherichia coli. To study this integrated network we computed a set of network measurements followed by principal component analysis (PCA), investigated the correlations between network structure and dynamics, and carried out a procedure for motif detection. In particular, we show that outliers identified in the integrated network based on their network properties correspond to previously characterized global transcriptional regulators. Furthermore, outliers are highly and widely expressed across conditions, thus supporting their global nature in controlling many genes in the cell. Motifs revealed that TFs not only interact physically with each other but also obtain feedback from signals delivered by signaling proteins supporting the extensive cross-talk between different types of networks. Our analysis can lead to the development of a general framework for detecting and understanding global regulatory factors in regulatory networks and reinforces the importance of integrating multiple types of interactions in underpinning the interrelationships between them.

  6. Simple automatic strategy for background drift correction in chromatographic data analysis.

    PubMed

    Fu, Hai-Yan; Li, He-Dong; Yu, Yong-Jie; Wang, Bing; Lu, Peng; Cui, Hua-Peng; Liu, Ping-Ping; She, Yuan-Bin

    2016-06-03

    Chromatographic background drift correction, which influences peak detection and time shift alignment results, is a critical stage in chromatographic data analysis. In this study, an automatic background drift correction methodology was developed. Local minimum values in a chromatogram were initially detected and organized as a new baseline vector. Iterative optimization was then employed to recognize outliers, which belong to the chromatographic peaks, in this vector, and update the outliers in the baseline until convergence. The optimized baseline vector was finally expanded into the original chromatogram, and linear interpolation was employed to estimate background drift in the chromatogram. The principle underlying the proposed method was confirmed using a complex gas chromatographic dataset. Finally, the proposed approach was applied to eliminate background drift in liquid chromatography quadrupole time-of-flight samples used in the metabolic study of Escherichia coli samples. The proposed method was comparable with three classical techniques: morphological weighted penalized least squares, moving window minimum value strategy and background drift correction by orthogonal subspace projection. The proposed method allows almost automatic implementation of background drift correction, which is convenient for practical use. Copyright © 2016 Elsevier B.V. All rights reserved.

  7. Measurement Consistency from Magnetic Resonance Images

    PubMed Central

    Chung, Dongjun; Chung, Moo K.; Durtschi, Reid B.; Lindell, R. Gentry; Vorperian, Houri K.

    2010-01-01

    Rationale and Objectives: In quantifying medical images, length-based measurements are still obtained manually. Due to possible human error, a measurement protocol is required to guarantee the consistency of measurements. In this paper, we review various statistical techniques that can be used in determining measurement consistency. The focus is on detecting a possible measurement bias and determining the robustness of the procedures to outliers. Materials and Methods: We review correlation analysis, linear regression, Bland-Altman method, paired t-test, and analysis of variance (ANOVA). These techniques were applied to measurements, obtained by two raters, of head and neck structures from magnetic resonance images (MRI). Results: The correlation analysis and the linear regression were shown to be insufficient for detecting measurement inconsistency. They are also very sensitive to outliers. The widely used Bland-Altman method is a visualization technique, so it lacks numerical quantification. The paired t-test tends to be sensitive to small measurement bias. On the other hand, ANOVA performs well even under small measurement bias. Conclusion: In almost all cases, using only one method is insufficient and it is recommended to use several methods simultaneously. In general, ANOVA performs the best. PMID:18790405
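
    For readers unfamiliar with the Bland-Altman method mentioned above, the quantities usually drawn on the plot (the mean difference between two raters and the 95% limits of agreement) can be computed as in the sketch below. The function name and the toy measurements are assumptions for illustration and are not taken from the paper.

```python
import statistics


def bland_altman(rater_a, rater_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias   : mean of the pairwise differences (rater_a - rater_b)
    limits : bias -/+ 1.96 sample standard deviations of the differences
    """
    diffs = [a - b for a, b in zip(rater_a, rater_b)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Toy usage: two raters measuring the same four structures (arbitrary units).
bias, lower, upper = bland_altman([10, 12, 14, 16], [9, 12, 13, 16])
```

    A bias far from zero suggests a systematic difference between raters, while wide limits indicate poor agreement. A single gross outlier inflates the standard deviation and therefore widens the limits.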

  8. Urine metabolic fingerprinting using LC-MS and GC-MS reveals metabolite changes in prostate cancer: A pilot study.

    PubMed

    Struck-Lewicka, Wiktoria; Kordalewska, Marta; Bujak, Renata; Yumba Mpanga, Arlette; Markuszewski, Marcin; Jacyna, Julia; Matuszewski, Marcin; Kaliszan, Roman; Markuszewski, Michał J

    2015-01-01

    Prostate cancer (CaP) is a leading cause of cancer deaths in men worldwide. Despite the alarming statistics, the currently applied biomarkers are still not sufficiently specific and selective. In addition, the pathogenesis of CaP development is not fully understood. Therefore, in the present work, a metabolomics study based on urinary metabolic fingerprinting analyses was performed in order to scrutinize potential biomarkers that could help in explaining the pathomechanism of the disease and be potentially useful in its diagnosis and prognosis. Urine samples from CaP patients and healthy volunteers were analyzed with the use of high performance liquid chromatography coupled with time of flight mass spectrometry detection (HPLC-TOF/MS) in positive and negative polarity as well as gas chromatography hyphenated with triple quadrupole mass spectrometry detection (GC-QqQ/MS) in scan mode. The obtained data sets were statistically analyzed using univariate and multivariate statistical analyses. Principal Component Analysis (PCA) was used to check the systems' stability and possible outliers, whereas Partial Least Squares Discriminant Analysis (PLS-DA) was performed to evaluate the quality of the model as well as its predictive ability using statistically significant metabolites. The subsequent identification of selected metabolites using the NIST library and commonly available databases allowed the creation of a list of putative biomarkers and the biochemical pathways they are involved in. The selected pathways, such as the urea and tricarboxylic acid cycles and amino acid and purine metabolism, may play a crucial role in the pathogenesis of prostate cancer. Copyright © 2014 Elsevier B.V. All rights reserved.

  9. Moving standard deviation and moving sum of outliers as quality tools for monitoring analytical precision.

    PubMed

    Liu, Jiakai; Tan, Chin Hon; Badrick, Tony; Loh, Tze Ping

    2018-02-01

    An increase in analytical imprecision (expressed as CV_a) can introduce additional variability (i.e. noise) to the patient results, which poses a challenge to the optimal management of patients. Relatively little work has been done to address the need for continuous monitoring of analytical imprecision. Through numerical simulations, we describe the use of moving standard deviation (movSD) and a recently described moving sum of outlier (movSO) patient results as means for detecting increased analytical imprecision, and compare their performances against internal quality control (QC) and the average of normal (AoN) approaches. The power of detecting an increase in CV_a is suboptimal under routine internal QC procedures. The AoN technique almost always had the highest average number of patient results affected before error detection (ANPed), indicating that it generally had the worst capability for detecting an increased CV_a. On the other hand, the movSD and movSO approaches were able to detect an increased CV_a at significantly lower ANPed, particularly for measurands that displayed a relatively small ratio of biological variation to CV_a. CONCLUSION: The movSD and movSO approaches are effective in detecting an increase in CV_a for high-risk measurands with small biological variation. Their performance is relatively poor when the biological variation is large. However, the clinical risk of an increase in analytical imprecision is attenuated for these measurands, as an increased analytical imprecision will add only marginally to the total variation and is less likely to affect clinical care. Copyright © 2017 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.
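
    The two monitoring statistics named above can be illustrated with a short sketch. The abstract does not give the exact definitions, window sizes, or control limits, so the following are assumptions: movSD is taken as the sample standard deviation over a sliding window of consecutive patient results, and movSO as the count of results falling outside fixed limits within the window. All function and parameter names are hypothetical.

```python
import statistics
from collections import deque


def moving_sd(results, window):
    """Sample standard deviation of each full sliding window of results."""
    buf = deque(maxlen=window)
    out = []
    for x in results:
        buf.append(x)
        if len(buf) == window:
            out.append(statistics.stdev(list(buf)))
    return out


def moving_sum_outliers(results, window, low, high):
    """Number of results outside [low, high] in each full sliding window."""
    flags = [1 if (x < low or x > high) else 0 for x in results]
    return [sum(flags[i:i + window]) for i in range(len(flags) - window + 1)]


def first_alarm(results, window, sd_limit):
    """Index of the first result at which the moving SD exceeds sd_limit."""
    for i, sd in enumerate(moving_sd(results, window)):
        if sd > sd_limit:
            return i + window - 1  # index of the newest result in that window
    return None


# Toy usage: a stable measurand followed by one noisy result.
alarm_index = first_alarm([5, 5, 5, 5, 9], window=3, sd_limit=1.0)
```

    The number of results reported before `first_alarm` triggers is the kind of quantity that the ANPed metric summarizes across simulations.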

  10. Multilayer Perceptron for Robust Nonlinear Interval Regression Analysis Using Genetic Algorithms

    PubMed Central

    2014-01-01

    On the basis of fuzzy regression, computational intelligence models such as neural networks can be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination, this paper uses a multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will have only a slight effect on the determination of the data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets. PMID:25110755

  11. Multilayer perceptron for robust nonlinear interval regression analysis using genetic algorithms.

    PubMed

    Hu, Yi-Chung

    2014-01-01

    On the basis of fuzzy regression, computational intelligence models such as neural networks can be applied to nonlinear interval regression analysis for dealing with uncertain and imprecise data. When training data are not contaminated by outliers, computational models perform well by including almost all given training data in the data interval. Nevertheless, since training data are often corrupted by outliers, robust learning algorithms employed to resist outliers for interval regression analysis have been an interesting area of research. Several approaches involving computational intelligence are effective for resisting outliers, but the required parameters for these approaches are related to whether the collected data contain outliers or not. Since it seems difficult to prespecify the degree of contamination, this paper uses a multilayer perceptron to construct the robust nonlinear interval regression model using the genetic algorithm. Outliers beyond or beneath the data interval will have only a slight effect on the determination of the data interval. Simulation results demonstrate that the proposed method performs well for contaminated datasets.

  12. Baseline Estimation and Outlier Identification for Halocarbons

    NASA Astrophysics Data System (ADS)

    Wang, D.; Schuck, T.; Engel, A.; Gallman, F.

    2017-12-01

    The aim of this paper is to build a baseline model for halocarbons and to statistically identify outliers under specific conditions. We discuss time series of regional CFC-11 and chloromethane measurements taken over the last 4 years at two locations: a monitoring station northwest of Frankfurt am Main (Germany) and the Mace Head station (Ireland). In addition to analyzing these time series, we introduce a statistical approach to outlier identification in order to improve the baseline estimate. A second-order polynomial plus harmonics is fitted to the CFC-11 and chloromethane mixing ratio data. Measurements that lie far from the fitted curve are regarded as outliers and flagged. Under specified requirements, the routine is repeated without the flagged measurements until no additional outliers are found. Both the model fitting and the proposed outlier identification method are implemented in Python. Over the period, CFC-11 shows a gradual downward trend, whereas the mixing ratios of chloromethane show a slight upward trend. The concentration of chloromethane also has a strong seasonal variation, mostly due to the seasonal cycle of OH. The use of this statistical method has a considerable effect on the results: it efficiently identifies a series of outliers according to the standard deviation requirements, and after the outliers are removed, the fitted curves and trend estimates are more reliable.
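
    The fit-flag-refit routine described in this abstract can be sketched in plain Python as follows. This is a minimal illustration rather than the authors' code. It assumes a single annual harmonic, time expressed in years, a threshold of three residual standard deviations, and ordinary least squares solved through the normal equations. The names `design_row`, `fit`, `flag_outliers`, `k`, and `max_iter` are hypothetical.

```python
import math


def design_row(t):
    """Basis for a second-order polynomial plus one annual harmonic (t in years)."""
    w = 2.0 * math.pi * t
    return [1.0, t, t * t, math.sin(w), math.cos(w)]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit(times, values):
    """Least-squares coefficients via the normal equations."""
    rows = [design_row(t) for t in times]
    p = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    atb = [sum(r[i] * v for r, v in zip(rows, values)) for i in range(p)]
    return solve(ata, atb)


def predict(coef, t):
    return sum(c * x for c, x in zip(coef, design_row(t)))


def flag_outliers(times, values, k=3.0, max_iter=20):
    """Refit without flagged points until no residual exceeds k standard deviations."""
    flagged = set()
    coef = None
    for _ in range(max_iter):
        keep = [i for i in range(len(times)) if i not in flagged]
        coef = fit([times[i] for i in keep], [values[i] for i in keep])
        dof = len(keep) - len(coef)
        if dof <= 0:
            break
        resid = {i: values[i] - predict(coef, times[i]) for i in keep}
        sd = math.sqrt(sum(r * r for r in resid.values()) / dof)
        new = {i for i, r in resid.items() if abs(r) > k * sd}
        if not new:
            break
        flagged |= new
    return coef, sorted(flagged)
```

    The returned coefficients come from the last fit performed, which excludes every point flagged in earlier passes. `coef[1]` and `coef[2]` describe the trend, and the sine and cosine terms describe the seasonal cycle.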

  13. The influence of outliers on results of wet deposition measurements as a function of measurement strategy

    NASA Astrophysics Data System (ADS)

    Slanina, J.; Möls, J. J.; Baard, J. H.

    The results of a wet deposition monitoring experiment, carried out by eight identical wet-only precipitation samplers operating on the basis of 24 h samples, have been used to investigate the accuracy and uncertainties in wet deposition measurements. The experiment was conducted near Lelystad, The Netherlands over the period 1 March 1983-31 December 1985. By rearranging the data for one to eight samplers and sampling periods of 1 day to 1 month both systematic and random errors were investigated as a function of measuring strategy. A Gaussian distribution of the results was observed. Outliers, detected by a Dixon test (α = 0.05) influenced strongly both the yearly averaged results and the standard deviation of this average as a function of the number of samplers and the length of the sampling period. The systematic bias in bulk elements, using one sampler, varies typically from 2 to 20% and for trace elements from 10 to 500%, respectively. Severe problems are encountered in the case of Zn, Cu, Cr, Ni and especially Cd. For the sensitive detection of trends generally more than one sampler per measuring station is necessary as the standard deviation in the yearly averaged wet deposition is typically 10-20% relative for one sampler. Using three identical samplers, trends of, e.g. 3% per year will be generally detected in 6 years.

  14. Erosion-tectonics feedbacks in shaping the landscape: An example from the Mekele Outlier (Tigray, Ethiopia)

    NASA Astrophysics Data System (ADS)

    Sembroni, Andrea; Molin, Paola; Dramis, Francesco; Faccenna, Claudio; Abebe, Bekele

    2017-05-01

    An outlier consists of an area of younger rocks surrounded by older ones. Its formation is mainly related to the erosion of surrounding rocks which causes the interruption of the original continuity of the rocks. Because of its origin, an outlier is an important witness of the paleogeography of a region and, therefore, essential to understand its topographic and geological evolution. The Mekele Outlier (N Ethiopia) is characterized by poorly incised Mesozoic marine sediments and dolerites (∼2000 m in elevation), surrounded by strongly eroded Precambrian and Paleozoic rocks and Tertiary volcanic deposits in a context of a mantle supported topography. In the past, studies about the Mekele outlier focused mainly on the mere description of the stratigraphic and tectonic settings without taking into account the feedback between surface and deep processes in shaping such a peculiar feature. In this study we present the geological and geomorphometric analyses of the Mekele Outlier taking into account the general topographic features (slope map, swath profiles, local relief), the river network and the principal tectonic lineaments of the outlier. The results trace the evolution of the study area as related not only to the mere erosion of the surrounding rocks but to a complex interaction between surface and deep processes where the lithology played a crucial role.

  15. Robust volcano plot: identification of differential metabolites in the presence of outliers.

    PubMed

    Kumar, Nishith; Hoque, Md Aminul; Sugimoto, Masahiro

    2018-04-11

    The identification of differential metabolites in metabolomics is still a big challenge and plays a prominent role in metabolomics data analyses. Metabolomics datasets often contain outliers because of analytical, experimental, and biological ambiguity, but the currently available differential metabolite identification techniques are sensitive to outliers. We propose a kernel weight based outlier-robust volcano plot for identifying differential metabolites from noisy metabolomics datasets. Two numerical experiments are used to evaluate the performance of the proposed technique against nine existing techniques, including the t-test and the Kruskal-Wallis test. Artificially generated data with outliers reveal that the proposed method results in a lower misclassification error rate and a greater area under the receiver operating characteristic curve compared with existing methods. An experimentally measured breast cancer dataset to which outliers were artificially added reveals that our proposed method produces only two non-overlapping differential metabolites whereas the other nine methods produced between seven and 57 non-overlapping differential metabolites. Our data analyses show that the performance of the proposed differential metabolite identification technique is better than that of existing methods. Thus, the proposed method can contribute to analysis of metabolomics data with outliers. The R package and user manual of the proposed method are available at https://github.com/nishithkumarpaul/Rvolcano .

  16. Updating the Standard Spatial Observer for Contrast Detection

    NASA Technical Reports Server (NTRS)

    Ahumada, Albert J.; Watson, Andrew B.

    2011-01-01

    Watson and Ahumada (2005) constructed a Standard Spatial Observer (SSO) model for foveal luminance contrast signal detection based on the ModelFest data (Watson, 1999). Here we propose two changes to the model, dropping the oblique effect from the CSF and using the cone density data of Curcio et al. (1990) to estimate the variation of sensitivity with eccentricity. Dropping the complex images, and using medians to exclude outlier data points, the SSO model now accounts for essentially all the predictable variance in the data, with an RMS prediction error of only 0.67 dB.

  17. A multivariate assessment of changes in wetland habitat for waterbirds at Moosehorn National Wildlife Refuge, Maine, USA

    USGS Publications Warehouse

    Hierl, L.A.; Loftin, C.S.; Longcore, J.R.; McAuley, D.G.; Urban, D.L.

    2007-01-01

    We assessed changes in vegetative structure of 49 impoundments at Moosehorn National Wildlife Refuge (MNWR), Maine, USA, between 1984-1985 and 2002 with a multivariate, adaptive approach that may be useful in a variety of wetland and other habitat management situations. We used Mahalanobis Distance (MD) analysis to classify the refuge's wetlands as poor or good waterbird habitat based on five variables: percent emergent vegetation, percent shrub, percent open water, relative richness of vegetative types, and an interspersion juxtaposition index that measures adjacency of vegetation patches. Mahalanobis Distance is a multivariate statistic that examines whether a particular data point is an outlier or a member of a data cluster while accounting for correlations among inputs. For each wetland, we used MD analysis to quantify a distance from a reference condition defined a priori by habitat conditions measured in MNWR wetlands used by waterbirds. Twenty-five wetlands declined in quality between the two periods, whereas 23 wetlands improved. We identified specific wetland characteristics that may be modified to improve habitat conditions for waterbirds. The MD analysis seems ideal for instituting an adaptive wetland management approach because metrics can be easily added or removed, ranges of target habitat conditions can be defined by field-collected data, and the analysis can identify priorities for single or multiple management objectives.
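
    As a rough illustration of how a Mahalanobis distance from a reference condition accounts for correlations among inputs, the sketch below scores two points against a reference sample. The function name and the two-variable synthetic data are hypothetical; the study itself used five habitat variables that are not reproduced here.

```python
import numpy as np

def mahalanobis_sq(X, reference):
    """Squared Mahalanobis distance of each row of X from the reference sample."""
    mu = reference.mean(axis=0)
    cov = np.cov(reference, rowvar=False)
    diff = X - mu
    # Solve cov @ z = diff.T rather than forming the inverse explicitly.
    z = np.linalg.solve(cov, diff.T)
    return np.einsum("ij,ji->i", diff, z)

# Synthetic reference cloud with strongly correlated variables (illustrative only).
rng = np.random.default_rng(0)
reference = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.9], [0.9, 1.0]], size=500)

# Two candidates at the same Euclidean distance from the origin:
# the first lies along the correlation, the second runs against it.
candidates = np.array([[2.0, 2.0], [2.0, -2.0]])
d2 = mahalanobis_sq(candidates, reference)
```

    The second candidate receives a much larger distance than the first even though both are equally far from the centroid in Euclidean terms, which is the property that makes the statistic useful for spotting multivariate outliers.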

  18. Gas detection by correlation spectroscopy employing a multimode diode laser.

    PubMed

    Lou, Xiutao; Somesfalean, Gabriel; Zhang, Zhiguo

    2008-05-01

    A gas sensor based on the gas-correlation technique has been developed using a multimode diode laser (MDL) in a dual-beam detection scheme. Measurement of CO₂ mixed with CO as an interfering gas is successfully demonstrated using a 1570 nm tunable MDL. Despite overlapping absorption spectra and occasional mode hops, the interfering signals can be effectively excluded by a statistical procedure including correlation analysis and outlier identification. The gas concentration is retrieved from several pair-correlated signals by a linear-regression scheme, yielding a reliable and accurate measurement. This demonstrates the utility of the unsophisticated MDLs as novel light sources for gas detection applications.

  19. Pathway-based outlier method reveals heterogeneous genomic structure of autism in blood transcriptome

    PubMed Central

    2013-01-01

    Background Decades of research strongly suggest that the genetic etiology of autism spectrum disorders (ASDs) is heterogeneous. However, most published studies focus on group differences between cases and controls. In contrast, we hypothesized that the heterogeneity of the disorder could be characterized by identifying pathways for which individuals are outliers rather than pathways representative of shared group differences of the ASD diagnosis. Methods Two previously published blood gene expression data sets – the Translational Genetics Research Institute (TGen) dataset (70 cases and 60 unrelated controls) and the Simons Simplex Consortium (Simons) dataset (221 probands and 191 unaffected family members) – were analyzed. All individuals of each dataset were projected to biological pathways, and each sample’s Mahalanobis distance from a pooled centroid was calculated to compare the number of case and control outliers for each pathway. Results Analysis of a set of blood gene expression profiles from 70 ASD and 60 unrelated controls revealed three pathways whose outliers were significantly overrepresented in the ASD cases: neuron development including axonogenesis and neurite development (29% of ASD, 3% of control), nitric oxide signaling (29%, 3%), and skeletal development (27%, 3%). Overall, 50% of cases and 8% of controls were outliers in one of these three pathways, which could not be identified using group comparison or gene-level outlier methods. In an independently collected data set consisting of 221 ASD and 191 unaffected family members, outliers in the neurogenesis pathway were heavily biased towards cases (20.8% of ASD, 12.0% of control). Interestingly, neurogenesis outliers were more common among unaffected family members (Simons) than unrelated controls (TGen), but the statistical significance of this effect was marginal (Chi squared P < 0.09). Conclusions Unlike group difference approaches, our analysis identified the samples within the case and control groups that manifested each expression signal, and showed that outlier groups were distinct for each implicated pathway. Moreover, our results suggest that by seeking heterogeneity, pathway-based outlier analysis can reveal expression signals that are not apparent when considering only shared group differences. PMID:24063311

  20. The comparison between several robust ridge regression estimators in the presence of multicollinearity and multiple outliers

    NASA Astrophysics Data System (ADS)

    Zahari, Siti Meriam; Ramli, Norazan Mohamed; Moktar, Balkiah; Zainol, Mohammad Said

    2014-09-01

    In the presence of multicollinearity and multiple outliers, statistical inference for a linear regression model based on ordinary least squares (OLS) estimators is severely affected and produces misleading results. To overcome this, many approaches have been investigated. These include robust methods, which have been reported to be less sensitive to the presence of outliers. In addition, the ridge regression technique has been employed to tackle the multicollinearity problem. In order to mitigate both problems, a combination of ridge regression and robust methods is discussed in this study. The superiority of this approach was examined when multicollinearity and multiple outliers occurred simultaneously in multiple linear regression. This study looked at the performance of several well-known estimators, M, MM and RIDGE, and of robust ridge regression estimators, namely Weighted Ridge M-estimator (WRM), Weighted Ridge MM (WRMM) and Ridge MM (RMM), in such a situation. The results showed that in the presence of simultaneous multicollinearity and multiple outliers (in both the x- and y-directions), RMM and RIDGE are more or less similar in terms of superiority over the other estimators, regardless of the number of observations, level of collinearity and percentage of outliers used. However, when outliers occurred in only a single direction (the y-direction), the WRMM estimator is the most superior among the robust ridge regression estimators, producing the least variance. In conclusion, robust ridge regression is the best alternative compared to robust and conventional least squares estimators when dealing with the simultaneous presence of multicollinearity and outliers.

  1. 40 CFR Appendix Xviii to Part 86 - Statistical Outlier Identification Procedure for Light-Duty Vehicles and Light Light-Duty Trucks...

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... 40 Protection of Environment 19 2011-07-01 2011-07-01 false Statistical Outlier Identification... (CONTINUED) Pt. 86, App. XVIII Appendix XVIII to Part 86—Statistical Outlier Identification Procedure for..., but suffer theoretical deficiencies if statistical significance tests are required. Consequently, the...

  2. 40 CFR Appendix Xviii to Part 86 - Statistical Outlier Identification Procedure for Light-Duty Vehicles and Light Light-Duty Trucks...

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... 40 Protection of Environment 19 2010-07-01 2010-07-01 false Statistical Outlier Identification... (CONTINUED) Pt. 86, App. XVIII Appendix XVIII to Part 86—Statistical Outlier Identification Procedure for..., but suffer theoretical deficiencies if statistical significance tests are required. Consequently, the...

  3. Information Processing Research.

    DTIC Science & Technology

    1986-09-01

    Kuroe. The 3D MOSAIC Scene Understanding System. In Alan Bundy, Editor, Proceedings of the Eighth International Joint Conference on Artificial ... Artificial Intelligence 17(1-3):409-460, August, 1981. Given a single picture which is a projection of a three-dimensional scene onto the two...values are detected as outliers by computing the distribution of values over a sliding 80 msec window. During the third pass (based on artificial

  4. L1 norm based common spatial patterns decomposition for scalp EEG BCI.

    PubMed

    Li, Peiyang; Xu, Peng; Zhang, Rui; Guo, Lanjin; Yao, Dezhong

    2013-08-06

    Brain-computer interfaces (BCIs) are one of the most popular branches of biomedical engineering. They aim to construct a communication channel between disabled persons and auxiliary equipment in order to improve patients' lives. In motor imagery (MI) based BCI, one of the popular feature extraction strategies is Common Spatial Patterns (CSP). In practical BCI situations, scalp EEG inevitably contains outliers and artifacts introduced by ocular activity, head motion, or loose electrode contact. Because outliers and artifacts usually have large amplitudes, when CSP is solved with the L2 norm their effect is exaggerated by the squaring of the outliers, which ultimately degrades MI-based BCI performance. The L1 norm, in contrast, lowers the effect of outliers, as shown in other application fields such as the EEG inverse problem and face recognition. In this paper, we present a new CSP implementation that uses the L1 norm instead of the L2 norm to solve the eigenproblem for spatial filter estimation, with the aim of improving the robustness of CSP to outliers. To evaluate its performance, we applied our method, as well as the standard CSP and the regularized CSP with Tikhonov regularization (TR-CSP), to both the peer BCI dataset with simulated outliers and the dataset from the MI BCI system developed in our group. The McNemar test is used to investigate whether the differences among the three CSPs are statistically significant. The results on both the simulated and real BCI datasets consistently reveal that the proposed method has much higher classification accuracies than the conventional CSP and the TR-CSP. By combining L1 norm based eigendecomposition with Common Spatial Patterns, the proposed approach can effectively improve the robustness of BCI systems to EEG outliers and thus has potential for actual MI BCI applications, where outliers are inevitably introduced into EEG recordings.

  5. Statistical techniques applied to aerial radiometric surveys (STAARS): principal components analysis user's manual. [NURE program

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Koch, C.D.; Pirkle, F.L.; Schmidt, J.S.

    1981-01-01

    A Principal Components Analysis (PCA) has been written to aid in the interpretation of multivariate aerial radiometric data collected by the US Department of Energy (DOE) under the National Uranium Resource Evaluation (NURE) program. The variations exhibited by these data have been reduced and classified into a number of linear combinations by using the PCA program. The PCA program then generates histograms and outlier maps of the individual variates. Black and white plots can be made on a Calcomp plotter by the application of follow-up programs. All programs referred to in this guide were written for a DEC-10. From this analysis a geologist may begin to interpret the data structure. Insight into geological processes underlying the data may be obtained.

  6. How Significant Is a Boxplot Outlier?

    ERIC Educational Resources Information Center

    Dawson, Robert

    2011-01-01

    It is common to consider Tukey's schematic ("full") boxplot as an informal test for the existence of outliers. While the procedure is useful, it should be used with caution, as at least 30% of samples from a normally-distributed population of any size will be flagged as containing an outlier, while for small samples (N less than 10) even extreme…
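
    The claim about how often normal samples are flagged can be explored with a small simulation like the one below. It is a sketch under assumptions: the function name is hypothetical, the fences use the usual 1.5 x IQR rule, and quartiles come from `np.percentile` with its default interpolation rather than Tukey's hinges, so the rates will not match the article's figures exactly.

```python
import numpy as np

def fraction_flagged(n, trials=5000, seed=0):
    """Fraction of normal samples of size n with at least one point outside the fences."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((trials, n))
    q1, q3 = np.percentile(x, [25, 75], axis=1)
    iqr = q3 - q1
    lo = (q1 - 1.5 * iqr)[:, None]  # lower fence per sample
    hi = (q3 + 1.5 * iqr)[:, None]  # upper fence per sample
    has_outlier = ((x < lo) | (x > hi)).any(axis=1)
    return has_outlier.mean()

# Estimated flagging rates for a few sample sizes.
rates = {n: fraction_flagged(n) for n in (10, 20, 50, 200)}
```

    Because each observation has a small but nonzero chance of falling outside the fences, the chance that a sample contains at least one flagged point grows with the sample size, so a flagged point in a large sample is weak evidence of contamination.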

  7. The Impact of Outliers on Cronbach's Coefficient Alpha Estimate of Reliability: Visual Analogue Scales

    ERIC Educational Resources Information Center

    Liu, Yan; Zumbo, Bruno D.

    2007-01-01

    The impact of outliers on Cronbach's coefficient [alpha] has not been documented in the psychometric or statistical literature. This is an important gap because coefficient [alpha] is the most widely used measurement statistic in all of the social, educational, and health sciences. The impact of outliers on coefficient [alpha] is investigated for…

  8. 42 CFR 412.82 - Payment for extended length-of-stay cases (day outliers).

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... 42 Public Health 2 2012-10-01 2012-10-01 false Payment for extended length-of-stay cases (day outliers). 412.82 Section 412.82 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF... Certain Replaced Devices Payment for Outlier Cases § 412.82 Payment for extended length-of-stay cases (day...

  9. 42 CFR 412.82 - Payment for extended length-of-stay cases (day outliers).

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... 42 Public Health 2 2013-10-01 2013-10-01 false Payment for extended length-of-stay cases (day outliers). 412.82 Section 412.82 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF... Certain Replaced Devices Payment for Outlier Cases § 412.82 Payment for extended length-of-stay cases (day...

  10. 42 CFR 412.82 - Payment for extended length-of-stay cases (day outliers).

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 42 Public Health 2 2011-10-01 2011-10-01 false Payment for extended length-of-stay cases (day outliers). 412.82 Section 412.82 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF... Certain Replaced Devices Payment for Outlier Cases § 412.82 Payment for extended length-of-stay cases (day...

  11. 42 CFR 412.82 - Payment for extended length-of-stay cases (day outliers).

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 42 Public Health 2 2010-10-01 2010-10-01 false Payment for extended length-of-stay cases (day outliers). 412.82 Section 412.82 Public Health CENTERS FOR MEDICARE & MEDICAID SERVICES, DEPARTMENT OF... Certain Replaced Devices Payment for Outlier Cases § 412.82 Payment for extended length-of-stay cases (day...

  12. Outlier Removal and the Relation with Reporting Errors and Quality of Psychological Research

    PubMed Central

    Bakker, Marjan; Wicherts, Jelte M.

    2014-01-01

    Background The removal of outliers to acquire a significant result is a questionable research practice that appears to be commonly used in psychology. In this study, we investigated whether the removal of outliers in psychology papers is related to weaker evidence (against the null hypothesis of no effect), a higher prevalence of reporting errors, and smaller sample sizes in these papers compared to papers in the same journals that did not report the exclusion of outliers from the analyses. Methods and Findings We retrieved a total of 2667 statistical results of null hypothesis significance tests from 153 articles in main psychology journals, and compared results from articles in which outliers were removed (N = 92) with results from articles that reported no exclusion of outliers (N = 61). We preregistered our hypotheses and methods and analyzed the data at the level of articles. Results show no significant difference between the two types of articles in median p value, sample sizes, or prevalence of all reporting errors, large reporting errors, and reporting errors that concerned the statistical significance. However, we did find a discrepancy between the reported degrees of freedom of t tests and the reported sample size in 41% of articles that did not report removal of any data values. This suggests common failure to report data exclusions (or missingness) in psychological articles. Conclusions We failed to find that the removal of outliers from the analysis in psychological articles was related to weaker evidence (against the null hypothesis of no effect), sample size, or the prevalence of errors. However, our control sample might be contaminated due to nondisclosure of excluded values in articles that did not report exclusion of outliers. Results therefore highlight the importance of more transparent reporting of statistical analyses. PMID:25072606

  13. Initiating statistical process control to improve quality outcomes in colorectal surgery.

    PubMed

    Keller, Deborah S; Stulberg, Jonah J; Lawrence, Justin K; Samia, Hoda; Delaney, Conor P

    2015-12-01

    Unexpected variations in postoperative length of stay (LOS) negatively impact resources and patient outcomes. Statistical process control (SPC) measures performance, evaluates productivity, and modifies processes for optimal performance. The goal of this study was to initiate SPC to identify LOS outliers and evaluate its feasibility to improve outcomes in colorectal surgery. Review of a prospective database identified colorectal procedures performed by a single surgeon. Patients were grouped into elective and emergent categories and then stratified by laparoscopic and open approaches. All followed a standardized enhanced recovery protocol. SPC was applied to identify outliers and evaluate causes within each group. A total of 1294 cases were analyzed--83% elective (n = 1074) and 17% emergent (n = 220). Emergent cases were 70.5% open and 29.5% laparoscopic; elective cases were 36.8% open and 63.2% laparoscopic. All groups had a wide range in LOS. LOS outliers ranged from 8.6% (elective laparoscopic) to 10.8% (emergent laparoscopic). Evaluation of outliers demonstrated patient characteristics of higher ASA scores, longer operating times, ICU requirement, and temporary nursing at discharge. Outliers had higher postoperative complication rates in elective open (57.1 vs. 20.0%) and elective lap groups (77.6 vs. 26.1%). Outliers also had higher readmission rates for emergent open (11.4 vs. 5.4%), emergent lap (14.3 vs. 9.2%), and elective lap (32.8 vs. 6.9%). Elective open outliers did not follow trends of longer LOS or higher reoperation rates. SPC is feasible and promising for improving colorectal surgery outcomes. SPC identified patient and process characteristics associated with increased LOS. SPC may allow real-time outlier identification, during quality improvement efforts, and reevaluation of outcomes after introducing process change. SPC has clinical implications for improving patient outcomes and resource utilization.

  14. Appending Limited Clinical Data to an Administrative Database for Acute Myocardial Infarction Patients: The Impact on the Assessment of Hospital Quality.

    PubMed

    Hannan, Edward L; Samadashvili, Zaza; Cozzens, Kimberly; Jacobs, Alice K; Venditti, Ferdinand J; Holmes, David R; Berger, Peter B; Stamato, Nicholas J; Hughes, Suzanne; Walford, Gary

    2016-05-01

    Hospitals' risk-standardized mortality rates and outlier status (significantly higher/lower rates) are reported by the Centers for Medicare and Medicaid Services (CMS) for acute myocardial infarction (AMI) patients using Medicare claims data. New York now has AMI claims data with blood pressure and heart rate added. The objective of this study was to see whether the appended database yields different hospital assessments than standard claims data. New York State clinically appended claims data for AMI were used to create 2 different risk models based on CMS methods: 1 with and 1 without the added clinical data. Model discrimination was compared, and differences between the models in hospital outlier status and tertile status were examined. Mean arterial pressure and heart rate were both significant predictors of mortality in the clinically appended model. The C statistic for the model with the clinical variables added was significantly higher (0.803 vs. 0.773, P<0.001). The model without clinical variables identified 10 low outliers and all of them were percutaneous coronary intervention hospitals. When clinical variables were included in the model, only 6 of those 10 hospitals were low outliers, but there were 2 new low outliers. The model without clinical variables had only 3 high outliers, and the model with clinical variables included identified 2 new high outliers. Appending even a small number of clinical data elements to administrative data resulted in a difference in the assessment of hospital mortality outliers for AMI. The strategy of adding limited but important clinical data elements to administrative datasets should be considered when evaluating hospital quality for procedures and other medical conditions.

  15. On the predictability of outliers in ensemble forecasts

    NASA Astrophysics Data System (ADS)

    Siegert, S.; Bröcker, J.; Kantz, H.

    2012-03-01

    In numerical weather prediction, ensembles are used to retrieve probabilistic forecasts of future weather conditions. We consider events where the verification is smaller than the smallest, or larger than the largest ensemble member of a scalar ensemble forecast. These events are called outliers. In a statistically consistent K-member ensemble, outliers should occur with a base rate of 2/(K+1). In operational ensembles this base rate tends to be higher. We study the predictability of outlier events in terms of the Brier Skill Score and find that forecast probabilities can be calculated which are more skillful than the unconditional base rate. This is shown analytically for statistically consistent ensembles. Using logistic regression, forecast probabilities for outlier events in an operational ensemble are calculated. These probabilities exhibit positive skill which is quantitatively similar to the analytical results. Possible causes of these results as well as their consequences for ensemble interpretation are discussed.
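
    The 2/(K+1) base rate follows from exchangeability: if the verification and the K members are drawn from the same distribution, the verification is equally likely to hold any of the K+1 ranks, and two of those ranks are the extremes. The simulation below checks this numerically; it is a minimal sketch with a hypothetical function name and a standard normal distribution chosen purely for convenience.

```python
import numpy as np

def outlier_rate(K, trials=200_000, seed=0):
    """Share of cases where the verification falls outside the ensemble range."""
    rng = np.random.default_rng(seed)
    ens = rng.standard_normal((trials, K))  # statistically consistent ensemble
    verif = rng.standard_normal(trials)     # verification from the same distribution
    outside = (verif < ens.min(axis=1)) | (verif > ens.max(axis=1))
    return outside.mean()

K = 9
simulated = outlier_rate(K)
expected = 2 / (K + 1)
```

    An operational ensemble that is underdispersive would show a simulated rate above this value, which is consistent with the record's remark that the base rate tends to be higher in practice.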

  16. An application of robust ridge regression model in the presence of outliers to real data problem

    NASA Astrophysics Data System (ADS)

    Shariff, N. S. Md.; Ferdaos, N. A.

    2017-09-01

    Multicollinearity and outliers often lead to inconsistent and unreliable parameter estimates in regression analysis. The well-known procedure that is robust to the multicollinearity problem is the ridge regression method. This method, however, is believed to be affected by the presence of outliers. A combination of GM-estimation and the ridge parameter that is robust to both problems is of interest in this study. Both techniques are employed to investigate the relationship between stock market price and macroeconomic variables in Malaysia, motivated by the possible presence of multicollinearity and outlier problems in the data set. Four macroeconomic factors are selected for this study: Consumer Price Index (CPI), Gross Domestic Product (GDP), Base Lending Rate (BLR) and Money Supply (M1). The results demonstrate that the proposed procedure is able to produce reliable results in the presence of multicollinearity and outliers in the real data.

  17. A Comparison of Best Fit Lines for Data with Outliers

    ERIC Educational Resources Information Center

    Glaister, P.

    2005-01-01

    Three techniques for determining a straight line fit to data are compared. The methods are applied to a range of datasets containing one or more outliers, and to a specific example from the field of chemistry. For the method which is the most resistant to the presence of outliers, a Microsoft Excel spreadsheet, as well as two Matlab routines, are…

  18. Dispersal of emerald ash borer at outlier sites: three case studies

    Treesearch

    Deborah G. McCullough; Nathan W. Siegert; Therese M. Poland; David L. Cappaert; Ivich Fraser; David Williams

    2005-01-01

    We worked with cooperators from several state and federal agencies in 2003 and 2004 to assess dispersal of emerald ash borer (EAB), Agrilus planipennis Fairmaire, from known source points in three outlier sites. In February 2003, we felled and sampled more than 200 ash trees at an outlier site near Tipton, Michigan, where one generation of adult...

  19. Neural Mechanisms Behind Identification of Leptokurtic Noise and Adaptive Behavioral Response

    PubMed Central

    d'Acremont, Mathieu; Bossaerts, Peter

    2016-01-01

    Large-scale human interaction through, for example, financial markets causes ceaseless random changes in outcome variability, producing frequent and salient outliers that render the outcome distribution more peaked than the Gaussian distribution, and with longer tails. Here, we study how humans cope with this evolutionary novel leptokurtic noise, focusing on the neurobiological mechanisms that allow the brain, 1) to recognize the outliers as noise and 2) to regulate the control necessary for adaptive response. We used functional magnetic resonance imaging, while participants tracked a target whose movements were affected by leptokurtic noise. After initial overreaction and insufficient subsequent correction, participants improved performance significantly. Yet, persistently long reaction times pointed to continued need for vigilance and control. We ran a contrasting treatment where outliers reflected permanent moves of the target, as in traditional mean-shift paradigms. Importantly, outliers were equally frequent and salient. There, control was superior and reaction time was faster. We present a novel reinforcement learning model that fits observed choices better than the Bayes-optimal model. Only anterior insula discriminated between the 2 types of outliers. In both treatments, outliers initially activated an extensive bottom-up attention and belief network, followed by sustained engagement of the fronto-parietal control network. PMID:26850528

  20. Autoregressive model in the Lp norm space for EEG analysis.

    PubMed

    Li, Peiyang; Wang, Xurui; Li, Fali; Zhang, Rui; Ma, Teng; Peng, Yueheng; Lei, Xu; Tian, Yin; Guo, Daqing; Liu, Tiejun; Yao, Dezhong; Xu, Peng

    2015-01-30

    The autoregressive (AR) model is widely used in electroencephalogram (EEG) analyses such as waveform fitting, spectrum estimation, and system identification. In real applications, EEGs are inevitably contaminated with unexpected outlier artifacts, and this must be overcome. However, most of the current AR models are based on the L2 norm structure, which exaggerates the outlier effect due to the square property of the L2 norm. In this paper, a novel AR objective function is constructed in the Lp (p≤1) norm space with the aim of compressing the outlier effects on EEG analysis, and a fast iteration procedure is developed to solve this new AR model. The quantitative evaluation using simulated EEGs with outliers shows that the proposed Lp (p≤1) AR can estimate the AR parameters more robustly than the Yule-Walker, Burg and LS methods, under various simulated outlier conditions. The actual application to the resting EEG recording with ocular artifacts also demonstrates that Lp (p≤1) AR can effectively address the outliers and recover a resting EEG power spectrum that is more consistent with its physiological basis. Copyright © 2014 Elsevier B.V. All rights reserved.
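
    The idea of replacing the L2 objective with an Lp one can be sketched as follows. This is a minimal illustration using iteratively reweighted least squares (IRLS), which is a swapped-in, generic technique and not necessarily the "fast iteration procedure" of the paper; the function name `fit_ar_lp`, the default `p=1.0`, the iteration count, and the simulated series are all assumptions made for the example.

```python
import numpy as np

def fit_ar_lp(x, order, p=1.0, n_iter=50, eps=1e-8):
    """Fit AR coefficients by minimising sum |residual|^p via IRLS (illustrative sketch)."""
    x = np.asarray(x, dtype=float)
    # Column k holds x[t-k-1]; each row predicts x[t] for t = order .. len(x)-1.
    X = np.column_stack([x[order - k - 1 : len(x) - k - 1] for k in range(order)])
    y = x[order:]
    # Start from the ordinary least-squares (L2) solution.
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    for _ in range(n_iter):
        r = y - X @ coef
        # IRLS weights |r|^(p-2); eps guards against division by zero.
        w = np.maximum(np.abs(r), eps) ** (p - 2.0)
        sw = np.sqrt(w)
        coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Hypothetical demo: an AR(2) series with a few large additive spikes.
rng = np.random.default_rng(0)
n = 300
noise = rng.normal(scale=0.1, size=n)
x = np.zeros(n)
for t in range(2, n):
    x[t] = 0.6 * x[t - 1] - 0.3 * x[t - 2] + noise[t]
x_dirty = x.copy()
x_dirty[rng.choice(n, size=10, replace=False)] += 5.0

print("L2 fit:", fit_ar_lp(x_dirty, order=2, p=2.0))  # p=2 reduces to plain least squares
print("L1 fit:", fit_ar_lp(x_dirty, order=2, p=1.0))
```

    With p = 2 every weight equals 1, so the loop reproduces the least-squares estimate; smaller p down-weights samples with large residuals. For p < 1 the objective is non-convex, so IRLS may only reach a local solution.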

  1. Results of continuous monitoring of the performance of rubella virus IgG and hepatitis B virus surface antibody assays using trueness controls in a multicenter trial.

    PubMed

    Kruk, Tamara; Ratnam, Sam; Preiksaitis, Jutta; Lau, Allan; Hatchette, Todd; Horsman, Greg; Van Caeseele, Paul; Timmons, Brian; Tipples, Graham

    2012-10-01

    We conducted a multicenter trial in Canada to assess the value of using trueness controls (TC) for rubella virus IgG and hepatitis B virus surface antibody (anti-HBs) serology to determine test performance across laboratories over time. TC were obtained from a single source with known international units. Seven laboratories using different test systems and kit lots included the TC in routine assay runs of the analytes. TC measurements of 1,095 rubella virus IgG and 1,195 anti-HBs runs were plotted on Levey-Jennings control charts for individual laboratories and analyzed using a multirule quality control (MQC) scheme as well as a single three-standard-deviation (3-SD) rule. All rubella virus IgG TC results were "in control" in only one of the seven laboratories. Among the rest, "out-of-control" results ranged from 5.6% to 10% with an outlier at 20.3% by MQC and from 1.1% to 5.6% with an outlier at 13.4% by the 3-SD rule. All anti-HBs TC results were "in control" in only two laboratories. Among the rest, "out-of-control" results ranged from 3.3% to 7.9% with an outlier at 19.8% by MQC and from 0% to 3.3% with an outlier at 10.5% by the 3-SD rule. In conclusion, through the continuous monitoring of assay performance using TC and quality control rules, our trial detected significant intra- and interlaboratory, test system, and kit lot variations for both analytes. In most cases the assay rejections could be attributable to the laboratories rather than to kit lots. This has implications for routine diagnostic screening and clinical practice guidelines and underscores the value of using an approach as described above for continuous quality improvement in result reporting and harmonization for these analytes.
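
    The single 3-SD rule mentioned above can be sketched in a few lines. This is a minimal illustration only: the target mean, target SD, and run values are invented for the example, and the multirule (MQC) scheme used in the trial is not implemented here.

```python
import numpy as np

def three_sd_violations(measurements, target_mean, target_sd):
    """Return a boolean mask of runs lying more than 3 SD from the target value."""
    x = np.asarray(measurements, dtype=float)
    z = (x - target_mean) / target_sd  # position on the Levey-Jennings chart in SD units
    return np.abs(z) > 3.0

# Hypothetical trueness-control results (arbitrary units), not data from the trial.
runs = [10.1, 9.8, 10.4, 13.5, 9.9, 6.2]
out = three_sd_violations(runs, target_mean=10.0, target_sd=1.0)
out_of_control_pct = 100.0 * out.mean()
print(out, f"{out_of_control_pct:.1f}% out of control")
```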

  2. New approach for the identification of implausible values and outliers in longitudinal childhood anthropometric data.

    PubMed

    Shi, Joy; Korsiak, Jill; Roth, Daniel E

    2018-03-01

    We aimed to demonstrate the use of jackknife residuals to take advantage of the longitudinal nature of available growth data in assessing potential biologically implausible values and outliers. Artificial errors were induced in 5% of length, weight, and head circumference measurements, measured on 1211 participants from the Maternal Vitamin D for Infant Growth (MDIG) trial from birth to 24 months of age. Each child's sex- and age-standardized z-score or raw measurements were regressed as a function of age in child-specific models. Each error responsible for a biologically implausible decrease between a consecutive pair of measurements was identified based on the higher of the two absolute values of jackknife residuals in each pair. In further analyses, outliers were identified as those values beyond fixed cutoffs of the jackknife residuals (e.g., greater than +5 or less than -5 in primary analyses). Kappa, sensitivity, and specificity were calculated over 1000 simulations to assess the ability of the jackknife residual method to detect induced errors and to compare these methods with the use of conditional growth percentiles and conventional cross-sectional methods. Among the induced errors that resulted in a biologically implausible decrease in measurement between two consecutive values, the jackknife residual method identified the correct value in 84.3%-91.5% of these instances when applied to the sex- and age-standardized z-scores, with kappa values ranging from 0.685 to 0.795. Sensitivity and specificity of the jackknife method were higher than those of the conditional growth percentile method, but specificity was lower than for conventional cross-sectional methods. Using jackknife residuals provides a simple method to identify biologically implausible values and outliers in longitudinal child growth data sets in which each child contributes at least 4 serial measurements. Crown Copyright © 2018. Published by Elsevier Inc. All rights reserved.
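
    A child-specific regression with jackknife (externally studentized) residuals can be sketched as below. This is a hedged illustration assuming a simple linear model of the z-score on age; the function name, the example ages and z-scores, and the use of a straight-line fit are assumptions, while the ±5 cutoff follows the primary analysis described above.

```python
import numpy as np

def jackknife_residuals(age, value):
    """Externally studentized residuals from a per-child linear fit of value on age."""
    age = np.asarray(age, dtype=float)
    value = np.asarray(value, dtype=float)
    n = age.size
    X = np.column_stack([np.ones(n), age])
    p = X.shape[1]
    if n - p - 1 < 1:
        # With intercept and slope, at least 4 serial measurements are needed.
        raise ValueError("need at least 4 measurements per child")
    beta, *_ = np.linalg.lstsq(X, value, rcond=None)
    resid = value - X @ beta
    leverage = np.diag(X @ np.linalg.inv(X.T @ X) @ X.T)
    sse = resid @ resid
    # Residual variance re-estimated with observation i left out.
    s2_deleted = (sse - resid**2 / (1.0 - leverage)) / (n - p - 1)
    return resid / np.sqrt(s2_deleted * (1.0 - leverage))

# Hypothetical child: length-for-age z-scores with one induced error at 12 months.
age_months = [0, 3, 6, 12, 18, 24]
z_scores = [0.10, 0.05, 0.20, 3.00, 0.15, 0.25]
t = jackknife_residuals(age_months, z_scores)
flagged = np.abs(t) > 5.0  # fixed cutoff as in the primary analysis
print(np.round(t, 2), flagged)
```

    Because each residual is scaled by a variance estimated without that observation, a single gross error does not inflate its own yardstick, which is what makes the fixed cutoff workable on short series.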

  3. Drunk driving detection based on classification of multivariate time series.

    PubMed

    Li, Zhenlong; Jin, Xue; Zhao, Xiaohua

    2015-09-01

    This paper addresses the problem of detecting drunk driving based on classification of multivariate time series. First, driving performance measures were collected from a test in a driving simulator located in the Traffic Research Center, Beijing University of Technology. Lateral position and steering angle were used to detect drunk driving. Second, multivariate time series analysis was performed to extract the features. A piecewise linear representation was used to represent multivariate time series. A bottom-up algorithm was then employed to separate multivariate time series. The slope and time interval of each segment were extracted as the features for classification. Third, a support vector machine classifier was used to classify driver's state into two classes (normal or drunk) according to the extracted features. The proposed approach achieved an accuracy of 80.0%. Drunk driving detection based on the analysis of multivariate time series is feasible and effective. The approach has implications for drunk driving detection. Copyright © 2015 Elsevier Ltd and National Safety Council. All rights reserved.

  4. A Self-Aware Machine Platform in Manufacturing Shop Floor Utilizing MTConnect Data

    DTIC Science & Technology

    2014-10-02

    condition monitoring, and equipment time to failure prediction in manufacturing 1 ANNUAL CONFERENCE OF THE PROGNOSTICS AND HEALTH MANAGEMENT SOCIETY 2014 589...Component Level Health Monitoring and Prediction One of the characteristics of a self-aware machine is to be able to detect its components...the annual conference of the prognostics and health management society. Filzmoser, P., Garrett, R. G., & Reimann, C. (2005). Multivariate outlier

  5. A Multi Agent System for Flow-Based Intrusion Detection

    DTIC Science & Technology

    2013-03-01

    Student t-test, as it is less likely to spuriously indicate significance because of the presence of outliers [128]. We use the MATLAB ranksum function [77...effectiveness of self-organization and “entangled hierarchies” for accomplishing scenario objectives. One of the interesting features of SOMAS is the ability...cross-validation and automatic model selection. It has interfaces for Java, Python, R, Splus, MATLAB, Perl, Ruby, and LabVIEW. Kernels: linear

  6. Temporal interpolation alters motion in fMRI scans: Magnitudes and consequences for artifact detection.

    PubMed

    Power, Jonathan D; Plitt, Mark; Kundu, Prantik; Bandettini, Peter A; Martin, Alex

    2017-01-01

    Head motion can be estimated at any point of fMRI image processing. Processing steps involving temporal interpolation (e.g., slice time correction or outlier replacement) often precede motion estimation in the literature. From first principles it can be anticipated that temporal interpolation will alter head motion in a scan. Here we demonstrate this effect and its consequences in five large fMRI datasets. Estimated head motion was reduced by 10-50% or more following temporal interpolation, and reductions were often visible to the naked eye. Such reductions make the data seem to be of improved quality. Such reductions also degrade the sensitivity of analyses aimed at detecting motion-related artifact and can cause a dataset with artifact to falsely appear artifact-free. These reduced motion estimates will be particularly problematic for studies needing estimates of motion in time, such as studies of dynamics. Based on these findings, it is sensible to obtain motion estimates prior to any image processing (regardless of subsequent processing steps and the actual timing of motion correction procedures, which need not be changed). We also find that outlier replacement procedures change signals almost entirely during times of motion and therefore have notable similarities to motion-targeting censoring strategies (which withhold or replace signals entirely during times of motion).

  7. Temporal interpolation alters motion in fMRI scans: Magnitudes and consequences for artifact detection

    PubMed Central

    Plitt, Mark; Kundu, Prantik; Bandettini, Peter A.; Martin, Alex

    2017-01-01

    Head motion can be estimated at any point of fMRI image processing. Processing steps involving temporal interpolation (e.g., slice time correction or outlier replacement) often precede motion estimation in the literature. From first principles it can be anticipated that temporal interpolation will alter head motion in a scan. Here we demonstrate this effect and its consequences in five large fMRI datasets. Estimated head motion was reduced by 10–50% or more following temporal interpolation, and reductions were often visible to the naked eye. Such reductions make the data seem to be of improved quality. Such reductions also degrade the sensitivity of analyses aimed at detecting motion-related artifact and can cause a dataset with artifact to falsely appear artifact-free. These reduced motion estimates will be particularly problematic for studies needing estimates of motion in time, such as studies of dynamics. Based on these findings, it is sensible to obtain motion estimates prior to any image processing (regardless of subsequent processing steps and the actual timing of motion correction procedures, which need not be changed). We also find that outlier replacement procedures change signals almost entirely during times of motion and therefore have notable similarities to motion-targeting censoring strategies (which withhold or replace signals entirely during times of motion). PMID:28880888

  8. Genome scans for divergent selection in natural populations of the widespread hardwood species Eucalyptus grandis (Myrtaceae) using microsatellites

    PubMed Central

    Song, Zhijiao; Zhang, Miaomiao; Li, Fagen; Weng, Qijie; Zhou, Chanpin; Li, Mei; Li, Jie; Huang, Huanhua; Mo, Xiaoyong; Gan, Siming

    2016-01-01

    Identification of loci or genes under natural selection is important for both understanding the genetic basis of local adaptation and practical applications, and genome scans provide a powerful means for such identification purposes. In this study, genome-wide simple sequence repeat (SSR) markers were used to scan for molecular footprints of divergent selection in Eucalyptus grandis, a hardwood species occurring widely in coastal areas from 32° S to 16° S in Australia. High population diversity levels and weak population structure were detected with putatively neutral genomic SSRs. Using three FST outlier detection methods, a total of 58 outlying SSRs were collectively identified as loci under divergent selection against three non-correlated climatic variables, namely, mean annual temperature, isothermality and annual precipitation. Using a spatial analysis method, nine significant associations were revealed between FST outlier allele frequencies and climatic variables, involving seven alleles from five SSR loci. Of the five significant SSRs, two (EUCeSSR1044 and Embra394) contained alleles of putative genes with known functional importance for response to climatic factors. Our study presents critical information on the population diversity and structure of the important woody species E. grandis and provides insight into the adaptive responses of perennial trees to climatic variations. PMID:27748400

  9. Diagnostics for generalized linear hierarchical models in network meta-analysis.

    PubMed

    Zhao, Hong; Hodges, James S; Carlin, Bradley P

    2017-09-01

    Network meta-analysis (NMA) combines direct and indirect evidence comparing more than 2 treatments. Inconsistency arises when these 2 information sources differ. Previous work focuses on inconsistency detection, but little has been done on how to proceed after identifying inconsistency. The key issue is whether inconsistency changes an NMA's substantive conclusions. In this paper, we examine such discrepancies from a diagnostic point of view. Our methods seek to detect influential and outlying observations in NMA at a trial-by-arm level. These observations may have a large effect on the parameter estimates in NMA, or they may deviate markedly from other observations. We develop formal diagnostics for a Bayesian hierarchical model to check the effect of deleting any observation. Diagnostics are specified for generalized linear hierarchical NMA models and investigated for both published and simulated datasets. Results from our example dataset using either contrast- or arm-based models and from the simulated datasets indicate that the sources of inconsistency in NMA tend not to be influential, though results from the example dataset suggest that they are likely to be outliers. This mimics a familiar result from linear model theory, in which outliers with low leverage are not influential. Future extensions include incorporating baseline covariates and individual-level patient data. Copyright © 2017 John Wiley & Sons, Ltd.

  10. Reliable noninvasive measurement of blood gases

    DOEpatents

    Thomas, Edward V.; Robinson, Mark R.; Haaland, David M.; Alam, Mary K.

    1994-01-01

    Methods and apparatus for, preferably, determining noninvasively and in vivo at least two of the five blood gas parameters (i.e., pH, PCO2, [HCO3-], PO2, and O2 sat.) in a human. The non-invasive method includes the steps of: generating light at three or more different wavelengths in the range of 500 nm to 2500 nm; irradiating blood-containing tissue; measuring the intensities of the wavelengths emerging from the blood-containing tissue to obtain a set of at least three spectral intensities versus wavelengths; and determining the unknown values of at least two of pH, [HCO3-], PCO2 and a measure of oxygen concentration. The determined values are within the physiological ranges observed in blood-containing tissue. The method also includes the steps of providing calibration samples, determining if the spectral intensities versus wavelengths from the tissue represent an outlier, and determining if any of the calibration samples represents an outlier. The determination of the unknown values is performed by at least one multivariate algorithm using two or more variables and at least one calibration model. Preferably, there is a separate calibration for each blood gas parameter being determined. The method can be utilized in a pulse mode and can also be used invasively. The apparatus includes a tissue positioning device, a source, at least one detector, electronics, a microprocessor, memory, and apparatus for indicating the determined values.

  11. The population genomic signature of environmental selection in the widespread insect-pollinated tree species Frangula alnus at different geographical scales

    PubMed Central

    De Kort, H; Vandepitte, K; Mergeay, J; Mijnsbrugge, K V; Honnay, O

    2015-01-01

    The evaluation of the molecular signatures of selection in species lacking an available closely related reference genome remains challenging, yet it may provide valuable fundamental insights into the capacity of populations to respond to environmental cues. We screened 25 native populations of the tree species Frangula alnus subsp. alnus (Rhamnaceae), covering three different geographical scales, for 183 annotated single-nucleotide polymorphisms (SNPs). Standard population genomic outlier screens were combined with individual-based and multivariate landscape genomic approaches to examine the strength of selection relative to neutral processes in shaping genomic variation, and to identify the main environmental agents driving selection. Our results demonstrate a more distinct signature of selection with increasing geographical distance, as indicated by the proportion of SNPs (i) showing exceptional patterns of genetic diversity and differentiation (outliers) and (ii) associated with climate. Both temperature and precipitation have an important role as selective agents in shaping adaptive genomic differentiation in F. alnus subsp. alnus, although their relative importance differed among spatial scales. At the ‘intermediate’ and ‘regional’ scales, where limited genetic clustering and high population diversity were observed, some indications of natural selection may suggest a major role for gene flow in safeguarding adaptability. High genetic diversity, at loci under selection in particular, indicated considerable adaptive potential, which may nevertheless be compromised by the combined effects of climate change and habitat fragmentation. PMID:25944466

  12. Urban air quality assessment using monitoring data of fractionized aerosol samples, chemometrics and meteorological conditions.

    PubMed

    Yotova, Galina I; Tsitouridou, Roxani; Tsakovski, Stefan L; Simeonov, Vasil D

    2016-01-01

    The present article deals with the assessment of urban air by using monitoring data for 10 different aerosol fractions (0.015-16 μm) collected at a typical urban site in the City of Thessaloniki, Greece. The data set was subjected to multivariate statistical analysis (cluster analysis and principal components analysis) and, additionally, to HYSPLIT back trajectory modeling in order to better assess the impact of the weather conditions on the pollution sources identified. A specific element of the study is the effort to clarify the role of outliers in the data set. The appearance of outliers is strongly related to the atmospheric conditions on the particular sampling days, which led to enhanced concentrations of pollutants (secondary emissions, sea sprays, road and soil dust, combustion processes), especially for ultrafine and coarse particles. It is also shown that three major sources affect the urban air quality of the location studied: sea sprays, mineral dust and anthropogenic influences (agricultural activity, combustion processes, and industrial sources). The level of impact is related to a certain extent to the aerosol fraction size. The assessment of the meteorological conditions leads to the definition of four downwind patterns affecting the air quality (Pelagic, Western and Central Europe, Eastern and Northeastern Europe, and Africa and Southern Europe). Thus, the present study offers a complete urban air assessment taking into account the weather conditions, pollution sources and aerosol fractioning.

  13. Automated Detection of Knickpoints and Knickzones Across Transient Landscapes

    NASA Astrophysics Data System (ADS)

    Gailleton, B.; Mudd, S. M.; Clubb, F. J.

    2017-12-01

    Mountainous regions are ubiquitously dissected by river channels, which transmit climate and tectonic signals to the rest of the landscape by adjusting their long profiles. Fluvial response to allogenic forcing is often expressed through the upstream propagation of steepened reaches, referred to as knickpoints or knickzones. The identification and analysis of these steepened reaches has numerous applications in geomorphology, such as modelling long-term landscape evolution, understanding controls on fluvial incision, and constraining tectonic uplift histories. Traditionally, the identification of knickpoints or knickzones from fluvial profiles requires manual selection or calibration. This process is both time-consuming and subjective, as different workers may select different steepened reaches within the profile. We propose an objective, statistically-based method to systematically pick knickpoints/knickzones on a landscape scale using an outlier-detection algorithm. Our method integrates river profiles normalised by drainage area (Chi, using the approach of Perron and Royden, 2013), then separates the chi-elevation plots into a series of transient segments using the method of Mudd et al. (2014). This method allows the systematic detection of knickpoints across a DEM, regardless of size, using a high-performance algorithm implemented in the open-source Edinburgh Land Surface Dynamics Topographic Tools (LSDTopoTools) software package. After initial knickpoint identification, outliers are selected using several sorting and binning methods based on the Median Absolute Deviation, to avoid the influence of sample size. We test our method on a series of DEMs and grid resolutions, and show that our method consistently identifies accurate knickpoint locations across each landscape tested.
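
    Outlier selection based on the Median Absolute Deviation (MAD) can be sketched as follows. This is a generic robust z-score filter, not the LSDTopoTools implementation or its sorting and binning steps; the function name, the threshold of 3, and the example knickpoint magnitudes are assumptions for illustration.

```python
import numpy as np

def mad_outliers(values, threshold=3.0):
    """Flag values whose MAD-based robust z-score exceeds `threshold` in magnitude."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    mad = np.median(np.abs(x - median))
    if mad == 0:
        # No spread around the median: nothing can be scored, so flag nothing.
        return np.zeros(x.shape, dtype=bool)
    # 1.4826 scales the MAD to match the standard deviation for normal data.
    robust_z = (x - median) / (1.4826 * mad)
    return np.abs(robust_z) > threshold

# Hypothetical knickpoint magnitudes; one is far larger than the rest.
magnitudes = [1.0, 1.2, 0.9, 1.1, 1.0, 8.0, 1.3]
flags = mad_outliers(magnitudes)
print(flags)
```

    Unlike a mean and standard deviation, the median and MAD barely move when one extreme value is added, so the large value cannot mask itself.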

  14. Increased frequencies of aberrant sperm as indicators of mutagenic damage in mice.

    PubMed

    Soares, E R; Sheridan, W; Haseman, J K; Segall, M

    1979-02-01

    We have tested the effects of TEM in 3 strains of mice using the sperm morphology assay. In addition, we have made an attempt to evaluate this test system with respect to experimental design, statistical problems and possible interlaboratory differences. Treatment with TEM results in significant increases in the percent of abnormally shaped sperm. These increases are readily detectable in sperm treated as spermatocytes and spermatogonial stages. Our data indicate possible problems associated with inter-laboratory variation in slide analysis. We have found that despite the introduction of such sources of variation, our data were consistent with respect to the effects of TEM. Another area of concern in the sperm morphology test is the presence of "outlier" animals. In our study, such animals comprised 4% of the total number of animals considered. Statistical analysis of the slides from these animals has shown that this problem can be dealt with and that, when recognized as such, "outliers" do not affect the outcome of the sperm morphology assay.

  15. A computational study on outliers in world music

    PubMed Central

    Benetos, Emmanouil; Dixon, Simon

    2017-01-01

    The comparative analysis of world music cultures has been the focus of several ethnomusicological studies in the last century. With the advances of Music Information Retrieval and the increased accessibility of sound archives, large-scale analysis of world music with computational tools is today feasible. We investigate music similarity in a corpus of 8200 recordings of folk and traditional music from 137 countries around the world. In particular, we aim to identify music recordings that are most distinct compared to the rest of our corpus. We refer to these recordings as ‘outliers’. We use signal processing tools to extract music information from audio recordings, data mining to quantify similarity and detect outliers, and spatial statistics to account for geographical correlation. Our findings suggest that Botswana is the country with the most distinct recordings in the corpus and China is the country with the most distinct recordings when considering spatial correlation. Our analysis includes a comparison of musical attributes and styles that contribute to the ‘uniqueness’ of the music of each country. PMID:29253027

  16. Hot spots, cluster detection and spatial outlier analysis of teen birth rates in the U.S., 2003-2012.

    PubMed

    Khan, Diba; Rossen, Lauren M; Hamilton, Brady E; He, Yulei; Wei, Rong; Dienes, Erin

    2017-06-01

    Teen birth rates have evidenced a significant decline in the United States over the past few decades. Most of the states in the US have mirrored this national decline, though some reports have illustrated substantial variation in the magnitude of these decreases across the U.S. Importantly, geographic variation at the county level has largely not been explored. We used National Vital Statistics Births data and Hierarchical Bayesian space-time interaction models to produce smoothed estimates of teen birth rates at the county level from 2003-2012. Results indicate that teen birth rates show evidence of clustering, where hot and cold spots occur, and identify spatial outliers. Findings from this analysis may help inform prevention efforts by illustrating how geographic patterns of teen birth rates have changed over the past decade and where clusters of high or low teen birth rates are evident. Published by Elsevier Ltd.

  17. Hot spots, cluster detection and spatial outlier analysis of teen birth rates in the U.S., 2003–2012

    PubMed Central

    Khan, Diba; Rossen, Lauren M.; Hamilton, Brady E.; He, Yulei; Wei, Rong; Dienes, Erin

    2017-01-01

    Teen birth rates have evidenced a significant decline in the United States over the past few decades. Most of the states in the US have mirrored this national decline, though some reports have illustrated substantial variation in the magnitude of these decreases across the U.S. Importantly, geographic variation at the county level has largely not been explored. We used National Vital Statistics Births data and Hierarchical Bayesian space-time interaction models to produce smoothed estimates of teen birth rates at the county level from 2003–2012. Results indicate that teen birth rates show evidence of clustering, where hot and cold spots occur, and identify spatial outliers. Findings from this analysis may help inform prevention efforts by illustrating how geographic patterns of teen birth rates have changed over the past decade and where clusters of high or low teen birth rates are evident. PMID:28552189

  18. Root System Water Consumption Pattern Identification on Time Series Data.

    PubMed

    Figueroa, Manuel; Pope, Christopher

    2017-06-16

    In agriculture, soil and meteorological sensors are used along low power networks to capture data, which allows for optimal resource usage and minimizing environmental impact. This study uses time series analysis methods for outlier detection and pattern recognition on soil moisture sensor data to identify irrigation and consumption patterns and to improve a soil moisture prediction and irrigation system. This study compares three new algorithms with the current detection technique in the project; the results greatly decrease the number of false positives detected. The best result is obtained by the Series Strings Comparison (SSC) algorithm, averaging a precision of 0.872 on the testing sets, vastly improving on the current system's 0.348 precision.

  19. Multivariate regression methods for estimating velocity of ictal discharges from human microelectrode recordings

    NASA Astrophysics Data System (ADS)

    Liou, Jyun-you; Smith, Elliot H.; Bateman, Lisa M.; McKhann, Guy M., II; Goodman, Robert R.; Greger, Bradley; Davis, Tyler S.; Kellis, Spencer S.; House, Paul A.; Schevon, Catherine A.

    2017-08-01

    Objective. Epileptiform discharges, an electrophysiological hallmark of seizures, can propagate across cortical tissue in a manner similar to traveling waves. Recent work has focused attention on the origination and propagation patterns of these discharges, yielding important clues to their source location and mechanism of travel. However, systematic studies of methods for measuring propagation are lacking. Approach. We analyzed epileptiform discharges in microelectrode array recordings of human seizures. The array records multiunit activity and local field potentials at 400 micron spatial resolution, from a small cortical site free of obstructions. We evaluated several computationally efficient statistical methods for calculating traveling wave velocity, benchmarking them to analyses of associated neuronal burst firing. Main results. Over 90% of discharges met statistical criteria for propagation across the sampled cortical territory. Detection rate, direction and speed estimates derived from a multiunit estimator were compared to four field potential-based estimators: negative peak, maximum descent, high gamma power, and cross-correlation. Interestingly, the methods that were computationally simplest and most efficient (negative peak and maximal descent) offer non-inferior results in predicting neuronal traveling wave velocities compared to the other two, more complex methods. Moreover, the negative peak and maximal descent methods proved to be more robust against reduced spatial sampling challenges. Using least absolute deviation in place of least squares error minimized the impact of outliers, and reduced the discrepancies between local field potential-based and multiunit estimators. Significance. Our findings suggest that ictal epileptiform discharges typically take the form of exceptionally strong, rapidly traveling waves, with propagation detectable across millimeter distances. The sequential activation of neurons in space can be inferred from clinically-observable EEG data, with a variety of straightforward computation methods available. This opens possibilities for systematic assessments of ictal discharge propagation in clinical and research settings.
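
    Estimating a traveling-wave velocity by regressing event times on electrode positions, with least absolute deviation in place of least squares, can be sketched as below. This is an assumption-laden illustration: the plane-wave model t = b0 + bx*x + by*y, the conversion v = b / |b|^2, the IRLS solver, the function name, and the simulated 4x4 grid are choices made for the example and are not taken from the paper's code.

```python
import numpy as np

def wave_velocity_lad(positions, times, n_iter=50, eps=1e-8):
    """Estimate plane-wave velocity from event times using an approximate LAD fit."""
    P = np.asarray(positions, dtype=float)   # shape (n, 2): electrode x, y in mm
    t = np.asarray(times, dtype=float)       # shape (n,): event time per electrode in ms
    X = np.column_stack([np.ones(len(t)), P])
    beta, *_ = np.linalg.lstsq(X, t, rcond=None)  # least-squares starting point
    for _ in range(n_iter):
        r = t - X @ beta
        # Weights 1/|r| turn weighted least squares into an approximate L1 fit.
        sw = np.sqrt(1.0 / np.maximum(np.abs(r), eps))
        beta, *_ = np.linalg.lstsq(X * sw[:, None], t * sw, rcond=None)
    slowness = beta[1:]                      # time delay per unit distance (ms/mm)
    return slowness / (slowness @ slowness)  # velocity vector in mm/ms

# Hypothetical 4x4 grid at 0.4 mm pitch; a wave moving along +x at 0.4 mm/ms.
gx, gy = np.meshgrid(np.arange(4) * 0.4, np.arange(4) * 0.4)
positions = np.column_stack([gx.ravel(), gy.ravel()])
times = positions[:, 0] / 0.4
times_dirty = times.copy()
times_dirty[5] += 20.0                       # one grossly mistimed channel
print("clean:", wave_velocity_lad(positions, times))
print("with outlier:", wave_velocity_lad(positions, times_dirty))
```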

  20. SNP mining in Crassostrea gigas EST data: transferability to four other Crassostrea species, phylogenetic inferences and outlier SNPs under selection.

    PubMed

    Zhong, Xiaoxiao; Li, Qi; Yu, Hong; Kong, Lingfeng

    2014-01-01

    Oysters, with high levels of phenotypic plasticity and wide geographic distribution, are a challenging group for taxonomy and phylogenetics. Our study is intended to generate new EST-SNP markers and to evaluate their potential for cross-species utilization in phylogenetic study of the genus Crassostrea. In the study, 57 novel SNPs were developed from an EST database of C. gigas by the HRM (high-resolution melting) method. Transferability of 377 SNPs developed for C. gigas was examined on four other Crassostrea species: C. sikamea, C. angulata, C. hongkongensis and C. ariakensis. Among the 377 primer pairs tested, 311 (82.5%) primers showed amplification in C. sikamea, 353 (93.6%) in C. angulata, 254 (67.4%) in C. hongkongensis and 253 (67.1%) in C. ariakensis. A total of 214 SNPs were found to be transferable to all four species. Phylogenetic analyses showed that C. hongkongensis was a sister species of C. ariakensis and that this clade was sister to the clade containing C. sikamea, C. angulata and C. gigas. Within this clade, C. gigas and C. angulata had the closest relationship, with C. sikamea being the sister group. In addition, we detected eight SNPs as potentially being under selection by two outlier tests (fdist and hierarchical methods). The SNPs studied here should be useful for genetic diversity, comparative mapping and phylogenetic studies across species in Crassostrea and the candidate outlier SNPs are worth exploring in more detail regarding association genetics and functional studies.

  1. Crowdtruth validation: a new paradigm for validating algorithms that rely on image correspondences.

    PubMed

    Maier-Hein, Lena; Kondermann, Daniel; Roß, Tobias; Mersmann, Sven; Heim, Eric; Bodenstedt, Sebastian; Kenngott, Hannes Götz; Sanchez, Alexandro; Wagner, Martin; Preukschas, Anas; Wekerle, Anna-Laura; Helfert, Stefanie; März, Keno; Mehrabi, Arianeb; Speidel, Stefanie; Stock, Christian

    2015-08-01

    Feature tracking and 3D surface reconstruction are key enabling techniques for computer-assisted minimally invasive surgery. One of the major bottlenecks related to training and validation of new algorithms is the lack of large amounts of annotated images that fully capture the wide range of anatomical/scene variance in clinical practice. To address this issue, we propose a novel approach to obtaining large numbers of high-quality reference image annotations at low cost in an extremely short period of time. The concept is based on outsourcing the correspondence search to a crowd of anonymous users from an online community (crowdsourcing) and comprises four stages: (1) feature detection, (2) correspondence search via crowdsourcing, (3) merging multiple annotations per feature by fitting Gaussian finite mixture models, (4) outlier removal using the result of the clustering as input for a second annotation task. On average, 10,000 annotations were obtained within 24 h at a cost of $100. The annotation of the crowd after clustering and before outlier removal was of expert quality with a median distance of about 1 pixel to a publicly available reference annotation. The threshold for the outlier removal task directly determines the maximum annotation error, but also the number of points removed. Our concept is a novel and effective method for fast, low-cost and highly accurate correspondence generation that could be adapted to various other applications related to large-scale data annotation in medical image computing and computer-assisted interventions.

  2. A robust data scaling algorithm to improve classification accuracies in biomedical data.

    PubMed

    Cao, Xi Hang; Stojkovic, Ivan; Obradovic, Zoran

    2016-09-09

    Machine learning models have been adapted in biomedical research and practice for knowledge discovery and decision support. While mainstream biomedical informatics research focuses on developing more accurate models, the importance of data preprocessing draws less attention. We propose the Generalized Logistic (GL) algorithm that scales data uniformly to an appropriate interval by learning a generalized logistic function to fit the empirical cumulative distribution function of the data. The GL algorithm is simple yet effective; it is intrinsically robust to outliers, so it is particularly suitable for diagnostic/classification models in clinical/medical applications where the number of samples is usually small; it scales the data in a nonlinear fashion, which leads to potential improvement in accuracy. To evaluate the effectiveness of the proposed algorithm, we conducted experiments on 16 binary classification tasks with different variable types and cover a wide range of applications. The resultant performance in terms of area under the receiver operation characteristic curve (AUROC) and percentage of correct classification showed that models learned using data scaled by the GL algorithm outperform the ones using data scaled by the Min-max and the Z-score algorithm, which are the most commonly used data scaling algorithms. The proposed GL algorithm is simple and effective. It is robust to outliers, so no additional denoising or outlier detection step is needed in data preprocessing. Empirical results also show models learned from data scaled by the GL algorithm have higher accuracy compared to the commonly used data scaling algorithms.

  3. Robust estimation for class averaging in cryo-EM Single Particle Reconstruction.

    PubMed

    Huang, Chenxi; Tagare, Hemant D

    2014-01-01

    Single Particle Reconstruction (SPR) for Cryogenic Electron Microscopy (cryo-EM) aligns and averages the images extracted from micrographs to improve the Signal-to-Noise ratio (SNR). Outliers compromise the fidelity of the averaging. We propose a robust cross-correlation-like w-estimator for combating the effect of outliers on the average images in cryo-EM. The estimator accounts for the natural variation of signal contrast among the images and eliminates the need for a threshold for outlier rejection. We show that the influence function of our estimator is asymptotically bounded. Evaluations of the estimator on simulated and real cryo-EM images show good performance in the presence of outliers.

  4. Hospital Characteristics Associated With Postdischarge Hospital Readmission, Observation, and Emergency Department Utilization.

    PubMed

    Horwitz, Leora I; Wang, Yongfei; Altaf, Faseeha K; Wang, Changqin; Lin, Zhenqiu; Liu, Shuling; Grady, Jacqueline; Bernheim, Susannah M; Desai, Nihar R; Venkatesh, Arjun K; Herrin, Jeph

    2018-04-01

    Whether types of hospitals with high readmission rates also have high overall postdischarge acute care utilization (including emergency department and observation care) is unknown. Cross-sectional analysis. Nonfederal United States acute care hospitals. Using methodology established by the Centers for Medicare & Medicaid Services, we calculated each hospital's "excess days in acute care" for fee-for-service (FFS) Medicare beneficiaries aged over 65 years discharged after hospitalization for acute myocardial infarction, heart failure (HF), or pneumonia, representing the mean difference between predicted and expected total days of acute care utilization in the 30 days following hospital discharge, per 100 discharges. We assessed the multivariable association of 8 hospital characteristics with excess days in acute care and the proportion of hospitals with each characteristic that were statistical outliers (95% credible interval estimate does not include 0). We included 2184 hospitals for acute myocardial infarction [228 (10.4%) better than expected, 549 (25.1%) worse than expected], 3720 hospitals for HF [484 (13.0%) better and 840 (22.6%) worse], and 4195 hospitals for pneumonia [673 (16.0%) better, 1005 (24.0%) worse]. Results for all conditions were similar. Worse than expected outliers for pneumonia included: 18.8% of safety net hospitals versus 26.1% of nonsafety net hospitals; 16.7% of public hospitals versus 33.1% of for-profit hospitals; 19.5% of nonteaching hospitals versus 52.2% of major teaching hospitals; 7.9% of rural hospitals versus 42.1% of large urban hospitals; 5.9% of hospitals with 24-<50 beds versus 58% of hospitals with >500 beds; and 29.0% of hospitals with nurse-to-bed ratios >1.0-1.5 versus 21.7% of hospitals with ratios >2.0. 
Including emergency department and observation stays in measures of postdischarge utilization produces similar results as measuring only readmissions in that major teaching, urban and for-profit hospitals still perform disproportionately poorly versus nonteaching or public hospitals. However, it enables identification of more outliers and a more granular assessment of the association of hospital factors and outcomes.
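
    The outlier rule in this abstract (a hospital is a statistical outlier when the 95% credible interval for its excess days does not include 0) can be sketched as below. This is a minimal illustration, not the CMS methodology: the function name `classify_hospital` and the interval values are hypothetical, and it assumes negative excess days mean fewer acute care days than expected.

```python
def classify_hospital(ci_low, ci_high):
    """Classify a hospital from the 95% interval of its excess days in acute care.

    Assumption: negative excess days mean fewer days than expected (better).
    """
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if ci_high < 0:
        return "better than expected"
    if ci_low > 0:
        return "worse than expected"
    return "no different than expected"  # interval includes 0: not an outlier


# Hypothetical intervals (excess days per 100 discharges), for illustration only.
intervals = {"A": (-12.4, -1.3), "B": (-3.0, 4.5), "C": (2.2, 19.8)}
labels = {name: classify_hospital(lo, hi) for name, (lo, hi) in intervals.items()}
```

    Hospitals A and C are flagged as outliers in opposite directions, while B is not flagged because its interval spans 0.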

  5. Time of Flight Estimation in the Presence of Outliers: A Biosonar-Inspired Machine Learning Approach

    DTIC Science & Technology

    2013-08-29

    Time of Flight Estimation in the Presence of Outliers: A biosonar-inspired machine learning approach...installations, biosonar, remote sensing, sonar resolution, sonar accuracy, sonar energy consumption...Nathan Intrator, Leon N Cooper, Brown University...When the Signal-to-Noise Ratio (SNR) falls below a certain

  6. Outlier analysis of functional genomic profiles enriches for oncology targets and enables precision medicine.

    PubMed

    Zhu, Zhou; Ihle, Nathan T; Rejto, Paul A; Zarrinkar, Patrick P

    2016-06-13

    Genome-scale functional genomic screens across large cell line panels provide a rich resource for discovering tumor vulnerabilities that can lead to the next generation of targeted therapies. Their data analysis typically has focused on identifying genes whose knockdown enhances response in various pre-defined genetic contexts, which are limited by biological complexities as well as the incompleteness of our knowledge. We thus introduce a complementary data mining strategy to identify genes with exceptional sensitivity in subsets, or outlier groups, of cell lines, allowing an unbiased analysis without any a priori assumption about the underlying biology of dependency. Genes with outlier features are strongly and specifically enriched with those known to be associated with cancer and relevant biological processes, despite no a priori knowledge being used to drive the analysis. Identification of exceptional responders (outliers) may not lead only to new candidates for therapeutic intervention, but also tumor indications and response biomarkers for companion precision medicine strategies. Several tumor suppressors have an outlier sensitivity pattern, supporting and generalizing the notion that tumor suppressors can play context-dependent oncogenic roles. The novel application of outlier analysis described here demonstrates a systematic and data-driven analytical strategy to decipher large-scale functional genomic data for oncology target and precision medicine discoveries.

  7. Log Pearson type 3 quantile estimators with regional skew information and low outlier adjustments

    USGS Publications Warehouse

    Griffis, V.W.; Stedinger, Jery R.; Cohn, T.A.

    2004-01-01

    The recently developed expected moments algorithm (EMA) [Cohn et al., 1997] does as well as maximum likelihood estimations at estimating log‐Pearson type 3 (LP3) flood quantiles using systematic and historical flood information. Needed extensions include use of a regional skewness estimator and its precision to be consistent with Bulletin 17B. Another issue addressed by Bulletin 17B is the treatment of low outliers. A Monte Carlo study compares the performance of Bulletin 17B using the entire sample with and without regional skew with estimators that use regional skew and censor low outliers, including an extended EMA estimator, the conditional probability adjustment (CPA) from Bulletin 17B, and an estimator that uses probability plot regression (PPR) to compute substitute values for low outliers. Estimators that neglect regional skew information do much worse than estimators that use an informative regional skewness estimator. For LP3 data the low outlier rejection procedure generally results in no loss of overall accuracy, and the differences between the MSEs of the estimators that used an informative regional skew are generally modest in the skewness range of real interest. Samples contaminated to model actual flood data demonstrate that estimators which give special treatment to low outliers significantly outperform estimators that make no such adjustment.

  8. Log Pearson type 3 quantile estimators with regional skew information and low outlier adjustments

    NASA Astrophysics Data System (ADS)

    Griffis, V. W.; Stedinger, J. R.; Cohn, T. A.

    2004-07-01

    The recently developed expected moments algorithm (EMA) [Cohn et al., 1997] does as well as maximum likelihood estimations at estimating log-Pearson type 3 (LP3) flood quantiles using systematic and historical flood information. Needed extensions include use of a regional skewness estimator and its precision to be consistent with Bulletin 17B. Another issue addressed by Bulletin 17B is the treatment of low outliers. A Monte Carlo study compares the performance of Bulletin 17B using the entire sample with and without regional skew with estimators that use regional skew and censor low outliers, including an extended EMA estimator, the conditional probability adjustment (CPA) from Bulletin 17B, and an estimator that uses probability plot regression (PPR) to compute substitute values for low outliers. Estimators that neglect regional skew information do much worse than estimators that use an informative regional skewness estimator. For LP3 data the low outlier rejection procedure generally results in no loss of overall accuracy, and the differences between the MSEs of the estimators that used an informative regional skew are generally modest in the skewness range of real interest. Samples contaminated to model actual flood data demonstrate that estimators which give special treatment to low outliers significantly outperform estimators that make no such adjustment.

  9. Application of multivariate Gaussian detection theory to known non-Gaussian probability density functions

    NASA Astrophysics Data System (ADS)

    Schwartz, Craig R.; Thelen, Brian J.; Kenton, Arthur C.

    1995-06-01

    A statistical parametric multispectral sensor performance model was developed by ERIM to support mine field detection studies, multispectral sensor design/performance trade-off studies, and target detection algorithm development. The model assumes target detection algorithms and their performance models which are based on data assumed to obey multivariate Gaussian probability distribution functions (PDFs). The applicability of these algorithms and performance models can be generalized to data having non-Gaussian PDFs through the use of transforms which convert non-Gaussian data to Gaussian (or near-Gaussian) data. An example of one such transform is the Box-Cox power law transform. In practice, such a transform can be applied to non-Gaussian data prior to the introduction of a detection algorithm that is formally based on the assumption of multivariate Gaussian data. This paper presents an extension of these techniques to the case where the joint multivariate probability density function of the non-Gaussian input data is known, and where the joint estimate of the multivariate Gaussian statistics, under the Box-Cox transform, is desired. The jointly estimated multivariate Gaussian statistics can then be used to predict the performance of a target detection algorithm which has an associated Gaussian performance model.
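
    The Box-Cox power transform mentioned above has a standard one-parameter form, sketched here for a single positive value. The function name `box_cox` and the example inputs are illustrative; the joint estimation procedure described in the paper is not reproduced.

```python
import math


def box_cox(x, lam):
    """Standard one-parameter Box-Cox transform of a positive value x.

    lam == 0 is the limiting case and reduces to the natural logarithm.
    """
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


# Illustrative values only: a right-skewed sample is compressed by lam = 0.5.
skewed = [1.0, 4.0, 9.0, 100.0]
transformed = [box_cox(v, 0.5) for v in skewed]
```

    In practice the exponent is chosen so that the transformed data are as close to Gaussian as possible, after which a Gaussian-based detector can be applied to the transformed values.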

  10. Can the national surgical quality improvement program provide surgeon-specific outcomes?

    PubMed

    Kuhnen, Angela H; Marcello, Peter W; Roberts, Patricia L; Read, Thomas E; Schoetz, David J; Rusin, Lawrence C; Hall, Jason F; Ricciardi, Rocco

    2015-02-01

    Efforts to improve the quality of surgical care and reduce morbidity and mortality have resulted in outcomes reporting at the service and institutional level. Surgeon-specific outcomes are not readily available. The aim of this study is to compare surgeon-specific outcomes from the National Surgical Quality Improvement Program and 100% capture institutional quality data. We conducted a cohort study evaluating institutional and surgeon-specific outcomes following colorectal surgery procedures at 1 institution over 5 years. All patients who underwent an operation by a colorectal surgeon at Lahey Hospital & Medical Center from January 1, 2008 through December 31, 2012 were identified. Thirty-day mortality, reoperation, urinary tract infection, deep vein thrombosis, pneumonia, superficial surgical site infection, and organ space infection were the primary outcomes measured. We compared annual and 5-year institutional and surgeon-specific adverse event rates between the data sets. In addition, we categorized individual surgeons as low-outlier, average, or high-outlier in relation to aggregate averages and determined the concordance between the data sets in identifying outliers. Concordance was designated if the 2 databases classified outlier status similarly for the same adverse event category. In the 100% capture institutional data, 6459 operative encounters were identified in comparison with 1786 National Surgical Quality Improvement Program encounters (28% sampled). Annual aggregate adverse event rates were similar between the institutional data and the National Surgical Quality Improvement Program. For annual surgeon-specific comparisons, concordance in identifying outliers between the 2 data sets was 51.4%, and gross discordance between outlier status was in 8.2%. Five-year surgeon-specific comparisons demonstrated 59% concordance in identifying outlier status with 8.2% gross discordance for the group. 
The inclusion of data from only 1 academic referral center is a limitation of this study. Each surgeon was identified as a "high outlier" in at least 1 adverse event category. Comparisons at the annual and 5-year points demonstrated poor concordance between our 100% capture institutional data and the National Surgical Quality Improvement Program data.
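
    The concordance measure described above can be sketched as a comparison of per-surgeon outlier labels from two data sets. This is a minimal sketch under assumptions: the labels "low", "average" and "high" are hypothetical encodings, and "gross discordance" is assumed here to mean that one data set says low-outlier while the other says high-outlier, which the abstract does not define explicitly.

```python
def concordance(labels_a, labels_b):
    """Return (concordant fraction, grossly discordant fraction).

    Each argument lists one outlier label per surgeon/adverse-event pair,
    in the same order for both data sets.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    pairs = list(zip(labels_a, labels_b))
    same = sum(a == b for a, b in pairs)
    # Assumption: gross discordance = opposite extremes in the two data sets.
    gross = sum({a, b} == {"low", "high"} for a, b in pairs)
    return same / len(pairs), gross / len(pairs)


# Hypothetical labels, for illustration only.
institutional = ["low", "average", "high", "average"]
sampled = ["low", "high", "low", "average"]
agree, gross = concordance(institutional, sampled)
```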

  11. Automated Discovery of Long Intergenic RNAs Associated with Breast Cancer Progression

    DTIC Science & Technology

    2012-02-01

    manuscript in preparation), (2) development and publication of an algorithm for detecting gene fusions in RNA-Seq data [1], and (3) discovery of outlier long...subjected to de novo assembly algorithms to discover novel transcripts representing either unannotated genes or novel somatic mutations such as gene...fusions. To this end the P.I. developed and published a novel algorithm called ChimeraScan to facilitate the discovery and validation of gene

  12. Automatic detection of a hand-held needle in ultrasound via phased-based analysis of the tremor motion

    NASA Astrophysics Data System (ADS)

    Beigi, Parmida; Salcudean, Septimiu E.; Rohling, Robert; Ng, Gary C.

    2016-03-01

    This paper presents an automatic localization method for a standard hand-held needle in ultrasound based on temporal motion analysis of spatially decomposed data. Subtle displacement arising from tremor motion has a periodic pattern which is usually imperceptible in the intensity image but may convey information in the phase image. Our method aims to detect such periodic motion of a hand-held needle and distinguish it from intrinsic tissue motion, using a technique inspired by video magnification. Complex steerable pyramids allow specific design of the wavelets' orientations according to the insertion angle as well as the measurement of the local phase. We therefore use steerable pairs of even and odd Gabor wavelets to decompose the ultrasound B-mode sequence into various spatial frequency bands. Variations of the local phase measurements in the spatially decomposed input data are then temporally analyzed using a finite impulse response bandpass filter to detect regions with a tremor motion pattern. Results obtained from different pyramid levels are then combined and thresholded to generate the binary mask input for the Hough transform, which determines an estimate of the direction angle and discards some of the outliers. Polynomial fitting is used at the final stage to remove any remaining outliers and improve the trajectory detection. The detected needle is finally added back to the input sequence as an overlay of a cloud of points. We demonstrate the efficiency of our approach to detect the needle using subtle tremor motion in an agar phantom and in-vivo porcine cases where intrinsic motion is also present. The localization accuracy was calculated by comparison with expert manual segmentation and is reported as (mean, standard deviation and root-mean-square error) of (0.93°, 1.26° and 0.87°) and (1.53 mm, 1.02 mm and 1.82 mm) for the trajectory and the tip, respectively.

  13. Moving from Descriptive to Causal Analytics: Case Study of the Health Indicators Warehouse

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Schryver, Jack C.; Shankar, Mallikarjun; Xu, Songhua

    The KDD community has described a multitude of methods for knowledge discovery on large datasets. We consider some of these methods and integrate them into an analyst's workflow that proceeds from the data-centric descriptive level to the model-centric causal level. Examples of the workflow are shown for the Health Indicators Warehouse, which is a public database for community health information that is a potent resource for conducting data science on a medium scale. We demonstrate the potential of HIW as a source of serious visual analytics efforts by showing correlation matrix visualizations, multivariate outlier analysis, multiple linear regression of Medicare costs, and scatterplot matrices for a broad set of health indicators. We conclude by sketching the first steps toward a causal dependence hypothesis.

  14. A practical method to detect the freezing/thawing onsets of seasonal frozen ground in Alaska

    NASA Astrophysics Data System (ADS)

    Chen, Xiyu; Liu, Lin

    2017-04-01

    Microwave remote sensing can provide useful information about the freeze/thaw state of soil at the Earth's surface. An edge detection method is applied in this study to estimate the onsets of soil freeze/thaw state transitions using L band space-borne radiometer data. The Soil Moisture Active Passive (SMAP) mission has an L band radiometer and can provide daily brightness temperature (TB) with horizontal/vertical polarizations. We use the normalized polarization ratio (NPR), calculated from the Level-1C TB product of SMAP (spatial resolution: 36 km), as the indicator of soil freeze/thaw state to estimate the freezing and thawing onsets in Alaska in 2015 and 2016. NPR is calculated based on the difference between TB at vertical and horizontal polarizations. Therefore, it is strongly sensitive to changes in liquid water content in the soil and independent of the soil temperature. Onset estimation is based on the detection of abrupt changes of NPR in transition seasons using the edge detection method, and validation compares the estimated onsets with onsets derived from in situ measurements. According to the comparison, the estimated onsets were generally 15 days earlier than the measured onsets in 2015. In 2016, however, the estimated onsets were on average 4 days earlier than the measured onsets, which may be due to less snow cover. Moreover, we extended our estimation to the entire state of Alaska. The estimated freeze/thaw onsets showed a reasonable latitude-dependent distribution, although there are still some outliers caused by noisy variation of NPR. Finally, we also try to remove these outliers and improve the performance of the method by smoothing the NPR time series.
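
    The two steps described above, computing NPR and locating an abrupt change in its time series, can be sketched as follows. This is a minimal sketch under assumptions: the abstract only says NPR is based on the V/H difference, so the common form (TBv - TBh) / (TBv + TBh) is assumed; the moving-window step detector `largest_step` and the sample values are hypothetical and are not the authors' edge detector.

```python
def npr(tb_v, tb_h):
    """Normalized polarization ratio (assumed form: difference over sum)."""
    return (tb_v - tb_h) / (tb_v + tb_h)


def largest_step(series, window=3):
    """Find the index where the mean of the next `window` values differs most
    from the mean of the previous `window` values.

    Returns (index, step); index is None if the series is too short or flat.
    """
    best_index, best_step = None, 0.0
    for i in range(window, len(series) - window + 1):
        before = sum(series[i - window:i]) / window
        after = sum(series[i:i + window]) / window
        step = after - before
        if abs(step) > abs(best_step):
            best_index, best_step = i, step
    return best_index, best_step


# Hypothetical daily NPR values with one abrupt change, for illustration only.
daily_npr = [0.02] * 5 + [0.06] * 5
onset_index, onset_step = largest_step(daily_npr)
```

    Averaging over a window on each side makes the detector less sensitive to single noisy days, which is the same motivation the abstract gives for smoothing the NPR time series.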

  15. Gear Fault Detection Effectiveness as Applied to Tooth Surface Pitting Fatigue Damage

    NASA Technical Reports Server (NTRS)

    Lewicki, David G.; Dempsey, Paula J.; Heath, Gregory F.; Shanthakumaran, Perumal

    2009-01-01

    A study was performed to evaluate fault detection effectiveness as applied to gear tooth pitting fatigue damage. Vibration and oil-debris monitoring (ODM) data were gathered from 24 sets of spur pinion and face gears run during a previous endurance evaluation study. Three common condition indicators (RMS, FM4, and NA4) were deduced from the time-averaged vibration data and used with the ODM to evaluate their performance for gear fault detection. The NA4 parameter proved to be a very good condition indicator for the detection of gear tooth surface pitting failures. The FM4 and RMS parameters performed average to below average in detection of gear tooth surface pitting failures. The ODM sensor was successful in detecting a significant amount of debris from all the gear tooth pitting fatigue failures. Excluding outliers, the average cumulative mass at the end of a test was 40 mg.

  16. An Automated Algorithm to Screen Massive Training Samples for a Global Impervious Surface Classification

    NASA Technical Reports Server (NTRS)

    Tan, Bin; Brown de Colstoun, Eric; Wolfe, Robert E.; Tilton, James C.; Huang, Chengquan; Smith, Sarah E.

    2012-01-01

    An algorithm is developed to automatically screen the outliers from massive training samples for the Global Land Survey - Imperviousness Mapping Project (GLS-IMP). GLS-IMP aims to produce a global 30 m spatial resolution impervious cover data set for years 2000 and 2010 based on the Landsat Global Land Survey (GLS) data set. This unprecedented high resolution impervious cover data set is not only significant to urbanization studies but also desired by global carbon, hydrology, and energy balance research. A supervised classification method, regression tree, is applied in this project. A set of accurate training samples is the key to supervised classification. Here we developed the global scale training samples from fine resolution (about 1 m) satellite data (Quickbird and Worldview2), and then aggregated the fine resolution impervious cover map to 30 m resolution. In order to improve the classification accuracy, the training samples should be screened before being used to train the regression tree. It is impossible to manually screen 30 m resolution training samples collected globally. For example, in Europe alone, there are 174 training sites. The size of the sites ranges from 4.5 km by 4.5 km to 8.1 km by 3.6 km. The number of training samples exceeds six million. Therefore, we develop this automated statistics-based algorithm to screen the training samples at two levels: site and scene level. At the site level, all the training samples are divided into 10 groups according to the percentage of the impervious surface within a sample pixel. The samples falling in each 10% interval form one group. For each group, both univariate and multivariate outliers are detected and removed. Then the screening process escalates to the scene level. A similar screening process but with a looser threshold is applied at the scene level, considering the possible variance due to site differences. 
We do not perform the screening process across scenes because scenes might vary because of factors such as phenology, solar-view geometry, and atmospheric conditions rather than actual land cover differences. Finally, we will compare the classification results from screened and unscreened training samples to assess the improvement achieved by cleaning up the training samples.
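
    The site-level screening described above (split samples into ten groups by impervious percentage, then remove outliers within each group) can be sketched for the univariate case. This is a sketch under stated assumptions: the abstract does not name the outlier statistic, so a median/MAD robust z-score with a cutoff of 3.5 is swapped in here, and `screen_samples` with its sample data is hypothetical.

```python
from statistics import median


def screen_samples(samples, n_groups=10, cutoff=3.5):
    """Screen (impervious_pct, feature_value) samples group by group.

    Samples are grouped into equal-width impervious-percentage intervals.
    Within each group, a value is dropped when its robust z-score
    (0.6745 * |x - median| / MAD) exceeds `cutoff` (assumed statistic).
    """
    width = 100.0 / n_groups
    groups = {}
    for pct, value in samples:
        index = min(int(pct // width), n_groups - 1)  # 100% joins the top group
        groups.setdefault(index, []).append((pct, value))

    kept = []
    for index in sorted(groups):
        values = [v for _, v in groups[index]]
        center = median(values)
        mad = median([abs(v - center) for v in values])
        for pct, value in groups[index]:
            # A zero MAD gives no spread estimate, so the group is kept as is.
            if mad == 0 or 0.6745 * abs(value - center) / mad <= cutoff:
                kept.append((pct, value))
    return kept


# Hypothetical samples: (impervious %, reflectance-like feature), illustration only.
samples = [(5, 0.10), (7, 0.11), (3, 0.12), (8, 0.09), (6, 0.95),
           (55, 0.50), (52, 0.52), (58, 0.51)]
screened = screen_samples(samples)
```

    A scene-level pass could reuse the same function with a larger `cutoff`, mirroring the looser threshold the abstract describes.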

  17. The origin of compact galaxies with anomalously high black hole masses

    NASA Astrophysics Data System (ADS)

    Barber, Christopher; Schaye, Joop; Bower, Richard G.; Crain, Robert A.; Schaller, Matthieu; Theuns, Tom

    2016-07-01

    Observations of local galaxies harbouring supermassive black holes (BH) of anomalously high mass, MBH, relative to their stellar mass, M*, appear to be at odds with simple models of the co-evolution between galaxies and their central BHs. We study the origin of such outliers in a Λ cold dark matter context using the EAGLE cosmological, hydrodynamical simulation. We find 15 'MBH(M*)-outlier' galaxies, defined as having MBH more than 1.5 dex above the median MBH(M*) relation in the simulation, MBH,med(M*). All MBH(M*)-outliers are satellite galaxies, typically with M* ~ 10^10 M⊙ and MBH ~ 10^8 M⊙. They have all become outliers due to a combination of tidal stripping of their outer stellar component acting over several Gyr and early formation times leading to rapid BH growth at high redshift, with the former mechanism being most important for 67 per cent of these outliers. The same mechanisms also cause the MBH(M*)-outlier satellites to be amongst the most compact galaxies in the simulation, making them ideal candidates for ultracompact dwarf galaxy progenitors. The 10 most extreme central galaxies found at z = 0 (with log10(MBH/MBH,med(M*)) ∈ [1.2, 1.5]) grow rapidly in MBH to lie well above the present-day MBH - M* relation at early times (z ≳ 2), and either continue to evolve parallel to the z = 0 relation or remain unchanged until the present day, making them 'relics' of the high-redshift universe. This high-z formation mechanism may help to explain the origin of observed MBH(M*)-outliers with extended dark matter haloes and undisturbed morphologies.
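
    The outlier definition above (MBH more than 1.5 dex above the median MBH(M*) relation) can be sketched as a threshold on log10 residuals. This is a minimal sketch under assumptions: the median relation is approximated here by the median log10(MBH) in fixed-width bins of log10(M*), which may differ from how the authors compute it, and the galaxy values are hypothetical.

```python
import math
from statistics import median


def find_mbh_outliers(galaxies, bin_width=0.5, threshold_dex=1.5):
    """Return indices of galaxies lying more than `threshold_dex` above the
    binned median log10(MBH) at their stellar mass.

    `galaxies` is a list of (m_star, m_bh) pairs in solar masses.
    """
    def bin_of(m_star):
        return math.floor(math.log10(m_star) / bin_width)

    log_mbh_by_bin = {}
    for m_star, m_bh in galaxies:
        log_mbh_by_bin.setdefault(bin_of(m_star), []).append(math.log10(m_bh))
    median_by_bin = {b: median(v) for b, v in log_mbh_by_bin.items()}

    return [
        i for i, (m_star, m_bh) in enumerate(galaxies)
        if math.log10(m_bh) - median_by_bin[bin_of(m_star)] > threshold_dex
    ]


# Hypothetical (M*, MBH) values in solar masses, for illustration only.
galaxies = [(1.3e10, 1e6), (1.2e10, 2e6), (1.5e10, 1.5e6), (2e10, 3e6), (1.1e10, 1e8)]
outliers = find_mbh_outliers(galaxies)
```

    Working in log space makes the 1.5 dex criterion a simple subtraction: the last galaxy sits about 1.7 dex above its bin median and is the only one flagged.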

  18. Multivariate evoked response detection based on the spectral F-test.

    PubMed

    Rocha, Paulo Fábio F; Felix, Leonardo B; Miranda de Sá, Antonio Mauricio F L; Mendes, Eduardo M A M

    2016-05-01

    Objective response detection techniques, such as magnitude square coherence, component synchrony measure, and the spectral F-test, have been used to automate the detection of evoked responses. The performance of these detectors depends on both the signal-to-noise ratio (SNR) and the length of the electroencephalogram (EEG) signal. Recently, multivariate detectors were developed to increase the detection rate even in the case of a low signal-to-noise ratio or of short data records originated from EEG signals. In this context, an extension to the multivariate case of the spectral F-test detector is proposed. The performance of this technique is assessed using Monte Carlo. As an example, EEG data from 12 subjects during photic stimulation is used to demonstrate the usefulness of the proposed detector. The multivariate method showed detection rates consistently higher than those ones when only one signal was used. It is shown that the response detection in EEG signals with the multivariate technique was statistically significant if two or more EEG derivations were used. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. GraphPrints: Towards a Graph Analytic Method for Network Anomaly Detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harshaw, Chris R; Bridges, Robert A; Iannacone, Michael D

    This paper introduces a novel graph-analytic approach for detecting anomalies in network flow data called GraphPrints. Building on foundational network-mining techniques, our method represents time slices of traffic as a graph, then counts graphlets, small induced subgraphs that describe local topology. By performing outlier detection on the sequence of graphlet counts, anomalous intervals of traffic are identified, and furthermore, individual IPs experiencing abnormal behavior are singled out. Initial testing of GraphPrints is performed on real network data with an implanted anomaly. Evaluation shows false positive rates bounded by 2.84% at the time-interval level, and 0.05% at the IP-level with 100% true positive rates at both.

  20. VizieR Online Data Catalog: Panchromatic SED of Herschel sources (Berta+, 2013)

    NASA Astrophysics Data System (ADS)

    Berta, S.; Lutz, D.; Santini, P.; Wuyts, S.; Rosario, D.; Brisbin, D.; Cooray, A.; Franceschini, A.; Gruppioni, C.; Hatziminaoglou, E.; Hwang, H. S.; Le Floc'h, E.; Magnelli, B.; Nordon, R.; Oliver, S.; Page, M. J.; Popesso, P.; Pozzetti, L.; Pozzi, F.; Riguccini, L.; Rodighiero, G.; Roseboom, I.; Scott, D.; Symeonidis, M.; Valtchanov, I.; Viero, M.; Wang, L.

    2016-06-01

    Combining far-infrared Herschel photometry from the PACS Evolutionary Probe (PEP) and Herschel Multi-tiered Extragalactic Survey (HerMES) guaranteed time programs with ancillary datasets in the GOODS-N, GOODS-S and COSMOS fields, it is possible to sample the 8-500 micron spectral energy distributions of galaxies with at least 7-10 bands. Extending to the UV, optical, and near-infrared, the number of bands increases up to 43. We reproduce the distribution of galaxies in a carefully selected 10-dimensional restframe color space, based on this rich data-set, using a superposition of multi-variate Gaussian modes. We use this model to classify galaxies and build median spectral energy distributions (SEDs) of each class, which are then fitted with a modified version of the MAGPHYS code that combines stellar light, emission from dust heated by stars and a possible warm dust contribution heated by an Active Galactic Nucleus (AGN). The color distribution of galaxies in each of the considered fields can be well described with the combination of 6-9 classes, spanning a large range of far- to near-IR luminosity ratios, as well as different strength of the AGN contribution to bolometric luminosities. The defined Gaussian grouping is used to identify rare or odd sources. The zoology of outliers includes Herschel-detected ellipticals, very blue z~1 Lyα-break galaxies, quiescent spirals, and torus-dominated AGN with star formation. Out of these groups and outliers, a new template library is assembled, consisting of 32 SEDs describing the intrinsic scatter in the restframe UV-to-submm colors of infrared galaxies. This library is tested against L(IR) estimates with and without Herschel data included, and compared to eight other popular methods often adopted in the literature. When implementing Herschel photometry, these approaches produce L(IR) values consistent with each other within a median absolute deviation of 10-20%, the scatter being dominated more by fine tuning of the codes, rather than by the choice of SED templates. Finally, the library is used to classify 24 micron detected sources in PEP GOODS fields on the basis of AGN content, L(60)/L(100) color and L(160)/L(1.6) luminosity ratio. AGN appear to be distributed in M*-SFR along with all other galaxies, regardless of the amount of infrared luminosity they are powering, with the tendency to lie on the high SFR side of the "main sequence". The incidence of warmer star-forming sources grows for objects with higher specific star formation rates, and they tend to populate the "off-sequence" region of the M*-SFR-z space. (4 data files).
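
    The idea of classifying sources with a superposition of Gaussian modes and flagging the ones that fit no mode can be illustrated with a toy sketch. It is not the authors' pipeline: it works in two dimensions rather than ten, the mode weights, means and covariances in `modes` are invented for illustration, and the 99% chi-square cut is an assumed choice.

```python
import math

def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D point to one Gaussian mode."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # x^T * inverse(cov) * x, with inverse(cov) = [[d, -b], [-c, a]] / det
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det

def classify(x, components, p=0.99):
    """Return (index of the most likely mode, outlier flag).

    A point is flagged when it lies outside the p-probability ellipse of
    every mode. For 2 degrees of freedom the chi-square quantile is
    -2 * ln(1 - p).
    """
    threshold = -2.0 * math.log(1.0 - p)
    best_class, best_logp, min_d2 = None, -math.inf, math.inf
    for k, (weight, mean, cov) in enumerate(components):
        (a, b), (c, d) = cov
        d2 = mahalanobis_sq(x, mean, cov)
        # Log of weight * 2-D Gaussian density.
        logp = (math.log(weight) - math.log(2.0 * math.pi)
                - 0.5 * (math.log(a * d - b * c) + d2))
        if logp > best_logp:
            best_class, best_logp = k, logp
        min_d2 = min(min_d2, d2)
    return best_class, min_d2 > threshold

# Hypothetical two-mode model in a 2-colour space: (weight, mean, covariance).
modes = [
    (0.6, (0.0, 0.0), ((1.0, 0.5), (0.5, 1.0))),
    (0.4, (4.0, 4.0), ((1.0, 0.0), (0.0, 1.0))),
]
```

    In this sketch a source is assigned to the mode with the highest weighted density, and the outlier flag is decided separately from the smallest Mahalanobis distance, so a rare source still receives a nearest class.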

  1. Data mining spacecraft telemetry: towards generic solutions to automatic health monitoring and status characterisation

    NASA Astrophysics Data System (ADS)

    Royer, P.; De Ridder, J.; Vandenbussche, B.; Regibo, S.; Huygen, R.; De Meester, W.; Evans, D. J.; Martinez, J.; Korte-Stapff, M.

    2016-07-01

    We present the first results of a study aimed at finding new and efficient ways to automatically process spacecraft telemetry for automatic health monitoring. The goal is to reduce the load on the flight control team while extending the "checkability" to the entire telemetry database, and provide efficient, robust and more accurate detection of anomalies in near real time. We present a set of effective methods to (a) detect outliers in the telemetry or in its statistical properties, (b) uncover and visualise special properties of the telemetry and (c) detect new behavior. Our results are structured around two main families of solutions. For parameters visiting a restricted set of signal values, i.e. all status parameters and about one third of all the others, we focus on a transition analysis, exploiting properties of Poincare plots. For parameters with an arbitrarily high number of possible signal values, we describe the statistical properties of the signal via its Kernel Density Estimate. We demonstrate that this allows for a generic and dynamic approach of the soft-limit definition. Thanks to a much more accurate description of the signal and of its time evolution, we are more sensitive and more responsive to outliers than the traditional checks against hard limits. Our methods were validated on two years of Venus Express telemetry. They are generic for assisting in health monitoring of any complex system with large amounts of diagnostic sensor data. Not only spacecraft systems but also present-day astronomical observatories can benefit from them.
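
    A soft limit derived from a kernel density estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the method validated on Venus Express: the Gaussian kernel, the normal-reference bandwidth `1.06 * sigma * n**(-1/5)`, the `alpha` quantile used as density threshold, and the synthetic `nominal` signal are all choices made here.

```python
import math
import random
import statistics

def gaussian_kde(train):
    """Return a function estimating the density of `train` at a value x."""
    n = len(train)
    h = 1.06 * statistics.stdev(train) * n ** (-0.2)  # rule-of-thumb bandwidth
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in train)

    return density

def soft_limit_checker(train, alpha=0.01):
    """Flag values whose density is below the alpha-quantile of the
    densities observed on the nominal (training) samples."""
    density = gaussian_kde(train)
    train_density = sorted(density(x) for x in train)
    threshold = train_density[int(alpha * len(train))]

    def is_outlier(x):
        return density(x) < threshold

    return is_outlier

# Synthetic nominal telemetry: a parameter fluctuating around 20 units.
rng = random.Random(0)
nominal = [rng.gauss(20.0, 0.5) for _ in range(200)]
is_outlier = soft_limit_checker(nominal)
```

    Because the threshold follows the estimated density rather than fixed bounds, re-estimating it on a sliding window would let the limit track slow changes in the signal.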

  2. Domain Anomaly Detection in Machine Perception: A System Architecture and Taxonomy.

    PubMed

    Kittler, Josef; Christmas, William; de Campos, Teófilo; Windridge, David; Yan, Fei; Illingworth, John; Osman, Magda

    2014-05-01

    We address the problem of anomaly detection in machine perception. The concept of domain anomaly is introduced as distinct from the conventional notion of anomaly used in the literature. We propose a unified framework for anomaly detection which exposes the multifaceted nature of anomalies and suggest effective mechanisms for identifying and distinguishing each facet as instruments for domain anomaly detection. The framework draws on the Bayesian probabilistic reasoning apparatus which clearly defines concepts such as outlier, noise, distribution drift, novelty detection (object, object primitive), rare events, and unexpected events. Based on these concepts we provide a taxonomy of domain anomaly events. One of the mechanisms helping to pinpoint the nature of anomaly is based on detecting incongruence between contextual and noncontextual sensor(y) data interpretation. The proposed methodology has wide applicability. It underpins in a unified way the anomaly detection applications found in the literature. To illustrate some of its distinguishing features, in here the domain anomaly detection methodology is applied to the problem of anomaly detection for a video annotation system.

  3. Enhancement Strategies for Frame-to-Frame UAS Stereo Visual Odometry

    NASA Astrophysics Data System (ADS)

    Kersten, J.; Rodehorst, V.

    2016-06-01

    Autonomous navigation of indoor unmanned aircraft systems (UAS) requires accurate pose estimations usually obtained from indirect measurements. Navigation based on inertial measurement units (IMU) is known to be affected by high drift rates. The incorporation of cameras provides complementary information due to the different underlying measurement principle. The scale ambiguity problem for monocular cameras is avoided when a light-weight stereo camera setup is used. However, also frame-to-frame stereo visual odometry (VO) approaches are known to accumulate pose estimation errors over time. Several valuable real-time capable techniques for outlier detection and drift reduction in frame-to-frame VO, for example robust relative orientation estimation using random sample consensus (RANSAC) and bundle adjustment, are available. This study addresses the problem of choosing appropriate VO components. We propose a frame-to-frame stereo VO method based on carefully selected components and parameters. This method is evaluated regarding the impact and value of different outlier detection and drift-reduction strategies, for example keyframe selection and sparse bundle adjustment (SBA), using reference benchmark data as well as own real stereo data. The experimental results demonstrate that our VO method is able to estimate quite accurate trajectories. Feature bucketing and keyframe selection are simple but effective strategies which further improve the VO results. Furthermore, introducing the stereo baseline constraint in pose graph optimization (PGO) leads to significant improvements.
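
    The RANSAC principle mentioned above (fit a model to a minimal random sample, count the observations that agree with it, keep the best-supported hypothesis) is sketched here on a deliberately simple stand-in problem, a 2-D line, rather than on relative orientation estimation. The function name, tolerance, iteration count and data are illustrative assumptions.

```python
import random

def ransac_line(points, n_iter=200, tol=0.5, seed=0):
    """Return (slope, intercept, inliers) of the best-supported line."""
    rng = random.Random(seed)
    best = (None, None, [])
    for _ in range(n_iter):
        # A line is fully determined by a minimal sample of two points.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # vertical line: not representable as y = m*x + c
        slope = (y2 - y1) / (x2 - x1)
        intercept = y1 - slope * x1
        # Consensus set: points whose vertical residual is within tol.
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= tol]
        if len(inliers) > len(best[2]):
            best = (slope, intercept, inliers)
    return best

# Twenty points on y = 2x + 1 plus three gross outliers.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
data += [(3.0, 40.0), (10.0, -15.0), (15.0, 60.0)]
slope, intercept, inliers = ransac_line(data)
```

    In a visual odometry setting the minimal sample would be a set of feature correspondences and the model a relative pose, but the hypothesise-and-verify loop has the same shape.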

  4. Inference of Evolutionary Forces Acting on Human Biological Pathways

    PubMed Central

    Daub, Josephine T.; Dupanloup, Isabelle; Robinson-Rechavi, Marc; Excoffier, Laurent

    2015-01-01

    Because natural selection is likely to act on multiple genes underlying a given phenotypic trait, we study here the potential effect of ongoing and past selection on the genetic diversity of human biological pathways. We first show that genes included in gene sets are generally under stronger selective constraints than other genes and that their evolutionary response is correlated. We then introduce a new procedure to detect selection at the pathway level based on a decomposition of the classical McDonald–Kreitman test extended to multiple genes. This new test, called 2DNS, detects outlier gene sets and takes into account past demographic effects and evolutionary constraints specific to gene sets. Selective forces acting on gene sets can be easily identified by a mere visual inspection of the position of the gene sets relative to their two-dimensional null distribution. We thus find several outlier gene sets that show signals of positive, balancing, or purifying selection but also others showing an ancient relaxation of selective constraints. The principle of the 2DNS test can also be applied to other genomic contrasts. For instance, the comparison of patterns of polymorphisms private to African and non-African populations reveals that most pathways show a higher proportion of nonsynonymous mutations in non-Africans than in Africans, potentially due to different demographic histories and selective pressures. PMID:25971280

  5. Robustly Aligning a Shape Model and Its Application to Car Alignment of Unknown Pose.

    PubMed

    Li, Yan; Gu, Leon; Kanade, Takeo

    2011-09-01

    Precisely localizing in an image a set of feature points that form a shape of an object, such as car or face, is called alignment. Previous shape alignment methods attempted to fit a whole shape model to the observed data, based on the assumption of Gaussian observation noise and the associated regularization process. However, such an approach, though able to deal with Gaussian noise in feature detection, turns out not to be robust or precise because it is vulnerable to gross feature detection errors or outliers resulting from partial occlusions or spurious features from the background or neighboring objects. We address this problem by adopting a randomized hypothesis-and-test approach. First, a Bayesian inference algorithm is developed to generate a shape-and-pose hypothesis of the object from a partial shape or a subset of feature points. For alignment, a large number of hypotheses are generated by randomly sampling subsets of feature points, and then evaluated to find the one that minimizes the shape prediction error. This method of randomized subset-based matching can effectively handle outliers and recover the correct object shape. We apply this approach on a challenging data set of over 5,000 different-posed car images, spanning a wide variety of car types, lighting, background scenes, and partial occlusions. Experimental results demonstrate favorable improvements over previous methods on both accuracy and robustness.

  6. A comparative study of outlier detection for large-scale traffic data by one-class SVM and kernel density estimation

    NASA Astrophysics Data System (ADS)

    Ngan, Henry Y. T.; Yung, Nelson H. C.; Yeh, Anthony G. O.

    2015-02-01

    This paper presents a comparative study of outlier detection (OD) for large-scale traffic data. Traffic data nowadays are massive in scale and are collected every second throughout any modern city. In this research, the traffic flow dynamic is collected from one of the busiest 4-armed junctions in Hong Kong over a 31-day sampling period (764,027 vehicles in total). The traffic flow dynamic is expressed in a high-dimensional spatial-temporal (ST) signal format (i.e. 80 cycles), which has a high degree of similarity within the same signal and across different signals in one direction. A total of 19 traffic directions are identified at this junction, and many ST signals are collected in the 31-day period (i.e. 874 signals). To reduce their dimension, the ST signals first undergo a principal component analysis (PCA) and are represented as (x,y)-coordinates. These PCA (x,y)-coordinates are then assumed to be Gaussian distributed. Under this assumption, the data points are further evaluated by (a) a correlation study with three variant coefficients, (b) a one-class support vector machine (SVM) and (c) kernel density estimation (KDE). The correlation study could not give any explicit OD result, while the one-class SVM and KDE provide average DSRs of 59.61% and 95.20%, respectively.

  7. Oceanographic variation influences spatial genomic structure in the sea scallop, Placopecten magellanicus.

    PubMed

    Van Wyngaarden, Mallory; Snelgrove, Paul V R; DiBacco, Claudio; Hamilton, Lorraine C; Rodríguez-Ezpeleta, Naiara; Zhan, Luyao; Beiko, Robert G; Bradbury, Ian R

    2018-03-01

    Environmental factors can influence diversity and population structure in marine species and accurate understanding of this influence can both improve fisheries management and help predict responses to environmental change. We used 7163 SNPs derived from restriction site-associated DNA sequencing genotyped in 245 individuals of the economically important sea scallop, Placopecten magellanicus, to evaluate the correlations between oceanographic variation and a previously identified latitudinal genomic cline. Sea scallops span a broad latitudinal area (>10 degrees), and we hypothesized that climatic variation significantly drives clinal trends in allele frequency. Using a large environmental dataset, including temperature, salinity, chlorophyll a, and nutrient concentrations, we identified a suite of SNPs (285-621, depending on analysis and environmental dataset) potentially under selection through correlations with environmental variation. Principal components analysis of different outlier SNPs and environmental datasets revealed similar northern and southern clusters, with significant associations between the first axes of each (adjusted R^2 = .66-.79). Multivariate redundancy analysis of outlier SNPs and the environmental principal components indicated that environmental factors explained more than 32% of the variance. Similarly, multiple linear regressions and random-forest analysis identified winter average and minimum ocean temperatures as significant parameters in the link between genetic and environmental variation. This work indicates that oceanographic variation is associated with the observed genomic cline in this species and that seasonal periods of extreme cold may restrict gene flow along a latitudinal gradient in this marine benthic bivalve. Incorporating this finding into management may improve accuracy of management strategies and future predictions.

  8. Correlating Reactivity and Selectivity to Cyclopentadienyl Ligand Properties in Rh(III)-Catalyzed C-H Activation Reactions: An Experimental and Computational Study.

    PubMed

    Piou, Tiffany; Romanov-Michailidis, Fedor; Romanova-Michaelides, Maria; Jackson, Kelvin E; Semakul, Natthawat; Taggart, Trevor D; Newell, Brian S; Rithner, Christopher D; Paton, Robert S; Rovis, Tomislav

    2017-01-25

    Cp^X Rh(III)-catalyzed C-H functionalization reactions are a proven method for the efficient assembly of small molecules. However, rationalization of the effects of cyclopentadienyl (Cp^X) ligand structure on reaction rate and selectivity has been viewed as a black box, and a truly systematic study is lacking. Consequently, predicting the outcomes of these reactions is challenging because subtle variations in ligand structure can cause notable changes in reaction behavior. A predictive tool is, nonetheless, of considerable value to the community as it would greatly accelerate reaction development. Designing a data set in which the steric and electronic properties of the Cp^X Rh(III) catalysts were systematically varied allowed us to apply multivariate linear regression algorithms to establish correlations between these catalyst-based descriptors and the regio-, diastereoselectivity, and rate of model reactions. This, in turn, led to the development of quantitative predictive models that describe catalyst performance. Our newly described cone angles and Sterimol parameters for Cp^X ligands served as highly correlative steric descriptors in the regression models. Through rational design of training and validation sets, key diastereoselectivity outliers were identified. Computations reveal the origins of the outstanding stereoinduction displayed by these outliers. The results are consistent with partial η^5-η^3 ligand slippage that occurs in the transition state of the selectivity-determining step. In addition to the instructive value of our study, we believe that the insights gained are transposable to other group 9 transition metals and pave the way toward rational design of C-H functionalization catalysts.

  9. The physical properties of galaxies with unusually red mid-infrared colours

    NASA Astrophysics Data System (ADS)

    Kauffmann, Guinevere

    2018-02-01

    The goal of this paper is to investigate the physical nature of galaxies in the redshift range 0.02 < z < 0.15 that have strong excess emission at mid-infrared wavelengths and to determine whether they host a population of accreting black holes that cannot be identified using optical emission lines. We show that at fixed stellar mass M* and Dn(4000), the distribution of [3.4]-[4.6] μm (Wide-field Infrared Survey Explorer, W1 - W2 band) colours is sharply peaked, with a long tail to much redder W1 - W2 colours. We introduce a procedure to pull out the red outlier population based on a combination of three stellar population diagnostics. When compared with optically selected active galactic nuclei (AGN), red outliers are more likely to be found in massive galaxies, and they tend to have lower stellar mass densities, younger stellar ages and higher dust content than optically selected AGN hosts. They are twice as likely to be detected at radio wavelengths. We examine W1 - W2 colour profiles for a subset of the nearest, reddest outliers and find that most are not centrally peaked, indicating that the hot dust emission is spread throughout the galaxy. We find that radio luminosity is the quantity that is most predictive of a redder central W1 - W2 colour. Radio-loud galaxies with centrally concentrated hot dust emission are almost always morphologically disturbed, with compact, unresolved emission at 1.4 GHz. Eighty per cent of such systems are identifiable as AGN using optical emission line diagnostics.

  10. Groundspeed filtering for CTAS

    NASA Technical Reports Server (NTRS)

    Slater, Gary L.

    1994-01-01

    Ground speed is one of the radar observables which is obtained along with position and heading from NASA Ames Center radar. Within the Center TRACON Automation System (CTAS), groundspeed is converted into airspeed using the wind speeds which CTAS obtains from the NOAA weather grid. This airspeed is then used in the trajectory synthesis logic which computes the trajectory for each individual aircraft. The time history of the typical radar groundspeed data is generally quite noisy, with high frequency variations on the order of five knots, and occasional 'outliers' which can be significantly different from the probable true speed. To try to smooth out these speeds and make the ETA estimate less erratic, filtering of the ground speed is done within CTAS. In its base form, the CTAS filter is a 'moving average' filter which averages the last ten radar values. In addition, there is separate logic to detect and correct for 'outliers', and acceleration logic which limits the groundspeed change in adjacent time samples. As will be shown, these additional modifications do cause significant changes in the actual groundspeed filter output. The conclusion is that the current ground speed filter logic is unable to track accurately the speed variations observed on many aircraft. The Kalman filter logic, however, appears to be an improvement to the current algorithm used to smooth ground speed variations, while being simpler and more efficient to implement. Additional logic which can test for true 'outliers' can easily be added by looking at the difference between the a priori and a posteriori Kalman estimates, and not updating if the difference in these quantities is too large.
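
    The outlier test described in the last sentence can be sketched with a scalar Kalman filter. This is an illustration only, not the CTAS implementation: the random-walk speed model, the noise variances `q` and `r`, the `max_jump` threshold and the sample radar values are all assumptions made here.

```python
def kalman_groundspeed(speeds, q=0.5, r=25.0, max_jump=5.0):
    """Smooth groundspeed with a scalar Kalman filter and skip outliers.

    q: process noise variance, r: measurement noise variance (knots^2).
    A measurement is rejected when the a posteriori estimate would differ
    from the a priori estimate by more than max_jump knots.
    """
    x = float(speeds[0])  # state estimate (knots)
    p = r                 # variance of the state estimate
    estimates = [x]
    rejected = []
    for k, z in enumerate(speeds[1:], start=1):
        # Predict: the random-walk model keeps x; uncertainty grows by q.
        p_prior = p + q
        gain = p_prior / (p_prior + r)
        x_post = x + gain * (z - x)
        if abs(x_post - x) > max_jump:
            rejected.append(k)  # outlier: keep the a priori estimate
            p = p_prior
        else:
            x = x_post
            p = (1.0 - gain) * p_prior
        estimates.append(x)
    return estimates, rejected

# Noisy radar groundspeeds (knots) with one gross outlier at index 4.
radar = [400.0, 402.0, 398.0, 401.0, 460.0, 399.0, 400.0]
estimates, rejected = kalman_groundspeed(radar)
```

    When a sample is rejected the estimate is held and only the variance grows, so a genuine sustained speed change raises the gain on later samples and is eventually followed.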

  11. AFLP genome scan in the black rat (Rattus rattus) from Madagascar: detecting genetic markers undergoing plague-mediated selection.

    PubMed

    Tollenaere, C; Duplantier, J-M; Rahalison, L; Ranjalahy, M; Brouat, C

    2011-03-01

    The black rat (Rattus rattus) is the main reservoir of plague (Yersinia pestis infection) in Madagascar's rural zones. Black rats are highly resistant to plague within the plague focus (central highland), whereas they are susceptible where the disease is absent (low altitude zone). To better understand plague wildlife circulation and host evolution in response to a highly virulent pathogen, we attempted to determine genetic markers associated with plague resistance in this species. To this purpose, we combined a population genomics approach and an association study, both performed on 249 AFLP markers, in Malagasy R. rattus. Simulated distributions of genetic differentiation were compared to observed data in four independent pairs, each consisting of one population from the plague focus and one from the plague-free zone. We found 22 loci (9% of 249) with higher differentiation in at least two independent population pairs or with combining P-values over the four pairs significant. Among the 22 outlier loci, 16 presented significant association with plague zone (plague focus vs. plague-free zone). Population genetic structure inferred from outlier loci was structured by plague zone, whereas the neutral loci dataset revealed structure by geography (eastern vs. western populations). A phenotype association study revealed that two of the 22 loci were significantly associated with differentiation between dying and surviving rats following experimental plague challenge. The 22 outlier loci identified in this study may undergo plague selective pressure either directly or more probably indirectly due to hitchhiking with selected loci. © 2010 Blackwell Publishing Ltd.

  12. Multi-modal automatic montaging of adaptive optics retinal images

    PubMed Central

    Chen, Min; Cooper, Robert F.; Han, Grace K.; Gee, James; Brainard, David H.; Morgan, Jessica I. W.

    2016-01-01

    We present a fully automated adaptive optics (AO) retinal image montaging algorithm using classic scale invariant feature transform with random sample consensus for outlier removal. Our approach is capable of using information from multiple AO modalities (confocal, split detection, and dark field) and can accurately detect discontinuities in the montage. The algorithm output is compared to manual montaging by evaluating the similarity of the overlapping regions after montaging, and calculating the detection rate of discontinuities in the montage. Our results show that the proposed algorithm has high alignment accuracy and a discontinuity detection rate that is comparable (and often superior) to manual montaging. In addition, we analyze and show the benefits of using multiple modalities in the montaging process. We provide the algorithm presented in this paper as open-source and freely available to download. PMID:28018714

  13. Fusion of an Ensemble of Augmented Image Detectors for Robust Object Detection

    PubMed Central

    Wei, Pan; Anderson, Derek T.

    2018-01-01

    A significant challenge in object detection is accurate identification of an object’s position in image space, whereas one algorithm with one set of parameters is usually not enough, and the fusion of multiple algorithms and/or parameters can lead to more robust results. Herein, a new computational intelligence fusion approach based on the dynamic analysis of agreement among object detection outputs is proposed. Furthermore, we propose an online versus just in training image augmentation strategy. Experiments comparing the results both with and without fusion are presented. We demonstrate that the augmented and fused combination results are the best, with respect to higher accuracy rates and reduction of outlier influences. The approach is demonstrated in the context of cone, pedestrian and box detection for Advanced Driver Assistance Systems (ADAS) applications. PMID:29562609

  14. Adjustment of geochemical background by robust multivariate statistics

    USGS Publications Warehouse

    Zhou, D.

    1985-01-01

    Conventional analyses of exploration geochemical data assume that the background is a constant or slowly changing value, equivalent to a plane or a smoothly curved surface. However, it is better to regard the geochemical background as a rugged surface, varying with changes in geology and environment. This rugged surface can be estimated from observed geological, geochemical and environmental properties by using multivariate statistics. A method of background adjustment was developed and applied to groundwater and stream sediment reconnaissance data collected from the Hot Springs Quadrangle, South Dakota, as part of the National Uranium Resource Evaluation (NURE) program. Source-rock lithology appears to be a dominant factor controlling the chemical composition of groundwater or stream sediments. The most efficacious adjustment procedure is to regress uranium concentration on selected geochemical and environmental variables for each lithologic unit, and then to delineate anomalies by a common threshold set as a multiple of the standard deviation of the combined residuals. Robust versions of regression and RQ-mode principal components analysis techniques were used rather than ordinary techniques to guard against distortion caused by outliers. Anomalies delineated by this background adjustment procedure correspond with uranium prospects much better than do anomalies delineated by conventional procedures. The procedure should be applicable to geochemical exploration at different scales for other metals. © 1985.
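
    The adjustment procedure (a separate regression per lithologic unit, then one threshold on the pooled residuals) can be sketched as below. The abstract does not say which robust regression was used, so the Theil-Sen estimator here is a stand-in chosen for brevity; the single predictor, the multiplier `k`, the unit names and the data are likewise invented for illustration.

```python
import statistics
from itertools import combinations

def theil_sen(xs, ys):
    """Robust line fit: median of pairwise slopes, then median intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(list(zip(xs, ys)), 2)
              if x2 != x1]
    slope = statistics.median(slopes)
    intercept = statistics.median([y - slope * x for x, y in zip(xs, ys)])
    return slope, intercept

def delineate_anomalies(units, k=3.0):
    """Fit each unit separately, pool the residuals, and flag samples whose
    residual exceeds k standard deviations of the pooled residuals."""
    residuals = []  # (unit name, sample index, residual)
    for name, (xs, ys) in units.items():
        slope, intercept = theil_sen(xs, ys)
        for i, (x, y) in enumerate(zip(xs, ys)):
            residuals.append((name, i, y - (slope * x + intercept)))
    threshold = k * statistics.pstdev([r for _, _, r in residuals])
    return [(name, i) for name, i, r in residuals if r > threshold]

# Two hypothetical lithologic units with different background trends.
xs = [float(x) for x in range(1, 11)]
noise = [0.1 if x % 2 == 0 else -0.1 for x in range(1, 11)]
unit_a = [2.0 * x + 1.0 + n for x, n in zip(xs, noise)]
unit_a[4] += 10.0  # one enriched sample in unit A
unit_b = [0.5 * x + 5.0 + n for x, n in zip(xs, noise)]
anomalies = delineate_anomalies({"A": (xs, unit_a), "B": (xs, unit_b)})
```

    Fitting per unit removes the lithology-controlled background before thresholding, which is why a single common threshold can then be applied to samples from all units.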

  15. Stealthy false data injection attacks using matrix recovery and independent component analysis in smart grid

    NASA Astrophysics Data System (ADS)

    JiWei, Tian; BuHong, Wang; FuTe, Shang; Shuaiqi, Liu

    2017-05-01

    Accurate state estimation is vitally important for maintaining normal operation of smart grids. Existing research demonstrates that state estimation output can be compromised by malicious attacks. However, to construct the attack vectors, most works presume that the attacker has perfect information about the topology and related parameters, even though such information is difficult to acquire in practice. Recent research shows that Independent Component Analysis (ICA) can be used to infer topology information, which can in turn be used to mount undetectable attacks and even to alter the price of electricity for the attackers' profit. However, we found that this ICA-based blind attack tactic is feasible only in environments with Gaussian noise. If there are outliers (device malfunctions and communication errors), the Bad Data Detector will easily detect the attack. Hence, we propose a robust ICA-based blind attack strategy that uses matrix recovery to circumvent the outlier problem and construct stealthy attack vectors. The proposed attack strategies are tested on the representative IEEE 14-bus system. Simulations verify the feasibility of the proposed method.

  16. A subagging regression method for estimating the qualitative and quantitative state of groundwater

    NASA Astrophysics Data System (ADS)

    Jeong, Jina; Park, Eungyu; Han, Weon Shik; Kim, Kue-Young

    2017-08-01

    A subsample aggregating (subagging) regression (SBR) method for the analysis of groundwater data pertaining to trend-estimation-associated uncertainty is proposed. The SBR method is validated against synthetic data competitively with other conventional robust and non-robust methods. From the results, it is verified that the estimation accuracies of the SBR method are consistent and superior to those of other methods, and the uncertainties are reasonably estimated; the others have no uncertainty analysis option. To validate further, actual groundwater data are employed and analyzed comparatively with Gaussian process regression (GPR). For all cases, the trend and the associated uncertainties are reasonably estimated by both SBR and GPR regardless of Gaussian or non-Gaussian skewed data. However, it is expected that GPR has a limitation in applications to severely corrupted data by outliers owing to its non-robustness. From the implementations, it is determined that the SBR method has the potential to be further developed as an effective tool of anomaly detection or outlier identification in groundwater state data such as the groundwater level and contaminant concentration.
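
    Subsample aggregating in general means fitting the same estimator on many random subsamples drawn without replacement and combining the fits. The sketch below applies that idea to a linear trend; it is not the SBR method of the paper. The ordinary least-squares base fit, the median aggregation, the 5th to 95th percentile band used as an uncertainty measure, and the synthetic `levels` series are all assumptions made for illustration.

```python
import random
import statistics

def ols_line(xs, ys):
    """Ordinary least-squares slope and intercept."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def subagging_trend(xs, ys, n_models=200, frac=0.5, seed=0):
    """Return (median slope, (5th percentile, 95th percentile))."""
    rng = random.Random(seed)
    n = len(xs)
    m = max(2, int(frac * n))
    slopes = []
    for _ in range(n_models):
        idx = rng.sample(range(n), m)  # subsample without replacement
        s, _ = ols_line([xs[i] for i in idx], [ys[i] for i in idx])
        slopes.append(s)
    slopes.sort()
    band = (slopes[int(0.05 * n_models)], slopes[int(0.95 * n_models) - 1])
    return statistics.median(slopes), band

# Synthetic groundwater-level series: trend 0.3 per step, one bad reading.
times = [float(t) for t in range(40)]
levels = [0.3 * t + 5.0 + (0.2 if t % 2 == 0 else -0.2) for t in range(40)]
levels[20] += 30.0
trend, (low, high) = subagging_trend(times, levels)
```

    The spread of the subsample fits gives an uncertainty estimate at no extra modelling cost, which is the property the abstract highlights for SBR.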

  17. 42 CFR 412.86 - Payment for extraordinarily high-cost day outliers.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... Outlier Cases, Special Treatment Payment for New Technology, and Payment Adjustment for Certain Replaced... amended at 62 FR 46028, Aug. 29, 1997] Additional Special Payment for Certain New Technology ...

  18. 42 CFR 412.86 - Payment for extraordinarily high-cost day outliers.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... Outlier Cases, Special Treatment Payment for New Technology, and Payment Adjustment for Certain Replaced... amended at 62 FR 46028, Aug. 29, 1997] Additional Special Payment for Certain New Technology ...

  19. The cause of outliers in electromagnetic pulse (EMP) locations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fenimore, Edward E.

    2014-10-02

    We present methods to calculate the location of EMP pulses when observed by 5 or more satellites. Simulations show that, even with a good initial guess and fitting a location to all of the data, there are sometimes outlier results whose locations are much worse than most cases. By comparing simulations using different ionospheric transfer functions (ITFs), it appears that the outliers are caused by not including the additional path length due to refraction rather than being caused by not including higher order terms in the Appleton-Hartree equation. We suggest ways that the outliers can be corrected. These correction methods require one to use an electron density profile along the line of sight from the event to the satellite rather than using the total electron content (TEC) to characterize the ionosphere.

  20. Natural selection and neutral evolution jointly drive population divergence between alpine and lowland ecotypes of the allopolyploid plant Anemone multifida (Ranunculaceae).

    PubMed

    McEwen, Jamie R; Vamosi, Jana C; Rogers, Sean M

    2013-01-01

    Population differentiation can be driven in large part by natural selection, but selectively neutral evolution can play a prominent role in shaping patterns of population divergence. The decomposition of the evolutionary history of populations into the relative effects of natural selection and selectively neutral evolution enables an understanding of the causes of population divergence and adaptation. In this study, we examined heterogeneous genomic divergence between alpine and lowland ecotypes of the allopolyploid plant, Anemone multifida. Using peak height and dominant AFLP data, we quantified population differentiation at non-outlier (neutral) and outlier loci to determine the potential contribution of natural selection and selectively neutral evolution to population divergence. We found 13 candidate loci, corresponding to 2.7% of loci, with signatures of divergent natural selection between alpine and lowland populations and between alpine populations (FST = 0.074-0.445 at outlier loci), but neutral population differentiation was also evident between alpine populations (FST = 0.041-0.095 at neutral loci). By examining population structure at both neutral and outlier loci, we determined that the combined effects of selection and neutral evolution are associated with the divergence of alpine populations, which may be linked to extreme abiotic conditions and isolation between alpine sites. The presence of outlier levels of genetic variation in structured populations underscores the importance of separately analyzing neutral and outlier loci to infer the relative role of divergent natural selection and neutral evolution in population divergence.

  1. Boiling points of halogenated aliphatic compounds: a quantitative structure-property relationship for prediction and validation.

    PubMed

    Oberg, Tomas

    2004-01-01

    Halogenated aliphatic compounds have many technical uses, but substances within this group are also ubiquitous environmental pollutants that can affect the ozone layer and contribute to global warming. The establishment of quantitative structure-property relationships is of interest not only to fill in gaps in the available database but also to validate experimental data already acquired. The three-dimensional structures of 240 compounds were modeled with molecular mechanics prior to the generation of empirical descriptors. Two bilinear projection methods, principal component analysis (PCA) and partial-least-squares regression (PLSR), were used to identify outliers. PLSR was subsequently used to build a multivariate calibration model by extracting the latent variables that describe most of the covariation between the molecular structure and the boiling point. Boiling points were also estimated with an extension of the group contribution method of Stein and Brown.

  2. Multiple outer-reef tracts along the south Florida bank margin: Outlier reefs, a new windward-margin model

    USGS Publications Warehouse

    Lidz, Barbara H.; Hine, A.C.; Shinn, Eugene A.; Kindinger, Jack G.

    1991-01-01

    High-resolution seismic-reflection profiles off the lower Florida Keys reveal a multiple outlier-reef tract system ~0.5 to 1.5 km seaward of the bank margin. The system is characterized by a massive, outer main reef tract of high (28 m) unburied relief that parallels the margin and at least two narrower, discontinuous reef tracts of lower relief between the main tract and the shallow bank-margin reefs. The outer tract is ~0.5 to 1 km wide and extends a distance of ~57 km. A single pass divides the outer tract into two main reefs. The outlier reefs developed on antecedent, low-gradient to horizontal offbank surfaces, interpreted to be Pleistocene beaches that formed terracelike features. Radiocarbon dates of a coral core from the outer tract confirm a pre-Holocene age. These multiple outlier reefs represent a new windward-margin model that presents a significant, unique mechanism for progradation of carbonate platforms during periods of sea-level fluctuation. Infilling of the back-reef terrace basins would create new terraced promontories and would extend or "step" the platform seaward for hundreds of metres. Subsequent outlier-reef development would produce laterally accumulating sequences.

  3. Recent developments with the ORSER system

    NASA Technical Reports Server (NTRS)

    Baumer, G. M.; Turner, B. J.; Myers, W. L.

    1981-01-01

    Additions to the ORSER remote sensing data processing package are described. The ORSER package consists of about 35 individual programs that are grouped into preprocessing, data analysis, and display subsystems. Additional data formats and data management, data transformation, and geometric correlation programs were added to the preprocessing subsystem. Enhancements to the data analysis techniques include a maximum likelihood classifier (MAXCLASS) and a new version of the STATS program, which makes delineation of training areas easier and allows for detection of outlier points. Ongoing developments are also described.

  4. Brain tissues volume measurements from 2D MRI using parametric approach

    NASA Astrophysics Data System (ADS)

    L'vov, A. A.; Toropova, O. A.; Litovka, Yu. V.

    2018-04-01

    The purpose of the paper is to propose a fully automated method for assessing the volume of structures within the human brain. Our statistical approach uses the maximum interdependency principle to decide on the consistency of measurements and to handle unequal observations. Outliers are detected using the maximum normalized residual test. We propose a statistical model that utilizes knowledge of the tissue distribution in the human brain and applies partial data restoration to improve precision. The proposed approach is computationally efficient and independent of the segmentation algorithm used in the application.
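
    The maximum normalized residual test named above is also known as Grubbs' test. As a rough illustration only (not the authors' implementation), the sketch below computes the statistic G = max|x_i - mean| / s and the usual two-sided critical value from a Student t quantile. The function names and sample values are hypothetical, and the t quantile is assumed to be looked up by the caller, since the Python standard library does not provide it.

```python
import math
from statistics import mean, stdev

def grubbs_statistic(values):
    """Return (G, index), where G = max |x_i - mean| / s (sample std dev)."""
    m = mean(values)
    s = stdev(values)
    # Index of the observation farthest from the sample mean.
    idx = max(range(len(values)), key=lambda i: abs(values[i] - m))
    return abs(values[idx] - m) / s, idx

def grubbs_critical(n, t_crit):
    """Two-sided critical value of G for sample size n.

    t_crit is the upper alpha/(2n) quantile of Student's t with n - 2
    degrees of freedom; it must be supplied by the caller (assumption).
    """
    return (n - 1) / math.sqrt(n) * math.sqrt(t_crit**2 / (n - 2 + t_crit**2))

# Hypothetical repeated volume measurements (arbitrary units).
volumes = [10.0, 10.2, 9.9, 10.1, 10.0, 15.0]
g, suspect = grubbs_statistic(volumes)
print(f"G = {g:.3f} for observation {suspect}")
```

    The suspect observation is declared an outlier when G exceeds the critical value; note that G can never exceed (n - 1) / sqrt(n), so very small samples leave little room to reject anything.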

  5. A critical evaluation of the Beckman Coulter Access hsTnI: Analytical performance, reference interval and concordance.

    PubMed

    Pretorius, Carel J; Tate, Jillian R; Wilgen, Urs; Cullen, Louise; Ungerer, Jacobus P J

    2018-05-01

    We investigated the analytical performance, outlier rate, carryover and reference interval of the Beckman Coulter Access hsTnI in detail and compared it with historical and other commercial assays. We compared the imprecision, detection capability, analytical sensitivity, outlier rate and carryover against two previous Access AccuTnI assay versions. We established the reference interval with stored samples from a previous study and compared the concordances and variances with the Access AccuTnI+3 as well as with two commercial assays. The Access hsTnI had excellent analytical sensitivity with the calibration slope 5.6 times steeper than the Access AccuTnI+3. The detection capability was markedly improved with the SD of the blank 0.18-0.20 ng/L, LoB 0.29-0.33 ng/L and LoD 0.58-0.69 ng/L. All the reference interval samples had a result above the LoB value. At a mean concentration of 2.83 ng/L the SD was 0.28 ng/L (CV 9.8%). Carryover (0.005%) and outlier (0.046%) rates were similar to the Access AccuTnI+3. The combined male and female 99th percentile reference interval was 18.2 ng/L (90% CI 13.2-21.1 ng/L). Concordance amongst the assays was poor with only 16.7%, 19.6% and 15.2% of samples identified by all 4 assays as above the 99th, 97.5th and 95th percentiles. Analytical imprecision was a minor contributor to the observed variances between assays. The Beckman Coulter Access hsTnI assay has excellent analytical sensitivity and precision characteristics close to zero. This allows cTnI measurement in all healthy individuals and the capability to identify numerically small differences between serial samples as statistically significant. Concordance in healthy individuals remains poor amongst assays. Crown Copyright © 2018. Published by Elsevier Inc. All rights reserved.

  6. A robust ridge regression approach in the presence of both multicollinearity and outliers in the data

    NASA Astrophysics Data System (ADS)

    Shariff, Nurul Sima Mohamad; Ferdaos, Nur Aqilah

    2017-08-01

    Multicollinearity often leads to inconsistent and unreliable parameter estimates in regression analysis. The situation is more severe in the presence of outliers, which cause fatter tails in the error distribution than in the normal distribution. The well-known procedure that is robust to the multicollinearity problem is the ridge regression method. This method, however, is expected to be affected by the presence of outliers due to some assumptions imposed in the modeling procedure. Thus, a robust version of the existing ridge method, with some modification of the inverse matrix and the estimated response value, is introduced. The performance of the proposed method is discussed, and comparisons are made with several existing estimators, namely Ordinary Least Squares (OLS), ridge regression and robust ridge regression based on GM-estimates. The findings show that the proposed method is able to produce reliable parameter estimates in the presence of both multicollinearity and outliers in the data.

  7. Conditional Outlier Detection for Clinical Alerting

    PubMed Central

    Hauskrecht, Milos; Valko, Michal; Batal, Iyad; Clermont, Gilles; Visweswaran, Shyam; Cooper, Gregory F.

    2010-01-01

    We develop and evaluate a data-driven approach for detecting unusual (anomalous) patient-management actions using past patient cases stored in an electronic health record (EHR) system. Our hypothesis is that patient-management actions that are unusual with respect to past patients may be due to a potential error and that it is worthwhile to raise an alert if such a condition is encountered. We evaluate this hypothesis using data obtained from the electronic health records of 4,486 post-cardiac surgical patients. We base the evaluation on the opinions of a panel of experts. The results support that anomaly-based alerting can have reasonably low false alert rates and that stronger anomalies are correlated with higher alert rates. PMID:21346986

  8. Conditional outlier detection for clinical alerting.

    PubMed

    Hauskrecht, Milos; Valko, Michal; Batal, Iyad; Clermont, Gilles; Visweswaran, Shyam; Cooper, Gregory F

    2010-11-13

    We develop and evaluate a data-driven approach for detecting unusual (anomalous) patient-management actions using past patient cases stored in an electronic health record (EHR) system. Our hypothesis is that patient-management actions that are unusual with respect to past patients may be due to a potential error and that it is worthwhile to raise an alert if such a condition is encountered. We evaluate this hypothesis using data obtained from the electronic health records of 4,486 post-cardiac surgical patients. We base the evaluation on the opinions of a panel of experts. The results support that anomaly-based alerting can have reasonably low false alert rates and that stronger anomalies are correlated with higher alert rates.

  9. Anomaly Detection for Beam Loss Maps in the Large Hadron Collider

    NASA Astrophysics Data System (ADS)

    Valentino, Gianluca; Bruce, Roderik; Redaelli, Stefano; Rossi, Roberto; Theodoropoulos, Panagiotis; Jaster-Merz, Sonja

    2017-07-01

    In the LHC, beam loss maps are used to validate collimator settings for cleaning and machine protection. This is done by monitoring the loss distribution in the ring during infrequent controlled loss map campaigns, as well as in standard operation. Due to the complexity of the system, consisting of more than 50 collimators per beam, it is difficult to identify small changes in the collimation hierarchy, which may be due to setting errors or beam orbit drifts with such methods. A technique based on Principal Component Analysis and Local Outlier Factor is presented to detect anomalies in the loss maps and therefore provide an automatic check of the collimation hierarchy.

  10. The observational case for Jupiter being a typical massive planet.

    PubMed

    Lineweaver, Charles H; Grether, Daniel

    2002-01-01

    We identify a subsample of the recently detected extrasolar planets that is minimally affected by the selection effects of the Doppler detection method. With a simple analysis we quantify trends in the surface density of this subsample in the period-Msin(i) plane. A modest extrapolation of these trends puts Jupiter in the most densely occupied region of this parameter space, thus indicating that Jupiter is a typical massive planet rather than an outlier. Our analysis suggests that Jupiter is more typical than indicated by previous analyses. For example, instead of MJup mass exoplanets being twice as common as 2 MJup exoplanets, we find they are three times as common.

  11. Combined CT-based and image-free navigation systems in TKA reduces postoperative outliers of rotational alignment of the tibial component.

    PubMed

    Mitsuhashi, Shota; Akamatsu, Yasushi; Kobayashi, Hideo; Kusayama, Yoshihiro; Kumagai, Ken; Saito, Tomoyuki

    2018-02-01

    Rotational malpositioning of the tibial component can lead to poor functional outcome in TKA. Although various surgical techniques have been proposed, precise rotational placement of the tibial component has been difficult to accomplish even with the use of a navigation system. The purpose of this study is to assess whether combined CT-based and image-free navigation systems accurately replicate the rotational alignment of the tibial component that was preoperatively planned on CT, compared with the conventional method. We compared the number of outliers for rotational alignment of the tibial component using combined CT-based and image-free navigation systems (navigated group) with that of the conventional method (conventional group). Seventy-two TKAs were performed between May 2012 and December 2014. In the navigated group, the anteroposterior axis was prepared using the CT-based navigation system and the tibial component was positioned under control of the navigation. In the conventional group, the tibial component was placed with reference to the Akagi line that was determined visually. Fisher's exact probability test was performed to evaluate the results. There was a significant difference between the two groups with regard to the number of outliers: 3 outliers in the navigated group compared with 12 outliers in the conventional group (P < 0.01). We concluded that combined CT-based and image-free navigation systems decreased the number of rotational outliers of the tibial component and were helpful for replicating the accurate rotational alignment of the tibial component that was preoperatively planned.

  12. Natural Selection and Neutral Evolution Jointly Drive Population Divergence between Alpine and Lowland Ecotypes of the Allopolyploid Plant Anemone multifida (Ranunculaceae)

    PubMed Central

    McEwen, Jamie R.; Vamosi, Jana C.; Rogers, Sean M.

    2013-01-01

    Population differentiation can be driven in large part by natural selection, but selectively neutral evolution can play a prominent role in shaping patterns of population divergence. The decomposition of the evolutionary history of populations into the relative effects of natural selection and selectively neutral evolution enables an understanding of the causes of population divergence and adaptation. In this study, we examined heterogeneous genomic divergence between alpine and lowland ecotypes of the allopolyploid plant, Anemone multifida. Using peak height and dominant AFLP data, we quantified population differentiation at non-outlier (neutral) and outlier loci to determine the potential contribution of natural selection and selectively neutral evolution to population divergence. We found 13 candidate loci, corresponding to 2.7% of loci, with signatures of divergent natural selection between alpine and lowland populations and between alpine populations (FST = 0.074–0.445 at outlier loci), but neutral population differentiation was also evident between alpine populations (FST = 0.041–0.095 at neutral loci). By examining population structure at both neutral and outlier loci, we determined that the combined effects of selection and neutral evolution are associated with the divergence of alpine populations, which may be linked to extreme abiotic conditions and isolation between alpine sites. The presence of outlier levels of genetic variation in structured populations underscores the importance of separately analyzing neutral and outlier loci to infer the relative role of divergent natural selection and neutral evolution in population divergence. PMID:23874801

  13. Variability of nursing care by APR-DRG and by severity of illness in a sample of nine Belgian hospitals.

    PubMed

    Pirson, Magali; Delo, Caroline; Di Pierdomenico, Lionel; Laport, Nancy; Biloque, Veronique; Leclercq, Pol

    2013-10-10

    As soon as Diagnosis Related Groups (DRG) were introduced in many hospital financing systems, most nursing research revealed that DRG were not very homogeneous with regard to nursing care. However, few studies are based on All Patient Refined Diagnosis Related Groups (APR-DRGs) and few of them use recent data. The objectives of this study are: (1) to evaluate whether nursing activity is homogeneous by APR-DRG and by severity of illness (SOI), (2) to evaluate the outlier rate associated with the nursing activity and (3) to compare nursing cost homogeneity per DRG and SOI. The study was done in 9 Belgian hospitals on a selection of APR-DRG with more than 30 patients (7 638 inpatient stays). The evaluation of the homogeneity is based on coefficients of variation (CV). The 75th percentile + 1.5 × inter-quartile range was used to select high outliers. The 25th percentile - 1.5 × inter-quartile range was used to select low outliers. Nursing costs per ward were distributed on inpatient stays of each ward following two techniques (the LOS vs. the number of nursing care minutes per stay). The homogeneity of LOS by DRG and by SOI is relatively good (CV: 0.56). The homogeneity of the nursing activity by DRG is less good (CVs between 0.36 and 1.54) and is influenced by nursing activity outliers (high outlier rate: 5.19%, low outlier rate: 0.14%). The outlier rate varies according to the studied variable. The high outlier rate is higher for nursing activity than for LOS. The homogeneity of nursing costs is higher when costs are based on the LOS of patients than when based on minutes of nursing care (CVs between 0.26 and 1.46 for nursing costs based on LOS and between 0.49 and 2.04 for nursing costs based on minutes of nursing care). It is essential that the calculation of nursing cost by stay and by DRG for hospital financing purposes be based on nursing activity data, which better reflect resources used in wards, and not on LOS data. The only way to obtain this information is the generalization of computerized nursing files.
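
    The outlier rule quoted in this abstract (75th percentile + 1.5 × inter-quartile range for high outliers, 25th percentile - 1.5 × inter-quartile range for low outliers) and the coefficient of variation can be sketched as follows. This is a minimal illustration under stated assumptions: the nursing-minutes values are invented, the function names are hypothetical, and the quartile interpolation method ("inclusive") is a choice made here, since the abstract does not say which one the authors used.

```python
from statistics import mean, pstdev, quantiles

def iqr_fences(values, k=1.5):
    """Return (low_fence, high_fence) = (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def split_outliers(values, k=1.5):
    """Return (low_outliers, high_outliers) according to the IQR fences."""
    low, high = iqr_fences(values, k)
    return [v for v in values if v < low], [v for v in values if v > high]

def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population form)."""
    return pstdev(values) / mean(values)

# Hypothetical nursing-care minutes per stay within one APR-DRG/SOI cell.
minutes = [110, 120, 125, 130, 135, 140, 150, 160, 170, 600]
low_out, high_out = split_outliers(minutes)
print("high outliers:", high_out, "low outliers:", low_out)
print("CV with outliers:", round(coefficient_of_variation(minutes), 2))
```

    In this invented sample a single long stay is flagged as a high outlier and none as low, which mirrors the asymmetry between high and low outlier rates reported above.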

  14. Gear Fault Detection Effectiveness as Applied to Tooth Surface Pitting Fatigue Damage

    NASA Technical Reports Server (NTRS)

    Lewicki, David G.; Dempsey, Paula J.; Heath, Gregory F.; Shanthakumaran, Perumal

    2010-01-01

    A study was performed to evaluate fault detection effectiveness as applied to gear-tooth-pitting-fatigue damage. Vibration and oil-debris monitoring (ODM) data were gathered from 24 sets of spur pinion and face gears run during a previous endurance evaluation study. Three common condition indicators (RMS, FM4, and NA4 [Ed.'s note: See Appendix A-Definitions]) were deduced from the time-averaged vibration data and used with the ODM to evaluate their performance for gear fault detection. The NA4 parameter proved to be a very good condition indicator for the detection of gear tooth surface pitting failures. The FM4 and RMS parameters performed average to below average in detection of gear tooth surface pitting failures. The ODM sensor was successful in detecting a significant amount of debris from all the gear tooth pitting fatigue failures. Excluding outliers, the average cumulative mass at the end of a test was 40 mg.

  15. WAMS measurements pre-processing for detecting low-frequency oscillations in power systems

    NASA Astrophysics Data System (ADS)

    Kovalenko, P. Y.

    2017-07-01

    Processing the data received from measurement systems often involves situations in which one or more registered values stand apart from the rest of the sample. These values are referred to as “outliers”. The processing results may be influenced significantly by the presence of such values in the data sample under consideration. In order to ensure the accuracy of low-frequency oscillation detection in power systems, a corresponding algorithm has been developed for outlier detection and elimination. The algorithm is based on the concept of the irregular component of the measurement signal. This component comprises measurement errors and is assumed to be a Gaussian-distributed random variable. Median filtering is employed to detect the values lying outside the range of the normally distributed measurement error on the basis of a 3σ criterion. The algorithm has been validated on both simulated signals and WAMS data.
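
    The combination of median filtering and a 3σ criterion described above can be sketched as follows. This is an illustration under assumptions, not the author's algorithm: the window length, the use of the residual's population standard deviation as the σ estimate, and the synthetic ramp signal are all choices made here.

```python
from statistics import median, pstdev

def median_filter(signal, window=5):
    """Sliding median; the window is truncated at the signal edges."""
    half = window // 2
    return [
        median(signal[max(0, i - half): i + half + 1])
        for i in range(len(signal))
    ]

def three_sigma_outliers(signal, window=5):
    """Indices whose irregular component exceeds 3 sigma in magnitude."""
    smooth = median_filter(signal, window)
    # Irregular component: what is left after removing the filtered signal.
    residual = [x - s for x, s in zip(signal, smooth)]
    sigma = pstdev(residual)
    return [i for i, r in enumerate(residual) if abs(r) > 3 * sigma]

# Synthetic slowly varying signal with one injected spike (assumption).
signal = [0.1 * i for i in range(40)]
signal[20] += 5.0
print(three_sigma_outliers(signal))
```

    Flagged samples could then be replaced by the filtered value before the oscillation analysis; a robust σ estimate would be a reasonable alternative when several spikes are expected, because large outliers inflate the ordinary standard deviation.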

  16. Residual Error Based Anomaly Detection Using Auto-Encoder in SMD Machine Sound.

    PubMed

    Oh, Dong Yul; Yun, Il Dong

    2018-04-24

    Detecting an anomaly or an abnormal situation from given noise is highly useful in an environment where constantly verifying and monitoring a machine is required. As deep learning algorithms are further developed, current studies have focused on this problem. However, there are too many variables to define anomalies, and the human annotation for a large collection of abnormal data labeled at the class-level is very labor-intensive. In this paper, we propose to detect abnormal operation sounds or outliers in a very complex machine along with reducing the data-driven annotation cost. The architecture of the proposed model is based on an auto-encoder, and it uses the residual error, which stands for its reconstruction quality, to identify the anomaly. We assess our model using Surface-Mounted Device (SMD) machine sound, which is very complex, as experimental data, and state-of-the-art performance is successfully achieved for anomaly detection.

  17. Bayesian methods to determine performance differences and to quantify variability among centers in multi-center trials: the IHAST trial.

    PubMed

    Bayman, Emine O; Chaloner, Kathryn M; Hindman, Bradley J; Todd, Michael M

    2013-01-16

    The objective was to quantify the variability among centers, to identify centers whose performance is potentially outside of normal variability in the primary outcome, and to propose a guideline for declaring them outliers. Novel statistical methodology using a Bayesian hierarchical model is used. Bayesian methods for estimation and outlier detection are applied assuming an additive random center effect on the log odds of response: centers are similar but different (exchangeable). The Intraoperative Hypothermia for Aneurysm Surgery Trial (IHAST) is used as an example. Analyses were adjusted for treatment, age, gender, aneurysm location, World Federation of Neurological Surgeons scale, Fisher score and baseline NIH stroke scale scores. Adjustments for differences in center characteristics were also examined. Graphical and numerical summaries of the between-center standard deviation (sd) and variability, as well as the identification of potential outliers are implemented. In the IHAST, the center-to-center variation in the log odds of favorable outcome at each center is consistent with a normal distribution with posterior sd of 0.538 (95% credible interval: 0.397 to 0.726) after adjusting for the effects of important covariates. Outcome differences among centers show no outlying centers. Four potential outlying centers were identified but did not meet the proposed guideline for declaring them as outlying. Center characteristics (number of subjects enrolled from the center, geographical location, learning over time, nitrous oxide, and temporary clipping use) did not predict outcome, but subject and disease characteristics did. Bayesian hierarchical methods allow for determination of whether outcomes from a specific center differ from others and whether specific clinical practices predict outcome, even when some centers/subgroups have relatively small sample sizes. In the IHAST no outlying centers were found. The estimated variability between centers was moderately large.

  18. TrigDB for improving the reliability of the epicenter locations by considering the neighborhood station's trigger and cutting out of outliers in operation of Earthquake Early Warning System.

    NASA Astrophysics Data System (ADS)

    Chi, H. C.; Park, J. H.; Lim, I. S.; Seong, Y. J.

    2016-12-01

    TrigDB was initially developed to discriminate teleseismic-origin false alarms in cases where unreasonably associated triggers produce mislocated epicenters. We have applied TrigDB to the current EEWS (Earthquake Early Warning System) since 2014. During the early stage of testing the EEWS from 2011, we adapted ElarmS from the UC Berkeley BSL to the Korean seismic network and have operated it for more than 5 years. The real-time testing results of the EEWS in Korea showed that all events inside the seismic network with magnitude greater than 3.0 were well detected. However, two events located at sea gave false location results with magnitude over 4.0 due to long-period, relatively high-amplitude signals related to teleseismic waves or regional deep sources. These teleseismic-relevant false events were caused by logical correlation during the association procedure, and the corresponding geometric distribution of associated stations is crescent-shaped. Seismic stations are not deployed uniformly, so the expected bias ratio varies with the evaluated epicentral location. This ratio is calculated in advance and stored in a database, called TrigDB, for the discrimination of teleseismic-origin false alarms. We upgraded this method with so-called 'TrigDB back filling', which updates the location through supplementary association of stations by comparing trigger times between sandwiched stations that were not previously associated, based on predefined criteria such as travel time. We have also tested a module that rejects outlier trigger times by setting a criterion that compares statistical values (sigma) to the trigger times. The outlier cut-off criterion is slightly slow to take effect until the number of stations exceeds 8; however, the location result is very much improved.

  19. Raman fiber-optical method for colon cancer detection: Cross-validation and outlier identification approach

    NASA Astrophysics Data System (ADS)

    Petersen, D.; Naveed, P.; Ragheb, A.; Niedieker, D.; El-Mashtoly, S. F.; Brechmann, T.; Kötting, C.; Schmiegel, W. H.; Freier, E.; Pox, C.; Gerwert, K.

    2017-06-01

    Endoscopy plays a major role in early recognition of cancer which is not externally accessible and therewith in increasing the survival rate. Raman spectroscopic fiber-optical approaches can help to decrease the impact on the patient, increase objectivity in tissue characterization, reduce expenses and provide a significant time advantage in endoscopy. In gastroenterology an early recognition of malignant and precursor lesions is relevant. Instantaneous and precise differentiation between adenomas as precursor lesions for cancer and hyperplastic polyps on the one hand and between high and low-risk alterations on the other hand is important. Raman fiber-optical measurements of colon biopsy samples taken during colonoscopy were carried out during a clinical study, and samples of adenocarcinoma (22), tubular adenomas (141), hyperplastic polyps (79) and normal tissue (101) from 151 patients were analyzed. This allows us to focus on the bioinformatic analysis and to set the stage for Raman endoscopic measurements. Since spectral differences between normal and cancerous biopsy samples are small, special care has to be taken in data analysis. Using a leave-one-patient-out cross-validation scheme, three different outlier identification methods were investigated to decrease the influence of systematic errors, like a residual risk in misplacement of the sample and spectral dilution of marker bands (esp. cancerous tissue) and therewith optimize the experimental design. Furthermore, other validation methods like leave-one-sample-out and leave-one-spectrum-out cross-validation schemes were compared with leave-one-patient-out cross-validation. High-risk lesions were differentiated from low-risk lesions with a sensitivity of 79%, specificity of 74% and an accuracy of 77%, cancer and normal tissue with a sensitivity of 79%, specificity of 83% and an accuracy of 81%. Additionally applied outlier identification enabled us to improve the recognition of neoplastic biopsy samples.
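
    The leave-one-patient-out scheme mentioned above differs from leave-one-spectrum-out in that all spectra from the held-out patient are excluded from training together, so no patient contributes to both sides of a fold. A minimal sketch of the split and of the reported metrics follows; the patient labels and predictions are invented, and the function names are hypothetical rather than taken from the study.

```python
from collections import defaultdict

def leave_one_patient_out(patient_ids):
    """Yield (patient, train_indices, test_indices), one fold per patient."""
    by_patient = defaultdict(list)
    for idx, pid in enumerate(patient_ids):
        by_patient[pid].append(idx)
    for pid, test_idx in by_patient.items():
        held_out = set(test_idx)
        train_idx = [i for i in range(len(patient_ids)) if i not in held_out]
        yield pid, train_idx, test_idx

def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    tn = sum(1 for t, p in zip(y_true, y_pred) if not t and not p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical example: six spectra recorded from three patients.
patients = ["A", "A", "B", "C", "C", "C"]
folds = list(leave_one_patient_out(patients))
for pid, train_idx, test_idx in folds:
    print(pid, "train:", train_idx, "test:", test_idx)
```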

  20. Evaluation of an automated safety surveillance system using risk adjusted sequential probability ratio testing.

    PubMed

    Matheny, Michael E; Normand, Sharon-Lise T; Gross, Thomas P; Marinac-Dabic, Danica; Loyo-Berrios, Nilsa; Vidi, Venkatesan D; Donnelly, Sharon; Resnic, Frederic S

    2011-12-14

    Automated adverse outcome surveillance tools and methods have potential utility in quality improvement and medical product surveillance activities. Their use for assessing hospital performance on the basis of patient outcomes has received little attention. We compared risk-adjusted sequential probability ratio testing (RA-SPRT) implemented in an automated tool to Massachusetts public reports of 30-day mortality after isolated coronary artery bypass graft surgery. A total of 23,020 isolated adult coronary artery bypass surgery admissions performed in Massachusetts hospitals between January 1, 2002 and September 30, 2007 were retrospectively re-evaluated. The RA-SPRT method was implemented within an automated surveillance tool to identify hospital outliers in yearly increments. We used an overall type I error rate of 0.05, an overall type II error rate of 0.10, and a threshold that signaled if the odds of dying 30 days after surgery were at least twice those expected. Annual hospital outlier status, based on the state-reported classification, was considered the gold standard. An event was defined as at least one occurrence of a higher-than-expected hospital mortality rate during a given year. We examined a total of 83 hospital-year observations. The RA-SPRT method alerted 6 events among three hospitals for 30-day mortality compared with 5 events among two hospitals using the state public reports, yielding a sensitivity of 100% (5/5) and specificity of 98.8% (79/80). The automated RA-SPRT method performed well, detecting all of the true institutional outliers with a small false positive alerting rate. Such a system could provide confidential automated notification to local institutions in advance of public reporting providing opportunities for earlier quality improvement interventions.
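
    A risk-adjusted SPRT of the kind described above can be sketched with Wald's decision boundaries and a per-patient log-likelihood ratio that tests an odds ratio of 2 against 1. The sketch uses the error rates quoted in the abstract (0.05 and 0.10), but everything else is an assumption made for illustration: the function names, the predicted risks, and the absence of the yearly resets used by the actual surveillance tool.

```python
import math

def sprt_bounds(alpha=0.05, beta=0.10):
    """Wald's approximate log-scale boundaries (lower, upper)."""
    return math.log(beta / (1 - alpha)), math.log((1 - beta) / alpha)

def ra_sprt(outcomes, risks, odds_ratio=2.0, alpha=0.05, beta=0.10):
    """Run a risk-adjusted SPRT over patients in sequence.

    outcomes: 1 if the patient died within 30 days, else 0.
    risks: model-predicted probability of death for each patient.
    Returns (status, patients_seen, log_likelihood_ratio).
    """
    lower, upper = sprt_bounds(alpha, beta)
    llr = 0.0
    for n, (died, p) in enumerate(zip(outcomes, risks), start=1):
        # Log-likelihood ratio of "odds multiplied by odds_ratio" vs "as predicted".
        llr += died * math.log(odds_ratio) - math.log(1 - p + odds_ratio * p)
        if llr >= upper:
            return "alert", n, llr
        if llr <= lower:
            return "no excess", n, llr
    return "continue", len(outcomes), llr

# Hypothetical streams: every patient has a predicted risk of 2%.
print(ra_sprt([1] * 10, [0.02] * 10))
print(ra_sprt([0] * 200, [0.02] * 200)[0])
```

    Each death pushes the statistic up by roughly log(2), while each survivor pulls it down only slightly, so a short run of deaths among low-risk patients triggers an alert quickly, whereas accepting "no excess" takes many uneventful cases.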

  1. A practical guide to environmental association analysis in landscape genomics.

    PubMed

    Rellstab, Christian; Gugerli, Felix; Eckert, Andrew J; Hancock, Angela M; Holderegger, Rolf

    2015-09-01

    Landscape genomics is an emerging research field that aims to identify the environmental factors that shape adaptive genetic variation and the gene variants that drive local adaptation. Its development has been facilitated by next-generation sequencing, which allows for screening thousands to millions of single nucleotide polymorphisms in many individuals and populations at reasonable costs. In parallel, data sets describing environmental factors have greatly improved and increasingly become publicly accessible. Accordingly, numerous analytical methods for environmental association studies have been developed. Environmental association analysis identifies genetic variants associated with particular environmental factors and has the potential to uncover adaptive patterns that are not discovered by traditional tests for the detection of outlier loci based on population genetic differentiation. We review methods for conducting environmental association analysis including categorical tests, logistic regressions, matrix correlations, general linear models and mixed effects models. We discuss the advantages and disadvantages of different approaches, provide a list of dedicated software packages and their specific properties, and stress the importance of incorporating neutral genetic structure in the analysis. We also touch on additional important aspects such as sampling design, environmental data preparation, pooled and reduced-representation sequencing, candidate-gene approaches, linearity of allele-environment associations and the combination of environmental association analyses with traditional outlier detection tests. We conclude by summarizing expected future directions in the field, such as the extension of statistical approaches, environmental association analysis for ecological gene annotation, and the need for replication and post hoc validation studies. © 2015 John Wiley & Sons Ltd.

  2. A Robust Method for Ego-Motion Estimation in Urban Environment Using Stereo Camera.

    PubMed

    Ci, Wenyan; Huang, Yingping

    2016-10-17

    Visual odometry estimates the ego-motion of an agent (e.g., vehicle and robot) using image information and is a key component for autonomous vehicles and robotics. This paper proposes a robust and precise method for estimating the 6-DoF ego-motion, using a stereo rig with optical flow analysis. An objective function fitted with a set of feature points is created by establishing the mathematical relationship between optical flow, depth and camera ego-motion parameters through the camera's 3-dimensional motion and planar imaging model. Accordingly, the six motion parameters are computed by minimizing the objective function, using the iterative Levenberg-Marquardt method. One of the key points for visual odometry is that the feature points selected for the computation should contain as many inliers as possible. In this work, the feature points and their optical flows are initially detected by using the Kanade-Lucas-Tomasi (KLT) algorithm. Circle matching is then applied to remove the outliers caused by mismatches of the KLT algorithm. A space position constraint is imposed to filter out the moving points from the point set detected by the KLT algorithm. The Random Sample Consensus (RANSAC) algorithm is employed to further refine the feature point set, i.e., to eliminate the effects of outliers. The remaining points are tracked to estimate the ego-motion parameters in the subsequent frames. The approach presented here is tested on real traffic videos and the results prove the robustness and precision of the method.
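
    The RANSAC step mentioned above follows a general pattern: fit a model to a minimal random sample, count the points that agree with it, and keep the model with the largest consensus set. The sketch below shows that pattern on a deliberately simple stand-in problem, fitting a 2-D line, rather than the six-parameter ego-motion model of the paper. The function names, threshold, iteration count, and data are all assumptions for illustration.

```python
import random

def fit_line(p, q):
    """Line y = a*x + b through two points, or None if it is vertical."""
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        return None
    a = (y2 - y1) / (x2 - x1)
    return a, y1 - a * x1

def ransac_line(points, n_iter=200, threshold=0.5, seed=0):
    """Return ((a, b), inliers) for the line with the largest consensus set."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(n_iter):
        # Minimal sample: two points determine a line.
        model = fit_line(*rng.sample(points, 2))
        if model is None:
            continue
        a, b = model
        # Consensus set: points whose vertical residual is within the threshold.
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers

# Twenty points on y = 2x + 1 plus five gross outliers (synthetic data).
good = [(x, 2 * x + 1) for x in range(20)]
bad = [(3, 40), (7, -20), (10, 60), (15, -5), (18, 90)]
model, inliers = ransac_line(good + bad)
print(model, len(inliers))
```

    In a visual odometry setting the minimal sample, the model, and the residual would be replaced by feature correspondences, motion parameters, and a reprojection or flow error, but the loop structure stays the same.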

  3. A Robust Method for Ego-Motion Estimation in Urban Environment Using Stereo Camera

    PubMed Central

    Ci, Wenyan; Huang, Yingping

    2016-01-01

    Visual odometry estimates the ego-motion of an agent (e.g., vehicle and robot) using image information and is a key component for autonomous vehicles and robotics. This paper proposes a robust and precise method for estimating the 6-DoF ego-motion, using a stereo rig with optical flow analysis. An objective function fitted with a set of feature points is created by establishing the mathematical relationship between optical flow, depth and camera ego-motion parameters through the camera’s 3-dimensional motion and planar imaging model. Accordingly, the six motion parameters are computed by minimizing the objective function, using the iterative Levenberg–Marquardt method. One of the key points for visual odometry is that the feature points selected for the computation should contain as many inliers as possible. In this work, the feature points and their optical flows are initially detected by using the Kanade–Lucas–Tomasi (KLT) algorithm. Circle matching then removes the outliers caused by mismatches of the KLT algorithm. A space position constraint is imposed to filter out the moving points from the point set detected by the KLT algorithm. The Random Sample Consensus (RANSAC) algorithm is employed to further refine the feature point set, i.e., to eliminate the effects of outliers. The remaining points are tracked to estimate the ego-motion parameters in the subsequent frames. The approach presented here is tested on real traffic videos and the results prove the robustness and precision of the method. PMID:27763508
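
    The RANSAC step mentioned in this abstract can be illustrated on a much simpler problem. The sketch below fits a 2-D line rather than the paper's six ego-motion parameters; the function names, tolerance, iteration count and data are all hypothetical and chosen only to show the sample, score, keep-best-consensus loop.

```python
import random

def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    a = sxy / sxx
    return a, my - a * mx

def ransac_line(points, n_iter=200, tol=0.5, seed=0):
    """Return (slope, intercept, inlier_indices) of the largest consensus set."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        # Minimal sample: two points define a candidate line.
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        if x1 == x2:
            continue  # a vertical pair cannot be written as y = a*x + b
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        # Consensus set: points whose vertical residual is within tol.
        inliers = [i for i, (x, y) in enumerate(points)
                   if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refit on the consensus set only, so outliers no longer influence the fit.
    a, b = fit_line([points[i] for i in best_inliers])
    return a, b, best_inliers

# Hypothetical data: ten points on y = 2x + 1 plus two gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(3.0, 30.0), (7.0, -20.0)]
slope, intercept, inliers = ransac_line(points)
print(slope, intercept, inliers)
```

    In a visual odometry setting the minimal sample and the residual would be defined on matched feature points and the motion model instead of on a line, but the control flow would be similar.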

  4. Linear regression based on Minimum Covariance Determinant (MCD) and TELBS methods on the productivity of phytoplankton

    NASA Astrophysics Data System (ADS)

    Gusriani, N.; Firdaniza

    2018-03-01

    The existence of outliers in multiple linear regression analysis causes the Gaussian assumption to be violated. If the least squares method is forced onto such data, it produces a model that cannot represent most of the data. A regression method that is robust against outliers is therefore needed. This paper compares the Minimum Covariance Determinant (MCD) method and the TELBS method on secondary data on the productivity of phytoplankton, which contain outliers. Based on the robust coefficient of determination, the MCD method produces a better model than the TELBS method.

  5. Partial Least Squares Calibration Modeling Towards the Multivariate Limit of Detection for Enriched Isotopic Mixtures via Laser Ablation Molecular Isotopic Spectroscopy

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harris, Candace; Profeta, Luisa; Akpovo, Codjo

    The pseudo-univariate limit of detection (LOD) was calculated for comparison with the multivariate interval. Compared with results from the pseudo-univariate LOD, the multivariate LOD includes other factors (i.e. signal uncertainties) and reveals the significance of creating models that use not only the analyte’s emission line but also its entire molecular spectrum.

  6. Multivariate methods to visualise colour-space and colour discrimination data.

    PubMed

    Hastings, Gareth D; Rubin, Alan

    2015-01-01

    Despite most modern colour spaces treating colour as three-dimensional (3-D), colour data is usually not visualised in 3-D (and two-dimensional (2-D) projection-plane segments and multiple 2-D perspective views are used instead). The objectives of this article are firstly, to introduce a truly 3-D percept of colour space using stereo-pairs, secondly to view colour discrimination data using that platform, and thirdly to apply formal statistics and multivariate methods to analyse the data in 3-D. This is the first demonstration of the software that generated stereo-pairs of RGB colour space, as well as of a new computerised procedure that investigated colour discrimination by measuring colour just noticeable differences (JND). An initial pilot study and thorough investigation of instrument repeatability were performed. Thereafter, to demonstrate the capabilities of the software, five colour-normal and one colour-deficient subject were examined using the JND procedure and multivariate methods of data analysis. Scatter plots of responses were meaningfully examined in 3-D and were useful in evaluating multivariate normality as well as identifying outliers. The extent and direction of the difference between each JND response and the stimulus colour point was calculated and appreciated in 3-D. Ellipsoidal surfaces of constant probability density (distribution ellipsoids) were fitted to response data; the volumes of these ellipsoids appeared useful in differentiating the colour-deficient subject from the colour-normals. Hypothesis tests of variances and covariances showed many statistically significant differences between the results of the colour-deficient subject and those of the colour-normals, while far fewer differences were found when comparing within colour-normals. The 3-D visualisation of colour data using stereo-pairs, as well as the statistics and multivariate methods of analysis employed, were found to be unique and useful tools in the representation and study of colour. Many additional studies using these methods along with the JND and other procedures have been identified and will be reported in future publications. © 2014 The Authors Ophthalmic & Physiological Optics © 2014 The College of Optometrists.

  7. Multicomponent blood lipid analysis by means of near infrared spectroscopy, in geese.

    PubMed

    Bazar, George; Eles, Viktoria; Kovacs, Zoltan; Romvari, Robert; Szabo, Andras

    2016-08-01

    This study provides accurate near infrared (NIR) spectroscopic models on some laboratory determined clinicochemical parameters (i.e. total lipid (5.57±1.95 g/l), triglyceride (2.59±1.36 mmol/l), total cholesterol (3.81±0.68 mmol/l), high density lipoprotein (HDL) cholesterol (2.45±0.58 mmol/l)) of blood serum samples of fattened geese. To increase the performance of multivariate chemometrics, samples significantly deviating from the regression models implying laboratory error were excluded from the final calibration datasets. Reference data of excluded samples having outlier spectra in principal component analysis were not marked as false. Samples deviating from the regression models but having non-outlier spectra in PCA were identified as having false reference constituent values. Based on the NIR selection methods, 5% of the reference measurement data were rated as doubtful. The achieved models reached R² of 0.864, 0.966, 0.850, 0.793, and RMSE of 0.639 g/l, 0.232 mmol/l, 0.210 mmol/l, 0.241 mmol/l for total lipid, triglyceride, total cholesterol and HDL cholesterol, respectively, during independent validation. Classical analytical techniques focus on single constituents and often require chemicals, time-consuming measurements, and experienced technicians. NIR technique provides a quick, cost effective, non-hazardous alternative method for analysis of several constituents based on one single spectrum of each sample, and it also offers the possibility for looking at the laboratory reference data critically. Evaluation of reference data to identify and exclude falsely analyzed samples can provide warning feedback to the reference laboratory, especially in the case of analyses where laboratory methods are not perfectly suited to the subjected material and there is an increased chance of laboratory error. Copyright © 2016 Elsevier B.V. All rights reserved.

  8. Hot spots of multivariate extreme anomalies in Earth observations

    NASA Astrophysics Data System (ADS)

    Flach, M.; Sippel, S.; Bodesheim, P.; Brenning, A.; Denzler, J.; Gans, F.; Guanche, Y.; Reichstein, M.; Rodner, E.; Mahecha, M. D.

    2016-12-01

    Anomalies in Earth observations might indicate data quality issues, extremes or the change of underlying processes within a highly multivariate system. Thus, considering the multivariate constellation of variables for extreme detection yields crucial additional information over conventional univariate approaches. We highlight areas in which multivariate extreme anomalies are more likely to occur, i.e. hot spots of extremes in global atmospheric Earth observations that impact the Biosphere. In addition, we present the year of the most unusual multivariate extreme between 2001 and 2013 and show that these years coincide with well-known high-impact extremes. Technically speaking, we account for multivariate extremes by using an ensemble of three sophisticated algorithms adapted from computer science applications: the k-nearest neighbours mean distance, a kernel density estimation and an approach based on recurrences. However, the impact of atmosphere extremes on the Biosphere might largely depend on what is considered to be normal, i.e. the shape of the mean seasonal cycle and its inter-annual variability. We identify regions with similar mean seasonality by means of dimensionality reduction in order to estimate in each region both the 'normal' variance and robust thresholds for detecting the extremes. In addition, we account for challenges like heteroscedasticity in Northern latitudes. Apart from hot spot areas, anomalies in the atmosphere time series that can only be detected by a multivariate approach, and not by a simple univariate approach, are of particular interest. Such an anomalous constellation of atmosphere variables is of interest if it impacts the Biosphere. The multivariate constellation of such an anomalous part of a time series is shown in one case study indicating that multivariate anomaly detection can provide novel insights into Earth observations.
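
    One of the three detectors named in this abstract, the k-nearest neighbours mean distance, can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name, the choice k = 3, the Euclidean metric and the toy observations are assumptions.

```python
import math

def knn_mean_distance(points, k=3):
    """Anomaly score per point: mean Euclidean distance to its k nearest neighbours."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores

# Hypothetical two-variable observations: a tight cluster and one unusual constellation.
observations = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (-0.1, 0.1), (0.0, -0.2), (5.0, 5.0)]
scores = knn_mean_distance(observations)
most_anomalous = scores.index(max(scores))
print(most_anomalous, scores[most_anomalous])
```

    A larger score means the observation sits far from its neighbours in the joint variable space, which is what lets this kind of score flag constellations that look unremarkable in each variable separately.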

  9. Outlier identification in colorectal surgery should separate elective and nonelective service components.

    PubMed

    Byrne, Ben E; Mamidanna, Ravikrishna; Vincent, Charles A; Faiz, Omar D

    2014-09-01

    The identification of health care institutions with outlying outcomes is of great importance for reporting health care results and for quality improvement. Historically, elective surgical outcomes have received greater attention than nonelective results, although some studies have examined both. Differences in outlier identification between these patient groups have not been adequately explored. The aim of this study was to compare the identification of institutional outliers for mortality after elective and nonelective colorectal resection in England. This was a cohort study using routine administrative data. Ninety-day mortality was determined by using statutory records of death. Adjusted Trust-level mortality rates were calculated by using multiple logistic regression. High and low mortality outliers were identified and compared across funnel plots for elective and nonelective surgery. All English National Health Service Trusts providing colorectal surgery to an unrestricted patient population were studied. Adults admitted for colorectal surgery between April 2006 and March 2012 were included. Segmental colonic or rectal resection was performed. The primary outcome measured was 90-day mortality. Included were 195,118 patients, treated at 147 Trusts. Ninety-day mortality rates after elective and nonelective surgery were 4% and 18%. No unit with high outlying mortality for elective surgery was a high outlier for nonelective mortality and vice versa. Trust-level observed-to-expected mortality for elective and nonelective surgery was moderately correlated (Spearman ρ = 0.50, p < 0.001). This study relied on administrative data and may be limited by potential flaws in the quality of coding of clinical information. Status as an institutional mortality outlier after elective and nonelective colorectal surgery was not closely related. Therefore, mortality rates should be reported for both patient cohorts separately. This would provide a broad picture of the state of colorectal services and help direct research and quality improvement activities.

  10. A Student’s t Mixture Probability Hypothesis Density Filter for Multi-Target Tracking with Outliers

    PubMed Central

    Liu, Zhuowei; Chen, Shuxin; Wu, Hao; He, Renke; Hao, Lin

    2018-01-01

    In multi-target tracking, outlier-corrupted process and measurement noises can severely reduce the performance of the probability hypothesis density (PHD) filter. To solve this problem, this paper proposes a novel PHD filter, called the Student’s t mixture PHD (STM-PHD) filter. The proposed filter models the heavy-tailed process noise and measurement noise as Student’s t distributions and approximates the multi-target intensity as a mixture of Student’s t components to be propagated in time. A closed PHD recursion is then obtained based on the Student’s t approximation. Our approach makes full use of the heavy-tailed characteristic of the Student’s t distribution to handle situations with heavy-tailed process and measurement noises. The simulation results verify that the proposed filter can overcome the negative effect of outliers and maintain good tracking accuracy in the simultaneous presence of process and measurement outliers. PMID:29617348

  11. An Application of Semi-parametric Estimator with Weighted Matrix of Data Depth in Variance Component Estimation

    NASA Astrophysics Data System (ADS)

    Pan, X. G.; Wang, J. Q.; Zhou, H. Y.

    2013-05-01

    Variance component estimation (VCE) based on a semi-parametric estimator with a data-depth weight matrix is proposed, because coupled system model errors and gross errors exist in the multi-source heterogeneous measurement data of space- and ground-combined TT&C (Telemetry, Tracking and Command) technology. The uncertain model error is estimated with the semi-parametric estimator model, and outliers are restrained with the data-depth weight matrix. With model error and outliers thus restricted, the VCE is improved and can be used to estimate the weight matrix for observation data with uncertain model error or outliers. A simulation experiment was carried out under space- and ground-combined TT&C conditions. The results show that the new VCE based on model error compensation can determine rational weights for the multi-source heterogeneous data and restrain outlying data.

  12. Robust nonlinear system identification: Bayesian mixture of experts using the t-distribution

    NASA Astrophysics Data System (ADS)

    Baldacchino, Tara; Worden, Keith; Rowson, Jennifer

    2017-02-01

    A novel variational Bayesian mixture of experts model for robust regression of bifurcating and piece-wise continuous processes is introduced. The mixture of experts model is a powerful model which probabilistically splits the input space allowing different models to operate in the separate regions. However, current methods have no fail-safe against outliers. In this paper, a robust mixture of experts model is proposed which consists of Student-t mixture models at the gates and Student-t distributed experts, trained via Bayesian inference. The Student-t distribution has heavier tails than the Gaussian distribution, and so it is more robust to outliers, noise and non-normality in the data. Using both simulated data and real data obtained from the Z24 bridge this robust mixture of experts performs better than its Gaussian counterpart when outliers are present. In particular, it provides robustness to outliers in two forms: unbiased parameter regression models, and robustness to overfitting/complex models.
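
    The claim that heavier tails give robustness can be made concrete with a one-dimensional location estimate. The sketch below is not the variational mixture-of-experts model of the abstract; it only uses the standard EM weight for a univariate Student-t, (ν + 1)/(ν + r²), with the degrees of freedom and scale held fixed as simplifying assumptions. The function name and data are hypothetical.

```python
from statistics import mean, median

def student_t_location(values, nu=3.0, scale=1.0, n_iter=50):
    """Iteratively reweighted estimate of the centre of a Student-t model.

    nu (degrees of freedom) and scale are held fixed for simplicity.
    """
    mu = median(values)  # robust starting point
    for _ in range(n_iter):
        # Points with large standardised residuals receive small weights.
        weights = [(nu + 1.0) / (nu + ((x - mu) / scale) ** 2) for x in values]
        mu = sum(w * x for w, x in zip(weights, values)) / sum(weights)
    return mu

# Hypothetical readings: five values near 10 and one gross outlier.
readings = [9.8, 10.1, 10.0, 9.9, 10.2, 50.0]
gaussian_centre = mean(readings)           # maximum-likelihood centre under a Gaussian
robust_centre = student_t_location(readings)
print(gaussian_centre, robust_centre)
```

    Under a Gaussian model every point has equal weight, so the single outlier drags the centre well away from the bulk of the data; under the Student-t weights its influence is almost removed.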

  13. PHOTOMETRIC REDSHIFTS IN THE HAWAII-HUBBLE DEEP FIELD-NORTH (H-HDF-N)

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, G.; Xue, Y. Q.; Kong, X.

    2015-01-01

    We derive photometric redshifts (z_phot) for sources in the entire (∼0.4 deg²) Hawaii-Hubble Deep Field-North (H-HDF-N) field with the EAzY code, based on point-spread-function-matched photometry of 15 broad bands from the ultraviolet (U band) to mid-infrared (IRAC 4.5 μm). Our catalog consists of a total of 131,678 sources. We evaluate the z_phot quality by comparing z_phot with spectroscopic redshifts (z_spec) when available, and find a value of normalized median absolute deviation σ_NMAD = 0.029 and an outlier fraction of 5.5% (outliers are defined as sources having |z_phot – z_spec|/(1 + z_spec) > 0.15) for non-X-ray sources. More specifically, we obtain σ_NMAD = 0.024 with 2.7% outliers for sources brighter than R = 23 mag, σ_NMAD = 0.035 with 7.4% outliers for sources fainter than R = 23 mag, σ_NMAD = 0.026 with 3.9% outliers for sources having z < 1, and σ_NMAD = 0.034 with 9.0% outliers for sources having z > 1. Our z_phot quality shows an overall improvement over an earlier z_phot work that focused only on the central H-HDF-N area. We also classify each object as a star or galaxy through template spectral energy distribution fitting and complementary morphological parameterization, resulting in 4959 stars and 126,719 galaxies. Furthermore, we match our catalog with the 2 Ms Chandra Deep Field-North main X-ray catalog. For the 462 matched non-stellar X-ray sources (281 having z_spec), we improve their z_phot quality by adding three additional active galactic nucleus templates, achieving σ_NMAD = 0.035 and an outlier fraction of 12.5%. We make our catalog publicly available presenting both photometry and z_phot, and provide guidance on how to make use of our catalog.
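
    The two quality metrics quoted in this abstract can be computed as follows. The outlier criterion is taken directly from the abstract; the σ_NMAD formula, 1.48 × median(|Δz − median(Δz)| / (1 + z_spec)), is the definition commonly used in the photometric-redshift literature and is assumed here, since the abstract does not spell it out. The function name and the four redshift pairs are hypothetical.

```python
from statistics import median

def photoz_quality(z_phot, z_spec, threshold=0.15):
    """Return (sigma_nmad, outlier_fraction) for matched redshift lists."""
    dz = [p - s for p, s in zip(z_phot, z_spec)]
    dz_median = median(dz)
    # Assumed definition: normalized median absolute deviation of dz / (1 + z_spec).
    sigma_nmad = 1.48 * median(
        abs(d - dz_median) / (1.0 + s) for d, s in zip(dz, z_spec)
    )
    # Criterion from the abstract: |z_phot - z_spec| / (1 + z_spec) > 0.15.
    n_outliers = sum(abs(d) / (1.0 + s) > threshold for d, s in zip(dz, z_spec))
    return sigma_nmad, n_outliers / len(dz)

# Hypothetical matched sample: three good estimates and one catastrophic failure.
z_spec = [0.5, 1.0, 1.5, 2.0]
z_phot = [0.52, 0.98, 1.5, 3.0]
sigma_nmad, outlier_fraction = photoz_quality(z_phot, z_spec)
print(sigma_nmad, outlier_fraction)
```

    Because σ_NMAD is built from medians, the single catastrophic failure barely changes it, which is why the scatter and the outlier fraction are reported as separate numbers.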

  14. ROKU: a novel method for identification of tissue-specific genes.

    PubMed

    Kadota, Koji; Ye, Jiazhen; Nakai, Yuji; Terada, Tohru; Shimizu, Kentaro

    2006-06-12

    One of the important goals of microarray research is the identification of genes whose expression is considerably higher or lower in some tissues than in others. We would like to have ways of identifying such tissue-specific genes. We describe a method, ROKU, which selects tissue-specific patterns from gene expression data for many tissues and thousands of genes. ROKU ranks genes according to their overall tissue specificity using Shannon entropy and detects tissues specific to each gene, if any exist, using an outlier detection method. We evaluated the capacity for the detection of various specific expression patterns using synthetic and real data. We observed that ROKU was superior to a conventional entropy-based method in its ability to rank genes according to overall tissue specificity and to detect genes whose expression patterns are specific only to objective tissues. ROKU is useful for the detection of various tissue-specific expression patterns. The framework is also directly applicable to the selection of diagnostic markers for molecular classification of multiple classes.
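
    The entropy ranking idea can be sketched as follows. This shows only the plain Shannon entropy of a gene's relative expression across tissues; any preprocessing ROKU applies before computing entropy, and its outlier detection step, are not reproduced. Gene names and expression values are hypothetical.

```python
import math

def expression_entropy(levels):
    """Shannon entropy (bits) of a gene's relative expression across tissues."""
    total = sum(levels)
    probs = [x / total for x in levels if x > 0]  # zero terms contribute nothing
    return -sum(p * math.log2(p) for p in probs)

# Hypothetical expression levels in four tissues.
genes = {
    "housekeeping": [10.0, 10.0, 10.0, 10.0],  # uniform expression
    "specific": [0.0, 0.0, 10.0, 0.0],         # expressed in one tissue only
    "intermediate": [1.0, 1.0, 8.0, 1.0],
}
entropy = {name: expression_entropy(levels) for name, levels in genes.items()}
# Low entropy first: the most tissue-specific genes rank at the top.
ranking = sorted(entropy, key=entropy.get)
print(ranking, entropy)
```

    Entropy is maximal (log2 of the number of tissues) for uniform expression and zero when a gene is expressed in a single tissue, so sorting by it orders genes from most to least tissue-specific.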

  15. Comparison of univariate and multivariate calibration for the determination of micronutrients in pellets of plant materials by laser induced breakdown spectrometry

    NASA Astrophysics Data System (ADS)

    Braga, Jez Willian Batista; Trevizan, Lilian Cristina; Nunes, Lidiane Cristina; Rufini, Iolanda Aparecida; Santos, Dário, Jr.; Krug, Francisco José

    2010-01-01

    The application of laser induced breakdown spectrometry (LIBS) to the direct analysis of plant materials is a great challenge that still requires effort in development and validation. To this end, a series of experimental approaches has been carried out to show that LIBS can be used as an alternative to methods based on wet acid digestion for the analysis of agricultural and environmental samples. The large amount of information provided by LIBS spectra for these complex samples increases the difficulty of selecting the most appropriate wavelengths for each analyte. Some applications have suggested that improvements in both accuracy and precision can be achieved by applying multivariate calibration to LIBS data, compared with univariate regression developed with line emission intensities. In the present work, the performance of univariate and multivariate calibration, based on partial least squares regression (PLSR), was compared for analysis of pellets of plant materials made from an appropriate mixture of cryogenically ground samples with cellulose as the binding agent. The development of a specific PLSR model for each analyte and the selection of spectral regions containing only lines of the analyte of interest were the best conditions for the analysis. In this particular application, these models showed a similar performance, but PLSR seemed to be more robust due to a lower occurrence of outliers in comparison to the univariate method. The data suggest that further work on sample presentation and fitness of standards for LIBS analysis is needed to fulfill the boundary conditions for matrix-independent development and validation.

  16. Outlier Detection for Sensor Systems (ODSS): A MATLAB Macro for Evaluating Microphone Sensor Data Quality.

    PubMed

    Vasta, Robert; Crandell, Ian; Millican, Anthony; House, Leanna; Smith, Eric

    2017-10-13

    Microphone sensor systems provide information that may be used for a variety of applications. Such systems generate large amounts of data. One concern is with microphone failure and unusual values that may be generated as part of the information collection process. This paper describes methods and a MATLAB graphical interface that provides rapid evaluation of microphone performance and identifies irregularities. The approach and interface are described. An application to a microphone array used in a wind tunnel is used to illustrate the methodology.

  17. First order augmentation to tensor voting for boundary inference and multiscale analysis in 3D.

    PubMed

    Tong, Wai-Shun; Tang, Chi-Keung; Mordohai, Philippos; Medioni, Gérard

    2004-05-01

    Most computer vision applications require the reliable detection of boundaries. In the presence of outliers, missing data, orientation discontinuities, and occlusion, this problem is particularly challenging. We propose to address it by complementing the tensor voting framework, which was limited to second order properties, with first order representation and voting. First order voting fields and a mechanism to vote for 3D surface and volume boundaries and curve endpoints in 3D are defined. Boundary inference is also useful for a second difficult problem in grouping, namely, automatic scale selection. We propose an algorithm that automatically infers the smallest scale that can preserve the finest details. Our algorithm then proceeds with progressively larger scales to ensure continuity where it has not been achieved. Therefore, the proposed approach does not oversmooth features or delay the handling of boundaries and discontinuities until model misfit occurs. The interaction of smooth features, boundaries, and outliers is accommodated by the unified representation, making possible the perceptual organization of data in curves, surfaces, volumes, and their boundaries simultaneously. We present results on a variety of data sets to show the efficacy of the improved formalism.

  18. Investigating the Consistency of Stellar Evolution Models with Globular Cluster Observations via the Red Giant Branch Bump

    NASA Astrophysics Data System (ADS)

    Joyce, Meridith; Chaboyer, Brian

    2016-01-01

    Synthetic Red Giant Branch Bump (RGBB) magnitudes are generated with the most recent theoretical stellar evolution models computed with the Dartmouth Stellar Evolution Program (DSEP) code. They are compared to the observational work of Nataf et al. (2013), who present RGBB magnitudes for 72 globular clusters. A DSEP model using a chemical composition with enhanced α capture [α/Fe] =+0.4 and an age of 13 Gyr shows agreement with observations over metallicities ranging from [Fe/H] = 0 to [Fe/H] ≈-1.5, with discrepancy emerging at lower metallicities. A model-independent, density-based outlier detection routine known as the Local Outlying Factor (LOF) algorithm is applied to the observations in order to identify clusters that deviate most in magnitude-metallicity space from the bulk of the observations. Our model's fit is scrutinized with a series of χ^2 routines performed on subsets of the data from which highly anomalous clusters have been selectively removed based on LOF identification. In particular, NGCs 6254, 6681, 6218, and 1904 are tagged recurrently as outliers. The effects of systematic and non-systematic error in metallicity are assessed, and the robustness of observational error bars is investigated.
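
    The density-based LOF score used in this abstract can be sketched in plain Python. This is a minimal version of the usual definition (k-distance, reachability distance, local reachability density), with ties at the k-th neighbour ignored and no handling of duplicate points; the function name, k = 3 and the toy coordinates are assumptions and do not represent the cluster data of the study.

```python
import math

def local_outlier_factor(points, k=3):
    """LOF score per point; values well above 1 indicate a locally sparse point."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])               # indices of the k nearest neighbours
        k_dist.append(dist[i][order[k - 1]])      # distance to the k-th neighbour
    # Local reachability density: inverse of the mean reachability distance.
    lrd = []
    for i in range(n):
        reach = [max(k_dist[o], dist[i][o]) for o in neighbors[i]]
        lrd.append(len(reach) / sum(reach))
    # LOF: mean density of the neighbours relative to the point's own density.
    return [sum(lrd[o] for o in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical 2-D points: a compact group and one isolated point.
sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (4.0, 4.0)]
lof = local_outlier_factor(sample)
flagged = lof.index(max(lof))
print(flagged, lof)
```

    Because the score compares each point's density with that of its own neighbours, it needs no model of the underlying relation, which matches the "model-independent" description in the abstract.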

  19. Medicare Program; Explanation of FY 2004 Outlier Fixed-Loss Threshold as Required by Court Rulings. Clarification.

    PubMed

    2016-01-22

    In accordance with court rulings in cases that challenge the federal fiscal year (FY) 2004 outlier fixed-loss threshold rulemaking, this document provides further explanation of certain methodological choices made in the FY 2004 fixed-loss threshold determination.

  20. Analysis of Anterior Cervical Discectomy and Fusion Healthcare Costs via the Value-Driven Outcomes Tool.

    PubMed

    Reese, Jared C; Karsy, Michael; Twitchell, Spencer; Bisson, Erica F

    2018-04-11

    Examining the costs of single- and multilevel anterior cervical discectomy and fusion (ACDF) is important for the identification of cost drivers and potentially reducing patient costs. A novel tool at our institution provides direct costs for the identification of potential drivers. The objective was to assess perioperative healthcare costs for patients undergoing an ACDF. Patients who underwent an elective ACDF between July 2011 and January 2017 were identified retrospectively. Factors adding to total cost were placed into subcategories to identify the most significant contributors, and potential drivers of total cost were evaluated using a multivariable linear regression model. A total of 465 patients (mean age 53 ± 12 yr, 54% male) met the inclusion criteria for this study. The distribution of total cost was broken down into supplies/implants (39%), facility utilization (37%), physician fees (14%), pharmacy (7%), imaging (2%), and laboratory studies (1%). A multivariable linear regression analysis showed that total cost was significantly affected by the number of levels operated on, operating room time, and length of stay. Costs also showed a narrow distribution with few outliers and did not vary significantly over time. These results suggest that facility utilization and supplies/implants are the predominant cost contributors, accounting for 76% of the total cost of ACDF procedures. Efforts at lowering costs within these categories should make the most impact on providing more cost-effective care.

  1. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bevelhimer, Mark S.; Adams, Marshall; Fortner, Allison M.

    The effect of coal ash exposure on fish health in freshwater communities is largely unknown. Given the large number of possible pathways of effects (e.g., toxicological effect of exposure to multiple metals, physical effects from ash exposure, and food web effects), measurement of only a few health metrics is not likely to give a complete picture. The authors measured a suite of 20 health metrics from 1100+ fish collected from 5 sites (3 affected and 2 reference) near a coal ash spill in east Tennessee over a 4.5-yr period. The metrics represented a wide range of physiological and energetic responses and were evaluated simultaneously using 2 multivariate techniques. Results from both hierarchical clustering and canonical discriminant analyses suggested that for most species × season combinations, the suite of fish health indicators varied more among years than between spill and reference sites within a year. In a few cases, spill sites from early years in the investigation stood alone or clustered together separate from reference sites and later year spill sites. Outlier groups of fish with relatively unique health profiles were most often from spill sites, suggesting that some response to the ash exposure may have occurred. Results from the 2 multivariate methods suggest that any change in the health status of fish at the spill sites was small and appears to have diminished since the first 2 to 3 yr after the spill.

  2. Inclusion Detection in Aluminum Alloys Via Laser-Induced Breakdown Spectroscopy

    NASA Astrophysics Data System (ADS)

    Hudson, Shaymus W.; Craparo, Joseph; De Saro, Robert; Apelian, Diran

    2018-04-01

    Laser-induced breakdown spectroscopy (LIBS) has shown promise as a technique to quickly determine molten metal chemistry in real time. Because of its characteristics, LIBS could also be used as a technique to sense for unwanted inclusions and impurities. Simulated Al2O3 inclusions were added to molten aluminum via a metal-matrix composite. LIBS was performed in situ to determine whether particles could be detected. Outlier analysis on oxygen signal was performed on LIBS data and compared to oxide volume fraction measured through metallography. It was determined that LIBS could differentiate between melts with different amounts of inclusions by monitoring the fluctuations in signal for elements of interest. LIBS shows promise as an enabling tool for monitoring metal cleanliness.

  3. 32 CFR 199.15 - Quality and utilization review peer review organization program.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... different geographical locations or for different types of providers. (B) For healthcare services provided... TRICARE/CHAMPUS Policy Manual, or until the approved transplant occurs. (D) For healthcare services.... (3) Outlier review. Claims which qualify for additional payment as a long-stay outlier or as a cost...

  4. 32 CFR 199.15 - Quality and utilization review peer review organization program.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    ... different geographical locations or for different types of providers. (B) For healthcare services provided... TRICARE/CHAMPUS Policy Manual, or until the approved transplant occurs. (D) For healthcare services.... (3) Outlier review. Claims which qualify for additional payment as a long-stay outlier or as a cost...

  5. Robust Statistics: What They Are, and Why They Are So Important

    ERIC Educational Resources Information Center

    Corlu, Sencer M.

    2009-01-01

    The problem with "classical" statistics all invoking the mean is that these estimates are notoriously influenced by atypical scores (outliers), partly because the mean itself is differentially influenced by outliers. In theory, "modern" statistics may generate more replicable characterizations of data, because at least in some…

  6. Outliers: Elementary Teachers Who Actually Teach Social Studies

    ERIC Educational Resources Information Center

    Anderson, Derek

    2014-01-01

    This mixed methods study identified six elementary teachers, who, despite the widespread marginalization of elementary social studies, spent considerable time on the subject. These six outliers from a sample of forty-six Michigan elementary teachers were interviewed, and their teaching was observed to better understand how and why they deviate…

  7. Outliers, Cheese, and Rhizomes: Variations on a Theme of Limitation

    ERIC Educational Resources Information Center

    Stone, Lynda

    2011-01-01

    All research has limitations, for example, from paradigm, concept, theory, tradition, and discipline. In this article Lynda Stone describes three exemplars that are variations on limitation and are "extraordinary" in that they change what constitutes future research in each domain. Malcolm Gladwell's present day study of outliers makes a…

  8. The Utility of Robust Means in Statistics

    ERIC Educational Resources Information Center

    Goodwyn, Fara

    2012-01-01

    Location estimates calculated from heuristic data were examined using traditional and robust statistical methods. The current paper demonstrates the impact outliers have on the sample mean and proposes robust methods to control for outliers in sample data. Traditional methods fail because they rely on the statistical assumptions of normality and…
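
    The contrast this abstract draws between the sample mean and robust location estimates can be illustrated with a minimal sketch. The scores, the 20% trimming proportion, and the helper name `trimmed_mean` are assumptions chosen for illustration; they are not taken from the paper.

```python
from statistics import mean, median

def trimmed_mean(values, proportion=0.2):
    """Mean after dropping `proportion` of the observations from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * proportion)  # number of values cut from each tail
    kept = ordered[k:len(ordered) - k] if k else ordered
    return mean(kept)

# Hypothetical scores with a single atypical value (95).
scores = [10, 11, 12, 12, 13, 13, 14, 14, 15, 95]

sample_mean = mean(scores)               # pulled upward by the single outlier
robust_mean = trimmed_mean(scores, 0.2)  # ignores the two smallest and two largest
sample_median = median(scores)

print(sample_mean, robust_mean, sample_median)
```

    In this made-up sample the single extreme score moves the sample mean well above every typical score, while the trimmed mean and the median stay in the middle of the bulk of the data.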

  9. Evaluation of a Multivariate Syndromic Surveillance System for West Nile Virus.

    PubMed

    Faverjon, Céline; Andersson, M Gunnar; Decors, Anouk; Tapprest, Jackie; Tritz, Pierre; Sandoz, Alain; Kutasi, Orsolya; Sala, Carole; Leblond, Agnès

    2016-06-01

    Various methods are currently used for the early detection of West Nile virus (WNV) but their outputs are not quantitative and/or do not take into account all available information. Our study aimed to test a multivariate syndromic surveillance system to evaluate if the sensitivity and the specificity of detection of WNV could be improved. Weekly time series data on nervous syndromes in horses and mortality in both horses and wild birds were used. Baselines were fitted to the three time series and used to simulate 100 years of surveillance data. WNV outbreaks were simulated and inserted into the baselines based on historical data and expert opinion. Univariate and multivariate syndromic surveillance systems were tested to gauge how well they detected the outbreaks; detection was based on an empirical Bayesian approach. The systems' performances were compared using measures of sensitivity, specificity, and area under receiver operating characteristic curve (AUC). When data sources were considered separately (i.e., univariate systems), the best detection performance was obtained using the data set of nervous symptoms in horses compared to those of bird and horse mortality (AUCs equal to 0.80, 0.75, and 0.50, respectively). A multivariate outbreak detection system that used nervous symptoms in horses and bird mortality generated the best performance (AUC = 0.87). The proposed approach is suitable for performing multivariate syndromic surveillance of WNV outbreaks. This is particularly relevant, given that a multivariate surveillance system performed better than a univariate approach. Such a surveillance system could be especially useful in serving as an alert for the possibility of human viral infections. This approach can be also used for other diseases for which multiple sources of evidence are available.

  10. Detection of Anomalies in Hydrometric Data Using Artificial Intelligence Techniques

    NASA Astrophysics Data System (ADS)

    Lauzon, N.; Lence, B. J.

    2002-12-01

    This work focuses on the detection of anomalies in hydrometric data sequences, such as 1) outliers, which are individual data having statistical properties that differ from those of the overall population; 2) shifts, which are sudden changes over time in the statistical properties of the historical records of data; and 3) trends, which are systematic changes over time in the statistical properties. For the purpose of the design and management of water resources systems, it is important to be aware of these anomalies in hydrometric data, for they can induce a bias in the estimation of water quantity and quality parameters. These anomalies may be viewed as specific patterns affecting the data, and therefore pattern recognition techniques can be used for identifying them. However, the number of possible patterns is very large for each type of anomaly and consequently large computing capacities are required to account for all possibilities using the standard statistical techniques, such as cluster analysis. Artificial intelligence techniques, such as the Kohonen neural network and fuzzy c-means, are clustering techniques commonly used for pattern recognition in several areas of engineering and have recently begun to be used for the analysis of natural systems. They require much less computing capacity than the standard statistical techniques, and therefore are well suited for the identification of outliers, shifts and trends in hydrometric data. This work constitutes a preliminary study, using synthetic data representing hydrometric data that can be found in Canada. The analysis of the results obtained shows that the Kohonen neural network and fuzzy c-means are reasonably successful in identifying anomalies. This work also addresses the problem of uncertainties inherent to the calibration procedures that fit the clusters to the possible patterns for both the Kohonen neural network and fuzzy c-means. 
    Indeed, for the same database, different sets of clusters can be established with these calibration procedures. A simple method for analyzing uncertainties associated with the Kohonen neural network and fuzzy c-means is developed here. The method combines the results from several sets of clusters, either from the Kohonen neural network or fuzzy c-means, so as to provide an overall diagnosis as to the identification of outliers, shifts and trends. The results indicate an improvement in the performance for identifying anomalies when the method of combining cluster sets is used, compared with when only one cluster set is used.

  11. Genome-Wide SNP Discovery, Genotyping and Their Preliminary Applications for Population Genetic Inference in Spotted Sea Bass (Lateolabrax maculatus)

    PubMed Central

    Wang, Juan; Xue, Dong-Xiu; Zhang, Bai-Dong; Li, Yu-Long; Liu, Bing-Jian; Liu, Jin-Xian

    2016-01-01

    Next-generation sequencing and the collection of genome-wide single-nucleotide polymorphisms (SNPs) allow identifying fine-scale population genetic structure and genomic regions under selection. The spotted sea bass (Lateolabrax maculatus) is a non-model species of ecological and commercial importance and widely distributed in northwestern Pacific. A total of 22 648 SNPs was discovered across the genome of L. maculatus by paired-end sequencing of restriction-site associated DNA (RAD-PE) for 30 individuals from two populations. The nucleotide diversity (π) for each population was 0.0028±0.0001 in Dandong and 0.0018±0.0001 in Beihai, respectively. Shallow but significant genetic differentiation was detected between the two populations analyzed by using both the whole data set (FST = 0.0550, P < 0.001) and the putatively neutral SNPs (FST = 0.0347, P < 0.001). However, the two populations were highly differentiated based on the putatively adaptive SNPs (FST = 0.6929, P < 0.001). Moreover, a total of 356 SNPs representing 298 unique loci were detected as outliers putatively under divergent selection by FST-based outlier tests as implemented in BAYESCAN and LOSITAN. Functional annotation of the contigs containing putatively adaptive SNPs yielded hits for 22 of 55 (40%) significant BLASTX matches. Candidate genes for local selection constituted a wide array of functions, including binding, catalytic and metabolic activities, etc. The analyses with the SNPs developed in the present study highlighted the importance of genome-wide genetic variation for inference of population structure and local adaptation in L. maculatus. PMID:27336696

  12. Genome-wide evidence for divergent selection between populations of a major agricultural pathogen.

    PubMed

    Hartmann, Fanny E; McDonald, Bruce A; Croll, Daniel

    2018-06-01

    The genetic and environmental homogeneity in agricultural ecosystems is thought to impose strong and uniform selection pressures. However, the impact of this selection on plant pathogen genomes remains largely unknown. We aimed to identify the proportion of the genome and the specific gene functions under positive selection in populations of the fungal wheat pathogen Zymoseptoria tritici. First, we performed genome scans in four field populations that were sampled from different continents and on distinct wheat cultivars to test which genomic regions are under recent selection. Based on extended haplotype homozygosity and composite likelihood ratio tests, we identified 384 and 81 selective sweeps affecting 4% and 0.5% of the 35 Mb core genome, respectively. We found differences both in the number and the position of selective sweeps across the genome between populations. Using a XtX-based outlier detection approach, we identified 51 extremely divergent genomic regions between the allopatric populations, suggesting that divergent selection led to locally adapted pathogen populations. We performed an outlier detection analysis between two sympatric populations infecting two different wheat cultivars to identify evidence for host-driven selection. Selective sweep regions harboured genes that are likely to play a role in successfully establishing host infections. We also identified secondary metabolite gene clusters and an enrichment in genes encoding transporter and protein localization functions. The latter gene functions mediate responses to environmental stress, including interactions with the host. The distinct gene functions under selection indicate that both local host genotypes and abiotic factors contributed to local adaptation. © 2018 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.

  13. Genome-Wide SNP Discovery, Genotyping and Their Preliminary Applications for Population Genetic Inference in Spotted Sea Bass (Lateolabrax maculatus).

    PubMed

    Wang, Juan; Xue, Dong-Xiu; Zhang, Bai-Dong; Li, Yu-Long; Liu, Bing-Jian; Liu, Jin-Xian

    2016-01-01

    Next-generation sequencing and the collection of genome-wide single-nucleotide polymorphisms (SNPs) allow identifying fine-scale population genetic structure and genomic regions under selection. The spotted sea bass (Lateolabrax maculatus) is a non-model species of ecological and commercial importance and widely distributed in northwestern Pacific. A total of 22 648 SNPs was discovered across the genome of L. maculatus by paired-end sequencing of restriction-site associated DNA (RAD-PE) for 30 individuals from two populations. The nucleotide diversity (π) for each population was 0.0028±0.0001 in Dandong and 0.0018±0.0001 in Beihai, respectively. Shallow but significant genetic differentiation was detected between the two populations analyzed by using both the whole data set (FST = 0.0550, P < 0.001) and the putatively neutral SNPs (FST = 0.0347, P < 0.001). However, the two populations were highly differentiated based on the putatively adaptive SNPs (FST = 0.6929, P < 0.001). Moreover, a total of 356 SNPs representing 298 unique loci were detected as outliers putatively under divergent selection by FST-based outlier tests as implemented in BAYESCAN and LOSITAN. Functional annotation of the contigs containing putatively adaptive SNPs yielded hits for 22 of 55 (40%) significant BLASTX matches. Candidate genes for local selection constituted a wide array of functions, including binding, catalytic and metabolic activities, etc. The analyses with the SNPs developed in the present study highlighted the importance of genome-wide genetic variation for inference of population structure and local adaptation in L. maculatus.

  14. The metal-poor stellar halo in RAVE-TGAS and its implications for the velocity distribution of dark matter

    NASA Astrophysics Data System (ADS)

    Herzog-Arbeitman, Jonah; Lisanti, Mariangela; Necib, Lina

    2018-04-01

    The local velocity distribution of dark matter plays an integral role in interpreting the results from direct detection experiments. We previously showed that metal-poor halo stars serve as excellent tracers of the virialized dark matter velocity distribution using a high-resolution hydrodynamic simulation of a Milky Way-like halo. In this paper, we take advantage of the first Gaia data release, coupled with spectroscopic measurements from the RAdial Velocity Experiment (RAVE), to study the kinematics of stars belonging to the metal-poor halo within an average distance of ~5 kpc of the Sun. We study stars with iron abundances [Fe/H] < ‑1.5 and ‑1.8 that are located more than 1.5 kpc from the Galactic plane. Using a Gaussian mixture model analysis, we identify the stars that belong to the halo population, as well as some kinematic outliers. We find that both metallicity samples have similar velocity distributions for the halo component, within uncertainties. Assuming that the stellar halo velocities adequately trace the virialized dark matter, we study the implications for direct detection experiments. The Standard Halo Model, which is typically assumed for dark matter, is discrepant with the empirical distribution by ~6σ, predicts fewer high-speed particles, and is anisotropic. As a result, the Standard Halo Model overpredicts the nuclear scattering rate for dark matter masses below ~10 GeV. The kinematic outliers that we identify may potentially be correlated with dark matter substructure, though further study is needed to establish this correspondence.

  15. 42 CFR 412.84 - Payment for extraordinarily high-cost cases (cost outliers).

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... Payments for Outlier Cases, Special Treatment Payment for New Technology, and Payment Adjustment for... circumstances: (i) New hospitals that have not yet submitted their first Medicare cost report. (For this purpose, a new hospital is defined as an entity that has not accepted assignment of an existing hospital's...

  16. Two Examples of Transformations When There Are Possible Outliers.

    DTIC Science & Technology

    1981-01-01

    potential outliers and, as in Carroll (1980), the influence function of A Is not bounded if k is monotone, A word of caution about "Hampel" is in order...normal linear model fits well. An acceptable analysis would thus estimate X as somewhere near I. As predicted by the influence function calculations in

  17. Variation in caesarean section rates in the US: outliers, damned outliers, and statistics.

    PubMed

    Smith, Gordon C S

    2014-10-01

    Gordon C. Smith discusses the study by Katy Kozhimannil and colleagues that examines variations in cesarean section rates in the US and argues for the need for high-quality routine data collection to better understand the reasons for these variations. Please see later in the article for the Editors' Summary.

  18. 40 CFR 86.1728-99 - Compliance with emission standards.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... part to test for irregular data from a durability-data set. If any data point is identified as a... apply both the outlier procedure and averaging to the same data set, the outlier procedure shall be... shall be determined from the exhaust emission results of the durability-data vehicle(s) for each engine...

  19. 40 CFR 86.1728-99 - Compliance with emission standards.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    ... part to test for irregular data from a durability-data set. If any data point is identified as a... apply both the outlier procedure and averaging to the same data set, the outlier procedure shall be... shall be determined from the exhaust emission results of the durability-data vehicle(s) for each engine...

  20. 40 CFR 86.1728-99 - Compliance with emission standards.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... part to test for irregular data from a durability-data set. If any data point is identified as a... apply both the outlier procedure and averaging to the same data set, the outlier procedure shall be... shall be determined from the exhaust emission results of the durability-data vehicle(s) for each engine...

  1. 42 CFR 412.84 - Payment for extraordinarily high-cost cases (cost outliers).

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... Certain Replaced Devices Payment for Outlier Cases § 412.84 Payment for extraordinarily high-cost cases... each hospital based on the latest available settled cost report for that hospital and charge data for..., whichever is from the latest cost reporting period. (3) For discharges occurring on or after August 8, 2003...

  2. Application of adaptive Kalman filter in vehicle laser Doppler velocimetry

    NASA Astrophysics Data System (ADS)

    Fan, Zhe; Sun, Qiao; Du, Lei; Bai, Jie; Liu, Jingyun

    2018-03-01

    Variation in road conditions and in the motor characteristics of the vehicle causes large root-mean-square (rms) errors and outliers. Applying a Kalman filter in laser Doppler velocimetry (LDV) is important for improving the accuracy of velocity measurement. In this paper, the state-space model is built using the current statistical model. A two-step strategy is adopted to make the filter adaptive and robust. First, the acceleration variance is adaptively adjusted using the difference between the predicted observation and the measured observation. Second, outliers are identified and the measurement noise variance is adjusted according to the orthogonal property of the innovation, to reduce the impact of outliers. Laboratory rotating-table experiments show that the adaptive Kalman filter greatly reduces the rms error, from 0.59 cm/s to 0.22 cm/s, and eliminates all the outliers. Road experiments compared with a microwave radar show that the rms error of the LDV is 0.0218 m/s, which indicates that adaptive Kalman filtering is suitable for vehicle speed signal processing.

  3. Stoicism, the physician, and care of medical outliers

    PubMed Central

    Papadimos, Thomas J

    2004-01-01

    Background: Medical outliers present a medical, psychological, social, and economic challenge to the physicians who care for them. The determinism of Stoic thought is explored as an intellectual basis for the pursuit of a correct mental attitude that will provide aid and comfort to physicians who care for medical outliers, thus fostering continued physician engagement in their care. Discussion: The Stoic topics of good, the preferable, the morally indifferent, living consistently, and appropriate actions are reviewed. Furthermore, Zeno's cardinal virtues of Justice, Temperance, Bravery, and Wisdom are addressed, as are the Stoic passions of fear, lust, mental pain, and mental pleasure. These concepts must be understood by physicians if they are to comprehend and accept the Stoic view as it relates to having the proper attitude when caring for those with long-term and/or costly illnesses. Summary: Practicing physicians, especially those that are hospital based, and most assuredly those practicing critical care medicine, will be emotionally challenged by the medical outlier. A Stoic approach to such a social and psychological burden may be of benefit. PMID:15588293

  4. The Effect of Ionospheric Models on Electromagnetic Pulse Locations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fenimore, Edward E.; Triplett, Laurie A.

    2014-07-01

    Locations of electromagnetic pulses (EMPs) determined by time-of-arrival (TOA) often have outliers with significantly larger errors than expected. In the past, these errors were thought to arise from high order terms in the Appleton-Hartree equation. We simulated 1000 events randomly spread around the Earth into a constellation of 22 GPS satellites. We used four different ionospheres: “simple” where the time delay goes as the inverse of the frequency-squared, “full Appleton-Hartree”, the “BobRD integrals” and a full raytracing code. The simple and full Appleton-Hartree ionospheres do not show outliers whereas the BobRD and raytracing do. This strongly suggests that the cause of the outliers is not additional terms in the Appleton-Hartree equation, but rather is due to the additional path length due to refraction. A method to fix the outliers is suggested based on fitting a time to the delays calculated at the 5 GPS frequencies with BobRD and simple ionospheres. The difference in time is used as a correction to the TOAs.

  5. Performance of digital RGB reflectance color extraction for plaque lesion

    NASA Astrophysics Data System (ADS)

    Hashim, Hadzli; Taib, Mohd Nasir; Jailani, Rozita; Sulaiman, Saadiah; Baba, Roshidah

    2005-01-01

    Several clinical psoriasis lesion groups have been studied for digital RGB color feature extraction. Previous works used sample sizes that included all the outliers lying beyond the standard deviation factors from the histogram peaks. This paper describes the statistical performance of the RGB model with and without removing these outliers. Plaque lesions are compared with other types of psoriasis. The statistical tests are compared with respect to three sample sizes: the original 90 samples, a first size reduction obtained by removing outliers beyond 2 standard deviations (2SD), and a second size reduction obtained by removing outliers beyond 1 standard deviation (1SD). Quantification of the image data through the normal/direct and differential forms of the conventional reflectance method is considered. Performance is assessed by observing the error plots with 95% confidence intervals and the findings of the inferential t-tests applied. The outcomes of the statistical tests show that the B component of the conventional differential method can be used to distinctively classify plaque from the other psoriasis groups, consistent with the error plot findings, with an improvement in p-value greater than 0.5.

  6. Robust Correlation Analyses: False Positive and Power Validation Using a New Open Source Matlab Toolbox

    PubMed Central

    Pernet, Cyril R.; Wilcox, Rand; Rousselet, Guillaume A.

    2012-01-01

    Pearson’s correlation measures the strength of the association between two variables. The technique is, however, restricted to linear associations and is overly sensitive to outliers. Indeed, a single outlier can result in a highly inaccurate summary of the data. Yet, it remains the most commonly used measure of association in psychology research. Here we describe a free Matlab(R) based toolbox (http://sourceforge.net/projects/robustcorrtool/) that computes robust measures of association between two or more random variables: the percentage-bend correlation and skipped-correlations. After illustrating how to use the toolbox, we show that robust methods, where outliers are down weighted or removed and accounted for in significance testing, provide better estimates of the true association with accurate false positive control and without loss of power. The different correlation methods were tested with normal data and normal data contaminated with marginal or bivariate outliers. We report estimates of effect size, false positive rate and power, and advise on which technique to use depending on the data at hand. PMID:23335907
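
    The claim that a single outlier can badly distort Pearson's correlation can be checked with a small sketch. The data below are made up for illustration, and the code implements only the ordinary Pearson coefficient; it does not reproduce the percentage-bend or skipped correlations provided by the toolbox.

```python
from math import sqrt

def pearson(x, y):
    """Ordinary Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical data: y is an exact linear function of x.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 4, 6, 8, 10, 12, 14, 16]
r_clean = pearson(x, y)

# Append one bivariate outlier far from the linear trend.
x_out = x + [9]
y_out = y + [-40]
r_contaminated = pearson(x_out, y_out)

print(r_clean, r_contaminated)
```

    With these values one added point is enough to turn a perfect positive correlation into a negative one, which is the kind of distortion that robust measures of association are designed to limit.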

  7. Robust correlation analyses: false positive and power validation using a new open source matlab toolbox.

    PubMed

    Pernet, Cyril R; Wilcox, Rand; Rousselet, Guillaume A

    2012-01-01

    Pearson's correlation measures the strength of the association between two variables. The technique is, however, restricted to linear associations and is overly sensitive to outliers. Indeed, a single outlier can result in a highly inaccurate summary of the data. Yet, it remains the most commonly used measure of association in psychology research. Here we describe a free Matlab(R) based toolbox (http://sourceforge.net/projects/robustcorrtool/) that computes robust measures of association between two or more random variables: the percentage-bend correlation and skipped-correlations. After illustrating how to use the toolbox, we show that robust methods, where outliers are down weighted or removed and accounted for in significance testing, provide better estimates of the true association with accurate false positive control and without loss of power. The different correlation methods were tested with normal data and normal data contaminated with marginal or bivariate outliers. We report estimates of effect size, false positive rate and power, and advise on which technique to use depending on the data at hand.

  8. Identification and classification of silks using infrared spectroscopy

    PubMed Central

    Boulet-Audet, Maxime; Vollrath, Fritz; Holland, Chris

    2015-01-01

    Lepidopteran silks number in the thousands and display a vast diversity of structures, properties and industrial potential. To map this remarkable biochemical diversity, we present an identification and screening method based on the infrared spectra of native silk feedstock and cocoons. Multivariate analysis of over 1214 infrared spectra obtained from 35 species allowed us to group silks into distinct hierarchies and a classification that agrees well with current phylogenetic data and taxonomies. This approach also provides information on the relative content of sericin, calcium oxalate, phenolic compounds, poly-alanine and poly(alanine-glycine) β-sheets. It emerged that the domesticated mulberry silkmoth Bombyx mori represents an outlier compared with other silkmoth taxa in terms of spectral properties. Interestingly, Epiphora bauhiniae was found to contain the highest amount of β-sheets reported to date for any wild silkmoth. We conclude that our approach provides a new route to determine cocoon chemical composition and in turn a novel, biological as well as material, classification of silks. PMID:26347557

  9. Visible and near infrared spectroscopy coupled to random forest to quantify some soil quality parameters

    NASA Astrophysics Data System (ADS)

    de Santana, Felipe Bachion; de Souza, André Marcelo; Poppi, Ronei Jesus

    2018-02-01

    This study evaluates the use of visible and near infrared spectroscopy (Vis-NIRS) combined with multivariate regression based on random forest to quantify some soil quality parameters. The parameters analyzed were soil cation exchange capacity (CEC), sum of exchange bases (SB), organic matter (OM), clay and sand present in the soils of several regions of Brazil. Current methods for evaluating these parameters are laborious, time-consuming and require various wet analytical methods that are not adequate for use in precision agriculture, where faster and automatic responses are required. The random forest regression models were statistically better than PLS regression models for CEC, OM, clay and sand, demonstrating resistance to overfitting, attenuating the effect of outlier samples and indicating the most important variables for the model. The methodology demonstrates the potential of Vis-NIR as an alternative for determination of CEC, SB, OM, sand and clay, making it possible to develop a fast and automatic analytical procedure.

  10. On the identification of Dragon Kings among extreme-valued outliers

    NASA Astrophysics Data System (ADS)

    Riva, M.; Neuman, S. P.; Guadagnini, A.

    2013-07-01

    Extreme values of earth, environmental, ecological, physical, biological, financial and other variables often form outliers to heavy tails of empirical frequency distributions. Quite commonly such tails are approximated by stretched exponential, log-normal or power functions. Recently there has been an interest in distinguishing between extreme-valued outliers that belong to the parent population of most data in a sample and those that do not. The first type, called Gray Swans by Nassim Nicholas Taleb (often confused in the literature with Taleb's totally unknowable Black Swans), is drawn from a known distribution of the tails which can thus be extrapolated beyond the range of sampled values. However, the magnitudes and/or space-time locations of unsampled Gray Swans cannot be foretold. The second type of extreme-valued outliers, termed Dragon Kings by Didier Sornette, may in his view be sometimes predicted based on how other data in the sample behave. This intriguing prospect has recently motivated some authors to propose statistical tests capable of identifying Dragon Kings in a given random sample. Here we apply three such tests to log air permeability data measured on the faces of a Berea sandstone block and to synthetic data generated in a manner statistically consistent with these measurements. We interpret the measurements to be, and generate synthetic data that are, samples from α-stable sub-Gaussian random fields subordinated to truncated fractional Gaussian noise (tfGn). All these data have frequency distributions characterized by power-law tails with extreme-valued outliers about the tail edges.

  11. A linkage disequilibrium perspective on the genetic mosaic of speciation in two hybridizing Mediterranean white oaks

    PubMed Central

    Goicoechea, P G; Herrán, A; Durand, J; Bodénès, C; Plomion, C; Kremer, A

    2015-01-01

    We analyzed the genetic mosaic of speciation in two hybridizing Mediterranean white oaks from the Iberian Peninsula (Quercus faginea Lamb. and Quercus pyrenaica Willd.). The two species show ecological divergence in flowering phenology, leaf morphology and composition, and in their basic or acidic soil preferences. Ninety expressed sequence tag-simple sequence repeats (EST-SSRs) and eight nuclear SSRs were genotyped in 96 trees from each species. Genotyping was designed in two steps. First, we used 69 markers evenly distributed over the 12 linkage groups (LGs) of the oak linkage map to confirm the species genetic identity of the sampled genotypes, and searched for differentiation outliers. Then, we genotyped 29 additional markers from the chromosome bins containing the outliers and repeated the multilocus scans. We found one or two additional outliers within four saturated bins, thus confirming that outliers are organized into clusters. Linkage disequilibrium (LD) was extensive; even for loosely linked and for independent markers. Consequently, score tests for association between two-marker haplotypes and the ‘species trait' showed a broad genomic divergence, although substantial variation across the genome and within LGs was also observed. We discuss the influence of several confounding effects on neutrality tests and review the evolutionary processes leading to extensive LD. Finally, we examine how LD analyses within regions that contain outlier clusters and quantitative trait loci can help to identify regions of divergence and/or genomic hitchhiking in the light of predictions from ecological speciation theory. PMID:25515016

  12. Anomalously low C6+/C5+ ratio in solar wind: ACE/SWICS observation

    NASA Astrophysics Data System (ADS)

    Zhao, L.; Landi, E.; Kocher, M.; Lepri, S. T.; Fisk, L. A.; Zurbuchen, T. H.

    2016-03-01

    The Carbon and Oxygen ionization states in the solar wind plasma freeze-in within 2 solar radii (Rs) from the solar surface, and then they do not change as they propagate with the solar wind into the heliosphere. Therefore, the O7+/O6+ and C6+/C5+ charge state ratios measured in situ maintain a record of the thermal properties (electron temperature and density) of the inner corona where the solar wind originates. Since these two ratios freeze-in at very similar heights, they are expected to be correlated. However, an investigation of the correlation between these two ratios as measured by the ACE/SWICS instrument from 1998 to 2011 shows that there is a subset of "Outliers" departing from the expected correlation. We find that about 49.4% of these Outliers are related to Interplanetary Coronal Mass Ejections (ICMEs), while 49.6% of them are slow speed wind (Vp < 500 km/s) and about 1.0% of them are fast solar wind (Vp > 500 km/s). We compare the outlier slow-speed wind with the normal slow wind (defined as Vp < 500 km/s and O7+/O6+ > 0.2) and find that the Outliers depart from the correlation because of their extremely depleted C6+/C5+ ratio, which is decreased by 80% compared to the normal slow wind. We discuss the implication of the Outlier solar wind for the solar wind acceleration mechanism.

  13. [Meta analysis of three-dimensional printing patient-specific instrumentation versus conventional instrumentation in total knee arthroplasty].

    PubMed

    Ren, J T; Xu, C; Wang, J S; Liu, X L

    2017-10-01

    Objective: To evaluate the effects of three-dimensional printing patient-specific instrumentation (PSI) versus conventional instrumentation (CI) in total knee arthroplasty. Methods: PubMed, EMbase, the Cochrane Library, CBM and WanFang were searched using the terms "patient-specific", "patient-matched", "custom", "Instrumentation", "Guide Instrumentation", "cutting blocks", "total knee arthroplasty", "total knee replacement", "TKA" and "TKR". According to the inclusion and exclusion criteria, high-quality randomized controlled trials (RCTs) comparing three-dimensional (3D) printing patient-specific instrumentation with conventional instrumentation in total knee arthroplasty were collected. The post-operative limb mechanical axis outlier, the position of the components outlier, post-operative knee function, operative time, post-operative blood transfusion and complications were analyzed with RevMan 5.3 software. Results: A total of 13 high-quality RCTs were included. The meta-analysis showed no statistically significant differences in the post-operative limb mechanical axis outlier (Z = 0.55, P = 0.58, 95% CI: 0.78 to 1.56), femoral coronal component outlier (Z = 0.38, P = 0.71, 95% CI: 0.69 to 1.72), tibia coronal component outlier (Z = 1.95, P = 0.05, 95% CI: 1.00 to 3.38), femoral rotation angle outlier (Z = 0.36, P = 0.72, 95% CI: 0.49 to 1.64), post-operative knee function (Z = 1.18, P = 0.24, 95% CI: -0.66 to 2.63), post-operative blood transfusions (Z = 0.74, P = 0.46, 95% CI: -0.10 to 0.05) and complications (Z = 0.18, P = 0.86, 95% CI: -0.07 to 0.05) between the PSI group and the CI group. However, there were statistically significant differences in operation time (Z = 2.66, P = 0.01, 95% CI: -15.97 to -2.41) and tibia sagittal component outlier (Z = 3.69, P = 0.00, 95% CI: 1.43 to 3.18) between the PSI group and the CI group. Conclusions: In primary total knee arthroplasty, PSI is not superior to CI for knees without severe varus or valgus deformity or contracture deformity, without deformity around the knee, and without knee bone loss or obesity. The use of PSI in primary total knee arthroplasty is not recommended.

  14. Automated peroperative assessment of stents apposition from OCT pullbacks.

    PubMed

    Dubuisson, Florian; Péry, Emilie; Ouchchane, Lemlih; Combaret, Nicolas; Kauffmann, Claude; Souteyrand, Géraud; Motreff, Pascal; Sarry, Laurent

    2015-04-01

    This study's aim was to check stent apposition by automatically analyzing endovascular optical coherence tomography (OCT) sequences. The lumen is detected using threshold, morphological and gradient operators to run a Dijkstra algorithm. Wrong detections tagged by the user and caused by bifurcations, the presence of struts, thrombotic lesions or dissections can be corrected using a morphing algorithm. Struts are also segmented by computing symmetrical and morphological operators. The Euclidean distance between detected struts and the artery wall initializes a complete distance map of the stent, and missing data are interpolated with thin-plate spline functions. Rejection of detected outliers, regularization of parameters by generalized cross-validation and use of the one-side cyclic property of the map also optimize accuracy. Several indices computed from the map provide quantitative values of malapposition. The algorithm was run on four in-vivo OCT sequences including different cases of incomplete stent apposition. Comparison with manual expert measurements validates the segmentation's accuracy and shows an almost perfect concordance of automated results. Copyright © 2014 Elsevier Ltd. All rights reserved.

  15. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments

    PubMed Central

    Avalappampatty Sivasamy, Aneetha; Sundan, Bose

    2015-01-01

    The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better. PMID:26357668
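
    A minimal sketch of the T-square distance idea described in the abstract above, assuming NumPy is available. The function name t2_scores, the toy two-feature profile and the THRESHOLD value are hypothetical illustrations; the paper derives its threshold range from the central limit theorem, which is not reproduced here.

```python
import numpy as np

def t2_scores(baseline, observations):
    """Hotelling-style T^2 distance of each observation from a baseline profile."""
    baseline = np.asarray(baseline, dtype=float)
    observations = np.asarray(observations, dtype=float)
    mean = baseline.mean(axis=0)              # baseline mean vector
    cov = np.cov(baseline, rowvar=False)      # sample covariance (ddof=1)
    cov_inv = np.linalg.pinv(cov)             # pseudo-inverse tolerates a singular covariance
    diff = observations - mean
    # Quadratic form diff_i^T * cov_inv * diff_i for every row i
    return np.einsum("ij,jk,ik->i", diff, cov_inv, diff)

# Toy "normal traffic" profile with two hypothetical features
baseline = [[0, 0], [2, 0], [0, 2], [2, 2]]
observed = [[1, 1], [3, 1], [5, 5]]
scores = t2_scores(baseline, observed)

THRESHOLD = 9.0                                # hypothetical cut-off, not taken from the paper
flags = scores > THRESHOLD                     # True marks a profile treated as anomalous
```

    In this toy setup only the last observation exceeds the cut-off; in practice the baseline would be built from preprocessed traffic features and the cut-off chosen from the distribution of the statistic.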

  16. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments.

    PubMed

    Sivasamy, Aneetha Avalappampatty; Sundan, Bose

    2015-01-01

    The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T(2) method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T(2) statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better.

  17. A Simulation-Based Comparison of Several Stochastic Linear Regression Methods in the Presence of Outliers.

    ERIC Educational Resources Information Center

    Rule, David L.

    Several regression methods were examined within the framework of weighted structural regression (WSR), comparing their regression weight stability and score estimation accuracy in the presence of outlier contamination. The methods compared are: (1) ordinary least squares; (2) WSR ridge regression; (3) minimum risk regression; (4) minimum risk 2;…

  18. Investigation of IRT-Based Equating Methods in the Presence of Outlier Common Items

    ERIC Educational Resources Information Center

    Hu, Huiqin; Rogers, W. Todd; Vukmirovic, Zarko

    2008-01-01

    Common items with inconsistent b-parameter estimates may have a serious impact on item response theory (IRT)--based equating results. To find a better way to deal with the outlier common items with inconsistent b-parameters, the current study investigated the comparability of 10 variations of four IRT-based equating methods (i.e., concurrent…

  19. Unsupervised Outlier Profile Analysis

    PubMed Central

    Ghosh, Debashis; Li, Song

    2014-01-01

    In much of the analysis of high-throughput genomic data, “interesting” genes have been selected based on assessment of differential expression between two groups or generalizations thereof. Most of the literature focuses on changes in mean expression or the entire distribution. In this article, we explore the use of C(α) tests, which have been applied in other genomic data settings. Their use for the outlier expression problem, in particular with continuous data, is problematic but nevertheless motivates new statistics that give an unsupervised analog to previously developed outlier profile analysis approaches. Some simulation studies are used to evaluate the proposal. A bivariate extension is described that can accommodate data from two platforms on matched samples. The proposed methods are applied to data from a prostate cancer study. PMID:25452686

  20. Efficient Robust Regression via Two-Stage Generalized Empirical Likelihood

    PubMed Central

    Bondell, Howard D.; Stefanski, Leonard A.

    2013-01-01

    Large- and finite-sample efficiency and resistance to outliers are the key goals of robust statistics. Although often not simultaneously attainable, we develop and study a linear regression estimator that comes close. Efficiency obtains from the estimator’s close connection to generalized empirical likelihood, and its favorable robustness properties are obtained by constraining the associated sum of (weighted) squared residuals. We prove maximum attainable finite-sample replacement breakdown point, and full asymptotic efficiency for normal errors. Simulation evidence shows that compared to existing robust regression estimators, the new estimator has relatively high efficiency for small sample sizes, and comparable outlier resistance. The estimator is further illustrated and compared to existing methods via application to a real data set with purported outliers. PMID:23976805

  1. Toward the detection of abnormal chest radiographs the way radiologists do it

    NASA Astrophysics Data System (ADS)

    Alzubaidi, Mohammad; Patel, Ameet; Panchanathan, Sethuraman; Black, John A., Jr.

    2011-03-01

    Computer Aided Detection (CADe) and Computer Aided Diagnosis (CADx) are relatively recent areas of research that attempt to employ feature extraction, pattern recognition, and machine learning algorithms to aid radiologists in detecting and diagnosing abnormalities in medical images. However, these computational methods are based on the assumption that there are distinct classes of abnormalities, and that each class has some distinguishing features that set it apart from other classes. However, abnormalities in chest radiographs tend to be very heterogeneous. The literature suggests that thoracic (chest) radiologists develop their ability to detect abnormalities by developing a sense of what is normal, so that anything that is abnormal attracts their attention. This paper discusses an approach to CADe that is based on a technique called anomaly detection (which aims to detect outliers in data sets) for the purpose of detecting atypical regions in chest radiographs. However, in order to apply anomaly detection to chest radiographs, it is necessary to develop a basis for extracting features from corresponding anatomical locations in different chest radiographs. This paper proposes a method for doing this, and describes how it can be used to support CADe.

  2. feets: feATURE eXTRACTOR for tIME sERIES

    NASA Astrophysics Data System (ADS)

    Cabral, Juan; Sanchez, Bruno; Ramos, Felipe; Gurovich, Sebastián; Granitto, Pablo; VanderPlas, Jake

    2018-06-01

    feets characterizes and analyzes light-curves from astronomical photometric databases for modelling, classification, data cleaning, outlier detection and data analysis. It uses machine learning algorithms to determine the numerical descriptors that characterize and distinguish the different variability classes of light-curves; these range from basic statistical measures such as the mean or standard deviation to complex time-series characteristics such as the autocorrelation function. The library is not restricted to the astronomical field and could also be applied to any kind of time series. This project is a derivative work of FATS (ascl:1711.017).

  3. The LSST Data Mining Research Agenda

    NASA Astrophysics Data System (ADS)

    Borne, K.; Becla, J.; Davidson, I.; Szalay, A.; Tyson, J. A.

    2008-12-01

    We describe features of the LSST science database that are amenable to scientific data mining, object classification, outlier identification, anomaly detection, image quality assurance, and survey science validation. The data mining research agenda includes: scalability (at petabytes scales) of existing machine learning and data mining algorithms; development of grid-enabled parallel data mining algorithms; designing a robust system for brokering classifications from the LSST event pipeline (which may produce 10,000 or more event alerts per night) multi-resolution methods for exploration of petascale databases; indexing of multi-attribute multi-dimensional astronomical databases (beyond spatial indexing) for rapid querying of petabyte databases; and more.

  4. Observability of satellite launcher navigation with INS, GPS, attitude sensors and reference trajectory

    NASA Astrophysics Data System (ADS)

    Beaudoin, Yanick; Desbiens, André; Gagnon, Eric; Landry, René

    2018-01-01

    The navigation system of a satellite launcher is of paramount importance. In order to correct the trajectory of the launcher, the position, velocity and attitude must be known with the best possible precision. In this paper, the observability of four navigation solutions is investigated. The first one is the INS/GPS couple. Then, attitude reference sensors, such as magnetometers, are added to the INS/GPS solution. The authors have already demonstrated that the reference trajectory could be used to improve the navigation performance. This approach is added to the two previously mentioned navigation systems. For each navigation solution, the observability is analyzed with different sensor error models. First, sensor biases are neglected. Then, sensor biases are modelled as random walks and as first order Markov processes. The observability is tested with the rank and condition number of the observability matrix, the time evolution of the covariance matrix and sensitivity to measurement outlier tests. The covariance matrix is exploited to evaluate the correlation between states in order to detect structural unobservability problems. Finally, when an unobservable subspace is detected, the result is verified with theoretical analysis of the navigation equations. The results show that evaluating only the observability of a model does not guarantee the ability of the aiding sensors to correct the INS estimates within the mission time. The analysis of the covariance matrix time evolution could be a powerful tool to detect this situation, however in some cases, the problem is only revealed with a sensitivity to measurement outlier test. None of the tested solutions provide GPS position bias observability. For the considered mission, the modelling of the sensor biases as random walks or Markov processes gives equivalent results. Relying on the reference trajectory can improve the precision of the roll estimates. 
    However, in the context of a satellite launcher, the roll estimation error and gyroscope bias are observable only if attitude reference sensors are present.
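
    The rank and condition-number test mentioned in the abstract above can be sketched as follows. This assumes a small linear time-invariant model, which is a simplification of the launcher navigation equations; the matrices A and C and the helper observability_matrix are hypothetical and are not the authors' models.

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, CA^2, ..., CA^(n-1) for an n-state linear system."""
    A = np.asarray(A, dtype=float)
    C = np.atleast_2d(np.asarray(C, dtype=float))
    n = A.shape[0]
    blocks = [C]
    for _ in range(n - 1):
        blocks.append(blocks[-1] @ A)          # next block is the previous one times A
    return np.vstack(blocks)

# Hypothetical discrete-time position/velocity model
A = [[1.0, 1.0],
     [0.0, 1.0]]

O_pos = observability_matrix(A, [[1.0, 0.0]])  # position is measured
O_vel = observability_matrix(A, [[0.0, 1.0]])  # only velocity is measured

rank_pos = np.linalg.matrix_rank(O_pos)        # full rank: both states observable
rank_vel = np.linalg.matrix_rank(O_vel)        # rank deficient: position is unobservable
cond_pos = np.linalg.cond(O_pos)               # large values indicate weak observability
```

    A full-rank matrix with a very large condition number is observable only in a numerical sense, which is in line with the abstract's remark that a rank test alone does not guarantee useful corrections within the mission time.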

  5. DARHT Multi-intelligence Seismic and Acoustic Data Analysis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stevens, Garrison Nicole; Van Buren, Kendra Lu; Hemez, Francois M.

    The purpose of this report is to document the analysis of seismic and acoustic data collected at the Dual-Axis Radiographic Hydrodynamic Test (DARHT) facility at Los Alamos National Laboratory for robust, multi-intelligence decision making. The data utilized herein is obtained from two tri-axial seismic sensors and three acoustic sensors, resulting in a total of nine data channels. The goal of this analysis is to develop a generalized, automated framework to determine internal operations at DARHT using informative features extracted from measurements collected outside the facility. Our framework involves four components: (1) feature extraction, (2) data fusion, (3) classification, and finally (4) robustness analysis. Two approaches are taken for extracting features from the data. The first of these, generic feature extraction, involves extraction of statistical features from the nine data channels. The second approach, event detection, identifies specific events relevant to traffic entering and leaving the facility as well as explosive activities at DARHT and nearby explosive testing sites. Event detection is completed using a two-stage method, first utilizing signatures in the frequency domain to identify outliers and second extracting short-duration events of interest among these outliers by evaluating residuals of an autoregressive exogenous time series model. Features extracted from each data set are then fused to perform analysis with a multi-intelligence paradigm, where information from multiple data sets is combined to generate more information than is available through analysis of each independently. The fused feature set is used to train a statistical classifier and predict the state of operations to inform a decision maker. We demonstrate this classification using both generic statistical features and event detection and provide a comparison of the two methods. Finally, the concept of decision robustness is presented through a preliminary analysis where uncertainty is added to the system through noise in the measurements.

  6. Adaptive Genetic Divergence Despite Significant Isolation-by-Distance in Populations of Taiwan Cow-Tail Fir (Keteleeria davidiana var. formosana)

    PubMed Central

    Shih, Kai-Ming; Chang, Chung-Te; Chung, Jeng-Der; Chiang, Yu-Chung; Hwang, Shih-Ying

    2018-01-01

    Double digest restriction site-associated DNA sequencing (ddRADseq) is a tool for delivering genome-wide single nucleotide polymorphism (SNP) markers for non-model organisms useful in resolving fine-scale population structure and detecting signatures of selection. This study performs population genetic analysis, based on ddRADseq data, of a coniferous species, Keteleeria davidiana var. formosana, disjunctly distributed in northern and southern Taiwan, for investigation of population adaptive divergence in response to environmental heterogeneity. A total of 13,914 SNPs were detected and used to assess genetic diversity, FST outlier detection, population genetic structure, and individual assignments of five populations (62 individuals) of K. davidiana var. formosana. Principal component analysis (PCA), individual assignments, and the neighbor-joining tree were successful in differentiating individuals between northern and southern populations of K. davidiana var. formosana, but apparent gene flow between the southern DW30 population and northern populations was also revealed. Fifteen of 23 highly differentiated SNPs identified were found to be strongly associated with environmental variables, suggesting isolation-by-environment (IBE). However, multiple matrix regression with randomization analysis revealed strong IBE as well as significant isolation-by-distance. Environmental impacts on divergence were found between populations of the North and South regions and also between the two southern neighboring populations. BLASTN annotation of the sequences flanking outlier SNPs gave significant hits for three of 23 markers that might have biological relevance to mitochondrial homeostasis involved in the survival of locally adapted lineages. Species delimitation between K. davidiana var. formosana and its ancestor, K. davidiana, was also examined (72 individuals). 
    This study has produced highly informative population genomic data for the understanding of population attributes, such as diversity, connectivity, and adaptive divergence associated with large- and small-scale environmental heterogeneity in K. davidiana var. formosana. PMID:29449860

  7. Ecotypes of an ecologically dominant prairie grass (Andropogon gerardii) exhibit genetic divergence across the U.S. Midwest grasslands' environmental gradient.

    PubMed

    Gray, Miranda M; St Amand, Paul; Bello, Nora M; Galliart, Matthew B; Knapp, Mary; Garrett, Karen A; Morgan, Theodore J; Baer, Sara G; Maricle, Brian R; Akhunov, Eduard D; Johnson, Loretta C

    2014-12-01

    Big bluestem (Andropogon gerardii) is an ecologically dominant grass with wide distribution across the environmental gradient of U.S. Midwest grasslands. This system offers an ideal natural laboratory to study population divergence and adaptation in spatially varying climates. Objectives were to: (i) characterize neutral genetic diversity and structure within and among three regional ecotypes derived from 11 prairies across the U.S. Midwest environmental gradient, (ii) distinguish between the relative roles of isolation by distance (IBD) vs. isolation by environment (IBE) on ecotype divergence, (iii) identify outlier loci under selection and (iv) assess the association between outlier loci and climate. Using two primer sets, we genotyped 378 plants at 384 polymorphic AFLP loci across regional ecotypes from central and eastern Kansas and Illinois. Neighbour-joining tree and PCoA revealed strong genetic differentiation between Kansas and Illinois ecotypes, which was better explained by IBE than IBD. We found high genetic variability within prairies (80%) and even fragmented Illinois prairies, surprisingly, contained high within-prairie genetic diversity (92%). Using Bayenv2, 14 top-ranked outlier loci among ecotypes were associated with temperature and precipitation variables. Six of seven BayeScanFST outliers were in common with Bayenv2 outliers. High genetic diversity may enable big bluestem populations to better withstand changing climates; however, population divergence supports the use of local ecotypes in grassland restoration. Knowledge of genetic variation in this ecological dominant and other grassland species will be critical to understanding grassland response and restoration challenges in the face of a changing climate. © 2014 John Wiley & Sons Ltd.

  8. The relationship between processes and outcomes for injured older adults: a study of a statewide trauma system.

    PubMed

    Saillant, N N; Earl-Royal, E; Pascual, J L; Allen, S R; Kim, P K; Delgado, M K; Carr, B G; Wiebe, D; Holena, D N

    2017-02-01

    Age is a risk factor for death, adverse outcomes, and health care use following trauma. The American College of Surgeons' Trauma Quality Improvement Program (TQIP) has published "best practices" of geriatric trauma care; adoption of these guidelines is unknown. We sought to determine which evidence-based geriatric protocols, including TQIP guidelines, were correlated with decreased mortality in Pennsylvania's trauma centers. PA's level I and II trauma centers self-reported adoption of geriatric protocols. Survey data were merged with risk-adjusted mortality data for patients ≥65 from a statewide database, the Pennsylvania Trauma Systems Foundation (PTSF), to compare mortality outlier status and processes of care. Exposures of interest were center-specific processes of care; outcome of interest was PTSF mortality outlier status. 26 of 27 eligible trauma centers participated. There was wide variation in care processes. Four trauma centers were low outliers; three centers were high outliers for risk-adjusted mortality rates in adults ≥65. Results remained consistent when accounting for center volume. The only process associated with mortality outlier status was age-specific solid organ injury protocols (p = 0.04). There was no cumulative effect of multiple evidence-based processes on mortality rate (p = 0.50). We did not see a link between adoption of geriatric best-practices trauma guidelines and reduced mortality at PA trauma centers. The increased susceptibility of elderly to adverse consequences of injury, combined with the rapid growth rate of this demographic, emphasizes the importance of identifying interventions tailored to this population. III. Descriptive.

  9. Exploring Innovative Approaches and Patient-Centered Outcomes from Positive Outliers in Childhood Obesity

    PubMed Central

    Sharifi, Mona; Marshall, Gareth; Goldman, Roberta; Rifas-Shiman, Sheryl L; Horan, Christine M; Koziol, Renata; Marshall, Richard; Sequist, Thomas D; Taveras, Elsie M

    2015-01-01

    Objective: New approaches for obesity prevention and management can be gleaned from 'positive outliers', i.e., individuals who have succeeded in changing health behaviors and reducing their body mass index (BMI) in the context of adverse built and social environments. We explored perspectives and strategies of parents of positive outlier children living in high risk neighborhoods. Methods: We collected up to five years of height/weight data from the electronic health records of 22,443 Massachusetts children, ages 6-12 years, seen for well-child care. We identified children with any history of BMI ≥95th percentile (n=4007) and generated a BMI z-score slope for each child using a linear mixed effects model. We recruited parents for focus groups from the sub-sample of children with negative slopes who also lived in zip codes where >15% of children were obese. We analyzed focus group transcripts using an immersion/crystallization approach. Results: We reached thematic saturation after 5 focus groups with 41 parents. Commonly cited outcomes that mattered most to parents and motivated change were child inactivity, above-average clothing sizes, exercise intolerance, and negative peer interactions; few reported BMI as a motivator. Convergent strategies among positive outlier families were family-level changes, parent modeling, consistency, household rules/limits, and creativity in overcoming resistance. Parents voiced preferences for obesity interventions that include tailored education and support that extend outside clinical settings and are delivered by both health care professionals and successful peers. Conclusions: Successful strategies learned from positive outlier families can be generalized and tested to accelerate progress in reducing childhood obesity. PMID:25439163

  10. M-estimation for robust sparse unmixing of hyperspectral images

    NASA Astrophysics Data System (ADS)

    Toomik, Maria; Lu, Shijian; Nelson, James D. B.

    2016-10-01

    Hyperspectral unmixing methods often use a conventional least squares based lasso which assumes that the data follows the Gaussian distribution. The normality assumption is an approximation which is generally invalid for real imagery data. We consider a robust (non-Gaussian) approach to sparse spectral unmixing of remotely sensed imagery which reduces the sensitivity of the estimator to outliers and relaxes the linearity assumption. The method consists of several appropriate penalties. We propose to use an lp norm with 0 < p < 1 in the sparse regression problem, which induces more sparsity in the results, but makes the problem non-convex. On the other hand, the problem, though non-convex, can be solved quite straightforwardly with an extensible algorithm based on iteratively reweighted least squares. To deal with the huge size of modern spectral libraries we introduce a library reduction step, similar to the multiple signal classification (MUSIC) array processing algorithm, which not only speeds up unmixing but also yields superior results. In the hyperspectral setting we extend the traditional least squares method to the robust heavy-tailed case and propose a generalised M-lasso solution. M-estimation replaces the Gaussian likelihood with a fixed function ρ(e) that restrains outliers. The M-estimate function reduces the effect of errors with large amplitudes or even assigns the outliers zero weights. Our experimental results on real hyperspectral data show that noise with large amplitudes (outliers) often exists in the data. This ability to mitigate the influence of such outliers can therefore offer greater robustness. Qualitative hyperspectral unmixing results on real hyperspectral image data corroborate the efficacy of the proposed method.
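
    The iteratively reweighted least squares idea behind M-estimation can be sketched as below, assuming NumPy. This is a plain Huber M-estimator for an ordinary linear regression, not the authors' sparse M-lasso or lp-penalised unmixing; the function huber_irls, the tuning constant delta and the toy data are hypothetical. Huber weights shrink the influence of large residuals but never reach zero; a redescending function would be needed to give outliers zero weight.

```python
import numpy as np

def huber_irls(X, y, delta=1.0, n_iter=50):
    """Huber M-estimate of regression coefficients via iteratively reweighted least squares."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]        # start from ordinary least squares
    for _ in range(n_iter):
        r = y - X @ beta                               # current residuals
        abs_r = np.maximum(np.abs(r), 1e-12)           # guard against division by zero
        w = np.where(abs_r <= delta, 1.0, delta / abs_r)  # Huber weights
        sw = np.sqrt(w)
        # Weighted least squares step with the weights held fixed
        beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta

# Toy data: a clean line y = 1 + 2x with one gross outlier
x = np.arange(10.0)
y = 2.0 * x + 1.0
y[9] = 100.0
X = np.column_stack([np.ones_like(x), x])

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]        # pulled strongly by the outlier
beta_huber = huber_irls(X, y)                          # stays close to intercept 1, slope 2
```

    Each pass solves a weighted least squares problem, which is why the approach extends naturally to the penalised formulations the abstract describes.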

  11. Contrasts within an outlier-reef system: Evidence for differential quaternary evolution, south Florida windward margin, U.S.A.

    USGS Publications Warehouse

    Lidz, B.H.; Shinn, E.A.; Hine, A.C.; Locker, S.D.

    1997-01-01

    Closely spaced, high-resolution, seismic-reflection profiles acquired off the upper Florida Keys (i.e., north) reveal a platform-margin reef-and-trough system grossly similar to, yet quite different from, that previously described off the lower Keys (i.e., south). Profiles and maps generated for both areas show that development was controlled by antecedent Pleistocene topography (presence or absence of an upper-slope bedrock terrace), sediment availability, fluctuating sea level, and coral growth rate and distribution. The north terrace is sediment-covered and exhibits linear, buried, low-relief, seismic features of unknown character and origin. The south terrace is essentially sediment-free and supports multiple, massive, high-relief outlier reefs. Uranium disequilibrium series dates on outlier-reef corals indicate a Pleistocene age (~83-84 ka). A massive Pleistocene reef with both aggradational (north) and progradational (south) aspects forms the modern margin escarpment landward of the terrace. Depending upon interpretation (the north margin-escarpment reef may or may not be an outlier reef), the north margin is either more advanced or less advanced than the south margin. During Holocene sea-level rise, Pleistocene bedrock was inundated earlier and faster first to the north (deeper offbank terrace), then to the south (deeper platform surface). Holocene overgrowth is thick (8 m) on the north outer-bank reefs but thin (0.3 m) on the south outlier reefs. Differential evolution resulted from interplay between fluctuating sea level and energy regime established by prevailing east-southeasterly winds and waves along an arcuate (ENE-WSW) platform margin.

  12. Abundant Topological Outliers in Social Media Data and Their Effect on Spatial Analysis.

    PubMed

    Westerholt, Rene; Steiger, Enrico; Resch, Bernd; Zipf, Alexander

    2016-01-01

    Twitter and related social media feeds have become valuable data sources to many fields of research. Numerous researchers have thereby used social media posts for spatial analysis, since many of them contain explicit geographic locations. However, despite its widespread use within applied research, a thorough understanding of the underlying spatial characteristics of these data is still lacking. In this paper, we investigate how topological outliers influence the outcomes of spatial analyses of social media data. These outliers appear when different users contribute heterogeneous information about different phenomena simultaneously from similar locations. As a consequence, various messages representing different spatial phenomena are captured closely to each other, and are at risk to be falsely related in a spatial analysis. Our results reveal indications for corresponding spurious effects when analyzing Twitter data. Further, we show how the outliers distort the range of outcomes of spatial analysis methods. This has significant influence on the power of spatial inferential techniques, and, more generally, on the validity and interpretability of spatial analysis results. We further investigate how the issues caused by topological outliers are composed in detail. We unveil that multiple disturbing effects are acting simultaneously and that these are related to the geographic scales of the involved overlapping patterns. Our results show that at some scale configurations, the disturbances added through overlap are more severe than at others. Further, their behavior turns into a volatile and almost chaotic fluctuation when the scales of the involved patterns become too different. Overall, our results highlight the critical importance of thoroughly considering the specific characteristics of social media data when analyzing them spatially.

  13. No genetic adaptation of the Mediterranean keystone shrub Cistus ladanifer in response to experimental fire and extreme drought.

    PubMed

    Torres, Iván; Parra, Antonio; Moreno, José M; Durka, Walter

    2018-01-01

    In Mediterranean ecosystems, climate change is projected to increase fire danger and summer drought, thus reducing post-fire recruitment of obligate seeder species, and possibly affecting the population genetic structure. We performed a genome-wide genetic marker study, using AFLP markers, on individuals from one Central Spain population of the obligate post-fire seeder Cistus ladanifer L. that established after experimental fire and survived during four subsequent years under simulated drought implemented with a rainout shelter system. We explored the effects of the treatments on marker diversity, spatial genetic structure and presence of outlier loci suggestive of selection. We found no effect of fire or drought on any of the genetic diversity metrics. Analysis of Molecular Variance showed very low genetic differentiation among treatments. Neither fire nor drought altered the small-scale spatial genetic structure of the population. Only one locus was significantly associated with the fire treatment, but inconsistently across outlier detection methods. Neither fire nor drought is likely to affect the genetic makeup of emerging C. ladanifer, despite reduced recruitment caused by drought. The lack of genetic change suggests that reduced recruitment is a random, non-selective process with no genome-wide consequences on this keystone, drought- and fire-tolerant Mediterranean species.

  14. Quantifying white matter structural integrity with high-definition fiber tracking in traumatic brain injury.

    PubMed

    Presson, Nora; Krishnaswamy, Deepa; Wagener, Lauren; Bird, William; Jarbo, Kevin; Pathak, Sudhir; Puccio, Ava M; Borasso, Allison; Benso, Steven; Okonkwo, David O; Schneider, Walter

    2015-03-01

    There is an urgent, unmet demand for definitive biological diagnosis of traumatic brain injury (TBI) to pinpoint the location and extent of damage. We have developed High-Definition Fiber Tracking, a 3 T magnetic resonance imaging-based diffusion spectrum imaging and tractography analysis protocol, to quantify axonal injury in military and civilian TBI patients. A novel analytical methodology quantified white matter integrity in patients with TBI and healthy controls. Forty-one subjects (23 TBI, 18 controls) were scanned with the High-Definition Fiber Tracking diffusion spectrum imaging protocol. After reconstruction, segmentation was used to isolate bilateral hemisphere homologues of eight major tracts. Integrity of segmented tracts was estimated by calculating homologue correlation and tract coverage. Both groups showed high correlations for all tracts. TBI patients showed reduced homologue correlation and tract spread and increased outlier count (correlations > 2.32 SD below control mean). On average, 6.5% of tracts in the TBI group were outliers with substantial variability among patients. Number and summed deviation of outlying tracts correlated with initial Glasgow Coma Scale score and 6-month Glasgow Outcome Scale-Extended score. The correlation metric used here can detect heterogeneous damage affecting a low proportion of tracts, presenting a potential mechanism for advancing TBI diagnosis. Reprint & Copyright © 2015 Association of Military Surgeons of the U.S.
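
    The outlier rule quoted in this abstract (a correlation more than 2.32 SD below the control mean) can be sketched in Python as follows. This is a minimal illustration for a single tract, not the authors' pipeline: the function name and all sample values are hypothetical. Under a normality assumption, a cutoff of 2.32 SD corresponds roughly to the lower 1% tail.

```python
import statistics


def flag_low_outliers(patient_values, control_values, z_cut=2.32):
    """Flag patient values lying more than z_cut SDs below the control mean.

    Returns the threshold and the list of flagged patient values.
    The default z_cut mirrors the 2.32 SD rule quoted in the abstract.
    """
    control_mean = statistics.mean(control_values)
    control_sd = statistics.stdev(control_values)  # sample SD (n - 1)
    threshold = control_mean - z_cut * control_sd
    flagged = [v for v in patient_values if v < threshold]
    return threshold, flagged


# Hypothetical homologue correlations for one tract.
controls = [0.90, 0.92, 0.88, 0.91, 0.89]
patients = [0.91, 0.85, 0.87, 0.80]
threshold, flagged = flag_low_outliers(patients, controls)
print(f"threshold = {threshold:.4f}, flagged = {flagged}")
```

    Repeating the check per tract and counting the flagged values would give a per-patient outlier count in the spirit of the one described above.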

  15. Identification of suitable fundus images using automated quality assessment methods.

    PubMed

    Şevik, Uğur; Köse, Cemal; Berber, Tolga; Erdöl, Hidayet

    2014-04-01

    Retinal image quality assessment (IQA) is a crucial process for automated retinal image analysis systems to obtain an accurate and successful diagnosis of retinal diseases. Consequently, the first step in a good retinal image analysis system is measuring the quality of the input image. We present an approach for finding medically suitable retinal images for retinal diagnosis. We used a three-class grading system that consists of good, bad, and outlier classes. We created a retinal image quality dataset with a total of 216 consecutive images called the Diabetic Retinopathy Image Database. We identified the suitable images within the good images for automatic retinal image analysis systems using a novel method. Subsequently, we evaluated our retinal image suitability approach using the Digital Retinal Images for Vessel Extraction and Standard Diabetic Retinopathy Database Calibration level 1 public datasets. The results were measured through the F1 metric, which is a harmonic mean of precision and recall metrics. The highest F1 scores of the IQA tests were 99.60%, 96.50%, and 85.00% for good, bad, and outlier classes, respectively. Additionally, the accuracy of our suitable image detection approach was 98.08%. Our approach can be integrated into any automatic retinal analysis system with sufficient performance scores.
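
    For reference, the F1 metric mentioned in this abstract, the harmonic mean of precision and recall, can be computed from per-class counts as in the short Python sketch below. The function name and the counts are hypothetical and are not taken from the study.

```python
def f1_score(true_pos, false_pos, false_neg):
    """Return F1, the harmonic mean of precision and recall, for one class."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0  # no correct predictions for this class
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for a single class such as "outlier".
print(f1_score(true_pos=8, false_pos=2, false_neg=2))
```

    In a three-class setting such as good, bad, and outlier, the function would be applied once per class, treating that class as the positive label.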

  16. Population genomic analysis uncovers environmental stress-driven selection and adaptation of Lentinula edodes population in China.

    PubMed

    Xiao, Yang; Cheng, Xuanjin; Liu, Jun; Li, Chuang; Nong, Wenyan; Bian, Yinbing; Cheung, Man Kit; Kwan, Hoi Shan

    2016-11-10

    The elucidation of genome-wide variations could help reveal aspects of divergence, domestication, and adaptation of edible mushrooms. Here, we resequenced the whole genomes of 39 wild and 21 cultivated strains of Chinese Lentinula edodes, the shiitake mushroom. We identified three distinct genetic groups in the Chinese L. edodes population with robust differentiation. Results of phylogenetic and population structure analyses suggest that the cultivated strains and most of the wild strains of L. edodes in China possess different gene pools and two outlier strains show signatures of hybridization between groups. Eighty-four candidate genes contributing to population divergence were detected in outlier analysis, 18 of which are involved in response to environmental stresses. Gene enrichment analysis of group-specific single nucleotide polymorphisms showed that the cultivated strains were genetically diversified in biological processes related to stress response. As the formation of fruiting bodies is a stress-response process, we postulate that environment factors, such as temperature, drove the population divergence of L. edodes in China by natural or artificial selection. We also found phenotypic variations between groups and identified some wild strains that have potential to diversify the genetic pool for improving agricultural traits of L. edodes cultivars in China.

  17. Population genomic analysis uncovers environmental stress-driven selection and adaptation of Lentinula edodes population in China

    PubMed Central

    Xiao, Yang; Cheng, Xuanjin; Liu, Jun; Li, Chuang; Nong, Wenyan; Bian, Yinbing; Cheung, Man Kit; Kwan, Hoi Shan

    2016-01-01

    The elucidation of genome-wide variations could help reveal aspects of divergence, domestication, and adaptation of edible mushrooms. Here, we resequenced the whole genomes of 39 wild and 21 cultivated strains of Chinese Lentinula edodes, the shiitake mushroom. We identified three distinct genetic groups in the Chinese L. edodes population with robust differentiation. Results of phylogenetic and population structure analyses suggest that the cultivated strains and most of the wild strains of L. edodes in China possess different gene pools and two outlier strains show signatures of hybridization between groups. Eighty-four candidate genes contributing to population divergence were detected in outlier analysis, 18 of which are involved in response to environmental stresses. Gene enrichment analysis of group-specific single nucleotide polymorphisms showed that the cultivated strains were genetically diversified in biological processes related to stress response. As the formation of fruiting bodies is a stress-response process, we postulate that environment factors, such as temperature, drove the population divergence of L. edodes in China by natural or artificial selection. We also found phenotypic variations between groups and identified some wild strains that have potential to diversify the genetic pool for improving agricultural traits of L. edodes cultivars in China. PMID:27830835

  18. Identifying Hot-Spots of Metal Contamination in Campus Dust of Xi’an, China

    PubMed Central

    Chen, Hao; Lu, Xinwei; Gao, Tianning; Chang, Yuyu

    2016-01-01

    The concentrations of heavy metals (As, Ba, Co, Cr, Cu, Mn, Ni, Pb, V, and Zn) in campus dust from kindergartens, elementary schools, middle schools, and universities in the city of Xi’an, China, were determined by X-ray fluorescence spectrometry. The pollution levels and hotspots of metals were analyzed using a geoaccumulation index and Local Moran’s I, an indicator of spatial association, respectively. The dust samples from the campuses had metal concentrations higher than background levels, especially for Pb, Zn, Co, Cu, Cr, and Ba. The pollution assessment indicated that the campus dusts were not contaminated with As, Mn, Ni, or V, were moderately or not contaminated with Ba and Cr and were moderately to strongly contaminated with Co, Cu, Pb, and Zn. Local Moran’s I analysis detected the locations of spatial clusters and outliers and indicated that the pollution with these 10 metals occurred in significant high-high spatial clusters, low-high, or even high-low spatial outliers. As, Cu, Mn, Ni, Pb, V, and Zn had important high-high patterns in the center of Xi’an. The western and southwestern regions of the study area, i.e., areas of old and high-tech industries, have strongly contributed to the Co content in the campus dust. PMID:27271645
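
    The cluster and outlier labels named in this abstract (high-high, low-high, high-low) come from Local Moran's I. The Python sketch below shows one common form of the statistic, assuming row-standardized neighbor weights; the neighbor lists and concentrations are hypothetical, and the permutation-based significance test that a full analysis would need is omitted.

```python
def local_morans_i(values, neighbors):
    """Compute Local Moran's I and a quadrant label for each site.

    values:    one measurement per site (e.g. a metal concentration)
    neighbors: neighbors[i] lists the indices of the sites adjacent to site i
    Weights are row-standardized, so the spatial lag is the neighbor mean.
    """
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n  # second moment about the mean
    if m2 == 0:
        raise ValueError("all values are identical; Local Moran's I is undefined")
    results = []
    for i in range(n):
        nbrs = neighbors[i]
        lag = sum(z[j] for j in nbrs) / len(nbrs) if nbrs else 0.0
        i_local = z[i] * lag / m2
        if z[i] > 0 and lag > 0:
            label = "high-high"   # high value among high neighbors (cluster)
        elif z[i] < 0 and lag < 0:
            label = "low-low"     # low value among low neighbors (cluster)
        elif z[i] > 0 and lag < 0:
            label = "high-low"    # high value among low neighbors (outlier)
        elif z[i] < 0 and lag > 0:
            label = "low-high"    # low value among high neighbors (outlier)
        else:
            label = "undefined"
        results.append((i_local, label))
    return results


# Hypothetical concentrations at five sites along a transect.
concentrations = [1.0, 1.0, 9.0, 1.0, 1.0]
chain = [[1], [0, 2], [1, 3], [2, 4], [3]]
for site, (i_local, label) in enumerate(local_morans_i(concentrations, chain)):
    print(site, round(i_local, 3), label)
```

    A negative local statistic marks a spatial outlier, and a positive one marks a cluster of similar values. Whether a given site counts as significant depends on the test that this sketch leaves out.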

  19. Bayesian hierarchical modeling for subject-level response classification in peptide microarray immunoassays

    PubMed Central

    Imholte, Gregory; Gottardo, Raphael

    2017-01-01

    The peptide microarray immunoassay simultaneously screens sample serum against thousands of peptides, determining the presence of antibodies bound to array probes. Peptide microarrays tiling immunogenic regions of pathogens (e.g. envelope proteins of a virus) are an important high throughput tool for querying and mapping antibody binding. Because of the assay’s many steps, from probe synthesis to incubation, peptide microarray data can be noisy with extreme outliers. In addition, subjects may produce different antibody profiles in response to an identical vaccine stimulus or infection, due to variability among subjects’ immune systems. We present a robust Bayesian hierarchical model for peptide microarray experiments, pepBayes, to estimate the probability of antibody response for each subject/peptide combination. Heavy-tailed error distributions accommodate outliers and extreme responses, and tailored random effect terms automatically incorporate technical effects prevalent in the assay. We apply our model to two vaccine trial datasets to demonstrate model performance. Our approach enjoys high sensitivity and specificity when detecting vaccine induced antibody responses. A simulation study shows an adaptive thresholding classification method has appropriate false discovery rate control with high sensitivity, and receiver operating characteristics generated on vaccine trial data suggest that pepBayes clearly separates responses from non-responses. PMID:27061097

  20. ROBUST: an interactive FORTRAN-77 package for exploratory data analysis using parametric, ROBUST and nonparametric location and scale estimates, data transformations, normality tests, and outlier assessment

    NASA Astrophysics Data System (ADS)

    Rock, N. M. S.

    ROBUST calculates 53 statistics, plus significance levels for 6 hypothesis tests, on each of up to 52 variables. These together allow the following properties of the data distribution for each variable to be examined in detail: (1) Location. Three means (arithmetic, geometric, harmonic) are calculated, together with the midrange and 19 high-performance robust L-, M-, and W-estimates of location (combined, adaptive, trimmed estimates, etc.). (2) Scale. The standard deviation is calculated along with the H-spread/2 (≈ semi-interquartile range), the mean and median absolute deviations from both mean and median, and a biweight scale estimator. The 23 location and 6 scale estimators programmed cover all possible degrees of robustness. (3) Normality. Distributions are tested against the null hypothesis that they are normal, using the 3rd (√b1) and 4th (b2) moments, Geary's ratio (mean deviation/standard deviation), Filliben's probability plot correlation coefficient, and a more robust test based on the biweight scale estimator. These statistics collectively are sensitive to most usual departures from normality. (4) Presence of outliers. The maximum and minimum values are assessed individually or jointly using Grubbs' maximum Studentized residuals, Harvey's and Dixon's criteria, and the Studentized range. For a single input variable, outliers can be either winsorized or eliminated and all estimates recalculated iteratively as desired. The following data transformations also can be applied: linear, log10, generalized Box-Cox power (including log, reciprocal, and square root), exponentiation, and standardization. For more than one variable, all results are tabulated in a single run of ROBUST. Further options are incorporated to assess ratios (of two variables) as well as discrete variables, and to handle missing data. Cumulative S-plots (for assessing normality graphically) also can be generated. The mutual consistency or inconsistency of all these measures helps to detect errors in data as well as to assess data distributions themselves.
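
    A few of the quantities listed in this abstract can be illustrated with a short Python sketch: the median absolute deviation about the median, a trimmed mean, winsorizing, and Grubbs' maximum Studentized residual. This is not a port of the FORTRAN-77 package. The function names and sample data are hypothetical, and the critical values needed to turn the Grubbs statistic into a test are omitted.

```python
import statistics


def mad_about_median(xs):
    """Median absolute deviation from the median (a robust scale estimate)."""
    med = statistics.median(xs)
    return statistics.median(abs(x - med) for x in xs)


def trimmed_mean(xs, k):
    """Mean after dropping the k smallest and k largest values."""
    ordered = sorted(xs)
    return statistics.mean(ordered[k:len(ordered) - k])


def winsorize(xs, k):
    """Pull the k smallest and k largest values in to the nearest retained value."""
    ordered = sorted(xs)
    low, high = ordered[k], ordered[len(ordered) - k - 1]
    return [min(max(x, low), high) for x in xs]


def grubbs_statistic(xs):
    """Return (G, value), where G = max |x - mean| / SD and value attains it."""
    mean = statistics.mean(xs)
    sd = statistics.stdev(xs)
    suspect = max(xs, key=lambda x: abs(x - mean))
    return abs(suspect - mean) / sd, suspect


# Hypothetical measurements with one gross outlier.
data = [9.8, 10.1, 10.0, 9.9, 10.2, 25.0]
print("mean         ", statistics.mean(data))
print("median       ", statistics.median(data))
print("MAD          ", mad_about_median(data))
print("trimmed mean ", trimmed_mean(data, 1))
print("winsorized   ", winsorize(data, 1))
print("Grubbs G     ", grubbs_statistic(data))
```

    Comparing the arithmetic mean with the median and the trimmed mean on data like these shows the kind of mutual inconsistency that the abstract describes as a signal of errors or outliers.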
