Sample records for detecting statistically significant

  1. Detecting Statistically Significant Communities of Triangle Motifs in Undirected Networks

    DTIC Science & Technology

    2016-04-26

    Report type: Final. Dates covered: 15 Oct 2014 to 14 Jan 2015. Title and subtitle: Detecting statistically significant clusters of... extend the work of Perry et al. [6] by developing a statistical framework that supports the detection of triangle motif-based clusters in complex... priori, the need for triangle motif-based clustering. 2. Developed an algorithm for clustering undirected networks, where the triangle configuration was...

  2. Strategies for Testing Statistical and Practical Significance in Detecting DIF with Logistic Regression Models

    ERIC Educational Resources Information Center

    Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza

    2014-01-01

    This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
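    A minimal sketch of the kind of LR-based DIF test discussed here, on simulated data (variable names, effect sizes, and sample size are illustrative assumptions, not taken from the study):

```python
import numpy as np
import statsmodels.api as sm

# Uniform-DIF testing with logistic regression (LR): regress the item
# response on the matching variable (total score), add a group indicator,
# and judge the group coefficient with a Wald statistic.
rng = np.random.default_rng(0)
n = 500
score = rng.normal(size=n)              # matching variable (e.g., total score)
group = rng.integers(0, 2, size=n)      # 0 = reference, 1 = focal group
logit = 1.2 * score + 0.5 * group       # 0.5 = simulated uniform DIF effect
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))

X = sm.add_constant(np.column_stack([score, group]))
fit = sm.Logit(y, X).fit(disp=0)
wald_z = fit.params[2] / fit.bse[2]     # Wald test for the group (DIF) effect
print(f"Wald z = {wald_z:.2f}, p = {fit.pvalues[2]:.4f}")
```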

  3. Increasing the statistical significance of entanglement detection in experiments.

    PubMed

    Jungnitsch, Bastian; Niekamp, Sönke; Kleinmann, Matthias; Gühne, Otfried; Lu, He; Gao, Wei-Bo; Chen, Yu-Ao; Chen, Zeng-Bing; Pan, Jian-Wei

    2010-05-28

    Entanglement is often verified by a violation of an inequality like a Bell inequality or an entanglement witness. Considerable effort has been devoted to the optimization of such inequalities in order to obtain a high violation. We demonstrate theoretically and experimentally that such an optimization does not necessarily lead to a better entanglement test, if the statistical error is taken into account. Theoretically, we show for different error models that reducing the violation of an inequality can improve the significance. Experimentally, we observe this phenomenon in a four-photon experiment, testing the Mermin and Ardehali inequality for different levels of noise. Furthermore, we provide a way to develop entanglement tests with high statistical significance.

  4. Visualizing statistical significance of disease clusters using cartograms.

    PubMed

    Kronenfeld, Barry J; Wong, David W S

    2017-05-15

    Health officials and epidemiological researchers often use maps of disease rates to identify potential disease clusters. Because these maps exaggerate the prominence of low-density districts and hide potential clusters in urban (high-density) areas, many researchers have used density-equalizing maps (cartograms) as a basis for epidemiological mapping. However, there are no existing guidelines for visual assessment of statistical uncertainty. To address this shortcoming, we develop techniques for visual determination of statistical significance of clusters spanning one or more districts on a cartogram. We developed the techniques within a geovisual analytics framework that does not rely on automated significance testing, and can therefore facilitate visual analysis to detect clusters that automated techniques might miss. On a cartogram of the at-risk population, the statistical significance of a disease cluster can be determined from the rate, area, and shape of the cluster under standard hypothesis testing scenarios. We develop formulae to determine, for a given rate, the area required for statistical significance of a priori and a posteriori designated regions under certain test assumptions. Uniquely, our approach enables dynamic inference of aggregate regions formed by combining individual districts. The method is implemented in interactive tools that provide choropleth mapping, automated legend construction and dynamic search tools to facilitate cluster detection and assessment of the validity of tested assumptions. A case study of leukemia incidence analysis in California demonstrates the ability to visually distinguish between statistically significant and insignificant regions. The proposed geovisual analytics approach enables intuitive visual assessment of statistical significance of arbitrarily defined regions on a cartogram. Our research prompts a broader discussion of the role of geovisual exploratory analyses in disease mapping and the appropriate…

  5. Scalable detection of statistically significant communities and hierarchies, using message passing for modularity

    PubMed Central

    Zhang, Pan; Moore, Cristopher

    2014-01-01

    Modularity is a popular measure of community structure. However, maximizing the modularity can lead to many competing partitions, with almost the same modularity, that are poorly correlated with each other. It can also produce illusory “communities” in random graphs where none exist. We address this problem by using the modularity as a Hamiltonian at finite temperature and using an efficient belief propagation algorithm to obtain the consensus of many partitions with high modularity, rather than looking for a single partition that maximizes it. We show analytically and numerically that the proposed algorithm works all of the way down to the detectability transition in networks generated by the stochastic block model. It also performs well on real-world networks, revealing large communities in some networks where previous work has claimed no communities exist. Finally we show that by applying our algorithm recursively, subdividing communities until no statistically significant subcommunities can be found, we can detect hierarchical structure in real-world networks more efficiently than previous methods. PMID:25489096

  6. Finding Statistically Significant Communities in Networks

    PubMed Central

    Lancichinetti, Andrea; Radicchi, Filippo; Ramasco, José J.; Fortunato, Santo

    2011-01-01

    Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a great need for multi-purpose techniques able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks while accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure for partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method has performance comparable to the best existing algorithms on artificial benchmark graphs. Several applications on real networks are shown as well. OSLOM is implemented in freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks. PMID:21559480

  7. Common pitfalls in statistical analysis: Clinical versus statistical significance

    PubMed Central

    Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc

    2015-01-01

    In clinical research, study results that are statistically significant are often interpreted as being clinically important. While statistical significance indicates the reliability of the study results, clinical significance reflects their impact on clinical practice. The third article in this series exploring pitfalls in statistical analysis clarifies the importance of differentiating between statistical significance and clinical significance. PMID:26229754

  8. Statistically significant relational data mining :

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann

    This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.

  9. Statistical significance of combinatorial regulations

    PubMed Central

    Terada, Aika; Okada-Hatakeyama, Mariko; Tsuda, Koji; Sese, Jun

    2013-01-01

    More than three transcription factors often work together to enable cells to respond to various signals. The detection of combinatorial regulation by multiple transcription factors, however, is not only computationally nontrivial but also extremely unlikely because of multiple testing correction. The exponential growth in the number of tests forces us to set a strict limit on the maximum arity. Here, we propose an efficient branch-and-bound algorithm called the “limitless arity multiple-testing procedure” (LAMP) to count the exact number of testable combinations and calibrate the Bonferroni factor to the smallest possible value. LAMP lists significant combinations without any limit, whereas the family-wise error rate is rigorously controlled under the threshold. In the human breast cancer transcriptome, LAMP discovered statistically significant combinations of as many as eight binding motifs. This method may contribute to uncovering pathways regulated in a coordinated fashion and to finding hidden associations in heterogeneous data. PMID:23882073
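    A rough sketch of the Tarone-style calibration that LAMP builds on: shrink the Bonferroni factor to the number of combinations that are even testable at the corrected level. The support counts and margins below are hypothetical, and the authors' branch-and-bound enumeration is not reproduced:

```python
import numpy as np
from scipy.stats import hypergeom

def min_attainable_p(x, n, n1):
    # Most extreme 2x2 table for a combination supported by x of n samples
    # (n1 positives): all x supporting samples are positives.
    return hypergeom.sf(x - 1, n, n1, x)

def calibrated_threshold(supports, n, n1, alpha=0.05):
    """Find the smallest m such that at most m combinations are testable
    at level alpha/m (Tarone's condition); return the calibrated threshold."""
    minp = np.sort([min_attainable_p(x, n, n1) for x in supports])
    m = 1
    while np.searchsorted(minp, alpha / m, side="right") > m:
        m += 1
    return alpha / m

# hypothetical support counts for candidate motif combinations
print(calibrated_threshold(supports=[3, 5, 8, 12, 20] * 40, n=100, n1=30))
```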

  10. Statistical Significance Testing from Three Perspectives and Interpreting Statistical Significance and Nonsignificance and the Role of Statistics in Research.

    ERIC Educational Resources Information Center

    Levin, Joel R.; And Others

    1993-01-01

    Journal editors respond to criticisms of reliance on statistical significance in research reporting. Joel R. Levin ("Journal of Educational Psychology") defends its use, whereas William D. Schafer ("Measurement and Evaluation in Counseling and Development") emphasizes the distinction between statistically significant and important. William Asher…

  11. Social significance of community structure: Statistical view

    NASA Astrophysics Data System (ADS)

    Li, Hui-Jia; Daniels, Jasmine J.

    2015-01-01

    Community structure analysis is a powerful tool for social networks that can simplify their topological and functional analysis considerably. However, since community detection methods have random factors and real social networks obtained from complex systems always contain error edges, evaluating the significance of a partitioned community structure is an urgent and important question. In this paper, integrating the specific characteristics of real society, we present a framework to analyze the significance of a social community. The dynamics of social interactions are modeled by identifying social leaders and corresponding hierarchical structures. Instead of a direct comparison with the average outcome of a random model, we compute the similarity of a given node with the leader by the number of common neighbors. To determine the membership vector, an efficient community detection algorithm is proposed based on the position of the nodes and their corresponding leaders. Then, using a log-likelihood score, the tightness of the community can be derived. Based on the distribution of community tightness, we establish a connection between p-value theory and network analysis, and then we obtain a significance measure of statistical form. Finally, the framework is applied to both benchmark networks and real social networks. Experimental results show that our work can be used in many fields, such as determining the optimal number of communities, analyzing the social significance of a given community, comparing the performance among various algorithms, etc.

  12. Scan statistics with local vote for target detection in distributed system

    NASA Astrophysics Data System (ADS)

    Luo, Junhai; Wu, Qi

    2017-12-01

    Target detection occupies a pivotal position in distributed systems. Scan statistics, as one of the most efficient detection methods, has been applied to a variety of anomaly detection problems and significantly improves the probability of detection. However, scan statistics cannot achieve the expected performance when the noise intensity is strong or the signal emitted by the target is weak. The local vote algorithm can also achieve a higher target detection rate. After the local vote, the counting rule is always adopted for decision fusion. The counting rule does not use the information about the contiguity of sensors but takes all sensors' data into consideration, which makes the result undesirable. In this paper, we propose a scan statistics with local vote (SSLV) method. This method combines scan statistics with the local vote decision. Before scan statistics, each sensor executes a local vote decision according to the data of its neighbors and its own. By combining the advantages of both, our method can obtain a higher detection rate in low signal-to-noise-ratio environments than scan statistics alone. After the local vote decision, the distribution of sensors which have detected the target becomes more concentrated. To make full use of the local vote decision, we introduce a variable step parameter for the SSLV. It significantly shortens the scan period, especially when the target is absent. Analysis and simulations are presented to demonstrate the performance of our method.

  13. Statistical Significance for Hierarchical Clustering

    PubMed Central

    Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

    2017-01-01

    Summary: Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990

  14. Statistical significance of the rich-club phenomenon in complex networks

    NASA Astrophysics Data System (ADS)

    Jiang, Zhi-Qiang; Zhou, Wei-Xing

    2008-04-01

    We propose that the rich-club phenomenon in complex networks should be defined in the spirit of bootstrapping, in which a null model is adopted to assess the statistical significance of the detected rich-club. Our method can serve as a definition of the rich-club phenomenon and is applied to analyze three real networks and three model networks. The results show significant improvement compared with previously reported results. We report a dilemma with an exceptional example, showing that there does not exist an omnipotent definition for the rich-club phenomenon.
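    A minimal illustration of the bootstrap-spirit test described here, using a degree-preserving rewiring null model (the ensemble size, number of swaps, and choice of null are assumptions; the paper's exact null may differ):

```python
import networkx as nx
import numpy as np

def rich_club_pvalue(G, k, n_null=200, seed=0):
    """Compare phi(k) on G against degree-preserving rewirings of G;
    return the observed coefficient and an empirical upper-tail p-value.
    Assumes a simple graph whose degree range makes k a valid key."""
    rng = np.random.default_rng(seed)
    phi = nx.rich_club_coefficient(G, normalized=False)[k]
    null = []
    for _ in range(n_null):
        H = G.copy()
        nx.double_edge_swap(H, nswap=5 * H.number_of_edges(),
                            max_tries=100 * H.number_of_edges(),
                            seed=int(rng.integers(10**9)))
        null.append(nx.rich_club_coefficient(H, normalized=False)[k])
    return phi, (1 + np.sum(np.asarray(null) >= phi)) / (1 + n_null)
```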

  15. Statistically significant performance results of a mine detector and fusion algorithm from an x-band high-resolution SAR

    NASA Astrophysics Data System (ADS)

    Williams, Arnold C.; Pachowicz, Peter W.

    2004-09-01

    Current mine detection research indicates that no single sensor or single look from a sensor will detect mines/minefields in a real-time manner at a performance level suitable for a forward maneuver unit. Hence, the integrated development of detectors and fusion algorithms is of primary importance. A problem in this development process has been the evaluation of these algorithms with relatively small data sets, leading to anecdotal and frequently overtrained results. These anecdotal results are often unreliable and conflicting among various sensors and algorithms. Consequently, the physical phenomena that ought to be exploited and the performance benefits of this exploitation are often ambiguous. The Army RDECOM CERDEC Night Vision and Electronic Sensors Directorate has collected large amounts of multisensor data such that statistically significant evaluations of detection and fusion algorithms can be obtained. Even with these large data sets, care must be taken in algorithm design and data processing to achieve statistically significant performance results for combined detectors and fusion algorithms. This paper discusses statistically significant detection and combined multilook fusion results for the Ellipse Detector (ED) and the Piecewise Level Fusion Algorithm (PLFA). These statistically significant performance results are characterized by ROC curves obtained by processing multilook data from the high-resolution Veridian X-band SAR. We discuss the implications of these results for mine detection and the importance of statistical significance, sample size, ground truth, and algorithm design in performance evaluation.

  16. Human papillomavirus detection with genotyping by the cobas and Aptima assays: Significant differences in HPV 16 detection?

    PubMed

    Chorny, Joseph A; Frye, Teresa C; Fisher, Beth L; Remmers, Carol L

    2018-03-23

    The primary high-risk human papillomavirus (hrHPV) assays in the United States are the cobas (Roche) and the Aptima (Hologic). The cobas assay detects hrHPV by DNA analysis, while the Aptima detects messenger RNA (mRNA) oncogenic transcripts. As the Aptima assay identifies oncogenic expression, it should have a lower rate of hrHPV and genotype detection. The Kaiser Permanente Regional Reference Laboratory in Denver, Colorado, changed its hrHPV assay from the cobas to the Aptima. The rates of hrHPV detection and genotyping were compared over successive six-month periods. The overall hrHPV detection rates of the two platforms were similar (9.5% versus 9.1%) and not statistically different. For genotyping, the HPV 16 rate by the cobas was 1.6% and by the Aptima it was 1.1%. This difference was statistically significant, with the Aptima detecting nearly one-third fewer HPV 16 infections. With HPV 18 and HPV 18/45, there was a slightly higher detection rate of HPV 18/45 by the Aptima platform (0.5% versus 0.9%), and this was statistically significant. While HPV 16 represents a low percentage of hrHPV infections, it was detected significantly less often by the Aptima assay than by the cobas assay. This has been previously reported, although not highlighted. Given the test methodologies, one would expect the Aptima to detect less HPV 16. This difference appears to be due mainly to a significantly increased number of non-oncogenic HPV 16 infections detected by the cobas test, as there were no differences in HPV 16 detection rates in high-grade squamous intraepithelial lesions, indicating that the two tests have similar sensitivities for oncogenic HPV 16.

  17. Significance levels for studies with correlated test statistics.

    PubMed

    Shi, Jianxin; Levinson, Douglas F; Whittemore, Alice S

    2008-07-01

    When testing large numbers of null hypotheses, one needs to assess the evidence against the global null hypothesis that none of the hypotheses is false. Such evidence typically is based on the test statistic of the largest magnitude, whose statistical significance is evaluated by permuting the sample units to simulate its null distribution. Efron (2007) has noted that correlation among the test statistics can induce substantial interstudy variation in the shapes of their histograms, which may cause misleading tail counts. Here, we show that permutation-based estimates of the overall significance level also can be misleading when the test statistics are correlated. We propose that such estimates be conditioned on a simple measure of the spread of the observed histogram, and we provide a method for obtaining conditional significance levels. We justify this conditioning using the conditionality principle described by Cox and Hinkley (1974). Application of the method to gene expression data illustrates the circumstances when conditional significance levels are needed.
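    For contrast, a sketch of the basic permutation estimate of overall significance that the authors show can mislead when statistics are correlated (their remedy, conditioning on the spread of the observed histogram, is not reproduced; the data layout is hypothetical):

```python
import numpy as np
from scipy import stats

def global_null_pvalue(X, y, n_perm=1000, seed=0):
    """Permutation p-value for the largest |t| across many features.
    X: (samples x features) array; y: binary group labels."""
    rng = np.random.default_rng(seed)

    def max_abs_t(labels):
        t, _ = stats.ttest_ind(X[labels == 0], X[labels == 1], axis=0)
        return np.max(np.abs(t))

    observed = max_abs_t(y)
    null = np.array([max_abs_t(rng.permutation(y)) for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (1 + n_perm)
```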

  18. Infants with Williams syndrome detect statistical regularities in continuous speech.

    PubMed

    Cashon, Cara H; Ha, Oh-Ryeong; Graf Estes, Katharine; Saffran, Jenny R; Mervis, Carolyn B

    2016-09-01

    Williams syndrome (WS) is a rare genetic disorder associated with delays in language and cognitive development. The reasons for the language delay are unknown. Statistical learning is a domain-general mechanism recruited for early language acquisition. In the present study, we investigated whether infants with WS were able to detect the statistical structure in continuous speech. Eighteen 8- to 20-month-olds with WS were familiarized with 2 min of a continuous stream of synthesized nonsense words; the statistical structure of the speech was the only cue to word boundaries. They were tested on their ability to discriminate statistically-defined "words" and "part-words" (which crossed word boundaries) in the artificial language. Despite significant cognitive and language delays, infants with WS were able to detect the statistical regularities in the speech stream. These findings suggest that an inability to track the statistical properties of speech is unlikely to be the primary basis for the delays in the onset of language observed in infants with WS. These results provide the first evidence of statistical learning by infants with developmental delays.

  19. Statistical significance of task related deep brain EEG dynamic changes in the time-frequency domain.

    PubMed

    Chládek, J; Brázdil, M; Halámek, J; Plešinger, F; Jurák, P

    2013-01-01

    We present an off-line analysis procedure for exploring brain activity recorded from intra-cerebral electroencephalographic data (SEEG). The objective is to determine the statistical differences between different types of stimulations in the time-frequency domain. The procedure is based on computing relative signal power change and subsequent statistical analysis. An example of characteristic statistically significant event-related de/synchronization (ERD/ERS) detected across different frequency bands following different oddball stimuli is presented. The method is used for off-line functional classification of different brain areas.

  20. Statistical Significance vs. Practical Significance: An Exploration through Health Education

    ERIC Educational Resources Information Center

    Rosen, Brittany L.; DeMaria, Andrea L.

    2012-01-01

    The purpose of this paper is to examine the differences between statistical and practical significance, including strengths and criticisms of both methods, as well as provide information surrounding the application of various effect sizes and confidence intervals within health education research. Provided are recommendations, explanations and…

  1. Health significance and statistical uncertainty. The value of P-value.

    PubMed

    Consonni, Dario; Bertazzi, Pier Alberto

    2017-10-27

    The P-value is widely used as a summary statistic of scientific results. Unfortunately, there is a widespread tendency to dichotomize its value into "P<0.05" (defined as "statistically significant") and "P>0.05" ("statistically not significant"), with the former implying a "positive" result and the latter a "negative" one. Our aim is to show the unsuitability of such an approach when evaluating the effects of environmental and occupational risk factors. We provide examples of distorted use of the P-value and of the negative consequences for science and public health of such a black-and-white vision. The rigid interpretation of the P-value as a dichotomy favors the confusion between health relevance and statistical significance, discourages thoughtful thinking, and diverts attention from what really matters, the health significance. A much better way to express and communicate scientific results involves reporting effect estimates (e.g., risks, risk ratios, or risk differences) and their confidence intervals (CI), which summarize and convey both health significance and statistical uncertainty. Unfortunately, many researchers do not usually consider the whole interval of the CI but only examine whether it includes the null value, thereby degrading this procedure to the same P-value dichotomy (statistically significant or not). When reporting statistical results of scientific research, present effect estimates with their confidence intervals, and do not qualify the P-value as "significant" or "not significant".
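    A small sketch of the recommended reporting style: an effect estimate (here a risk ratio, via the standard log-normal approximation) with its 95% confidence interval, computed from hypothetical 2x2 cohort counts:

```python
import numpy as np

def risk_ratio_ci(a, n1, b, n0, z=1.96):
    """Risk ratio with a 95% CI (log-normal approximation).
    a/n1 = risk among exposed, b/n0 = risk among unexposed."""
    rr = (a / n1) / (b / n0)
    se = np.sqrt(1 / a - 1 / n1 + 1 / b - 1 / n0)  # SE of log(RR)
    lo, hi = np.exp(np.log(rr) - z * se), np.exp(np.log(rr) + z * se)
    return rr, lo, hi

# hypothetical cohort: 30/100 cases among exposed, 20/100 among unexposed
print(risk_ratio_ci(30, 100, 20, 100))   # RR = 1.5 with its 95% CI
```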

  2. Statistical detection of patterns in unidimensional distributions by continuous wavelet transforms

    NASA Astrophysics Data System (ADS)

    Baluev, R. V.

    2018-04-01

    Objective detection of specific patterns in statistical distributions, like groupings, gaps, or abrupt transitions between different subsets, is a task with a rich range of applications in astronomy: Milky Way stellar population analysis, investigations of exoplanet diversity, Solar System minor body statistics, extragalactic studies, etc. We adapt the powerful technique of wavelet transforms to this generalized task, placing a strong emphasis on assessing the significance of the detected patterns. Among other things, our method also involves optimal minimum-noise wavelets and minimum-noise reconstruction of the distribution density function. Based on this development, we construct a self-contained algorithmic pipeline aimed at processing statistical samples. It is currently applicable to single-dimensional distributions only, but it is flexible enough to undergo further generalizations and development.

  3. The questioned p value: clinical, practical and statistical significance.

    PubMed

    Jiménez-Paneque, Rosa

    2016-09-09

    The use of the p-value and statistical significance has been questioned from the early 1980s to the present day. Much has been discussed about this in the field of statistics and its applications, especially in epidemiology and public health. As a matter of fact, the p-value and its equivalent, statistical significance, are difficult concepts to grasp for the many health professionals involved in some way in research applied to their work areas. However, their meaning should be clear in intuitive terms even though they are based on theoretical concepts of the field of statistics. This paper attempts to present the p-value as a concept that applies to everyday life and is therefore intuitively simple, but whose proper use cannot be separated from theoretical and methodological elements of inherent complexity. The reasons behind the criticism received by the p-value and its isolated use are explained intuitively, mainly the need to demarcate statistical significance from clinical significance; some of the recommended remedies for these problems are discussed as well. The paper finally refers to the current trend to vindicate the p-value, appealing to the convenience of its use in certain situations, and to the recent statement of the American Statistical Association in this regard.

  4. Spatial scan statistics for detection of multiple clusters with arbitrary shapes.

    PubMed

    Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray

    2016-12-01

    In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan.

  5. Using the bootstrap to establish statistical significance for relative validity comparisons among patient-reported outcome measures

    PubMed Central

    2013-01-01

    Background: Relative validity (RV), a ratio of ANOVA F-statistics, is often used to compare the validity of patient-reported outcome (PRO) measures. We used the bootstrap to establish the statistical significance of the RV and to identify key factors affecting its significance. Methods: Based on responses from 453 chronic kidney disease (CKD) patients to 16 CKD-specific and generic PRO measures, RVs were computed to determine how well each measure discriminated across clinically-defined groups of patients compared to the most discriminating (reference) measure. Statistical significance of RV was quantified by the 95% bootstrap confidence interval. Simulations examined the effects of sample size, denominator F-statistic, correlation between comparator and reference measures, and number of bootstrap replicates. Results: The statistical significance of the RV increased as the magnitude of the denominator F-statistic increased or as the correlation between comparator and reference measures increased. A denominator F-statistic of 57 conveyed sufficient power (80%) to detect an RV of 0.6 for two measures correlated at r = 0.7. Larger denominator F-statistics or higher correlations provided greater power. Larger sample size with a fixed denominator F-statistic or more bootstrap replicates (beyond 500) had minimal impact. Conclusions: The bootstrap is valuable for establishing the statistical significance of RV estimates. A reasonably large denominator F-statistic (F > 57) is required for adequate power when using the RV to compare the validity of measures with small or moderate correlations (r < 0.7). Substantially greater power can be achieved when comparing measures of very high correlation (r > 0.9). PMID:23721463
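    A compact sketch of the RV-with-bootstrap idea under simplifying assumptions (array names are hypothetical, and a real analysis would guard against degenerate resamples with missing groups):

```python
import numpy as np
from scipy.stats import f_oneway

def rv_bootstrap_ci(comparator, reference, groups, n_boot=500, seed=0):
    """RV = F(comparator) / F(reference) across clinical groups, with a
    95% percentile bootstrap CI from resampling patients with replacement."""
    rng = np.random.default_rng(seed)

    def rv(idx):
        g = groups[idx]
        f_comp, _ = f_oneway(*(comparator[idx][g == k] for k in np.unique(g)))
        f_ref, _ = f_oneway(*(reference[idx][g == k] for k in np.unique(g)))
        return f_comp / f_ref

    n = len(groups)
    estimate = rv(np.arange(n))
    boots = np.array([rv(rng.integers(0, n, n)) for _ in range(n_boot)])
    return estimate, np.percentile(boots, [2.5, 97.5])
```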

  6. Wildfire cluster detection using space-time scan statistics

    NASA Astrophysics Data System (ADS)

    Tonini, M.; Tuia, D.; Ratle, F.; Kanevski, M.

    2009-04-01

    The aim of the present study is to identify spatio-temporal clusters of fire sequences using space-time scan statistics. These statistical methods are specifically designed to detect clusters and assess their significance. Basically, scan statistics work by comparing a set of events occurring inside a scanning window (or a space-time cylinder for spatio-temporal data) with those that lie outside. Windows of increasing size scan the zone across space and time: the likelihood ratio is calculated for each window (comparing the ratio of observed cases over expected inside and outside); the window with the maximum value is assumed to be the most probable cluster, and so on. Under the null hypothesis of spatial and temporal randomness, these events are distributed according to a known discrete-state random process (Poisson or Bernoulli), whose parameters can be estimated. Given this assumption, it is possible to test whether or not the null hypothesis holds in a specific area. In order to deal with fire data, the space-time permutation scan statistic has been applied, since it does not require the explicit specification of the population at risk in each cylinder. The case study is Florida daily fire detection using the Moderate Resolution Imaging Spectroradiometer (MODIS) active fire product during the period 2003-2006. As a result, statistically significant clusters have been identified. Performing the analyses over the entire time frame, three out of the five most likely clusters were identified in the forest areas in the north of the country; the other two clusters cover a large zone in the south, corresponding to agricultural land and the prairies in the Everglades. Furthermore, the analyses were performed separately for the four years to analyze whether the wildfires recur each year during the same period. It emerges that clusters of forest fires are more frequent in hot seasons (spring and summer), while in the southern areas they are widely…

  7. Tests of Statistical Significance Made Sound

    ERIC Educational Resources Information Center

    Haig, Brian D.

    2017-01-01

    This article considers the nature and place of tests of statistical significance (ToSS) in science, with particular reference to psychology. Despite the enormous amount of attention given to this topic, psychology's understanding of ToSS remains deficient. The major problem stems from a widespread and uncritical acceptance of null hypothesis…

  8. Simulated performance of an order statistic threshold strategy for detection of narrowband signals

    NASA Technical Reports Server (NTRS)

    Satorius, E.; Brady, R.; Deich, W.; Gulkis, S.; Olsen, E.

    1988-01-01

    The application of order statistics to signal detection is becoming an increasingly active area of research. This is due to the inherent robustness of rank estimators in the presence of large outliers that would significantly degrade more conventional mean-level-based detection systems. A detection strategy is presented in which the threshold estimate is obtained using order statistics. The performance of this algorithm in the presence of simulated interference and broadband noise is evaluated. In this way, the robustness of the proposed strategy in the presence of the interference can be fully assessed as a function of the interference, noise, and detector parameters.
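    A toy sketch of an order-statistic threshold for narrowband detection; the rank fraction and scale factor are illustrative choices, not the paper's tuned parameters:

```python
import numpy as np

def order_statistic_threshold(spectrum, rank_frac=0.75, scale=3.0):
    """Detection threshold from an order statistic of the spectral bins.
    A rank-based noise estimate resists the strong narrowband outliers
    that would inflate a mean-level estimate."""
    noise_level = np.quantile(spectrum, rank_frac)  # k-th order statistic
    return scale * noise_level

# usage: flag bins exceeding the rank-based threshold
# hits = np.nonzero(spectrum > order_statistic_threshold(spectrum))[0]
```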

  9. Précis of statistical significance: rationale, validity, and utility.

    PubMed

    Chow, S L

    1998-04-01

    The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of…

  10. The Use of Meta-Analytic Statistical Significance Testing

    ERIC Educational Resources Information Center

    Polanin, Joshua R.; Pigott, Terri D.

    2015-01-01

    Meta-analysis multiplicity, the concept of conducting multiple tests of statistical significance within one review, is an underdeveloped literature. We address this issue by considering how Type I errors can impact meta-analytic results, suggest how statistical power may be affected through the use of multiplicity corrections, and propose how…

  11. An experimental validation of a statistical-based damage detection approach.

    DOT National Transportation Integrated Search

    2011-01-01

    In this work, a previously-developed, statistical-based, damage-detection approach was validated for its ability to autonomously detect damage in bridges. The damage-detection approach uses statistical differences in the actual and predicted beha…

  12. Statistical Traffic Anomaly Detection in Time-Varying Communication Networks

    DTIC Science & Technology

    2015-02-01

    Our methods perform better than their vanilla counterparts, which assume that normal traffic is stationary; the proposed statistics serve not only for anomaly detection but also for understanding the normal traffic in time-varying networks.

  13. Common pitfalls in statistical analysis: “P” values, statistical significance and confidence intervals

    PubMed Central

    Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc

    2015-01-01

    In the second part of a series on pitfalls in statistical analysis, we look at various ways in which a statistically significant study result can be expressed. We debunk some of the myths regarding the ‘P’ value, explain the importance of ‘confidence intervals’ and clarify the importance of including both values in a paper. PMID:25878958

  14. Statistical methods for convergence detection of multi-objective evolutionary algorithms.

    PubMed

    Trautmann, H; Wagner, T; Naujoks, B; Preuss, M; Mehnen, J

    2009-01-01

    In this paper, two approaches for estimating the generation in which a multi-objective evolutionary algorithm (MOEA) shows statistically significant signs of convergence are introduced. A set-based perspective is taken where convergence is measured by performance indicators. The proposed techniques fulfill the requirements of proper statistical assessment on the one hand and efficient optimisation for real-world problems on the other hand. The first approach accounts for the stochastic nature of the MOEA by repeating the optimisation runs for increasing generation numbers and analysing the performance indicators using statistical tools. This technique results in a very robust offline procedure. Moreover, an online convergence detection method is introduced as well. This method automatically stops the MOEA when either the variance of the performance indicators falls below a specified threshold or a stagnation of their overall trend is detected. Both methods are analysed and compared for two MOEAs and on different classes of benchmark functions. It is shown that the methods successfully operate on all stated problems, needing fewer function evaluations while preserving good approximation quality at the same time.
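    A minimal sketch of the variance-based online stopping criterion described here (the window length and tolerance are assumptions, and the performance indicator, e.g. hypervolume, is supplied by the caller):

```python
import numpy as np

def should_stop(indicator_history, window=10, var_tol=1e-6):
    """Online stopping rule: halt the MOEA once the performance indicator's
    variance over a trailing window of generations drops below a tolerance."""
    if len(indicator_history) < window:
        return False
    return np.var(indicator_history[-window:]) < var_tol
```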

  15. A space-time scan statistic for detecting emerging outbreaks.

    PubMed

    Tango, Toshiro; Takahashi, Kunihiko; Kohriyama, Kazuaki

    2011-03-01

    As a major analytical method for outbreak detection, Kulldorff's space-time scan statistic (2001, Journal of the Royal Statistical Society, Series A 164, 61-72) has been implemented in many syndromic surveillance systems. Since, however, it is based on circular windows in space, it has difficulty correctly detecting actual noncircular clusters. Takahashi et al. (2008, International Journal of Health Geographics 7, 14) proposed a flexible space-time scan statistic with the capability of detecting noncircular areas. It seems to us, however, that the detection of the most likely cluster defined in these space-time scan statistics is not the same as the detection of localized emerging disease outbreaks because the former compares the observed number of cases with the conditional expected number of cases. In this article, we propose a new space-time scan statistic which compares the observed number of cases with the unconditional expected number of cases, takes a time-to-time variation of Poisson mean into account, and implements an outbreak model to capture localized emerging disease outbreaks more timely and correctly. The proposed models are illustrated with data from weekly surveillance of the number of absentees in primary schools in Kitakyushu-shi, Japan, 2006.

  16. A statistical method (cross-validation) for bone loss region detection after spaceflight

    PubMed Central

    Zhao, Qian; Li, Wenjun; Li, Caixia; Chu, Philip W.; Kornak, John; Lang, Thomas F.

    2010-01-01

    Astronauts experience bone loss after long spaceflight missions. Identifying specific regions that undergo the greatest losses (e.g. the proximal femur) could reveal information about the processes of bone loss in disuse and disease. Detecting such regions, however, remains an open problem. This paper focuses on statistical methods to detect such regions. We perform statistical parametric mapping to obtain t-maps of changes in images, and propose a new cross-validation method to select an optimum suprathreshold for forming clusters of pixels. Once these candidate clusters are formed, we use permutation testing of longitudinal labels to derive significant changes. PMID:20632144

  17. A randomized trial in a massive online open course shows people don't know what a statistically significant relationship looks like, but they can learn.

    PubMed

    Fisher, Aaron; Anderson, G Brooke; Peng, Roger; Leek, Jeff

    2014-01-01

    Scatterplots are the most common way for statisticians, scientists, and the public to visually detect relationships between measured variables. At the same time, and despite widely publicized controversy, P-values remain the most commonly used measure to statistically justify relationships identified between variables. Here we measure the ability to detect statistically significant relationships from scatterplots in a randomized trial of 2,039 students in a statistics massive open online course (MOOC). Each subject was shown a random set of scatterplots and asked to visually determine if the underlying relationships were statistically significant at the P < 0.05 level. Subjects correctly classified only 47.4% (95% CI [45.1%-49.7%]) of statistically significant relationships, and 74.6% (95% CI [72.5%-76.6%]) of non-significant relationships. Adding visual aids such as a best fit line or scatterplot smooth increased the probability a relationship was called significant, regardless of whether the relationship was actually significant. Classification of statistically significant relationships improved on repeat attempts of the survey, although classification of non-significant relationships did not. Our results suggest: (1) that evidence-based data analysis can be used to identify weaknesses in theoretical procedures in the hands of average users, (2) data analysts can be trained to improve detection of statistically significant results with practice, but (3) data analysts have incorrect intuition about what statistically significant relationships look like, particularly for small effects. We have built a web tool for people to compare scatterplots with their corresponding p-values which is available here: http://glimmer.rstudio.com/afisher/EDA/.

  18. A randomized trial in a massive online open course shows people don’t know what a statistically significant relationship looks like, but they can learn

    PubMed Central

    Fisher, Aaron; Anderson, G. Brooke; Peng, Roger

    2014-01-01

    Scatterplots are the most common way for statisticians, scientists, and the public to visually detect relationships between measured variables. At the same time, and despite widely publicized controversy, P-values remain the most commonly used measure to statistically justify relationships identified between variables. Here we measure the ability to detect statistically significant relationships from scatterplots in a randomized trial of 2,039 students in a statistics massive open online course (MOOC). Each subject was shown a random set of scatterplots and asked to visually determine if the underlying relationships were statistically significant at the P < 0.05 level. Subjects correctly classified only 47.4% (95% CI [45.1%–49.7%]) of statistically significant relationships, and 74.6% (95% CI [72.5%–76.6%]) of non-significant relationships. Adding visual aids such as a best fit line or scatterplot smooth increased the probability a relationship was called significant, regardless of whether the relationship was actually significant. Classification of statistically significant relationships improved on repeat attempts of the survey, although classification of non-significant relationships did not. Our results suggest: (1) that evidence-based data analysis can be used to identify weaknesses in theoretical procedures in the hands of average users, (2) data analysts can be trained to improve detection of statistically significant results with practice, but (3) data analysts have incorrect intuition about what statistically significant relationships look like, particularly for small effects. We have built a web tool for people to compare scatterplots with their corresponding p-values which is available here: http://glimmer.rstudio.com/afisher/EDA/. PMID:25337457

  19. Power, effects, confidence, and significance: an investigation of statistical practices in nursing research.

    PubMed

    Gaskin, Cadeyrn J; Happell, Brenda

    2014-05-01

    To (a) assess the statistical power of nursing research to detect small, medium, and large effect sizes; (b) estimate the experiment-wise Type I error rate in these studies; and (c) assess the extent to which (i) a priori power analyses, (ii) effect sizes (and interpretations thereof), and (iii) confidence intervals were reported. Statistical review. Papers published in the 2011 volumes of the 10 highest ranked nursing journals, based on their 5-year impact factors. Papers were assessed for statistical power, control of experiment-wise Type I error, reporting of a priori power analyses, reporting and interpretation of effect sizes, and reporting of confidence intervals. The analyses were based on 333 papers, from which 10,337 inferential statistics were identified. The median power to detect small, medium, and large effect sizes was .40 (interquartile range [IQR]=.24-.71), .98 (IQR=.85-1.00), and 1.00 (IQR=1.00-1.00), respectively. The median experiment-wise Type I error rate was .54 (IQR=.26-.80). A priori power analyses were reported in 28% of papers. Effect sizes were routinely reported for Spearman's rank correlations (100% of papers in which this test was used), Poisson regressions (100%), odds ratios (100%), Kendall's tau correlations (100%), Pearson's correlations (99%), logistic regressions (98%), structural equation modelling/confirmatory factor analyses/path analyses (97%), and linear regressions (83%), but were reported less often for two-proportion z tests (50%), analyses of variance/analyses of covariance/multivariate analyses of variance (18%), t tests (8%), Wilcoxon's tests (8%), Chi-squared tests (8%), and Fisher's exact tests (7%), and not reported for sign tests, Friedman's tests, McNemar's tests, multi-level models, and Kruskal-Wallis tests. Effect sizes were infrequently interpreted. Confidence intervals were reported in 28% of papers. The use, reporting, and interpretation of inferential statistics in nursing research need substantial…
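    For reference, a short sketch of the kind of power computation audited here, using Cohen's conventional small/medium/large standardized effect sizes (the per-group sample size is an illustrative assumption):

```python
from statsmodels.stats.power import TTestIndPower

# Power of a two-sample t-test with 64 subjects per group at alpha = .05,
# for Cohen's conventional small/medium/large standardized differences.
analysis = TTestIndPower()
for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    power = analysis.power(effect_size=d, nobs1=64, alpha=0.05, ratio=1.0)
    print(f"{label}: power = {power:.2f}")
```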

  1. Assigning statistical significance to proteotypic peptides via database searches

    PubMed Central

    Alves, Gelio; Ogurtsov, Aleksey Y.; Yu, Yi-Kuo

    2011-01-01

    Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to the reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId's knowledge database to include proteotypic information, utilized RAId's statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId's programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem, since all annotated modifications, even those occurring within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless of whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches. PMID:21055489

  2. Statistical methods for change-point detection in surface temperature records

    NASA Astrophysics Data System (ADS)

    Pintar, A. L.; Possolo, A.; Zhang, N. F.

    2013-09-01

    We describe several statistical methods to detect possible change-points in a time series of values of surface temperature measured at a meteorological station, and to assess the statistical significance of such changes, taking into account the natural variability of the measured values, and the autocorrelations between them. These methods serve to determine whether the record may suffer from biases unrelated to the climate signal, hence whether there may be a need for adjustments as considered by M. J. Menne and C. N. Williams (2009) "Homogenization of Temperature Series via Pairwise Comparisons", Journal of Climate 22 (7), 1700-1717. We also review methods to characterize patterns of seasonality (seasonal decomposition using monthly medians or robust local regression), and explain the role they play in the imputation of missing values, and in enabling robust decompositions of the measured values into a seasonal component, a possible climate signal, and a station-specific remainder. The methods for change-point detection that we describe include statistical process control, wavelet multi-resolution analysis, adaptive weights smoothing, and a Bayesian procedure, all of which are applicable to single station records.
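    A bare-bones sketch of one of the listed techniques, statistical process control via a two-sided CUSUM (the reference value k and decision limit h are illustrative assumptions; a production method would also handle autocorrelation and seasonality, as the text notes):

```python
import numpy as np

def cusum_alarms(x, k=0.5, h=5.0):
    """Two-sided CUSUM on standardized values: accumulate deviations beyond
    a reference value k and raise an alarm when a sum crosses the limit h."""
    z = (x - np.mean(x)) / np.std(x)
    s_hi = s_lo = 0.0
    alarms = []
    for i, zi in enumerate(z):
        s_hi = max(0.0, s_hi + zi - k)
        s_lo = min(0.0, s_lo + zi + k)
        if s_hi > h or s_lo < -h:
            alarms.append(i)        # candidate change-point location
            s_hi = s_lo = 0.0       # restart after each alarm
    return alarms
```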

  3. Determining the Statistical Significance of Relative Weights

    ERIC Educational Resources Information Center

    Tonidandel, Scott; LeBreton, James M.; Johnson, Jeff W.

    2009-01-01

    Relative weight analysis is a procedure for estimating the relative importance of correlated predictors in a regression equation. Because the sampling distribution of relative weights is unknown, researchers using relative weight analysis are unable to make judgments regarding the statistical significance of the relative weights. J. W. Johnson…

  4. How many spectral lines are statistically significant?

    NASA Astrophysics Data System (ADS)

    Freund, J.

    When experimental line spectra are fitted with least squares techniques one frequently does not know whether n or n + 1 lines may be fitted safely. This paper shows how an F-test can be applied in order to determine the statistical significance of including an extra line into the fitting routine.
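    A sketch of the nested-model F-test the note describes, comparing the chi-square of fits with n and n + 1 lines (three parameters per line, e.g. amplitude, centre, and width, is an assumption):

```python
from scipy.stats import f

def extra_line_pvalue(chi2_n, chi2_n1, n_data, n_params_n, params_per_line=3):
    """F-test: does adding one more line, i.e. params_per_line extra free
    parameters, significantly reduce chi-square over the n-line fit?"""
    dof_n1 = n_data - n_params_n - params_per_line   # dof of the larger model
    F = ((chi2_n - chi2_n1) / params_per_line) / (chi2_n1 / dof_n1)
    return f.sf(F, params_per_line, dof_n1)   # small p favours the extra line
```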

  5. Detecting cell death with optical coherence tomography and envelope statistics

    NASA Astrophysics Data System (ADS)

    Farhat, Golnaz; Yang, Victor X. D.; Czarnota, Gregory J.; Kolios, Michael C.

    2011-02-01

    Currently no standard clinical or preclinical noninvasive method exists to monitor cell death based on morphological changes at the cellular level. In our past work we have demonstrated that quantitative high frequency ultrasound imaging can detect cell death in vitro and in vivo. In this study we apply quantitative methods previously used with high frequency ultrasound to optical coherence tomography (OCT) to detect cell death. The ultimate goal of this work is to use these methods for optically-based clinical and preclinical cancer treatment monitoring. Optical coherence tomography data were acquired from acute myeloid leukemia cells undergoing three modes of cell death. Significant increases in integrated backscatter were observed for cells undergoing apoptosis and mitotic arrest, while necrotic cells induced a decrease. These changes appear to be linked to structural changes observed in histology obtained from the cell samples. Signal envelope statistics were analyzed from fittings of the generalized gamma distribution to histograms of envelope intensities. The parameters from this distribution demonstrated sensitivities to morphological changes in the cell samples. These results indicate that OCT integrated backscatter and first order envelope statistics can be used to detect and potentially differentiate between modes of cell death in vitro.
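    A minimal sketch of the envelope-statistics step: fitting the generalized gamma distribution to envelope intensities with SciPy (placeholder random data stand in for real OCT envelope samples):

```python
import numpy as np
from scipy.stats import gengamma

# Fit the generalized gamma distribution to envelope intensities, fixing
# the location at zero; the two shape parameters are the quantities that
# are sensitive to morphological changes.
envelope = np.abs(np.random.default_rng(0).normal(size=5000))  # placeholder
a, c, loc, scale = gengamma.fit(envelope, floc=0)
print(f"shape a = {a:.3f}, shape c = {c:.3f}, scale = {scale:.3f}")
```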

  6. Statistical significance test for transition matrices of atmospheric Markov chains

    NASA Technical Reports Server (NTRS)

    Vautard, Robert; Mo, Kingtse C.; Ghil, Michael

    1990-01-01

    Low-frequency variability of large-scale atmospheric dynamics can be represented schematically by a Markov chain of multiple flow regimes. This Markov chain contains useful information for the long-range forecaster, provided that the statistical significance of the associated transition matrix can be reliably tested. Monte Carlo simulation yields a very reliable significance test for the elements of this matrix. The results of this test agree with previously used empirical formulae when each cluster of maps identified as a distinct flow regime is sufficiently large and when they all contain a comparable number of maps. Monte Carlo simulation provides a more reliable way to test the statistical significance of transitions to and from small clusters. It can determine the most likely transitions, as well as the most unlikely ones, with a prescribed level of statistical significance.
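    A small Monte Carlo sketch in the spirit described: shuffle the regime sequence to build a null distribution of transition counts, keeping regime frequencies fixed (upper tail only; the lower tail would flag the most unlikely transitions instead):

```python
import numpy as np

def transition_pvalues(states, n_states, n_sim=2000, seed=0):
    """Upper-tail Monte Carlo p-value for each transition count, against
    random permutations of the regime sequence."""
    rng = np.random.default_rng(seed)
    states = np.asarray(states)

    def counts(seq):
        c = np.zeros((n_states, n_states))
        np.add.at(c, (seq[:-1], seq[1:]), 1)   # tally i -> j transitions
        return c

    observed = counts(states)
    exceed = np.zeros_like(observed)
    for _ in range(n_sim):
        exceed += counts(rng.permutation(states)) >= observed
    return (1 + exceed) / (1 + n_sim)
```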

  7. Advances in Testing the Statistical Significance of Mediation Effects

    ERIC Educational Resources Information Center

    Mallinckrodt, Brent; Abraham, W. Todd; Wei, Meifen; Russell, Daniel W.

    2006-01-01

    P. A. Frazier, A. P. Tix, and K. E. Barron (2004) highlighted a normal theory method popularized by R. M. Baron and D. A. Kenny (1986) for testing the statistical significance of indirect effects (i.e., mediator variables) in multiple regression contexts. However, simulation studies suggest that this method lacks statistical power relative to some…

  8. Detection of significant protein coevolution.

    PubMed

    Ochoa, David; Juan, David; Valencia, Alfonso; Pazos, Florencio

    2015-07-01

The evolution of proteins cannot be fully understood without taking into account the coevolutionary linkages entangling them. From a practical point of view, coevolution between protein families has been used as a way of detecting protein interactions and functional relationships from genomic information. The most common approach to inferring protein coevolution involves the quantification of phylogenetic tree similarity using a family of methodologies termed mirrortree. In spite of their success, a fundamental problem of these approaches is the lack of an adequate statistical framework to assess the significance of a given coevolutionary score (tree similarity). As a consequence, a number of ad hoc filters and arbitrary thresholds are required in an attempt to obtain a final set of confident coevolutionary signals. In this work, we developed a method for associating confidence estimators (P values) with the tree-similarity scores, using a null model specifically designed for the tree comparison problem. We show how this approach largely improves the quality and coverage (number of pairs that can be evaluated) of the detected coevolution in all stages of the mirrortree workflow, independently of the starting genomic information. This not only leads to a better understanding of protein coevolution and its biological implications, but also yields a highly reliable and comprehensive network of predicted interactions, as well as information on the substructure of macromolecular complexes, using only genomic information. The software and datasets used in this work are freely available at: http://csbg.cnb.csic.es/pMT/. pazos@cnb.csic.es Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  9. Application of Scan Statistics to Detect Suicide Clusters in Australia

    PubMed Central

    Cheung, Yee Tak Derek; Spittal, Matthew J.; Williamson, Michelle Kate; Tung, Sui Jay; Pirkis, Jane

    2013-01-01

Background Suicide clustering occurs when multiple suicide incidents take place in a small area and/or within a short period of time. In spite of multi-national research attention and particular efforts to prepare guidelines for tackling suicide clusters, the broader epidemiology of suicide clustering remains unclear. This study aimed to develop techniques for using scan statistics to detect clusters, with the detection of suicide clusters in Australia as an example. Methods and Findings The scan statistic was applied to detect clusters among suicides occurring between 2004 and 2008. Parameter settings and the scanned area were varied to remedy shortcomings in existing methods. In total, 243 suicides out of 10,176 (2.4%) were identified as belonging to 15 suicide clusters. These clusters were mainly located in the Northern Territory, the northern part of Western Australia, and the northern part of Queensland. Among the 15 clusters, 4 (26.7%) were detected by both the national and state cluster detections, 8 (53.3%) were detected only by the state cluster detection, and 3 (20%) were detected only by the national cluster detection. Conclusions These findings illustrate that the majority of spatial-temporal clusters of suicide were located in the inland northern areas, which have socio-economic deprivation and higher proportions of indigenous people. Discrepancies between national and state/territory cluster detection by scan statistics were due to the contrast in underlying suicide rates across states/territories. Performing both small-area and large-area analyses, and applying multiple parameter settings, may yield the maximum benefit when exploring clusters. PMID:23342098
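
    For intuition, a stripped-down version of the Poisson scan machinery follows: the Kulldorff log-likelihood ratio for a candidate cluster plus a Monte Carlo p-value. Real scan software such as SaTScan evaluates many overlapping spatial-temporal windows; this sketch simplifies by treating each region as its own candidate window, and the case counts are invented:

    ```python
    import numpy as np

    def poisson_scan_llr(c, e, C):
        """Kulldorff log-likelihood ratio for a candidate cluster with c observed
        and e expected cases out of C total (high-rate clusters only)."""
        if c <= e:
            return 0.0
        if c == C:
            return c * np.log(c / e)
        return c * np.log(c / e) + (C - c) * np.log((C - c) / (C - e))

    def mc_pvalue(observed, expected, n_sim=999, seed=0):
        """Monte Carlo p-value for the most anomalous region: redistribute all
        cases over regions in proportion to expected counts and re-scan."""
        rng = np.random.default_rng(seed)
        C = int(observed.sum())
        obs_max = max(poisson_scan_llr(c, e, C)
                      for c, e in zip(observed, expected))
        p = expected / expected.sum()
        exceed = 0
        for _ in range(n_sim):
            sim = rng.multinomial(C, p)
            sim_max = max(poisson_scan_llr(c, e, C)
                          for c, e in zip(sim, expected))
            exceed += sim_max >= obs_max
        return (1 + exceed) / (n_sim + 1)

    observed = np.array([12, 5, 30, 7, 6])     # cases per region (toy numbers)
    expected = np.array([10., 6., 15., 8., 6.])
    print(mc_pvalue(observed, expected))
    ```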

  10. Advances in Significance Testing for Cluster Detection

    NASA Astrophysics Data System (ADS)

    Coleman, Deidra Andrea

Over the past two decades, much attention has been given to data-driven projects such as the Human Genome Project and the development of syndromic surveillance systems. A major component of these types of projects is analyzing the abundance of data. Detecting clusters within the data can be beneficial, as it can lead to the identification of specified sequences of DNA nucleotides that are related to important biological functions or to the locations of epidemics such as disease outbreaks or bioterrorism attacks. Cluster detection techniques require efficient and accurate hypothesis testing procedures. In this dissertation, we improve upon the hypothesis testing procedures for cluster detection by enhancing distributional theory and providing an alternative method for spatial cluster detection using syndromic surveillance data. In Chapter 2, we provide an efficient method to compute the exact distribution of the number and coverage of h-clumps of a collection of words. This method involves defining a Markov chain using a minimal deterministic automaton to reduce the number of states needed for computation. We allow words of the collection to contain other words of the collection, making the method more general. We use our method to compute the distributions of the number and coverage of h-clumps in the Chi motif of H. influenzae. In Chapter 3, we provide an efficient algorithm to compute the exact distribution of multiple window discrete scan statistics for higher-order, multi-state Markovian sequences. This algorithm involves defining a Markov chain to efficiently keep track of probabilities needed to compute p-values of the statistic. We use our algorithm to identify cases where the available approximation does not perform well. We also use our algorithm to detect unusual clusters of made free throw shots by National Basketball Association players during the 2009-2010 regular season. In Chapter 4, we give a procedure to detect outbreaks using syndromic

  11. Assessment of statistical significance and clinical relevance.

    PubMed

    Kieser, Meinhard; Friede, Tim; Gondan, Matthias

    2013-05-10

In drug development, it is well accepted that a successful study will demonstrate not only a statistically significant result but also a clinically relevant effect size. Whereas standard hypothesis tests are used to demonstrate the former, it is less clear how the latter should be established. In the first part of this paper, we consider the responder analysis approach and study the performance of locally optimal rank tests when the outcome distribution is a mixture of responder and non-responder distributions. We find that these tests are quite sensitive to their planning assumptions and therefore have no real advantage over standard tests such as the t-test and the Wilcoxon-Mann-Whitney test, which perform well overall and can be recommended for applications. In the second part, we present a new approach to the assessment of clinical relevance based on the so-called relative effect (or probabilistic index) and derive appropriate sample size formulae for the design of studies aiming to demonstrate both a statistically significant and clinically relevant effect. Referring to recent studies in multiple sclerosis, we discuss potential issues in the application of this approach. Copyright © 2012 John Wiley & Sons, Ltd.

  12. Statistics and Machine Learning based Outlier Detection Techniques for Exoplanets

    NASA Astrophysics Data System (ADS)

    Goel, Amit; Montgomery, Michele

    2015-08-01

Architectures of planetary systems are observable snapshots in time that can indicate the formation and dynamic evolution of planets. The observable key parameters that we consider are planetary mass and orbital period. If planetary masses are much smaller than their host star masses, then Keplerian motion reduces to P^2 = a^3, where P is the orbital period in units of years and a is the semi-major axis in units of Astronomical Units (AU). Keplerian motion works on small scales such as the size of the Solar System but not on large scales such as the size of the Milky Way Galaxy. In this work, for confirmed exoplanets of known stellar mass, planetary mass, orbital period, and stellar age, we analyze the Keplerian motion of systems based on stellar age to determine whether Keplerian motion has an age dependency and to identify outliers. For detecting outliers, we apply several techniques based on statistical and machine learning methods such as probabilistic, linear, and proximity-based models. In probabilistic and statistical models of outliers, the parameters of a closed-form probability distribution are learned in order to detect the outliers. Linear models use regression-analysis-based techniques for detecting outliers. Proximity-based models use distance-based algorithms such as k-nearest neighbour, clustering algorithms such as k-means, or density-based algorithms such as kernel density estimation. In this work, we use unsupervised learning algorithms with only the proximity-based models. In addition, we explore the relative strengths and weaknesses of the various techniques by validating the outliers. The validation criterion for the outliers is whether the ratio of planetary mass to stellar mass is less than 0.001. In this work, we present our statistical analysis of the outliers thus detected.
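
    A minimal sketch of one proximity-based score, together with the stated validation criterion, follows; the four-planet arrays are invented placeholders, and a real analysis would use catalog data and a larger neighbourhood size:

    ```python
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    # Invented placeholder arrays for four confirmed exoplanets
    # (stellar and planetary masses in solar masses, period in years, a in AU).
    m_star   = np.array([1.00, 0.80, 1.20, 0.90])
    m_planet = np.array([9.5e-4, 3.0e-3, 1.0e-5, 2.0e-3])
    period   = np.array([1.00, 0.30, 11.9, 5.20])
    a        = np.array([1.00, 0.45, 5.20, 3.00])

    # Proximity-based outlier score: distance to the nearest neighbour in
    # (log P, log a) space, where Kepler's third law predicts a straight line.
    X = np.column_stack([np.log10(period), np.log10(a)])
    dist, _ = NearestNeighbors(n_neighbors=2).fit(X).kneighbors(X)
    score = dist[:, 1]                      # column 0 is the self-distance of 0

    # Validation criterion quoted in the abstract: the Keplerian approximation
    # is trusted only when m_planet / m_star < 0.001.
    valid = (m_planet / m_star) < 1e-3
    print(np.round(score, 3), valid)
    ```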

  13. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
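
    The flavor of ROS can be conveyed in a few lines: regress log concentrations of uncensored observations on normal scores, then impute the censored tail from the fitted line. This toy version assumes a single detection limit and Blom plotting positions; the published method (and the authors' S/R library) handles multiple detection limits and censored plotting positions properly:

    ```python
    import numpy as np
    from scipy import stats

    def simple_ros(values, censored):
        """Toy regression on order statistics with one detection limit."""
        n = len(values)
        order = np.argsort(values)
        y = np.log(values[order])
        cen = np.asarray(censored)[order]
        pp = (np.arange(1, n + 1) - 0.375) / (n + 0.25)  # Blom plotting positions
        z = stats.norm.ppf(pp)
        slope, intercept, *_ = stats.linregress(z[~cen], y[~cen])
        y[cen] = intercept + slope * z[cen]              # model the censored tail
        est = np.exp(y)
        return est.mean(), est.std(ddof=1)

    vals = np.array([0.5, 0.5, 0.7, 1.2, 2.3, 3.1, 4.8, 9.9])  # 0.5 = detection limit
    cens = np.array([True, True, False, False, False, False, False, False])
    print(simple_ros(vals, cens))
    ```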

  14. A flexibly shaped space-time scan statistic for disease outbreak detection and monitoring.

    PubMed

    Takahashi, Kunihiko; Kulldorff, Martin; Tango, Toshiro; Yih, Katherine

    2008-04-11

    Early detection of disease outbreaks enables public health officials to implement disease control and prevention measures at the earliest possible time. A time periodic geographical disease surveillance system based on a cylindrical space-time scan statistic has been used extensively for disease surveillance along with the SaTScan software. In the purely spatial setting, many different methods have been proposed to detect spatial disease clusters. In particular, some spatial scan statistics are aimed at detecting irregularly shaped clusters which may not be detected by the circular spatial scan statistic. Based on the flexible purely spatial scan statistic, we propose a flexibly shaped space-time scan statistic for early detection of disease outbreaks. The performance of the proposed space-time scan statistic is compared with that of the cylindrical scan statistic using benchmark data. In order to compare their performances, we have developed a space-time power distribution by extending the purely spatial bivariate power distribution. Daily syndromic surveillance data in Massachusetts, USA, are used to illustrate the proposed test statistic. The flexible space-time scan statistic is well suited for detecting and monitoring disease outbreaks in irregularly shaped areas.

  15. Flexible statistical modelling detects clinical functional magnetic resonance imaging activation in partially compliant subjects.

    PubMed

    Waites, Anthony B; Mannfolk, Peter; Shaw, Marnie E; Olsrud, Johan; Jackson, Graeme D

    2007-02-01

Clinical functional magnetic resonance imaging (fMRI) occasionally fails to detect significant activation, often due to variability in task performance. The present study seeks to test whether a more flexible statistical analysis can better detect activation, by accounting for variance associated with variable compliance to the task over time. Experimental results and simulated data both confirm that even at 80% compliance to the task, such a flexible model outperforms standard statistical analysis when assessed using the extent of activation (experimental data), goodness of fit (experimental data), and area under the receiver operating characteristic curve (simulated data). Furthermore, retrospective examination of 14 clinical fMRI examinations reveals that in patients where the standard statistical approach yields activation, there is a measurable gain in model performance in adopting the flexible statistical model, with little or no penalty in lost sensitivity. This indicates that a flexible model should be considered, particularly for clinical patients who may have difficulty complying fully with the study task.

  16. Timescales for detecting a significant acceleration in sea level rise

    PubMed Central

    Haigh, Ivan D.; Wahl, Thomas; Rohling, Eelco J.; Price, René M.; Pattiaratchi, Charitha B.; Calafat, Francisco M.; Dangendorf, Sönke

    2014-01-01

    There is observational evidence that global sea level is rising and there is concern that the rate of rise will increase, significantly threatening coastal communities. However, considerable debate remains as to whether the rate of sea level rise is currently increasing and, if so, by how much. Here we provide new insights into sea level accelerations by applying the main methods that have been used previously to search for accelerations in historical data, to identify the timings (with uncertainties) at which accelerations might first be recognized in a statistically significant manner (if not apparent already) in sea level records that we have artificially extended to 2100. We find that the most important approach to earliest possible detection of a significant sea level acceleration lies in improved understanding (and subsequent removal) of interannual to multidecadal variability in sea level records. PMID:24728012

  17. Statistical power to detect violation of the proportional hazards assumption when using the Cox regression model.

    PubMed

    Austin, Peter C

    2018-01-01

    The use of the Cox proportional hazards regression model is widespread. A key assumption of the model is that of proportional hazards. Analysts frequently test the validity of this assumption using statistical significance testing. However, the statistical power of such assessments is frequently unknown. We used Monte Carlo simulations to estimate the statistical power of two different methods for detecting violations of this assumption. When the covariate was binary, we found that a model-based method had greater power than a method based on cumulative sums of martingale residuals. Furthermore, the parametric nature of the distribution of event times had an impact on power when the covariate was binary. Statistical power to detect a strong violation of the proportional hazards assumption was low to moderate even when the number of observed events was high. In many data sets, power to detect a violation of this assumption is likely to be low to modest.

  18. Statistical power to detect violation of the proportional hazards assumption when using the Cox regression model

    PubMed Central

    Austin, Peter C.

    2017-01-01

    The use of the Cox proportional hazards regression model is widespread. A key assumption of the model is that of proportional hazards. Analysts frequently test the validity of this assumption using statistical significance testing. However, the statistical power of such assessments is frequently unknown. We used Monte Carlo simulations to estimate the statistical power of two different methods for detecting violations of this assumption. When the covariate was binary, we found that a model-based method had greater power than a method based on cumulative sums of martingale residuals. Furthermore, the parametric nature of the distribution of event times had an impact on power when the covariate was binary. Statistical power to detect a strong violation of the proportional hazards assumption was low to moderate even when the number of observed events was high. In many data sets, power to detect a violation of this assumption is likely to be low to modest. PMID:29321694

  19. Testing the Difference of Correlated Agreement Coefficients for Statistical Significance

    ERIC Educational Resources Information Center

    Gwet, Kilem L.

    2016-01-01

    This article addresses the problem of testing the difference between two correlated agreement coefficients for statistical significance. A number of authors have proposed methods for testing the difference between two correlated kappa coefficients, which require either the use of resampling methods or the use of advanced statistical modeling…

  20. Application of hotspot detection using spatial scan statistic: Study of criminality in Indonesia

    NASA Astrophysics Data System (ADS)

    Runadi, Taruga; Widyaningsih, Yekti

    2017-03-01

According to police registry data, the number of criminal cases fluctuated between 2011 and 2013, meaning there was no significant reduction in the number of criminal acts during that period. Local governments need to know whether their areas are at high risk for particular crimes. The objective of this study is to detect hotspot areas for certain criminal cases using the spatial scan statistic. This study analyzed data on 22 types of criminal cases by province in Indonesia that occurred during 2013. The data were obtained from Badan Pusat Statistik (BPS) and released in 2014. Hotspot detection was performed according to the likelihood ratio of the Poisson model using the SaTScanTM software and the results were then mapped using R. The spatial scan statistic successfully detected provinces categorized as hotspots for the 22 crime types analyzed, with p-values less than 0.05. The local governments of provinces detected as hotspot areas for certain crimes should pay more attention to improving security.

  1. Detection by voxel-wise statistical analysis of significant changes in regional cerebral glucose uptake in an APP/PS1 transgenic mouse model of Alzheimer's disease.

    PubMed

    Dubois, Albertine; Hérard, Anne-Sophie; Delatour, Benoît; Hantraye, Philippe; Bonvento, Gilles; Dhenain, Marc; Delzescaux, Thierry

    2010-06-01

Biomarkers and technologies similar to those used in humans are essential for the follow-up of Alzheimer's disease (AD) animal models, particularly for the clarification of mechanisms and the screening and validation of new candidate treatments. In humans, changes in brain metabolism can be detected by 2-deoxy-2-[(18)F]fluoro-D-glucose PET (FDG-PET) and assessed in a user-independent manner with dedicated software, such as Statistical Parametric Mapping (SPM). FDG-PET can be carried out in small animals, but its resolution is low as compared to the size of rodent brain structures. In mouse models of AD, changes in cerebral glucose utilization are usually detected by [(14)C]-2-deoxyglucose (2DG) autoradiography, but this requires prior manual outlining of regions of interest (ROI) on selected sections. Here, we evaluate the feasibility of applying the SPM method to 3D autoradiographic data sets mapping brain metabolic activity in a transgenic mouse model of AD. We report the preliminary results obtained with 4 APP/PS1 (64+/-1 weeks) and 3 PS1 (65+/-2 weeks) mice. We also describe new procedures for the acquisition and use of "blockface" photographs and provide the first demonstration of their value for the 3D reconstruction and spatial normalization of post mortem mouse brain volumes. Despite this limited sample size, our results appear to be meaningful, consistent, and more comprehensive than findings from previously published studies based on conventional ROI-based methods. The establishment of statistical significance at the voxel level, rather than with a user-defined ROI, makes it possible to detect more reliably subtle differences in geometrically complex regions, such as the hippocampus. Our approach is generic and could be easily applied to other biomarkers and extended to other species and applications. Copyright 2010 Elsevier Inc. All rights reserved.

  2. Statistical significance of trace evidence matches using independent physicochemical measurements

    NASA Astrophysics Data System (ADS)

    Almirall, Jose R.; Cole, Michael; Furton, Kenneth G.; Gettinby, George

    1997-02-01

A statistical approach to the significance of glass evidence is proposed using independent physicochemical measurements and chemometrics. Traditional interpretation of the significance of trace evidence matches or exclusions relies on qualitative descriptors such as 'indistinguishable from,' 'consistent with,' 'similar to' etc. By performing physical and chemical measurements which are independent of one another, the significance of object exclusions or matches can be evaluated statistically. One of the problems with this approach is that the human brain is excellent at recognizing and classifying patterns and shapes but performs less well when an object is represented by a numerical list of attributes. Chemometrics can be employed to group similar objects using clustering algorithms and to provide statistical significance in a quantitative manner. This approach is enhanced when population databases exist or can be created, and the data in question can be evaluated against these databases. Since the selection of the variables used and their pre-processing can greatly influence the outcome, several different methods could be employed in order to obtain a more complete picture of the information contained in the data. Presently, we report on the analysis of glass samples using refractive index measurements and the quantitative analysis of the concentrations of the metals Mg, Al, Ca, Fe, Mn, Ba, Sr, Ti and Zr. The extension of this general approach to fiber and paint comparisons is also discussed. This statistical approach should not replace the current interpretative approaches to trace evidence matches or exclusions but rather yields an additional quantitative measure. The lack of sufficient general population databases containing the needed physicochemical measurements and the potential for confusion arising from statistical analysis currently hamper this approach, and ways of overcoming these obstacles are presented.

  3. Fukunaga-Koontz feature transformation for statistical structural damage detection and hierarchical neuro-fuzzy damage localisation

    NASA Astrophysics Data System (ADS)

    Hoell, Simon; Omenzetter, Piotr

    2017-07-01

    Considering jointly damage sensitive features (DSFs) of signals recorded by multiple sensors, applying advanced transformations to these DSFs and assessing systematically their contribution to damage detectability and localisation can significantly enhance the performance of structural health monitoring systems. This philosophy is explored here for partial autocorrelation coefficients (PACCs) of acceleration responses. They are interrogated with the help of the linear discriminant analysis based on the Fukunaga-Koontz transformation using datasets of the healthy and selected reference damage states. Then, a simple but efficient fast forward selection procedure is applied to rank the DSF components with respect to statistical distance measures specialised for either damage detection or localisation. For the damage detection task, the optimal feature subsets are identified based on the statistical hypothesis testing. For damage localisation, a hierarchical neuro-fuzzy tool is developed that uses the DSF ranking to establish its own optimal architecture. The proposed approaches are evaluated experimentally on data from non-destructively simulated damage in a laboratory scale wind turbine blade. The results support our claim of being able to enhance damage detectability and localisation performance by transforming and optimally selecting DSFs. It is demonstrated that the optimally selected PACCs from multiple sensors or their Fukunaga-Koontz transformed versions can not only improve the detectability of damage via statistical hypothesis testing but also increase the accuracy of damage localisation when used as inputs into a hierarchical neuro-fuzzy network. Furthermore, the computational effort of employing these advanced soft computing models for damage localisation can be significantly reduced by using transformed DSFs.

  4. Statistical algorithms improve accuracy of gene fusion detection

    PubMed Central

    Hsieh, Gillian; Bierman, Rob; Szabo, Linda; Lee, Alex Gia; Freeman, Donald E.; Watson, Nathaniel; Sweet-Cordero, E. Alejandro

    2017-01-01

Gene fusions are known to play critical roles in tumor pathogenesis. Yet, sensitive and specific algorithms to detect gene fusions in cancer do not currently exist. In this paper, we present a new statistical algorithm, MACHETE (Mismatched Alignment CHimEra Tracking Engine), which achieves highly sensitive and specific detection of gene fusions from RNA-Seq data, including the highest Positive Predictive Value (PPV) compared to the current state-of-the-art, as assessed in simulated data. We show that the best performing published algorithms either find large numbers of fusions in negative control data or suffer from low sensitivity detecting known driving fusions in gold standard settings, such as EWSR1-FLI1. As proof of principle that MACHETE discovers novel gene fusions with high accuracy in vivo, we mined public data to discover and subsequently PCR validate novel gene fusions missed by other algorithms in the ovarian cancer cell line OVCAR3. These results highlight the gains in accuracy achieved by introducing statistical models into fusion detection, and pave the way for unbiased discovery of potentially driving and druggable gene fusions in primary tumors. PMID:28541529

  5. Empirical Tryout of a New Statistic for Detecting Temporally Inconsistent Responders.

    PubMed

    Kerry, Matthew J

    2018-01-01

Statistical screening of self-report data is often advised to support the quality of analyzed responses, for example, by reducing insufficient effort responding (IER). One recently introduced index, based on Mahalanobis's D for detecting outliers in cross-sectional designs, replaces centered scores with difference scores between repeated-measure items and is termed person temporal consistency (D2ptc). Although the adapted D2ptc index demonstrated usefulness in simulation datasets, it has not been applied to empirical data. The current study addresses D2ptc's low uptake by critically appraising its performance across three empirical applications. Independent samples were selected to represent a range of scenarios commonly encountered by organizational researchers. First, in Sample 1, a repeated measure of future time perspective (FTP) in experienced working adults (age >40 years; n = 620) indicated that temporal inconsistency was significantly related to respondent age and item reverse-scoring. Second, in repeated measures of team efficacy aggregations, D2ptc successfully detected team-level inconsistency across repeated performance cycles. Third, the usefulness of D2ptc was examined in an experimental study dataset on subjective life expectancy, which indicated significantly more stable responding in experimental conditions compared to controls. The empirical findings support D2ptc's flexible and useful application to distinct study designs. Discussion centers on current limitations and further extensions that may be of value to psychologists screening self-report data to strengthen response quality and the meaningfulness of inferences from repeated-measures self-reports. Taken together, the findings support the usefulness of the newly devised statistic for detecting IER and other extreme response patterns.
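
    A rough sketch of a D²-type index on difference scores follows. The exact construction in the paper may differ (for example, in whether the differences are centered); the simulated careless responders and the use of a pseudo-inverse are illustrative choices:

    ```python
    import numpy as np

    def d2_ptc(time1, time2):
        """D²-type inconsistency score from repeated-measure difference scores:
        the Mahalanobis quadratic form is applied to the item-wise differences
        rather than to centered cross-sectional scores."""
        d = np.asarray(time2) - np.asarray(time1)        # persons x items
        inv = np.linalg.pinv(np.cov(d, rowvar=False))    # pseudo-inverse for safety
        return np.einsum('ij,jk,ik->i', d, inv, d)

    rng = np.random.default_rng(1)
    t1 = rng.normal(3.0, 1.0, size=(200, 6))
    t2 = t1 + rng.normal(0.0, 0.3, size=(200, 6))        # mostly consistent retest
    t2[:5] = rng.normal(3.0, 1.0, size=(5, 6))           # five careless responders
    scores = d2_ptc(t1, t2)
    print(np.argsort(scores)[-5:])                       # most inconsistent cases
    ```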

  6. Estimation of the geochemical threshold and its statistical significance

    USGS Publications Warehouse

    Miesch, A.T.

    1981-01-01

A statistic is proposed for estimating the geochemical threshold and its statistical significance, or it may be used to identify a group of extreme values that can be tested for significance by other means. The statistic is the maximum gap between adjacent values in an ordered array after each gap has been adjusted for the expected frequency. The values in the ordered array are geochemical values transformed by either ln(?? - ??) or ln(?? - ??) and then standardized so that the mean is zero and the variance is unity. The expected frequency is taken from a fitted normal curve with unit area. The midpoint of an adjusted gap that exceeds the corresponding critical value may be taken as an estimate of the geochemical threshold, and the associated probability indicates the likelihood that the threshold separates two geochemical populations. The adjusted gap test may fail to identify threshold values if the variation tends to be continuous from background values to the higher values that reflect mineralized ground. However, the test will serve to identify other anomalies that may be too subtle to have been noted by other means. © 1981.
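
    The adjusted-gap idea can be sketched as follows, under the reading that each gap between adjacent standardized order statistics is weighted by the expected frequency from the fitted normal at the gap midpoint; the transformation, the weighting detail, and the toy data are assumptions, and the published statistic has its own tabulated critical values:

    ```python
    import numpy as np
    from scipy import stats

    def adjusted_gaps(x):
        """Standardize (already transformed) values, then weight each gap
        between adjacent order statistics by the fitted normal density at the
        gap midpoint, so gaps in the sparse tails are not automatically largest."""
        z = np.sort((x - x.mean()) / x.std(ddof=1))
        gaps = np.diff(z)
        mid = (z[:-1] + z[1:]) / 2
        adj = gaps * stats.norm.pdf(mid) * len(z)   # adjust by expected frequency
        i = np.argmax(adj)
        return adj[i], mid[i]                       # candidate threshold midpoint

    logged = np.log(np.array([3, 4, 4, 5, 6, 7, 8, 9, 11, 14, 45, 60.]))
    print(adjusted_gaps(logged))
    ```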

  7. Damage detection of engine bladed-disks using multivariate statistical analysis

    NASA Astrophysics Data System (ADS)

    Fang, X.; Tang, J.

    2006-03-01

The timely detection of damage in aero-engine bladed-disks is an extremely important and challenging research topic. Bladed-disks have high modal density and, particularly, their vibration responses are subject to significant uncertainties due to manufacturing tolerance (blade-to-blade difference or mistuning), operating condition change and sensor noise. In this study, we present a new methodology for the on-line damage detection of engine bladed-disks using their vibratory responses during spin-up or spin-down operations, which can be measured by the blade-tip-timing sensing technique. We apply a principal component analysis (PCA)-based approach for data compression, feature extraction, and denoising. The non-model based damage detection is achieved by analyzing the change between response features of the healthy structure and of the damaged one. We facilitate such comparison by incorporating Hotelling's T2 statistic, which yields damage declaration with a given confidence level. The effectiveness of the method is demonstrated by case studies.
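
    A compact sketch of the PCA-plus-T² pipeline on synthetic features follows; the feature matrix, the number of retained components, and the 99% F-based control limit are illustrative choices rather than the authors' settings:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    healthy = rng.normal(size=(300, 20))            # baseline response features
    test = np.vstack([rng.normal(size=(10, 20)),
                      rng.normal(0.8, 1.0, size=(3, 20))])  # last rows "damaged"

    # PCA on the healthy data (compression / feature extraction / denoising)
    mu = healthy.mean(axis=0)
    _, s, Vt = np.linalg.svd(healthy - mu, full_matrices=False)
    k = 5                                           # retained principal components
    P = Vt[:k].T
    lam = s[:k] ** 2 / (len(healthy) - 1)           # variances of the k score axes

    # Hotelling's T^2 of each new observation in the score space
    scores = (test - mu) @ P
    T2 = np.sum(scores ** 2 / lam, axis=1)

    # F-based 99% control limit for a future observation
    n = len(healthy)
    limit = k * (n - 1) * (n + 1) / (n * (n - k)) * stats.f.ppf(0.99, k, n - k)
    print(T2 > limit)                               # True = damage declared
    ```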

  8. Decadal power in land air temperatures: Is it statistically significant?

    NASA Astrophysics Data System (ADS)

    Thejll, Peter A.

    2001-12-01

    The geographical distribution and properties of the well-known 10-11 year signal in terrestrial temperature records is investigated. By analyzing the Global Historical Climate Network data for surface air temperatures we verify that the signal is strongest in North America and is similar in nature to that reported earlier by R. G. Currie. The decadal signal is statistically significant for individual stations, but it is not possible to show that the signal is statistically significant globally, using strict tests. In North America, during the twentieth century, the decadal variability in the solar activity cycle is associated with the decadal part of the North Atlantic Oscillation index series in such a way that both of these signals correspond to the same spatial pattern of cooling and warming. A method for testing statistical results with Monte Carlo trials on data fields with specified temporal structure and specific spatial correlation retained is presented.

  9. Mass detection, localization and estimation for wind turbine blades based on statistical pattern recognition

    NASA Astrophysics Data System (ADS)

    Colone, L.; Hovgaard, M. K.; Glavind, L.; Brincker, R.

    2018-07-01

    A method for mass change detection on wind turbine blades using natural frequencies is presented. The approach is based on two statistical tests. The first test decides if there is a significant mass change and the second test is a statistical group classification based on Linear Discriminant Analysis. The frequencies are identified by means of Operational Modal Analysis using natural excitation. Based on the assumption of Gaussianity of the frequencies, a multi-class statistical model is developed by combining finite element model sensitivities in 10 classes of change location on the blade, the smallest area being 1/5 of the span. The method is experimentally validated for a full scale wind turbine blade in a test setup and loaded by natural wind. Mass change from natural causes was imitated with sand bags and the algorithm was observed to perform well with an experimental detection rate of 1, localization rate of 0.88 and mass estimation rate of 0.72.

  10. Applying a statistical PTB detection procedure to complement the gold standard.

    PubMed

    Noor, Norliza Mohd; Yunus, Ashari; Bakar, S A R Abu; Hussin, Amran; Rijal, Omar Mohd

    2011-04-01

    This paper investigates a novel statistical discrimination procedure to detect PTB when the gold standard requirement is taken into consideration. Archived data were used to establish two groups of patients which are the control and test group. The control group was used to develop the statistical discrimination procedure using four vectors of wavelet coefficients as feature vectors for the detection of pulmonary tuberculosis (PTB), lung cancer (LC), and normal lung (NL). This discrimination procedure was investigated using the test group where the number of sputum positive and sputum negative cases that were correctly classified as PTB cases were noted. The proposed statistical discrimination method is able to detect PTB patients and LC with high true positive fraction. The method is also able to detect PTB patients that are sputum negative and therefore may be used as a complement to the gold standard. Copyright © 2010 Elsevier Ltd. All rights reserved.

  11. Detection of reflecting surfaces by a statistical model

    NASA Astrophysics Data System (ADS)

    He, Qiang; Chu, Chee-Hung H.

    2009-02-01

Remote sensing is widely used to assess the destruction from natural disasters and to plan relief and recovery operations. How to automatically extract useful features and segment interesting objects from digital images, including remote sensing imagery, has become a critical task for image understanding. Unfortunately, current research on automated feature extraction largely ignores contextual information. As a result, the fidelity of the attributes assigned to features and objects of interest cannot be guaranteed. In this paper, we present an exploration of meaningful object extraction that integrates reflecting surfaces. Detection of specular reflecting surfaces can be useful in target identification and can then be applied to environmental monitoring, disaster prediction and analysis, military applications, and counter-terrorism. Our method is based on a statistical model that captures the statistical properties of specular reflecting surfaces. The reflecting surfaces are then detected through cluster analysis.

  12. Sibling Competition & Growth Tradeoffs. Biological vs. Statistical Significance.

    PubMed

    Kramer, Karen L; Veile, Amanda; Otárola-Castillo, Erik

    2016-01-01

    Early childhood growth has many downstream effects on future health and reproduction and is an important measure of offspring quality. While a tradeoff between family size and child growth outcomes is theoretically predicted in high-fertility societies, empirical evidence is mixed. This is often attributed to phenotypic variation in parental condition. However, inconsistent study results may also arise because family size confounds the potentially differential effects that older and younger siblings can have on young children's growth. Additionally, inconsistent results might reflect that the biological significance associated with different growth trajectories is poorly understood. This paper addresses these concerns by tracking children's monthly gains in height and weight from weaning to age five in a high fertility Maya community. We predict that: 1) as an aggregate measure family size will not have a major impact on child growth during the post weaning period; 2) competition from young siblings will negatively impact child growth during the post weaning period; 3) however because of their economic value, older siblings will have a negligible effect on young children's growth. Accounting for parental condition, we use linear mixed models to evaluate the effects that family size, younger and older siblings have on children's growth. Congruent with our expectations, it is younger siblings who have the most detrimental effect on children's growth. While we find statistical evidence of a quantity/quality tradeoff effect, the biological significance of these results is negligible in early childhood. Our findings help to resolve why quantity/quality studies have had inconsistent results by showing that sibling competition varies with sibling age composition, not just family size, and that biological significance is distinct from statistical significance.

  13. Quantile regression for the statistical analysis of immunological data with many non-detects.

    PubMed

    Eilers, Paul H C; Röder, Esther; Savelkoul, Huub F J; van Wijk, Roy Gerth

    2012-07-07

Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an application to real data from a clinical trial. We show that by using quantile regression, groups can be compared and that meaningful linear trends can be computed, even if more than half of the data consists of non-detects.
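
    A minimal sketch with statsmodels follows; the simulated titres, detection limit, and modeled quantile are invented for illustration. The key point is that the value substituted for non-detects does not affect the fit, provided the modeled quantile lies above the censoring proportion:

    ```python
    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(2)
    group = np.repeat([0.0, 1.0], 100)                  # e.g. placebo vs treated
    true = np.exp(rng.normal(0.2 * group, 1.0))         # skewed immunological titre
    DL = 0.8                                            # detection limit
    y = np.where(true < DL, DL / 2, true)               # substitute non-detects;
    # any value below DL yields the same fit as long as the modelled quantile
    # sits above the censoring proportion.

    X = sm.add_constant(group)
    fit = sm.QuantReg(y, X).fit(q=0.75)                 # model the 75th percentile
    print(fit.params)
    ```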

  14. Your Chi-Square Test Is Statistically Significant: Now What?

    ERIC Educational Resources Information Center

    Sharpe, Donald

    2015-01-01

    Applied researchers have employed chi-square tests for more than one hundred years. This paper addresses the question of how one should follow a statistically significant chi-square test result in order to determine the source of that result. Four approaches were evaluated: calculating residuals, comparing cells, ransacking, and partitioning. Data…
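
    Of the four approaches, calculating residuals is the easiest to show in code. The sketch below computes adjusted standardized residuals after a significant chi-square test on a made-up contingency table; cells with residuals beyond roughly +/-2 are the likely sources of the result:

    ```python
    import numpy as np
    from scipy.stats import chi2_contingency

    table = np.array([[30, 10, 20],
                      [20, 25, 15]])
    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")

    # Adjusted standardized residuals: approximately standard normal under
    # independence, so large |values| flag the cells driving the result.
    rows = table.sum(axis=1, keepdims=True)
    cols = table.sum(axis=0, keepdims=True)
    N = table.sum()
    adj = (table - expected) / np.sqrt(expected * (1 - rows / N) * (1 - cols / N))
    print(np.round(adj, 2))
    ```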

  15. Statistical Significance Testing in Second Language Research: Basic Problems and Suggestions for Reform

    ERIC Educational Resources Information Center

    Norris, John M.

    2015-01-01

    Traditions of statistical significance testing in second language (L2) quantitative research are strongly entrenched in how researchers design studies, select analyses, and interpret results. However, statistical significance tests using "p" values are commonly misinterpreted by researchers, reviewers, readers, and others, leading to…

  16. Sibling Competition & Growth Tradeoffs. Biological vs. Statistical Significance

    PubMed Central

    Kramer, Karen L.; Veile, Amanda; Otárola-Castillo, Erik

    2016-01-01

    Early childhood growth has many downstream effects on future health and reproduction and is an important measure of offspring quality. While a tradeoff between family size and child growth outcomes is theoretically predicted in high-fertility societies, empirical evidence is mixed. This is often attributed to phenotypic variation in parental condition. However, inconsistent study results may also arise because family size confounds the potentially differential effects that older and younger siblings can have on young children’s growth. Additionally, inconsistent results might reflect that the biological significance associated with different growth trajectories is poorly understood. This paper addresses these concerns by tracking children’s monthly gains in height and weight from weaning to age five in a high fertility Maya community. We predict that: 1) as an aggregate measure family size will not have a major impact on child growth during the post weaning period; 2) competition from young siblings will negatively impact child growth during the post weaning period; 3) however because of their economic value, older siblings will have a negligible effect on young children’s growth. Accounting for parental condition, we use linear mixed models to evaluate the effects that family size, younger and older siblings have on children’s growth. Congruent with our expectations, it is younger siblings who have the most detrimental effect on children’s growth. While we find statistical evidence of a quantity/quality tradeoff effect, the biological significance of these results is negligible in early childhood. Our findings help to resolve why quantity/quality studies have had inconsistent results by showing that sibling competition varies with sibling age composition, not just family size, and that biological significance is distinct from statistical significance. PMID:26938742

  17. Quantification and statistical significance analysis of group separation in NMR-based metabonomics studies

    PubMed Central

    Goodpaster, Aaron M.; Kennedy, Michael A.

    2015-01-01

    Currently, no standard metrics are used to quantify cluster separation in PCA or PLS-DA scores plots for metabonomics studies or to determine if cluster separation is statistically significant. Lack of such measures makes it virtually impossible to compare independent or inter-laboratory studies and can lead to confusion in the metabonomics literature when authors putatively identify metabolites distinguishing classes of samples based on visual and qualitative inspection of scores plots that exhibit marginal separation. While previous papers have addressed quantification of cluster separation in PCA scores plots, none have advocated routine use of a quantitative measure of separation that is supported by a standard and rigorous assessment of whether or not the cluster separation is statistically significant. Here quantification and statistical significance of separation of group centroids in PCA and PLS-DA scores plots are considered. The Mahalanobis distance is used to quantify the distance between group centroids, and the two-sample Hotelling's T2 test is computed for the data, related to an F-statistic, and then an F-test is applied to determine if the cluster separation is statistically significant. We demonstrate the value of this approach using four datasets containing various degrees of separation, ranging from groups that had no apparent visual cluster separation to groups that had no visual cluster overlap. Widespread adoption of such concrete metrics to quantify and evaluate the statistical significance of PCA and PLS-DA cluster separation would help standardize reporting of metabonomics data. PMID:26246647
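
    The centroid-separation test sketched below implements the two-sample Hotelling's T² with its exact F reference distribution, applied to hypothetical two-dimensional scores; real use would take the retained PCA or PLS-DA score columns:

    ```python
    import numpy as np
    from scipy import stats

    def hotelling_two_sample(A, B):
        """Two-sample Hotelling T^2 on (e.g.) the first few score dimensions;
        returns the F statistic and p-value for centroid separation."""
        n1, p = A.shape
        n2, _ = B.shape
        d = A.mean(axis=0) - B.mean(axis=0)
        S = ((n1 - 1) * np.cov(A, rowvar=False)
             + (n2 - 1) * np.cov(B, rowvar=False)) / (n1 + n2 - 2)
        T2 = (n1 * n2) / (n1 + n2) * d @ np.linalg.solve(S, d)
        F = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
        return F, stats.f.sf(F, p, n1 + n2 - p - 1)

    rng = np.random.default_rng(3)
    scores_ctrl = rng.normal(0.0, 1.0, size=(25, 2))   # PC1/PC2 scores, group 1
    scores_case = rng.normal(0.7, 1.0, size=(25, 2))   # PC1/PC2 scores, group 2
    print(hotelling_two_sample(scores_ctrl, scores_case))
    ```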

  18. Autoregressive statistical pattern recognition algorithms for damage detection in civil structures

    NASA Astrophysics Data System (ADS)

    Yao, Ruigen; Pakzad, Shamim N.

    2012-08-01

    Statistical pattern recognition has recently emerged as a promising set of complementary methods to system identification for automatic structural damage assessment. Its essence is to use well-known concepts in statistics for boundary definition of different pattern classes, such as those for damaged and undamaged structures. In this paper, several statistical pattern recognition algorithms using autoregressive models, including statistical control charts and hypothesis testing, are reviewed as potentially competitive damage detection techniques. To enhance the performance of statistical methods, new feature extraction techniques using model spectra and residual autocorrelation, together with resampling-based threshold construction methods, are proposed. Subsequently, simulated acceleration data from a multi degree-of-freedom system is generated to test and compare the efficiency of the existing and proposed algorithms. Data from laboratory experiments conducted on a truss and a large-scale bridge slab model are then used to further validate the damage detection methods and demonstrate the superior performance of proposed algorithms.
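
    A bare-bones version of the AR-plus-resampling idea follows: fit an AR model to baseline data, build a threshold for a residual statistic by resampling, and flag new windows that exceed it. The model order, window length, and RMS statistic are illustrative choices, not the paper's:

    ```python
    import numpy as np

    def ar_fit(x, p):
        """Least-squares AR(p) fit; returns coefficients and one-step residuals."""
        X = np.column_stack([x[p - k: len(x) - k] for k in range(1, p + 1)])
        y = x[p:]
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        return coef, y - X @ coef

    rng = np.random.default_rng(4)
    baseline = rng.normal(size=4000)                # healthy-state acceleration
    coef, resid = ar_fit(baseline, p=6)

    # Resampling-based threshold on the residual RMS of short windows
    win = 200
    rms = lambda r: np.sqrt(np.mean(r ** 2))
    boot = [rms(rng.choice(resid, size=win)) for _ in range(2000)]
    threshold = np.quantile(boot, 0.99)

    # Monitoring: pass new data through the baseline model and compare residuals
    new = rng.normal(scale=1.3, size=win + 6)       # stand-in for damaged response
    Xn = np.column_stack([new[6 - k: len(new) - k] for k in range(1, 7)])
    print(rms(new[6:] - Xn @ coef) > threshold)     # True = damage flagged
    ```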

  19. Why Are People Bad at Detecting Randomness? A Statistical Argument

    ERIC Educational Resources Information Center

    Williams, Joseph J.; Griffiths, Thomas L.

    2013-01-01

    Errors in detecting randomness are often explained in terms of biases and misconceptions. We propose and provide evidence for an account that characterizes the contribution of the inherent statistical difficulty of the task. Our account is based on a Bayesian statistical analysis, focusing on the fact that a random process is a special case of…

  20. Statistical detection of systematic election irregularities.

    PubMed

    Klimek, Peter; Yegorov, Yuri; Hanel, Rudolf; Thurner, Stefan

    2012-10-09

Democratic societies are built around the principle of free and fair elections and the idea that each citizen's vote should count equally. National elections can be regarded as large-scale social experiments, where people are grouped into usually large numbers of electoral districts and vote according to their preferences. The large number of samples implies statistical consequences for the polling results, which can be used to identify election irregularities. Using a suitable data representation, we find that vote distributions of elections with alleged fraud show a kurtosis substantially exceeding the kurtosis of normal elections, depending on the level of data aggregation. As an example, we show that reported irregularities in recent Russian elections are, indeed, well-explained by systematic ballot stuffing. We develop a parametric model quantifying the extent to which fraudulent mechanisms are present. We formulate a parametric test detecting these statistical properties in election results. Remarkably, this technique produces robust outcomes with respect to the resolution of the data and therefore allows for cross-country comparisons.
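
    The kurtosis signature can be demonstrated in a few lines on synthetic vote-share data; the mixture parameters below are invented, and the paper's parametric fraud model is considerably richer than this:

    ```python
    import numpy as np
    from scipy.stats import kurtosis

    rng = np.random.default_rng(5)
    clean = rng.normal(0.55, 0.08, size=10_000)            # unimodal winner shares
    stuffed = np.concatenate([rng.normal(0.55, 0.08, size=9_000),
                              rng.normal(0.95, 0.02, size=1_000)])  # stuffed units

    for name, v in [("clean", clean), ("with stuffing", stuffed)]:
        v = np.clip(v, 0.0, 1.0)                           # shares live in [0, 1]
        print(f"{name}: excess kurtosis = {kurtosis(v):.2f}")
    ```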

  1. Leads Detection Using Mixture Statistical Distribution Based CRF Algorithm from Sentinel-1 Dual Polarization SAR Imagery

    NASA Astrophysics Data System (ADS)

    Zhang, Yu; Li, Fei; Zhang, Shengkai; Zhu, Tingting

    2017-04-01

Synthetic Aperture Radar (SAR) is important for polar remote sensing because it provides continuous observations day and night in all weather. SAR can be used to extract surface roughness information, characterized by the variance of dielectric properties across polarization channels, which makes it possible to observe different ice types and surface structures for deformation analysis. In November 2016, the 33rd cruise of the Chinese National Antarctic Research Expedition (CHINARE) set sail for the Antarctic sea ice zone. Accurate knowledge of the spatial distribution of leads in the sea ice zone is essential for route planning and ship navigation. In this study, the semantic relationship between leads and sea ice categories is described by a Conditional Random Field (CRF) model, and lead characteristics are modeled by statistical distributions in SAR imagery. In the proposed algorithm, a mixture-statistical-distribution-based CRF is developed by considering the contextual information and the statistical characteristics of sea ice to improve leads detection in Sentinel-1A dual polarization SAR imagery. The unary and pairwise potentials in the CRF model are constructed by integrating the posterior probabilities estimated from the statistical distributions. For parameter estimation, the Method of Logarithmic Cumulants (MoLC) is used to estimate the parameters of the individual component distributions, and the iterative Expectation Maximization (EM) algorithm is used to estimate the parameters of the mixture-distribution-based CRF model. For posterior probability inference, a graph-cut energy minimization method is adopted for the initial leads detection. Post-processing procedures, including an aspect ratio constraint and spatial smoothing, are applied to improve the visual result. The proposed method is validated on Sentinel-1A SAR C-band Extra Wide Swath (EW) Ground Range Detected (GRD) imagery with a

  2. A flexible spatial scan statistic with a restricted likelihood ratio for detecting disease clusters.

    PubMed

    Tango, Toshiro; Takahashi, Kunihiko

    2012-12-30

    Spatial scan statistics are widely used tools for detection of disease clusters. Especially, the circular spatial scan statistic proposed by Kulldorff (1997) has been utilized in a wide variety of epidemiological studies and disease surveillance. However, as it cannot detect noncircular, irregularly shaped clusters, many authors have proposed different spatial scan statistics, including the elliptic version of Kulldorff's scan statistic. The flexible spatial scan statistic proposed by Tango and Takahashi (2005) has also been used for detecting irregularly shaped clusters. However, this method sets a feasible limitation of a maximum of 30 nearest neighbors for searching candidate clusters because of heavy computational load. In this paper, we show a flexible spatial scan statistic implemented with a restricted likelihood ratio proposed by Tango (2008) to (1) eliminate the limitation of 30 nearest neighbors and (2) to have surprisingly much less computational time than the original flexible spatial scan statistic. As a side effect, it is shown to be able to detect clusters with any shape reasonably well as the relative risk of the cluster becomes large via Monte Carlo simulation. We illustrate the proposed spatial scan statistic with data on mortality from cerebrovascular disease in the Tokyo Metropolitan area, Japan. Copyright © 2012 John Wiley & Sons, Ltd.

  3. Fusion of Local Statistical Parameters for Buried Underwater Mine Detection in Sonar Imaging

    NASA Astrophysics Data System (ADS)

    Maussang, F.; Rombaut, M.; Chanussot, J.; Hétet, A.; Amate, M.

    2008-12-01

Detection of buried underwater objects, and especially mines, is a crucial strategic task. Images provided by sonar systems that can penetrate the sea floor, such as synthetic aperture sonar (SAS), are of great interest for the detection and classification of such objects. However, the signal-to-noise ratio is fairly low, and advanced information processing is required for correct and reliable detection of the echoes generated by the objects. The detection method proposed in this paper is based on a data-fusion architecture using belief theory. The input data of this architecture are local statistical characteristics extracted from SAS data, corresponding to the first-, second-, third-, and fourth-order statistical properties of the sonar images, respectively. The interest of these parameters is derived from a statistical model of the sonar data. Numerical criteria are also proposed to estimate the detection performances and to validate the method.

  4. Assessing segmentation processes by click detection: online measure of statistical learning, or simple interference?

    PubMed

    Franco, Ana; Gaillard, Vinciane; Cleeremans, Axel; Destrebecqz, Arnaud

    2015-12-01

    Statistical learning can be used to extract the words from continuous speech. Gómez, Bion, and Mehler (Language and Cognitive Processes, 26, 212-223, 2011) proposed an online measure of statistical learning: They superimposed auditory clicks on a continuous artificial speech stream made up of a random succession of trisyllabic nonwords. Participants were instructed to detect these clicks, which could be located either within or between words. The results showed that, over the length of exposure, reaction times (RTs) increased more for within-word than for between-word clicks. This result has been accounted for by means of statistical learning of the between-word boundaries. However, even though statistical learning occurs without an intention to learn, it nevertheless requires attentional resources. Therefore, this process could be affected by a concurrent task such as click detection. In the present study, we evaluated the extent to which the click detection task indeed reflects successful statistical learning. Our results suggest that the emergence of RT differences between within- and between-word click detection is neither systematic nor related to the successful segmentation of the artificial language. Therefore, instead of being an online measure of learning, the click detection task seems to interfere with the extraction of statistical regularities.

  5. Anomaly detection in hyperspectral imagery: statistics vs. graph-based algorithms

    NASA Astrophysics Data System (ADS)

    Berkson, Emily E.; Messinger, David W.

    2016-05-01

    Anomaly detection (AD) algorithms are frequently applied to hyperspectral imagery, but different algorithms produce different outlier results depending on the image scene content and the assumed background model. This work provides the first comparison of anomaly score distributions between common statistics-based anomaly detection algorithms (RX and subspace-RX) and the graph-based Topological Anomaly Detector (TAD). Anomaly scores in statistical AD algorithms should theoretically approximate a chi-squared distribution; however, this is rarely the case with real hyperspectral imagery. The expected distribution of scores found with graph-based methods remains unclear. We also look for general trends in algorithm performance with varied scene content. Three separate scenes were extracted from the hyperspectral MegaScene image taken over downtown Rochester, NY with the VIS-NIR-SWIR ProSpecTIR instrument. In order of most to least cluttered, we study an urban, suburban, and rural scene. The three AD algorithms were applied to each scene, and the distributions of the most anomalous 5% of pixels were compared. We find that subspace-RX performs better than RX, because the data becomes more normal when the highest variance principal components are removed. We also see that compared to statistical detectors, anomalies detected by TAD are easier to separate from the background. Due to their different underlying assumptions, the statistical and graph-based algorithms highlighted different anomalies within the urban scene. These results will lead to a deeper understanding of these algorithms and their applicability across different types of imagery.
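
    For reference, a global RX detector is simply a Mahalanobis distance in spectral space, which is why its scores would follow a chi-squared distribution if the background were truly multivariate normal. A minimal sketch on a synthetic cube with one implanted anomaly (the cube dimensions and anomaly strength are invented):

    ```python
    import numpy as np

    def rx_scores(cube):
        """Global RX anomaly detector: Mahalanobis distance of each pixel's
        spectrum from the scene mean (cube shaped rows x cols x bands)."""
        H, W, B = cube.shape
        X = cube.reshape(-1, B).astype(float)
        mu = X.mean(axis=0)
        inv = np.linalg.pinv(np.cov(X, rowvar=False))
        Z = X - mu
        return np.einsum('ij,jk,ik->i', Z, inv, Z).reshape(H, W)

    rng = np.random.default_rng(6)
    scene = rng.normal(size=(64, 64, 30))
    scene[10, 10] += 4.0                        # implant a spectral anomaly
    scores = rx_scores(scene)
    print(np.unravel_index(scores.argmax(), scores.shape))  # -> (10, 10)
    ```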

  6. Statistical Significance of Periodicity and Log-Periodicity with Heavy-Tailed Correlated Noise

    NASA Astrophysics Data System (ADS)

    Zhou, Wei-Xing; Sornette, Didier

    We estimate the probability that random noise, of several plausible standard distributions, creates a false alarm that a periodicity (or log-periodicity) is found in a time series. The solution of this problem is already known for independent Gaussian distributed noise. We investigate more general situations with non-Gaussian correlated noises and present synthetic tests on the detectability and statistical significance of periodic components. A periodic component of a time series is usually detected by some sort of Fourier analysis. Here, we use the Lomb periodogram analysis, which is suitable for, and outperforms Fourier transforms on, unevenly sampled time series. We examine the false-alarm probability of the largest spectral peak of the Lomb periodogram in the presence of power-law distributed noises and of short-range and long-range fractional-Gaussian noises. Increasing heavy-tailedness tends to decrease, whereas increasing correlations describing persistence tends to increase, the false-alarm probability of finding a large spurious Lomb peak. Increasing anti-persistence tends to decrease the false-alarm probability. We also study the interplay between heavy-tailedness and long-range correlations. In order to fully determine if a Lomb peak signals a genuine rather than a spurious periodicity, one should in principle characterize the Lomb peak height, its width and its relations to other peaks in the complete spectrum. As a step towards this full characterization, we construct the joint distribution of the frequency position (relative to other peaks) and of the height of the highest peak of the power spectrum. We also provide the distributions of the ratio of the highest Lomb peak to the second highest one. Using the insight obtained by the present statistical study, we re-examine previously reported claims of ``log-periodicity'' and find that the credibility for log-periodicity in 2D-freely decaying turbulence is weakened while it is strengthened for fracture, for the
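
    The false-alarm estimate described above can be sketched with a small Monte Carlo experiment around SciPy's Lomb periodogram. The Student-t noise below is only a stand-in for a power-law-tailed distribution, and the sampling grid, trial count, and signal amplitude are assumptions.

    ```python
    import numpy as np
    from scipy.signal import lombscargle

    rng = np.random.default_rng(1)
    t = np.sort(rng.uniform(0.0, 100.0, size=300))   # uneven sampling times
    freqs = np.linspace(0.05, 5.0, 500)              # angular frequency grid

    def max_lomb_peak(y):
        y = (y - y.mean()) / y.std()
        return lombscargle(t, y, freqs, normalize=True).max()

    # Monte Carlo null distribution of the highest peak under heavy-tailed
    # noise (Student-t with 3 degrees of freedom as a fat-tailed stand-in).
    null_peaks = np.array([max_lomb_peak(rng.standard_t(3, size=t.size))
                           for _ in range(500)])

    observed = max_lomb_peak(rng.standard_t(3, size=t.size) +
                             0.5 * np.sin(2.0 * t))  # weak periodic component
    false_alarm_prob = (null_peaks >= observed).mean()
    print(false_alarm_prob)
    ```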

  7. Multilayer Statistical Intrusion Detection in Wireless Networks

    NASA Astrophysics Data System (ADS)

    Hamdi, Mohamed; Meddeb-Makhlouf, Amel; Boudriga, Noureddine

    2008-12-01

    The rapid proliferation of mobile applications and services has introduced new vulnerabilities that do not exist in fixed wired networks. Traditional security mechanisms, such as access control and encryption, turn out to be inefficient in modern wireless networks. Given the shortcomings of these protection mechanisms, an important line of research focuses on intrusion detection systems (IDSs). This paper proposes a multilayer statistical intrusion detection framework for wireless networks. The architecture is well suited to wireless networks because the underlying detection models rely on radio parameters and traffic models. Accurate correlation between radio and traffic anomalies enhances the efficiency of the IDS. A radio signal fingerprinting technique based on the maximal overlap discrete wavelet transform (MODWT) is developed. Moreover, a geometric clustering algorithm is presented. Depending on the characteristics of the fingerprinting technique, the clustering algorithm permits control of the false-positive and false-negative rates. Finally, simulation experiments have been carried out to validate the proposed IDS.

  8. Statistical fingerprinting for malware detection and classification

    DOEpatents

    Prowell, Stacy J.; Rathgeb, Christopher T.

    2015-09-15

    A system detects malware in a computing architecture with an unknown pedigree. The system includes a first computing device having a known pedigree and operating free of malware. The first computing device executes a series of instrumented functions that, when executed, provide a statistical baseline representative of the time it takes a known software application to run on a computing device of known pedigree. A second computing device executes a second series of instrumented functions that, when executed, provides an actual time representative of the time the same application takes to run on the second computing device. The system detects malware when there is a difference in execution times between the first and the second computing devices.
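
    A minimal sketch of the timing-baseline idea, under loose assumptions: the workload, sample sizes, and the k-sigma decision rule are hypothetical placeholders rather than the patented method's actual instrumentation.

    ```python
    import time
    import statistics

    def instrumented_run(fn, n=50):
        """Time n executions of fn and return the samples (seconds)."""
        samples = []
        for _ in range(n):
            start = time.perf_counter()
            fn()
            samples.append(time.perf_counter() - start)
        return samples

    def workload():                    # hypothetical instrumented function
        sum(i * i for i in range(10000))

    baseline = instrumented_run(workload)   # on the known-pedigree device
    observed = instrumented_run(workload)   # on the device under test

    # Flag malware if the observed mean drifts more than k standard
    # deviations from the trusted baseline (k is an assumed tuning knob).
    k = 4.0
    mu, sigma = statistics.mean(baseline), statistics.stdev(baseline)
    print(abs(statistics.mean(observed) - mu) > k * sigma)
    ```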

  9. Statistical significance versus clinical relevance.

    PubMed

    van Rijn, Marieke H C; Bech, Anneke; Bouyer, Jean; van den Brand, Jan A J G

    2017-04-01

    In March this year, the American Statistical Association (ASA) posted a statement on the correct use of P-values, in response to a growing concern that the P-value is commonly misused and misinterpreted. We aim to translate these warnings given by the ASA into a language more easily understood by clinicians and researchers without a deep background in statistics. Moreover, we intend to illustrate the limitations of P-values, even when used and interpreted correctly, and bring more attention to the clinical relevance of study findings using two recently reported studies as examples. We argue that P-values are often misinterpreted. A common mistake is saying that P < 0.05 means that the null hypothesis is false, and P ≥ 0.05 means that the null hypothesis is true. The correct interpretation of a P-value of 0.05 is that if the null hypothesis were indeed true, a similar or more extreme result would occur 5% of the time upon repeating the study in a similar sample. In other words, the P-value informs about the likelihood of the data given the null hypothesis and not the other way around. A possible alternative related to the P-value is the confidence interval (CI). It provides more information on the magnitude of an effect and the imprecision with which that effect was estimated. However, there is no magic bullet to replace P-values and stop erroneous interpretation of scientific results. Scientists and readers alike should make themselves familiar with the correct, nuanced interpretation of statistical tests, P-values and CIs. © The Author 2017. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
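
    A small simulated example of the point about confidence intervals: the P-value alone says nothing about the magnitude or precision of an effect, whereas the CI conveys both. All numbers are illustrative.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    treated = rng.normal(1.2, 5.0, size=200)   # simulated outcomes
    control = rng.normal(0.0, 5.0, size=200)

    t, p = stats.ttest_ind(treated, control)
    diff = treated.mean() - control.mean()
    se = np.sqrt(treated.var(ddof=1) / treated.size +
                 control.var(ddof=1) / control.size)
    ci = (diff - 1.96 * se, diff + 1.96 * se)

    # The CI reports the estimated effect size and its imprecision,
    # which a bare P-value cannot convey.
    print(f"P = {p:.3f}, difference = {diff:.2f}, "
          f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
    ```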

  10. Statistical detection of systematic election irregularities

    PubMed Central

    Klimek, Peter; Yegorov, Yuri; Hanel, Rudolf; Thurner, Stefan

    2012-01-01

    Democratic societies are built around the principle of free and fair elections, in which each citizen’s vote should count equally. National elections can be regarded as large-scale social experiments, where people are grouped into usually large numbers of electoral districts and vote according to their preferences. The large number of samples implies statistical consequences for the polling results, which can be used to identify election irregularities. Using a suitable data representation, we find that vote distributions of elections with alleged fraud show a kurtosis substantially exceeding the kurtosis of normal elections, depending on the level of data aggregation. As an example, we show that reported irregularities in recent Russian elections are, indeed, well-explained by systematic ballot stuffing. We develop a parametric model quantifying the extent to which fraudulent mechanisms are present. We formulate a parametric test detecting these statistical properties in election results. Remarkably, this technique produces robust outcomes with respect to the resolution of the data and therefore allows for cross-country comparisons. PMID:23010929
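
    The kurtosis fingerprint can be illustrated with a deliberately crude simulation; the ballot-stuffing mechanism below is a toy assumption and not the parametric model developed in the paper.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)

    # Simulated share of votes for the winner across 5000 districts.
    fair = np.clip(rng.normal(0.45, 0.08, size=5000), 0, 1)

    # Crude ballot-stuffing model: in 10% of districts the winner's share
    # is pushed toward 1 (illustrative only, not the paper's model).
    stuffed = fair.copy()
    idx = rng.choice(stuffed.size, size=500, replace=False)
    stuffed[idx] = np.clip(stuffed[idx] + rng.uniform(0.3, 0.5, size=500), 0, 1)

    # Excess kurtosis rises sharply once a "stuffed" mode fattens the tail.
    print(stats.kurtosis(fair), stats.kurtosis(stuffed))
    ```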

  11. Detection of nonlinear transfer functions by the use of Gaussian statistics

    NASA Technical Reports Server (NTRS)

    Sheppard, J. G.

    1972-01-01

    The possibility of using on-line signal statistics to detect electronic equipment nonlinearities is discussed. The results of an investigation using Gaussian statistics are presented, and a nonlinearity test that uses ratios of the moments of a Gaussian random variable is developed and discussed. An outline for further investigation is presented.

  12. A scan statistic for binary outcome based on hypergeometric probability model, with an application to detecting spatial clusters of Japanese encephalitis.

    PubMed

    Zhao, Xing; Zhou, Xiao-Hua; Feng, Zijian; Guo, Pengfei; He, Hongyan; Zhang, Tao; Duan, Lei; Li, Xiaosong

    2013-01-01

    As a useful tool for geographical cluster detection of events, the spatial scan statistic is widely applied in many fields and plays an increasingly important role. The classic version of the spatial scan statistic for binary outcomes was developed by Kulldorff, based on the Bernoulli or the Poisson probability model. In this paper, we apply the hypergeometric probability model to construct the likelihood function under the null hypothesis. Compared with existing methods, constructing the likelihood function under the null hypothesis is an alternative, indirect way to identify potential clusters, with the test statistic taken as the extreme value of the likelihood function. As in Kulldorff's methods, we adopt a Monte Carlo test for significance. Both methods were applied to detect spatial clusters of Japanese encephalitis in Sichuan province, China, in 2009, and the detected clusters were identical. A simulation with independent benchmark data indicates that the test statistic based on the hypergeometric model outperforms Kulldorff's statistics for clusters of high population density or large size; otherwise, Kulldorff's statistics are superior.
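
    A hedged one-dimensional sketch of the idea: score windows of consecutive regions with a hypergeometric tail probability and assess the maximum score with a Monte Carlo test. Real spatial scan statistics use geographic windows, and the data here are synthetic.

    ```python
    import numpy as np
    from scipy.stats import hypergeom

    rng = np.random.default_rng(4)
    pop = np.full(60, 30)                       # subjects tested per region
    cases = rng.binomial(30, 0.05, size=60)     # cases per region
    cases[25:30] += 5                           # planted cluster

    N, K = pop.sum(), cases.sum()

    def scan_stat(cases, pop, width=5):
        # Score each window by how improbable its case count is under the
        # hypergeometric null; keep the most extreme window.
        best = 0.0
        for i in range(len(cases) - width + 1):
            n, k = pop[i:i + width].sum(), cases[i:i + width].sum()
            best = max(best, -hypergeom.logsf(k - 1, N, K, n))
        return best

    obs = scan_stat(cases, pop)

    # Monte Carlo test: redistribute the K cases at random over all subjects.
    null = [scan_stat(rng.multivariate_hypergeometric(pop, K), pop)
            for _ in range(199)]
    p_value = (1 + sum(s >= obs for s in null)) / 200
    print(p_value)
    ```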

  13. Statistical Significance and Effect Size: Two Sides of a Coin.

    ERIC Educational Resources Information Center

    Fan, Xitao

    This paper suggests that statistical significance testing and effect size are two sides of the same coin; they complement each other, but do not substitute for one another. Good research practice requires that both should be taken into consideration to make sound quantitative decisions. A Monte Carlo simulation experiment was conducted, and a…

  14. Publication of statistically significant research findings in prosthodontics & implant dentistry in the context of other dental specialties.

    PubMed

    Papageorgiou, Spyridon N; Kloukos, Dimitrios; Petridis, Haralampos; Pandis, Nikolaos

    2015-10-01

    To assess the hypothesis that there is excessive reporting of statistically significant studies published in prosthodontic and implantology journals, which could indicate selective publication. The last 30 issues of 9 journals in prosthodontics and implant dentistry were hand-searched for articles with statistical analyses. The percentages of significant and non-significant results were tabulated by parameter of interest. Univariable/multivariable logistic regression analyses were applied to identify possible predictors of reporting statistically significant findings. The results of this study were compared with similar studies in dentistry with random-effects meta-analyses. Of the 2323 included studies, 71% reported statistically significant results, with the significant results ranging from 47% to 86%. Multivariable modeling identified geographical area and involvement of a statistician as predictors of statistically significant results. Compared to interventional studies, the odds that in vitro and observational studies would report statistically significant results were increased by 1.20 times (OR: 2.20, 95% CI: 1.66-2.92) and 0.35 times (OR: 1.35, 95% CI: 1.05-1.73), respectively. The probability of statistically significant results from randomized controlled trials was significantly lower compared to various study designs (difference: 30%, 95% CI: 11-49%). Likewise, the probability of statistically significant results in prosthodontics and implant dentistry was lower compared to other dental specialties, but this result did not reach statistical significance (P > 0.05). The majority of studies identified in the fields of prosthodontics and implant dentistry presented statistically significant results. The same trend existed in publications of other specialties in dentistry. Copyright © 2015 Elsevier Ltd. All rights reserved.

  15. Significant Statistics: Viewed with a Contextual Lens

    ERIC Educational Resources Information Center

    Tait-McCutcheon, Sandi

    2010-01-01

    This paper examines the pedagogical and organisational changes three lead teachers made to their statistics teaching and learning programs. The lead teachers posed the research question: What would the effect of contextually integrating statistical investigations and literacies into other curriculum areas be on student achievement? By finding the…

  16. Tables of square-law signal detection statistics for Hann spectra with 50 percent overlap

    NASA Technical Reports Server (NTRS)

    Deans, Stanley R.; Cullers, D. Kent

    1991-01-01

    The Search for Extraterrestrial Intelligence, currently being planned by NASA, will require that an enormous amount of data be analyzed in real time by special purpose hardware. It is expected that overlapped Hann data windows will play an important role in this analysis. In order to understand the statistical implication of this approach, it has been necessary to compute detection statistics for overlapped Hann spectra. Tables of signal detection statistics are given for false alarm rates from 10^-14 to 10^-1 and signal detection probabilities from 0.50 to 0.99; the number of computed spectra ranges from 4 to 2000.
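
    The generic square-law calculation behind such tables can be sketched with SciPy's central and noncentral chi-squared distributions. This sketch ignores the correlation between 50-percent-overlapped Hann spectra that the paper's tables actually account for, and the spectrum count and SNR are placeholders.

    ```python
    from scipy.stats import chi2, ncx2

    # Square-law detector summing m independent spectra: the noise-only
    # statistic is chi-squared with 2m degrees of freedom; signal-plus-noise
    # is noncentral chi-squared with noncentrality 2*m*snr.
    m, snr = 16, 0.5                        # placeholder values
    for pfa in (1e-14, 1e-7, 1e-1):
        thresh = chi2.isf(pfa, df=2 * m)    # threshold for this false alarm rate
        pd = ncx2.sf(thresh, df=2 * m, nc=2 * m * snr)
        print(f"PFA={pfa:.0e}  threshold={thresh:.1f}  PD={pd:.3f}")
    ```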

  17. The Detection of Focal Liver Lesions Using Abdominal CT: A Comparison of Image Quality Between Adaptive Statistical Iterative Reconstruction V and Adaptive Statistical Iterative Reconstruction.

    PubMed

    Lee, Sangyun; Kwon, Heejin; Cho, Jihan

    2016-12-01

    To investigate image quality characteristics of abdominal computed tomography (CT) scans reconstructed with adaptive statistical iterative reconstruction V (ASIR-V) vs the currently applied adaptive statistical iterative reconstruction (ASIR). This institutional review board-approved study included 35 consecutive patients who underwent CT of the abdomen. Among these 35 patients, 27 with focal liver lesions underwent abdomen CT with a 128-slice multidetector unit using the following parameters: fixed noise index of 30, 1.25 mm slice thickness, 120 kVp, and a gantry rotation time of 0.5 seconds. CT images were analyzed depending on the method of reconstruction: ASIR (30%, 50%, and 70%) vs ASIR-V (30%, 50%, and 70%). Three radiologists independently assessed randomized images in a blinded manner. Imaging sets were compared for focal lesion detection numbers, overall image quality, and objective noise with a paired sample t test. Interobserver agreement was assessed with the intraclass correlation coefficient. The detection of small focal liver lesions (<10 mm) was significantly higher when ASIR-V was used when compared to ASIR (P < 0.001). Subjective image noise, artifact, and objective image noise in liver were generally significantly better for ASIR-V compared to ASIR, especially in 50% ASIR-V. Image sharpness and diagnostic acceptability were significantly worse in 70% ASIR-V compared to various levels of ASIR. Images analyzed using 50% ASIR-V were significantly better than the three series of ASIR or the other ASIR-V conditions at providing diagnostically acceptable CT scans without compromising image quality and in the detection of focal liver lesions. Copyright © 2016 The Association of University Radiologists. Published by Elsevier Inc. All rights reserved.

  18. Adaptive Statistical Iterative Reconstruction-Applied Ultra-Low-Dose CT with Radiography-Comparable Radiation Dose: Usefulness for Lung Nodule Detection.

    PubMed

    Yoon, Hyun Jung; Chung, Myung Jin; Hwang, Hye Sun; Moon, Jung Won; Lee, Kyung Soo

    2015-01-01

    To assess the performance of adaptive statistical iterative reconstruction (ASIR)-applied ultra-low-dose CT (ULDCT) in detecting small lung nodules. Thirty patients underwent both ULDCT and standard dose CT (SCT). After determining the reference standard nodules, five observers, blinded to the reference standard reading results, independently evaluated SCT and both subsets of ASIR- and filtered back projection (FBP)-driven ULDCT images. Data assessed by observers were compared statistically. Converted effective doses in SCT and ULDCT were 2.81 ± 0.92 and 0.17 ± 0.02 mSv, respectively. A total of 114 lung nodules were detected on SCT as a standard reference. There was no statistically significant difference in sensitivity between ASIR-driven ULDCT and SCT for three out of the five observers (p = 0.678, 0.735, < 0.01, 0.038, and 0.868 for observers 1, 2, 3, 4, and 5, respectively). The sensitivity of FBP-driven ULDCT was significantly lower than that of ASIR-driven ULDCT in three out of the five observers (p < 0.01 for three observers, and p = 0.064 and 0.146 for two observers). In jackknife alternative free-response receiver operating characteristic analysis, the mean values of figure-of-merit (FOM) for FBP, ASIR-driven ULDCT, and SCT were 0.682, 0.772, and 0.821, respectively, and there were no significant differences in FOM values between ASIR-driven ULDCT and SCT (p = 0.11), but the FOM value of FBP-driven ULDCT was significantly lower than that of ASIR-driven ULDCT and SCT (p = 0.01 and 0.00). Adaptive statistical iterative reconstruction-driven ULDCT delivering a radiation dose of only 0.17 mSv offers acceptable sensitivity in nodule detection compared with SCT and has better performance than FBP-driven ULDCT.

  20. "What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"

    ERIC Educational Resources Information Center

    Ozturk, Elif

    2012-01-01

    The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…

  1. Statistical lamb wave localization based on extreme value theory

    NASA Astrophysics Data System (ADS)

    Harley, Joel B.

    2018-04-01

    Guided wave localization methods based on delay-and-sum imaging, matched field processing, and other techniques have been designed and researched to create images that locate and describe structural damage. The maximum value of these images typically represents an estimated damage location. Yet, it is often unclear if this maximum value, or any other value in the image, is a statistically significant indicator of damage. Furthermore, there are currently few, if any, approaches to assess the statistical significance of guided wave localization images. As a result, we present statistical delay-and-sum and statistical matched field processing localization methods to create statistically significant images of damage. Our framework uses constant false alarm rate statistics and extreme value theory to detect damage with little prior information. We demonstrate our methods with in situ guided wave data from an aluminum plate to detect two 0.75 cm diameter holes. Our results show an expected improvement in statistical significance as the number of sensors increases. With seventeen sensors, both methods successfully detect damage with statistical significance.
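
    A minimal sketch of the extreme-value step, assuming synthetic data: fit a Gumbel law to the maxima of damage-free localization images and set a constant-false-alarm-rate threshold for the observed image maximum. Image sizes, rates, and the pretend damage response are assumptions.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)

    # Maxima of 200 damage-free localization images (synthetic stand-ins).
    noise_maxima = np.array([rng.normal(size=(64, 64)).max()
                             for _ in range(200)])

    # Extreme value theory: maxima of many pixels are well modeled by Gumbel.
    loc, scale = stats.gumbel_r.fit(noise_maxima)

    # Constant-false-alarm-rate threshold at a 1% false alarm probability.
    threshold = stats.gumbel_r.isf(0.01, loc, scale)

    image_max = rng.normal(size=(64, 64)).max() + 1.5  # pretend damage response
    print(image_max > threshold)
    ```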

  2. Statistics provide guidance for indigenous organic carbon detection on Mars missions.

    PubMed

    Sephton, Mark A; Carter, Jonathan N

    2014-08-01

    Data from the Viking and Mars Science Laboratory missions indicate the presence of organic compounds that are not definitively martian in origin. Both contamination and confounding mineralogies have been suggested as alternatives to indigenous organic carbon. Intuitive thought suggests that we are repeatedly obtaining data that confirms the same level of uncertainty. Bayesian statistics may suggest otherwise. If an organic detection method has a true positive to false positive ratio greater than one, then repeated organic matter detection progressively increases the probability of indigeneity. Bayesian statistics also reveal that methods with higher ratios of true positives to false positives give higher overall probabilities and that detection of organic matter in a sample with a higher prior probability of indigenous organic carbon produces greater confidence. Bayesian statistics, therefore, provide guidance for the planning and operation of organic carbon detection activities on Mars. Suggestions for future organic carbon detection missions and instruments are as follows: (i) On Earth, instruments should be tested with analog samples of known organic content to determine their true positive to false positive ratios. (ii) On the mission, for an instrument with a true positive to false positive ratio above one, it should be recognized that each positive detection of organic carbon will result in a progressive increase in the probability of indigenous organic carbon being present; repeated measurements, therefore, can overcome some of the deficiencies of a less-than-definitive test. (iii) For a fixed number of analyses, the highest true positive to false positive ratio method or instrument will provide the greatest probability that indigenous organic carbon is present. (iv) On Mars, analyses should concentrate on samples with highest prior probability of indigenous organic carbon; intuitive desires to contrast samples of high prior probability and low prior
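
    The repeated-detection argument is a plain Bayes update driven by the true-positive to false-positive ratio; the rates and prior below are illustrative assumptions.

    ```python
    def update(prior, tp_rate=0.8, fp_rate=0.2):
        """Posterior probability of indigenous organic carbon after one
        positive detection (true/false positive rates are assumptions)."""
        evidence = tp_rate * prior + fp_rate * (1 - prior)
        return tp_rate * prior / evidence

    # With a true:false positive ratio of 4 (> 1), each positive detection
    # raises the probability that the organic carbon is indigenous.
    p = 0.1                           # assumed prior for the sampled site
    for detection in range(1, 6):
        p = update(p)
        print(f"after detection {detection}: P(indigenous) = {p:.3f}")
    ```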

  3. Detection of Clostridium difficile infection clusters, using the temporal scan statistic, in a community hospital in southern Ontario, Canada, 2006-2011.

    PubMed

    Faires, Meredith C; Pearl, David L; Ciccotelli, William A; Berke, Olaf; Reid-Smith, Richard J; Weese, J Scott

    2014-05-12

    In hospitals, Clostridium difficile infection (CDI) surveillance relies on unvalidated guidelines or threshold criteria to identify outbreaks. This can result in false-positive and -negative cluster alarms. The application of statistical methods to identify and understand CDI clusters may be a useful alternative or complement to standard surveillance techniques. The objectives of this study were to investigate the utility of the temporal scan statistic for detecting CDI clusters and determine if there are significant differences in the rate of CDI cases by month, season, and year in a community hospital. Bacteriology reports of patients identified with a CDI from August 2006 to February 2011 were collected. For patients detected with CDI from March 2010 to February 2011, stool specimens were obtained. Clostridium difficile isolates were characterized by ribotyping and investigated for the presence of toxin genes by PCR. CDI clusters were investigated using a retrospective temporal scan test statistic. Statistically significant clusters were compared to known CDI outbreaks within the hospital. A negative binomial regression model was used to identify associations between year, season, month and the rate of CDI cases. Overall, 86 CDI cases were identified. Eighteen specimens were analyzed and nine ribotypes were classified with ribotype 027 (n = 6) the most prevalent. The temporal scan statistic identified significant CDI clusters at the hospital (n = 5), service (n = 6), and ward (n = 4) levels (P ≤ 0.05). Three clusters were concordant with the one C. difficile outbreak identified by hospital personnel. Two clusters were identified as potential outbreaks. The negative binomial model indicated years 2007-2010 (P ≤ 0.05) had decreased CDI rates compared to 2006 and spring had an increased CDI rate compared to the fall (P = 0.023). Application of the temporal scan statistic identified several clusters, including potential outbreaks not detected by hospital

  4. A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic

    PubMed Central

    Qi, Jin-Peng; Qi, Jie; Zhang, Qing

    2016-01-01

    Change-Point (CP) detection has attracted considerable attention in the fields of data mining and statistics; it is very meaningful to discuss how to quickly and efficiently detect abrupt change from large-scale bioelectric signals. Currently, most of the existing methods, like Kolmogorov-Smirnov (KS) statistic and so forth, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed as BSTcA and BSTcD, are constructed by multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to leaf nodes of two BSTs. The studies on both the synthetic time series samples and the real electroencephalograph (EEG) recordings indicate that the proposed BSTKS can detect abrupt change more quickly and efficiently than KS, t-statistic (t), and Singular-Spectrum Analyses (SSA) methods, with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy out of four methods. This study suggests that the proposed BSTKS is very helpful for useful information inspection on all kinds of bioelectric time series signals. PMID:27413364
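
    The exhaustive KS baseline that BSTKS is designed to accelerate can be sketched directly: scan candidate split points and keep the one that maximizes the two-sample KS statistic. The binary-search-tree construction and Haar wavelet machinery of BSTKS itself are not reproduced here.

    ```python
    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(6)
    signal = np.concatenate([rng.normal(0.0, 1.0, 400),
                             rng.normal(1.5, 1.0, 400)])  # change at index 400

    # Brute-force KS scan over candidate change points; BSTKS replaces this
    # exhaustive search with a root-to-leaf descent of two search trees.
    best_idx, best_stat = None, 0.0
    for i in range(50, signal.size - 50):
        stat = ks_2samp(signal[:i], signal[i:]).statistic
        if stat > best_stat:
            best_idx, best_stat = i, stat
    print(best_idx, round(best_stat, 3))
    ```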

  6. Which Statistic Should Be Used to Detect Item Preknowledge When the Set of Compromised Items Is Known?

    PubMed

    Sinharay, Sandip

    2017-09-01

    Benefiting from item preknowledge is a major type of fraudulent behavior during educational assessments. Belov suggested the posterior shift statistic for detection of item preknowledge and showed its performance to be better on average than that of seven other statistics for detection of item preknowledge for a known set of compromised items. Sinharay suggested a statistic based on the likelihood ratio test for detection of item preknowledge; the advantage of the statistic is that its null distribution is known. Results from simulated and real data and adaptive and nonadaptive tests are used to demonstrate that the Type I error rate and power of the statistic based on the likelihood ratio test are very similar to those of the posterior shift statistic. Thus, the statistic based on the likelihood ratio test appears promising in detecting item preknowledge when the set of compromised items is known.

  7. Using a higher criticism statistic to detect modest effects in a genome-wide study of rheumatoid arthritis

    PubMed Central

    2009-01-01

    In high-dimensional studies such as genome-wide association studies, the correction for multiple testing in order to control total type I error results in decreased power to detect modest effects. We present a new analytical approach based on the higher criticism statistic that allows identification of the presence of modest effects. We apply our method to the genome-wide study of rheumatoid arthritis provided in the Genetic Analysis Workshop 16 Problem 1 data set. There is evidence for unknown bias in this study that could be explained by the presence of undetected modest effects. We compared the asymptotic and empirical thresholds for the higher criticism statistic. Using the asymptotic threshold we detected the presence of modest effects genome-wide. We also detected modest effects using the 90th percentile of the empirical null distribution as a threshold; however, there is no such evidence when the 95th and 99th percentiles were used. While the higher criticism method suggests that there is some evidence for modest effects, interpreting individual single-nucleotide polymorphisms with significant higher criticism statistics is of undetermined value. The goal of higher criticism is to alert the researcher that genetic effects remain to be discovered and to promote the use of more targeted and powerful studies to detect the remaining effects. PMID:20018032
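
    For reference, a common form of the higher criticism statistic (the Donoho-Jin version over the smaller ordered p-values; conventions vary) can be computed in a few lines. The mixture proportions below are illustrative.

    ```python
    import numpy as np

    def higher_criticism(pvalues):
        """Donoho-Jin higher criticism over the smallest half of the
        ordered p-values (one common convention; variants exist)."""
        p = np.sort(np.asarray(pvalues))
        n = p.size
        i = np.arange(1, n + 1)
        hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
        return hc[: n // 2].max()

    rng = np.random.default_rng(7)
    null_p = rng.uniform(size=10000)                 # no effects anywhere
    mixed_p = null_p.copy()
    mixed_p[:100] = rng.uniform(0, 0.01, size=100)   # 1% modest effects
    print(higher_criticism(null_p), higher_criticism(mixed_p))
    ```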

  8. Structure-guided statistical textural distinctiveness for salient region detection in natural images.

    PubMed

    Scharfenberger, Christian; Wong, Alexander; Clausi, David A

    2015-01-01

    We propose a simple yet effective structure-guided statistical textural distinctiveness approach to salient region detection. Our method uses a multilayer approach to analyze the structural and textural characteristics of natural images as important features for salient region detection from a scale point of view. To represent the structural characteristics, we abstract the image using structured image elements and extract rotational-invariant neighborhood-based textural representations to characterize each element by an individual texture pattern. We then learn a set of representative texture atoms for sparse texture modeling and construct a statistical textural distinctiveness matrix to determine the distinctiveness between all representative texture atom pairs in each layer. Finally, we determine saliency maps for each layer based on the occurrence probability of the texture atoms and their respective statistical textural distinctiveness and fuse them to compute a final saliency map. Experimental results using four public data sets and a variety of performance evaluation metrics show that our approach provides promising results when compared with existing salient region detection approaches.

  9. Fostering Students' Statistical Literacy through Significant Learning Experience

    ERIC Educational Resources Information Center

    Krishnan, Saras

    2015-01-01

    A major objective of statistics education is to develop students' statistical literacy that enables them to be educated users of data in context. Teaching statistics in today's educational settings is not an easy feat because teachers have a huge task in keeping up with the demands of the new generation of learners. The present day students have…

  10. Distinguishing between statistical significance and practical/clinical meaningfulness using statistical inference.

    PubMed

    Wilkinson, Michael

    2014-03-01

    Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.

  11. Statistics Refresher for Molecular Imaging Technologists, Part 2: Accuracy of Interpretation, Significance, and Variance.

    PubMed

    Farrell, Mary Beth

    2018-06-01

    This article is the second part of a continuing education series reviewing basic statistics that nuclear medicine and molecular imaging technologists should understand. In this article, the statistics for evaluating interpretation accuracy, significance, and variance are discussed. Throughout the article, actual statistics are pulled from the published literature. We begin by explaining 2 methods for quantifying interpretive accuracy: interreader and intrareader reliability. Agreement among readers can be expressed simply as a percentage. However, the Cohen κ-statistic is a more robust measure of agreement that accounts for chance. The higher the κ-statistic is, the higher is the agreement between readers. When 3 or more readers are being compared, the Fleiss κ-statistic is used. Significance testing determines whether the difference between 2 conditions or interventions is meaningful. Statistical significance is usually expressed using a number called a probability ( P ) value. Calculation of P value is beyond the scope of this review. However, knowing how to interpret P values is important for understanding the scientific literature. Generally, a P value of less than 0.05 is considered significant and indicates that the results of the experiment are due to more than just chance. Variance, standard deviation (SD), confidence interval, and standard error (SE) explain the dispersion of data around a mean of a sample drawn from a population. SD is commonly reported in the literature. A small SD indicates that there is not much variation in the sample data. Many biologic measurements fall into what is referred to as a normal distribution taking the shape of a bell curve. In a normal distribution, 68% of the data will fall within 1 SD, 95% will fall within 2 SDs, and 99.7% will fall within 3 SDs. Confidence interval defines the range of possible values within which the population parameter is likely to lie and gives an idea of the precision of the statistic being
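
    The Cohen κ-statistic described above can be computed directly from a 2x2 agreement table; the reader counts below are made up for illustration.

    ```python
    import numpy as np

    # Hypothetical agreement table for two readers (rows: reader A
    # normal/abnormal, columns: reader B normal/abnormal).
    table = np.array([[40, 5],
                      [10, 45]], dtype=float)

    n = table.sum()
    observed = np.trace(table) / n                  # raw percent agreement
    # Chance agreement: product of the matching row and column marginals.
    expected = (table.sum(axis=0) * table.sum(axis=1)).sum() / n**2
    kappa = (observed - expected) / (1 - expected)
    print(f"agreement = {observed:.2f}, kappa = {kappa:.2f}")
    ```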

  12. Evaluation of the Gini Coefficient in Spatial Scan Statistics for Detecting Irregularly Shaped Clusters

    PubMed Central

    Kim, Jiyu; Jung, Inkyung

    2017-01-01

    Spatial scan statistics with circular or elliptic scanning windows are commonly used for cluster detection in various applications, such as the identification of geographical disease clusters from epidemiological data. It has been pointed out that the method may have difficulty in correctly identifying non-compact, arbitrarily shaped clusters. In this paper, we evaluated the Gini coefficient for detecting irregularly shaped clusters through a simulation study. The Gini coefficient, the use of which in spatial scan statistics was recently proposed, is a criterion measure for optimizing the maximum reported cluster size. Our simulation study results showed that using the Gini coefficient works better than the original spatial scan statistic for identifying irregularly shaped clusters, by reporting an optimized and refined collection of clusters rather than a single larger cluster. We have provided a real data example that seems to support the simulation results. We think that using the Gini coefficient in spatial scan statistics can be helpful for the detection of irregularly shaped clusters. PMID:28129368

  13. Using the Bootstrap Method for a Statistical Significance Test of Differences between Summary Histograms

    NASA Technical Reports Server (NTRS)

    Xu, Kuan-Man

    2006-01-01

    A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
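
    A compact sketch of the proposed procedure, under assumed data: the Euclidean distance between normalized summary histograms is the test statistic, and its significance level comes from bootstrap resampling of the pooled individual histograms.

    ```python
    import numpy as np

    rng = np.random.default_rng(8)
    bins = np.linspace(-4, 4, 21)

    # Two ensembles of individual histograms (e.g., one per cloud object).
    ens_a = [np.histogram(rng.normal(0.0, 1.0, 500), bins)[0] for _ in range(60)]
    ens_b = [np.histogram(rng.normal(0.3, 1.0, 500), bins)[0] for _ in range(60)]

    def distance(a, b):
        """Euclidean distance between normalized summary histograms."""
        sa, sb = np.sum(a, axis=0), np.sum(b, axis=0)
        return np.linalg.norm(sa / sa.sum() - sb / sb.sum())

    observed = distance(ens_a, ens_b)

    # Bootstrap under the null: resample individual histograms from the pool.
    pool = np.array(ens_a + ens_b)
    null = []
    for _ in range(999):
        pick = rng.integers(0, len(pool), size=len(pool))
        null.append(distance(pool[pick[:60]], pool[pick[60:]]))
    p_value = (1 + np.sum(np.array(null) >= observed)) / 1000
    print(p_value)
    ```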

  14. Mass spectrometry-based protein identification with accurate statistical significance assignment.

    PubMed

    Alves, Gelio; Yu, Yi-Kuo

    2015-03-01

    Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of scientific conclusions thus drawn. From the perspective of mass spectrometry-based proteomics, even though accurate statistics for peptide identification can now be achieved, accurate protein level statistics remain challenging. We have constructed a protein ID method that combines peptide evidences of a candidate protein based on a rigorous formula derived earlier; in this formula the database P-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein level E-value, eliminating the need of using empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Sorić formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with other methods tested. The source code, implemented in C++ on a linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.

  15. Enhancing the mathematical properties of new haplotype homozygosity statistics for the detection of selective sweeps.

    PubMed

    Garud, Nandita R; Rosenberg, Noah A

    2015-06-01

    Soft selective sweeps represent an important form of adaptation in which multiple haplotypes bearing adaptive alleles rise to high frequency. Most statistical methods for detecting selective sweeps from genetic polymorphism data, however, have focused on identifying hard selective sweeps in which a favored allele appears on a single haplotypic background; these methods might be underpowered to detect soft sweeps. Among exceptions is the set of haplotype homozygosity statistics introduced for the detection of soft sweeps by Garud et al. (2015). These statistics, examining frequencies of multiple haplotypes in relation to each other, include H12, a statistic designed to identify both hard and soft selective sweeps, and H2/H1, a statistic that conditional on high H12 values seeks to distinguish between hard and soft sweeps. A challenge in the use of H2/H1 is that its range depends on the associated value of H12, so that equal H2/H1 values might provide different levels of support for a soft sweep model at different values of H12. Here, we enhance the H12 and H2/H1 haplotype homozygosity statistics for selective sweep detection by deriving the upper bound on H2/H1 as a function of H12, thereby generating a statistic that normalizes H2/H1 to lie between 0 and 1. Through a reanalysis of resequencing data from inbred lines of Drosophila, we show that the enhanced statistic both strengthens interpretations obtained with the unnormalized statistic and leads to empirical insights that are less readily apparent without the normalization. Copyright © 2015 Elsevier Inc. All rights reserved.
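
    The raw statistics are straightforward to compute from haplotype frequencies using the definitions of Garud et al. (2015); the normalization via the derived upper bound on H2/H1 is not reproduced here, and the example frequencies are illustrative.

    ```python
    import numpy as np

    def haplotype_homozygosity(freqs):
        """H1, H12 and H2/H1 from haplotype frequencies (Garud et al. 2015)."""
        p = np.sort(np.asarray(freqs))[::-1]    # descending frequencies
        h1 = np.sum(p ** 2)
        h12 = (p[0] + p[1]) ** 2 + np.sum(p[2:] ** 2)
        h2 = h1 - p[0] ** 2
        return h1, h12, h2 / h1

    hard_sweep = [0.80, 0.05, 0.05, 0.05, 0.05]  # one dominant haplotype
    soft_sweep = [0.45, 0.40, 0.05, 0.05, 0.05]  # two dominant haplotypes
    for name, f in [("hard", hard_sweep), ("soft", soft_sweep)]:
        h1, h12, ratio = haplotype_homozygosity(f)
        print(f"{name}: H12 = {h12:.3f}, H2/H1 = {ratio:.3f}")
    ```

    Both examples give a high H12, while H2/H1 is small for the hard sweep and large for the soft sweep, which is exactly the distinction the statistics are designed to draw.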

  16. Statistical detection of EEG synchrony using empirical bayesian inference.

    PubMed

    Singh, Archana K; Asoh, Hideki; Takeda, Yuji; Phillips, Steven

    2015-01-01

    There is growing interest in understanding how the brain utilizes synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV) between EEG signals is a popular method for quantifying such synchronizations and elucidating their role in cognitive tasks. However, high-dimensionality in PLV data incurs a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR) suffer severe loss of power, because they fail to exploit complex dependence structure between hypotheses that vary in spectral, temporal and spatial dimension. Previously, we showed that a hierarchical FDR and optimal discovery procedures could be effectively applied for PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new Empirical Bayes perspective and propose the application of the local FDR method (locFDR; Efron, 2001) for PLV synchrony analysis to compute FDR as a posterior probability that an observed statistic belongs to a null hypothesis. We demonstrate the application of Efron's Empirical Bayes approach for PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both simulation and experimental analyses. Our simulation results showed that locFDR can effectively control false positives without compromising on the power of PLV synchrony inference. Our results from applying locFDR to the experimental data detected more significant discoveries than our previously proposed methods, whereas the standard FDR method failed to detect any significant discoveries.
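
    Phase-locking values themselves are simple to compute; a common Hilbert-transform definition on synthetic signals is sketched below (the locFDR machinery is not reproduced).

    ```python
    import numpy as np
    from scipy.signal import hilbert

    rng = np.random.default_rng(9)
    t = np.linspace(0, 2, 1000)
    common = np.sin(2 * np.pi * 10 * t)            # shared 10 Hz rhythm
    x = common + 0.5 * rng.normal(size=t.size)
    y = common + 0.5 * rng.normal(size=t.size)

    # Phase-locking value: magnitude of the mean unit phasor of the
    # instantaneous phase difference (1 = perfect locking, 0 = none).
    phase_x = np.angle(hilbert(x))
    phase_y = np.angle(hilbert(y))
    plv = np.abs(np.mean(np.exp(1j * (phase_x - phase_y))))
    print(round(plv, 3))
    ```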

  17. Comparison of probability statistics for automated ship detection in SAR imagery

    NASA Astrophysics Data System (ADS)

    Henschel, Michael D.; Rey, Maria T.; Campbell, J. W. M.; Petrovic, D.

    1998-12-01

    This paper discusses the initial results of a recent operational trial of the Ocean Monitoring Workstation's (OMW) ship detection algorithm, which is essentially a Constant False Alarm Rate filter applied to Synthetic Aperture Radar data. The choice of probability distribution and methodologies for calculating scene-specific statistics are discussed in some detail. An empirical basis for the choice of probability distribution used is discussed. We compare the results using a 1-look K-distribution function with various parameter choices and methods of estimation. As a special case of sea clutter statistics, the application of a χ²-distribution is also discussed. Comparisons are made with reference to RADARSAT data collected during the Maritime Command Operation Training exercise conducted in Atlantic Canadian Waters in June 1998. Reference is also made to previously collected statistics. The OMW is a commercial software suite that provides modules for automated vessel detection, oil spill monitoring, and environmental monitoring. This work has been undertaken to fine-tune the OMW algorithms, with special emphasis on the false alarm rate of each algorithm.

  18. Statistical Requirements For Pass-Fail Testing Of Contraband Detection Systems

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilliam, David M.

    2011-06-01

    Contraband detection systems for homeland security applications are typically tested for probability of detection (PD) and probability of false alarm (PFA) using pass-fail testing protocols. Test protocols usually require specified values for PD and PFA to be demonstrated at a specified level of statistical confidence CL. Based on a recent more theoretical treatment of this subject [1], this summary reviews the definition of CL and provides formulas and spreadsheet functions for constructing tables of general test requirements and for determining the minimum number of tests required. The formulas and tables in this article may be generally applied to many other applications of pass-fail testing, in addition to testing of contraband detection systems.
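
    The standard zero-failure binomial sizing behind such tables can be sketched as follows; this is a generic calculation consistent with the summary above, not the authors' exact spreadsheet functions.

    ```python
    from scipy.stats import binom

    def min_tests(pd_required, confidence, failures_allowed=0):
        """Smallest n for which passing the test (at most `failures_allowed`
        misses in n trials) demonstrates PD >= pd_required at `confidence`."""
        n = failures_allowed + 1
        # Passing must be improbable (<= 1 - CL) even at the boundary PD,
        # where each trial misses with probability 1 - pd_required.
        while binom.cdf(failures_allowed, n, 1 - pd_required) > 1 - confidence:
            n += 1
        return n

    # Demonstrating PD >= 0.90 at 95% confidence with zero misses allowed:
    print(min_tests(0.90, 0.95))   # -> 29 consecutive detections required
    ```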

  20. A Space–Time Permutation Scan Statistic for Disease Outbreak Detection

    PubMed Central

    Kulldorff, Martin; Heffernan, Richard; Hartman, Jessica; Assunção, Renato; Mostashari, Farzad

    2005-01-01

    Background The ability to detect disease outbreaks early is important in order to minimize morbidity and mortality through timely implementation of disease prevention and control measures. Many national, state, and local health departments are launching disease surveillance systems with daily analyses of hospital emergency department visits, ambulance dispatch calls, or pharmacy sales for which population-at-risk information is unavailable or irrelevant. Methods and Findings We propose a prospective space–time permutation scan statistic for the early detection of disease outbreaks that uses only case numbers, with no need for population-at-risk data. It makes minimal assumptions about the time, geographical location, or size of the outbreak, and it adjusts for natural purely spatial and purely temporal variation. The new method was evaluated using daily analyses of hospital emergency department visits in New York City. Four of the five strongest signals were likely local precursors to citywide outbreaks due to rotavirus, norovirus, and influenza. The number of false signals was at most modest. Conclusion If such results hold up over longer study times and in other locations, the space–time permutation scan statistic will be an important tool for local and national health departments that are setting up early disease detection surveillance systems. PMID:15719066

  1. A Network-Based Method to Assess the Statistical Significance of Mild Co-Regulation Effects

    PubMed Central

    Horvát, Emőke-Ágnes; Zhang, Jitao David; Uhlmann, Stefan; Sahin, Özgür; Zweig, Katharina Anna

    2013-01-01

    Recent development of high-throughput, multiplexing technology has initiated projects that systematically investigate interactions between two types of components in biological networks, for instance transcription factors and promoter sequences, or microRNAs (miRNAs) and mRNAs. In terms of network biology, such screening approaches primarily attempt to elucidate relations between biological components of two distinct types, which can be represented as edges between nodes in a bipartite graph. However, it is often desirable not only to determine regulatory relationships between nodes of different types, but also to understand the connection patterns of nodes of the same type. Especially interesting is the co-occurrence of two nodes of the same type, i.e., the number of their common neighbours, which current high-throughput screening analysis fails to address. The co-occurrence gives the number of circumstances under which both of the biological components are influenced in the same way. Here we present SICORE, a novel network-based method to detect pairs of nodes with a statistically significant co-occurrence. We first show the stability of the proposed method on artificial data sets: when randomly adding and deleting observations we obtain reliable results even with noise exceeding the expected level in large-scale experiments. Subsequently, we illustrate the viability of the method based on the analysis of a proteomic screening data set to reveal regulatory patterns of human microRNAs targeting proteins in the EGFR-driven cell cycle signalling system. Since statistically significant co-occurrence may indicate functional synergy and the mechanisms underlying canalization, and thus hold promise in drug target identification and therapeutic development, we provide a platform-independent implementation of SICORE with a graphical user interface as a novel tool in the arsenal of high-throughput screening analysis. PMID:24039936
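
    As a simplified stand-in for this kind of co-occurrence test, one can score the shared-neighbour count of two same-type nodes in a bipartite graph against a hypergeometric null; SICORE's actual null model and stability analysis are more refined, and the degrees and counts below are hypothetical.

    ```python
    from scipy.stats import hypergeom

    def cooccurrence_pvalue(deg_u, deg_v, common, n_other):
        """P(at least `common` shared neighbours) when node u's deg_u
        neighbours are drawn at random from n_other possible partners,
        deg_v of which are also neighbours of node v."""
        return hypergeom.sf(common - 1, n_other, deg_v, deg_u)

    # Two hypothetical miRNAs, each targeting ~40 of 800 proteins and
    # sharing 12 targets; the expected overlap by chance is only ~1.9.
    print(cooccurrence_pvalue(deg_u=40, deg_v=38, common=12, n_other=800))
    ```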

  2. Statistics for the Relative Detectability of Chemicals in Weak Gaseous Plumes in LWIR Hyperspectral Imagery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Metoyer, Candace N.; Walsh, Stephen J.; Tardiff, Mark F.

    2008-10-30

    The detection and identification of weak gaseous plumes using thermal imaging data is complicated by many factors. These include variability due to atmosphere, ground and plume temperature, and background clutter. This paper presents an analysis of one formulation of the physics-based model that describes the at-sensor observed radiance. The motivating question for the analyses performed in this paper is as follows. Given a set of backgrounds, is there a way to predict the background over which the probability of detecting a given chemical will be the highest? Two statistics were developed to address this question. These statistics incorporate data from the long-wave infrared band to predict the background over which chemical detectability will be the highest. These statistics can be computed prior to data collection. As a preliminary exploration into the predictive ability of these statistics, analyses were performed on synthetic hyperspectral images. Each image contained one chemical (either carbon tetrachloride or ammonia) spread across six distinct background types. The statistics were used to generate predictions for the background ranks. Then, the predicted ranks were compared to the empirical ranks obtained from the analyses of the synthetic images. For the simplified images under consideration, the predicted and empirical ranks showed a promising amount of agreement. One statistic accurately predicted the best and worst background for detection in all of the images. Future work may include explorations of more complicated plume ingredients, background types, and noise structures.

  3. Anomaly detection of turbopump vibration in Space Shuttle Main Engine using statistics and neural networks

    NASA Technical Reports Server (NTRS)

    Lo, C. F.; Wu, K.; Whitehead, B. A.

    1993-01-01

    Statistical and neural-network methods were applied to investigate the feasibility of detecting anomalies in turbopump vibration of the SSME. Anomalies are detected based on the amplitudes of peaks at the fundamental and harmonic frequencies in the power spectral density. These data are reduced to the proper format from sensor data measured by strain gauges and accelerometers. Both methods proved feasible for detecting the vibration anomalies. The statistical method requires sufficient data points to establish a reasonable statistical distribution data bank, and is applicable for on-line operation. The neural-network method likewise needs a sufficient data basis to train the networks. The testing procedure can be utilized at any time so long as the characteristics of the components remain unchanged.

  4. Inappropriate Fiddling with Statistical Analyses to Obtain a Desirable P-value: Tests to Detect its Presence in Published Literature

    PubMed Central

    Gadbury, Gary L.; Allison, David B.

    2012-01-01

    Much has been written regarding p-values below certain thresholds (most notably 0.05) denoting statistical significance and the tendency of such p-values to be more readily publishable in peer-reviewed journals. Intuition suggests that there may be a tendency to manipulate statistical analyses to push a “near significant p-value” to a level that is considered significant. This article presents a method for detecting the presence of such manipulation (herein called “fiddling”) in a distribution of p-values from independent studies. Simulations are used to illustrate the properties of the method. The results suggest that the method has low type I error and that power approaches acceptable levels as the number of p-values being studied approaches 1000. PMID:23056287

  5. Inappropriate fiddling with statistical analyses to obtain a desirable p-value: tests to detect its presence in published literature.

    PubMed

    Gadbury, Gary L; Allison, David B

    2012-01-01

    Much has been written regarding p-values below certain thresholds (most notably 0.05) denoting statistical significance and the tendency of such p-values to be more readily publishable in peer-reviewed journals. Intuition suggests that there may be a tendency to manipulate statistical analyses to push a "near significant p-value" to a level that is considered significant. This article presents a method for detecting the presence of such manipulation (herein called "fiddling") in a distribution of p-values from independent studies. Simulations are used to illustrate the properties of the method. The results suggest that the method has low type I error and that power approaches acceptable levels as the number of p-values being studied approaches 1000.
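
    The authors' test is not reproduced here; the following sketch only illustrates the underlying intuition on simulated p-values. Under any smooth distribution of p-values from independent studies, the counts just below and just above 0.05 should be comparable, so a marked excess just below the threshold is suspicious. The simulated "fiddling" and the 0.01-wide comparison windows are illustrative choices:

      import numpy as np
      from scipy.stats import binomtest

      rng = np.random.default_rng(1)
      p_values = rng.uniform(0, 1, 1000)
      # Simulated manipulation: nudge half of the near-misses under the threshold.
      near_miss = (p_values > 0.05) & (p_values < 0.06) & (rng.random(1000) < 0.5)
      p_values[near_miss] = rng.uniform(0.04, 0.05, near_miss.sum())

      below = ((p_values >= 0.04) & (p_values < 0.05)).sum()
      above = ((p_values > 0.05) & (p_values <= 0.06)).sum()
      result = binomtest(below, below + above, p=0.5, alternative="greater")
      print(f"just below: {below}, just above: {above}, one-sided p={result.pvalue:.4f}")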

  6. Clinical relevance vs. statistical significance: Using neck outcomes in patients with temporomandibular disorders as an example.

    PubMed

    Armijo-Olivo, Susan; Warren, Sharon; Fuentes, Jorge; Magee, David J

    2011-12-01

    Statistical significance has been used extensively to evaluate the results of research studies. Nevertheless, it offers only limited information to clinicians. The assessment of clinical relevance can facilitate the interpretation of the research results into clinical practice. The objective of this study was to explore different methods to evaluate the clinical relevance of the results using a cross-sectional study as an example comparing different neck outcomes between subjects with temporomandibular disorders and healthy controls. Subjects were compared for head and cervical posture, maximal cervical muscle strength, endurance of the cervical flexor and extensor muscles, and electromyographic activity of the cervical flexor muscles during the CranioCervical Flexion Test (CCFT). The evaluation of clinical relevance of the results was performed based on the effect size (ES), minimal important difference (MID), and clinical judgement. The results of this study show that it is possible to have statistical significance without having clinical relevance, to have both statistical significance and clinical relevance, to have clinical relevance without having statistical significance, or to have neither statistical significance nor clinical relevance. The evaluation of clinical relevance in clinical research is crucial to simplify the transfer of knowledge from research into practice. Clinical researchers should present the clinical relevance of their results. Copyright © 2011 Elsevier Ltd. All rights reserved.
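
    A minimal sketch of the distinction drawn above: a group difference can be statistically significant yet fall short of a minimal important difference (MID), and vice versa. The group data, outcome, and MID value are illustrative, not values from the study:

      import numpy as np
      from scipy.stats import ttest_ind

      rng = np.random.default_rng(2)
      tmd = rng.normal(50.0, 10.0, 200)       # e.g., an endurance score, TMD group
      controls = rng.normal(52.0, 10.0, 200)  # healthy controls
      mid = 8.0                               # hypothetical minimal important difference

      t, p = ttest_ind(controls, tmd)
      diff = controls.mean() - tmd.mean()
      pooled_sd = np.sqrt((tmd.var(ddof=1) + controls.var(ddof=1)) / 2)
      cohens_d = diff / pooled_sd

      print(f"mean difference = {diff:.1f}, p = {p:.3f}, d = {cohens_d:.2f}")
      print("statistically significant:", p < 0.05, "| clinically relevant:", diff >= mid)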

  7. Statistical Techniques For Real-time Anomaly Detection Using Spark Over Multi-source VMware Performance Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solaimani, Mohiuddin; Iftekhar, Mohammed; Khan, Latifur

    Anomaly detection refers to the identification of an irregular or unusual pattern which deviates from what is standard, normal, or expected. Such deviated patterns typically correspond to samples of interest and are assigned different labels in different domains, such as outliers, anomalies, exceptions, or malware. Detecting anomalies in fast, voluminous streams of data is a formidable challenge. This paper presents a novel, generic, real-time distributed anomaly detection framework for heterogeneous streaming data where anomalies appear as a group. We have developed a distributed statistical approach to build a model and later use it to detect anomalies. As a case study, we investigate group anomaly detection for a VMware-based cloud data center, which maintains a large number of virtual machines (VMs). We have built our framework using Apache Spark to get higher throughput and lower data processing time on streaming data. We have developed a window-based statistical anomaly detection technique to detect anomalies that appear sporadically. We then relaxed this constraint with higher accuracy by implementing a cluster-based technique to detect sporadic and continuous anomalies. We conclude that our cluster-based technique outperforms other statistical techniques with higher accuracy and lower processing time.
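
    A minimal, non-distributed sketch of window-based statistical group anomaly detection. The real framework runs on Apache Spark; here a plain loop stands in for the streaming micro-batches, the per-VM metrics are simulated, and the z-score threshold and group fraction are illustrative:

      import numpy as np

      rng = np.random.default_rng(3)

      def window_anomaly(window, mu, sigma, z_thresh=3.0, frac=0.5):
          """Flag a window as a group anomaly when most samples deviate jointly."""
          z = np.abs((window - mu) / sigma)
          return (z.max(axis=1) > z_thresh).mean() > frac

      # Baseline model from normal traffic (e.g., CPU%, mem%, net KB/s per VM).
      baseline = rng.normal([40, 60, 300], [5, 8, 40], size=(5000, 3))
      mu, sigma = baseline.mean(axis=0), baseline.std(axis=0)

      for i in range(10):                     # stand-in for streaming micro-batches
          window = rng.normal([40, 60, 300], [5, 8, 40], size=(100, 3))
          if i == 7:                          # inject a group anomaly: CPU saturation
              window[:, 0] += 30
          print(f"window {i}: anomaly={window_anomaly(window, mu, sigma)}")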

  8. Robust Statistical Detection of Power-Law Cross-Correlation.

    PubMed

    Blythe, Duncan A J; Nikulin, Vadim V; Müller, Klaus-Robert

    2016-06-02

    We show that widely used approaches in statistical physics incorrectly indicate the existence of power-law cross-correlations between financial stock market fluctuations measured over several years and the neuronal activity of the human brain lasting for only a few minutes. While such cross-correlations are nonsensical, no current methodology allows them to be reliably discarded, leaving researchers at greater risk when the spurious nature of cross-correlations is not clear from the unrelated origin of the time series and rather requires careful statistical estimation. Here we propose a theory and method (PLCC-test) which allows us to rigorously and robustly test for power-law cross-correlations, correctly detecting genuine and discarding spurious cross-correlations, thus establishing meaningful relationships between processes in complex physical systems. Our method reveals for the first time the presence of power-law cross-correlations between amplitudes of the alpha and beta frequency ranges of the human electroencephalogram.

  9. Robust Statistical Detection of Power-Law Cross-Correlation

    PubMed Central

    Blythe, Duncan A. J.; Nikulin, Vadim V.; Müller, Klaus-Robert

    2016-01-01

    We show that widely used approaches in statistical physics incorrectly indicate the existence of power-law cross-correlations between financial stock market fluctuations measured over several years and the neuronal activity of the human brain lasting for only a few minutes. While such cross-correlations are nonsensical, no current methodology allows them to be reliably discarded, leaving researchers at greater risk when the spurious nature of cross-correlations is not clear from the unrelated origin of the time series and rather requires careful statistical estimation. Here we propose a theory and method (PLCC-test) which allows us to rigorously and robustly test for power-law cross-correlations, correctly detecting genuine and discarding spurious cross-correlations, thus establishing meaningful relationships between processes in complex physical systems. Our method reveals for the first time the presence of power-law cross-correlations between amplitudes of the alpha and beta frequency ranges of the human electroencephalogram. PMID:27250630

  10. Action detection by double hierarchical multi-structure space-time statistical matching model

    NASA Astrophysics Data System (ADS)

    Han, Jing; Zhu, Junwei; Cui, Yiyin; Bai, Lianfa; Yue, Jiang

    2018-03-01

    To address the complex information in videos and low detection efficiency, an action detection model based on neighboring Gaussian structure and 3D LARK features is put forward. We exploit a double hierarchical multi-structure space-time statistical matching model (DMSM) in temporal action localization. First, a neighboring Gaussian structure is presented to describe the multi-scale structural relationship. Then, a space-time statistical matching method is proposed to achieve two similarity matrices on both large and small scales, which combines double hierarchical structural constraints in the model by both the neighboring Gaussian structure and the 3D LARK local structure. Finally, the double hierarchical similarity is fused and analyzed to detect actions. Besides, the multi-scale composite template extends the model application into multi-view settings. Experimental results of DMSM on the complex visual tracker benchmark data sets and the THUMOS 2014 data sets show promising performance. Compared with other state-of-the-art algorithms, DMSM achieves superior performance.

  11. Action detection by double hierarchical multi-structure space–time statistical matching model

    NASA Astrophysics Data System (ADS)

    Han, Jing; Zhu, Junwei; Cui, Yiyin; Bai, Lianfa; Yue, Jiang

    2018-06-01

    To address the complex information in videos and low detection efficiency, an action detection model based on neighboring Gaussian structure and 3D LARK features is put forward. We exploit a double hierarchical multi-structure space-time statistical matching model (DMSM) in temporal action localization. First, a neighboring Gaussian structure is presented to describe the multi-scale structural relationship. Then, a space-time statistical matching method is proposed to achieve two similarity matrices on both large and small scales, which combines double hierarchical structural constraints in the model by both the neighboring Gaussian structure and the 3D LARK local structure. Finally, the double hierarchical similarity is fused and analyzed to detect actions. Besides, the multi-scale composite template extends the model application into multi-view settings. Experimental results of DMSM on the complex visual tracker benchmark data sets and the THUMOS 2014 data sets show promising performance. Compared with other state-of-the-art algorithms, DMSM achieves superior performance.

  12. Identifying Minefields and Verifying Clearance: Adapting Statistical Methods for UXO Target Detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilbert, Richard O.; O'Brien, Robert F.; Wilson, John E.

    2003-09-01

    It may not be feasible to completely survey large tracts of land suspected of containing minefields. It is desirable to develop a characterization protocol that will confidently identify minefields within these large land tracts if they exist. Naturally, surveying areas of greatest concern and most likely locations would be necessary but will not provide the needed confidence that an unknown minefield had not eluded detection. Once minefields are detected, methods are needed to bound the area that will require detailed mine detection surveys. The US Department of Defense Strategic Environmental Research and Development Program (SERDP) is sponsoring the development of statistical survey methods and tools for detecting potential UXO targets. These methods may be directly applicable to demining efforts. Statistical methods are employed to determine the optimal geophysical survey transect spacing to have confidence of detecting target areas of a critical size, shape, and anomaly density. Other methods under development determine the proportion of a land area that must be surveyed to confidently conclude that there are no UXO present. Adaptive sampling schemes are also being developed as an approach for bounding the target areas. These methods and tools will be presented and the status of relevant research in this area will be discussed.
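
    A minimal Monte Carlo sketch of the transect-spacing question: for parallel survey lines at a given spacing, how often does a circular target area of a given radius intersect at least one transect? The geometry is simplified (detection is assumed certain whenever a transect crosses the target, ignoring anomaly density), and all dimensions are illustrative rather than SERDP values:

      import numpy as np

      rng = np.random.default_rng(4)

      def traversal_probability(spacing_m, target_radius_m, n_trials=100_000):
          # By symmetry only the offset of the target centre from the nearest
          # transect matters; it is uniform on [0, spacing/2].
          offset = rng.uniform(0, spacing_m / 2, n_trials)
          return (offset <= target_radius_m).mean()

      for spacing in (25, 50, 100, 200):
          p = traversal_probability(spacing, target_radius_m=20)
          print(f"spacing {spacing:>3} m -> P(transect crosses 20 m target) = {p:.2f}")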

  13. An Automated Energy Detection Algorithm Based on Morphological and Statistical Processing Techniques

    DTIC Science & Technology

    2018-01-09

    ARL-TR-8272, US Army Research Laboratory, January 2018. Only report-documentation fragments (report number, date, laboratory, and title) were captured for this record; the abstract itself is not available.

  14. Statistical Model Applied to NetFlow for Network Intrusion Detection

    NASA Astrophysics Data System (ADS)

    Proto, André; Alexandre, Leandro A.; Batista, Maira L.; Oliveira, Isabela L.; Cansian, Adriano M.

    Computers and network services have become ubiquitous. This growth has been accompanied by a rise in illicit events, so computer and network security has become an essential concern in any computing environment. Many methodologies have been created to identify these events; however, as the number of users and services on the Internet increases, monitoring a large network environment becomes increasingly difficult. This paper proposes a methodology for event detection in large-scale networks. The proposal approaches anomaly detection using the NetFlow protocol and statistical methods, monitoring the environment with timeliness suited to the application.

  15. Statistical methods to detect novel genetic variants using publicly available GWAS summary data.

    PubMed

    Guo, Bin; Wu, Baolin

    2018-03-01

    We propose statistical methods to detect novel genetic variants using only genome-wide association studies (GWAS) summary data, without access to raw genotype and phenotype data. With more and more summary data being posted for public access in the post-GWAS era, the proposed methods are practically very useful for identifying additional interesting genetic variants and shedding light on the underlying disease mechanism. We illustrate the utility of our proposed methods with application to GWAS meta-analysis results of fasting glucose from the international MAGIC consortium. We found several novel genome-wide significant loci that are worth further study. Copyright © 2018 Elsevier Ltd. All rights reserved.

  16. Statistical power for detecting trends with applications to seabird monitoring

    USGS Publications Warehouse

    Hatch, Shyla A.

    2003-01-01

    Power analysis is helpful in defining goals for ecological monitoring and evaluating the performance of ongoing efforts. I examined detection standards proposed for population monitoring of seabirds using two programs (MONITOR and TRENDS) specially designed for power analysis of trend data. Neither program models within- and among-years components of variance explicitly and independently, thus an error term that incorporates both components is an essential input. Residual variation in seabird counts consisted of day-to-day variation within years and unexplained variation among years in approximately equal parts. The appropriate measure of error for power analysis is the standard error of estimation (S.E.est) from a regression of annual means against year. Replicate counts within years are helpful in minimizing S.E.est but should not be treated as independent samples for estimating power to detect trends. Other issues include a choice of assumptions about variance structure and selection of an exponential or linear model of population change. Seabird count data are characterized by strong correlations between S.D. and mean, thus a constant CV model is appropriate for power calculations. Time series were fit about equally well with exponential or linear models, but log transformation ensures equal variances over time, a basic assumption of regression analysis. Using sample data from seabird monitoring in Alaska, I computed the number of years required (with annual censusing) to detect trends of -1.4% per year (50% decline in 50 years) and -2.7% per year (50% decline in 25 years). At α = 0.05 and a desired power of 0.9, estimated study intervals ranged from 11 to 69 years depending on species, trend, software, and study design. Power to detect a negative trend of 6.7% per year (50% decline in 10 years) is suggested as an alternative standard for seabird monitoring that achieves a reasonable match between statistical and biological significance.
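
    Neither MONITOR nor TRENDS is used below; this is a generic simulation in the spirit of the power question above: how often does a regression of log(count) on year detect a -2.7%/yr trend under annual censusing? The constant-CV assumption enters as a constant sigma on the log scale, and the starting population, error level, and number of simulations are illustrative:

      import numpy as np
      from scipy.stats import linregress

      rng = np.random.default_rng(5)

      def power(n_years, trend=-0.027, sigma_log=0.2, alpha=0.05, n_sim=2000):
          years = np.arange(n_years)
          hits = 0
          for _ in range(n_sim):
              log_counts = (np.log(1000) + years * np.log(1 + trend)
                            + rng.normal(0, sigma_log, n_years))
              fit = linregress(years, log_counts)
              hits += (fit.pvalue < alpha) and (fit.slope < 0)
          return hits / n_sim

      for n in (10, 20, 30, 40):
          print(f"{n} years of annual counts -> power = {power(n):.2f}")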

  17. StegoWall: blind statistical detection of hidden data

    NASA Astrophysics Data System (ADS)

    Voloshynovskiy, Sviatoslav V.; Herrigel, Alexander; Rytsar, Yuri B.; Pun, Thierry

    2002-04-01

    Novel functional possibilities provided by recent data hiding technologies carry the danger of uncontrolled (unauthorized) and unlimited information exchange that might be used by people with unfriendly interests. The multimedia industry as well as the research community recognize the urgent necessity for network security and copyright protection, and the current lack of adequate law for digital multimedia protection. This paper advocates the need for detecting hidden data in digital and analog media as well as in electronic transmissions, and for attempting to identify the underlying hidden data. Solving this problem calls for the development of an architecture for blind stochastic hidden data detection in order to prevent unauthorized data exchange. The proposed architecture is called StegoWall; its key aspects are the solid investigation, the deep understanding, and the prediction of possible tendencies in the development of advanced data hiding technologies. The basic idea of our complex approach is to exploit all information about hidden data statistics to perform its detection based on a stochastic framework. The StegoWall system will be used for four main applications: robust watermarking, secret communications, integrity control and tamper proofing, and internet/network security.

  18. Detecting rater bias using a person-fit statistic: a Monte Carlo simulation study.

    PubMed

    Aubin, André-Sébastien; St-Onge, Christina; Renaud, Jean-Sébastien

    2018-04-01

    With the Standards voicing concern for the appropriateness of response processes, we need to explore strategies that would allow us to identify inappropriate rater response processes. Although certain statistics can be used to help detect rater bias, their use is complicated by either a lack of data about their actual power to detect rater bias or the difficulty related to their application in the context of health professions education. This exploratory study aimed to establish the worthiness of pursuing the use of l_z to detect rater bias. We conducted a Monte Carlo simulation study to investigate the power of a specific detection statistic, namely the standardized log-likelihood l_z person-fit statistic (PFS). Our primary outcome was the detection rate of biased raters, that is, raters whom we manipulated into being either stringent (giving lower scores) or lenient (giving higher scores), using the l_z statistic while controlling for the number of biased raters in a sample (6 levels) and the rate of bias per rater (6 levels). Overall, stringent raters (M = 0.84, SD = 0.23) were easier to detect than lenient raters (M = 0.31, SD = 0.28). More biased raters were easier to detect than less biased raters (60% bias: M = 0.62, SD = 0.37; 10% bias: M = 0.43, SD = 0.36). The PFS l_z seems to offer interesting potential for identifying biased raters. We observed detection rates as high as 90% for stringent raters, for whom we manipulated more than half their checklist. Although we observed very interesting results, we cannot generalize them to the use of PFS with estimated item/station parameters or real data. Such studies should be conducted to assess the feasibility of using PFS to identify rater bias.
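
    A minimal sketch of the standardized log-likelihood person-fit statistic l_z for dichotomous scores, used here in the spirit of the study: each "person" is a rater, each item a checklist element, and the model probabilities are assumed known (in practice they come from a fitted IRT model). The item count, probability range, and bias size are illustrative:

      import numpy as np

      def lz(responses, p):
          """Standardized log-likelihood person-fit statistic for 0/1 data."""
          responses, p = np.asarray(responses, float), np.asarray(p, float)
          l0 = np.sum(responses * np.log(p) + (1 - responses) * np.log(1 - p))
          mean = np.sum(p * np.log(p) + (1 - p) * np.log(1 - p))
          var = np.sum(p * (1 - p) * np.log(p / (1 - p)) ** 2)
          return (l0 - mean) / np.sqrt(var)

      rng = np.random.default_rng(6)
      p = rng.uniform(0.3, 0.9, 30)                   # model probabilities, 30 items

      honest = (rng.random(30) < p).astype(int)       # responses consistent with model
      stringent = (rng.random(30) < p - 0.25).astype(int)  # scores pushed down

      print(f"l_z honest rater   : {lz(honest, p):+.2f}")
      print(f"l_z stringent rater: {lz(stringent, p):+.2f}  (large negative -> misfit)")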

  19. A new statistical approach to climate change detection and attribution

    NASA Astrophysics Data System (ADS)

    Ribes, Aurélien; Zwiers, Francis W.; Azaïs, Jean-Marc; Naveau, Philippe

    2017-01-01

    We propose here a new statistical approach to climate change detection and attribution that is based on additive decomposition and simple hypothesis testing. Most current statistical methods for detection and attribution rely on linear regression models where the observations are regressed onto expected response patterns to different external forcings. These methods do not use physical information provided by climate models regarding the expected response magnitudes to constrain the estimated responses to the forcings. Climate modelling uncertainty is difficult to take into account with regression based methods and is almost never treated explicitly. As an alternative to this approach, our statistical model is only based on the additivity assumption; the proposed method does not regress observations onto expected response patterns. We introduce estimation and testing procedures based on likelihood maximization, and show that climate modelling uncertainty can easily be accounted for. Some discussion is provided on how to practically estimate the climate modelling uncertainty based on an ensemble of opportunity. Our approach is based on the " models are statistically indistinguishable from the truth" paradigm, where the difference between any given model and the truth has the same distribution as the difference between any pair of models, but other choices might also be considered. The properties of this approach are illustrated and discussed based on synthetic data. Lastly, the method is applied to the linear trend in global mean temperature over the period 1951-2010. Consistent with the last IPCC assessment report, we find that most of the observed warming over this period (+0.65 K) is attributable to anthropogenic forcings (+0.67 ± 0.12 K, 90 % confidence range), with a very limited contribution from natural forcings (-0.01± 0.02 K).

  20. Single- and multiple-pulse noncoherent detection statistics associated with partially developed speckle.

    PubMed

    Osche, G R

    2000-08-20

    Single- and multiple-pulse detection statistics are presented for aperture-averaged direct detection optical receivers operating against partially developed speckle fields. A partially developed speckle field arises when the probability density function of the received intensity does not follow negative exponential statistics. The case of interest here is the target surface that exhibits diffuse as well as specular components in the scattered radiation. An approximate expression is derived for the integrated intensity at the aperture, which leads to single- and multiple-pulse discrete probability density functions for the case of a Poisson signal in Poisson noise with an additive coherent component. In the absence of noise, the single-pulse discrete density function is shown to reduce to a generalized negative binomial distribution. The radar concept of integration loss is discussed in the context of direct detection optical systems where it is shown that, given an appropriate set of system parameters, multiple-pulse processing can be more efficient than single-pulse processing over a finite range of the integration parameter n.

  1. The fragility of statistically significant findings from randomized trials in head and neck surgery.

    PubMed

    Noel, Christopher W; McMullen, Caitlin; Yao, Christopher; Monteiro, Eric; Goldstein, David P; Eskander, Antoine; de Almeida, John R

    2018-04-23

    The Fragility Index (FI) is a novel tool for evaluating the robustness of statistically significant findings in a randomized control trial (RCT). It measures the number of events upon which statistical significance depends. We sought to calculate the FI scores for RCTs in the head and neck cancer literature where surgery was a primary intervention. Potential articles were identified in PubMed (MEDLINE), Embase, and Cochrane without publication date restrictions. Two reviewers independently screened eligible RCTs reporting at least one dichotomous and statistically significant outcome. The data from each trial were extracted and the FI scores were calculated. Associations between trial characteristics and FI were determined. In total, 27 articles were identified. The median sample size was 67.5 (interquartile range [IQR] = 42-143) and the median number of events per trial was 8 (IQR = 2.25-18.25). The median FI score was 1 (IQR = 0-2.5), meaning that changing one patient from a nonevent to an event in the treatment arm would change the result to a statistically nonsignificant result, or P > .05. The FI score was less than the number of patients lost to follow-up in 71% of cases. The FI score was found to be moderately correlated with P value (ρ = -0.52, P = .007) and with journal impact factor (ρ = 0.49, P = .009) on univariable analysis. On multivariable analysis, only the P value was found to be a predictor of FI score (P = .001). Randomized trials in the head and neck cancer literature where surgery is a primary modality are relatively nonrobust statistically with low FI scores. Laryngoscope, 2018. © 2018 The American Laryngological, Rhinological and Otological Society, Inc.
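
    A minimal sketch of the Fragility Index computation described above: move patients one at a time from non-event to event in one arm and count how many moves it takes for Fisher's exact p-value to cross 0.05. The 2x2 counts are hypothetical, and events are added here to the arm with fewer events, one common convention:

      from scipy.stats import fisher_exact

      def fragility_index(ev_t, n_t, ev_c, n_c, alpha=0.05):
          """Moves of one treatment-arm patient from non-event to event until p >= alpha.
          Assumes the treatment arm is the one with fewer events."""
          def p(et):
              return fisher_exact([[et, n_t - et], [ev_c, n_c - ev_c]])[1]
          if p(ev_t) >= alpha:
              return 0                      # not significant to begin with
          moves = 0
          while ev_t + moves < n_t:
              moves += 1
              if p(ev_t + moves) >= alpha:
                  return moves
          return moves                      # significance never lost

      # Hypothetical trial: 3/40 events with surgery A vs 12/40 with surgery B.
      print("Fragility Index:", fragility_index(3, 40, 12, 40))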

  2. Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance

    NASA Astrophysics Data System (ADS)

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Suffredini, Anthony F.; Sacks, David B.; Yu, Yi-Kuo

    2016-02-01

    Correct and rapid identification of microorganisms is the key to the success of many important applications in health and safety, including, but not limited to, infection treatment, food safety, and biodefense. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is challenging correct microbial identification because of the large number of choices present. To properly disentangle candidate microbes, one needs to go beyond apparent morphology or simple 'fingerprinting'; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptidome profiles of microbes to better separate them and by designing an analysis method that yields accurate statistical significance. Here, we present an analysis pipeline that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using MS/MS data of 81 samples, each composed of a single known microorganism, that the proposed pipeline can correctly identify microorganisms at least at the genus and species levels. We have also shown that the proposed pipeline computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microorganisms. The proposed analysis pipeline has been implemented in MiCId, a freely available software for Microorganism Classification and Identification. MiCId is available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  3. Characterization of binary string statistics for syntactic landmine detection

    NASA Astrophysics Data System (ADS)

    Nasif, Ahmed O.; Mark, Brian L.; Hintz, Kenneth J.

    2011-06-01

    Syntactic landmine detection has been proposed to detect and classify non-metallic landmines using ground penetrating radar (GPR). In this approach, the GPR return is processed to extract characteristic binary strings for landmine and clutter discrimination. In our previous work, we discussed the preprocessing methodology by which the amplitude information of the GPR A-scan signal can be effectively converted into binary strings, which identify the impedance discontinuities in the signal. In this work, we study the statistical properties of the binary string space. In particular, we develop a Markov chain model to characterize the observed bit sequence of the binary strings. The state is defined as the number of consecutive zeros between two ones in the binarized A-scans. Since the strings are highly sparse (the number of zeros is much greater than the number of ones), defining the state this way leads to fewer number of states compared to the case where each bit is defined as a state. The number of total states is further reduced by quantizing the number of consecutive zeros. In order to identify the correct order of the Markov model, the mean square difference (MSD) between the transition matrices of mine strings and non-mine strings is calculated up to order four using training data. The results show that order one or two maximizes this MSD. The specification of the transition probabilities of the chain can be used to compute the likelihood of any given string. Such a model can be used to identify characteristic landmine strings during the training phase. These developments on modeling and characterizing the string statistics can potentially be part of a real-time landmine detection algorithm that identifies landmine and clutter in an adaptive fashion.
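
    A minimal sketch of the run-length Markov model described above: states are quantized counts of consecutive zeros between ones in a binarized A-scan, a first-order transition matrix is estimated from training strings, and the log-likelihood ratio under the mine model versus the clutter model drives classification. The strings, densities, and quantization bins are illustrative:

      import numpy as np

      N_STATES = 4  # zero-run lengths quantized into 4 bins

      def run_length_states(bits, bins=(1, 3, 6)):
          """Interior zero-run lengths between ones, quantized into state indices."""
          runs = [len(r) for r in "".join(map(str, bits)).split("1")[1:-1]]
          return [int(np.digitize(r, bins)) for r in runs]

      def transition_matrix(strings):
          counts = np.ones((N_STATES, N_STATES))        # Laplace smoothing
          for s in strings:
              states = run_length_states(s)
              for a, b in zip(states, states[1:]):
                  counts[a, b] += 1
          return counts / counts.sum(axis=1, keepdims=True)

      def log_likelihood(bits, T):
          states = run_length_states(bits)
          return sum(np.log(T[a, b]) for a, b in zip(states, states[1:]))

      rng = np.random.default_rng(7)
      mine_train = [(rng.random(200) < 0.15).astype(int) for _ in range(100)]
      clutter_train = [(rng.random(200) < 0.05).astype(int) for _ in range(100)]
      T_mine, T_clutter = transition_matrix(mine_train), transition_matrix(clutter_train)

      test = (rng.random(200) < 0.15).astype(int)       # unlabeled A-scan string
      score = log_likelihood(test, T_mine) - log_likelihood(test, T_clutter)
      print("log-likelihood ratio (mine vs clutter):", round(score, 1))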

  4. Statistical Track-Before-Detect Methods Applied to Faint Optical Observations of Resident Space Objects

    NASA Astrophysics Data System (ADS)

    Fujimoto, K.; Yanagisawa, T.; Uetsuhara, M.

    Automated detection and tracking of faint objects in optical, or bearing-only, sensor imagery is a topic of immense interest in space surveillance. Robust methods in this realm will lead to better space situational awareness (SSA) while reducing the cost of sensors and optics. They are especially relevant in the search for high area-to-mass ratio (HAMR) objects, as their apparent brightness can change significantly over time. A track-before-detect (TBD) approach has been shown to be suitable for faint, low signal-to-noise ratio (SNR) images of resident space objects (RSOs). TBD does not rely upon the extraction of feature points within the image based on some thresholding criteria, but rather directly takes as input the intensity information from the image file. Not only is all of the available information from the image used, TBD avoids the computational intractability of the conventional feature-based line detection (i.e., "string of pearls") approach to track detection for low SNR data. Implementation of TBD rooted in finite set statistics (FISST) theory has been proposed recently by Vo et al. Compared to other TBD methods applied so far to SSA, such as the stacking method or multi-pass multi-period denoising, the FISST approach is statistically rigorous and has been shown to be more computationally efficient, thus paving the path toward on-line processing. In this paper, we intend to apply a multi-Bernoulli filter to actual CCD imagery of RSOs. The multi-Bernoulli filter can explicitly account for the birth and death of multiple targets in a measurement arc. TBD is achieved via a sequential Monte Carlo implementation. Preliminary results with simulated single-target data indicate that a Bernoulli filter can successfully track and detect objects with measurement SNR as low as 2.4. Although the advent of fast-cadence scientific CMOS sensors has made the automation of faint object detection a realistic goal, it is nonetheless a difficult goal, as measurements

  5. Clinical significance of atypical squamous cells of undetermined significance in detecting preinvasive cervical lesions in post-menopausal Turkish women.

    PubMed

    Tokmak, Aytekin; Guzel, Ali Irfan; Ozgu, Emre; Oz, Murat; Akbay, Serap; Erkaya, Salim; Gungor, Tayfun

    2014-01-01

    To evaluate the clinical significance of atypical squamous cells of undetermined significance (ASCUS) in the PAP test in post-menopausal women and compare with reproductive-age women. A total of 367 patients referred to our gynecologic oncology clinic were included in the study between September 2012 and August 2013. Data for 164 post-menopausal (group 1) and 203 pre-menopausal (group 2) women with ASCUS cytology were evaluated retrospectively. Immediate colposcopy and endocervical curettage were performed for both groups, and conization for all women with a result suggestive of CIN2-3. Histopathological results and demographic features of patients were compared between the two groups. Mean age of the patients was 54.6±6.5 years in group 1 and 38±6.6 years in group 2. Some 14 (8.5%) of post-menopausal women and 36 (17.7%) of pre-menopausal women were current smokers (p=0.011). Totals of 38 (23.2%) post-menopausal and 64 (31.5%) pre-menopausal women were assessed for HPV-DNA. High-risk HPV was detected in 7 (4.3%) and 21 (10.3%), respectively (p=0.029). Final histopathological results were recorded as normal cervix, low-grade cervical intra-epithelial neoplasia (CIN 1), and high-grade cervical intra-epithelial neoplasia (CIN2-3). In group 1 the results were 84.8%, 12.2% and 1.8%, respectively, and in group 2 they were 71.9%, 23.2% and 4.9%. There were no cases of micro-invasive or invasive cervical carcinoma in either group. Two cases of endometrial carcinoma were detected in the post-menopausal group (1.2%). In the current study we found that preinvasive lesions were statistically significantly more frequent in pre-menopausal women than in post-menopausal women with ASCUS. Cervicitis was more common in menopausal women. Therefore, we think that in the case of ASCUS in a post-menopausal woman there is no need for radical management.

  6. Kepler Planet Detection Metrics: Statistical Bootstrap Test

    NASA Technical Reports Server (NTRS)

    Jenkins, Jon M.; Burke, Christopher J.

    2016-01-01

    This document describes the data produced by the Statistical Bootstrap Test over the final three Threshold Crossing Event (TCE) deliveries to NExScI: SOC 9.1 (Q1-Q16) (Tenenbaum et al. 2014), SOC 9.2 (Q1-Q17), aka DR24 (Seader et al. 2015), and SOC 9.3 (Q1-Q17), aka DR25 (Twicken et al. 2016). The last few years have seen significant improvements in the SOC science data processing pipeline, leading to higher quality light curves and more sensitive transit searches. The statistical bootstrap analysis results presented here and the numerical results archived at NASA's Exoplanet Science Institute (NExScI) bear witness to these software improvements. This document attempts to introduce and describe the main features and differences between these three data sets as a consequence of the software changes.

  7. Searching for statistically significant regulatory modules.

    PubMed

    Bailey, Timothy L; Noble, William Stafford

    2003-10-01

    The regulatory machinery controlling gene expression is complex, frequently requiring multiple, simultaneous DNA-protein interactions. The rate at which a gene is transcribed may depend upon the presence or absence of a collection of transcription factors bound to the DNA near the gene. Locating transcription factor binding sites in genomic DNA is difficult because the individual sites are small and tend to occur frequently by chance. True binding sites may be identified by their tendency to occur in clusters, sometimes known as regulatory modules. We describe an algorithm for detecting occurrences of regulatory modules in genomic DNA. The algorithm, called mcast, takes as input a DNA database and a collection of binding site motifs that are known to operate in concert. mcast uses a motif-based hidden Markov model with several novel features. The model incorporates motif-specific p-values, thereby allowing scores from motifs of different widths and specificities to be compared directly. The p-value scoring also allows mcast to only accept motif occurrences with significance below a user-specified threshold, while still assigning better scores to motif occurrences with lower p-values. mcast can search long DNA sequences, modeling length distributions between motifs within a regulatory module, but ignoring length distributions between modules. The algorithm produces a list of predicted regulatory modules, ranked by E-value. We validate the algorithm using simulated data as well as real data sets from fruitfly and human. http://meme.sdsc.edu/MCAST/paper

  8. Archival Legacy Investigations of Circumstellar Environments (ALICE): Statistical assessment of point source detections

    NASA Astrophysics Data System (ADS)

    Choquet, Élodie; Pueyo, Laurent; Soummer, Rémi; Perrin, Marshall D.; Hagan, J. Brendan; Gofas-Salas, Elena; Rajan, Abhijith; Aguilar, Jonathan

    2015-09-01

    The ALICE program, for Archival Legacy Investigation of Circumstellar Environment, is currently conducting a virtual survey of about 400 stars, by re-analyzing the HST-NICMOS coronagraphic archive with advanced post-processing techniques. We present here the strategy that we adopted to identify detections and potential candidates for follow-up observations, and we give a preliminary overview of our detections. We present a statistical analysis conducted to evaluate the confidence level of these detections and the completeness of our candidate search.

  9. Quantum-statistical theory of microwave detection using superconducting tunnel junctions

    NASA Astrophysics Data System (ADS)

    Deviatov, I. A.; Kuzmin, L. S.; Likharev, K. K.; Migulin, V. V.; Zorin, A. B.

    1986-09-01

    A quantum-statistical theory of microwave and millimeter-wave detection using superconducting tunnel junctions is developed, with a rigorous account of quantum, thermal, and shot noise arising from fluctuation sources associated with the junctions, signal source, and matching circuits. The problem of noise characterization in the quantum sensitivity range is considered and a general noise parameter Θ_N is introduced. This parameter is shown to be an adequate figure of merit for most receivers of interest, while some devices can require a more complex characterization. Analytical expressions and/or numerically calculated plots for Θ_N are presented for the most promising detection modes, including parametric amplification, heterodyne mixing, and quadratic videodetection, using both the quasiparticle-current and the Cooper-pair-current nonlinearities. Ultimate minimum values of Θ_N for each detection mode are compared and found to be in agreement with limitations imposed by the quantum-mechanical uncertainty principle.

  10. Detection of changes of high-frequency activity by statistical time-frequency analysis in epileptic spikes

    PubMed Central

    Kobayashi, Katsuhiro; Jacobs, Julia; Gotman, Jean

    2013-01-01

    Objective: A novel type of statistical time-frequency analysis was developed to elucidate changes of high-frequency EEG activity associated with epileptic spikes. Methods: The method uses the Gabor Transform and detects changes of power in comparison to background activity using t-statistics that are controlled by the false discovery rate (FDR) to correct the type I error of multiple testing. The analysis was applied to EEGs recorded at 2000 Hz from three patients with mesial temporal lobe epilepsy. Results: Spike-related increase of high-frequency oscillations (HFOs) was clearly shown in the FDR-controlled t-spectra: it was most dramatic in spikes recorded from the hippocampus when the hippocampus was the seizure onset zone (SOZ). Depression of fast activity was observed immediately after the spikes, especially consistently in the discharges from the hippocampal SOZ. It corresponded to the slow-wave part in the case of spike-and-slow-wave complexes, but it was noted even in spikes without apparent slow waves. In one patient, a gradual increase of power above 200 Hz preceded spikes. Conclusions: FDR-controlled t-spectra clearly detected the spike-related changes of HFOs that were unclear in standard power spectra. Significance: We developed a promising tool to study the HFOs that may be closely linked to the pathophysiology of epileptogenesis. PMID:19394892
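
    A minimal sketch of FDR-controlled time-frequency testing on simulated epochs. A short-time FFT stands in for the Gabor transform used in the study, every time-frequency bin is t-tested between spike and background epochs, and the Benjamini-Hochberg step-up procedure (implemented inline) controls the false discovery rate. Epoch counts, the injected 120-Hz burst, and window sizes are illustrative:

      import numpy as np
      from scipy.signal import spectrogram
      from scipy.stats import ttest_ind

      FS = 2000                                  # sampling rate used in the study
      rng = np.random.default_rng(8)

      def epoch_power(add_hfo=False):
          """Time-frequency power of one 1-s epoch."""
          x = rng.standard_normal(FS)
          if add_hfo:                            # ripple-band burst near 120 Hz
              t = np.arange(200) / FS
              x[900:1100] += 2.0 * np.sin(2 * np.pi * 120 * t)
          return spectrogram(x, fs=FS, nperseg=128, noverlap=64)[2]

      spikes = np.stack([epoch_power(add_hfo=True) for _ in range(40)])
      background = np.stack([epoch_power() for _ in range(40)])
      _, p_map = ttest_ind(spikes, background, axis=0)

      def bh_reject(p, alpha=0.05):
          """Benjamini-Hochberg step-up procedure over all bins."""
          order = np.argsort(p)
          passed = p[order] <= alpha * np.arange(1, p.size + 1) / p.size
          k = passed.nonzero()[0].max() + 1 if passed.any() else 0
          reject = np.zeros(p.size, bool)
          reject[order[:k]] = True
          return reject

      sig = bh_reject(p_map.ravel())
      print(f"bins significant after FDR control: {sig.sum()} / {sig.size}")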

  11. Statistical methods for detecting and comparing periodic data and their application to the nycthemeral rhythm of bodily harm: A population based study

    PubMed Central

    2010-01-01

    Background: Animals, including humans, exhibit a variety of biological rhythms. This article describes a method for the detection and simultaneous comparison of multiple nycthemeral rhythms. Methods: A statistical method for detecting periodic patterns in time-related data via harmonic regression is described. The method is particularly capable of detecting nycthemeral rhythms in medical data. Additionally, a method for simultaneously comparing two or more periodic patterns is described, which derives from the analysis of variance (ANOVA). This method statistically confirms or rejects equality of periodic patterns. Mathematical descriptions of the detection method and the comparison method are displayed. Results: Nycthemeral rhythms of incidents of bodily harm in Middle Franconia are analyzed in order to demonstrate both methods. Every day of the week showed a significant nycthemeral rhythm of bodily harm. These seven patterns of the week were compared to each other, revealing only two different nycthemeral rhythms, one for Friday and Saturday and one for the other weekdays. PMID:21059197
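
    A minimal sketch of harmonic regression for a nycthemeral (24-hour) rhythm: regress counts on sine/cosine terms of a 24-h period and F-test the harmonic coefficients against the constant-only model. The counts are simulated, and only the first harmonic is fitted; the paper's ANOVA-based comparison of multiple rhythms is not reproduced here:

      import numpy as np
      from scipy.stats import f as f_dist

      rng = np.random.default_rng(9)
      hours = np.arange(24 * 14)                              # two weeks, hourly
      true = 10 + 4 * np.cos(2 * np.pi * (hours - 22) / 24)   # peak near 22:00
      y = rng.poisson(true).astype(float)

      # Design matrix: intercept + first harmonic of the 24-h period.
      X = np.column_stack([np.ones(hours.size),
                           np.cos(2 * np.pi * hours / 24),
                           np.sin(2 * np.pi * hours / 24)])
      beta, *_ = np.linalg.lstsq(X, y, rcond=None)
      rss0 = np.sum((y - y.mean()) ** 2)                      # constant-only model
      rss1 = np.sum((y - X @ beta) ** 2)
      df1, df2 = 2, y.size - 3
      F = ((rss0 - rss1) / df1) / (rss1 / df2)
      p = f_dist.sf(F, df1, df2)

      amp = np.hypot(beta[1], beta[2])
      peak_hour = (np.arctan2(beta[2], beta[1]) * 24 / (2 * np.pi)) % 24
      print(f"F = {F:.1f}, p = {p:.2e}, amplitude = {amp:.2f}, peak ~ {peak_hour:.1f} h")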

  12. Reporting of statistically significant results at ClinicalTrials.gov for completed superiority randomized controlled trials.

    PubMed

    Dechartres, Agnes; Bond, Elizabeth G; Scheer, Jordan; Riveros, Carolina; Atal, Ignacio; Ravaud, Philippe

    2016-11-30

    Publication bias and other reporting bias have been well documented for journal articles, but no study has evaluated the nature of results posted at ClinicalTrials.gov. We aimed to assess how many randomized controlled trials (RCTs) with results posted at ClinicalTrials.gov report statistically significant results and whether the proportion of trials with significant results differs when no treatment effect estimate or p-value is posted. We searched ClinicalTrials.gov in June 2015 for all studies with results posted. We included completed RCTs with a superiority hypothesis and considered results for the first primary outcome with results posted. For each trial, we assessed whether a treatment effect estimate and/or p-value was reported at ClinicalTrials.gov and if yes, whether results were statistically significant. If no treatment effect estimate or p-value was reported, we calculated the treatment effect and corresponding p-value using results per arm posted at ClinicalTrials.gov when sufficient data were reported. From the 17,536 studies with results posted at ClinicalTrials.gov, we identified 2823 completed phase 3 or 4 randomized trials with a superiority hypothesis. Of these, 1400 (50%) reported a treatment effect estimate and/or p-value. Results were statistically significant for 844 trials (60%), with a median p-value of 0.01 (Q1-Q3: 0.001-0.26). For the 1423 trials with no treatment effect estimate or p-value posted, we could calculate the treatment effect and corresponding p-value using results reported per arm for 929 (65%). For 494 trials (35%), p-values could not be calculated mainly because of insufficient reporting, censored data, or repeated measurements over time. For the 929 trials for which we could calculate p-values, we found statistically significant results for 342 (37%), with a median p-value of 0.19 (Q1-Q3: 0.005-0.59). Half of the trials with results posted at ClinicalTrials.gov reported a treatment effect estimate and/or p-value, with significant
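
    A sketch of how a treatment effect and p-value can be recomputed from per-arm results for a dichotomous primary outcome, in the spirit of the calculation described above. The per-arm counts are hypothetical, and the risk ratio with a chi-square test is one plausible choice rather than necessarily the authors' exact statistic:

      import numpy as np
      from scipy.stats import chi2_contingency

      def effect_from_arms(events_t, n_t, events_c, n_c):
          risk_t, risk_c = events_t / n_t, events_c / n_c
          rr = risk_t / risk_c
          # Large-sample standard error of log(RR) and 95% confidence interval.
          se = np.sqrt(1/events_t - 1/n_t + 1/events_c - 1/n_c)
          ci = np.exp(np.log(rr) + np.array([-1.96, 1.96]) * se)
          table = [[events_t, n_t - events_t], [events_c, n_c - events_c]]
          _, p, _, _ = chi2_contingency(table, correction=False)
          return rr, ci, p

      rr, ci, p = effect_from_arms(30, 150, 48, 150)
      print(f"RR = {rr:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f}), p = {p:.3f}")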

  13. Statistical significance of hair analysis of clenbuterol to discriminate therapeutic use from contamination.

    PubMed

    Krumbholz, Aniko; Anielski, Patricia; Gfrerer, Lena; Graw, Matthias; Geyer, Hans; Schänzer, Wilhelm; Dvorak, Jiri; Thieme, Detlef

    2014-01-01

    Clenbuterol is a well-established β2-agonist, which is prohibited in sports and strictly regulated for use in the livestock industry. During the last few years clenbuterol-positive results in doping controls and in samples from residents or travellers from a high-risk country were suspected to be related the illegal use of clenbuterol for fattening. A sensitive liquid chromatography-tandem mass spectrometry (LC-MS/MS) method was developed to detect low clenbuterol residues in hair with a detection limit of 0.02 pg/mg. A sub-therapeutic application study and a field study with volunteers, who have a high risk of contamination, were performed. For the application study, a total dosage of 30 µg clenbuterol was applied to 20 healthy volunteers on 5 subsequent days. One month after the beginning of the application, clenbuterol was detected in the proximal hair segment (0-1 cm) in concentrations between 0.43 and 4.76 pg/mg. For the second part, samples of 66 Mexican soccer players were analyzed. In 89% of these volunteers, clenbuterol was detectable in their hair at concentrations between 0.02 and 1.90 pg/mg. A comparison of both parts showed no statistical difference between sub-therapeutic application and contamination. In contrast, discrimination to a typical abuse of clenbuterol is apparently possible. Due to these findings results of real doping control samples can be evaluated. Copyright © 2014 John Wiley & Sons, Ltd.

  14. Intensive inpatient treatment for bulimia nervosa: Statistical and clinical significance of symptom changes.

    PubMed

    Diedrich, Alice; Schlegl, Sandra; Greetfeld, Martin; Fumi, Markus; Voderholzer, Ulrich

    2018-03-01

    This study examines the statistical and clinical significance of symptom changes during an intensive inpatient treatment program with a strong psychotherapeutic focus for individuals with severe bulimia nervosa. 295 consecutively admitted bulimic patients were administered the Structured Interview for Anorexic and Bulimic Syndromes-Self-Rating (SIAB-S), the Eating Disorder Inventory-2 (EDI-2), the Brief Symptom Inventory (BSI), and the Beck Depression Inventory-II (BDI-II) at treatment intake and discharge. Results indicated statistically significant symptom reductions with large effect sizes regarding severity of binge eating and compensatory behavior (SIAB-S), overall eating disorder symptom severity (EDI-2), overall psychopathology (BSI), and depressive symptom severity (BDI-II) even when controlling for antidepressant medication. The majority of patients showed either reliable (EDI-2: 33.7%, BSI: 34.8%, BDI-II: 18.1%) or even clinically significant symptom changes (EDI-2: 43.2%, BSI: 33.9%, BDI-II: 56.9%). Patients with clinically significant improvement were less distressed at intake and less likely to suffer from a comorbid borderline personality disorder when compared with those who did not improve to a clinically significant extent. Findings indicate that intensive psychotherapeutic inpatient treatment may be effective in about 75% of severely affected bulimic patients. For the remaining non-responding patients, inpatient treatment might be improved through an even stronger focus on the reduction of comorbid borderline personality traits.

  15. The use of the temporal scan statistic to detect methicillin-resistant Staphylococcus aureus clusters in a community hospital.

    PubMed

    Faires, Meredith C; Pearl, David L; Ciccotelli, William A; Berke, Olaf; Reid-Smith, Richard J; Weese, J Scott

    2014-07-08

    In healthcare facilities, conventional surveillance techniques using rule-based guidelines may result in under- or over-reporting of methicillin-resistant Staphylococcus aureus (MRSA) outbreaks, as these guidelines are generally unvalidated. The objectives of this study were to investigate the utility of the temporal scan statistic for detecting MRSA clusters, validate clusters using molecular techniques and hospital records, and determine significant differences in the rate of MRSA cases using regression models. Patients admitted to a community hospital between August 2006 and February 2011, and identified with MRSA >48 hours following hospital admission, were included in this study. Between March 2010 and February 2011, MRSA specimens were obtained for spa typing. MRSA clusters were investigated using a retrospective temporal scan statistic. Tests were conducted on a monthly scale and significant clusters were compared to MRSA outbreaks identified by hospital personnel. Associations between the rate of MRSA cases and the variables year, month, and season were investigated using a negative binomial regression model. During the study period, 735 MRSA cases were identified and 167 MRSA isolates were spa typed. Nine different spa types were identified with spa type 2/t002 (88.6%) the most prevalent. The temporal scan statistic identified significant MRSA clusters at the hospital (n=2), service (n=16), and ward (n=10) levels (P ≤ 0.05). Seven clusters were concordant with nine MRSA outbreaks identified by hospital staff. For the remaining clusters, seven events may have been equivalent to true outbreaks and six clusters demonstrated possible transmission events. The regression analysis indicated years 2009-2011, compared to 2006, and months March and April, compared to January, were associated with an increase in the rate of MRSA cases (P ≤ 0.05). The application of the temporal scan statistic identified several MRSA clusters that were not detected by hospital
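
    A minimal sketch of a retrospective temporal scan on monthly case counts: scan windows of 1 to 3 months, score each by a Poisson likelihood-ratio-style excess over expectation, take the maximum, and assess significance by Monte Carlo under a homogeneous-rate null. This is a simplified illustration, not the specific scan-statistic software used in the study; counts, baseline rate, and the injected cluster are simulated:

      import numpy as np

      rng = np.random.default_rng(10)

      def max_window_stat(counts, max_len=3):
          total, n = counts.sum(), len(counts)
          best = 0.0
          for w in range(1, max_len + 1):
              for start in range(n - w + 1):
                  obs = counts[start:start + w].sum()
                  exp = total * w / n
                  if obs > exp > 0:         # likelihood-ratio-style score
                      best = max(best, obs * np.log(obs / exp) - (obs - exp))
          return best

      months = 54                           # roughly Aug 2006 - Feb 2011
      counts = rng.poisson(13, months)
      counts[30:32] += 15                   # injected 2-month cluster

      observed = max_window_stat(counts)
      null = [max_window_stat(rng.multinomial(counts.sum(), np.ones(months)/months))
              for _ in range(999)]
      p = (1 + sum(s >= observed for s in null)) / 1000
      print(f"scan statistic = {observed:.1f}, Monte Carlo p = {p:.3f}")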

  16. Visually directed vs. software-based targeted biopsy compared to transperineal template mapping biopsy in the detection of clinically significant prostate cancer.

    PubMed

    Valerio, Massimo; McCartan, Neil; Freeman, Alex; Punwani, Shonit; Emberton, Mark; Ahmed, Hashim U

    2015-10-01

    Targeted biopsy based on cognitive or software magnetic resonance imaging (MRI) to transrectal ultrasound registration seems to increase the detection rate of clinically significant prostate cancer as compared with standard biopsy. However, these strategies have not been directly compared against an accurate test yet. The aim of this study was to obtain pilot data on the diagnostic ability of visually directed targeted biopsy vs. software-based targeted biopsy, considering transperineal template mapping (TPM) biopsy as the reference test. Prospective paired cohort study included 50 consecutive men undergoing TPM with one or more visible targets detected on preoperative multiparametric MRI. Targets were contoured on the Biojet software. Patients initially underwent software-based targeted biopsies, then visually directed targeted biopsies, and finally systematic TPM. The detection rate of clinically significant disease (Gleason score ≥3+4 and/or maximum cancer core length ≥4mm) of one strategy against another was compared by 3×3 contingency tables. Secondary analyses were performed using a less stringent threshold of significance (Gleason score ≥4+3 and/or maximum cancer core length ≥6mm). Median age was 68 (interquartile range: 63-73); median prostate-specific antigen level was 7.9 ng/mL (6.4-10.2). A total of 79 targets were detected with a mean of 1.6 targets per patient. Of these, 27 (34%), 28 (35%), and 24 (31%) were scored 3, 4, and 5, respectively. At a patient level, the detection rate was 32 (64%), 34 (68%), and 38 (76%) for visually directed targeted, software-based biopsy, and TPM, respectively. Combining the 2 targeted strategies would have led to detection rate of 39 (78%). At a patient level and at a target level, software-based targeted biopsy found more clinically significant diseases than did visually directed targeted biopsy, although this was not statistically significant (22% vs. 14%, P = 0.48; 51.9% vs. 44.3%, P = 0.24). Secondary

  17. Structural damage detection based on stochastic subspace identification and statistical pattern recognition: I. Theory

    NASA Astrophysics Data System (ADS)

    Ren, W. X.; Lin, Y. Q.; Fang, S. E.

    2011-11-01

    One of the key issues in vibration-based structural health monitoring is to extract the damage-sensitive but environment-insensitive features from sampled dynamic response measurements and to carry out the statistical analysis of these features for structural damage detection. A new damage feature is proposed in this paper by using the system matrices of the forward innovation model based on the covariance-driven stochastic subspace identification of a vibrating system. To overcome the variations of the system matrices, a non-singularity transposition matrix is introduced so that the system matrices are normalized to their standard forms. For reducing the effects of modeling errors, noise and environmental variations on measured structural responses, a statistical pattern recognition paradigm is incorporated into the proposed method. The Mahalanobis and Euclidean distance decision functions of the damage feature vector are adopted by defining a statistics-based damage index. The proposed structural damage detection method is verified against one numerical signal and two numerical beams. It is demonstrated that the proposed statistics-based damage index is sensitive to damage and shows some robustness to the noise and false estimation of the system ranks. The method is capable of locating damage of the beam structures under different types of excitations. The robustness of the proposed damage detection method to the variations in environmental temperature is further validated in a companion paper by a reinforced concrete beam tested in the laboratory and a full-scale arch bridge tested in the field.

  18. The Suzaku View of Highly Ionized Outflows in AGN. 1; Statistical Detection and Global Absorber Properties

    NASA Technical Reports Server (NTRS)

    Gofford, Jason; Reeves, James N.; Tombesi, Francesco; Braito, Valentina; Turner, T. Jane; Miller, Lance; Cappi, Massimo

    2013-01-01

    We present the results of a new spectroscopic study of Fe K-band absorption in active galactic nuclei (AGN). Using data obtained from the Suzaku public archive we have performed a statistically driven blind search for Fe XXV Heα and/or Fe XXVI Lyα absorption lines in a large sample of 51 Type 1.0-1.9 AGN. Through extensive Monte Carlo simulations we find that statistically significant absorption is detected at E ≳ 6.7 keV in 20/51 sources at the P_MC ≥ 95 per cent level, which corresponds to approximately 40 per cent of the total sample. In all cases, individual absorption lines are detected independently and simultaneously amongst the two (or three) available X-ray imaging spectrometer detectors, which confirms the robustness of the line detections. The most frequently observed outflow phenomenology consists of two discrete absorption troughs corresponding to Fe XXV Heα and Fe XXVI Lyα at a common velocity shift. From xstar fitting, the mean column density and ionization parameter for the Fe K absorption components are log(N_H/cm^-2) ≈ 23 and log(ξ/erg cm s^-1) ≈ 4.5, respectively. Measured outflow velocities span a continuous range from <1500 km/s up to ~100,000 km/s, with mean and median values of ~0.1c and ~0.056c, respectively. The results of this work are consistent with those recently obtained using XMM-Newton and independently provide strong evidence for the existence of very highly ionized circumnuclear material in a significant fraction of both radio-quiet and radio-loud AGN in the local universe.

  19. Using the Descriptive Bootstrap to Evaluate Result Replicability (Because Statistical Significance Doesn't)

    ERIC Educational Resources Information Center

    Spinella, Sarah

    2011-01-01

    As result replicability is essential to science and difficult to achieve through external replicability, the present paper notes the insufficiency of null hypothesis statistical significance testing (NHSST) and explains the bootstrap as a plausible alternative, with a heuristic example to illustrate the bootstrap method. The bootstrap relies on…

  20. Cloud-based solution to identify statistically significant MS peaks differentiating sample categories.

    PubMed

    Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B

    2013-03-23

    Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. Until now, multiple challenges in protein MS data analysis remain: large-scale and complex data set management; MS peak identification and indexing; and high-dimensional differential analysis of peaks with false discovery rate (FDR) control across the concurrent statistical tests. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets and identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The presented web application supports large-scale online MS data uploading and analysis with a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.

  1. Improved statistical power with a sparse shape model in detecting an aging effect in the hippocampus and amygdala

    NASA Astrophysics Data System (ADS)

    Chung, Moo K.; Kim, Seung-Goo; Schaefer, Stacey M.; van Reekum, Carien M.; Peschke-Schmitz, Lara; Sutterer, Matthew J.; Davidson, Richard J.

    2014-03-01

    The sparse regression framework has been widely used in medical image processing and analysis. However, it has been rarely used in anatomical studies. We present a sparse shape modeling framework using the Laplace-Beltrami (LB) eigenfunctions of the underlying shape and show its improvement of statistical power. Traditionally, the LB-eigenfunctions are used as a basis for intrinsically representing surface shapes as a form of Fourier descriptors. To reduce high frequency noise, only the first few terms are used in the expansion and higher frequency terms are simply thrown away. However, some lower frequency terms may not necessarily contribute significantly in reconstructing the surfaces. Motivated by this idea, we present a LB-based method to filter out only the significant eigenfunctions by imposing a sparse penalty. For dense anatomical data such as deformation fields on a surface mesh, the sparse regression behaves like a smoothing process, which will reduce the error of incorrectly detecting false negatives. Hence the statistical power improves. The sparse shape model is then applied in investigating the influence of age on amygdala and hippocampus shapes in the normal population. The advantage of the LB sparse framework is demonstrated by showing the increased statistical power.

  2. Statistical Significance of Optical Map Alignments

    PubMed Central

    Sarkar, Deepayan; Goldstein, Steve; Schwartz, David C.

    2012-01-01

    The Optical Mapping System constructs ordered restriction maps spanning entire genomes through the assembly and analysis of large datasets comprising individually analyzed genomic DNA molecules. Such restriction maps uniquely reveal mammalian genome structure and variation, but also raise computational and statistical questions beyond those that have been solved in the analysis of smaller, microbial genomes. We address the problem of how to filter maps that align poorly to a reference genome. We obtain map-specific thresholds that control errors and improve iterative assembly. We also show how an optimal self-alignment score provides an accurate approximation to the probability of alignment, which is useful in applications seeking to identify structural genomic abnormalities. PMID:22506568

  3. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments

    PubMed Central

    Avalappampatty Sivasamy, Aneetha; Sundan, Bose

    2015-01-01

    The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better. PMID:26357668

  4. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments.

    PubMed

    Sivasamy, Aneetha Avalappampatty; Sundan, Bose

    2015-01-01

    The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T(2) method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T(2) statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better.
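
    A simplified sketch of the Hotelling's T² detector described in the two records above: estimate the mean and covariance of normal traffic profiles, then alarm on observations whose T² distance exceeds a control limit. The chi-square limit used here is a common textbook choice (the paper instead derives a threshold range from the central limit theorem), and the feature vectors are simulated stand-ins for preprocessed KDD Cup'99 records:

```python
# Hotelling's T^2 anomaly detector on simulated traffic feature vectors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
normal = rng.normal(0, 1, (1000, 5))          # training (normal) profiles
mu = normal.mean(axis=0)
S_inv = np.linalg.inv(np.cov(normal, rowvar=False))

def t_squared(x):
    d = x - mu
    return d @ S_inv @ d

# Chi-square approximation for the control limit (one common choice).
limit = stats.chi2.ppf(0.99, df=5)

attack = rng.normal(3, 1, 5)                  # a shifted, suspicious record
for label, x in [("normal", normal[0]), ("attack", attack)]:
    t2 = t_squared(x)
    print(f"{label}: T^2 = {t2:.1f} -> {'ALARM' if t2 > limit else 'ok'}")
```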

  5. Detecting temporal change in freshwater fisheries surveys: statistical power and the important linkages between management questions and monitoring objectives

    USGS Publications Warehouse

    Wagner, Tyler; Irwin, Brian J.; Bence, James R.; Hayes, Daniel B.

    2016-01-01

    Monitoring to detect temporal trends in biological and habitat indices is a critical component of fisheries management. Thus, it is important that management objectives are linked to monitoring objectives. This linkage requires a definition of what constitutes a management-relevant “temporal trend.” It is also important to develop expectations for the amount of time required to detect a trend (i.e., statistical power) and for choosing an appropriate statistical model for analysis. We provide an overview of temporal trends commonly encountered in fisheries management, review published studies that evaluated statistical power of long-term trend detection, and illustrate dynamic linear models in a Bayesian context, as an additional analytical approach focused on shorter term change. We show that monitoring programs generally have low statistical power for detecting linear temporal trends and argue that often management should be focused on different definitions of trends, some of which can be better addressed by alternative analytical approaches.
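
    The statistical power findings above are easy to reproduce in miniature: simulate a noisy linear trend, test the OLS slope, and count rejections. The effect size and noise level below are arbitrary choices for the sketch:

```python
# Monte Carlo power calculation for detecting a linear temporal trend,
# illustrating why short, noisy monitoring series yield low power.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)

def trend_power(n_years, slope, sigma, n_sim=2000, alpha=0.05):
    t = np.arange(n_years)
    hits = 0
    for _ in range(n_sim):
        y = 10 + slope * t + rng.normal(0, sigma, n_years)
        hits += stats.linregress(t, y).pvalue < alpha
    return hits / n_sim

for n in (5, 10, 20, 40):
    print(f"{n:2d} years of monitoring: power = {trend_power(n, 0.1, 1.0):.2f}")
```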

  6. An integrated model for detecting significant chromatin interactions from high-resolution Hi-C data

    PubMed Central

    Carty, Mark; Zamparo, Lee; Sahin, Merve; González, Alvaro; Pelossof, Raphael; Elemento, Olivier; Leslie, Christina S.

    2017-01-01

    Here we present HiC-DC, a principled method to estimate the statistical significance (P values) of chromatin interactions from Hi-C experiments. HiC-DC uses hurdle negative binomial regression to account for systematic sources of variation in Hi-C read counts—for example, distance-dependent random polymer ligation and GC content and mappability bias—and to model zero inflation and overdispersion. Applied to high-resolution Hi-C data in a lymphoblastoid cell line, HiC-DC detects significant interactions at the sub-topologically associating domain level, identifying potential structural and regulatory interactions supported by CTCF binding sites, DNase accessibility, and/or active histone marks. CTCF-associated interactions are most strongly enriched in the middle genomic distance range (∼700 kb–1.5 Mb), while interactions involving actively marked DNase accessible elements are enriched both at short (<500 kb) and longer (>1.5 Mb) genomic distances. There is a striking enrichment of longer-range interactions connecting replication-dependent histone genes on chromosome 6, potentially representing the chromatin architecture at the histone locus body. PMID:28513628
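
    The hurdle negative binomial idea can be approximated by a two-part model: a logistic regression for whether a locus pair has any reads, plus a negative binomial GLM on the positive counts with genomic distance as a covariate. The sketch below is a deliberate simplification of HiC-DC (no zero truncation, no GC/mappability terms, invented data):

```python
# Two-part (hurdle-style) approximation: logistic "any reads?" model plus a
# negative binomial GLM for count intensity, both driven by genomic distance.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 2000
log_dist = rng.uniform(4, 7, n)                    # log10 genomic distance
X = sm.add_constant(log_dist)

# Simulate zero inflation and distance-decaying counts.
p_nonzero = 1.0 / (1.0 + np.exp(-(8.0 - 1.2 * log_dist)))
nonzero = rng.random(n) < p_nonzero
mu = np.exp(9.0 - 1.3 * log_dist)
counts = np.zeros(n)
counts[nonzero] = rng.negative_binomial(2, 2.0 / (2.0 + mu[nonzero]))

# Part 1: does the pair have any reads at all?
hurdle = sm.GLM(nonzero.astype(float), X, family=sm.families.Binomial()).fit()
# Part 2: count intensity among pairs with reads (untruncated NB shortcut).
nb = sm.GLM(counts[nonzero], X[nonzero],
            family=sm.families.NegativeBinomial(alpha=0.5)).fit()
print(hurdle.params)   # recovers roughly [8, -1.2]
print(nb.params)       # recovers roughly [9, -1.3]
```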

  7. Improved Statistical Fault Detection Technique and Application to Biological Phenomena Modeled by S-Systems.

    PubMed

    Mansouri, Majdi; Nounou, Mohamed N; Nounou, Hazem N

    2017-09-01

    In our previous work, we demonstrated the effectiveness of the linear multiscale principal component analysis (PCA)-based moving window (MW)-generalized likelihood ratio test (GLRT) technique over the classical PCA and multiscale principal component analysis (MSPCA)-based GLRT methods. The developed fault detection algorithm provided optimal properties by maximizing the detection probability for a particular false alarm rate (FAR) across different window lengths. However, most real systems are nonlinear, which leaves the linear PCA method unable to tackle nonlinearity to a great extent. Thus, in this paper, first, we apply a nonlinear PCA to obtain an accurate principal component of a set of data and handle a wide range of nonlinearities using the kernel principal component analysis (KPCA) model. The KPCA is among the most popular nonlinear statistical methods. Second, we extend the MW-GLRT technique to one that applies exponential weights to the residuals in the moving window (instead of equal weighting), as this can further improve fault detection performance by reducing the FAR via an exponentially weighted moving average (EWMA). The developed detection method, called EWMA-GLRT, provides improved properties, such as smaller missed detection rates and FARs and a smaller average run length. The idea behind the developed EWMA-GLRT is to compute a new GLRT statistic that integrates current and previous data information in a decreasing exponential fashion, giving more weight to the more recent data. This provides a more accurate estimation of the GLRT statistic and a stronger memory that enables better decision making with respect to fault detection. Therefore, in this paper, a KPCA-based EWMA-GLRT method is developed and applied in practice to improve fault detection in biological phenomena modeled by S-systems and to enhance monitoring of the process mean. The idea behind a KPCA-based EWMA-GLRT fault detection algorithm is to

  8. An Algorithm to Improve Test Answer Copying Detection Using the Omega Statistic

    ERIC Educational Resources Information Center

    Maeda, Hotaka; Zhang, Bo

    2017-01-01

    The omega (ω) statistic is reputed to be one of the best indices for detecting answer copying on multiple choice tests, but its performance relies on the accurate estimation of copier ability, which is challenging because responses from the copiers may have been contaminated. We propose an algorithm that aims to identify and delete the suspected…

  9. Detection of Undocumented Changepoints Using Multiple Test Statistics and Composite Reference Series.

    NASA Astrophysics Data System (ADS)

    Menne, Matthew J.; Williams, Claude N., Jr.

    2005-10-01

    An evaluation of three hypothesis test statistics that are commonly used in the detection of undocumented changepoints is described. The goal of the evaluation was to determine whether the use of multiple tests could improve undocumented, artificial changepoint detection skill in climate series. The use of successive hypothesis testing is compared to optimal approaches, both of which are designed for situations in which multiple undocumented changepoints may be present. In addition, the importance of the form of the composite climate reference series is evaluated, particularly with regard to the impact of undocumented changepoints in the various component series that are used to calculate the composite. In a comparison of single-test changepoint detection skill, the composite reference series formulation is shown to be less important than the choice of the hypothesis test statistic, provided that the composite is calculated from serially complete and homogeneous component series. However, the evaluated composite series are not equally susceptible to the presence of changepoints in their components, which may be erroneously attributed to the target series. Moreover, a reference formulation that is based on averaging the first-difference component series is susceptible to random walks when the composition of the component series changes through time (e.g., values are missing), and its use is therefore not recommended. When more than one test is required to reject the null hypothesis of no changepoint, the number of detected changepoints is reduced proportionately less than the number of false alarms in a wide variety of Monte Carlo simulations. Consequently, a consensus of hypothesis tests appears to improve undocumented changepoint detection skill, especially when reference series homogeneity is violated. A consensus of successive hypothesis tests using a semihierarchic splitting algorithm also compares favorably to optimal solutions, even when
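
    One common undocumented-changepoint statistic of the kind compared above is the maximum two-sample t-statistic over all candidate split points, calibrated by Monte Carlo simulation under the no-changepoint null. A self-contained sketch (not a reproduction of the paper's exact tests):

```python
# Max-t changepoint test with a Monte Carlo critical value under the null.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

def max_t(series):
    """Maximum |t| over all candidate changepoints (edges excluded)."""
    n = len(series)
    return max(abs(stats.ttest_ind(series[:k], series[k:]).statistic)
               for k in range(5, n - 5))

series = rng.normal(0, 1, 100)
series[60:] += 0.8                      # artificial step change

t_obs = max_t(series)
t_null = np.array([max_t(rng.normal(0, 1, 100)) for _ in range(500)])
crit = np.percentile(t_null, 95)
print(f"max t = {t_obs:.2f}, 95% critical value = {crit:.2f}")
print("changepoint detected" if t_obs > crit else "no changepoint")
```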

  10. Statistical sensor fusion analysis of near-IR polarimetric and thermal imagery for the detection of minelike targets

    NASA Astrophysics Data System (ADS)

    Weisenseel, Robert A.; Karl, William C.; Castanon, David A.; DiMarzio, Charles A.

    1999-02-01

    We present an analysis of statistical model based data-level fusion for near-IR polarimetric and thermal data, particularly for the detection of mines and mine-like targets. Typical detection-level data fusion methods, approaches that fuse detections from individual sensors rather than fusing at the level of the raw data, do not account rationally for the relative reliability of different sensors, nor the redundancy often inherent in multiple sensors. Representative examples of such detection-level techniques include logical AND/OR operations on detections from individual sensors and majority vote methods. In this work, we exploit a statistical data model for the detection of mines and mine-like targets to compare and fuse multiple sensor channels. Our purpose is to quantify the amount of knowledge that each polarimetric or thermal channel supplies to the detection process. With this information, we can make reasonable decisions about the usefulness of each channel. We can use this information to improve the detection process, or we can use it to reduce the number of required channels.

  11. Neural Evidence of Statistical Learning: Efficient Detection of Visual Regularities without Awareness

    ERIC Educational Resources Information Center

    Turk-Browne, Nicholas B.; Scholl, Brian J.; Chun, Marvin M.; Johnson, Marcia K.

    2009-01-01

    Our environment contains regularities distributed in space and time that can be detected by way of statistical learning. This unsupervised learning occurs without intent or awareness, but little is known about how it relates to other types of learning, how it affects perceptual processing, and how quickly it can occur. Here we use fMRI during…

  12. Statistical methods for detecting periodic fragments in DNA sequence data

    PubMed Central

    2011-01-01

    Background Period 10 dinucleotides are structurally and functionally validated factors that influence the ability of DNA to form nucleosomes, histone core octamers. Robust identification of periodic signals in DNA sequences is therefore required to understand nucleosome organisation in genomes. While various techniques for identifying periodic components in genomic sequences have been proposed or adopted, the requirements for such techniques have not been considered in detail and confirmatory testing for a priori specified periods has not been developed. Results We compared the estimation accuracy and suitability for confirmatory testing of autocorrelation, discrete Fourier transform (DFT), integer period discrete Fourier transform (IPDFT) and a previously proposed Hybrid measure. A number of different statistical significance procedures were evaluated but a blockwise bootstrap proved superior. When applied to synthetic data whose period-10 signal had been eroded, or for which the signal was approximately period-10, the Hybrid technique exhibited superior properties during exploratory period estimation. In contrast, confirmatory testing using the blockwise bootstrap procedure identified IPDFT as having the greatest statistical power. These properties were validated on yeast sequences defined from a ChIP-chip study where the Hybrid metric confirmed the expected dominance of period-10 in nucleosome associated DNA but IPDFT identified more significant occurrences of period-10. Application to the whole genomes of yeast and mouse identified ~ 21% and ~ 19% respectively of these genomes as spanned by period-10 nucleosome positioning sequences (NPS). Conclusions For estimating the dominant period, we find the Hybrid period estimation method empirically to be the most effective for both eroded and approximate periodicity. The blockwise bootstrap was found to be effective as a significance measure, performing particularly well in the problem of period detection in the
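
    The confirmatory-testing strategy above — a spectral statistic evaluated at the a priori period, judged against a blockwise bootstrap null — can be sketched as follows. The binary dinucleotide indicator, block length, and enrichment scheme are invented for illustration:

```python
# DFT power at period 10 of a binary dinucleotide indicator, with a blockwise
# bootstrap null that preserves short-range structure but scrambles phase.
import numpy as np

rng = np.random.default_rng(7)
n = 1000
x = (rng.random(n) < 0.1).astype(float)
x[::10] = (rng.random(n // 10) < 0.5)      # enrich AA/TT-like hits every 10 bp

def power_at_period(sig, period=10):
    sig = sig - sig.mean()
    freqs = np.fft.rfftfreq(len(sig))
    spec = np.abs(np.fft.rfft(sig)) ** 2
    return spec[np.argmin(np.abs(freqs - 1.0 / period))]

def block_bootstrap(sig, block=50):
    starts = rng.integers(0, len(sig) - block, len(sig) // block)
    return np.concatenate([sig[s:s + block] for s in starts])

obs = power_at_period(x)
null = np.array([power_at_period(block_bootstrap(x)) for _ in range(2000)])
print(f"period-10 power = {obs:.1f}, bootstrap p = {np.mean(null >= obs):.4f}")
```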

  13. Statistical text classifier to detect specific type of medical incidents.

    PubMed

    Wong, Zoie Shui-Yee; Akiyama, Masanori

    2013-01-01

    WHO Patient Safety has put focus on increasing the coherence and expressiveness of patient safety classification with the foundation of the International Classification for Patient Safety (ICPS). Text classification and statistical approaches have been shown to be successful in identifying safety problems in the aviation industry using incident text information. It has been challenging to comprehend the taxonomy of medical incidents in a structured manner. Independent reporting mechanisms for patient safety incidents have been established in the UK, Canada, Australia, Japan, Hong Kong, etc. This research demonstrates the potential to construct statistical text classifiers to detect specific types of medical incidents using incident text data. An illustrative example for classifying look-alike sound-alike (LASA) medication incidents using structured text from 227 advisories related to medication errors from Global Patient Safety Alerts (GPSA) is shown in this poster presentation. The classifier was built using a logistic regression model. The ROC curve and AUC value indicated that this is a satisfactory model.

  14. Statistical approach for the detection of motion/noise artifacts in Photoplethysmogram.

    PubMed

    Selvaraj, Nandakumar; Mendelson, Yitzhak; Shelley, Kirk H; Silverman, David G; Chon, Ki H

    2011-01-01

    Motion and noise artifacts (MNA) have been a serious obstacle in realizing the potential of Photoplethysmogram (PPG) signals for real-time monitoring of vital signs. We present a statistical approach based on the computation of kurtosis and Shannon Entropy (SE) for the accurate detection of MNA in PPG data. The MNA detection algorithm was verified on multi-site PPG data collected from both laboratory and clinical settings. The accuracy of the fusion of kurtosis and SE metrics for the artifact detection was 99.0%, 94.8% and 93.3% in simultaneously recorded ear, finger and forehead PPGs obtained in a clinical setting, respectively. For laboratory PPG data recorded from a finger with contrived artifacts, the accuracy was 88.8%. It was identified that the measurements from the forehead PPG sensor contained the most artifacts followed by finger and ear. The proposed MNA algorithm can be implemented in real-time as the computation time was 0.14 seconds using Matlab®.
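
    A rough sketch of the kurtosis/Shannon-entropy screen: compute both metrics on short PPG segments and flag segments falling outside empirical thresholds. The synthetic signal, contrived spikes, and cutoff values below are illustrative, not the paper's fitted thresholds:

```python
# Kurtosis + Shannon entropy screen for motion/noise artifacts in a toy PPG.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
fs, seconds = 100, 60
t = np.arange(fs * seconds) / fs
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.05 * rng.normal(size=t.size)
ppg[3000:3500:25] += 8.0                        # contrived motion spikes

def shannon_entropy(seg, bins=16):
    counts, _ = np.histogram(seg, bins=bins)
    p = counts[counts > 0] / counts.sum()
    return -np.sum(p * np.log2(p))

for i in range(0, len(ppg), 5 * fs):            # 5-second segments
    seg = ppg[i:i + 5 * fs]
    k, h = stats.kurtosis(seg), shannon_entropy(seg)
    # Thresholds here are illustrative, not the paper's fitted values.
    flag = "ARTIFACT" if (k > 2.0 or h < 3.0) else "clean"
    print(f"t={i/fs:5.1f}s  kurtosis={k:6.2f}  entropy={h:4.2f}  {flag}")
```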

  15. A powerful score-based test statistic for detecting gene-gene co-association.

    PubMed

    Xu, Jing; Yuan, Zhongshang; Ji, Jiadong; Zhang, Xiaoshuai; Li, Hongkai; Wu, Xuesen; Xue, Fuzhong; Liu, Yanxun

    2016-01-29

    The genetic variants identified by genome-wide association studies (GWAS) can only account for a small proportion of the total heritability of complex disease. The existence of gene-gene joint effects, which contain the main effects and their co-association, is one possible explanation for the "missing heritability" problem. Gene-gene co-association refers to the extent to which the joint effects of two genes differ from the main effects, due not only to the traditional interaction under nearly independent conditions but also to the correlation between genes. Generally, genes tend to work collaboratively within a specific pathway or network contributing to the disease, and the specific disease-associated loci will often be highly correlated (e.g. single nucleotide polymorphisms (SNPs) in linkage disequilibrium). Therefore, we proposed a novel score-based statistic (SBS) as a gene-based method for detecting gene-gene co-association. Various simulations illustrate that, under different sample sizes, marginal effects of causal SNPs and co-association levels, the proposed SBS performs better than other existing methods, including single SNP-based and principal component analysis (PCA)-based logistic regression models, the statistics based on canonical correlations (CCU), kernel canonical correlation analysis (KCCU), partial least squares path modeling (PLSPM) and the delta-square (δ²) statistic. The real data analysis of rheumatoid arthritis (RA) further confirmed its advantages in practice. SBS is a powerful and efficient gene-based method for detecting gene-gene co-association.

  16. Infrared maritime target detection using the high order statistic filtering in fractional Fourier domain

    NASA Astrophysics Data System (ADS)

    Zhou, Anran; Xie, Weixin; Pei, Jihong

    2018-06-01

    Accurate detection of maritime targets in infrared imagery under various sea clutter conditions is always a challenging task. The fractional Fourier transform (FRFT) is the extension of the Fourier transform to fractional orders and carries richer spatial-frequency information. By combining it with high-order statistic filtering, a new ship detection method is proposed. First, the proper range of the angle parameter is determined so that the ship components and background are more easily separated. Second, a new high-order statistic curve (HOSC) at each fractional frequency point is designed. It is shown that the maximal peak interval in the HOSC reflects the target information, while points outside the interval reflect the background, and that the HOSC value for ship pixels is much larger than that for sea clutter. The curve's maximal target peak interval is then located and extracted by bandpass filtering in the fractional Fourier domain. The HOSC value outside the peak interval decreases rapidly to 0, so the background is effectively suppressed. Finally, the detection result is obtained by double-threshold segmentation and a target region selection method. The results show the proposed method is excellent for maritime target detection under strong clutter.

  17. Signal Waveform Detection with Statistical Automaton for Internet and Web Service Streaming

    PubMed Central

    Liu, Yiming; Huang, Nai-Lun; Zeng, Fufu; Lin, Fang-Ying

    2014-01-01

    In recent years, many approaches have been suggested for Internet and web streaming detection. In this paper, we propose an approach to signal waveform detection for Internet and web streaming, with novel statistical automata. The system records network connections over a period of time to form a signal waveform and computes suspicious characteristics of the waveform. Network streams can then be classified according to these selected waveform features by our newly designed Aho-Corasick (AC) automata. We developed two versions, that is, basic AC and advanced AC-histogram waveform automata, and conducted comprehensive experimentation. The results confirm that our approach is feasible and suitable for deployment. PMID:25032231

  18. Ship detection using STFT sea background statistical modeling for large-scale oceansat remote sensing image

    NASA Astrophysics Data System (ADS)

    Wang, Lixia; Pei, Jihong; Xie, Weixin; Liu, Jinyuan

    2018-03-01

    Large-scale oceansat remote sensing images cover a large area of sea surface, whose fluctuation can be considered a non-stationary process. The Short-Time Fourier Transform (STFT) is a suitable analysis tool for time-varying non-stationary signals. In this paper, a novel ship detection method using 2-D STFT sea background statistical modeling for large-scale oceansat remote sensing images is proposed. First, the paper divides the large-scale oceansat remote sensing image into small sub-blocks, and 2-D STFT is applied to each sub-block individually. Second, the 2-D STFT spectra of the sub-blocks are studied and an obvious characteristic difference between sea background and non-sea background is found. Finally, a statistical model for all valid frequency points in the STFT spectrum of the sea background is given, and a ship detection method based on 2-D STFT spectrum modeling is proposed. The experimental results show that the proposed algorithm can detect ship targets with a high recall rate and a low miss rate.

  19. Propensity score to detect baseline imbalance in cluster randomized trials: the role of the c-statistic.

    PubMed

    Leyrat, Clémence; Caille, Agnès; Foucher, Yohann; Giraudeau, Bruno

    2016-01-22

    Despite randomization, baseline imbalance and confounding bias may occur in cluster randomized trials (CRTs). Covariate imbalance may jeopardize the validity of statistical inferences if it occurs on prognostic factors. Thus, the diagnosis of such an imbalance is essential so that the statistical analysis can be adjusted if required. We developed a tool based on the c-statistic of the propensity score (PS) model to detect global baseline covariate imbalance in CRTs and assess the risk of confounding bias. We performed a simulation study to assess the performance of the proposed tool and applied this method to analyze the data from 2 published CRTs. The proposed method had good performance for large sample sizes (n = 500 per arm) and when the number of unbalanced covariates was not too small compared with the total number of baseline covariates (≥40% of unbalanced covariates). We also provide a strategy for preselection of the covariates needed in the PS model to enhance imbalance detection. The proposed tool could be useful in deciding whether covariate adjustment is required before performing statistical analyses of CRTs.
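
    The diagnostic reduces to fitting a propensity score model that predicts trial arm from baseline covariates and inspecting its c-statistic (ROC AUC): values well above 0.5 indicate that the arms are separable, i.e., globally imbalanced. A minimal sketch that ignores the cluster structure for simplicity (data, covariates, and the 0.6 cutoff are invented):

```python
# Propensity score c-statistic as a global baseline-imbalance diagnostic.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(9)
n = 600
X = rng.normal(0, 1, (n, 6))                 # baseline covariates
arm = rng.integers(0, 2, n)                  # randomized arm
X[arm == 1, 0] += 0.4                        # one imbalanced covariate

ps_model = LogisticRegression(max_iter=1000).fit(X, arm)
c_stat = roc_auc_score(arm, ps_model.predict_proba(X)[:, 1])
print(f"c-statistic = {c_stat:.3f}")
print("possible baseline imbalance" if c_stat > 0.6 else "no strong imbalance")
```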

  20. Active Learning with Rationales for Identifying Operationally Significant Anomalies in Aviation

    NASA Technical Reports Server (NTRS)

    Sharma, Manali; Das, Kamalika; Bilgic, Mustafa; Matthews, Bryan; Nielsen, David Lynn; Oza, Nikunj C.

    2016-01-01

    A major focus of the commercial aviation community is discovery of unknown safety events in flight operations data. Data-driven unsupervised anomaly detection methods are better at capturing unknown safety events compared to rule-based methods which only look for known violations. However, not all statistical anomalies that are discovered by these unsupervised anomaly detection methods are operationally significant (e.g., represent a safety concern). Subject Matter Experts (SMEs) have to spend significant time reviewing these statistical anomalies individually to identify a few operationally significant ones. In this paper we propose an active learning algorithm that incorporates SME feedback in the form of rationales to build a classifier that can distinguish between uninteresting and operationally significant anomalies. Experimental evaluation on real aviation data shows that our approach improves detection of operationally significant events by as much as 75% compared to the state-of-the-art. The learnt classifier also generalizes well to additional validation data sets.

  1. A spatial scan statistic for multiple clusters.

    PubMed

    Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie

    2011-10-01

    Spatial scan statistics are commonly used for geographical disease surveillance and cluster detection. When multiple clusters coexist in the study area, they become difficult to detect because of their shadowing effect on each other. The recently proposed sequential method showed better power for detecting the second, weaker cluster, but did not improve the ability to detect the first, stronger cluster, which is more important than the second one. We propose a new extension of the spatial scan statistic that can be used to detect multiple clusters. By constructing two or more clusters in the alternative hypothesis, our proposed method accounts for other coexisting clusters in the detection and evaluation process. The performance of the proposed method is compared to the sequential method through an intensive simulation study, in which our proposed method shows better power in terms of both rejecting the null hypothesis and accurately detecting the coexisting clusters. In a real study of hand-foot-mouth disease data in Pingdu city, a true cluster town that cannot be judged statistically significant by the standard method, due to another cluster's shadowing effect, is successfully detected by our proposed method. Copyright © 2011 Elsevier Inc. All rights reserved.
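
    For orientation, a compact version of the underlying Poisson scan machinery: slide circular windows over district counts, compute the Kulldorff log-likelihood ratio, and rank the best window against Monte Carlo replicates. This sketches only the standard single-cluster scan, not the multiple-cluster extension proposed above:

```python
# Single-cluster Poisson spatial scan with Monte Carlo significance.
import numpy as np

rng = np.random.default_rng(10)
m = 100
xy = rng.uniform(0, 10, (m, 2))                    # district centroids
pop = rng.integers(500, 5000, m).astype(float)
cases = rng.poisson(0.01 * pop)
inside = np.linalg.norm(xy - [3, 3], axis=1) < 1.5
cases[inside] += rng.poisson(0.01 * pop[inside])   # planted cluster

def best_llr(cases, pop):
    C, P = cases.sum(), pop.sum()
    best = 0.0
    for i in range(m):
        d = np.linalg.norm(xy - xy[i], axis=1)
        for radius in (1.0, 2.0, 3.0):
            w = d < radius
            c, e = cases[w].sum(), C * pop[w].sum() / P
            if e < c < C:                          # excess inside the window
                llr = (c * np.log(c / e)
                       + (C - c) * np.log((C - c) / (C - e)))
                best = max(best, llr)
    return best

obs = best_llr(cases, pop)
null = [best_llr(rng.poisson(cases.sum() * pop / pop.sum()), pop)
        for _ in range(99)]
p = (1 + sum(v >= obs for v in null)) / 100
print(f"best LLR = {obs:.1f}, Monte Carlo p = {p:.2f}")
```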

  2. [Significance of bacteria detection with filter paper method on diagnosis of diabetic foot wound infection].

    PubMed

    Zou, X H; Zhu, Y P; Ren, G Q; Li, G C; Zhang, J; Zou, L J; Feng, Z B; Li, B H

    2017-02-20

    Objective: To evaluate the significance of bacteria detection with filter paper method on diagnosis of diabetic foot wound infection. Methods: Eighteen patients with diabetic foot ulcer conforming to the study criteria were hospitalized in Liyuan Hospital Affiliated to Tongji Medical College of Huazhong University of Science and Technology from July 2014 to July 2015. Diabetic foot ulcer wounds were classified according to the University of Texas diabetic foot classification (hereinafter referred to as Texas grade) system, and general condition of patients with wounds in different Texas grade was compared. Exudate and tissue of wounds were obtained, and filter paper method and biopsy method were adopted to detect the bacteria of wounds of patients respectively. Filter paper method was regarded as the evaluation method, and biopsy method was regarded as the control method. The relevance, difference, and consistency of the detection results of two methods were tested. Sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of filter paper method in bacteria detection were calculated. Receiver operating characteristic (ROC) curve was drawn based on the specificity and sensitivity of filter paper method in bacteria detection of 18 patients to predict the detection effect of the method. Data were processed with one-way analysis of variance and Fisher's exact test. In patients tested positive for bacteria by biopsy method, the correlation between bacteria number detected by biopsy method and that by filter paper method was analyzed with Pearson correlation analysis. Results: (1) There were no statistically significant differences among patients with wounds in Texas grade 1, 2, and 3 in age, duration of diabetes, duration of wound, wound area, ankle brachial index, glycosylated hemoglobin, fasting blood sugar, blood platelet count, erythrocyte sedimentation rate, C-reactive protein, aspartate aminotransferase, serum creatinine, and

  3. Detection of ground fog in mountainous areas from MODIS (Collection 051) daytime data using a statistical approach

    NASA Astrophysics Data System (ADS)

    Schulz, Hans Martin; Thies, Boris; Chang, Shih-Chieh; Bendix, Jörg

    2016-03-01

    The mountain cloud forest of Taiwan can be delimited from other forest types using a map of the ground fog frequency. In order to create such a frequency map from remotely sensed data, an algorithm able to detect ground fog is necessary. Common techniques for ground fog detection based on weather satellite data cannot be applied to fog occurrences in Taiwan as they rely on several assumptions regarding cloud properties. Therefore a new statistical method for the detection of ground fog in mountainous terrain from MODIS Collection 051 data is presented. Due to the sharpening of input data using MODIS bands 1 and 2, the method provides fog masks in a resolution of 250 m per pixel. The new technique is based on negative correlations between optical thickness and terrain height that can be observed if a cloud that is relatively plane-parallel is truncated by the terrain. A validation of the new technique using camera data has shown that the quality of fog detection is comparable to that of another modern fog detection scheme developed and validated for the temperate zones. The method is particularly applicable to optically thinner water clouds. Beyond a cloud optical thickness of ≈ 40, classification errors significantly increase.

  4. How to get statistically significant effects in any ERP experiment (and why you shouldn't).

    PubMed

    Luck, Steven J; Gaspelin, Nicholas

    2017-01-01

    ERP experiments generate massive datasets, often containing thousands of values for each participant, even after averaging. The richness of these datasets can be very useful in testing sophisticated hypotheses, but this richness also creates many opportunities to obtain effects that are statistically significant but do not reflect true differences among groups or conditions (bogus effects). The purpose of this paper is to demonstrate how common and seemingly innocuous methods for quantifying and analyzing ERP effects can lead to very high rates of significant but bogus effects, with the likelihood of obtaining at least one such bogus effect exceeding 50% in many experiments. We focus on two specific problems: using the grand-averaged data to select the time windows and electrode sites for quantifying component amplitudes and latencies, and using one or more multifactor statistical analyses. Reanalyses of prior data and simulations of typical experimental designs are used to show how these problems can greatly increase the likelihood of significant but bogus results. Several strategies are described for avoiding these problems and for increasing the likelihood that significant effects actually reflect true differences among groups or conditions. © 2016 Society for Psychophysiological Research.

  5. How to Get Statistically Significant Effects in Any ERP Experiment (and Why You Shouldn’t)

    PubMed Central

    Luck, Steven J.; Gaspelin, Nicholas

    2016-01-01

    Event-related potential (ERP) experiments generate massive data sets, often containing thousands of values for each participant, even after averaging. The richness of these data sets can be very useful in testing sophisticated hypotheses, but this richness also creates many opportunities to obtain effects that are statistically significant but do not reflect true differences among groups or conditions (bogus effects). The purpose of this paper is to demonstrate how common and seemingly innocuous methods for quantifying and analyzing ERP effects can lead to very high rates of significant-but-bogus effects, with the likelihood of obtaining at least one such bogus effect exceeding 50% in many experiments. We focus on two specific problems: using the grand average data to select the time windows and electrode sites for quantifying component amplitudes and latencies, and using one or more multi-factor statistical analyses. Re-analyses of prior data and simulations of typical experimental designs are used to show how these problems can greatly increase the likelihood of significant-but-bogus results. Several strategies are described for avoiding these problems and for increasing the likelihood that significant effects actually reflect true differences among groups or conditions. PMID:28000253

  6. Significant Association of Urinary Toxic Metals and Autism-Related Symptoms—A Nonlinear Statistical Analysis with Cross Validation

    PubMed Central

    Adams, James; Kruger, Uwe; Geis, Elizabeth; Gehn, Eva; Fimbres, Valeria; Pollard, Elena; Mitchell, Jessica; Ingram, Julie; Hellmers, Robert; Quig, David; Hahn, Juergen

    2017-01-01

    Introduction A number of previous studies examined a possible association of toxic metals and autism, and over half of those studies suggest that toxic metal levels are different in individuals with Autism Spectrum Disorders (ASD). Additionally, several studies found that those levels correlate with the severity of ASD. Methods In order to further investigate these points, this paper performs the most detailed statistical analysis to date of a data set in this field. First morning urine samples were collected from 67 children and adults with ASD and 50 neurotypical controls of similar age and gender. The samples were analyzed to determine the levels of 10 urinary toxic metals (UTM). Autism-related symptoms were assessed with eleven behavioral measures. Statistical analysis was used to distinguish participants on the ASD spectrum and neurotypical participants based upon the UTM data alone. The analysis also included examining the association of autism severity with toxic metal excretion data using linear and nonlinear analysis. “Leave-one-out” cross-validation was used to ensure statistical independence of results. Results and Discussion Average excretion levels of several toxic metals (lead, tin, thallium, antimony) were significantly higher in the ASD group. However, ASD classification using univariate statistics proved difficult due to large variability, but nonlinear multivariate statistical analysis significantly improved ASD classification with Type I/II errors of 15% and 18%, respectively. These results clearly indicate that the urinary toxic metal excretion profiles of participants in the ASD group were significantly different from those of the neurotypical participants. Similarly, nonlinear methods determined a significantly stronger association between the behavioral measures and toxic metal excretion. The association was strongest for the Aberrant Behavior Checklist (including subscales on Irritability, Stereotypy, Hyperactivity, and Inappropriate

  7. Statistical significance estimation of a signal within the GooFit framework on GPUs

    NASA Astrophysics Data System (ADS)

    Cristella, Leonardo; Di Florio, Adriano; Pompili, Alexis

    2017-03-01

    In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks with the purpose of estimating the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B+ → J/ψϕK+. GooFit is an open data-analysis tool under development that interfaces ROOT/RooFit to the CUDA platform on nVidia GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performances with respect to the RooFit application parallelized on multiple CPUs by means of the PROOF-Lite tool. The considerable resulting speed-up, evident when comparing concurrent GooFit processes allowed by the CUDA Multi Process Service with a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood ratio test statistic in different situations in which the Wilks theorem may or may not apply because its regularity conditions are not satisfied.

  8. Use of power analysis to develop detectable significance criteria for sea urchin toxicity tests

    USGS Publications Warehouse

    Carr, R.S.; Biedenbach, J.M.

    1999-01-01

    When sufficient data are available, the statistical power of a test can be determined using power analysis procedures. The term “detectable significance” has been coined to refer to this criterion based on power analysis and past performance of a test. This power analysis procedure has been performed with sea urchin (Arbacia punctulata) fertilization and embryological development data from sediment porewater toxicity tests. Data from 3100 and 2295 tests for the fertilization and embryological development tests, respectively, were used to calculate the criteria and regression equations describing the power curves. Using Dunnett's test, a minimum significant difference (MSD) (β = 0.05) of 15.5% and 19% for the fertilization test, and 16.4% and 20.6% for the embryological development test, for α ≤ 0.05 and α ≤ 0.01, respectively, were determined. The use of this second criterion reduces type I (false positive) errors and helps to establish a critical level of difference based on the past performance of the test.
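
    The "detectable significance" idea can be approximated with a normal-approximation power calculation: given the historical standard deviation and replicate count of a test, solve for the smallest difference from control detectable at chosen alpha and power. The paper's MSDs come from Dunnett's test on a large historical database; the SD and n below are hypothetical:

```python
# Smallest detectable difference via a two-sample normal approximation.
import numpy as np
from scipy import stats

def detectable_difference(sd, n_per_group, alpha=0.05, power=0.95):
    z_a = stats.norm.ppf(1 - alpha)          # one-sided test
    z_b = stats.norm.ppf(power)
    return (z_a + z_b) * sd * np.sqrt(2.0 / n_per_group)

# Hypothetical historical performance: 12% SD in percent fertilization, n = 5.
for alpha in (0.05, 0.01):
    msd = detectable_difference(sd=12.0, n_per_group=5, alpha=alpha)
    print(f"alpha = {alpha}: detectable difference ≈ {msd:.1f} points")
```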

  9. The statistical power to detect cross-scale interactions at macroscales

    USGS Publications Warehouse

    Wagner, Tyler; Fergus, C. Emi; Stow, Craig A.; Cheruvelil, Kendra S.; Soranno, Patricia A.

    2016-01-01

    Macroscale studies of ecological phenomena are increasingly common because stressors such as climate and land-use change operate at large spatial and temporal scales. Cross-scale interactions (CSIs), where ecological processes operating at one spatial or temporal scale interact with processes operating at another scale, have been documented in a variety of ecosystems and contribute to complex system dynamics. However, studies investigating CSIs are often dependent on compiling multiple data sets from different sources to create multithematic, multiscaled data sets, which results in structurally complex, and sometimes incomplete data sets. The statistical power to detect CSIs needs to be evaluated because of their importance and the challenge of quantifying CSIs using data sets with complex structures and missing observations. We studied this problem using a spatially hierarchical model that measures CSIs between regional agriculture and its effects on the relationship between lake nutrients and lake productivity. We used an existing large multithematic, multiscaled database, the LAke multi-scaled GeOSpatial and temporal database (LAGOS), to parameterize the power analysis simulations. We found that the power to detect CSIs was more strongly related to the number of regions in the study rather than the number of lakes nested within each region. CSI power analyses will not only help ecologists design large-scale studies aimed at detecting CSIs, but will also focus attention on CSI effect sizes and the degree to which they are ecologically relevant and detectable with large data sets.

  10. Teaching Statistics in Biology: Using Inquiry-based Learning to Strengthen Understanding of Statistical Analysis in Biology Laboratory Courses

    PubMed Central

    2008-01-01

    There is an increasing need for students in the biological sciences to build a strong foundation in quantitative approaches to data analyses. Although most science, engineering, and math field majors are required to take at least one statistics course, statistical analysis is poorly integrated into undergraduate biology course work, particularly at the lower-division level. Elements of statistics were incorporated into an introductory biology course, including a review of statistics concepts and opportunity for students to perform statistical analysis in a biological context. Learning gains were measured with an 11-item statistics learning survey instrument developed for the course. Students showed a statistically significant 25% (p < 0.005) increase in statistics knowledge after completing introductory biology. Students improved their scores on the survey after completing introductory biology, even if they had previously completed an introductory statistics course (9% improvement, p < 0.005). Students retested 1 yr after completing introductory biology showed no loss of their statistics knowledge as measured by this instrument, suggesting that the use of statistics in biology course work may aid long-term retention of statistics knowledge. No statistically significant differences in learning were detected between male and female students in the study. PMID:18765754

  11. Teaching statistics in biology: using inquiry-based learning to strengthen understanding of statistical analysis in biology laboratory courses.

    PubMed

    Metz, Anneke M

    2008-01-01

    There is an increasing need for students in the biological sciences to build a strong foundation in quantitative approaches to data analyses. Although most science, engineering, and math field majors are required to take at least one statistics course, statistical analysis is poorly integrated into undergraduate biology course work, particularly at the lower-division level. Elements of statistics were incorporated into an introductory biology course, including a review of statistics concepts and opportunity for students to perform statistical analysis in a biological context. Learning gains were measured with an 11-item statistics learning survey instrument developed for the course. Students showed a statistically significant 25% (p < 0.005) increase in statistics knowledge after completing introductory biology. Students improved their scores on the survey after completing introductory biology, even if they had previously completed an introductory statistics course (9% improvement, p < 0.005). Students retested 1 yr after completing introductory biology showed no loss of their statistics knowledge as measured by this instrument, suggesting that the use of statistics in biology course work may aid long-term retention of statistics knowledge. No statistically significant differences in learning were detected between male and female students in the study.

  12. Probability of Detection (POD) as a statistical model for the validation of qualitative methods.

    PubMed

    Wehling, Paul; LaBudde, Robert A; Brunelle, Sharon L; Nelson, Maria T

    2011-01-01

    A statistical model is presented for use in validation of qualitative methods. This model, termed Probability of Detection (POD), harmonizes the statistical concepts and parameters between quantitative and qualitative method validation. POD characterizes method response with respect to concentration as a continuous variable. The POD model provides a tool for graphical representation of response curves for qualitative methods. In addition, the model allows comparisons between candidate and reference methods, and provides calculations of repeatability, reproducibility, and laboratory effects from collaborative study data. Single laboratory study and collaborative study examples are given.

  13. A novel chi-square statistic for detecting group differences between pathways in systems epidemiology.

    PubMed

    Yuan, Zhongshang; Ji, Jiadong; Zhang, Tao; Liu, Yi; Zhang, Xiaoshuai; Chen, Wei; Xue, Fuzhong

    2016-12-20

    Traditional epidemiology often pays more attention to the identification of a single factor than to the pathway related to a disease, and therefore it is difficult to explore the disease mechanism. Systems epidemiology aims to integrate putative lifestyle exposures and biomarkers extracted from multiple omics platforms to offer new insights into the pathway mechanisms that underlie disease at the human population level. One key but inadequately addressed question is how to develop powerful statistics to identify whether one candidate pathway is associated with a disease. Bearing in mind that a pathway difference can result not only from changes in the nodes but also from changes in the edges, we propose a novel statistic for detecting group differences between pathways which, in principle, captures node changes and edge changes while simultaneously accounting for the pathway structure. The proposed test has been proven to follow the chi-square distribution, and various simulations have shown it has better performance than other existing methods. Integrating genome-wide DNA methylation data, we analyzed one real data set from the Bogalusa cohort study and identified a potential pathway, Smoking → SOCS3 → PIK3R1, which was strongly associated with abdominal obesity. The proposed test was powerful and efficient at identifying pathway differences between two groups, and it can be extended to other disciplines that involve statistical comparisons between pathways. The source code in R is available on our website. Copyright © 2016 John Wiley & Sons, Ltd.

  14. Statistical framework for detection of genetically modified organisms based on Next Generation Sequencing.

    PubMed

    Willems, Sander; Fraiture, Marie-Alice; Deforce, Dieter; De Keersmaecker, Sigrid C J; De Loose, Marc; Ruttink, Tom; Herman, Philippe; Van Nieuwerburgh, Filip; Roosens, Nancy

    2016-02-01

    Because the number and diversity of genetically modified (GM) crops have significantly increased, their analysis based on real-time PCR (qPCR) methods is becoming increasingly complex and laborious. While several pioneers have already investigated Next Generation Sequencing (NGS) as an alternative to qPCR, its practical use has not been assessed for routine analysis. In this study a statistical framework was developed to predict the number of NGS reads needed to detect transgene sequences, to prove their integration into the host genome and to identify the specific transgene event in a sample with known composition. This framework was validated by applying it to experimental data from food matrices composed of pure GM rice, processed GM rice (noodles) or a 10% GM/non-GM rice mixture, revealing some influential factors. Finally, the feasibility of NGS for routine analysis of GM crops was investigated by applying the framework to samples commonly encountered in routine analysis of GM crops. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  15. Statistical Methods for Generalized Linear Models with Covariates Subject to Detection Limits.

    PubMed

    Bernhardt, Paul W; Wang, Huixia J; Zhang, Daowen

    2015-05-01

    Censored observations are a common occurrence in biomedical data sets. Although a large amount of research has been devoted to estimation and inference for data with censored responses, very little research has focused on proper statistical procedures when predictors are censored. In this paper, we consider statistical methods for dealing with multiple predictors subject to detection limits within the context of generalized linear models. We investigate and adapt several conventional methods and develop a new multiple imputation approach for analyzing data sets with predictors censored due to detection limits. We establish the consistency and asymptotic normality of the proposed multiple imputation estimator and suggest a computationally simple and consistent variance estimator. We also demonstrate that the conditional mean imputation method often leads to inconsistent estimates in generalized linear models, while several other methods are either computationally intensive or lead to parameter estimates that are biased or more variable compared to the proposed multiple imputation estimator. In an extensive simulation study, we assess the bias and variability of different approaches within the context of a logistic regression model and compare variance estimation methods for the proposed multiple imputation estimator. Lastly, we apply several methods to analyze the data set from a recently-conducted GenIMS study.

  16. A statistical approach to bioclimatic trend detection in the airborne pollen records of Catalonia (NE Spain)

    NASA Astrophysics Data System (ADS)

    Fernández-Llamazares, Álvaro; Belmonte, Jordina; Delgado, Rosario; De Linares, Concepción

    2014-04-01

    Airborne pollen records are a suitable indicator for the study of climate change. The present work focuses on the role of annual pollen indices in the detection of bioclimatic trends through the analysis of the aerobiological spectra of 11 taxa of great biogeographical relevance in Catalonia over an 18-year period (1994-2011), by means of different parametric and non-parametric statistical methods. Among others, two non-parametric rank-based statistical tests were performed to detect monotonic trends in the time series of the selected airborne pollen types, and we observed that they have similar power in detecting trends. Except for those cases in which the pollen data can be well modeled by a normal distribution, it is better to apply non-parametric statistical methods in aerobiological studies. Our results provide a reliable representation of the pollen trends in the region and suggest that greater pollen quantities are being released to the atmosphere in recent years, especially by Mediterranean taxa such as Pinus, Total Quercus and Evergreen Quercus, although the trends may differ geographically. Longer aerobiological monitoring periods are required to corroborate these results and survey the increasing levels of certain pollen types that could have an impact on public health.
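
    The Mann-Kendall test is a representative non-parametric rank-based trend test of the kind applied above (the paper's specific test choices are not reproduced here). A self-contained version, applied to a hypothetical 18-year annual pollen index:

```python
# Mann-Kendall test for a monotonic trend in an annual time series.
import numpy as np
from scipy import stats

def mann_kendall(x):
    n = len(x)
    s = sum(np.sign(x[j] - x[i])
            for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0       # variance assuming no ties
    if s > 0:
        z = (s - 1) / np.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / np.sqrt(var_s)
    else:
        z = 0.0
    return z, 2 * (1 - stats.norm.cdf(abs(z)))

# Hypothetical 18-year annual pollen index with an upward drift.
rng = np.random.default_rng(11)
index = 100 + 4 * np.arange(18) + rng.normal(0, 20, 18)
z, p = mann_kendall(index)
print(f"Z = {z:.2f}, p = {p:.4f}")
```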

  17. Detecting higher spin fields through statistical anisotropy in the CMB and galaxy power spectra

    NASA Astrophysics Data System (ADS)

    Bartolo, Nicola; Kehagias, Alex; Liguori, Michele; Riotto, Antonio; Shiraishi, Maresuke; Tansella, Vittorio

    2018-01-01

    Primordial inflation may represent the most powerful collider to test high-energy physics models. In this paper we study the impact on the inflationary power spectrum of the comoving curvature perturbation in the specific model where massive higher spin fields are rendered effectively massless during a de Sitter epoch through suitable couplings to the inflaton field. In particular, we show that such fields with spin s induce a distinctive statistically anisotropic signal on the power spectrum, in such a way that not only the usual g_{2M} statistical anisotropy coefficients, but also higher-order ones (i.e., g_{4M}, g_{6M}, …, g_{(2s-2)M} and g_{(2s)M}) are nonvanishing. We examine their imprints in the cosmic microwave background and galaxy power spectra. Our Fisher matrix forecasts indicate that the detectability of g_{LM} depends very weakly on L: all coefficients could be detected in the near future if their magnitudes are bigger than about 10^{-3}.

  18. The intriguing evolution of effect sizes in biomedical research over time: smaller but more often statistically significant.

    PubMed

    Monsarrat, Paul; Vergnes, Jean-Noel

    2018-01-01

    In medicine, effect sizes (ESs) allow the effects of independent variables (including risk/protective factors or treatment interventions) on dependent variables (e.g., health outcomes) to be quantified. Given that many public health decisions and health care policies are based on ES estimates, it is important to assess how ESs are used in the biomedical literature and to investigate potential trends in their reporting over time. Through a big data approach, the text mining process automatically extracted 814 120 ESs from 13 322 754 PubMed abstracts. Eligible ESs were risk ratio, odds ratio, and hazard ratio, along with their confidence intervals. Here we show a remarkable decrease of ES values in PubMed abstracts between 1990 and 2015 while, concomitantly, results become more often statistically significant. Medians of ES values have decreased over time for both "risk" and "protective" values. This trend was found in nearly all fields of biomedical research, with the most marked downward tendency in genetics. Over the same period, the proportion of statistically significant ESs increased regularly: among the abstracts with at least 1 ES, 74% were statistically significant in 1990-1995, vs 85% in 2010-2015. Whereas decreasing ESs could be an intrinsic evolution in biomedical research, the concomitant increase of statistically significant results is more intriguing. Although it is likely that growing sample sizes in biomedical research could explain these results, another explanation may lie in the "publish or perish" context of scientific research, with the probability of a growing orientation toward sensationalism in research reports. Important provisions must be made to improve the credibility of biomedical research and limit waste of resources. © The Authors 2017. Published by Oxford University Press.

  19. Statistical tests for detecting associations with groups of genetic variants: generalization, evaluation, and implementation

    PubMed Central

    Ferguson, John; Wheeler, William; Fu, YiPing; Prokunina-Olsson, Ludmila; Zhao, Hongyu; Sampson, Joshua

    2013-01-01

    With recent advances in sequencing, genotyping arrays, and imputation, GWAS now aim to identify associations with rare and uncommon genetic variants. Here, we describe and evaluate a class of statistics, generalized score statistics (GSS), that can test for an association between a group of genetic variants and a phenotype. GSS are a simple weighted sum of single-variant statistics and their cross-products. We show that the majority of statistics currently used to detect associations with rare variants are equivalent to choosing a specific set of weights within this framework. We then evaluate the power of various weighting schemes as a function of variant characteristics, such as MAF, the proportion associated with the phenotype, and the direction of effect. Ultimately, we find that two classical tests are robust and powerful, but details are provided as to when other GSS may perform favorably. The software package CRaVe is available at our website (http://dceg.cancer.gov/bb/tools/crave). PMID:23092956
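
    A toy member of the GSS family is the burden-style statistic: a weighted sum of single-variant score (z) statistics standardized by the variants' correlation matrix. The weights, genotype data, and simple z-score construction below are illustrative assumptions, not the paper's exact formulation:

```python
# Burden-style weighted sum of single-variant z statistics.
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
n, p = 1000, 8
maf = rng.uniform(0.01, 0.2, p)
G = rng.binomial(2, maf, (n, p)).astype(float)       # genotype matrix
y = 0.3 * G[:, 0] + rng.normal(0, 1, n)              # variant 0 is causal

# Single-variant score (z) statistics for association with y.
z = np.array([stats.pearsonr(G[:, j], y)[0] * np.sqrt(n) for j in range(p)])

w = 1.0 / np.sqrt(maf * (1 - maf))                   # upweight rarer variants
R = np.corrcoef(G, rowvar=False)                     # between-variant LD
gss = (w @ z) / np.sqrt(w @ R @ w)                   # standardized weighted sum
print(f"GSS z = {gss:.2f}, p = {2 * (1 - stats.norm.cdf(abs(gss))):.4g}")
```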

  20. Evaluation and comparison of statistical methods for early temporal detection of outbreaks: A simulation-based study

    PubMed Central

    Le Strat, Yann

    2017-01-01

    The objective of this paper is to evaluate a panel of statistical algorithms for temporal outbreak detection. Based on a large dataset of simulated weekly surveillance time series, we performed a systematic assessment of 21 statistical algorithms, 19 implemented in the R package surveillance and two other methods. We estimated false positive rate (FPR), probability of detection (POD), probability of detection during the first week, sensitivity, specificity, negative and positive predictive values and F1-measure for each detection method. Then, to identify the factors associated with these performance measures, we ran multivariate Poisson regression models adjusted for the characteristics of the simulated time series (trend, seasonality, dispersion, outbreak sizes, etc.). The FPR ranged from 0.7% to 59.9% and the POD from 43.3% to 88.7%. Some methods had a very high specificity, up to 99.4%, but a low sensitivity. Methods with a high sensitivity (up to 79.5%) had a low specificity. All methods had a high negative predictive value, over 94%, while positive predictive values ranged from 6.5% to 68.4%. Multivariate Poisson regression models showed that performance measures were strongly influenced by the characteristics of the time series. Past or current outbreak size and duration strongly influenced detection performance. PMID:28715489
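    For concreteness, a sketch of how such performance measures can be computed from one simulated series, treating every measure per week (the paper's POD is defined per outbreak; the arrays below are illustrative):

        import numpy as np

        def detection_metrics(outbreak, alarm):
            """outbreak, alarm: boolean arrays over the weeks of one series."""
            tp = np.sum(alarm & outbreak)
            fp = np.sum(alarm & ~outbreak)
            fn = np.sum(~alarm & outbreak)
            tn = np.sum(~alarm & ~outbreak)
            sens = tp / (tp + fn)   # per-week sensitivity (POD is per outbreak)
            spec = tn / (tn + fp)
            ppv = tp / (tp + fp)
            npv = tn / (tn + fn)
            f1 = 2 * ppv * sens / (ppv + sens)
            return dict(sensitivity=sens, specificity=spec, PPV=ppv, NPV=npv, F1=f1)

        outbreak = np.array([0, 0, 1, 1, 1, 0, 0, 0], dtype=bool)
        alarm    = np.array([0, 0, 0, 1, 1, 1, 0, 0], dtype=bool)
        print(detection_metrics(outbreak, alarm))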

  1. Spatial heterogeneity in statistical power to detect changes in lake area in Alaskan National Wildlife Refuges

    USGS Publications Warehouse

    Nicol, Samuel; Roach, Jennifer K.; Griffith, Brad

    2013-01-01

    Over the past 50 years, the number and size of high-latitude lakes have decreased throughout many regions; however, individual lake trends have been variable in direction and magnitude. This spatial heterogeneity in lake change makes statistical detection of temporal trends challenging, particularly in small analysis areas where weak trends are difficult to separate from inter- and intra-annual variability. Factors affecting trend detection include inherent variability, trend magnitude, and sample size. In this paper, we investigated how the statistical power to detect average linear trends in lake size of 0.5, 1.0 and 2.0 %/year was affected by the size of the analysis area and the number of years of monitoring in National Wildlife Refuges in Alaska. We estimated power for large (930–4,560 sq km) study areas within refuges and for 2.6, 12.9, and 25.9 sq km cells nested within study areas over temporal extents of 4–50 years. We found that: (1) trends in study areas could be detected within 5–15 years, (2) trends smaller than 2.0 %/year would take >50 years to detect in cells within study areas, and (3) there was substantial spatial variation in the time required to detect change among cells. Power was particularly low in the smallest cells which typically had the fewest lakes. Because small but ecologically meaningful trends may take decades to detect, early establishment of long-term monitoring will enhance power to detect change. Our results have broad applicability and our method is useful for any study involving change detection among variable spatial and temporal extents.
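    The power analysis described can be mimicked with a small Monte Carlo sketch (parameter values are illustrative, not the study's): simulate relative lake area with a known %/year trend plus interannual noise, test the fitted slope, and count rejections.

        import numpy as np
        from scipy import stats

        def trend_power(trend_pct, years, cv=0.1, n_sim=2000, alpha=0.05, seed=0):
            """Monte Carlo power to detect a linear trend of trend_pct %/year
            against interannual noise with coefficient of variation cv."""
            rng = np.random.default_rng(seed)
            t = np.arange(years)
            mean = 1.0 + (trend_pct / 100.0) * t   # relative lake area
            hits = 0
            for _ in range(n_sim):
                y = mean + rng.normal(0, cv, size=years)
                hits += stats.linregress(t, y).pvalue < alpha
            return hits / n_sim

        for years in (5, 15, 50):
            print(years, trend_power(trend_pct=1.0, years=years))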

  2. Detecting Statistically Significant Communities of Triangle Motifs in Undirected Networks

    DTIC Science & Technology

    2015-03-16

    moderately-sized networks. As a consequence, throughout this effort, a simulated annealing (SA) algorithm will be employed to effectively search the...then increment k by 1 and repeat the search to find z*_3. One can continue to increment k until W < z_δ, at which point the algorithm will stop and...

  3. Model-based iterative reconstruction and adaptive statistical iterative reconstruction: dose-reduced CT for detecting pancreatic calcification

    PubMed Central

    Katsura, Masaki; Akahane, Masaaki; Sato, Jiro; Matsuda, Izuru; Ohtomo, Kuni

    2016-01-01

    Background: Iterative reconstruction methods have attracted attention for reducing radiation doses in computed tomography (CT). Purpose: To investigate the detectability of pancreatic calcification using dose-reduced CT reconstructed with model-based iterative reconstruction (MBIR) and adaptive statistical iterative reconstruction (ASIR). Material and Methods: This prospective study, approved by the Institutional Review Board, included 85 patients (57 men, 28 women; mean age, 69.9 years; mean body weight, 61.2 kg). Unenhanced CT was performed three times with different radiation doses (reference-dose CT [RDCT], low-dose CT [LDCT], ultralow-dose CT [ULDCT]). From RDCT, LDCT, and ULDCT, images were reconstructed with filtered back projection (R-FBP, used for establishing the reference standard), ASIR (L-ASIR), and MBIR and ASIR (UL-MBIR and UL-ASIR), respectively. A lesion (pancreatic calcification) detection test was performed by two blinded radiologists with a five-point certainty level scale. Results: Dose-length products of RDCT, LDCT, and ULDCT were 410, 97, and 36 mGy-cm, respectively. Nine patients had pancreatic calcification. The sensitivity for detecting pancreatic calcification with UL-MBIR was high (0.67-0.89) compared to L-ASIR or UL-ASIR (0.11-0.44), and a significant difference was seen between UL-MBIR and UL-ASIR for one reader (P = 0.014). The area under the receiver-operating characteristic curve for UL-MBIR (0.818-0.860) was comparable to that for L-ASIR (0.696-0.844). The specificity was lower with UL-MBIR (0.79-0.92) than with L-ASIR or UL-ASIR (0.96-0.99), and a significant difference was seen for one reader (P < 0.01). Conclusion: With UL-MBIR, pancreatic calcification can be detected with high sensitivity; however, attention should be paid to the slightly lower specificity. PMID:27110389

  4. Model-based iterative reconstruction and adaptive statistical iterative reconstruction: dose-reduced CT for detecting pancreatic calcification.

    PubMed

    Yasaka, Koichiro; Katsura, Masaki; Akahane, Masaaki; Sato, Jiro; Matsuda, Izuru; Ohtomo, Kuni

    2016-01-01

    Iterative reconstruction methods have attracted attention for reducing radiation doses in computed tomography (CT). To investigate the detectability of pancreatic calcification using dose-reduced CT reconstructed with model-based iterative reconstruction (MBIR) and adaptive statistical iterative reconstruction (ASIR). This prospective study, approved by the Institutional Review Board, included 85 patients (57 men, 28 women; mean age, 69.9 years; mean body weight, 61.2 kg). Unenhanced CT was performed three times with different radiation doses (reference-dose CT [RDCT], low-dose CT [LDCT], ultralow-dose CT [ULDCT]). From RDCT, LDCT, and ULDCT, images were reconstructed with filtered back projection (R-FBP, used for establishing the reference standard), ASIR (L-ASIR), and MBIR and ASIR (UL-MBIR and UL-ASIR), respectively. A lesion (pancreatic calcification) detection test was performed by two blinded radiologists with a five-point certainty level scale. Dose-length products of RDCT, LDCT, and ULDCT were 410, 97, and 36 mGy-cm, respectively. Nine patients had pancreatic calcification. The sensitivity for detecting pancreatic calcification with UL-MBIR was high (0.67-0.89) compared to L-ASIR or UL-ASIR (0.11-0.44), and a significant difference was seen between UL-MBIR and UL-ASIR for one reader (P = 0.014). The area under the receiver-operating characteristic curve for UL-MBIR (0.818-0.860) was comparable to that for L-ASIR (0.696-0.844). The specificity was lower with UL-MBIR (0.79-0.92) than with L-ASIR or UL-ASIR (0.96-0.99), and a significant difference was seen for one reader (P < 0.01). With UL-MBIR, pancreatic calcification can be detected with high sensitivity; however, attention should be paid to the slightly lower specificity.

  5. Data analysis of gravitational-wave signals from spinning neutron stars. III. Detection statistics and computational requirements

    NASA Astrophysics Data System (ADS)

    Jaranowski, Piotr; Królak, Andrzej

    2000-03-01

    We develop the analytic and numerical tools for data analysis of the continuous gravitational-wave signals from spinning neutron stars for ground-based laser interferometric detectors. The statistical data analysis method that we investigate is maximum likelihood detection, which for the case of Gaussian noise reduces to matched filtering. We study in detail the statistical properties of the optimum functional that needs to be calculated in order to detect the gravitational-wave signal and estimate its parameters. We find it particularly useful to divide the parameter space into elementary cells such that the values of the optimal functional are statistically independent in different cells. We derive formulas for false alarm and detection probabilities both for the optimal and the suboptimal filters. We assess the computational requirements needed to do the signal search. We compare a number of criteria to build sufficiently accurate templates for our data analysis scheme. We verify the validity of our concepts and formulas by means of Monte Carlo simulations. We present algorithms by which one can estimate the parameters of the continuous signals accurately. We find, confirming earlier work of other authors, that given 100 Gflops of computational power, an all-sky search with an observation time of 7 days and a directed search with an observation time of 120 days are possible, whereas an all-sky search over 120 days of observation time is computationally prohibitive.
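    As a toy illustration of the matched-filtering reduction mentioned here (a sketch under white Gaussian noise, not the authors' F-statistic pipeline): correlate the data with a unit-norm template and set the detection threshold from noise-only simulations at a chosen false alarm probability.

        import numpy as np

        rng = np.random.default_rng(1)
        n = 4096
        t = np.arange(n)
        template = np.sin(2 * np.pi * 0.01 * t)
        template /= np.linalg.norm(template)        # unit-norm template

        def matched_filter_stat(data):
            return np.dot(template, data)           # optimal for white Gaussian noise

        # Threshold from noise-only Monte Carlo at a 1% false alarm probability
        noise_stats = [matched_filter_stat(rng.normal(size=n)) for _ in range(5000)]
        threshold = np.quantile(noise_stats, 0.99)

        signal = 0.1 * np.sin(2 * np.pi * 0.01 * t)  # weak signal, amplitude 0.1
        data = signal + rng.normal(size=n)
        print(matched_filter_stat(data) > threshold)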

  6. Tipping points in the arctic: eyeballing or statistical significance?

    PubMed

    Carstensen, Jacob; Weydmann, Agata

    2012-02-01

    Arctic ecosystems have experienced and are projected to experience continued large increases in temperature and declines in sea ice cover. It has been hypothesized that small changes in ecosystem drivers can fundamentally alter ecosystem functioning, and that this might be particularly pronounced for Arctic ecosystems. We present a suite of simple statistical analyses to identify changes in the statistical properties of data, emphasizing that changes in the standard error should be considered in addition to changes in mean properties. The methods are exemplified using sea ice extent, and suggest that the loss rate of sea ice accelerated by a factor of ~5 in 1996, as reported in other studies, but that increases in random fluctuations, an early warning signal, were observed as early as 1990. We recommend employing the proposed methods more systematically in analyzing tipping points to document effects of climate change in the Arctic.
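    A minimal sketch of the kind of analysis advocated here, on synthetic data (not the sea-ice series): monitor a running standard deviation alongside the running mean, so that an increase in random fluctuations can be flagged before the mean shifts.

        import numpy as np

        rng = np.random.default_rng(2)
        # Synthetic series: variance doubles halfway through, mean shifts later
        x = np.concatenate([rng.normal(0, 1.0, 200),
                            rng.normal(0, 2.0, 100),
                            rng.normal(-3, 2.0, 100)])

        window = 50
        for i in range(window, len(x), 25):
            seg = x[i - window:i]
            print(f"t={i:3d}  mean={seg.mean():6.2f}  sd={seg.std(ddof=1):5.2f}")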

  7. CHOBS: Color Histogram of Block Statistics for Automatic Bleeding Detection in Wireless Capsule Endoscopy Video.

    PubMed

    Ghosh, Tonmoy; Fattah, Shaikh Anowarul; Wahid, Khan A

    2018-01-01

    Wireless capsule endoscopy (WCE) is the most advanced technology for visualizing the whole gastrointestinal (GI) tract in a non-invasive way. Its major disadvantage, however, is the long reviewing time, which is laborious because continuous manual intervention is necessary. To reduce the burden on the clinician, this paper proposes an automatic bleeding detection method for WCE video based on the color histogram of block statistics, namely CHOBS. A single pixel in a WCE image may be distorted due to capsule motion in the GI tract. Instead of considering individual pixel values, a block surrounding each pixel is chosen for extracting local statistical features. By combining local block features of the three color planes of the RGB color space, an index value is defined. A color histogram extracted from those index values provides a distinguishable color texture feature. A feature reduction technique utilizing the color histogram pattern and principal component analysis is proposed, which can drastically reduce the feature dimension. For bleeding zone detection, blocks are classified using the already extracted local features, which incurs no additional computational burden for feature extraction. Extensive experimentation on several WCE videos and 2300 images collected from a publicly available database shows very satisfactory bleeding frame and zone detection performance in comparison with some existing methods. In the case of bleeding frame detection, the accuracy, sensitivity, and specificity obtained from the proposed method are 97.85%, 99.47%, and 99.15%, respectively, and in the case of bleeding zone detection, 95.75% precision is achieved. The proposed method offers not only a low feature dimension but also highly satisfactory bleeding detection performance, and it can effectively detect bleeding frames and zones in continuous WCE video data.

  8. CHOBS: Color Histogram of Block Statistics for Automatic Bleeding Detection in Wireless Capsule Endoscopy Video

    PubMed Central

    Ghosh, Tonmoy; Wahid, Khan A.

    2018-01-01

    Wireless capsule endoscopy (WCE) is the most advanced technology for visualizing the whole gastrointestinal (GI) tract in a non-invasive way. Its major disadvantage, however, is the long reviewing time, which is laborious because continuous manual intervention is necessary. To reduce the burden on the clinician, this paper proposes an automatic bleeding detection method for WCE video based on the color histogram of block statistics, namely CHOBS. A single pixel in a WCE image may be distorted due to capsule motion in the GI tract. Instead of considering individual pixel values, a block surrounding each pixel is chosen for extracting local statistical features. By combining local block features of the three color planes of the RGB color space, an index value is defined. A color histogram extracted from those index values provides a distinguishable color texture feature. A feature reduction technique utilizing the color histogram pattern and principal component analysis is proposed, which can drastically reduce the feature dimension. For bleeding zone detection, blocks are classified using the already extracted local features, which incurs no additional computational burden for feature extraction. Extensive experimentation on several WCE videos and 2300 images collected from a publicly available database shows very satisfactory bleeding frame and zone detection performance in comparison with some existing methods. In the case of bleeding frame detection, the accuracy, sensitivity, and specificity obtained from the proposed method are 97.85%, 99.47%, and 99.15%, respectively, and in the case of bleeding zone detection, 95.75% precision is achieved. The proposed method offers not only a low feature dimension but also highly satisfactory bleeding detection performance, and it can effectively detect bleeding frames and zones in continuous WCE video data. PMID:29468094

  9. A Note on Comparing the Power of Test Statistics at Low Significance Levels.

    PubMed

    Morris, Nathan; Elston, Robert

    2011-01-01

    It is an obvious fact that the power of a test statistic is dependent upon the significance (alpha) level at which the test is performed. It is perhaps a less obvious fact that the relative performance of two statistics in terms of power is also a function of the alpha level. Through numerous personal discussions, we have noted that even some competent statisticians have the mistaken intuition that relative power comparisons at traditional levels such as α = 0.05 will be roughly similar to relative power comparisons at very low levels, such as the level α = 5 × 10^-8, which is commonly used in genome-wide association studies. In this brief note, we demonstrate that this notion is in fact quite wrong, especially with respect to comparing tests with differing degrees of freedom. In fact, at very low alpha levels the cost of additional degrees of freedom is often comparatively low. Thus we recommend that statisticians exercise caution when interpreting the results of power comparison studies which use alpha levels that will not be used in practice.
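    The degrees-of-freedom point can be checked directly with noncentral chi-square power calculations (a sketch; the shared noncentrality value is arbitrary):

        from scipy.stats import chi2, ncx2

        ncp = 30.0  # arbitrary common noncentrality parameter
        for alpha in (0.05, 5e-8):
            for df in (1, 2, 4):
                crit = chi2.isf(alpha, df)          # critical value at level alpha
                power = ncx2.sf(crit, df, ncp)      # power under the alternative
                print(f"alpha={alpha:8.0e}  df={df}  power={power:.3f}")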

  10. Frequency and Clinical Significance of Previously Undetected Incidental Findings Detected on Computed Tomography Simulation Scans for Breast Cancer Patients

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nakamura, Naoki, E-mail: naokinak@luke.or.jp; Tsunoda, Hiroko; Takahashi, Osamu

    2012-11-01

    Purpose: To determine the frequency and clinical significance of previously undetected incidental findings found on computed tomography (CT) simulation images for breast cancer patients. Methods and Materials: All CT simulation images were first interpreted prospectively by radiation oncologists and then double-checked by diagnostic radiologists. The official reports of CT simulation images for 881 consecutive postoperative breast cancer patients from 2009 to 2010 were retrospectively reviewed. Potentially important incidental findings (PIIFs) were defined as any previously undetected benign or malignancy-related findings requiring further medical follow-up or investigation. For all patients in whom a PIIF was detected, we reviewed the clinical records to determine the clinical significance of the PIIF. If the findings from the additional studies prompted by a PIIF required a change in management, the PIIF was also recorded as a clinically important incidental finding (CIIF). Results: There were a total of 57 (6%) PIIFs. The 57 patients in whom a PIIF was detected were followed for a median of 17 months (range, 3-26). Six cases of CIIFs (0.7% of total) were detected. Of the six CIIFs, three (50%) cases had not been noted by the radiation oncologist until the diagnostic radiologist detected the finding. On multivariate analysis, previous CT examination was an independent predictor for PIIF (p = 0.04). Patients who had not previously received chest CT examinations within 1 year had a statistically significantly higher risk of PIIF than those who had received CT examinations within 6 months (odds ratio, 3.54; 95% confidence interval, 1.32-9.50; p = 0.01). Conclusions: The rate of incidental findings prompting a change in management was low. However, radiation oncologists appear to have some difficulty in detecting incidental findings that require a change in management. Considering cost, it may be reasonable that routine interpretations are given to those who

  11. Statistical Detection of the He II Transverse Proximity Effect: Evidence for Sustained Quasar Activity for >25 Million Years

    NASA Astrophysics Data System (ADS)

    Schmidt, Tobias M.; Worseck, Gabor; Hennawi, Joseph F.; Prochaska, J. Xavier; Crighton, Neil H. M.

    2017-09-01

    The He II transverse proximity effect—enhanced He II Lyα transmission in a background sightline caused by the ionizing radiation of a foreground quasar—offers a unique opportunity to probe the morphology of quasar-driven He II reionization. We conduct a comprehensive spectroscopic survey to find z ~ 3 quasars in the foreground of 22 background quasar sightlines with Hubble Space Telescope/COS He II Lyα transmission spectra. With our two-tiered survey strategy, consisting of a deep pencil-beam survey and a shallow wide-field survey, we discover 131 new quasars, which we complement with known SDSS/BOSS quasars in our fields. Using a restricted sample of 66 foreground quasars with inferred He II photoionization rates greater than the expected UV background at these redshifts (Γ_QSO^HeII > 5 × 10^-16 s^-1), we perform the first statistical analysis of the He II transverse proximity effect. Our results show qualitative evidence for a large object-to-object variance: among the four foreground quasars with the highest Γ_QSO^HeII, only one (previously known) quasar is associated with a significant He II transmission spike. We perform a stacking analysis to average down these fluctuations, and detect an excess in the average He II transmission near the foreground quasars at 3σ significance. This statistical evidence for the transverse proximity effect is corroborated by a clear dependence of the signal strength on Γ_QSO^HeII. Our detection places a purely geometric lower limit on the quasar lifetime of t_Q > 25 Myr. Improved modeling would additionally constrain quasar obscuration and the mean free path of He II-ionizing photons.

  12. Statistical data mining of streaming motion data for fall detection in assistive environments.

    PubMed

    Tasoulis, S K; Doukas, C N; Maglogiannis, I; Plagianakos, V P

    2011-01-01

    The analysis of human motion data is interesting for the purpose of activity recognition or emergency event detection, especially in the case of elderly or disabled people living independently in their homes. Several techniques have been proposed for identifying such distress situations using motion, audio, or video sensors on the monitored subject (wearable sensors) or in the surrounding environment. The output of such sensors is a stream of data that requires real-time recognition, especially in emergency situations; traditional classification approaches may therefore not be applicable for immediate alarm triggering or fall prevention. This paper presents a statistical mining methodology that may be used for the specific problem of real-time fall detection. Visual data captured from the user's environment using overhead cameras, along with motion data collected from accelerometers on the subject's body, are fed to the fall detection system. The paper includes the details of the stream data mining methodology incorporated in the system, along with an initial evaluation of the achieved accuracy in detecting falls.

  13. Efficient detection of wound-bed and peripheral skin with statistical colour models.

    PubMed

    Veredas, Francisco J; Mesa, Héctor; Morente, Laura

    2015-04-01

    A pressure ulcer is a clinical pathology of localised damage to the skin and underlying tissue caused by pressure, shear or friction. Reliable diagnosis supported by precise wound evaluation is crucial for successful treatment decisions. This paper presents a computer-vision approach to wound-area detection based on statistical colour models. Starting with a training set consisting of 113 real wound images, colour histogram models are created for four different tissue types. Back-projections of colour pixels on those histogram models are used, from a Bayesian perspective, to estimate the posterior probability of a pixel belonging to any of those tissue classes. Performance measures obtained from contingency tables based on a gold standard of segmented images supplied by experts were used for model selection. The resulting fitted model was validated on a set of 322 wound images manually segmented and labelled by expert clinicians. The final fitted segmentation model shows robustness and gives high mean performance rates (AUC: .9426 (SD .0563); accuracy: .8777 (SD .0799); F-score: .7389 (SD .1550); Cohen's kappa: .6585 (SD .1787)) when segmenting significant wound areas that include healing tissues.

  14. Improvement of statistical methods for detecting anomalies in climate and environmental monitoring systems

    NASA Astrophysics Data System (ADS)

    Yakunin, A. G.; Hussein, H. M.

    2018-01-01

    The article shows how known statistical methods, widely used in solving financial problems and in a number of other fields of science and technology, can, after minor modification, be effectively applied to problems in climate and environmental monitoring systems, such as detecting anomalies in the form of abrupt changes in signal levels, positive and negative outliers, and violations of the cycle form in periodic processes.

  15. A statistical method for the detection of variants from next-generation resequencing of DNA pools.

    PubMed

    Bansal, Vikas

    2010-06-15

    Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80-85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3-5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/.

  16. Fast microcalcification detection in ultrasound images using image enhancement and threshold adjacency statistics

    NASA Astrophysics Data System (ADS)

    Cho, Baek Hwan; Chang, Chuho; Lee, Jong-Ha; Ko, Eun Young; Seong, Yeong Kyeong; Woo, Kyoung-Gu

    2013-02-01

    The existence of microcalcifications (MCs) is an important marker of malignancy in breast cancer. In spite of its benefits for mass detection in dense breasts, ultrasonography is believed to be unreliable for detecting MCs. For computer-aided diagnosis systems, however, accurate detection of MCs could improve performance in both Breast Imaging-Reporting and Data System (BI-RADS) lexicon description for calcifications and malignancy classification. We propose a new efficient and effective method for MC detection using image enhancement and threshold adjacency statistics (TAS). The main idea of TAS is to threshold an image and count the number of white pixels with a given number of adjacent white pixels. Our contribution is to adopt TAS features and apply image enhancement to facilitate MC detection in ultrasound images. We employed fuzzy logic, a tophat filter, and a texture filter to enhance images for MCs. Using a total of 591 images, the classification accuracy of the proposed method in MC detection was 82.75%, comparable to that of Haralick texture features (81.38%). When combined, the performance was as high as 85.11%. In addition, our method also showed ability in mass classification when combined with existing features. In conclusion, the proposed method, exploiting image enhancement and TAS features, has the potential to handle MC detection in ultrasound images efficiently and to extend to real-time localization and visualization of MCs.
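    The core TAS computation is compact enough to sketch (a generic version; the paper's enhancement steps and parameter choices are not reproduced):

        import numpy as np
        from scipy.ndimage import convolve

        def threshold_adjacency_statistics(image, threshold):
            """Binarize, then histogram white pixels by their count of white
            8-neighbors (0..8), giving a 9-bin texture feature vector."""
            binary = (image > threshold).astype(int)
            kernel = np.array([[1, 1, 1],
                               [1, 0, 1],
                               [1, 1, 1]])
            neighbor_counts = convolve(binary, kernel, mode="constant", cval=0)
            counts = np.bincount(neighbor_counts[binary == 1], minlength=9)
            return counts / max(counts.sum(), 1)    # normalized 9-bin histogram

        img = np.random.default_rng(3).random((64, 64))
        print(threshold_adjacency_statistics(img, 0.8))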

  17. When is Chemical Similarity Significant? The Statistical Distribution of Chemical Similarity Scores and Its Extreme Values

    PubMed Central

    Baldi, Pierre

    2010-01-01

    As repositories of chemical molecules continue to expand and become more open, it becomes increasingly important to develop tools to search them efficiently and assess the statistical significance of chemical similarity scores. Here we develop a general framework for understanding, modeling, predicting, and approximating the distribution of chemical similarity scores and its extreme values in large databases. The framework can be applied to different chemical representations and similarity measures but is demonstrated here using the most common binary fingerprints with the Tanimoto similarity measure. After introducing several probabilistic models of fingerprints, including the Conditional Gaussian Uniform model, we show that the distribution of Tanimoto scores can be approximated by the distribution of the ratio of two correlated Normal random variables associated with the corresponding unions and intersections. This remains true also when the distribution of similarity scores is conditioned on the size of the query molecules in order to derive more fine-grained results and improve chemical retrieval. The corresponding extreme value distributions for the maximum scores are approximated by Weibull distributions. From these various distributions and their analytical forms, Z-scores, E-values, and p-values are derived to assess the significance of similarity scores. In addition, the framework also allows one to predict the value of standard chemical retrieval metrics, such as Sensitivity and Specificity at fixed thresholds, or ROC (Receiver Operating Characteristic) curves at multiple thresholds, and to detect outliers in the form of atypical molecules. Numerous and diverse experiments, carried out in part with large sets of molecules from the ChemDB, show remarkable agreement between theory and empirical results. PMID:20540577
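    For concreteness, a sketch of the score being modeled (binary fingerprints with Tanimoto similarity) together with a brute-force empirical p-value of the kind the paper's analytical distributions replace:

        import numpy as np

        def tanimoto(a, b):
            """Tanimoto similarity of two binary fingerprints:
            |intersection| / |union|."""
            inter = np.sum(a & b)
            union = np.sum(a | b)
            return inter / union if union else 0.0

        rng = np.random.default_rng(4)
        n_bits, density = 1024, 0.1
        query = rng.random(n_bits) < density
        hit = rng.random(n_bits) < density

        score = tanimoto(query, hit)
        # Empirical null: Tanimoto of the query against random fingerprints
        null = np.array([tanimoto(query, rng.random(n_bits) < density)
                         for _ in range(10000)])
        p_value = np.mean(null >= score)
        print(score, p_value)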

  18. Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution

    PubMed Central

    Gangnon, Ronald E.

    2011-01-01

    The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, while rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. PMID:21762118
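    The Gumbel idea can be sketched generically (this is not the authors' exact local statistic): fit a Gumbel distribution to Monte Carlo replicates of the maximum log-likelihood ratio under the null, and read off the p-value of the observed maximum.

        import numpy as np
        from scipy.stats import gumbel_r

        rng = np.random.default_rng(5)
        # Stand-in null replicates of a local maximum statistic; in practice
        # these come from Monte Carlo simulation of the scan statistic under H0.
        null_max = rng.chisquare(2, size=999).reshape(-1, 3).max(axis=1)

        loc, scale = gumbel_r.fit(null_max)
        observed = 12.5                       # hypothetical observed local statistic
        p_gumbel = gumbel_r.sf(observed, loc=loc, scale=scale)
        p_montecarlo = (1 + np.sum(null_max >= observed)) / (1 + len(null_max))
        print(p_gumbel, p_montecarlo)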

  19. Use of Tests of Statistical Significance and Other Analytic Choices in a School Psychology Journal: Review of Practices and Suggested Alternatives.

    ERIC Educational Resources Information Center

    Snyder, Patricia A.; Thompson, Bruce

    The use of tests of statistical significance was explored, first by reviewing some criticisms of contemporary practice in the use of statistical tests as reflected in a series of articles in the "American Psychologist" and in the appointment of a "Task Force on Statistical Inference" by the American Psychological Association…

  20. Online platform for applying space-time scan statistics for prospectively detecting emerging hot spots of dengue fever.

    PubMed

    Chen, Chien-Chou; Teng, Yung-Chu; Lin, Bo-Cheng; Fan, I-Chun; Chan, Ta-Chien

    2016-11-25

    Cases of dengue fever have increased in areas of Southeast Asia in recent years. Taiwan hit a record-high 42,856 cases in 2015, with the majority in southern Tainan and Kaohsiung Cities. Leveraging spatial statistics and geo-visualization techniques, we aim to design an online analytical tool for local public health workers to prospectively identify ongoing hot spots of dengue fever weekly at the village level. A total of 57,516 confirmed cases of dengue fever in 2014 and 2015 were obtained from the Taiwan Centers for Disease Control (TCDC). Incorporating demographic information as covariates with cumulative cases (365 days) in a discrete Poisson model, we iteratively applied space-time scan statistics by SaTScan software to detect the currently active cluster of dengue fever (reported as relative risk) in each village of Tainan and Kaohsiung every week. A village with a relative risk >1 and p value <0.05 was identified as a dengue-epidemic area. Assuming an ongoing transmission might continuously spread for two consecutive weeks, we estimated the sensitivity and specificity for detecting outbreaks by comparing the scan-based classification (dengue-epidemic vs. dengue-free village) with the true cumulative case numbers from the TCDC's surveillance statistics. Among the 1648 villages in Tainan and Kaohsiung, the overall sensitivity for detecting outbreaks increases as case numbers grow in a total of 92 weekly simulations. The specificity for detecting outbreaks behaves inversely to the sensitivity. On average, the mean sensitivity and specificity of 2-week hot spot detection were 0.615 and 0.891 respectively (p value <0.001) for the covariate adjustment model, as the maximum spatial and temporal windows were specified as 50% of the total population at risk and 28 days. Dengue-epidemic villages were visualized and explored in an interactive map. We designed an online analytical tool for front-line public health workers to prospectively detect ongoing

  1. Damages detection in cylindrical metallic specimens by means of statistical baseline models and updated daily temperature profiles

    NASA Astrophysics Data System (ADS)

    Villamizar-Mejia, Rodolfo; Mujica-Delgado, Luis-Eduardo; Ruiz-Ordóñez, Magda-Liliana; Camacho-Navarro, Jhonatan; Moreno-Beltrán, Gustavo

    2017-05-01

    In previous works, damage detection of metallic specimens exposed to temperature changes has been achieved by using a statistical baseline model based on Principal Component Analysis (PCA) and the piezodiagnostics principle, taking the temperature effect into account by augmenting the baseline model or by using several baseline models according to the current temperature. In this paper a new approach is presented, where damage detection is based on a new index that combines the Q and T2 statistical indices with current temperature measurements. Experimental tests were performed on a carbon-steel pipe 1 m long and 1.5 inches in diameter, instrumented with piezodevices acting as actuators or sensors. A PCA baseline model was obtained at a temperature of 21°, and the T2 and Q statistical indices were then computed over a 24 h temperature profile. Added mass at different points of the pipe between sensor and actuator was used to simulate damage. Using the combined index, the temperature contribution can be separated out, and damaged cases can be better differentiated graphically from undamaged ones.
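    A minimal sketch of the two indices combined in the paper, using their standard PCA definitions (the temperature-combination step itself is the paper's contribution and is not reproduced here):

        import numpy as np

        def fit_pca_baseline(X, n_comp):
            """X: baseline data, rows = observations. Returns mean, loadings P,
            and per-component variances for the retained subspace."""
            mu = X.mean(axis=0)
            Xc = X - mu
            U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
            P = Vt[:n_comp].T                        # loadings
            lam = (s[:n_comp] ** 2) / (len(X) - 1)   # component variances
            return mu, P, lam

        def t2_q_indices(x, mu, P, lam):
            xc = x - mu
            t = P.T @ xc                             # scores in the PCA subspace
            T2 = np.sum(t ** 2 / lam)                # Hotelling's T2
            residual = xc - P @ t
            Q = np.sum(residual ** 2)                # Q (squared prediction error)
            return T2, Q

        rng = np.random.default_rng(6)
        X_baseline = rng.normal(size=(200, 10))
        mu, P, lam = fit_pca_baseline(X_baseline, n_comp=3)
        print(t2_q_indices(rng.normal(size=10), mu, P, lam))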

  2. Statistical Methods for Detecting Differentially Abundant Features in Clinical Metagenomic Samples

    PubMed Central

    White, James Robert; Nagarajan, Niranjan; Pop, Mihai

    2009-01-01

    Numerous studies are currently underway to characterize the microbial communities inhabiting our world. These studies aim to dramatically expand our understanding of the microbial biosphere and, more importantly, hope to reveal the secrets of the complex symbiotic relationship between us and our commensal bacterial microflora. An important prerequisite for such discoveries is computational tools that are able to rapidly and accurately compare large datasets generated from complex bacterial communities to identify features that distinguish them. We present a statistical method for comparing clinical metagenomic samples from two treatment populations on the basis of count data (e.g., as obtained through sequencing) to detect differentially abundant features. Our method, Metastats, employs the false discovery rate to improve specificity in high-complexity environments, and separately handles sparsely sampled features using Fisher's exact test. Under a variety of simulations, we show that Metastats performs well compared to previously used methods, and significantly outperforms other methods for features with sparse counts. We demonstrate the utility of our method on several datasets, including a 16S rRNA survey of obese and lean human gut microbiomes, COG functional profiles of infant and mature gut microbiomes, and bacterial and viral metabolic subsystem data inferred from random sequencing of 85 metagenomes. The application of our method to the obesity dataset reveals differences between obese and lean subjects not reported in the original study. For the COG and subsystem datasets, we provide the first statistically rigorous assessment of the differences between these populations. The methods described in this paper are the first to address clinical metagenomic datasets comprising samples from multiple subjects. Our methods are robust across datasets of varied complexity and sampling level. While designed for metagenomic applications, our software can also be applied

  3. Random forest learning of ultrasonic statistical physics and object spaces for lesion detection in 2D sonomammography

    NASA Astrophysics Data System (ADS)

    Sheet, Debdoot; Karamalis, Athanasios; Kraft, Silvan; Noël, Peter B.; Vag, Tibor; Sadhu, Anup; Katouzian, Amin; Navab, Nassir; Chatterjee, Jyotirmoy; Ray, Ajoy K.

    2013-03-01

    Breast cancer is the most common form of cancer in women. Early diagnosis can significantly improve life expectancy and allow different treatment options. Clinicians favor 2D ultrasonography for breast tissue abnormality screening due to its high sensitivity and specificity compared to competing technologies. However, inter- and intra-observer variability in the visual assessment and reporting of lesions often handicaps its performance. Existing Computer Assisted Diagnosis (CAD) systems, though able to detect solid lesions, are often restricted in performance by an inability to (1) detect lesions of multiple sizes and shapes and (2) differentiate hypo-echoic lesions from their posterior acoustic shadowing. In this work we present a completely automatic system for detection and segmentation of breast lesions in 2D ultrasound images. We employ random forests to learn a tissue-specific primal to discriminate breast lesions from surrounding normal tissues. This enables the system to detect lesions of multiple shapes and sizes and to discriminate hypo-echoic lesions from associated posterior acoustic shadowing. The primal comprises (i) multiscale estimated ultrasonic statistical physics and (ii) scale-space characteristics. The random forest learns the lesion vs. background primal from a database of 2D ultrasound images with labeled lesions. For segmentation, the posterior probabilities of lesion pixels estimated by the learnt random forest are hard thresholded to provide a random walks segmentation stage with starting seeds. Our method achieves detection with 99.19% accuracy and segmentation with mean contour-to-contour error < 3 pixels on a set of 40 images with 49 lesions.

  4. On Interestingness Measures for Mining Statistically Significant and Novel Clinical Associations from EMRs

    PubMed Central

    Abar, Orhan; Charnigo, Richard J.; Rayapati, Abner

    2017-01-01

    Association rule mining has received significant attention from both the data mining and machine learning communities. While data mining researchers focus more on designing efficient algorithms to mine rules from large datasets, the learning community has explored applications of rule mining to classification. A major problem with rule mining algorithms is the explosion of rules even for moderately sized datasets, making it very difficult for end users to identify both statistically significant and potentially novel rules that could lead to interesting new insights and hypotheses. Researchers have proposed many domain-independent interestingness measures with which one can rank the rules and potentially glean useful rules from the top-ranked ones. However, these measures have not been fully explored for rule mining in clinical datasets, owing to the relatively large sizes of the datasets often encountered in healthcare and to limited access to domain experts for review/analysis. In this paper, using an electronic medical record (EMR) dataset of diagnoses and medications from over three million patient visits to the University of Kentucky medical center and affiliated clinics, we conduct a thorough evaluation of dozens of interestingness measures proposed in the data mining literature, including some new composite measures. Using cumulative relevance metrics from information retrieval, we compare these interestingness measures against human judgments obtained from a practicing psychiatrist for association rules involving the depressive disorders class as the consequent. Our results not only surface new interesting associations for depressive disorders but also indicate classes of interestingness measures that weight rule novelty and statistical strength in contrasting ways, offering new insights for end users in identifying interesting rules. PMID:28736771

  5. REANALYSIS OF F-STATISTIC GRAVITATIONAL-WAVE SEARCHES WITH THE HIGHER CRITICISM STATISTIC

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bennett, M. F.; Melatos, A.; Delaigle, A.

    2013-04-01

    We propose a new method of gravitational-wave detection using a modified form of higher criticism, a statistical technique introduced by Donoho and Jin. Higher criticism is designed to detect a group of sparse, weak sources, none of which are strong enough to be reliably estimated or detected individually. We apply higher criticism as a second-pass method to synthetic F-statistic and C-statistic data for a monochromatic periodic source in a binary system and quantify the improvement relative to the first-pass methods. We find that higher criticism on C-statistic data is more sensitive by ~6% than the C-statistic alone under optimal conditions (i.e., binary orbit known exactly), and the relative advantage increases as the error in the orbital parameters increases. Higher criticism is robust even when the source is not monochromatic (e.g., phase-wandering in an accreting system). Applying higher criticism to a phase-wandering source over multiple time intervals gives a ≳30% increase in detectability with few assumptions about the frequency evolution. By contrast, in all-sky searches for unknown periodic sources, which are dominated by the brightest source, second-pass higher criticism does not provide any benefits over a first-pass search.
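    The underlying statistic is simple to state; here is a sketch of the standard Donoho-Jin form applied to a vector of p-values (the paper's modification for F-/C-statistic data is not reproduced):

        import numpy as np

        def higher_criticism(pvalues):
            """Standard HC statistic of Donoho & Jin: the maximum standardized
            exceedance of ordered p-values below their uniform expectation."""
            p = np.sort(np.asarray(pvalues))
            n = len(p)
            i = np.arange(1, n + 1)
            hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
            return np.max(hc[: n // 2])      # conventional restriction to lower half

        rng = np.random.default_rng(7)
        null_p = rng.uniform(size=1000)
        # Sparse weak signals: a few deflated p-values mixed into the null
        mixed_p = np.concatenate([rng.uniform(size=990), rng.uniform(0, 0.01, 10)])
        print(higher_criticism(null_p), higher_criticism(mixed_p))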

  6. Hotspot detection using space-time scan statistics on children under five years of age in Depok

    NASA Astrophysics Data System (ADS)

    Verdiana, Miranti; Widyaningsih, Yekti

    2017-03-01

    Problems affecting the health level in Depok include persistently high malnutrition rates and the increasing spread of infectious and non-communicable diseases in some areas. Children under five years old are a vulnerable part of the population exposed to malnutrition and disease. It is therefore important to identify where and when malnutrition in Depok occurs with high intensity. To obtain the location and time of the hotspots of malnutrition and diseases that attack children under five years old, the space-time scan statistic can be used. The space-time scan statistic is a hotspot detection method in which spatial and temporal information are taken into account simultaneously. This method detects a hotspot with a cylindrical scanning window: the base of the cylinder describes the area, and its height describes the time interval. Each cylinder formed is a hotspot candidate that requires hypothesis testing to decide whether it constitutes a hotspot. Hotspot detection in this study was carried out using combinations of several variables. Some combinations of variables yield similar hotspot detection results, forming groups (clusters). For the health level of children under five in Depok, the Beji health care center region in 2011-2012 is a hotspot. Across the variable combinations used in hotspot detection, Beji health care center most frequently appears as a hotspot. These results may help the local government adopt appropriate policies to improve the health of children under five in the city of Depok.

  7. Automatic detection of health changes using statistical process control techniques on measured transfer times of elderly.

    PubMed

    Baldewijns, Greet; Luca, Stijn; Nagels, William; Vanrumste, Bart; Croonenborghs, Tom

    2015-01-01

    It has been shown that gait speed and transfer times are good measures of functional ability in the elderly. However, data currently acquired by systems that measure either gait speed or transfer times in the homes of elderly people require manual review by healthcare workers, a process that is time-consuming. To alleviate this burden, this paper proposes the use of statistical process control (SPC) methods to automatically detect both positive and negative changes in transfer times. Three SPC techniques (tabular CUSUM, standardized CUSUM, and EWMA), known for their ability to detect small shifts in the data, are evaluated on simulated transfer times. This analysis shows that EWMA is the best-suited method, with a detection accuracy of 82% and an average detection time of 9.64 days.
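    Of the three techniques, EWMA is reported as best suited; a minimal textbook-style EWMA chart sketch (the parameters and simulated transfer times below are illustrative, not the paper's):

        import numpy as np

        def ewma_alarms(x, mu0, sigma0, lam=0.2, L=3.0):
            """EWMA control chart: z_t = lam*x_t + (1-lam)*z_{t-1}; alarm when
            z_t leaves mu0 +/- L*sigma_z(t), with time-varying limits."""
            z, alarms = mu0, []
            for t, xt in enumerate(x, start=1):
                z = lam * xt + (1 - lam) * z
                var = sigma0**2 * lam / (2 - lam) * (1 - (1 - lam) ** (2 * t))
                alarms.append(abs(z - mu0) > L * np.sqrt(var))
            return np.array(alarms)

        rng = np.random.default_rng(8)
        # Simulated transfer times: baseline 20 s, upward shift after day 60
        x = np.concatenate([rng.normal(20, 2, 60), rng.normal(23, 2, 40)])
        print(np.argmax(ewma_alarms(x, mu0=20, sigma0=2)))  # first alarm index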

  8. STATISTICAL METHOD FOR DETECTION OF A TREND IN ATMOSPHERIC SULFATE

    EPA Science Inventory

    Daily atmospheric concentrations of sulfate collected in northeastern Pennsylvania are regressed against meteorological factors, ozone, and time in order to determine if a significant trend in sulfate can be detected. The data used in this analysis were collected during the Sulfat...

  9. Detecting microsatellites within genomes: significant variation among algorithms.

    PubMed

    Leclercq, Sébastien; Rivals, Eric; Jarne, Philippe

    2007-04-18

    Microsatellites are short, tandemly repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameter settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions.

  10. Methods for computational disease surveillance in infection prevention and control: Statistical process control versus Twitter's anomaly and breakout detection algorithms.

    PubMed

    Wiemken, Timothy L; Furmanek, Stephen P; Mattingly, William A; Wright, Marc-Oliver; Persaud, Annuradha K; Guinn, Brian E; Carrico, Ruth M; Arnold, Forest W; Ramirez, Julio A

    2018-02-01

    Although not all health care-associated infections (HAIs) are preventable, reducing HAIs through targeted intervention is key to a successful infection prevention program. To identify areas in need of targeted intervention, robust statistical methods must be used when analyzing surveillance data. The objective of this study was to compare and contrast statistical process control (SPC) charts with Twitter's anomaly and breakout detection algorithms. SPC and anomaly/breakout detection (ABD) charts were created for vancomycin-resistant Enterococcus, Acinetobacter baumannii, catheter-associated urinary tract infection, and central line-associated bloodstream infection data. Both SPC and ABD charts detected similar data points as anomalous/out of control on most charts. The vancomycin-resistant Enterococcus ABD chart detected an extra anomalous point that appeared to be higher than the same time period in prior years. Using a small subset of the central line-associated bloodstream infection data, the ABD chart was able to detect anomalies where the SPC chart was not. SPC charts and ABD charts both performed well, although ABD charts appeared to work better in the context of seasonal variation and autocorrelation. Because they account for common statistical issues in HAI data, ABD charts may be useful for practitioners for analysis of HAI surveillance data. Copyright © 2018 Association for Professionals in Infection Control and Epidemiology, Inc. Published by Elsevier Inc. All rights reserved.

  11. A Third-Generation Adaptive Statistical Iterative Reconstruction Technique: Phantom Study of Image Noise, Spatial Resolution, Lesion Detectability, and Dose Reduction Potential.

    PubMed

    Euler, André; Solomon, Justin; Marin, Daniele; Nelson, Rendon C; Samei, Ehsan

    2018-06-01

    The purpose of this study was to assess image noise, spatial resolution, lesion detectability, and the dose reduction potential of a proprietary third-generation adaptive statistical iterative reconstruction (ASIR-V) technique. A phantom representing five different body sizes (12-37 cm) and a contrast-detail phantom containing lesions of five low-contrast levels (5-20 HU) and three sizes (2-6 mm) were deployed. Both phantoms were scanned on a 256-MDCT scanner at six different radiation doses (1.25-10 mGy). Images were reconstructed with filtered back projection (FBP), ASIR-V with 50% blending with FBP (ASIR-V 50%), and ASIR-V without blending (ASIR-V 100%). In the first phantom, noise properties were assessed by noise power spectrum analysis. Spatial resolution properties were measured by use of task transfer functions for objects of different contrasts. Noise magnitude, noise texture, and resolution were compared between the three groups. In the second phantom, low-contrast detectability was assessed by nine human readers independently for each condition. The dose reduction potential of ASIR-V was estimated on the basis of a generalized linear statistical regression model. On average, image noise was reduced 37.3% with ASIR-V 50% and 71.5% with ASIR-V 100% compared with FBP. ASIR-V shifted the noise power spectrum toward lower frequencies compared with FBP. The spatial resolution of ASIR-V was equivalent or slightly superior to that of FBP, except for the low-contrast object, which had lower resolution. Lesion detection significantly increased with both ASIR-V levels (p = 0.001), with an estimated radiation dose reduction potential of 15% ± 5% (SD) for ASIR-V 50% and 31% ± 9% for ASIR-V 100%. ASIR-V reduced image noise and improved lesion detection compared with FBP and had potential for radiation dose reduction while preserving low-contrast detectability.

  12. Probabilistic Model for Untargeted Peak Detection in LC-MS Using Bayesian Statistics.

    PubMed

    Woldegebriel, Michael; Vivó-Truyols, Gabriel

    2015-07-21

    We introduce a novel Bayesian probabilistic peak detection algorithm for liquid chromatography-mass spectrometry (LC-MS). The probabilistic result allows the user to make a final decision about which points in a chromatogram are affected by a chromatographic peak and which ones are affected only by noise. The use of probabilities contrasts with the traditional approach, in which a binary answer is given based on a threshold. By contrast, with the Bayesian peak detection presented here, the probability values can be further propagated into other preprocessing steps, increasing (or decreasing) the importance of chromatographic regions in the final results. The present work is based on the use of the statistical theory of component overlap of Davis and Giddings (Davis, J. M.; Giddings, J. C. Anal. Chem. 1983, 55, 418-424) as the prior probability in the Bayesian formulation. The algorithm was tested on LC-MS Orbitrap data and was able to successfully distinguish chemical noise from actual peaks without any data preprocessing.

  13. A statistical model of false negative and false positive detection of phase singularities.

    PubMed

    Jacquemet, Vincent

    2017-10-01

    The complexity of cardiac fibrillation dynamics can be assessed by analyzing the distribution of phase singularities (PSs) observed using mapping systems. Interelectrode distance, however, limits the accuracy of PS detection. To investigate in a theoretical framework the PS false negative and false positive rates in relation to the characteristics of the mapping system and fibrillation dynamics, we propose a statistical model of phase maps with controllable number and locations of PSs. In this model, phase maps are generated from randomly distributed PSs with physiologically-plausible directions of rotation. Noise and distortion of the phase are added. PSs are detected using topological charge contour integrals on regular grids of varying resolutions. Over 100 × 10^6 realizations of the random field process are used to estimate average false negative and false positive rates using a Monte-Carlo approach. The false detection rates are shown to depend on the average distance between neighboring PSs expressed in units of interelectrode distance, following approximately a power law with exponents in the range of 1.14 to 2 for false negatives and around 2.8 for false positives. In the presence of noise or distortion of phase, false detection rates at high resolution tend to a non-zero noise-dependent lower bound. This model provides an easy-to-implement tool for benchmarking PS detection algorithms over a broad range of configurations with multiple PSs.
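    The detection step described here can be sketched compactly (a generic topological-charge detector on a regular grid; the statistical model itself is not reproduced): sum wrapped phase differences around each 2x2 plaquette and flag net charges of ±1.

        import numpy as np

        def wrap(angle):
            """Wrap phase differences to [-pi, pi)."""
            return (angle + np.pi) % (2 * np.pi) - np.pi

        def phase_singularities(phase):
            """Topological charge around each 2x2 plaquette of a phase map;
            +1/-1 marks a phase singularity, 0 elsewhere."""
            d1 = wrap(phase[:-1, 1:] - phase[:-1, :-1])   # top edge, left->right
            d2 = wrap(phase[1:, 1:] - phase[:-1, 1:])     # right edge, downward
            d3 = wrap(phase[1:, :-1] - phase[1:, 1:])     # bottom edge, right->left
            d4 = wrap(phase[:-1, :-1] - phase[1:, :-1])   # left edge, upward
            return np.rint((d1 + d2 + d3 + d4) / (2 * np.pi)).astype(int)

        # Synthetic map with a single singularity near the grid center
        y, x = np.mgrid[-16:16, -16:16] + 0.5
        charge = phase_singularities(np.arctan2(y, x))
        print(np.argwhere(charge != 0), charge.sum())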

  14. Built-Up Area Detection from High-Resolution Satellite Images Using Multi-Scale Wavelet Transform and Local Spatial Statistics

    NASA Astrophysics Data System (ADS)

    Chen, Y.; Zhang, Y.; Gao, J.; Yuan, Y.; Lv, Z.

    2018-04-01

    Recently, built-up area detection from high-resolution satellite images (HRSI) has attracted increasing attention because HRSI can provide more detailed object information. In this paper, a multi-resolution wavelet transform and a local spatial autocorrelation statistic are introduced to model the spatial patterns of built-up areas. First, the input image is decomposed into high- and low-frequency subbands by wavelet transform at three levels. Then the high-frequency detail information in three directions (horizontal, vertical and diagonal) is extracted, followed by a maximization operation to integrate the information across directions. Afterward, a cross-scale operation is implemented to fuse different levels of information. Finally, the local spatial autocorrelation statistic is applied to enhance the saliency of built-up features, and an adaptive threshold algorithm is used to detect built-up areas. Experiments are conducted on ZY-3 and Quickbird panchromatic satellite images, and the results show that the proposed method is very effective for built-up area detection.

  15. Assessing the statistical significance of the achieved classification error of classifiers constructed using serum peptide profiles, and a prescription for random sampling repeated studies for massive high-throughput genomic and proteomic studies.

    PubMed

    Lyons-Weiler, James; Pelikan, Richard; Zeh, Herbert J; Whitcomb, David C; Malehorn, David E; Bigbee, William L; Hauskrecht, Milos

    2005-01-01

    Peptide profiles generated using SELDI/MALDI time of flight mass spectrometry provide a promising source of patient-specific information with high potential impact on the early detection and classification of cancer and other diseases. The new profiling technology comes, however, with numerous challenges and concerns. Particularly important are concerns of reproducibility of classification results and their significance. In this work we describe a computational validation framework, called PACE (Permutation-Achieved Classification Error), that lets us assess, for a given classification model, the significance of the Achieved Classification Error (ACE) on the profile data. The framework compares the performance statistic of the classifier on true data samples and checks if these are consistent with the behavior of the classifier on the same data with randomly reassigned class labels. A statistically significant ACE increases our belief that a discriminative signal was found in the data. The advantage of PACE analysis is that it can be easily combined with any classification model and is relatively easy to interpret. PACE analysis does not protect researchers against confounding in the experimental design, or other sources of systematic or random error. We use PACE analysis to assess significance of classification results we have achieved on a number of published data sets. The results show that many of these datasets indeed possess a signal that leads to a statistically significant ACE.
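    The PACE idea reduces to a label-permutation test on the achieved classification error. The following is my own minimal version of that idea (classifier, fold count, and permutation count are arbitrary choices), not the authors' implementation.

```python
# Hedged sketch: compare the classifier's achieved error on the true labels
# with its error distribution under random label reassignment.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def pace_p_value(X, y, n_perm=200, seed=0):
    rng = np.random.default_rng(seed)
    model = LogisticRegression(max_iter=1000)
    ace = 1.0 - cross_val_score(model, X, y, cv=5).mean()     # achieved error
    perm_errors = np.empty(n_perm)
    for i in range(n_perm):
        y_perm = rng.permutation(y)                           # break class signal
        perm_errors[i] = 1.0 - cross_val_score(model, X, y_perm, cv=5).mean()
    # fraction of permutations doing at least as well as the real labels
    return ace, (np.sum(perm_errors <= ace) + 1) / (n_perm + 1)

X = np.random.randn(100, 20)
y = (X[:, 0] + 0.5 * np.random.randn(100) > 0).astype(int)    # weak real signal
ace, p = pace_p_value(X, y, n_perm=50)
print(f"ACE={ace:.3f}, permutation p={p:.3f}")
```

    A small permutation p-value increases confidence that a discriminative signal was found; as the abstract notes, it does not protect against confounding in the experimental design.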

  16. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Sacks, David B.; Yu, Yi-Kuo

    2018-06-01

    Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  17. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry.

    PubMed

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y; Drake, Steven K; Gucek, Marjan; Sacks, David B; Yu, Yi-Kuo

    2018-06-05

    Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  18. Statistical image segmentation for the detection of skin lesion borders in UV fluorescence excitation

    NASA Astrophysics Data System (ADS)

    Ortega-Martinez, Antonio; Padilla-Martinez, Juan Pablo; Franco, Walfre

    2016-04-01

    The skin contains several fluorescent molecules or fluorophores that serve as markers of structure, function and composition. UV fluorescence excitation photography is a simple and effective way to image specific intrinsic fluorophores, such as the one ascribed to tryptophan, which emits at a wavelength of 345 nm upon excitation at 295 nm and is a marker of cellular proliferation. Earlier, we built a clinical UV photography system to image cellular proliferation. In some samples, the naturally low intensity of the fluorescence can make it difficult to separate the fluorescence of cells in higher proliferation states from background fluorescence and other imaging artifacts, such as electronic noise. In this work, we describe a statistical image segmentation method to separate the fluorescence of interest. Statistical image segmentation is based on image averaging, background subtraction and pixel statistics. This method allows better quantification of the intensity and surface distributions of fluorescence, which in turn simplifies the detection of borders. Using this method we delineated the borders of highly-proliferative skin conditions and diseases, in particular, allergic contact dermatitis, psoriatic lesions and basal cell carcinoma. Segmented images clearly define lesion borders. UV fluorescence excitation photography along with statistical image segmentation may serve as a quick and simple diagnostic tool for clinicians.
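    The described pipeline (averaging, background subtraction, pixel statistics) can be sketched in a few lines. The frame data, the registration assumption, and the threshold of three standard deviations below are all assumptions of this illustration.

```python
# Minimal sketch of statistical image segmentation, assuming registered
# frames and a separately acquired background image; data are synthetic.
import numpy as np

def segment_fluorescence(frames, background, k=3.0):
    """frames: (n, H, W) stack; background: (H, W); returns a boolean mask."""
    mean_img = frames.mean(axis=0)            # averaging suppresses shot noise
    corrected = mean_img - background         # background subtraction
    mu, sigma = corrected.mean(), corrected.std()
    return corrected > mu + k * sigma         # pixel-statistics threshold

rng = np.random.default_rng(1)
bg = rng.normal(10, 1, (64, 64))
stack = rng.normal(10, 1, (8, 64, 64))
stack[:, 20:30, 20:30] += 5.0                 # simulated lesion fluorescence
mask = segment_fluorescence(stack, bg)
print("lesion pixels found:", int(mask.sum()))
```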

  19. Local multiplicity adjustment for the spatial scan statistic using the Gumbel distribution.

    PubMed

    Gangnon, Ronald E

    2012-03-01

    The spatial scan statistic is an important and widely used tool for cluster detection. It is based on the simultaneous evaluation of the statistical significance of the maximum likelihood ratio test statistic over a large collection of potential clusters. In most cluster detection problems, there is variation in the extent of local multiplicity across the study region. For example, using a fixed maximum geographic radius for clusters, urban areas typically have many overlapping potential clusters, whereas rural areas have relatively few. The spatial scan statistic does not account for local multiplicity variation. We describe a previously proposed local multiplicity adjustment based on a nested Bonferroni correction and propose a novel adjustment based on a Gumbel distribution approximation to the distribution of a local scan statistic. We compare the performance of all three statistics in terms of power and a novel unbiased cluster detection criterion. These methods are then applied to the well-known New York leukemia dataset and a Wisconsin breast cancer incidence dataset. © 2011, The International Biometric Society.
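    The practical appeal of the Gumbel adjustment is that a smooth approximation to the null distribution of a maximum can be fit from modest Monte Carlo effort. The sketch below is a hedged illustration, not Gangnon's exact construction: the null replicates are synthetic stand-ins for local scan statistics.

```python
# Fit a Gumbel distribution to Monte Carlo null maxima and read off an
# adjusted p-value; replicates here are synthetic placeholders.
import numpy as np
from scipy.stats import gumbel_r

rng = np.random.default_rng(2)
# stand-in null replicates: maxima over 50 local likelihood-ratio statistics
null_max_stats = rng.chisquare(1, size=(999, 50)).max(axis=1)

loc, scale = gumbel_r.fit(null_max_stats)      # fit Gumbel to the null maxima
observed = 18.5                                # observed local scan statistic
p_gumbel = gumbel_r.sf(observed, loc, scale)   # Gumbel-approximated p-value
p_empir = (np.sum(null_max_stats >= observed) + 1) / (len(null_max_stats) + 1)
print(f"Gumbel p={p_gumbel:.4f}, empirical p={p_empir:.4f}")
```

    The Gumbel fit gives usable p-values far into the tail, where the purely empirical estimate would require an enormous number of replicates.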

  20. On the Determination of Poisson Statistics for Haystack Radar Observations of Orbital Debris

    NASA Technical Reports Server (NTRS)

    Stokely, Christopher L.; Benbrook, James R.; Horstman, Matt

    2007-01-01

    A convenient and powerful method is used to determine if radar detections of orbital debris are observed according to Poisson statistics. This is done by analyzing the time interval between detection events. For Poisson statistics, the probability distribution of the time interval between events is shown to be an exponential distribution. This distribution is a special case of the Erlang distribution that is used in estimating traffic loads on telecommunication networks. Poisson statistics form the basis of many orbital debris models, but the statistical basis of these models has not been clearly demonstrated empirically until now. Interestingly, during the fiscal year 2003 observations with the Haystack radar in a fixed staring mode, no statistically significant deviations from Poisson statistics were observed, whether the events were analyzed independently of altitude and inclination or conditioned on them. One would potentially expect some significant clustering of events in time as a result of satellite breakups, but the presence of Poisson statistics indicates that such debris disperse rapidly with respect to Haystack's very narrow radar beam. An exception to Poisson statistics is observed in the months following the intentional breakup of the Fengyun satellite in January 2007.
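    The paper's core check, that inter-detection times are exponential under a Poisson process, is easy to reproduce on any event list. The event times below are simulated, not Haystack data, and the plain KS test is a simplification (estimating the scale from the same data biases the test slightly; a parametric bootstrap would be more rigorous).

```python
# Hedged illustration: test inter-event gaps against an exponential law.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
event_times = np.cumsum(rng.exponential(scale=120.0, size=500))  # seconds
gaps = np.diff(event_times)

# KS test against an exponential with the observed mean gap; note that using
# the fitted scale makes the nominal p-value only approximate (Lilliefors).
ks_stat, p_value = stats.kstest(gaps, "expon", args=(0, gaps.mean()))
print(f"KS={ks_stat:.3f}, p={p_value:.3f}  (large p: consistent with Poisson)")
```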

  1. Statistics of the detection rates for tensor and scalar gravitational waves from the Local Galaxy universe

    NASA Astrophysics Data System (ADS)

    Baryshev, Yu. V.; Paturel, G.

    2001-05-01

    We use data on the local 3-dimensional galaxy distribution for studying the statistics of the detection rates of gravitational waves (GW) coming from supernova explosions. We consider both tensor and scalar gravitational waves, which are possible in a wide range of relativistic and quantum gravity theories. We show that the statistics of GW events as a function of sidereal time can be used to distinguish between scalar and tensor gravitational waves because of the anisotropy of the spatial galaxy distribution. For calculation of the expected amplitudes of GW signals we use values of the released GW energy, frequency and duration of the GW pulse that are consistent with existing scenarios of SN core collapse. The amplitudes of the signals produced by the Virgo and Great Attractor clusters of galaxies are expressed as a function of the sidereal time for resonant bar detectors operating now (IGEC) and for forthcoming laser interferometric detectors (VIRGO). Then, we calculate the expected number of GW events as a function of sidereal time produced by all the galaxies within 100 Mpc. In the case of axisymmetric rotational core collapse, which radiates a GW energy of 10^-9 M_sun c^2, only the closest explosions can be detected. However, in the case of a nonaxisymmetric supernova explosion, due to such phenomena as centrifugal hangup, bar and lump formation, the GW radiation could be as strong as that from a coalescing neutron-star binary. For a radiated GW energy higher than 10^-6 M_sun c^2 and a detector sensitivity at the level h ~ 10^-23, it is possible to detect the Virgo cluster and the Great Attractor, and hence to use the statistics of GW events for testing gravity theories.

  2. Assessing Statistically Significant Heavy-Metal Concentrations in Abandoned Mine Areas via Hot Spot Analysis of Portable XRF Data

    PubMed Central

    Kim, Sung-Min; Choi, Yosoon

    2017-01-01

    To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1–4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required. PMID:28629168
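    The Getis-Ord Gi* statistic at the heart of this hot spot analysis can be computed directly. The sketch below makes simplifying assumptions (binary rook-neighbour weights on a regular grid, with the focal cell included); real surveys would use distance-based weights over irregular sample locations.

```python
# Minimal Gi* sketch under stated assumptions; not the study's GIS workflow.
import numpy as np

def getis_ord_gi_star(values, i, j):
    """z-scored Gi* for cell (i, j) of a 2D array of measured contents."""
    x = values.ravel()
    n = x.size
    xbar, s = x.mean(), x.std()
    # focal cell plus its 4 rook neighbours, clipped at the borders
    nbrs = [(i, j), (i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]
    nbrs = [(a, b) for a, b in nbrs
            if 0 <= a < values.shape[0] and 0 <= b < values.shape[1]]
    w_sum = len(nbrs)                        # all weights equal 1
    local_sum = sum(values[a, b] for a, b in nbrs)
    num = local_sum - xbar * w_sum
    den = s * np.sqrt((n * w_sum - w_sum**2) / (n - 1))
    return num / den

conc = np.random.default_rng(4).lognormal(mean=2.0, sigma=0.5, size=(10, 10))
conc[4:6, 4:6] *= 4                          # injected hot spot
print(f"Gi* at hot spot: {getis_ord_gi_star(conc, 5, 5):.2f}")
```

    A large positive z-score marks a statistically significant hot spot: a high value surrounded by other high values, exactly as described above.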

  3. Assessing Statistically Significant Heavy-Metal Concentrations in Abandoned Mine Areas via Hot Spot Analysis of Portable XRF Data.

    PubMed

    Kim, Sung-Min; Choi, Yosoon

    2017-06-18

    To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1-4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required.

  4. A Parallel Finite Set Statistical Simulator for Multi-Target Detection and Tracking

    NASA Astrophysics Data System (ADS)

    Hussein, I.; MacMillan, R.

    2014-09-01

    Finite Set Statistics (FISST) is a powerful Bayesian inference tool for the joint detection, classification and tracking of multi-target environments. FISST is capable of handling phenomena such as clutter, misdetections, and target birth and decay. Implicit within the approach are solutions to the data association and target label-tracking problems. Finally, FISST provides generalized information measures that can be used for sensor allocation across different types of tasks, such as searching for new targets and classification and tracking of known targets. These FISST capabilities have been demonstrated on several small-scale illustrative examples. However, for implementation in a large-scale system, as in the Space Situational Awareness problem, these capabilities require substantial computational power. In this paper, we implement FISST in a parallel environment for the joint detection and tracking of multi-target systems. In this implementation, false alarms and misdetections will be modeled; target birth and decay will not be modeled in the present paper. We will demonstrate the success of the method for as many targets as we possibly can in a desktop parallel environment. Performance measures will include: number of targets in the simulation, certainty of detected target tracks, and computational time as a function of clutter returns and number of targets, among other factors.

  5. Detecting microsatellites within genomes: significant variation among algorithms

    PubMed Central

    Leclercq, Sébastien; Rivals, Eric; Jarne, Philippe

    2007-01-01

    Background Microsatellites are short, tandemly-repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Results Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameters settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Conclusion Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should therefore be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions. PMID:17442102

  6. Revisiting Information Technology tools serving authorship and editorship: a case-guided tutorial to statistical analysis and plagiarism detection

    PubMed Central

    Bamidis, P D; Lithari, C; Konstantinidis, S T

    2010-01-01

    With the number of scientific papers published in journals, conference proceedings, and international literature ever increasing, authors and reviewers are not only presented with an abundance of information, but also continuously confronted with risks associated with the erroneous copying of another's material. In parallel, Information Communication Technology (ICT) tools provide researchers with novel and continuously more effective ways to analyze and present their work. Software tools for statistical analysis offer scientists the chance to validate their work and enhance the quality of published papers. Moreover, from the reviewer's and the editor's perspective, it is now possible to ensure the (text-content) originality of a scientific article with automated software tools for plagiarism detection. In this paper, we provide a step-by-step demonstration of two categories of tools, namely, statistical analysis and plagiarism detection. The aim is not to come up with a specific tool recommendation, but rather to provide useful guidelines on the proper use and efficiency of either category of tools. In the context of this special issue, this paper offers a useful tutorial on specific problems concerned with scientific writing and review discourse. A specific neuroscience experimental case example is utilized to illustrate the young researcher's statistical analysis burden, while a test scenario is purpose-built using open access journal articles to exemplify the use and comparative outputs of seven plagiarism detection software pieces. PMID:21487489

  7. Revisiting Information Technology tools serving authorship and editorship: a case-guided tutorial to statistical analysis and plagiarism detection.

    PubMed

    Bamidis, P D; Lithari, C; Konstantinidis, S T

    2010-12-01

    With the number of scientific papers published in journals, conference proceedings, and international literature ever increasing, authors and reviewers are not only presented with an abundance of information, but also continuously confronted with risks associated with the erroneous copying of another's material. In parallel, Information Communication Technology (ICT) tools provide researchers with novel and continuously more effective ways to analyze and present their work. Software tools for statistical analysis offer scientists the chance to validate their work and enhance the quality of published papers. Moreover, from the reviewer's and the editor's perspective, it is now possible to ensure the (text-content) originality of a scientific article with automated software tools for plagiarism detection. In this paper, we provide a step-by-step demonstration of two categories of tools, namely, statistical analysis and plagiarism detection. The aim is not to come up with a specific tool recommendation, but rather to provide useful guidelines on the proper use and efficiency of either category of tools. In the context of this special issue, this paper offers a useful tutorial on specific problems concerned with scientific writing and review discourse. A specific neuroscience experimental case example is utilized to illustrate the young researcher's statistical analysis burden, while a test scenario is purpose-built using open access journal articles to exemplify the use and comparative outputs of seven plagiarism detection software pieces.

  8. The role of baryons in creating statistically significant planes of satellites around Milky Way-mass galaxies

    NASA Astrophysics Data System (ADS)

    Ahmed, Sheehan H.; Brooks, Alyson M.; Christensen, Charlotte R.

    2017-04-01

    We investigate whether the inclusion of baryonic physics influences the formation of thin, coherently rotating planes of satellites such as those seen around the Milky Way and Andromeda. For four Milky Way-mass simulations, each run both as dark matter-only and with baryons included, we are able to identify a planar configuration that significantly maximizes the number of plane satellite members. The maximum plane member satellites are consistently different between the dark matter-only and baryonic versions of the same run due to the fact that satellites are both more likely to be destroyed and to infall later in the baryonic runs. Hence, studying satellite planes in dark matter-only simulations is misleading, because they will be composed of different satellite members than those that would exist if baryons were included. Additionally, the destruction of satellites in the baryonic runs leads to less radially concentrated satellite distributions, a result that is critical to making planes that are statistically significant compared to a random distribution. Since all planes pass through the centre of the galaxy, it is much harder to create a plane of a given height from a random distribution if the satellites have a low radial concentration. We identify Andromeda's low radial satellite concentration as a key reason why the plane in Andromeda is highly significant. Despite this, when corotation is considered, none of the satellite planes identified for the simulated galaxies are as statistically significant as the observed planes around the Milky Way and Andromeda, even in the baryonic runs.

  9. Detection of Clinically Significant Retinopathy of Prematurity Using Wide-angle Digital Retinal Photography

    PubMed Central

    Chiang, Michael F.; Melia, Michele; Buffenn, Angela N.; Lambert, Scott R.; Recchia, Franco M.; Simpson, Jennifer L.; Yang, Michael B.

    2013-01-01

    Objective To evaluate the accuracy of detecting clinically significant retinopathy of prematurity (ROP) using wide-angle digital retinal photography. Methods Literature searches of PubMed and the Cochrane Library databases were conducted last on December 7, 2010, and yielded 414 unique citations. The authors assessed these 414 citations and marked 82 that potentially met the inclusion criteria. These 82 studies were reviewed in full text; 28 studies met inclusion criteria. The authors extracted from these studies information about study design, interventions, outcomes, and study quality. After data abstraction, 18 were excluded for study deficiencies or because they were superseded by a more recent publication. The methodologist reviewed the remaining 10 studies and assigned ratings of evidence quality; 7 studies were rated level I evidence and 3 studies were rated level III evidence. Results There is level I evidence from ≥5 studies demonstrating that digital retinal photography has high accuracy for detection of clinically significant ROP. Level III studies have reported high accuracy, without any detectable complications, from real-world operational programs intended to detect clinically significant ROP through remote site interpretation of wide-angle retinal photographs. Conclusions Wide-angle digital retinal photography has the potential to complement standard ROP care. It may provide advantages through objective documentation of clinical examination findings, improved recognition of disease progression by comparing previous photographs, and the creation of image libraries for education and research. PMID:22541632

  10. Statistical optics

    NASA Astrophysics Data System (ADS)

    Goodman, J. W.

    This book is based on the thesis that some training in the area of statistical optics should be included as a standard part of any advanced optics curriculum. Random variables are discussed, taking into account definitions of probability and random variables, distribution functions and density functions, an extension to two or more random variables, statistical averages, transformations of random variables, sums of real random variables, Gaussian random variables, complex-valued random variables, and random phasor sums. Other subjects examined are related to random processes, some first-order properties of light waves, the coherence of optical waves, some problems involving high-order coherence, effects of partial coherence on imaging systems, imaging in the presence of randomly inhomogeneous media, and fundamental limits in photoelectric detection of light. Attention is given to deterministic versus statistical phenomena and models, the Fourier transform, and the fourth-order moment of the spectrum of a detected speckle image.

  11. Structural damage detection based on stochastic subspace identification and statistical pattern recognition: II. Experimental validation under varying temperature

    NASA Astrophysics Data System (ADS)

    Lin, Y. Q.; Ren, W. X.; Fang, S. E.

    2011-11-01

    Although most vibration-based damage detection methods achieve satisfactory verification on analytical or numerical structures, many of them encounter problems when applied to real-world structures under varying environments. Damage detection methods that extract damage features directly from periodically sampled dynamic time-history response measurements are desirable, but relevant research and field application verification are still lacking. In this second part of a two-part paper, the robustness and performance of the statistics-based damage index using the forward innovation model by stochastic subspace identification of a vibrating structure, proposed in the first part, have been investigated against two prestressed reinforced concrete (RC) beams tested in the laboratory and a full-scale RC arch bridge tested in the field under varying environments. Experimental verification is focused on temperature effects. It is demonstrated that the proposed statistics-based damage index is insensitive to temperature variations but sensitive to structural deterioration or state alteration. This makes it possible to detect structural damage for real-scale structures experiencing ambient excitations and varying environmental conditions.

  12. Change detection in a time series of polarimetric SAR data by an omnibus test statistic and its factorization (Conference Presentation)

    NASA Astrophysics Data System (ADS)

    Nielsen, Allan A.; Conradsen, Knut; Skriver, Henning

    2016-10-01

    Test statistics for comparison of real (as opposed to complex) variance-covariance matrices exist in the statistics literature [1]. In earlier publications we have described a test statistic, with an associated p-value, for the equality of two variance-covariance matrices following the complex Wishart distribution [2]. We showed their application to bitemporal change detection and to edge detection [3] in multilook, polarimetric synthetic aperture radar (SAR) data in the covariance matrix representation [4]. The test statistic and the associated p-value are also described in [5]. In [6] we focused on the block-diagonal case, elaborated on some computer implementation issues, and gave examples of the application to change detection in both full and dual polarization bitemporal, bifrequency, multilook SAR data. In [7] we described an omnibus test statistic Q for the equality of k variance-covariance matrices following the complex Wishart distribution, together with a factorization Q = R_2 R_3 ... R_k, where Q and the R_j determine whether and when a difference occurs. Additionally, we gave p-values for Q and the R_j. Finally, we demonstrated the use of Q, the R_j and the p-values for change detection in truly multitemporal, full polarization SAR data. Here we illustrate the methods by means of airborne L-band SAR data (EMISAR) [8,9]. The methods may also be applied to other polarimetric SAR data, such as data from Sentinel-1, COSMO-SkyMed, TerraSAR-X, ALOS, and RadarSat-2, and to single-pol data. The account given here closely follows that given in our recent IEEE TGRS paper [7]. Selected References [1] Anderson, T. W., An Introduction to Multivariate Statistical Analysis, John Wiley, New York, third ed. (2003). [2] Conradsen, K., Nielsen, A. A., Schou, J., and Skriver, H., "A test statistic in the complex Wishart distribution and its application to change detection in polarimetric SAR data," IEEE Transactions on Geoscience and Remote Sensing 41(1): 4-19, 2003. [3] Schou, J
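    For the two-matrix (bitemporal, equal-looks) building block, the likelihood-ratio statistic has a simple closed form. The sketch below follows the form of the Conradsen et al. test but omits the finite-sample (Box) correction factor for brevity, so the chi-square p-value is only asymptotic; treat it as an illustration, not a production implementation.

```python
# Hedged sketch: likelihood-ratio test for equality of two p x p complex
# Wishart covariance matrices X and Y, each averaged over n_looks looks.
import numpy as np
from scipy.stats import chi2

def wishart_change_pvalue(X, Y, n_looks):
    """X, Y: p x p Hermitian covariance matrices (multilook averages)."""
    p = X.shape[0]
    Sx, Sy = n_looks * X, n_looks * Y         # back to sums over looks
    lnq = n_looks * (2 * p * np.log(2)
                     + np.linalg.slogdet(Sx)[1] + np.linalg.slogdet(Sy)[1]
                     - 2 * np.linalg.slogdet(Sx + Sy)[1])
    test_stat = -2.0 * lnq                    # ~ chi2(p^2) under no change
    return chi2.sf(test_stat, p * p)

# toy example: identical covariance matrices -> p-value of 1 (no change)
C = np.array([[2.0, 0.5], [0.5, 1.0]], dtype=complex)
print(f"no-change p = {wishart_change_pvalue(C, C, n_looks=16):.3f}")
```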

  13. Statistics, Handle with Care: Detecting Multiple Model Components with the Likelihood Ratio Test

    NASA Astrophysics Data System (ADS)

    Protassov, Rostislav; van Dyk, David A.; Connors, Alanna; Kashyap, Vinay L.; Siemiginowska, Aneta

    2002-05-01

    The likelihood ratio test (LRT) and the related F-test, popularized in astrophysics by Eadie and coworkers in 1971, Bevington in 1969, Lampton, Margon, & Bowyer in 1976, Cash in 1979, and Avni in 1978, do not (even asymptotically) adhere to their nominal χ2 and F-distributions in many statistical tests common in astrophysics, thereby casting many marginal line or source detections and nondetections into doubt. Although the above authors illustrate the many legitimate uses of these statistics, in some important cases it can be impossible to compute the correct false positive rate. For example, it has become common practice to use the LRT or the F-test to detect a line in a spectral model or a source above background despite the lack of certain required regularity conditions. (These applications were not originally suggested by Cash or by Bevington.) In these and other settings that involve testing a hypothesis that is on the boundary of the parameter space, contrary to common practice, the nominal χ2 distribution for the LRT or the F-distribution for the F-test should not be used. In this paper, we characterize an important class of problems in which the LRT and the F-test fail and illustrate this nonstandard behavior. We briefly sketch several possible acceptable alternatives, focusing on Bayesian posterior predictive probability values. We present this method in some detail since it is a simple, robust, and intuitive approach. This alternative method is illustrated using the gamma-ray burst of 1997 May 8 (GRB 970508) to investigate the presence of an Fe K emission line during the initial phase of the observation. There are many legitimate uses of the LRT and the F-test in astrophysics, and even when these tests are inappropriate, there remain several statistical alternatives (e.g., judicious use of error bars and Bayes factors). Nevertheless, there are numerous cases of the inappropriate use of the LRT and similar tests in the literature, bringing substantive conclusions into question.
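    The boundary problem is easy to see in a self-contained simulation (my own toy example, not from the paper): testing H0: theta = 0 against theta >= 0 for a normal mean. The LRT statistic is zero whenever the sample mean is negative, so its null distribution is a 50:50 mixture of 0 and chi2(1) rather than chi2(1) itself.

```python
# Simulate the LRT null distribution when the tested value sits on the
# boundary of the parameter space, and compare with the nominal chi2(1) cutoff.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
n, reps = 50, 100_000
xbar = rng.normal(0.0, 1.0 / np.sqrt(n), size=reps)   # sample means under H0
lrt = np.where(xbar > 0, n * xbar**2, 0.0)            # LRT with theta >= 0

crit = chi2.ppf(0.95, df=1)                           # nominal 5% cutoff
print(f"actual size at nominal 5%: {np.mean(lrt > crit):.4f}")  # approx 0.025
print(f"P(LRT = 0) = {np.mean(lrt == 0):.3f}")                  # approx 0.5
```

    The nominal test is badly calibrated (here conservative by a factor of two); in more complex settings, such as testing for a spectral line, the true null distribution may have no simple form at all, which motivates the posterior predictive approach discussed above.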

  14. "Clinical" Significance: "Clinical" Significance and "Practical" Significance are NOT the Same Things

    ERIC Educational Resources Information Center

    Peterson, Lisa S.

    2008-01-01

    Clinical significance is an important concept in research, particularly in education and the social sciences. The present article first compares clinical significance to other measures of "significance" in statistics. The major methods used to determine clinical significance are explained and the strengths and weaknesses of clinical significance…

  15. The distribution of P-values in medical research articles suggested selective reporting associated with statistical significance.

    PubMed

    Perneger, Thomas V; Combescure, Christophe

    2017-07-01

    Published P-values provide a window into the global enterprise of medical research. The aim of this study was to use the distribution of published P-values to estimate the relative frequencies of null and alternative hypotheses and to seek irregularities suggestive of publication bias. This cross-sectional study included P-values published in 120 medical research articles in 2016 (30 each from the BMJ, JAMA, Lancet, and New England Journal of Medicine). The observed distribution of P-values was compared with expected distributions under the null hypothesis (i.e., uniform between 0 and 1) and the alternative hypothesis (strictly decreasing from 0 to 1). P-values were categorized according to conventional levels of statistical significance and in one-percent intervals. Among 4,158 recorded P-values, 26.1% were highly significant (P < 0.001), 9.1% were moderately significant (P ≥ 0.001 to < 0.01), 11.7% were weakly significant (P ≥ 0.01 to < 0.05), and 53.2% were nonsignificant (P ≥ 0.05). We noted three irregularities: (1) high proportion of P-values <0.001, especially in observational studies, (2) excess of P-values equal to 1, and (3) about twice as many P-values less than 0.05 compared with those more than 0.05. The latter finding was seen in both randomized trials and observational studies, and in most types of analyses, excepting heterogeneity tests and interaction tests. Under plausible assumptions, we estimate that about half of the tested hypotheses were null and the other half were alternative. This analysis suggests that statistical tests published in medical journals are not a random sample of null and alternative hypotheses but that selective reporting is prevalent. In particular, significant results are about twice as likely to be reported as nonsignificant results. Copyright © 2017 Elsevier Inc. All rights reserved.

  16. The thresholds for statistical and clinical significance – a five-step procedure for evaluation of intervention effects in randomised clinical trials

    PubMed Central

    2014-01-01

    Background Thresholds for statistical significance are insufficiently demonstrated by 95% confidence intervals or P-values when assessing results from randomised clinical trials. First, a P-value only shows the probability of getting a result assuming that the null hypothesis is true and does not reflect the probability of getting a result assuming an alternative hypothesis to the null hypothesis is true. Second, a confidence interval or a P-value showing significance may be caused by multiplicity. Third, statistical significance does not necessarily result in clinical significance. Therefore, assessment of intervention effects in randomised clinical trials deserves more rigour in order to become more valid. Methods Several methodologies for assessing the statistical and clinical significance of intervention effects in randomised clinical trials were considered. Balancing simplicity and comprehensiveness, a simple five-step procedure was developed. Results For a more valid assessment of results from a randomised clinical trial we propose the following five-steps: (1) report the confidence intervals and the exact P-values; (2) report Bayes factor for the primary outcome, being the ratio of the probability that a given trial result is compatible with a ‘null’ effect (corresponding to the P-value) divided by the probability that the trial result is compatible with the intervention effect hypothesised in the sample size calculation; (3) adjust the confidence intervals and the statistical significance threshold if the trial is stopped early or if interim analyses have been conducted; (4) adjust the confidence intervals and the P-values for multiplicity due to number of outcome comparisons; and (5) assess clinical significance of the trial results. Conclusions If the proposed five-step procedure is followed, this may increase the validity of assessments of intervention effects in randomised clinical trials. PMID:24588900

  17. Detection of Significant Pneumococcal Meningitis Biomarkers by Ego Network.

    PubMed

    Wang, Qian; Lou, Zhifeng; Zhai, Liansuo; Zhao, Haibin

    2017-06-01

    The aim of this study was to identify significant biomarkers for the detection of pneumococcal meningitis based on ego networks. Based on gene expression data for pneumococcal meningitis and global protein-protein interaction (PPI) data recruited from open access databases, the authors constructed a differential co-expression network (DCN) to identify pneumococcal meningitis biomarkers in a network view. Here, the EgoNet algorithm was employed to screen the significant ego networks that could accurately distinguish pneumococcal meningitis from healthy controls, by sequentially seeking ego genes, searching for candidate ego networks, refining the candidate ego networks, and performing significance analysis to identify ego networks. Finally, functional inference of the ego networks was performed to identify significant pathways for pneumococcal meningitis. By differential co-expression analysis, the authors constructed a DCN that covered 1809 genes and 3689 interactions. From the DCN, a total of 90 ego genes were identified. Starting from these ego genes, three significant ego networks (Module 19, Module 70 and Module 71) that could predict clinical outcomes for pneumococcal meningitis were identified by the EgoNet algorithm; the corresponding ego genes were GMNN, MAD2L1 and TPX2, respectively. Pathway analysis showed that these three ego networks were related to CDT1 association with the CDC6:ORC:origin complex, inactivation of APC/C via direct inhibition of the APC/C complex pathway, and DNA strand elongation, respectively. The authors successfully screened three significant ego modules which could accurately predict the clinical outcomes for pneumococcal meningitis and might play important roles in host response to pathogen infection in pneumococcal meningitis.

  18. A novel tactile-guided detection and three-dimensional localization of clinically significant breast masses.

    PubMed

    Mojra, A; Najarian, S; Kashani, S M Towliat; Panahi, F

    2012-01-01

    This paper presents a novel robotic sensory system 'Robo-Tac-BMI', which manipulates an indentation probe for the detection and three-dimensional localization of an abnormal mass embedded in the breast tissue. The Robo-Tac-BMI is designed based on artificial tactile sensing technology which is a new non-invasive method for mimicking the surgeon's palpation quantitatively. The intelligent processor of the device provides an overall stiffness map of the scanned areas. The extracted stiffness parameters provide a decisive factor for certifying the mass existence. Results are validated by 'gold standard' tests. Following the mass detection, its 3D localization is of essential importance in the treatment procedures. The planar 2D coordinate is readily available for all points on the tissue surface. Mass depth estimation is achieved by a comprehensive model utilizing the logistic regression algorithm and a Receiver Operating Characteristic (ROC) Curve for the highest accuracy. Statistical analysis is performed over 27 cases with 346 scanned areas. Copyright © 2012 Informa UK, Ltd.

  19. A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation

    PubMed Central

    Eddy, Sean R.

    2008-01-01

    Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (λ) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty (“Forward” scores), but the expected distribution of Forward scores remains unknown. Here, I conjecture that both expected score distributions have simple, predictable forms when full probabilistic modeling methods are used. For a probabilistic model of local sequence alignment, optimal alignment bit scores (“Viterbi” scores) are Gumbel-distributed with constant λ = log 2, and the high scoring tail of Forward scores is exponential with the same constant λ. Simulation studies support these conjectures over a wide range of profile/sequence comparisons, using 9,318 profile-hidden Markov models from the Pfam database. This enables efficient and accurate determination of expectation values (E-values) for both Viterbi and Forward scores for probabilistic local alignments. PMID:18516236
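    The practical consequence of the conjecture is that an E-value needs only the location parameter mu (which can be fit once, cheaply) and the search size N, since lambda is fixed at log 2 for bit scores. The sketch below illustrates that calculation; the mu value is a made-up placeholder, not a Pfam-calibrated parameter.

```python
# E-value from a Gumbel tail with the conjectured constant lambda = log 2.
import math

LAMBDA = math.log(2.0)          # conjectured constant for bit scores

def e_value(bit_score, mu, n_db):
    """Expected number of hits >= bit_score in n_db random comparisons.

    Gumbel tail: P(S >= x) = 1 - exp(-exp(-lambda*(x - mu))), which for
    scores well above mu is approximately exp(-lambda*(x - mu)).
    """
    p = 1.0 - math.exp(-math.exp(-LAMBDA * (bit_score - mu)))
    return n_db * p

print(f"E = {e_value(bit_score=35.0, mu=-8.0, n_db=20_000):.2e}")
```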

  20. Defect detection of castings in radiography images using a robust statistical feature.

    PubMed

    Zhao, Xinyue; He, Zaixing; Zhang, Shuyou

    2014-01-01

    One of the most commonly used optical methods for defect detection is radiographic inspection. Compared with methods that extract defects directly from the radiography image, model-based methods deal well with objects of complex structure. However, detection of small, low-contrast defects in nonuniformly illuminated images is still a major challenge for them. In this paper, we present a new method based on the grayscale arranging pairs (GAP) feature to detect casting defects in radiography images automatically. First, a model is built using pixel pairs with a stable intensity relationship, based on the GAP feature, from previously acquired images. Second, defects can be extracted by statistically comparing the difference of intensity-difference signs between the input image and the model. The robustness of the proposed method to noise and illumination variations has been verified on casting radioscopic images with defects. The experimental results showed that the average computation time of the proposed method in the testing stage is 28 ms per image on a computer with a Pentium Core 2 Duo 3.00 GHz processor. For the comparison, we also evaluated the performance of the proposed method as well as that of the mixture-of-Gaussian-based and crossing line profile methods. The proposed method achieved 2.7% and 2.0% false negative rates in the noise and illumination variation experiments, respectively.
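    A GAP-style detector can be sketched compactly. The version below is a simplification of the paper's description, with assumed pair sampling, stability threshold, and voting scheme; it only illustrates the idea of stable intensity-difference signs.

```python
# Illustrative GAP-style sketch on synthetic images, not the paper's code:
# keep pixel pairs whose intensity-difference sign is stable over defect-free
# training images, then flag test pixels whose sign relations flip.
import numpy as np

rng = np.random.default_rng(6)

def train_gap_model(train_stack, n_pairs=5000, stability=0.98):
    h, w = train_stack.shape[1:]
    flat = train_stack.reshape(train_stack.shape[0], -1)
    pairs = rng.integers(0, h * w, size=(n_pairs, 2))
    signs = np.sign(flat[:, pairs[:, 0]] - flat[:, pairs[:, 1]])
    majority = np.sign(signs.sum(axis=0))
    agree = (signs == majority).mean(axis=0)       # stability of each pair
    keep = (agree >= stability) & (majority != 0)
    return pairs[keep], majority[keep]

def detect_defects(image, pairs, expected_sign):
    flat = image.ravel()
    flipped = np.sign(flat[pairs[:, 0]] - flat[pairs[:, 1]]) != expected_sign
    votes = np.zeros(flat.size)
    np.add.at(votes, pairs[flipped].ravel(), 1)    # vote for both endpoints
    return votes.reshape(image.shape)

train = rng.normal(100, 2, (20, 32, 32)) + np.linspace(0, 30, 32)  # structure
model_pairs, model_signs = train_gap_model(train)
test = train[0].copy()
test[10:14, 10:14] -= 25                                           # defect
print("max defect votes:", detect_defects(test, model_pairs, model_signs).max())
```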

  1. Integration of biomimicry and nanotechnology for significantly improved detection of circulating tumor cells (CTCs).

    PubMed

    Myung, Ja Hye; Park, Sin-Jung; Wang, Andrew Z; Hong, Seungpyo

    2017-12-13

    Circulating tumor cells (CTCs) have received a great deal of scientific and clinical attention as a biomarker for diagnosis and prognosis of many types of cancer. Given their potential significance in the clinic, a variety of detection methods, utilizing recent advances in nanotechnology and microfluidics, have been introduced in an effort to achieve clinically significant detection of CTCs. However, effective detection and isolation of CTCs remain a tremendous challenge due to their extreme rarity and phenotypic heterogeneity. Among the many approaches currently under development, this review paper focuses on a unique, promising approach that takes advantage of naturally occurring processes, realized through application of nanotechnology, to achieve significant improvement in the sensitivity and specificity of CTC capture. We provide an overview of the successful outcomes of this biomimetic CTC capture system in the detection of tumor cells from in vitro, in vivo, and clinical pilot studies. We also emphasize the clinical impact of CTCs as biomarkers in cancer diagnosis and predictive prognosis, providing a cost-effective, minimally invasive method that potentially replaces or supplements existing methods such as imaging technologies and solid tissue biopsy. In addition, we discuss the potential prognostic value of CTCs in guiding treatment and, ultimately, in realizing personalized therapy. Copyright © 2017. Published by Elsevier B.V.

  2. Using Statistical Process Control for detecting anomalies in multivariate spatiotemporal Earth Observations

    NASA Astrophysics Data System (ADS)

    Flach, Milan; Mahecha, Miguel; Gans, Fabian; Rodner, Erik; Bodesheim, Paul; Guanche-Garcia, Yanira; Brenning, Alexander; Denzler, Joachim; Reichstein, Markus

    2016-04-01

    The number of available Earth observations (EOs) is currently increasing substantially. Detecting anomalous patterns in these multivariate time series is an important step in identifying changes in the underlying dynamical system. Likewise, data quality issues might result in anomalous multivariate data constellations and have to be identified before corrupting subsequent analyses. In industrial applications, a common strategy is to monitor production chains with several sensors coupled to some statistical process control (SPC) algorithm. The basic idea is to raise an alarm when the sensor data depict some anomalous pattern according to the SPC, i.e. the production chain is considered 'out of control'. In fact, such industrial applications are conceptually similar to the on-line monitoring of EOs. However, algorithms used in the context of SPC or process monitoring are rarely considered for supervising multivariate spatio-temporal Earth observations. The objective of this study is to exploit the potential and transferability of SPC concepts to Earth system applications. We compare a range of different algorithms typically applied by SPC systems and evaluate their capability to detect, for example, known extreme events in land surface processes. Specifically, two main issues are addressed: (1) identifying the most suitable combination of data pre-processing and detection algorithm for a specific type of event, and (2) analyzing the limits of the individual approaches with respect to the magnitude and spatio-temporal size of the event as well as the data's signal-to-noise ratio. Extensive artificial data sets that represent the typical properties of Earth observations are used in this study. Our results show that the majority of the algorithms used can be considered for the detection of multivariate spatiotemporal events and directly transferred to real Earth observation data as currently assembled in different projects at the European scale, e.g. http://baci-h2020.eu
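    A classic SPC algorithm of the kind compared in such studies is the Hotelling T^2 control chart. The sketch below is my own toy example, not the study's code: it flags time steps whose multivariate observation lies unusually far from the in-control mean, with the control limit taken from the chi-square approximation.

```python
# Minimal multivariate SPC sketch: Hotelling T^2 chart on synthetic data.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)
p, t = 4, 500
baseline = rng.multivariate_normal(np.zeros(p), np.eye(p), size=300)
mu, cov_inv = baseline.mean(axis=0), np.linalg.inv(np.cov(baseline.T))

series = rng.multivariate_normal(np.zeros(p), np.eye(p), size=t)
series[350:360] += 3.0                          # injected anomalous event

d = series - mu
t2 = np.einsum("ij,jk,ik->i", d, cov_inv, d)    # Hotelling T^2 per time step
limit = chi2.ppf(0.999, df=p)                   # 'out of control' threshold
print("flagged time steps:", np.flatnonzero(t2 > limit))
```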

  3. Conducting tests for statistically significant differences using forest inventory data

    Treesearch

    James A. Westfall; Scott A. Pugh; John W. Coulston

    2013-01-01

    Many forest inventory and monitoring programs are based on a sample of ground plots from which estimates of forest resources are derived. In addition to evaluating metrics such as number of trees or amount of cubic wood volume, it is often desirable to make comparisons between resource attributes. To properly conduct statistical tests for differences, it is imperative...

  4. Detailed noise statistics for an optically preamplified direct detection receiver

    NASA Astrophysics Data System (ADS)

    Danielsen, Soeren Lykke; Mikkelsen, Benny; Durhuus, Terji; Joergensen, Carsten; Stubkjaer, Kristian E.

    We describe the exact statistics of an optically preamplified direct detection receiver by means of the moment generating function (MGF). The theory allows an arbitrarily shaped electrical filter in the receiver circuit. The MGF allows a precise calculation of the error rate by using the inverse fast Fourier transform (FFT). The exact results are compared with the usual Gaussian approximation (GA), the saddlepoint approximation (SAP) and the modified Chernoff bound (MCB). This comparison shows that the noise is not Gaussian distributed for all values of the optical amplifier gain. In the region from 20-30 dB gain, calculations show that the GA underestimates the receiver sensitivity while the SAP is very close to the results of our exact model. Using the MGF derived in this article, we then find the optimal bandwidth of the electrical filter in the receiver circuit and calculate the sensitivity degradation due to intersymbol interference (ISI).

  5. Statistical modeling, detection, and segmentation of stains in digitized fabric images

    NASA Astrophysics Data System (ADS)

    Gururajan, Arunkumar; Sari-Sarraf, Hamed; Hequet, Eric F.

    2007-02-01

    This paper describes a novel automated system, based on a computer vision approach, for objective evaluation of stain release on cotton fabrics. Digitized color images of the stained fabrics are obtained, and the pixel values in the color and intensity planes of these images are probabilistically modeled as a Gaussian mixture model (GMM). Stain detection is posed as a decision-theoretic problem, where the null hypothesis corresponds to the absence of a stain. The null hypothesis and the alternative hypothesis mathematically translate into a first-order GMM and a second-order GMM, respectively. The parameters of the GMM are estimated using a modified Expectation-Maximization (EM) algorithm. Minimum Description Length (MDL) is then used as the test statistic to decide the verity of the null hypothesis. The stain is then segmented by a decision rule based on the probability map generated by the EM algorithm. The proposed approach was tested on a dataset of 48 fabric images soiled with stains of ketchup, corn oil, mustard, Ragu sauce, Revlon makeup and grape juice. The decision-theoretic part of the algorithm produced a correct detection (true positive) rate of 93% and a false alarm rate of 5% on this set of images.
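    The detection step, choosing between a one- and a two-component mixture, can be sketched with standard tools. BIC is used below as a stand-in for the paper's MDL criterion (the two are closely related), and the data, the EM variant (here sklearn's standard fit), and the component-to-stain assignment are assumptions of this illustration.

```python
# Hedged sketch: fit 1- and 2-component GMMs to pixel values and select the
# order by an information criterion; a winning second component suggests a
# stain, and its responsibilities give a probability map for segmentation.
import numpy as np
from sklearn.mixture import GaussianMixture

def stain_present(pixels, seed=0):
    """pixels: (n_pixels, n_channels) array of colour/intensity values."""
    gmm1 = GaussianMixture(n_components=1, random_state=seed).fit(pixels)
    gmm2 = GaussianMixture(n_components=2, random_state=seed).fit(pixels)
    if gmm2.bic(pixels) >= gmm1.bic(pixels):   # H0 (no stain) wins
        return False, None
    resp = gmm2.predict_proba(pixels)
    stain_comp = np.argmin(gmm2.weights_)      # assume stain = minority class
    return True, resp[:, stain_comp]

rng = np.random.default_rng(8)
clean = rng.normal(200, 5, (5000, 3))          # unstained fabric pixels
stain = rng.normal(150, 5, (500, 3))           # stained pixels
present, prob_map = stain_present(np.vstack([clean, stain]))
print("stain detected:", present)
```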

  6. Statistical comparison of pooled nitrogen washout data of various altitude decompression response groups

    NASA Technical Reports Server (NTRS)

    Edwards, B. F.; Waligora, J. M.; Horrigan, D. J., Jr.

    1985-01-01

    This analysis was done to determine whether various decompression response groups could be characterized by the pooled nitrogen (N2) washout profiles of the group members. Pooling individual washout profiles provided a smooth, time-dependent function of means representative of each decompression response group. No statistically significant differences were detected. The statistical comparisons of the profiles were performed by means of univariate weighted t-tests at each 5-minute profile point, with levels of significance of 5 and 10 percent. The estimated powers of the tests (i.e., the probabilities of detecting the observed differences in the pooled profiles) were of the order of 8 to 30 percent.

  7. Significance of MPEG-7 textural features for improved mass detection in mammography.

    PubMed

    Eltonsy, Nevine H; Tourassi, Georgia D; Fadeev, Aleksey; Elmaghraby, Adel S

    2006-01-01

    The purpose of the study is to investigate the significance of MPEG-7 textural features for improving the detection of masses in screening mammograms. The detection scheme was originally based on morphological directional neighborhood features extracted from mammographic regions of interest (ROIs). Receiver operating characteristic (ROC) analysis was performed to evaluate the performance of each set of features independently and merged into a back-propagation artificial neural network (BPANN) using the leave-one-out sampling scheme (LOOSS). The study was based on a database of 668 mammographic ROIs (340 depicting cancer regions and 328 depicting normal parenchyma). Overall, the ROC area index of the BPANN using the directional morphological features was Az=0.85+/-0.01. The MPEG-7 edge histogram descriptor-based BPANN showed an ROC area index of Az=0.71+/-0.01, while homogeneous textural descriptors using 30 and 120 channels helped the BPANN achieve similar ROC area indexes of Az=0.882+/-0.02 and Az=0.877+/-0.01, respectively. After merging the MPEG-7 homogeneous textural features with the directional neighborhood features, the performance of the BPANN increased, providing an ROC area index of Az=0.91+/-0.01. The MPEG-7 homogeneous textural descriptor significantly improved the morphology-based detection scheme.

  8. The effect of a graphical interpretation of a statistic trend indicator (Trigg's Tracking Variable) on the detection of simulated changes.

    PubMed

    Kennedy, R R; Merry, A F

    2011-09-01

    Anaesthesia involves processing large amounts of information over time. One task of the anaesthetist is to detect substantive changes in physiological variables promptly and reliably. It has previously been demonstrated that a graphical trend display of historical data leads to more rapid detection of such changes. We examined the effect of a graphical indication of the magnitude of Trigg's Tracking Variable, a simple statistically based trend detection algorithm, on the accuracy and latency of the detection of changes in a micro-simulation. Ten anaesthetists each viewed 20 simulations with four variables displayed as the current value with a simple graphical trend display. Values for these variables were generated by a computer model and updated every second; after a period of stability, a change occurred to a new random value at least 10 units from baseline. In 50% of the simulations an indication of the rate of change was given by a five-level graphical representation of the value of Trigg's Tracking Variable. Participants were asked to indicate when they thought a change was occurring. Changes were detected 10.9% faster with the trend indicator present (mean 13.1 [SD 3.1] cycles vs 14.6 [SD 3.4] cycles; 95% confidence interval 0.4 to 2.5 cycles; P = 0.013). There was no difference in accuracy of detection (median with trend detection 97% [interquartile range 95 to 100%], without trend detection 100% [98 to 100%]; P = 0.8). We conclude that simple statistical trend detection may speed detection of changes during routine anaesthesia, even when a graphical trend display is present.
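    Trigg's Tracking Variable has a standard formulation: the ratio of the exponentially smoothed (signed) forecast error to the smoothed mean absolute error, which approaches +/-1 during a sustained change. The sketch below uses that standard form with simple exponential smoothing as the forecaster; the smoothing constant and the step-change example are assumptions, and the study's five-level display binning is not reproduced.

```python
# Sketch of Trigg's Tracking Variable on a simulated step change in heart rate.
import numpy as np

def triggs_tracking_variable(y, alpha=0.2):
    level, e_s, mad = y[0], 0.0, 1e-9
    ttv = np.zeros_like(y, dtype=float)
    for t, obs in enumerate(y):
        err = obs - level                        # one-step forecast error
        e_s = alpha * err + (1 - alpha) * e_s    # smoothed (signed) error
        mad = alpha * abs(err) + (1 - alpha) * mad  # smoothed absolute error
        ttv[t] = e_s / mad                       # bounded in [-1, 1]
        level = level + alpha * err              # simple exponential smoothing
    return ttv

hr = np.concatenate([np.full(60, 70.0), np.full(40, 85.0)])  # step change
hr += np.random.default_rng(9).normal(0, 1.5, hr.size)
print("max |TTV| after the change:",
      np.abs(triggs_tracking_variable(hr)[60:]).max())
```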

  9. A Hybrid Swarm Intelligence Algorithm for Intrusion Detection Using Significant Features.

    PubMed

    Amudha, P; Karthik, S; Sivakumari, S

    2015-01-01

    Intrusion detection has become a main part of network security due to the huge number of attacks that affect computers. This is due to the extensive growth of internet connectivity and accessibility to information systems worldwide. To deal with this problem, this paper proposes a hybrid algorithm that integrates a Modified Artificial Bee Colony (MABC) with Enhanced Particle Swarm Optimization (EPSO) for the intrusion detection problem. The algorithms are combined to obtain better optimization results, and the classification accuracies are obtained by the 10-fold cross-validation method. The purpose of this paper is to select the most relevant features that can represent the pattern of the network traffic and to test their effect on the success of the proposed hybrid classification algorithm. To investigate the performance of the proposed method, the KDDCup'99 intrusion detection benchmark dataset from the UCI Machine Learning Repository is used. The performance of the proposed method is compared with other machine learning algorithms and found to be significantly different.

  10. A Hybrid Swarm Intelligence Algorithm for Intrusion Detection Using Significant Features

    PubMed Central

    Amudha, P.; Karthik, S.; Sivakumari, S.

    2015-01-01

    Intrusion detection has become a main part of network security due to the huge number of attacks that affect computers. This is due to the extensive growth of internet connectivity and accessibility to information systems worldwide. To deal with this problem, this paper proposes a hybrid algorithm that integrates a Modified Artificial Bee Colony (MABC) with Enhanced Particle Swarm Optimization (EPSO) for the intrusion detection problem. The algorithms are combined to obtain better optimization results, and the classification accuracies are obtained by the 10-fold cross-validation method. The purpose of this paper is to select the most relevant features that can represent the pattern of the network traffic and to test their effect on the success of the proposed hybrid classification algorithm. To investigate the performance of the proposed method, the KDDCup'99 intrusion detection benchmark dataset from the UCI Machine Learning Repository is used. The performance of the proposed method is compared with other machine learning algorithms and found to be significantly different. PMID:26221625
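    A minimal sketch of the evaluation step shared by both records above: scoring a candidate feature subset by 10-fold cross-validated accuracy. Synthetic data and a plain decision tree stand in for the KDDCup'99 dataset and the MABC/EPSO search, which are not reproduced here:

```python
# Fitness evaluation for swarm-style feature selection (illustrative sketch).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=30, n_informative=8, random_state=0)

def fitness(mask):
    """10-fold CV accuracy of a classifier restricted to the selected features."""
    if not mask.any():
        return 0.0
    scores = cross_val_score(DecisionTreeClassifier(random_state=0), X[:, mask], y, cv=10)
    return scores.mean()

rng = np.random.default_rng(0)
mask = rng.random(30) < 0.5  # a candidate subset a swarm member might propose
print(f"{mask.sum()} features -> CV accuracy {fitness(mask):.3f}")
```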

  11. Detection of significant variation in acoustic output of an electromagnetic lithotriptor.

    PubMed

    Pishchalnikov, Yuri A; McAteer, James A; Vonderhaar, R Jason; Pishchalnikova, Irina V; Williams, James C; Evan, Andrew P

    2006-11-01

    We describe the observation of significant instability in the output of an electromagnetic lithotriptor. This instability took a form that was not detected by routine assessment, but was observed only by collecting many consecutive shock waves in a nonstop regimen. A Dornier DoLi-50 lithotriptor used exclusively for basic research was tested and approved by the regional technician. This assessment included hydrophone measures at select power levels with the collection of about 25 shock waves per setting. Subsequent laboratory characterization used a fiberoptic hydrophone and a storage oscilloscope for data acquisition. Waveforms were collected nonstop for hundreds of pulses. Output was typically stable for greater than 1,000 shock waves, but substantial fluctuations in acoustic pressures were also observed. For example, output at power level 3 (mean peak positive acoustic pressure ± SD, normally 44 ± 2 MPa) increased dramatically to greater than 50 MPa or decreased to approximately 30 MPa for hundreds of shock waves. The cause of instability was eventually traced to a faulty lithotriptor power supply. Instability in lithotriptor acoustic output can occur, and it may not be detected by routine assessment. Collecting waveforms in a nonstop regimen dramatically increases sampling size, improving the detection of instability. Had the instability that we observed occurred during patient treatment, the energy delivered might well have exceeded the planned dose. Since the potential for adverse effects in lithotripsy increases as the dose is increased, it would be valuable to develop ways to better monitor the acoustic output of lithotriptors.

  12. Statistical significance of seasonal warming/cooling trends

    NASA Astrophysics Data System (ADS)

    Ludescher, Josef; Bunde, Armin; Schellnhuber, Hans Joachim

    2017-04-01

    The question whether a seasonal climate trend (e.g., the increase of summer temperatures in Antarctica in the last decades) is of anthropogenic or natural origin is of great importance for mitigation and adaptation measures alike. The conventional significance analysis assumes that (i) the seasonal climate trends can be quantified by linear regression, (ii) the different seasonal records can be treated as independent records, and (iii) the persistence in each of these seasonal records can be characterized by short-term memory described by an autoregressive process of first order. Here we show that assumption (ii) is not valid, owing to strong intraannual correlations between the seasonal records. We also show that, even in the absence of correlations, for Gaussian white noise, the conventional analysis leads to a strong overestimation of the significance of the seasonal trends, because multiple testing has not been taken into account. In addition, when the data exhibit long-term memory (which is the case in most climate records), assumption (iii) leads to a further overestimation of the trend significance. Combining Monte Carlo simulations with the Holm-Bonferroni method, we demonstrate how to obtain reliable estimates of the significance of the seasonal climate trends in long-term correlated records. For an illustration, we apply our method to representative temperature records from West Antarctica, which is one of the fastest-warming places on Earth and belongs to the crucial tipping elements in the Earth system.
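    A minimal sketch of the Holm-Bonferroni step-down procedure referenced above, applied to four hypothetical seasonal p values (the values themselves are illustrative):

```python
# Holm-Bonferroni: control the family-wise error rate across m joint tests.
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p values
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):  # threshold alpha/m, alpha/(m-1), ...
            reject[i] = True
        else:
            break  # step-down: stop at the first non-rejection
    return reject

seasonal_p = {"DJF": 0.012, "MAM": 0.240, "JJA": 0.009, "SON": 0.048}  # illustrative
print(dict(zip(seasonal_p, holm_bonferroni(list(seasonal_p.values())))))
# Only JJA and DJF survive; SON (p = 0.048) would pass an uncorrected 5% test.
```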

  13. A Statistical Analysis of Brain Morphology Using Wild Bootstrapping

    PubMed Central

    Ibrahim, Joseph G.; Tang, Niansheng; Rowe, Daniel B.; Hao, Xuejun; Bansal, Ravi; Peterson, Bradley S.

    2008-01-01

    Methods for the analysis of brain morphology, including voxel-based and surface-based morphometry, have been used to detect associations between brain structure and covariates of interest, such as diagnosis, severity of disease, age, IQ, and genotype. The statistical analysis of morphometric measures usually involves two statistical procedures: 1) invoking a statistical model at each voxel (or point) on the surface of the brain or brain subregion, followed by mapping test statistics (e.g., t test) or their associated p values at each of those voxels; 2) correction for the multiple statistical tests conducted across all voxels on the surface of the brain region under investigation. We propose the use of new statistical methods for each of these procedures. We first use a heteroscedastic linear model to test the associations between the morphological measures at each voxel on the surface of the specified subregion (e.g., cortical or subcortical surfaces) and the covariates of interest. Moreover, we develop a robust test procedure that is based on a resampling method called wild bootstrapping. This procedure assesses the statistical significance of the associations between a measure of given brain structure and the covariates of interest. The value of this robust test procedure lies in its computational simplicity and in its applicability to a wide range of imaging data, including data from both anatomical and functional magnetic resonance imaging (fMRI). Simulation studies demonstrate that this robust test procedure can accurately control the family-wise error rate. We demonstrate the application of this robust test procedure to the detection of statistically significant differences in the morphology of the hippocampus over time across gender groups in a large sample of healthy subjects. PMID:17649909
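    A minimal sketch of a wild bootstrap for a single regression slope under heteroscedastic noise: a one-voxel toy version, not the authors' pipeline. The null is imposed by resampling intercept-only residuals with Rademacher sign multipliers:

```python
# Wild bootstrap of a regression slope, robust to heteroscedastic errors.
import numpy as np

rng = np.random.default_rng(0)
n = 80
x = rng.normal(size=n)                                   # covariate, e.g. age
y = 0.4 * x + rng.normal(size=n) * np.sqrt(1 + x**2)     # variance depends on x

X = np.column_stack([np.ones(n), x])
slope_obs = np.linalg.lstsq(X, y, rcond=None)[0][1]

resid0 = y - y.mean()  # residuals under the null model (slope = 0)
boot = []
for _ in range(2000):
    v = rng.choice([-1.0, 1.0], size=n)        # Rademacher multipliers keep
    y_star = y.mean() + resid0 * v             # each point's own error scale
    boot.append(np.linalg.lstsq(X, y_star, rcond=None)[0][1])

p = np.mean(np.abs(np.array(boot)) >= abs(slope_obs))
print(f"slope = {slope_obs:.3f}, wild-bootstrap p = {p:.4f}")
```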

  14. Inverse statistics and information content

    NASA Astrophysics Data System (ADS)

    Ebadi, H.; Bolgorian, Meysam; Jafari, G. R.

    2010-12-01

    Inverse statistics analysis studies the distribution of investment horizons needed to achieve a predefined level of return. This distribution exhibits a maximum, which determines the most likely horizon for gaining a specific return. There is a significant difference between the inverse statistics of financial market data and those of a fractional Brownian motion (fBm) as an uncorrelated time series, which makes this analysis a suitable criterion for measuring the information content in financial data. In this paper we perform this analysis for the DJIA and S&P 500 as two developed markets and the Tehran price index (TEPIX) as an emerging market. We also compare these probability distributions with the fBm probability distribution, to detect when the behavior of the stocks is the same as that of an fBm.
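    A minimal sketch of the inverse-statistics computation: for each starting day, record the first waiting time at which the cumulative log-return reaches a target level rho (the synthetic price path and the value of rho are illustrative):

```python
# Distribution of investment horizons for a fixed target return rho.
import numpy as np

def investment_horizons(log_price, rho=0.02):
    horizons = []
    for start in range(len(log_price) - 1):
        gain = log_price[start + 1:] - log_price[start]  # cumulative log-return
        hit = np.nonzero(gain >= rho)[0]
        if hit.size:
            horizons.append(hit[0] + 1)  # days until the target is first reached
    return np.array(horizons)

rng = np.random.default_rng(0)
log_price = np.cumsum(rng.normal(0.0003, 0.01, 5000))  # synthetic price path
tau = investment_horizons(log_price)
print("most likely horizon (mode of the distribution):", np.bincount(tau).argmax(), "days")
```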

  15. Statistics for characterizing data on the periphery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Theiler, James P; Hush, Donald R

    2010-01-01

    We introduce a class of statistics for characterizing the periphery of a distribution, and show that these statistics are particularly valuable for problems in target detection. Because so many detection algorithms are rooted in Gaussian statistics, we concentrate on ellipsoidal models of high-dimensional data distributions (that is to say: covariance matrices), but we recommend several alternatives to the sample covariance matrix that more efficiently model the periphery of a distribution, and can more effectively detect anomalous data samples.
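    A minimal sketch of the ellipsoidal (covariance-based) scoring that this line of work builds on: samples far from the fitted ellipsoid, as measured by squared Mahalanobis distance, lie on the periphery of the distribution (the background data and the two test points are illustrative):

```python
# Covariance-based anomaly scoring via squared Mahalanobis distance.
import numpy as np

rng = np.random.default_rng(0)
background = rng.multivariate_normal([0, 0], [[2.0, 0.8], [0.8, 1.0]], size=1000)

mu = background.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(background, rowvar=False))  # sample covariance model

def mahalanobis_sq(x):
    d = x - mu
    return np.einsum("...i,ij,...j->...", d, cov_inv, d)

targets = np.array([[0.5, 0.2],    # nominal sample, small distance
                    [6.0, -4.0]])  # peripheral/anomalous sample, large distance
print(np.round(mahalanobis_sq(targets), 1))
```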

  16. Advances in statistics

    Treesearch

    Howard Stauffer; Nadav Nur

    2005-01-01

    The papers included in the Advances in Statistics section of the Partners in Flight (PIF) 2002 Proceedings represent a small sample of statistical topics of current importance to Partners In Flight research scientists: hierarchical modeling, estimation of detection probabilities, and Bayesian applications. Sauer et al. (this volume) examines a hierarchical model...

  17. Evaluation of statistical treatments of left-censored environmental data using coincident uncensored data sets: I. Summary statistics

    USGS Publications Warehouse

    Antweiler, Ronald C.; Taylor, Howard E.

    2008-01-01

    The main classes of statistical treatment of below-detection limit (left-censored) environmental data for the determination of basic statistics that have been used in the literature are substitution methods, maximum likelihood, regression on order statistics (ROS), and nonparametric techniques. These treatments, along with using all instrument-generated data (even those below detection), were evaluated by examining data sets in which the true values of the censored data were known. It was found that for data sets with less than 70% censored data, the best technique overall for determination of summary statistics was the nonparametric Kaplan-Meier technique. ROS and the two substitution methods of assigning one-half the detection limit value to censored data or assigning a random number between zero and the detection limit to censored data were adequate alternatives. The use of these two substitution methods, however, requires a thorough understanding of how the laboratory censored the data. The technique of employing all instrument-generated data - including numbers below the detection limit - was found to be less adequate than the above techniques. At high degrees of censoring (greater than 70% censored data), no technique provided good estimates of summary statistics. Maximum likelihood techniques were found to be far inferior to all other treatments except the substitution of zero or of the detection limit value for censored data.
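    A minimal sketch contrasting two of the substitution treatments discussed above on a synthetic left-censored sample with a known truth (the lognormal data and the censoring level are assumptions, not the paper's data sets):

```python
# Substitution estimates of the mean for left-censored data with known truth.
import numpy as np

rng = np.random.default_rng(0)
true_values = rng.lognormal(mean=0.0, sigma=1.0, size=200)
dl = np.quantile(true_values, 0.4)       # detection limit giving ~40% censoring
censored = true_values < dl

half_dl = np.where(censored, dl / 2, true_values)                       # DL/2 rule
random_sub = np.where(censored, rng.uniform(0, dl, true_values.size),   # random in (0, DL)
                      true_values)

print(f"true mean            {true_values.mean():.3f}")
print(f"DL/2 substitution    {half_dl.mean():.3f}")
print(f"random substitution  {random_sub.mean():.3f}")
```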

  18. Experimental Mathematics and Computational Statistics

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bailey, David H.; Borwein, Jonathan M.

    2009-04-30

    The field of statistics has long been noted for techniques to detect patterns and regularities in numerical data. In this article we explore connections between statistics and the emerging field of 'experimental mathematics'. These include applications of experimental mathematics in statistics, as well as statistical methods applied to computational mathematics.

  19. Intraoperative detection of 18F-FDG-avid tissue sites using the increased probe counting efficiency of the K-alpha probe design and variance-based statistical analysis with the three-sigma criteria

    PubMed Central

    2013-01-01

    Background Intraoperative detection of 18F-FDG-avid tissue sites during 18F-FDG-directed surgery can be very challenging when utilizing gamma detection probes that rely on a fixed target-to-background (T/B) ratio (ratiometric threshold) for determination of probe positivity. The purpose of our study was to evaluate the counting efficiency and the success rate of in situ intraoperative detection of 18F-FDG-avid tissue sites (using the three-sigma statistical threshold criteria method and the ratiometric threshold criteria method) for three different gamma detection probe systems. Methods Of 58 patients undergoing 18F-FDG-directed surgery for known or suspected malignancy using gamma detection probes, we identified nine 18F-FDG-avid tissue sites (from amongst seven patients) that were seen on same-day preoperative diagnostic PET/CT imaging, for which each 18F-FDG-avid tissue site underwent attempted in situ intraoperative detection concurrently using three gamma detection probe systems (the K-alpha probe and two commercially available PET-probe systems), and were then surgically excised. Results The mean relative probe counting efficiency ratio was 6.9 (± 4.4, range 2.2–15.4) for the K-alpha probe, as compared to 1.5 (± 0.3, range 1.0–2.1) and 1.0 (± 0, range 1.0–1.0), respectively, for the two commercially available PET-probe systems (P < 0.001). Successful in situ intraoperative detection of 18F-FDG-avid tissue sites was more frequently accomplished with each of the three gamma detection probes tested by using the three-sigma statistical threshold criteria method than by using the ratiometric threshold criteria method, specifically with the three-sigma statistical threshold criteria method being significantly better than the ratiometric threshold criteria method for determining probe positivity for the K-alpha probe (P = 0.05). Conclusions Our results suggest that the improved probe counting efficiency of the K-alpha probe design used in
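    A minimal sketch of the two positivity rules being compared. Poisson counting statistics imply sigma = sqrt(mean) for the background; the counts and the 1.5 ratiometric threshold below are illustrative assumptions, not the study's values:

```python
# Two probe-positivity criteria for gamma detection probes (illustrative).
import numpy as np

def three_sigma_positive(target_counts, background_counts):
    """Positive if the target exceeds the background mean by 3 Poisson sigmas."""
    bg_mean = np.mean(background_counts)
    return target_counts > bg_mean + 3 * np.sqrt(bg_mean)

def ratiometric_positive(target_counts, background_counts, threshold=1.5):
    """Positive if the target-to-background ratio exceeds a fixed threshold."""
    return target_counts / np.mean(background_counts) >= threshold

background = [48, 52, 50, 47, 55]  # counts per interval away from the lesion
target = 78                        # counts at the suspected 18F-FDG-avid site
print("three-sigma:", three_sigma_positive(target, background))
print("ratiometric:", ratiometric_positive(target, background))
```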

  20. 1.5-Tesla Multiparametric-Magnetic Resonance Imaging for the detection of clinically significant prostate cancer.

    PubMed

    Popita, Cristian; Popita, Anca Raluca; Sitar-Taut, Adela; Petrut, Bogdan; Fetica, Bogdan; Coman, Ioan

    2017-01-01

    Multiparametric-magnetic resonance imaging (mp-MRI) is the main imaging modality used for prostate cancer detection. The aim of this study is to evaluate the diagnostic performance of mp-MRI at 1.5-Tesla (1.5-T) for the detection of clinically significant prostate cancer. In this ethics board-approved prospective study, 39 patients with suspected prostate cancer were included. Patients with a history of positive prostate biopsy and patients treated for prostate cancer were excluded. All patients were examined at 1.5-T MRI, before standard transrectal ultrasonography-guided biopsy. The overall sensitivity, specificity, positive predictive value and negative predictive value for mp-MRI were 100%, 73.68%, 80% and 100%, respectively. Our results showed that 1.5-T mp-MRI has a high sensitivity for detection of clinically significant prostate cancer and a high negative predictive value for ruling out significant disease.
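    A minimal sketch of the four reported metrics from a 2x2 table. The counts below are hypothetical but back-calculated to be consistent with the reported 100%/73.68%/80%/100% figures for 39 patients:

```python
# Sensitivity, specificity, PPV, and NPV from confusion-table counts.
def diagnostic_metrics(tp, fp, fn, tn):
    return {
        "sensitivity": tp / (tp + fn),  # true positives among all diseased
        "specificity": tn / (tn + fp),  # true negatives among all healthy
        "PPV": tp / (tp + fp),          # positive predictive value
        "NPV": tn / (tn + fn),          # negative predictive value
    }

# tp=20, fp=5, fn=0, tn=14 reproduces 100%, 73.68%, 80%, 100% on 39 patients.
print(diagnostic_metrics(tp=20, fp=5, fn=0, tn=14))
```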

  1. The Detection and Statistics of Giant Arcs behind CLASH Clusters

    NASA Astrophysics Data System (ADS)

    Xu, Bingxiao; Postman, Marc; Meneghetti, Massimo; Seitz, Stella; Zitrin, Adi; Merten, Julian; Maoz, Dani; Frye, Brenda; Umetsu, Keiichi; Zheng, Wei; Bradley, Larry; Vega, Jesus; Koekemoer, Anton

    2016-02-01

    We developed an algorithm to find and characterize gravitationally lensed galaxies (arcs) to perform a comparison of the observed and simulated arc abundance. Observations are from the Cluster Lensing And Supernova survey with Hubble (CLASH). Simulated CLASH images are created using the MOKA package and also clusters selected from the high-resolution, hydrodynamical simulations, MUSIC, over the same mass and redshift range as the CLASH sample. The algorithm's arc elongation accuracy, completeness, and false positive rate are determined and used to compute an estimate of the true arc abundance. We derive a lensing efficiency of 4 ± 1 arcs (with length ≥6″ and length-to-width ratio ≥7) per cluster for the X-ray-selected CLASH sample, 4 ± 1 arcs per cluster for the MOKA-simulated sample, and 3 ± 1 arcs per cluster for the MUSIC-simulated sample. The observed and simulated arc statistics are in full agreement. We measure the photometric redshifts of all detected arcs and find a median redshift zs = 1.9 with 33% of the detected arcs having zs > 3. We find that the arc abundance does not depend strongly on the source redshift distribution but is sensitive to the mass distribution of the dark matter halos (e.g., the c-M relation). Our results show that consistency between the observed and simulated distributions of lensed arc sizes and axial ratios can be achieved by using cluster-lensing simulations that are carefully matched to the selection criteria used in the observations.

  2. Dispersal of potato cyst nematodes measured using historical and spatial statistical analyses.

    PubMed

    Banks, N C; Hodda, M; Singh, S K; Matveeva, E M

    2012-06-01

    Rates and modes of dispersal of potato cyst nematodes (PCNs) were investigated. Analysis of records from eight countries suggested that PCNs spread a mean distance of 5.3 km/year radially from the site of first detection, and spread 212 km over ≈40 years before detection. Data from four countries with more detailed histories of invasion were analyzed further, using distance from first detection, distance from previous detection, distance from nearest detection, straight line distance, and road distance. Linear distance from first detection was significantly related to the time since the first detection. Estimated rate of spread was 5.7 km/year, and did not differ statistically between countries. Time between the first detection and estimated introduction date varied between 0 and 20 years, and differed among countries. Road distances from nearest and first detection were statistically significantly related to time, and gave slightly higher estimates for rate of spread of 6.0 and 7.9 km/year, respectively. These results indicate that the original site of introduction of PCNs may act as a source for subsequent spread and that this may occur at a relatively constant rate over time regardless of whether this distance is measured by road or by a straight line. The implications of this constant radial rate of dispersal for biosecurity and pest management are discussed, along with the effects of control strategies.

  3. Lineaments on Skylab photographs: Detection, mapping, and hydrologic significance in central Tennessee

    NASA Technical Reports Server (NTRS)

    Moore, G. K.

    1976-01-01

    An investigation was carried out to determine the feasibility of mapping lineaments on SKYLAB photographs of central Tennessee and to determine the hydrologic significance of these lineaments, particularly as concerns the occurrence and productivity of ground water. Sixty-nine percent more lineaments were found on SKYLAB photographs by stereo viewing than by projection viewing, but longer lineaments were detected by projection viewing. Most SKYLAB lineaments consisted of topographic depressions, and they followed or paralleled the streams. The remainder were found by vegetation alignments and the straight sides of ridges. Test drilling showed that the median yield of wells located on SKYLAB lineaments was about six times the median yield of wells located by random drilling. The best single detection method, in terms of potential savings, was stereo viewing. Larger savings might be achieved by locating wells on lineaments detected by both stereo viewing and projection viewing.

  4. Principles of Statistics: What the Sports Medicine Professional Needs to Know.

    PubMed

    Riemann, Bryan L; Lininger, Monica R

    2018-07-01

    Understanding the results and statistics reported in original research remains a large challenge for many sports medicine practitioners and, in turn, may be one of the biggest barriers to integrating research into sports medicine practice. The purpose of this article is to provide the minimal essentials a sports medicine practitioner needs to know about interpreting statistics and research results to facilitate the incorporation of the latest evidence into practice. Topics covered include the difference between statistical significance and clinical meaningfulness; effect sizes and confidence intervals; reliability statistics, including the minimal detectable difference and minimal important difference; and statistical power.

  5. Synthetic velocity gradient tensors and the identification of statistically significant aspects of the structure of turbulence

    NASA Astrophysics Data System (ADS)

    Keylock, Christopher J.

    2017-08-01

    A method is presented for deriving random velocity gradient tensors given a source tensor. These synthetic tensors are constrained to lie within mathematical bounds of the non-normality of the source tensor, but we do not impose direct constraints upon scalar quantities typically derived from the velocity gradient tensor and studied in fluid mechanics. Hence, it becomes possible to test hypotheses about data at a point regarding the statistical significance of these scalar quantities. Having presented our method and the associated mathematical concepts, we apply it to homogeneous, isotropic turbulence to test the utility of the approach for a case where the behavior of the tensor is well understood. We show that, as well as the concentration of data along the Vieillefosse tail, actual turbulence is also preferentially located in the quadrant where there is both excess enstrophy (Q > 0) and excess enstrophy production (R < 0). We also examine the topology implied by the strain eigenvalues and find that for the statistically significant results there is a particularly strong relative preference for the formation of disklike structures in the (Q < 0, R < 0) quadrant. With the method shown to be useful for a turbulence that is already well understood, it should be of even greater utility for studying complex flows seen in industry and the environment.

  6. Detecting Selection on Protein Stability through Statistical Mechanical Models of Folding and Evolution

    PubMed Central

    Bastolla, Ugo

    2014-01-01

    The properties of biomolecules depend both on physics and on the evolutionary process that formed them. These two points of view produce a powerful synergism. Physics sets the stage and the constraints that molecular evolution has to obey, and evolutionary theory helps in rationalizing the physical properties of biomolecules, including protein folding thermodynamics. To complete the parallelism, protein thermodynamics is founded on statistical mechanics in the space of protein structures, and molecular evolution can be viewed as statistical mechanics in the space of protein sequences. In this review, we integrate both points of view, applying them to detecting selection on the stability of the folded state of proteins. We start by discussing positive design, which strengthens the stability of the folded against the unfolded state of proteins. Positive design justifies why statistical potentials for protein folding can be obtained from the frequencies of structural motifs. Stability against unfolding is easier to achieve for longer proteins. In contrast, negative design, which consists in destabilizing frequently formed misfolded conformations, is more difficult to achieve for longer proteins. The folding rate can be enhanced by strengthening short-range native interactions, but this requirement conflicts with negative design, and evolution has to trade off between them. Finally, selection can accelerate functional movements by favoring low-frequency normal modes of the dynamics of the native state that strongly correlate with the functional conformational change. PMID:24970217

  7. The Diagnostic Performance of Multiparametric Magnetic Resonance Imaging to Detect Significant Prostate Cancer.

    PubMed

    Thompson, J E; van Leeuwen, P J; Moses, D; Shnier, R; Brenner, P; Delprado, W; Pulbrook, M; Böhm, M; Haynes, A M; Hayen, A; Stricker, P D

    2016-05-01

    We assess the accuracy of multiparametric magnetic resonance imaging for significant prostate cancer detection before diagnostic biopsy in men with an abnormal prostate specific antigen/digital rectal examination. A total of 388 men underwent multiparametric magnetic resonance imaging, including T2-weighted, diffusion weighted and dynamic contrast enhanced imaging before biopsy. Two radiologists used PI-RADS to allocate a score of 1 to 5 for suspicion of significant prostate cancer (Gleason 7 with more than 5% grade 4). PI-RADS 3 to 5 was considered positive. Transperineal template guided mapping biopsy of 18 regions (median 30 cores) was performed with additional manually directed cores from magnetic resonance imaging positive regions. The anatomical location, size and grade of individual cancer areas in the biopsy regions (18) as the primary outcome and in prostatectomy specimens (117) as the secondary outcome were correlated to the magnetic resonance imaging positive regions. Of the 388 men who were enrolled in the study 344 were analyzed. Multiparametric magnetic resonance imaging was positive in 77.0% of patients, 62.5% had prostate cancer and 41.6% had significant prostate cancer. The detection of significant prostate cancer by multiparametric magnetic resonance imaging had a sensitivity of 96%, specificity of 36%, negative predictive value of 92% and positive predictive value of 52%. Adding PI-RADS to the multivariate model, including prostate specific antigen, digital rectal examination, prostate volume and age, improved the AUC from 0.776 to 0.879 (p <0.001). Anatomical concordance analysis showed a low mismatch between the magnetic resonance imaging positive regions and biopsy positive regions (4 [2.9%]), and the significant prostate cancer area in the radical prostatectomy specimen (3 [3.3%]). In men with an abnormal prostate specific antigen/digital rectal examination, multiparametric magnetic resonance imaging detected significant prostate cancer

  8. 1.5-Tesla Multiparametric-Magnetic Resonance Imaging for the detection of clinically significant prostate cancer

    PubMed Central

    POPITA, CRISTIAN; POPITA, ANCA RALUCA; SITAR-TAUT, ADELA; PETRUT, BOGDAN; FETICA, BOGDAN; COMAN, IOAN

    2017-01-01

    Background and aim Multiparametric-magnetic resonance imaging (mp-MRI) is the main imaging modality used for prostate cancer detection. The aim of this study is to evaluate the diagnostic performance of mp-MRI at 1.5-Tesla (1.5-T) for the detection of clinically significant prostate cancer. Methods In this ethics board-approved prospective study, 39 patients with suspected prostate cancer were included. Patients with a history of positive prostate biopsy and patients treated for prostate cancer were excluded. All patients were examined at 1.5-T MRI, before standard transrectal ultrasonography-guided biopsy. Results The overall sensitivity, specificity, positive predictive value and negative predictive value for mp-MRI were 100%, 73.68%, 80% and 100%, respectively. Conclusion Our results showed that 1.5-T mp-MRI has a high sensitivity for detection of clinically significant prostate cancer and a high negative predictive value for ruling out significant disease. PMID:28246496

  9. Using statistical and machine learning to help institutions detect suspicious access to electronic health records.

    PubMed

    Boxwala, Aziz A; Kim, Jihoon; Grillo, Janice M; Ohno-Machado, Lucila

    2011-01-01

    To determine whether statistical and machine-learning methods, when applied to electronic health record (EHR) access data, could help identify suspicious (ie, potentially inappropriate) access to EHRs. From EHR access logs and other organizational data collected over a 2-month period, the authors extracted 26 features likely to be useful in detecting suspicious accesses. Selected events were marked as either suspicious or appropriate by privacy officers, and served as the gold standard set for model evaluation. The authors trained logistic regression (LR) and support vector machine (SVM) models on 10-fold cross-validation sets of 1291 labeled events. The authors evaluated the sensitivity of final models on an external set of 58 events that were identified as truly inappropriate and investigated independently from this study using standard operating procedures. The area under the receiver operating characteristic curve of the models on the whole data set of 1291 events was 0.91 for LR, and 0.95 for SVM. The sensitivity of the baseline model on this set was 0.8. When the final models were evaluated on the set of 58 investigated events, all of which were determined as truly inappropriate, the sensitivity was 0 for the baseline method, 0.76 for LR, and 0.79 for SVM. The LR and SVM models may not generalize because of interinstitutional differences in organizational structures, applications, and workflows. Nevertheless, our approach for constructing the models using statistical and machine-learning techniques can be generalized. An important limitation is the relatively small sample used for the training set due to the effort required for its construction. The results suggest that statistical and machine-learning methods can play an important role in helping privacy officers detect suspicious accesses to EHRs.

  10. Using statistical and machine learning to help institutions detect suspicious access to electronic health records

    PubMed Central

    Kim, Jihoon; Grillo, Janice M; Ohno-Machado, Lucila

    2011-01-01

    Objective To determine whether statistical and machine-learning methods, when applied to electronic health record (EHR) access data, could help identify suspicious (ie, potentially inappropriate) access to EHRs. Methods From EHR access logs and other organizational data collected over a 2-month period, the authors extracted 26 features likely to be useful in detecting suspicious accesses. Selected events were marked as either suspicious or appropriate by privacy officers, and served as the gold standard set for model evaluation. The authors trained logistic regression (LR) and support vector machine (SVM) models on 10-fold cross-validation sets of 1291 labeled events. The authors evaluated the sensitivity of final models on an external set of 58 events that were identified as truly inappropriate and investigated independently from this study using standard operating procedures. Results The area under the receiver operating characteristic curve of the models on the whole data set of 1291 events was 0.91 for LR, and 0.95 for SVM. The sensitivity of the baseline model on this set was 0.8. When the final models were evaluated on the set of 58 investigated events, all of which were determined as truly inappropriate, the sensitivity was 0 for the baseline method, 0.76 for LR, and 0.79 for SVM. Limitations The LR and SVM models may not generalize because of interinstitutional differences in organizational structures, applications, and workflows. Nevertheless, our approach for constructing the models using statistical and machine-learning techniques can be generalized. An important limitation is the relatively small sample used for the training set due to the effort required for its construction. Conclusion The results suggest that statistical and machine-learning methods can play an important role in helping privacy officers detect suspicious accesses to EHRs. PMID:21672912
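    A minimal sketch of the modeling step common to both records above, with synthetic stand-ins for the 1291 labeled events and 26 engineered features (the class balance, models, and scaling are assumptions, not the authors' exact configuration):

```python
# Cross-validated ROC AUC for LR and SVM on labeled access-log events (sketch).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 1291 events, 26 features, ~10% labeled suspicious.
X, y = make_classification(n_samples=1291, n_features=26, weights=[0.9], random_state=0)

for name, model in [
    ("LR", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
    ("SVM", make_pipeline(StandardScaler(), SVC(probability=True, random_state=0))),
]:
    scores = cross_val_predict(model, X, y, cv=10, method="predict_proba")[:, 1]
    print(name, "AUC =", round(roc_auc_score(y, scores), 3))
```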

  11. Accuracy of Noncycloplegic Retinoscopy, Retinomax Autorefractor, and SureSight Vision Screener for Detecting Significant Refractive Errors

    PubMed Central

    Kulp, Marjean Taylor; Ying, Gui-shuang; Huang, Jiayan; Maguire, Maureen; Quinn, Graham; Ciner, Elise B.; Cyert, Lynn A.; Orel-Bixler, Deborah A.; Moore, Bruce D.

    2014-01-01

    Purpose. To evaluate, by receiver operating characteristic (ROC) analysis, the ability of noncycloplegic retinoscopy (NCR), Retinomax Autorefractor (Retinomax), and SureSight Vision Screener (SureSight) to detect significant refractive errors (RE) among preschoolers. Methods. Refraction results of eye care professionals using NCR, Retinomax, and SureSight (n = 2588) and of nurse and lay screeners using Retinomax and SureSight (n = 1452) were compared with masked cycloplegic retinoscopy results. Significant RE was defined as hyperopia greater than +3.25 diopters (D), myopia greater than 2.00 D, astigmatism greater than 1.50 D, and anisometropia greater than 1.00 D interocular difference in hyperopia, greater than 3.00 D interocular difference in myopia, or greater than 1.50 D interocular difference in astigmatism. The ability of each screening test to identify presence, type, and/or severity of significant RE was summarized by the area under the ROC curve (AUC) and calculated from weighted logistic regression models. Results. For detection of each type of significant RE, AUC of each test was high; AUC was better for detecting the most severe levels of RE than for all REs considered important to detect (AUC 0.97–1.00 vs. 0.92–0.93). The area under the curve of each screening test was high for myopia (AUC 0.97–0.99). Noncycloplegic retinoscopy and Retinomax performed better than SureSight for hyperopia (AUC 0.92–0.99 and 0.90–0.98 vs. 0.85–0.94, P ≤ 0.02), Retinomax performed better than NCR for astigmatism greater than 1.50 D (AUC 0.95 vs. 0.90, P = 0.01), and SureSight performed better than Retinomax for anisometropia (AUC 0.85–1.00 vs. 0.76–0.96, P ≤ 0.07). Performance was similar for nurse and lay screeners in detecting any significant RE (AUC 0.92–1.00 vs. 0.92–0.99). Conclusions. Each test had a very high discriminatory power for detecting children with any significant RE. PMID:24481262

  12. Accuracy of noncycloplegic retinoscopy, retinomax autorefractor, and SureSight vision screener for detecting significant refractive errors.

    PubMed

    Kulp, Marjean Taylor; Ying, Gui-Shuang; Huang, Jiayan; Maguire, Maureen; Quinn, Graham; Ciner, Elise B; Cyert, Lynn A; Orel-Bixler, Deborah A; Moore, Bruce D

    2014-03-06

    To evaluate, by receiver operating characteristic (ROC) analysis, the ability of noncycloplegic retinoscopy (NCR), Retinomax Autorefractor (Retinomax), and SureSight Vision Screener (SureSight) to detect significant refractive errors (RE) among preschoolers. Refraction results of eye care professionals using NCR, Retinomax, and SureSight (n = 2588) and of nurse and lay screeners using Retinomax and SureSight (n = 1452) were compared with masked cycloplegic retinoscopy results. Significant RE was defined as hyperopia greater than +3.25 diopters (D), myopia greater than 2.00 D, astigmatism greater than 1.50 D, and anisometropia greater than 1.00 D interocular difference in hyperopia, greater than 3.00 D interocular difference in myopia, or greater than 1.50 D interocular difference in astigmatism. The ability of each screening test to identify presence, type, and/or severity of significant RE was summarized by the area under the ROC curve (AUC) and calculated from weighted logistic regression models. For detection of each type of significant RE, AUC of each test was high; AUC was better for detecting the most severe levels of RE than for all REs considered important to detect (AUC 0.97-1.00 vs. 0.92-0.93). The area under the curve of each screening test was high for myopia (AUC 0.97-0.99). Noncycloplegic retinoscopy and Retinomax performed better than SureSight for hyperopia (AUC 0.92-0.99 and 0.90-0.98 vs. 0.85-0.94, P ≤ 0.02), Retinomax performed better than NCR for astigmatism greater than 1.50 D (AUC 0.95 vs. 0.90, P = 0.01), and SureSight performed better than Retinomax for anisometropia (AUC 0.85-1.00 vs. 0.76-0.96, P ≤ 0.07). Performance was similar for nurse and lay screeners in detecting any significant RE (AUC 0.92-1.00 vs. 0.92-0.99). Each test had a very high discriminatory power for detecting children with any significant RE.

  13. A voting-based statistical cylinder detection framework applied to fallen tree mapping in terrestrial laser scanning point clouds

    NASA Astrophysics Data System (ADS)

    Polewski, Przemyslaw; Yao, Wei; Heurich, Marco; Krzystek, Peter; Stilla, Uwe

    2017-07-01

    This paper introduces a statistical framework for detecting cylindrical shapes in dense point clouds. We target the application of mapping fallen trees in datasets obtained through terrestrial laser scanning. This is a challenging task due to the presence of ground vegetation, standing trees, DTM artifacts, as well as the fragmentation of dead trees into non-collinear segments. Our method shares the concept of voting in parameter space with the generalized Hough transform, however two of its significant drawbacks are improved upon. First, the need to generate samples on the shape's surface is eliminated. Instead, pairs of nearby input points lying on the surface cast a vote for the cylinder's parameters based on the intrinsic geometric properties of cylindrical shapes. Second, no discretization of the parameter space is required: the voting is carried out in continuous space by means of constructing a kernel density estimator and obtaining its local maxima, using automatic, data-driven kernel bandwidth selection. Furthermore, we show how the detected cylindrical primitives can be efficiently merged to obtain object-level (entire tree) semantic information using graph-cut segmentation and a tailored dynamic algorithm for eliminating cylinder redundancy. Experiments were performed on 3 plots from the Bavarian Forest National Park, with ground truth obtained through visual inspection of the point clouds. It was found that relative to sample consensus (SAC) cylinder fitting, the proposed voting framework can improve the detection completeness by up to 10 percentage points while maintaining the correctness rate.
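    A minimal sketch of the continuous voting idea in one parameter dimension: votes are smoothed with a kernel density estimator (data-driven bandwidth), and its local maxima, rather than accumulator bins, give the detected parameters. The vote locations and the acceptance threshold are illustrative:

```python
# KDE-based voting in continuous parameter space (one dimension, e.g. radius).
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
votes = np.concatenate([
    rng.normal(0.25, 0.01, 300),   # votes for a cylinder of radius ~0.25 m
    rng.normal(0.40, 0.01, 200),   # votes for a second cylinder
    rng.uniform(0.1, 0.6, 150),    # clutter, e.g. from ground vegetation
])

kde = gaussian_kde(votes)          # bandwidth chosen from the data (Scott's rule)
grid = np.linspace(0.1, 0.6, 1000)
density = kde(grid)

interior = density[1:-1]           # local maxima above an acceptance level
is_peak = (interior > density[:-2]) & (interior > density[2:]) & (interior > 2.0)
print("detected radii:", np.round(grid[1:-1][is_peak], 3))
```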

  14. A scan statistic to extract causal gene clusters from case-control genome-wide rare CNV data.

    PubMed

    Nishiyama, Takeshi; Takahashi, Kunihiko; Tango, Toshiro; Pinto, Dalila; Scherer, Stephen W; Takami, Satoshi; Kishino, Hirohisa

    2011-05-26

    Several statistical tests have been developed for analyzing genome-wide association data by incorporating gene pathway information in terms of gene sets. Using these methods, hundreds of gene sets are typically tested, and the tested gene sets often overlap. This overlapping greatly increases the probability of generating false positives, and the results obtained are difficult to interpret, particularly when many gene sets show statistical significance. We propose a flexible statistical framework to circumvent these problems. Inspired by spatial scan statistics for detecting clustering of disease occurrence in the field of epidemiology, we developed a scan statistic to extract disease-associated gene clusters from a whole gene pathway. Extracting one or a few significant gene clusters from a global pathway limits the overall false positive probability, which results in increased statistical power, and facilitates the interpretation of test results. In the present study, we applied our method to genome-wide association data for rare copy-number variations, which have been strongly implicated in common diseases. Application of our method to a simulated dataset demonstrated the high accuracy of this method in detecting disease-associated gene clusters in a whole gene pathway. The scan statistic approach proposed here shows a high level of accuracy in detecting gene clusters in a whole gene pathway. This study has provided a sound statistical framework for analyzing genome-wide rare CNV data by incorporating topological information on the gene pathway.

  15. Statistical downscaling rainfall using artificial neural network: significantly wetter Bangkok?

    NASA Astrophysics Data System (ADS)

    Vu, Minh Tue; Aribarg, Thannob; Supratid, Siriporn; Raghavan, Srivatsan V.; Liong, Shie-Yui

    2016-11-01

    Artificial neural network (ANN) is an established technique with a flexible mathematical structure that is capable of identifying complex nonlinear relationships between input and output data. The present study utilizes ANN as a method of statistically downscaling global climate models (GCMs) during the rainy season at meteorological site locations in Bangkok, Thailand. The study illustrates the applications of the feed forward back propagation using large-scale predictor variables derived from both the ERA-Interim reanalyses data and present day/future GCM data. The predictors are first selected over different grid boxes surrounding Bangkok region and then screened by using principal component analysis (PCA) to filter the best correlated predictors for ANN training. The reanalyses downscaled results of the present day climate show good agreement against station precipitation with a correlation coefficient of 0.8 and a Nash-Sutcliffe efficiency of 0.65. The final downscaled results for four GCMs show an increasing trend of precipitation for rainy season over Bangkok by the end of the twenty-first century. The extreme values of precipitation determined using statistical indices show strong increases of wetness. These findings will be useful for policy makers in pondering adaptation measures due to flooding such as whether the current drainage network system is sufficient to meet the changing climate and to plan for a range of related adaptation/mitigation measures.
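    A minimal sketch of the downscaling chain described above: standardized large-scale predictors are screened by PCA and fed to a small feed-forward network (synthetic data; the retained variance and layer size are assumptions, not the paper's configuration):

```python
# PCA-screened predictors feeding a feed-forward network (illustrative sketch).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
predictors = rng.normal(size=(2000, 60))  # flattened large-scale fields over grid boxes
rain = np.maximum(predictors[:, :5].sum(axis=1) + rng.normal(0, 0.5, 2000), 0)

model = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95),                     # keep 95% of predictor variance
    MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
)
model.fit(predictors[:1500], rain[:1500])
pred = model.predict(predictors[1500:])
print("correlation on held-out data:", np.corrcoef(pred, rain[1500:])[0, 1].round(2))
```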

  16. Statistical trend analysis and extreme distribution of significant wave height from 1958 to 1999 - an application to the Italian Seas

    NASA Astrophysics Data System (ADS)

    Martucci, G.; Carniel, S.; Chiggiato, J.; Sclavo, M.; Lionello, P.; Galati, M. B.

    2010-06-01

    The study is a statistical analysis of sea-state time series derived using the wave model WAM forced by the ERA-40 dataset in selected areas near the Italian coasts. For the period 1 January 1958 to 31 December 1999 the analysis yields: (i) the existence of a negative trend in the annual- and winter-averaged sea-state heights; (ii) the existence of a turning point in the late 1980s in the annual-averaged trend of sea-state heights at a site in the Northern Adriatic Sea; (iii) the overall absence of a significant trend in the annual-averaged mean durations of sea states over thresholds; (iv) the assessment of the extreme values on a time scale of a thousand years. The analysis uses two methods to obtain samples of extremes from the independent sea states: the r-largest annual maxima and the peak-over-threshold. The two methods show statistical differences in retrieving the return values and, more generally, in describing the significant wave field. The r-largest annual maxima method provides more reliable predictions of the extreme values, especially for small return periods (<100 years). Finally, the study statistically demonstrates the existence of decadal negative trends in the significant wave heights and thereby conveys useful information on the wave climatology of the Italian seas during the second half of the 20th century.
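    A minimal sketch of the peak-over-threshold method on a synthetic wave-height series: exceedances over a high threshold are fitted with a generalized Pareto distribution and extrapolated to a return value (the threshold choice and series length are illustrative, not the study's settings):

```python
# Peak-over-threshold extreme-value analysis of significant wave height Hs.
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(0)
hs = rng.gumbel(2.0, 0.6, size=42 * 365)   # daily Hs surrogate, 42 years

u = np.quantile(hs, 0.98)                  # high threshold
exceed = hs[hs > u] - u                    # exceedances over the threshold

c, loc, scale = genpareto.fit(exceed, floc=0)   # generalized Pareto fit
rate = exceed.size / 42.0                       # exceedances per year
T = 100                                         # return period in years
return_value = u + genpareto.ppf(1 - 1 / (rate * T), c, loc=0, scale=scale)
print(f"{T}-year return Hs = {return_value:.2f} m")
```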

  17. On the statistical significance of excess events: Remarks of caution and the need for a standard method of calculation

    NASA Technical Reports Server (NTRS)

    Staubert, R.

    1985-01-01

    Methods for calculating the statistical significance of excess events and the interpretation of the formally derived values are discussed. It is argued that a simple formula for a conservative estimate should generally be used in order to provide a common understanding of quoted values.
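    One widely quoted simple estimate of the significance of an excess, shown here only to illustrate the kind of formula at issue (the abstract does not specify the author's preferred expression): with N_on counts on-source, N_off counts off-source, and exposure ratio alpha,

```python
# Simple, conservative significance of an excess over a measured background.
import math

def excess_significance(n_on, n_off, alpha):
    excess = n_on - alpha * n_off                 # background-subtracted excess
    sigma = math.sqrt(n_on + alpha**2 * n_off)    # propagated Poisson error
    return excess / sigma

# 130 on-source counts against 500 off-source counts with alpha = 0.2
print(round(excess_significance(n_on=130, n_off=500, alpha=0.2), 2))  # ~2.45 sigma
```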

  18. THE DETECTION AND STATISTICS OF GIANT ARCS BEHIND CLASH CLUSTERS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Xu, Bingxiao; Zheng, Wei; Postman, Marc

    We developed an algorithm to find and characterize gravitationally lensed galaxies (arcs) to perform a comparison of the observed and simulated arc abundance. Observations are from the Cluster Lensing And Supernova survey with Hubble (CLASH). Simulated CLASH images are created using the MOKA package and also clusters selected from the high-resolution, hydrodynamical simulations, MUSIC, over the same mass and redshift range as the CLASH sample. The algorithm's arc elongation accuracy, completeness, and false positive rate are determined and used to compute an estimate of the true arc abundance. We derive a lensing efficiency of 4 ± 1 arcs (with length ≥6″ and length-to-width ratio ≥7) per cluster for the X-ray-selected CLASH sample, 4 ± 1 arcs per cluster for the MOKA-simulated sample, and 3 ± 1 arcs per cluster for the MUSIC-simulated sample. The observed and simulated arc statistics are in full agreement. We measure the photometric redshifts of all detected arcs and find a median redshift zs = 1.9 with 33% of the detected arcs having zs > 3. We find that the arc abundance does not depend strongly on the source redshift distribution but is sensitive to the mass distribution of the dark matter halos (e.g., the c-M relation). Our results show that consistency between the observed and simulated distributions of lensed arc sizes and axial ratios can be achieved by using cluster-lensing simulations that are carefully matched to the selection criteria used in the observations.

  19. Automatic motion and noise artifact detection in Holter ECG data using empirical mode decomposition and statistical approaches.

    PubMed

    Lee, Jinseok; McManus, David D; Merchant, Sneh; Chon, Ki H

    2012-06-01

    We present a real-time method for the detection of motion and noise (MN) artifacts, which frequently interfere with accurate rhythm assessment when ECG signals are collected from Holter monitors. Our MN artifact detection approach involves two stages. The first stage uses the first-order intrinsic mode function (F-IMF) from the empirical mode decomposition to isolate the artifacts' dynamics, as they are largely concentrated in the higher frequencies. The second stage of our approach uses three statistical measures on the F-IMF time series to look for characteristics of randomness and variability, which are hallmark signatures of MN artifacts: the Shannon entropy, mean, and variance. We then use the receiver operating characteristic curve on Holter data from 15 healthy subjects to derive threshold values associated with these statistical measures to separate the clean data segments from those with MN artifacts. With threshold values derived from the 15 training data sets, we tested our algorithms on 30 additional healthy subjects. Our results show that our algorithms are able to detect the presence of MN artifacts with a sensitivity and specificity of 96.63% and 94.73%, respectively. In addition, when we applied our previously developed algorithm for atrial fibrillation (AF) detection to those segments that had been labeled free of MN artifacts, the specificity increased from 73.66% to 85.04% without loss of sensitivity (74.48%-74.62%) on six subjects diagnosed with AF. Finally, the computation time was less than 0.2 s using MATLAB code, indicating that real-time application of the algorithms is possible for Holter monitoring.
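    A minimal sketch of the second-stage features: histogram-based Shannon entropy, mean, and variance computed on a segment that stands in for the F-IMF (the synthetic segments and bin count are assumptions, not the paper's parameters):

```python
# The three statistical measures used to flag motion/noise artifact segments.
import numpy as np

def segment_features(f_imf, bins=16):
    hist, _ = np.histogram(f_imf, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))      # histogram-based Shannon entropy
    return entropy, f_imf.mean(), f_imf.var()

rng = np.random.default_rng(0)
# Clean-rhythm-like high-frequency residue vs a motion/noise-like segment.
clean = 0.05 * np.sin(2 * np.pi * np.linspace(0, 4, 512)) + rng.normal(0, 0.01, 512)
noisy = rng.normal(0, 0.2, 512)
print("clean:", np.round(segment_features(clean), 3))
print("noisy:", np.round(segment_features(noisy), 3))  # higher entropy and variance
```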

  20. Statistical reporting inconsistencies in experimental philosophy

    PubMed Central

    Colombo, Matteo; Duev, Georgi; Nuijten, Michèle B.; Sprenger, Jan

    2018-01-01

    Experimental philosophy (x-phi) is a young field of research in the intersection of philosophy and psychology. It aims to make progress on philosophical questions by using experimental methods traditionally associated with the psychological and behavioral sciences, such as null hypothesis significance testing (NHST). Motivated by recent discussions about a methodological crisis in the behavioral sciences, questions have been raised about the methodological standards of x-phi. Here, we focus on one aspect of this question, namely the rate of inconsistencies in statistical reporting. Previous research has examined the extent to which published articles in psychology and other behavioral sciences present statistical inconsistencies in reporting the results of NHST. In this study, we used the R package statcheck to detect statistical inconsistencies in x-phi, and compared rates of inconsistencies in psychology and philosophy. We found that rates of inconsistencies in x-phi are lower than in the psychological and behavioral sciences. From the point of view of statistical reporting consistency, x-phi seems to do no worse, and perhaps even better, than psychological science. PMID:29649220

  1. Sensitivity and Specificity of Interictal EEG-fMRI for Detecting the Ictal Onset Zone at Different Statistical Thresholds

    PubMed Central

    Tousseyn, Simon; Dupont, Patrick; Goffin, Karolien; Sunaert, Stefan; Van Paesschen, Wim

    2014-01-01

    There is currently a lack of knowledge about electroencephalography (EEG)-functional magnetic resonance imaging (fMRI) specificity. Our aim was to define sensitivity and specificity of blood oxygen level dependent (BOLD) responses to interictal epileptic spikes during EEG-fMRI for detecting the ictal onset zone (IOZ). We studied 21 refractory focal epilepsy patients who had a well-defined IOZ after a full presurgical evaluation and interictal spikes during EEG-fMRI. Areas of spike-related BOLD changes overlapping the IOZ in patients were considered as true positives; if no overlap was found, they were treated as false-negatives. Matched healthy case-controls had undergone similar EEG-fMRI in order to determine true-negative and false-positive fractions. The spike-related regressor of the patient was used in the design matrix of the healthy case-control. Suprathreshold BOLD changes in the brain of controls were considered as false positives, absence of these changes as true negatives. Sensitivity and specificity were calculated for different statistical thresholds at the voxel level combined with different cluster size thresholds and represented in receiver operating characteristic (ROC)-curves. Additionally, we calculated the ROC-curves based on the cluster containing the maximal significant activation. We achieved a combination of 100% specificity and 62% sensitivity, using a Z-threshold in the interval 3.4–3.5 and cluster size threshold of 350 voxels. We could obtain higher sensitivity at the expense of specificity. Similar performance was found when using the cluster containing the maximal significant activation. Our data provide a guideline for different EEG-fMRI settings with their respective sensitivity and specificity for detecting the IOZ. The unique cluster containing the maximal significant BOLD activation was a sensitive and specific marker of the IOZ. PMID:25101049

  2. Sigsearch: a new term for post hoc unplanned search for statistically significant relationships with the intent to create publishable findings.

    PubMed

    Hashim, Muhammad Jawad

    2010-09-01

    Post-hoc secondary data analysis with no prespecified hypotheses has been discouraged by textbook authors and journal editors alike. Unfortunately no single term describes this phenomenon succinctly. I would like to coin the term "sigsearch" to define this practice and bring it within the teaching lexicon of statistics courses. Sigsearch would include any unplanned, post-hoc search for statistical significance using multiple comparisons of subgroups. It would also include data analysis with outcomes other than the prespecified primary outcome measure of a study as well as secondary data analyses of earlier research.

  3. [Detection of serum anti-salivary duct antibody and its clinical significance].

    PubMed

    Zhang, H; Shi, G Y; Cai, X H

    1990-11-01

    The authors developed an indirect immunofluorescence technique for the detection of anti-salivary duct antibody (ASDA) and screened 34 patients with rheumatoid arthritis, 15 patients with Sjögren's syndrome-rheumatoid arthritis, 15 patients with primary Sjögren's syndrome, 63 cases with other connective tissue diseases (CTDs), 9 cases with other diseases, and 40 normal controls. The incidence of ASDA in patients with Sjögren's syndrome-rheumatoid arthritis (66.67%) or rheumatoid arthritis (32.35%) was significantly higher than that in normal controls (P < 0.001). In patients with primary Sjögren's syndrome, other CTDs and non-CTDs, no ASDA was found. However, in patients with Sjögren's syndrome-rheumatoid arthritis or rheumatoid arthritis alone, ASDA was not correlated with age, sex, disease duration or serological findings. The results suggest that the detection of serum ASDA might be useful in differentiating Sjögren's syndrome with rheumatoid arthritis from primary Sjögren's syndrome with arthralgia and/or arthritis.

  4. The t-CWT: a new ERP detection and quantification method based on the continuous wavelet transform and Student's t-statistics.

    PubMed

    Bostanov, Vladimir; Kotchoubey, Boris

    2006-12-01

    This study was aimed at developing a method for extraction and assessment of event-related brain potentials (ERP) from single-trials. This method should be applicable in the assessment of single persons' ERPs and should be able to handle both single ERP components and whole waveforms. We adopted a recently developed ERP feature extraction method, the t-CWT, for the purposes of hypothesis testing in the statistical assessment of ERPs. The t-CWT is based on the continuous wavelet transform (CWT) and Student's t-statistics. The method was tested in two ERP paradigms, oddball and semantic priming, by assessing individual-participant data on a single-trial basis, and testing the significance of selected ERP components, P300 and N400, as well as of whole ERP waveforms. The t-CWT was also compared to other univariate and multivariate ERP assessment methods: peak picking, area computation, discrete wavelet transform (DWT) and principal component analysis (PCA). The t-CWT produced better results than all of the other assessment methods it was compared with. The t-CWT can be used as a reliable and powerful method for ERP-component detection and testing of statistical hypotheses concerning both single ERP components and whole waveforms extracted from either single persons' or group data. The t-CWT is the first such method based explicitly on the criteria of maximal statistical difference between two average ERPs in the time-frequency domain and is particularly suitable for ERP assessment of individual data (e.g. in clinical settings), but also for the investigation of small and/or novel ERP effects from group data.

  5. [Effect sizes, statistical power and sample sizes in "the Japanese Journal of Psychology"].

    PubMed

    Suzukawa, Yumi; Toyoda, Hideki

    2012-04-01

    This study analyzed the statistical power of research studies published in the "Japanese Journal of Psychology" in 2008 and 2009. Sample effect sizes and sample statistical powers were calculated for each statistical test and analyzed with respect to the analytical methods and the fields of the studies. The results show that in fields like perception, cognition or learning, the effect sizes were relatively large, although the sample sizes were small. At the same time, because of the small sample sizes, some meaningful effects could not be detected. In the other fields, because of the large sample sizes, effects too small to be meaningful could still reach statistical significance. This implies that researchers who could not obtain large enough effect sizes would use larger samples to obtain significant results.

  6. Testing statistical isotropy in cosmic microwave background polarization maps

    NASA Astrophysics Data System (ADS)

    Rath, Pranati K.; Samal, Pramoda Kumar; Panda, Srikanta; Mishra, Debesh D.; Aluri, Pavan K.

    2018-04-01

    We apply our symmetry-based power tensor technique to test the conformity of PLANCK polarization maps with statistical isotropy. On a wide range of angular scales (l = 40-150), our preliminary analysis detects many statistically anisotropic multipoles in the foreground-cleaned full-sky PLANCK polarization maps, viz., COMMANDER and NILC. We also study the effect of residual foregrounds that may still be present in the Galactic plane, using both the common UPB77 polarization mask and the polarization masks specific to each component separation method. However, some of the statistically anisotropic modes still persist, most significantly in the NILC map. We further probed the data for any coherent alignments across multipoles in several bins from the chosen multipole range.

  7. Quantitative analysis of trace levels of surface contamination by X-ray photoelectron spectroscopy Part I: statistical uncertainty near the detection limit.

    PubMed

    Hill, Shannon B; Faradzhev, Nadir S; Powell, Cedric J

    2017-12-01

    We discuss the problem of quantifying common sources of statistical uncertainties for analyses of trace levels of surface contamination using X-ray photoelectron spectroscopy. We examine the propagation of error for peak-area measurements using common forms of linear and polynomial background subtraction including the correlation of points used to determine both background and peak areas. This correlation has been neglected in previous analyses, but we show that it contributes significantly to the peak-area uncertainty near the detection limit. We introduce the concept of relative background subtraction variance (RBSV) which quantifies the uncertainty introduced by the method of background determination relative to the uncertainty of the background area itself. The uncertainties of the peak area and atomic concentration and of the detection limit are expressed using the RBSV, which separates the contributions from the acquisition parameters, the background-determination method, and the properties of the measured spectrum. These results are then combined to find acquisition strategies that minimize the total measurement time needed to achieve a desired detection limit or atomic-percentage uncertainty for a particular trace element. Minimization of data-acquisition time is important for samples that are sensitive to X-ray dose and also for laboratories that need to optimize throughput.
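
    A simplified numpy sketch of the kind of propagation the paper formalizes, assuming Poisson counting statistics and a linear background interpolated between two flanking windows; the RBSV machinery itself is not reproduced, and all counts and window positions are hypothetical:

```python
import numpy as np

def net_peak_area(counts, bg_lo, bg_hi, peak):
    """Net peak area with a linear background drawn between the mean counts
    of two flanking background windows, plus its 1-sigma Poisson uncertainty.
    Windows are (start, stop) channel indices."""
    counts = np.asarray(counts, dtype=float)
    lo, hi = counts[slice(*bg_lo)], counts[slice(*bg_hi)]
    ch = np.arange(*peak)
    x_lo, x_hi = np.mean(np.arange(*bg_lo)), np.mean(np.arange(*bg_hi))

    # Linear background interpolated under the peak from the window means.
    frac = (ch - x_lo) / (x_hi - x_lo)
    background = (1 - frac) * lo.mean() + frac * hi.mean()
    raw = counts[slice(*peak)]
    area = float(np.sum(raw - background))

    # Poisson propagation: variance of the raw sum plus variance of the
    # background estimate (the window means are reused for every peak
    # channel, so their contributions add coherently, not per channel).
    var_bg = (np.sum(1 - frac)**2 * lo.mean() / lo.size
              + np.sum(frac)**2 * hi.mean() / hi.size)
    var = raw.sum() + var_bg
    return area, np.sqrt(var)

rng = np.random.default_rng(1)
spectrum = rng.poisson(200, size=300)        # flat background
spectrum[140:160] += rng.poisson(30, 20)     # weak contaminant peak
a, s = net_peak_area(spectrum, (100, 130), (170, 200), (135, 165))
print(f"net area = {a:.0f} +/- {s:.0f}  (S/N = {a/s:.1f})")
```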

  8. MO-DE-207A-01: Impact of Statistical Weights On Detection of Low-Contrast Details in Model-Based Iterative CT Reconstruction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Noo, F; Guo, Z

    2016-06-15

    Purpose: Penalized-weighted least-square reconstruction has become an important research topic in CT, to reduce dose without affecting image quality. Two components impact image quality in this reconstruction: the statistical weights and the use of an edge-preserving penalty term. We are interested in assessing the influence of statistical weights on their own, without the edge-preserving feature. Methods: The influence of statistical weights on image quality was assessed in terms of low-contrast detail detection using LROC analysis. The task amounted to detecting and localizing a 6-mm lesion with random contrast inside the FORBILD head phantom. A two-alternative forced-choice experiment was used with two human observers performing the task. Reconstructions without and with statistical weights were compared, both using the same quadratic penalty term. The beam energy was set to 30 keV to amplify spatial differences in attenuation and thereby the role of statistical weights. A fan-beam data acquisition geometry was used. Results: Visual inspection of images clearly showed a difference in noise between the two reconstruction methods. As expected, the reconstruction without statistical weights exhibited noise streaks. The other reconstruction appeared better in this aspect, but presented other disturbing noise patterns and artifacts induced by the weights. The LROC analysis yielded the following 95-percent confidence interval for the difference in reader-averaged AUC (reconstruction without weights minus reconstruction with weights): [0.0026, 0.0599]. The mean AUC value was 0.9094. Conclusion: We have investigated the impact of statistical weights without the use of an edge-preserving penalty in penalized weighted least-square reconstruction. A decrease rather than increase in image quality was observed when using statistical weights. Thus, the observers were better able to cope with the noise streaks than with the noise patterns and artifacts induced by the statistical weights.

  9. A novel balloon colonoscope detects significantly more simulated polyps than a standard colonoscope in a colon model.

    PubMed

    Hasan, Nazia; Gross, Seth A; Gralnek, Ian M; Pochapin, Mark; Kiesslich, Ralf; Halpern, Zamir

    2014-12-01

    Although standard colonoscopy is considered the optimal test to detect adenomas, it can have a significant adenoma miss rate. A major contributing factor to high miss rates is the inability to visualize adenomas behind haustral folds and at anatomic flexures. To compare the diagnostic yield of balloon-assisted colonoscopy versus standard colonoscopy in the detection of simulated polyps in a colon model. Prospective, cohort study. International gastroenterology meeting. A colon model composed of elastic material, which mimics the flexible structure of haustral folds, allowing for dynamic responses to balloon inflation, with embedded simulated colon polyps (n = 12 silicone "polyps"). Fifty gastroenterologists were recruited to identify simulated colon polyps in a colon model, first using standard colonoscopy immediately followed by balloon-assisted colonoscopy. Detection of simulated polyps. The median polyp detection rate for all simulated polyps was significantly higher with balloon-assisted as compared with standard colonoscopy (91.7% vs 45.8%, respectively; P < .0001). The significantly higher simulated polyp detection rate with balloon-assisted versus standard colonoscopy was notable both for non-obscured polyps (100.0% vs 75.0%; P < .0001) and obscured polyps (88.0% vs 25.0%; P < .0001). Non-randomized design, use of a colon model, and simulated colon polyps. As compared with standard colonoscopy, balloon-assisted colonoscopy detected significantly more obscured and non-obscured simulated polyps in a colon model. Clinical studies in human participants are being pursued to further evaluate this new colonoscopic technology.

  10. Expression of NF-κB and PTEN in osteosarcoma and its clinical significance

    PubMed Central

    Gong, Teng; Su, Xuetao; Xia, Qun; Wang, Jinggui; Kan, Shilian

    2017-01-01

    We investigated the role of nuclear factor-κB (NF-κB) and phosphatase and tensin homolog deleted in chromosome 10 (PTEN) in the pathogenesis of osteosarcoma and their relationship with prognosis. Immunohistochemistry was used to detect the expression of NF-κB and PTEN in osteosarcoma and adjacent tissues. RT-PCR was used to detect the expression of NF-κB and PTEN mRNA in osteosarcoma and adjacent tissues. Western blotting was used to detect the expression of NF-κB and PTEN protein in osteosarcoma and adjacent tissues and to compare their differences. The positive rate of NF-κB was 75.3% in osteosarcoma and 32.9% in adjacent tissues, while the positive rate of PTEN was 67.1 and 90.4%, respectively; these differences in positive expression were statistically significant. There was a negative correlation between NF-κB and PTEN expression (r=−0.502, p<0.05). Positive versus negative expression of NF-κB and PTEN was significantly associated with five-year survival (p<0.05). At the gene and protein level, osteosarcoma tissues had higher expression of NF-κB and lower expression of PTEN, significantly different from the adjacent tissues. In osteosarcoma, NF-κB is highly expressed while PTEN is expressed at a low level, and the two are negatively correlated. This is of great significance for the early diagnosis of osteosarcoma and for prognosis. PMID:29151913

  11. A comparison of Probability Of Detection (POD) data determined using different statistical methods

    NASA Astrophysics Data System (ADS)

    Fahr, A.; Forsyth, D.; Bullock, M.

    1993-12-01

    Different statistical methods have been suggested for determining probability of detection (POD) data for nondestructive inspection (NDI) techniques. A comparative assessment of various methods of determining POD was conducted using results of three NDI methods obtained by inspecting actual aircraft engine compressor disks which contained service-induced cracks. The study found that the POD and 95 percent confidence curves as a function of crack size, as well as the 90/95 percent crack length, vary depending on the statistical method used and the type of data. The distribution function as well as the parameter estimation procedure used for determining POD and the confidence bound must be included when referencing information such as the 90/95 percent crack length. The POD curves and confidence bounds determined using the range interval method are very dependent on information that is not from the inspection data. The maximum likelihood estimators (MLE) method does not require such information and the POD results are more reasonable. The log-logistic function appears to model POD of hit/miss data relatively well and is easy to implement. The log-normal distribution using MLE provides more realistic POD results and is the preferred method. Although it is more complicated and slower to calculate, it can be implemented in a common spreadsheet program.
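
    A hedged sketch of the log-logistic hit/miss approach using maximum likelihood (logistic regression on log crack size); the data are synthetic, and the 95% lower confidence bound needed for a true 90/95 length is omitted for brevity:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)

# Hypothetical hit/miss inspection data: detection probability rises
# with crack length a (log-logistic in a <=> logistic in log a).
a = rng.uniform(0.5, 8.0, size=200)                   # crack length, mm
true_pod = 1 / (1 + np.exp(-(np.log(a) - np.log(2.0)) / 0.25))
hits = rng.binomial(1, true_pod)

# MLE fit of the log-logistic POD model.
X = sm.add_constant(np.log(a))
fit = sm.Logit(hits, X).fit(disp=False)
b0, b1 = fit.params

# POD curve and the crack length with 90% detection probability (a90).
pod = lambda x: 1 / (1 + np.exp(-(b0 + b1 * np.log(x))))
a90 = np.exp((np.log(0.9 / 0.1) - b0) / b1)
print(f"a90 = {a90:.2f} mm, POD(a90) = {pod(a90):.2f}")
```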

  12. Using scan statistics for congenital anomalies surveillance: the EUROCAT methodology.

    PubMed

    Teljeur, Conor; Kelly, Alan; Loane, Maria; Densem, James; Dolk, Helen

    2015-11-01

    Scan statistics have been used extensively to identify temporal clusters of health events. We describe the temporal cluster detection methodology adopted by the EUROCAT (European Surveillance of Congenital Anomalies) monitoring system. Since 2001, EUROCAT has implemented a variable window width scan statistic for detecting unusual temporal aggregations of congenital anomaly cases. The scan windows are based on numbers of cases rather than being defined by time. The methodology is embedded in the EUROCAT Central Database for annual application to centrally held registry data. The methodology was incrementally adapted to improve its utility and to address statistical issues. Simulation exercises were used to determine the power of the methodology to identify periods of raised risk (of 1-18 months). In order to operationalize the scan methodology, a number of adaptations were needed, including: estimating date of conception as the unit of time; deciding the maximum length (in time) and recency of clusters of interest; reporting of multiple and overlapping significant clusters; replacing the Monte Carlo simulation with a lookup table to reduce computation time; and placing a threshold on underlying population change and estimating the false positive rate by simulation. Exploration of power found that raised-risk periods lasting 1 month are unlikely to be detected except when the relative risk and case counts are high. The variable window width scan statistic is a useful tool for the surveillance of congenital anomalies. Numerous adaptations have improved the utility of the original methodology in the context of temporal cluster detection in congenital anomalies.
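
    A minimal illustration of a case-count-based variable-window scan with a Monte Carlo p-value, in the spirit of the method described above; this is not the EUROCAT implementation, and the case dates, window sizes, and simulation counts are hypothetical:

```python
import numpy as np

def scan_min_span(times, k):
    """Shortest time span covering any k consecutive cases."""
    t = np.sort(times)
    return np.min(t[k - 1:] - t[:len(t) - k + 1])

def case_window_scan(times, horizon, k_values, n_sim=9999, seed=0):
    """Case-count-based scan: for each window size k, compare the observed
    minimum span of k consecutive cases against uniform simulations."""
    rng = np.random.default_rng(seed)
    n = len(times)
    results = {}
    for k in k_values:
        obs = scan_min_span(times, k)
        sims = np.array([scan_min_span(rng.uniform(0, horizon, n), k)
                         for _ in range(n_sim)])
        # One-sided Monte Carlo p-value: how often chance clusters this tightly.
        results[k] = (obs, (1 + np.sum(sims <= obs)) / (n_sim + 1))
    return results

# Hypothetical registry: 60 estimated conception dates (months over 10 years),
# with a short raised-risk period injected around month 70.
rng = np.random.default_rng(3)
cases = np.concatenate([rng.uniform(0, 120, 52), rng.uniform(69, 72, 8)])
for k, (span, p) in case_window_scan(cases, 120, [5, 8, 12]).items():
    print(f"k = {k:2d}: min span = {span:5.1f} months, p = {p:.4f}")
```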

  13. Improving the Crossing-SIBTEST Statistic for Detecting Non-uniform DIF.

    PubMed

    Chalmers, R Philip

    2018-06-01

    This paper demonstrates that, after applying a simple modification to Li and Stout's (Psychometrika 61(4):647-677, 1996) CSIBTEST statistic, an improved variant of the statistic could be realized. It is shown that this modified version of CSIBTEST has a more direct association with the SIBTEST statistic presented by Shealy and Stout (Psychometrika 58(2):159-194, 1993). In particular, the asymptotic sampling distributions and general interpretation of the effect size estimates are the same for SIBTEST and the new CSIBTEST. Given the more natural connection to SIBTEST, it is shown that Li and Stout's hypothesis testing approach is insufficient for CSIBTEST; thus, an improved hypothesis testing procedure is required. Based on the presented arguments, a new chi-squared-based hypothesis testing approach is proposed for the modified CSIBTEST statistic. Positive results from a modest Monte Carlo simulation study strongly suggest the original CSIBTEST procedure and randomization hypothesis testing approach should be replaced by the modified statistic and hypothesis testing method.

  14. Statistical discrimination of induced and tectonic earthquake sequences in Central and Eastern US based on waveform detected catalogs

    NASA Astrophysics Data System (ADS)

    Meng, X.; Daniels, C.; Smith, E.; Peng, Z.; Chen, X.; Wagner, L. S.; Fischer, K. M.; Hawman, R. B.

    2015-12-01

    Since 2001, the number of M>3 earthquakes has increased significantly in the Central and Eastern United States (CEUS), likely due to waste-water injection; these are also known as "induced earthquakes" [Ellsworth, 2013]. Because induced earthquakes are driven by short-term external forcing, they may behave like earthquake swarms, which are not well characterized by branching point-process models such as the Epidemic Type Aftershock Sequence (ETAS) model [Ogata, 1988]. In this study we focus on the 02/15/2014 M4.1 South Carolina and the 06/16/2014 M4.3 Oklahoma earthquakes, which likely represent intraplate tectonic and induced events, respectively. For the South Carolina event, only one M3.0 aftershock is identified in the ANSS catalog, which may reflect that catalog's lack of low-magnitude events. We apply a recently developed matched filter technique to detect earthquakes from 02/08/2014 to 02/22/2014 around the epicentral region. 15 seismic stations (both permanent and temporary USArray networks) within 100 km of the mainshock are used for detection. The mainshock and aftershock are used as templates for the initial detection. Newly detected events are employed as new templates, and the same detection procedure repeats until no new events can be added. Overall we have identified more than 10 events, including one foreshock that occurred ~11 min before the M4.1 mainshock. However, the number of aftershocks is still much smaller than predicted by the modified Båth's law. For the Oklahoma event, we use 1270 events from the ANSS catalog and 182 events from a relocated catalog as templates to scan through continuous recordings from 3 days before to 7 days after the mainshock. 12 seismic stations within the vicinity of the mainshock are included in the study. After obtaining more complete catalogs for both sequences, we plan to compare the statistical parameters (e.g., b, a, K, and p values) between the two sequences, as well as their spatial-temporal migration patterns, which may help discriminate induced from tectonic sequences.
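
    A bare-bones sketch of matched-filter detection as described above: slide a template waveform along continuous data, compute the normalized cross-correlation, and declare detections above a median-absolute-deviation threshold. All waveforms and the 9×MAD threshold here are illustrative:

```python
import numpy as np

def normalized_xcorr(template, data):
    """Sliding normalized cross-correlation of a template against data."""
    m = len(template)
    t = (template - template.mean()) / template.std()
    out = np.empty(len(data) - m + 1)
    for i in range(len(out)):
        w = data[i:i + m]
        s = w.std()
        out[i] = (t * (w - w.mean())).sum() / (m * s) if s > 0 else 0.0
    return out

rng = np.random.default_rng(4)
template = rng.normal(size=200)            # waveform of a known event
data = rng.normal(size=20000) * 0.5
for onset in (3000, 8200, 15500):          # hidden repeats of the event
    data[onset:onset + 200] += 0.6 * template

cc = normalized_xcorr(template, data)
# Detection threshold: a fixed multiple of the median absolute deviation,
# a common choice in matched-filter earthquake studies.
thr = 9 * np.median(np.abs(cc - np.median(cc)))
detections = np.flatnonzero(cc > thr)
# Collapse adjacent samples above threshold into single event onsets.
events = detections[np.r_[True, np.diff(detections) > 50]]
print("detected onsets:", events)
```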

  15. Typhoid fever acquired in the United States, 1999-2010: epidemiology, microbiology, and use of a space-time scan statistic for outbreak detection.

    PubMed

    Imanishi, M; Newton, A E; Vieira, A R; Gonzalez-Aviles, G; Kendall Scott, M E; Manikonda, K; Maxwell, T N; Halpin, J L; Freeman, M M; Medalla, F; Ayers, T L; Derado, G; Mahon, B E; Mintz, E D

    2015-08-01

    Although rare, typhoid fever cases acquired in the United States continue to be reported. Detection and investigation of outbreaks in these domestically acquired cases offer opportunities to identify chronic carriers. We searched surveillance and laboratory databases for domestically acquired typhoid fever cases, used a space-time scan statistic to identify clusters, and classified clusters as outbreaks or non-outbreaks. From 1999 to 2010, domestically acquired cases accounted for 18% of 3373 reported typhoid fever cases; their isolates were less often multidrug-resistant (2% vs. 15%) compared to isolates from travel-associated cases. We identified 28 outbreaks and two possible outbreaks within 45 space-time clusters of ⩾2 domestically acquired cases, including three outbreaks involving ⩾2 molecular subtypes. The approach detected seven of the ten outbreaks published in the literature or reported to CDC. Although this approach did not definitively identify any previously unrecognized outbreaks, it showed the potential to detect outbreaks of typhoid fever that may escape detection by routine analysis of surveillance data. Sixteen outbreaks had been linked to a carrier. Every case of typhoid fever acquired in a non-endemic country warrants thorough investigation. Space-time scan statistics, together with shoe-leather epidemiology and molecular subtyping, may improve outbreak detection.

  16. Statistical behavior of ten million experimental detection limits

    NASA Astrophysics Data System (ADS)

    Voigtman, Edward; Abraham, Kevin T.

    2011-02-01

    Using a lab-constructed laser-excited fluorimeter, together with bootstrapping methodology, the authors have generated many millions of experimental linear calibration curves for the detection of rhodamine 6G tetrafluoroborate in ethanol solutions. The detection limits computed from them are in excellent agreement with both previously published theory and with comprehensive Monte Carlo computer simulations. Currie decision levels and Currie detection limits, each in the theoretical, chemical content domain, were found to be simply scaled reciprocals of the non-centrality parameter of the non-central t distribution that characterizes univariate linear calibration curves that have homoscedastic, additive Gaussian white noise. Accurate and precise estimates of the theoretical, content domain Currie detection limit for the experimental system, with 5% (each) probabilities of false positives and false negatives, are presented.
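
    A hedged sketch of content-domain Currie limits from a univariate linear calibration with homoscedastic Gaussian noise; for simplicity it uses normal quantiles, whereas the paper's exact treatment rests on the non-central t distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)

# Hypothetical calibration: fluorescence response vs. concentration,
# homoscedastic additive Gaussian white noise (the noise model above).
conc = np.tile(np.linspace(0, 10, 6), 8)
resp = 3.0 + 5.0 * conc + rng.normal(0, 2.0, conc.size)

fit = stats.linregress(conc, resp)
resid = resp - (fit.intercept + fit.slope * conc)
s = np.sqrt(np.sum(resid**2) / (conc.size - 2))   # noise estimate

# Currie limits in the content (concentration) domain with 5% false
# positives and 5% false negatives; z quantiles are an approximation
# that ignores the calibration-slope uncertainty treated in the paper.
z = stats.norm.ppf(0.95)
x_c = z * s / fit.slope            # Currie decision level
x_d = 2 * z * s / fit.slope        # Currie detection limit (~3.29 s/slope)
print(f"decision level = {x_c:.3f}, detection limit = {x_d:.3f}")
```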

  17. STATISTICS OF GAMMA-RAY POINT SOURCES BELOW THE FERMI DETECTION LIMIT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Malyshev, Dmitry; Hogg, David W., E-mail: dm137@nyu.edu

    2011-09-10

    An analytic relation between the statistics of photons in pixels and the number counts of multi-photon point sources is used to constrain the distribution of gamma-ray point sources below the Fermi detection limit at energies above 1 GeV and at latitudes below and above 30 deg. The derived source-count distribution is consistent with the distribution found by the Fermi Collaboration based on the first Fermi point-source catalog. In particular, we find that the contribution of resolved and unresolved active galactic nuclei (AGNs) to the total gamma-ray flux is below 20%-25%. In the best-fit model, the AGN-like point-source fraction is 17% ± 2%. Using the fact that the Galactic emission varies across the sky while the extragalactic diffuse emission is isotropic, we put a lower limit of 51% on Galactic diffuse emission and an upper limit of 32% on the contribution from extragalactic weak sources, such as star-forming galaxies. Possible systematic uncertainties are discussed.

  18. Detection of Cutting Tool Wear using Statistical Analysis and Regression Model

    NASA Astrophysics Data System (ADS)

    Ghani, Jaharah A.; Rizal, Muhammad; Nuawi, Mohd Zaki; Haron, Che Hassan Che; Ramli, Rizauddin

    2010-10-01

    This study presents a new method for detecting cutting tool wear based on measured cutting force signals. A statistical method, the Integrated Kurtosis-based Algorithm for Z-Filter technique (I-kaz), was used to develop a regression model and a 3D graphic presentation of the I-kaz 3D coefficient during the machining process. The machining tests were carried out on a Colchester Master Tornado T4 CNC turning machine in dry cutting conditions. A Kistler 9255B dynamometer was used to measure the cutting force signals, which were transmitted, analyzed, and displayed in the DasyLab software. Various force signals from the machining operation were analyzed, each with its own I-kaz 3D coefficient. This coefficient was examined and its relationship with flank wear land (VB) was determined. A regression model was developed from this relationship, and the results show that the I-kaz 3D coefficient value decreases as tool wear increases. The results can then be used for real-time tool wear monitoring.

  19. Fighting bias with statistics: Detecting gender differences in responses to items on a preschool science assessment

    NASA Astrophysics Data System (ADS)

    Greenberg, Ariela Caren

    Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach, the Mantel-Haenszel log-odds ratio (MH-LOR), to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF together in multiple-choice items. A total of 4 items with both DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step in DIF and DDF studies (Camilli & Shepard, 1994). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
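
    The MH-LOR statistic used in Study I reduces to a short computation; a sketch with hypothetical stratified counts (a log-odds ratio near zero indicates no uniform DIF):

```python
import numpy as np

def mh_log_odds(strata):
    """Mantel-Haenszel common odds ratio (log scale) for DIF screening.
    Each stratum is a 2x2 table [[ref_correct, ref_wrong],
                                 [foc_correct, foc_wrong]]."""
    num = den = 0.0
    for (a, b), (c, d) in strata:
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    return np.log(num / den)

# Hypothetical counts in four total-score strata (reference vs. focal group).
strata = [
    [[30, 20], [25, 25]],
    [[45, 15], [40, 20]],
    [[50, 10], [42, 18]],
    [[55,  5], [50, 10]],
]
lor = mh_log_odds(strata)
print(f"MH log-odds ratio = {lor:.3f}")
```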

  20. The Development of Statistical Models for Predicting Surgical Site Infections in Japan: Toward a Statistical Model-Based Standardized Infection Ratio.

    PubMed

    Fukuda, Haruhisa; Kuroki, Manabu

    2016-03-01

    To develop and internally validate a surgical site infection (SSI) prediction model for Japan. Retrospective observational cohort study. We analyzed surveillance data submitted to the Japan Nosocomial Infections Surveillance system for patients who had undergone target surgical procedures from January 1, 2010, through December 31, 2012. Logistic regression analyses were used to develop statistical models for predicting SSIs. An SSI prediction model was constructed for each of the procedure categories by statistically selecting the appropriate risk factors from among the collected surveillance data and determining their optimal categorization. Standard bootstrapping techniques were applied to assess potential overfitting. The C-index was used to compare the predictive performances of the new statistical models with those of models based on conventional risk index variables. The study sample comprised 349,987 cases from 428 participant hospitals throughout Japan, and the overall SSI incidence was 7.0%. The C-indices of the new statistical models were significantly higher than those of the conventional risk index models in 21 (67.7%) of the 31 procedure categories (P<.05). No significant overfitting was detected. Japan-specific SSI prediction models were shown to generally have higher accuracy than conventional risk index models. These new models may have applications in assessing hospital performance and identifying high-risk patients in specific procedure categories.

  1. Optimal statistical damage detection and classification in an experimental wind turbine blade using minimum instrumentation

    NASA Astrophysics Data System (ADS)

    Hoell, Simon; Omenzetter, Piotr

    2017-04-01

    The increasing demand for carbon-neutral energy in a challenging economic environment is a driving factor for erecting ever larger wind turbines in harsh environments using novel wind turbine blade (WTB) designs characterized by high flexibility and lower buckling capacity. To counteract the resulting increase in operation and maintenance costs, efficient structural health monitoring systems can be employed to prevent dramatic failures and to schedule maintenance actions according to the true structural state. This paper presents a novel methodology for classifying structural damage using vibrational responses from a single sensor. The method is based on statistical classification using Bayes' theorem and an advanced statistic, which allows controlling the performance by varying the number of samples that represent the current state. This is done for multivariate damage-sensitive features (DSFs) defined as partial autocorrelation coefficients (PACCs) estimated from vibrational responses, and principal component analysis scores from PACCs. Additionally, optimal DSFs are composed not only for damage classification but also for damage detection based on binary statistical hypothesis testing, where feature selections are found with a fast forward procedure. The method is applied to laboratory experiments with a small-scale WTB with wind-like excitation and non-destructive damage scenarios. The obtained results demonstrate the advantages of the proposed procedure and are promising for future applications of vibration-based structural health monitoring in WTBs.
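
    A hedged sketch of the feature pipeline described above, partial autocorrelation coefficients compressed by PCA, using a simulated single-sensor AR(2) response whose coefficients shift slightly under "damage"; the statistical classification stage is omitted:

```python
import numpy as np
from statsmodels.tsa.stattools import pacf
from sklearn.decomposition import PCA

rng = np.random.default_rng(6)

def simulate_response(damaged, n=4000):
    """Hypothetical single-sensor vibration response: an AR(2) system whose
    coefficients shift slightly when the structure is damaged."""
    phi = (1.45, -0.85) if damaged else (1.5, -0.9)
    x = np.zeros(n)
    e = rng.normal(size=n)
    for t in range(2, n):
        x[t] = phi[0] * x[t - 1] + phi[1] * x[t - 2] + e[t]
    return x

# Damage-sensitive features: first 10 partial autocorrelation coefficients
# per measurement, compressed by PCA.
X = np.array([pacf(simulate_response(damaged=i >= 20), nlags=10)[1:]
              for i in range(40)])            # 20 healthy, 20 damaged
scores = PCA(n_components=2).fit_transform(X)
print("healthy mean PC1:", scores[:20, 0].mean().round(3),
      "| damaged mean PC1:", scores[20:, 0].mean().round(3))
```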

  2. A generalised significance test for individual communities in networks.

    PubMed

    Kojaku, Sadamori; Masuda, Naoki

    2018-05-09

    Many empirical networks have community structure, in which nodes are densely interconnected within each community (i.e., a group of nodes) and sparsely across different communities. Like other local and meso-scale structures of networks, communities are generally heterogeneous in various aspects such as size, density of edges, connectivity to other communities and significance. In the present study, we propose a method to statistically test the significance of individual communities in a given network. Compared to previous methods, the present algorithm is unique in that it accepts different community-detection algorithms and the corresponding quality function for single communities. The present method requires that the quality of each community can be quantified and that community detection is performed as optimisation of such a quality function summed over the communities. Various community-detection algorithms, including modularity maximisation and graph partitioning, meet this criterion. Our method estimates a distribution of the quality function for randomised networks to calculate a likelihood of each community in the given network. We illustrate our algorithm on synthetic and empirical networks.
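
    A minimal sketch of the general recipe, quantify a community's quality and compare it against randomised networks, using internal edge count as the quality function and degree-preserving edge swaps for randomisation; the paper's specific quality functions and null model may differ:

```python
import networkx as nx
import numpy as np

def community_quality(g, nodes):
    """Quality of one community: its internal edge count."""
    return g.subgraph(nodes).number_of_edges()

g = nx.karate_club_graph()
community = [n for n, d in g.nodes(data=True) if d["club"] == "Mr. Hi"]

obs = community_quality(g, community)
null = []
for i in range(500):
    r = g.copy()
    # Degree-preserving randomisation via double edge swaps.
    nx.double_edge_swap(r, nswap=10 * r.number_of_edges(),
                        max_tries=10**5, seed=i)
    null.append(community_quality(r, community))

# Empirical p-value: likelihood of the observed quality under the null.
p = (1 + np.sum(np.array(null) >= obs)) / (len(null) + 1)
print(f"internal edges = {obs}, null mean = {np.mean(null):.1f}, p = {p:.3f}")
```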

  3. Statistically based splicing detection reveals neural enrichment and tissue-specific induction of circular RNA during human fetal development.

    PubMed

    Szabo, Linda; Morey, Robert; Palpant, Nathan J; Wang, Peter L; Afari, Nastaran; Jiang, Chuan; Parast, Mana M; Murry, Charles E; Laurent, Louise C; Salzman, Julia

    2015-06-16

    The pervasive expression of circular RNA is a recently discovered feature of gene expression in highly diverged eukaryotes, but the functions of most circular RNAs are still unknown. Computational methods to discover and quantify circular RNA are essential. Moreover, discovering biological contexts where circular RNAs are regulated will shed light on potential functional roles they may play. We present a new algorithm that increases the sensitivity and specificity of circular RNA detection by discovering and quantifying circular and linear RNA splicing events at both annotated and un-annotated exon boundaries, including intergenic regions of the genome, with high statistical confidence. Unlike approaches that rely on read count and exon homology to determine confidence in prediction of circular RNA expression, our algorithm uses a statistical approach. Using our algorithm, we unveiled striking induction of general and tissue-specific circular RNAs, including in the heart and lung, during human fetal development. We discover regions of the human fetal brain, such as the frontal cortex, with marked enrichment for genes where circular RNA isoforms are dominant. The vast majority of circular RNA production occurs at major spliceosome splice sites; however, we find the first examples of developmentally induced circular RNAs processed by the minor spliceosome, and an enriched propensity of minor spliceosome donors to splice into circular RNA at un-annotated, rather than annotated, exons. Together, these results suggest a potentially significant role for circular RNA in human development.

  4. Empirical Bayes scan statistics for detecting clusters of disease risk variants in genetic studies.

    PubMed

    McCallum, Kenneth J; Ionita-Laza, Iuliana

    2015-12-01

    Recent developments of high-throughput genomic technologies offer an unprecedented detailed view of the genetic variation in various human populations, and promise to lead to significant progress in understanding the genetic basis of complex diseases. Despite this tremendous advance in data generation, it remains very challenging to analyze and interpret these data due to their sparse and high-dimensional nature. Here, we propose novel applications and new developments of empirical Bayes scan statistics to identify genomic regions significantly enriched with disease risk variants. We show that the proposed empirical Bayes methodology can be substantially more powerful than existing scan statistic methods, especially in the presence of many non-disease risk variants and in situations where there is a mixture of risk and protective variants. Furthermore, the empirical Bayes approach has greater flexibility to accommodate covariates such as functional prediction scores and additional biomarkers. As proof-of-concept we apply the proposed methods to a whole-exome sequencing study for autism spectrum disorders and identify several promising candidate genes. © 2015, The International Biometric Society.

  5. CorSig: a general framework for estimating statistical significance of correlation and its application to gene co-expression analysis.

    PubMed

    Wang, Hong-Qiang; Tsai, Chung-Jui

    2013-01-01

    With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test whether an observed association is reliable both statistically and biologically. We present here a new method, CorSig, for statistical inference of correlation significance. CorSig is based on a biology-informed null hypothesis, i.e., testing whether the true PCC (ρ) between two variables is statistically larger than a user-specified PCC cutoff (τ), as opposed to the simple null hypothesis of ρ = 0 in existing methods, i.e., testing whether an association can be declared without a threshold. CorSig incorporates Fisher's Z transformation of the observed PCC (r), which facilitates use of standard techniques for p-value computation and multiple testing corrections. We compared CorSig against two methods: one uses a minimum PCC cutoff while the other (Zhu's procedure) controls correlation strength and statistical significance in two discrete steps. CorSig consistently outperformed these methods in various simulation data scenarios by balancing between false positives and false negatives. When tested on real-world Populus microarray data, CorSig effectively identified co-expressed genes in the flavonoid pathway, and discriminated between closely related gene family members for their differential association with flavonoid and lignin pathways. The p-values obtained by CorSig can be used as a stand-alone parameter for stratification of co-expressed genes according to their correlation strength in lieu of an arbitrary cutoff. CorSig requires one single tunable parameter, and can be readily extended to other correlation measures. Thus, CorSig should be useful for a wide range of applications, particularly for network analysis of high-dimensional genomic data. A web server for
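
    The core of the test is compact; a sketch of the biology-informed null using Fisher's Z (the cutoff τ and sample values are illustrative, and the full CorSig pipeline with multiple-testing correction is omitted):

```python
import numpy as np
from scipy import stats

def cor_sig(r, n, tau=0.5):
    """One-sided test of H0: rho <= tau vs H1: rho > tau using Fisher's
    Z transformation, following the biology-informed null described above."""
    z = (np.arctanh(r) - np.arctanh(tau)) * np.sqrt(n - 3)
    return stats.norm.sf(z)

# An observed PCC of .62 from 30 samples: clearly significant against
# rho = 0, but not reliably above a biologically meaningful cutoff of .5.
print(f"p (rho > 0.0): {cor_sig(0.62, 30, tau=0.0):.4f}")
print(f"p (rho > 0.5): {cor_sig(0.62, 30, tau=0.5):.4f}")
```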

  6. Statistics of software vulnerability detection in certification testing

    NASA Astrophysics Data System (ADS)

    Barabanov, A. V.; Markov, A. S.; Tsirlov, V. L.

    2018-05-01

    The paper discusses practical aspects of introducing methods for detecting software vulnerabilities into the day-to-day activities of an accredited testing laboratory. It presents validation results for the vulnerability detection methods, obtained in studies of open source software and of software submitted as test objects for certification testing under information security requirements, including software for communication networks. Results of the study show the distribution of identified vulnerabilities by attack type, country of origin, programming language used in development, vulnerability detection method, etc. The experience of foreign information security certification systems related to the detection of vulnerabilities in certified software is analyzed. The main conclusion of the study is the need to implement secure software development practices in the development life cycle. Conclusions and recommendations for testing laboratories on implementing the vulnerability analysis methods are laid down.

  7. Quantifying spatial and temporal trends in beach-dune volumetric changes using spatial statistics

    NASA Astrophysics Data System (ADS)

    Eamer, Jordan B. R.; Walker, Ian J.

    2013-06-01

    Spatial statistics are generally underutilized in coastal geomorphology, despite offering great potential for identifying and quantifying spatial-temporal trends in landscape morphodynamics. In particular, local Moran's I_i provides a statistical framework for detecting clusters of significant change in an attribute (e.g., surface erosion or deposition) and quantifying how this changes over space and time. This study analyzes and interprets spatial-temporal patterns in sediment volume changes in a beach-foredune-transgressive dune complex following removal of invasive marram grass (Ammophila spp.). Results are derived by detecting significant changes in post-removal repeat DEMs derived from topographic surveys and airborne LiDAR. The study site was separated into discrete, linked geomorphic units (beach, foredune, transgressive dune complex) to facilitate sub-landscape-scale analysis of volumetric change and sediment budget responses. Difference surfaces derived from a pixel-subtraction algorithm between interval DEMs and the LiDAR baseline DEM were filtered using the local Moran's I_i method and two different spatial weights (1.5 and 5 m) to detect statistically significant change. Moran's I_i results were compared with those derived from a more spatially uniform statistical method that uses a simpler Student's t distribution threshold for change detection. Morphodynamic patterns and volumetric estimates were similar between the uniform geostatistical method and Moran's I_i at a spatial weight of 5 m, while the smaller spatial weight (1.5 m) consistently indicated volumetric changes of lesser magnitude. The larger 5 m spatial weight was most representative of broader site morphodynamics and spatial patterns, while the smaller spatial weight provided volumetric changes consistent with field observations. All methods showed foredune deflation immediately following removal, with increased sediment volumes into the spring via deposition at the crest and on lobes in the lee.
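
    A hedged numpy sketch of local Moran's I_i on a grid of DEM-difference values with a binary distance-band weight, analogous to the spatial weights above; permutation-based significance testing is omitted and the grid and band are hypothetical:

```python
import numpy as np

def local_morans_i(values, coords, band=1.5):
    """Local Moran's I_i with binary distance-band weights, row-standardised.
    values: change attribute per cell (e.g., elevation difference);
    coords: (n, 2) array of cell centres."""
    z = (values - values.mean()) / values.std()
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = ((d > 0) & (d <= band)).astype(float)
    w /= np.maximum(w.sum(axis=1, keepdims=True), 1)   # row-standardise
    return z * (w @ z)

# Hypothetical 20x20 grid of DEM-difference values with a deposition patch.
rng = np.random.default_rng(8)
xx, yy = np.meshgrid(np.arange(20.0), np.arange(20.0))
coords = np.column_stack([xx.ravel(), yy.ravel()])
dz = rng.normal(0, 0.05, 400)
patch = (coords[:, 0] < 6) & (coords[:, 1] < 6)
dz[patch] += 0.3                     # cluster of deposition

ii = local_morans_i(dz, coords, band=1.5)
print(f"mean I_i inside patch: {ii[patch].mean():.2f}, "
      f"outside: {ii[~patch].mean():.2f}")
```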

  8. Significance of Viable but Nonculturable Escherichia coli: Induction, Detection, and Control.

    PubMed

    Ding, Tian; Suo, Yuanjie; Xiang, Qisen; Zhao, Xihong; Chen, Shiguo; Ye, Xingqian; Liu, Donghong

    2017-03-28

    Diseases caused by foodborne or waterborne pathogens are emerging. Many pathogens can enter the viable but nonculturable (VBNC) state, a survival strategy adopted when exposed to harsh environmental stresses. Pathogens in the VBNC state can evade conventional microbiological detection methods, posing a significant potential health risk. Therefore, controlling VBNC bacteria in food processing and the environment is of great importance. As a typical gram-negative bacterium, Escherichia coli (E. coli) is a widespread foodborne and waterborne pathogen that, like other gram-negative bacteria, is able to enter the VBNC state under extreme conditions. VBNC E. coli can recover both culturability and pathogenicity, which may pose a potential health risk. This review describes the concrete factors (nonthermal treatments, chemical agents, and environmental factors) that induce E. coli into the VBNC state, the conditions or stimuli required for resuscitation of VBNC E. coli, and the methods for detecting VBNC E. coli. Furthermore, the mechanisms of the genes and proteins involved in the VBNC state of E. coli are also discussed.

  9. Testing statistical self-similarity in the topology of river networks

    USGS Publications Warehouse

    Troutman, Brent M.; Mantilla, Ricardo; Gupta, Vijay K.

    2010-01-01

    Recent work has demonstrated that the topological properties of real river networks deviate significantly from predictions of Shreve's random model. At the same time the property of mean self-similarity postulated by Tokunaga's model is well supported by data. Recently, a new class of network model called random self-similar networks (RSN) that combines self-similarity and randomness has been introduced to replicate important topological features observed in real river networks. We investigate if the hypothesis of statistical self-similarity in the RSN model is supported by data on a set of 30 basins located across the continental United States that encompass a wide range of hydroclimatic variability. We demonstrate that the generators of the RSN model obey a geometric distribution, and self-similarity holds in a statistical sense in 26 of these 30 basins. The parameters describing the distribution of interior and exterior generators are tested to be statistically different and the difference is shown to produce the well-known Hack's law. The inter-basin variability of RSN parameters is found to be statistically significant. We also test generator dependence on two climatic indices, mean annual precipitation and radiative index of dryness. Some indication of climatic influence on the generators is detected, but this influence is not statistically significant with the sample size available. Finally, two key applications of the RSN model to hydrology and geomorphology are briefly discussed.

  10. Energy Monitoring and Targeting as diagnosis; Applying work analysis to adapt a statistical change detection strategy using representation aiding

    NASA Astrophysics Data System (ADS)

    Hilliard, Antony

    Energy Monitoring and Targeting (M&T) is a well-established business process that develops information about utility energy consumption in a business or institution. While M&T has persisted as a worthwhile energy conservation support activity, it has not been widely adopted. This dissertation explains M&T challenges in terms of diagnosing and controlling energy consumption, informed by a naturalistic field study of M&T work. A Cognitive Work Analysis of M&T identifies structures that diagnosis can search, information flows unsupported in canonical support tools, and opportunities to extend the most popular tool for M&T: Cumulative Sum of Residuals (CUSUM) charts. A design application outlines how CUSUM charts were augmented with a more contemporary statistical change detection strategy, Recursive Parameter Estimates, modified to better suit the M&T task using Representation Aiding principles. The design was experimentally evaluated in a controlled M&T synthetic task, and was shown to significantly improve diagnosis performance.
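
    A minimal sketch of the CUSUM-of-residuals chart at the heart of M&T: fit a baseline regression of consumption on a weather driver, then accumulate residuals so that a sustained drift flags a change in consumption. All data below are simulated:

```python
import numpy as np

rng = np.random.default_rng(9)

# Hypothetical 36 months of energy use driven by heating degree days.
degree_days = rng.uniform(100, 600, 36)
energy = 50 + 0.8 * degree_days + rng.normal(0, 15, 36)
energy[24:] += 60                     # simulated fault from month 25

# Baseline model from the first 12 months, then CUSUM of residuals.
slope, intercept = np.polyfit(degree_days[:12], energy[:12], 1)
residuals = energy - (intercept + slope * degree_days)
cusum = np.cumsum(residuals)

# A roughly flat CUSUM means "on target"; a persistent upward drift
# after month 24 reveals the consumption change.
print("CUSUM at months 12/24/36:",
      [round(cusum[i], 1) for i in (11, 23, 35)])
```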

  12. The uriscreen test to detect significant asymptomatic bacteriuria during pregnancy.

    PubMed

    Teppa, Roberto J; Roberts, James M

    2005-01-01

    Asymptomatic bacteriuria (ASB) occurs in 2-11% of pregnancies and clearly predisposes to the development of acute pyelonephritis, which, in turn, poses risks to mother and fetus. Treatment of bacteriuria during pregnancy reduces the incidence of pyelonephritis. Therefore, it is recommended to screen for ASB at the first prenatal visit. The gold standard for detection of bacteriuria during pregnancy is urine culture, but this test is expensive, time-consuming, and labor-intensive. To determine the reliability of an enzymatic urine screening test (Uriscreen; Savyon Diagnostics, Ashdod, Israel) for detecting ASB in pregnancy. Catheterized urine samples were collected from 150 women who had routine prenatal screening for ASB. Patients with urinary symptoms, active vaginal bleeding, or who were previously on antibiotic therapy were excluded from the study. Sensitivity, specificity, and the positive and negative predictive values for the Uriscreen were estimated using urine culture as the criterion standard. Urine cultures were considered positive if they grew >10^5 colony-forming units of a single uropathogen. Twenty-eight women (18.7%) had urine culture results indicating significant bacteriuria, and 17 of these 28 specimens had positive enzyme activity. Of 122 samples with no growth, 109 had negative enzyme activity. Sensitivity, specificity, and positive and negative predictive values for the Uriscreen test were 60.7% (±18.1), 89.3% (±5.6), 56.6%, and 90.8%, respectively. The Uriscreen test had inadequate sensitivity for rapid screening of bacteriuria in pregnancy.

  13. No significant association between prenatal exposure to poliovirus epidemics and psychosis.

    PubMed

    Cahill, Matthew; Chant, David; Welham, Joy; McGrath, John

    2002-06-01

    To examine the association between prenatal exposure to poliovirus infection and later development of schizophrenia or affective psychosis in a Southern Hemisphere psychiatric register. We calculated rates of poliomyelitis cases per 10 000 background population and rates for schizophrenia (n = 6078) and affective psychosis (n = 3707) per 10 000 births for the period 1930-1964. Empirically weighted regression was used to measure the association between a given psychosis birth rate and a poliomyelitis epidemic during gestation. There was no statistically significant association between exposure to a poliomyelitis epidemic during gestation and subsequent development of schizophrenia or affective psychosis. The lack of a consistent statistically significant association between poliovirus epidemics and schizophrenia suggests that either poliovirus may have a small effect which is only detectable with large data sets and/or the effect may be modified by location. Further investigation of such inconsistencies may help elucidate candidate risk-modifying factors for schizophrenia.

  14. Statistical trend analysis and extreme distribution of significant wave height from 1958 to 1999 - an application to the Italian Seas

    NASA Astrophysics Data System (ADS)

    Martucci, G.; Carniel, S.; Chiggiato, J.; Sclavo, M.; Lionello, P.; Galati, M. B.

    2009-09-01

    The study is a statistical analysis of sea state time series derived using the wave model WAM forced by the ERA-40 dataset in selected areas near the Italian coasts. For the period 1 January 1958 to 31 December 1999 the analysis yields: (i) the existence of a negative trend in the annual- and winter-averaged sea state heights; (ii) the existence of a turning point in the late 1970s in the annual-averaged trend of sea state heights at a site in the Northern Adriatic Sea; (iii) the overall absence of a significant trend in the annual-averaged mean durations of sea states over thresholds; (iv) the assessment of the extreme values on a time scale of a thousand years. The analysis uses two methods to obtain samples of extremes from the independent sea states: the r-largest annual maxima and the peak-over-threshold. The two methods show statistical differences in retrieving the return values and, more generally, in describing the significant wave field. The study shows the existence of decadal negative trends in the significant wave heights and thereby conveys useful information on the wave climatology of the Italian seas during the second half of the 20th century.
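
    A hedged sketch of the peak-over-threshold method mentioned above: fit a generalized Pareto distribution to threshold excesses and extrapolate return levels to a thousand-year scale. The wave-height data, threshold choice, and record length are synthetic stand-ins:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)

# Hypothetical 42 years of significant wave heights (3-hourly sea states).
years = 42
hs = rng.weibull(1.5, size=years * 2920) * 1.2

# Peak-over-threshold: fit a generalized Pareto to threshold excesses.
u = np.quantile(hs, 0.995)
excess = hs[hs > u] - u
xi, _, sigma = stats.genpareto.fit(excess, floc=0)
rate = excess.size / years                     # exceedances per year

def return_level(T):
    """T-year return level from the fitted GPD (valid for xi != 0)."""
    return u + sigma / xi * ((rate * T) ** xi - 1)

for T in (50, 100, 1000):
    print(f"{T:>4d}-yr significant wave height: {return_level(T):.2f} m")
```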

  15. Evaluation of a regional monitoring program's statistical power to detect temporal trends in forest health indicators

    USGS Publications Warehouse

    Perles, Stephanie J.; Wagner, Tyler; Irwin, Brian J.; Manning, Douglas R.; Callahan, Kristina K.; Marshall, Matthew R.

    2014-01-01

    Forests are socioeconomically and ecologically important ecosystems that are exposed to a variety of natural and anthropogenic stressors. As such, monitoring forest condition and detecting temporal changes therein remain critical to sound public and private forestland management. The National Park Service's Vital Signs monitoring program collects information on many forest health indicators, including species richness, cover by exotics, browse pressure, and forest regeneration. We applied a mixed-model approach to partition variability in data for 30 forest health indicators collected from several national parks in the eastern United States. We then used the estimated variance components in a simulation model to evaluate trend detection capabilities for each indicator. We investigated the extent to which the following factors affected the ability to detect trends: (a) sample design: using a simple panel versus a connected panel design, (b) effect size: increasing trend magnitude, (c) sample size: varying the number of plots sampled each year, and (d) stratified sampling: post-stratifying plots into vegetation domains. Statistical power varied among indicators; however, indicators that measured the proportion of a total yielded higher power when compared to indicators that measured absolute or average values. In addition, the total variability for an indicator appeared to influence power to detect temporal trends more than how total variance was partitioned among spatial and temporal sources. Based on these analyses and the monitoring objectives of the Vital Signs program, the current sampling design is likely overly intensive for detecting a 5% trend per year for all indicators and is appropriate for detecting a 1% trend per year in most indicators.
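
    A simplified Monte Carlo power sketch in the spirit of the paper's variance-component simulations, with site, year, and residual variability and a linear trend tested on annual means; the variance values and design are hypothetical, not the program's estimates:

```python
import numpy as np
from scipy import stats

def trend_power(n_plots=30, n_years=10, trend=0.05, sd_site=0.3,
                sd_year=0.1, sd_resid=0.4, alpha=0.05, n_sim=1000, seed=0):
    """Monte Carlo power to detect a linear trend in a plot-revisit design,
    with variability split into site, year, and residual components
    (a simplified stand-in for the paper's mixed-model simulation)."""
    rng = np.random.default_rng(seed)
    years = np.arange(n_years)
    hits = 0
    for _ in range(n_sim):
        site = rng.normal(0, sd_site, n_plots)[:, None]
        year = rng.normal(0, sd_year, n_years)[None, :]
        noise = rng.normal(0, sd_resid, (n_plots, n_years))
        y = trend * years + site + year + noise
        # Test the annual means for trend (absorbs year-to-year noise).
        res = stats.linregress(years, y.mean(axis=0))
        hits += res.pvalue < alpha
    return hits / n_sim

for tr in (0.01, 0.05):
    print(f"trend {tr:.2f}/yr -> power {trend_power(trend=tr):.2f}")
```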

  16. SU-F-BRD-05: Dosimetric Comparison of Protocol-Based SBRT Lung Treatment Modalities: Statistically Significant VMAT Advantages Over Fixed- Beam IMRT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Best, R; Harrell, A; Geesey, C

    2014-06-15

    Purpose: The purpose of this study is to inter-compare and find statistically significant differences between flattened-field fixed-beam (FB) IMRT and flattening-filter-free (FFF) volumetric modulated arc therapy (VMAT) for stereotactic body radiation therapy (SBRT). Methods: SBRT plans using FB IMRT and FFF VMAT were generated for fifteen SBRT lung patients using 6 MV beams. For each patient, both IMRT and VMAT plans were created for comparison. Plans were generated utilizing RTOG 0915 (peripheral, 10 patients) and RTOG 0813 (medial, 5 patients) lung protocols. Target dose, critical structure dose, and treatment time were compared and tested for statistical significance. Parameters of interest included prescription isodose surface coverage, target dose heterogeneity, high dose spillage (location and volume), low dose spillage (location and volume), lung dose spillage, and critical structure maximum- and volumetric-dose limits. Results: For all criteria, we found equivalent or higher conformality with VMAT plans as well as reduced critical structure doses. Several differences passed a Student's t-test of significance: VMAT reduced the high dose spillage, evaluated with conformality index (CI), by an average of 9.4%±15.1% (p=0.030) compared to IMRT. VMAT plans reduced the lung volume receiving 20 Gy by 16.2%±15.0% (p=0.016) compared with IMRT. For the RTOG 0915 peripheral lesions, the volumes of lung receiving 12.4 Gy and 11.6 Gy were reduced by 27.0%±13.8% and 27.5%±12.6% (for both, p<0.001) in VMAT plans. Of the 26 protocol pass/fail criteria, VMAT plans were able to achieve an average of 0.2±0.7 (p=0.026) more constraints than the IMRT plans. Conclusions: FFF VMAT has dosimetric advantages over fixed-beam IMRT for lung SBRT. Significant advantages included increased dose conformity and reduced organs-at-risk doses. The overall improvements in terms of protocol pass/fail criteria were more modest and will require more patient data to establish statistical significance.

  17. Funding source and primary outcome changes in clinical trials registered on ClinicalTrials.gov are associated with the reporting of a statistically significant primary outcome: a cross-sectional study.

    PubMed

    Ramagopalan, Sreeram V; Skingsley, Andrew P; Handunnetthi, Lahiru; Magnus, Daniel; Klingel, Michelle; Pakpoor, Julia; Goldacre, Ben

    2015-01-01

    We and others have shown that a significant proportion of interventional trials registered on ClinicalTrials.gov have their primary outcomes altered after the listed study start and completion dates. The objectives of this study were to investigate whether changes made to primary outcomes are associated with the likelihood of reporting a statistically significant primary outcome on ClinicalTrials.gov. A cross-sectional analysis of all interventional clinical trials registered on ClinicalTrials.gov as of 20 November 2014 was performed. The main outcome was any change made to the initially listed primary outcome and the time of the change in relation to the trial start and end date. 13,238 completed interventional trials were registered with ClinicalTrials.gov that also had study results posted on the website. 2555 (19.3%) had one or more statistically significant primary outcomes. Statistical analysis showed that registration year, funding source and primary outcome change after trial completion were associated with reporting a statistically significant primary outcome. Funding source and primary outcome change after trial completion are thus associated with a statistically significant primary outcome report on ClinicalTrials.gov.

  18. Assessment of Quadrivalent Human Papillomavirus Vaccine Safety Using the Self-Controlled Tree-Temporal Scan Statistic Signal-Detection Method in the Sentinel System.

    PubMed

    Yih, W Katherine; Maro, Judith C; Nguyen, Michael; Baker, Meghan A; Balsbaugh, Carolyn; Cole, David V; Dashevsky, Inna; Mba-Jonas, Adamma; Kulldorff, Martin

    2018-06-01

    The self-controlled tree-temporal scan statistic-a new signal-detection method-can evaluate whether any of a wide variety of health outcomes are temporally associated with receipt of a specific vaccine, while adjusting for multiple testing. Neither health outcomes nor postvaccination potential periods of increased risk need be prespecified. Using US medical claims data in the Food and Drug Administration's Sentinel system, we employed the method to evaluate adverse events occurring after receipt of quadrivalent human papillomavirus vaccine (4vHPV). Incident outcomes recorded in emergency department or inpatient settings within 56 days after first doses of 4vHPV received by 9- through 26.9-year-olds in 2006-2014 were identified using International Classification of Diseases, Ninth Revision, diagnosis codes and analyzed by pairing the new method with a standard hierarchical classification of diagnoses. On scanning diagnoses of 1.9 million 4vHPV recipients, 2 statistically significant categories of adverse events were found: cellulitis on days 2-3 after vaccination and "other complications of surgical and medical procedures" on days 1-3 after vaccination. Cellulitis is a known adverse event. Clinically informed investigation of electronic claims records of the patients with "other complications" did not suggest any previously unknown vaccine safety problem. Considering that thousands of potential short-term adverse events and hundreds of potential risk intervals were evaluated, these findings add significantly to the growing safety record of 4vHPV.

  19. Significant lexical relationships

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pedersen, T.; Kayaalp, M.; Bruce, R.

    Statistical NLP inevitably deals with a large number of rare events. As a consequence, NLP data often violates the assumptions implicit in traditional statistical procedures such as significance testing. We describe a significance test, an exact conditional test, that is appropriate for NLP data and can be performed using freely available software. We apply this test to the study of lexical relationships and demonstrate that the results obtained using this test are both theoretically more reliable and different from the results obtained using previously applied tests.
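
    For lexical data, the exact conditional test for a 2x2 bigram table can be run directly; a sketch with hypothetical corpus counts (scipy's Fisher exact test stands in for the freely available software the authors mention):

```python
from scipy.stats import fisher_exact

# Bigram association as a 2x2 contingency table over a corpus:
# counts of (w1, w2), (w1, not w2), (not w1, w2), (not w1, not w2).
# Rare-event counts like these violate the large-sample assumptions of
# chi-square-style tests, which motivates an exact conditional test.
table = [[8, 42],
         [30, 99920]]
odds, p = fisher_exact(table, alternative='greater')
print(f"odds ratio = {odds:.1f}, exact p = {p:.2e}")
```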

  20. Statistics on gene-based laser speckles with a small number of scatterers: implications for the detection of polymorphism in the Chlamydia trachomatis omp1 gene

    NASA Astrophysics Data System (ADS)

    Ulyanov, Sergey S.; Ulianova, Onega V.; Zaytsev, Sergey S.; Saltykov, Yury V.; Feodorova, Valentina A.

    2018-04-01

    The transformation mechanism for a nucleotide sequence of the Chlamydia trachomatis gene into a speckle pattern has been considered. The first- and second-order statistics of gene-based speckles have been analyzed. It has been demonstrated that gene-based speckles do not obey Gaussian statistics and belong to the class of speckles with a small number of scatterers. It has been shown that gene polymorphism can be easily detected through analysis of the statistical characteristics of gene-based speckles.
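
    A small simulation of the underlying optics point: summing a few random phasors yields intensity statistics that deviate from fully developed (Gaussian) speckle. The scatterer counts below are illustrative and unrelated to the omp1 sequences themselves:

```python
import numpy as np

rng = np.random.default_rng(11)

def speckle_contrast(n_scatterers, n_patterns=5000):
    """Speckle intensity statistics from a random-phasor sum; with few
    scatterers the statistics are visibly non-Gaussian."""
    phases = rng.uniform(0, 2 * np.pi, (n_patterns, n_scatterers))
    field = np.exp(1j * phases).sum(axis=1) / np.sqrt(n_scatterers)
    intensity = np.abs(field) ** 2
    return intensity.std() / intensity.mean()    # contrast C

# Fully developed (Gaussian) speckle gives C -> 1; small N deviates.
for n in (3, 10, 1000):
    print(f"N = {n:>4d} scatterers: contrast C = {speckle_contrast(n):.3f}")
```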

  1. Carotid bruit for detection of hemodynamically significant carotid stenosis: the Northern Manhattan Study

    PubMed Central

    Ratchford, Elizabeth V.; Jin, Zhezhen; Di Tullio, Marco R.; Salameh, Maya J.; Homma, Shunichi; Gan, Robert; Boden-Albala, Bernadette; Sacco, Ralph L.; Rundek, Tatjana

    2009-01-01

    Objective The prevalence of carotid bruits and the utility of auscultation for predicting carotid stenosis are not well known. We aimed to establish the prevalence of carotid bruits and the diagnostic accuracy of auscultation for detection of hemodynamically significant carotid stenosis, using carotid duplex as the gold standard. Methods The Northern Manhattan Study (NOMAS) is a prospective multiethnic community-based cohort designed to examine the incidence of stroke and other vascular events and the association between various vascular risk factors and subclinical atherosclerosis. Of the stroke-free cohort (n=3298), 686 were examined for carotid bruits and underwent carotid duplex. Main outcome measures included prevalence of carotid bruits and sensitivity, specificity, positive predictive value, negative predictive value and accuracy of auscultation for prediction of ipsilateral carotid stenosis. Results Among 686 subjects with a mean age of 68.2 ± 9.4 years, the prevalence of ≥60% carotid stenosis as detected by ultrasound was 2.2% and the prevalence of carotid bruits was 4.1%. For detection of carotid stenosis, sensitivity of auscultation was 56%, specificity was 98%, positive predictive value was 25%, negative predictive value was 99% and overall accuracy was 97.5%. Discussion In this ethnically diverse cohort, the prevalence of carotid bruits and hemodynamically significant carotid stenosis was low. Sensitivity and positive predictive value were also low, and the 44% false-negative rate suggests that auscultation is not sufficient to exclude carotid stenosis. While the presence of a bruit may still warrant further evaluation with carotid duplex, ultrasonography may be considered in high-risk asymptomatic patients, irrespective of findings on auscultation. PMID:19133168
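
    The reported operating characteristics all derive from a 2x2 table of auscultation findings against duplex-confirmed stenosis. The sketch below uses hypothetical cell counts chosen to reproduce rates close to those quoted above (the abstract does not give the underlying counts):

```python
# Hypothetical 2x2 counts of bruit vs duplex-confirmed >=60% stenosis,
# chosen so the derived rates land near those reported in the abstract.
tp, fn = 9, 7          # stenosis present: bruit heard / not heard
fp, tn = 27, 1337      # stenosis absent:  bruit heard / not heard

sensitivity = tp / (tp + fn)                    # ~0.56
specificity = tn / (tn + fp)                    # ~0.98
ppv = tp / (tp + fp)                            # ~0.25
npv = tn / (tn + fn)                            # ~0.99
accuracy = (tp + tn) / (tp + fn + fp + tn)      # ~0.975
print(f"Se={sensitivity:.0%} Sp={specificity:.0%} "
      f"PPV={ppv:.0%} NPV={npv:.0%} Acc={accuracy:.1%}")
```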

  2. On the significance of future trends in flood frequencies

    NASA Astrophysics Data System (ADS)

    Bernhardt, M.; Schulz, K.; Wieder, O.

    2015-12-01

    Floods are a significant threat for alpine headwater catchments and for the forelands. The formation of significant flood events is often coupled to processes occurring in the alpine zone; rain-on-snow events are just one example. The prediction of flood risks, or of trends in flood risk, is of major interest to people under direct threat, to policy and decision makers, and to insurance companies. Much research has been, and currently is being, done on detecting future trends in flood extremes or return periods. From a purely physically based point of view, there is strong evidence that such trends exist. The central question, however, is whether trends in flood events or other extreme events can be detected from a statistical point of view on the basis of the available data. This study investigates this question for different target parameters and by using long-term measurements.
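
    The abstract does not name a specific test, but a common nonparametric choice for exactly this question is the Mann-Kendall trend test on annual maxima. A minimal sketch on synthetic data (Gumbel-distributed annual floods with a small imposed trend; no tie or autocorrelation correction):

```python
import numpy as np
from scipy.stats import norm

def mann_kendall(x):
    """Mann-Kendall trend test (no tie or autocorrelation correction)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # S counts concordant minus discordant pairs.
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = (s - np.sign(s)) / np.sqrt(var_s) if s != 0 else 0.0
    return z, 2 * norm.sf(abs(z))

rng = np.random.default_rng(1)
# Synthetic 60-year series of annual flood maxima with an imposed trend.
annual_max = rng.gumbel(loc=100, scale=20, size=60) + 0.3 * np.arange(60)
z, p = mann_kendall(annual_max)
print(f"Z = {z:.2f}, two-sided p = {p:.3f}")
```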

  3. Drug Adverse Event Detection in Health Plan Data Using the Gamma Poisson Shrinker and Comparison to the Tree-based Scan Statistic

    PubMed Central

    Brown, Jeffrey S.; Petronis, Kenneth R.; Bate, Andrew; Zhang, Fang; Dashevsky, Inna; Kulldorff, Martin; Avery, Taliser R.; Davis, Robert L.; Chan, K. Arnold; Andrade, Susan E.; Boudreau, Denise; Gunter, Margaret J.; Herrinton, Lisa; Pawloski, Pamala A.; Raebel, Marsha A.; Roblin, Douglas; Smith, David; Reynolds, Robert

    2013-01-01

    Background: Drug adverse event (AE) signal detection using the Gamma Poisson Shrinker (GPS) is commonly applied in spontaneous reporting. AE signal detection using large observational health plan databases can expand medication safety surveillance. Methods: Using data from nine health plans, we conducted a pilot study to evaluate the implementation and findings of the GPS approach for two antifungal drugs, terbinafine and itraconazole, and two diabetes drugs, pioglitazone and rosiglitazone. We evaluated 1676 diagnosis codes grouped into 183 different clinical concepts and four levels of granularity. Several signaling thresholds were assessed. GPS results were compared to findings from a companion study using the identical analytic dataset but an alternative statistical method—the tree-based scan statistic (TreeScan). Results: We identified 71 statistical signals across two signaling thresholds and two methods, including closely-related signals of overlapping diagnosis definitions. Initial review found that most signals represented known adverse drug reactions or confounding. About 31% of signals met the highest signaling threshold. Conclusions: The GPS method was successfully applied to observational health plan data in a distributed data environment as a drug safety data mining method. There was substantial concordance between the GPS and TreeScan approaches. Key method implementation decisions relate to defining exposures and outcomes and informed choice of signaling thresholds. PMID:24300404
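
    The GPS shrinks each drug-event observed-to-expected reporting ratio toward 1 with an empirical-Bayes mixture prior; fitting that prior is beyond a short sketch, but the disproportionality ratio it operates on is simple to show. Counts below are hypothetical:

```python
# Hypothetical marginal counts from a drug-event contingency table.
a = 25         # reports with drug D and event E
drug = 1200    # all reports with drug D
event = 800    # all reports with event E
total = 90000  # all reports in the database

# Relative reporting ratio: observed count over the count expected if
# drug and event were independent. GPS shrinks this ratio toward 1 with
# an empirical-Bayes prior; the shrinkage step is omitted here.
expected = drug * event / total
rr = a / expected
print(f"expected = {expected:.1f}, RR = {rr:.1f}")
```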

  4. Maintenance energy requirements of odor detection, explosive detection and human detection working dogs.

    PubMed

    Mullis, Rebecca A; Witzel, Angela L; Price, Joshua

    2015-01-01

    Despite their important role in security, little is known about the energy requirements of working dogs such as odor, explosive and human detection dogs. Previous researchers have evaluated the energy requirements of individual canine breeds as well as dogs in exercise roles such as sprint racing. This study is the first to evaluate the energy requirements of working dogs trained in odor, explosive and human detection. This retrospective study evaluated twenty adult dogs that maintained consistent body weights over a six-month period. During this time, the average energy consumption was [Formula: see text] or two times the calculated resting energy requirement ([Formula: see text]). No statistical differences were found for breed, age or sex, but a statistically significant association (p = 0.0033, R-square = 0.0854) was seen between the number of searches a dog performs and its energy requirement. Based on this study's population, it appears that working dogs have maintenance energy requirements similar to the 1974 National Research Council's (NRC) maintenance energy requirement of [Formula: see text] (National Research Council (NRC), 1974) and the [Formula: see text] reported for young laboratory beagles (Rainbird & Kienzle, 1990). Additional research is needed to determine whether these data can be applied to all odor, explosive and human detection dogs and whether other types of working dogs (tracking, search and rescue, etc.) have similar energy requirements.

  5. MIDAS: Regionally linear multivariate discriminative statistical mapping.

    PubMed

    Varol, Erdem; Sotiras, Aristeidis; Davatzikos, Christos

    2018-07-01

    statistical significance of the derived statistic by analytically approximating its null distribution without the need for computationally expensive permutation tests. The proposed framework was extensively validated using simulated atrophy in structural magnetic resonance imaging (MRI) and further tested using data from a task-based functional MRI study as well as a structural MRI study of cognitive performance. The performance of the proposed framework was evaluated against standard voxel-wise general linear models and other information mapping methods. The experimental results showed that MIDAS achieves relatively higher sensitivity and specificity in detecting group differences. Together, our results demonstrate the potential of the proposed approach to efficiently map effects of interest in both structural and functional data. Copyright © 2018. Published by Elsevier Inc.
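
    For contrast with MIDAS's analytic null approximation, the permutation machinery it avoids looks, in its generic max-statistic form, like the following (synthetic two-group data; this is the standard recipe, not the MIDAS statistic itself):

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: 20 subjects per group, 500 voxels; group B carries a
# small added effect in the first 50 voxels.
A = rng.normal(size=(20, 500))
B = rng.normal(size=(20, 500))
B[:, :50] += 0.8

def max_t(a, b):
    """Maximum voxelwise two-sample t statistic (pooled variance)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * a.var(0, ddof=1) + (nb - 1) * b.var(0, ddof=1)) / (na + nb - 2)
    return np.max((a.mean(0) - b.mean(0)) / np.sqrt(sp2 * (1 / na + 1 / nb)))

t_obs = max_t(B, A)
data = np.vstack([A, B])
exceed, n_perm = 0, 999
for _ in range(n_perm):
    idx = rng.permutation(len(data))   # relabel subjects at random
    exceed += max_t(data[idx[:20]], data[idx[20:]]) >= t_obs
p = (1 + exceed) / (1 + n_perm)        # familywise-corrected via the max statistic
print(f"max t = {t_obs:.2f}, permutation p = {p:.3f}")
```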

  6. Clinical significance of sentinel lymph node detection in patients with invasive cervical cancer

    NASA Astrophysics Data System (ADS)

    Sinilkin, I. G.; Chernov, V. I.; Lyapunov, A. Yu.; Medvedeva, A. A.; Zelchan, R. V.; Chernyshova, A. L.; Kolomiets, L. A.; Bragina, O. D.

    2017-09-01

    The clinical significance of determining sentinel lymph nodes (SLN) in patients with invasive cervical cancer was studied. From 2013 to 2014, 30 cervical cancer patients (T1a1NxM0-T1b1NxM0) were treated at the Gynecological Oncology Department of the Cancer Research Institute. The day before surgery, four submucosal injections of 99mTc Al2O3 at a total dose of 80 MBq were made in each quadrant around the cervical tumor. Patients underwent preoperative lymphoscintigraphy and intraoperative SLN detection. The feasibility of preserving the reproductive potential in patients after radical abdominal trachelectomy was assessed. The 3-year overall, disease-free and metastasis-free survival rates were analyzed. Thirty-four SLNs were detected by single-photon emission computed tomography (SPECT) and 42 SLNs were identified by intraoperative gamma probe. The sensitivity in detecting SLNs was 100% for intraoperative SLN identification and 80% for SPECT imaging. The reproductive potential was preserved in 86% of patients. The 3-year overall and metastasis-free survival rates were 100%. Recurrence occurred in 8.6% of cases.

  7. The statistical average of optical properties for alumina particle cluster in aircraft plume

    NASA Astrophysics Data System (ADS)

    Li, Jingying; Bai, Lu; Wu, Zhensen; Guo, Lixin

    2018-04-01

    We establish a lognormal model for the monomer radius and monomer number of alumina particle clusters in an aircraft plume. Based on the Multi-Sphere T-Matrix (MSTM) theory, we provide a method for computing the statistical average of the optical properties of alumina particle clusters in the plume, analyze the effect of different size distributions and different detection wavelengths on these averages, and compare the statistically averaged optical properties under the cluster model established in this study with those under three simplified alumina particle models. The calculations show that the monomer number of an alumina particle cluster and its size distribution have a considerable effect on its statistically averaged optical properties. The averages at common detection wavelengths exhibit obvious differences, which strongly affect the modeling of the IR and UV radiation properties of the plume. Compared with the three simplified models, the cluster model developed here features both higher extinction and scattering efficiencies. An accurate description of the scattering properties of alumina particles in an aircraft plume is therefore of great significance for the study of plume radiation properties.

  8. Significance of parametric spectral ratio methods in detection and recognition of whispered speech

    NASA Astrophysics Data System (ADS)

    Mathur, Arpit; Reddy, Shankar M.; Hegde, Rajesh M.

    2012-12-01

    In this article, the significance of a new parametric spectral ratio method that can be used to detect whispered speech segments within normally phonated speech is described. Adaptation methods based on maximum likelihood linear regression (MLLR) are then used to realize a mismatched train-test style speech recognition system. The proposed parametric spectral ratio method computes a ratio spectrum of the linear prediction (LP) and the minimum variance distortionless response (MVDR) spectra. The smoothed ratio spectrum is then used to detect whispered segments of speech within neutral speech segments effectively. The proposed LP-MVDR ratio method exhibits robustness at different SNRs, as indicated by the whisper diarization experiments conducted on the CHAINS and the cell phone whispered speech corpora. The proposed method also performs reasonably better than conventional methods for whisper detection. In order to integrate the proposed whisper detection method into a conventional speech recognition engine with minimal changes, adaptation methods based on MLLR are used herein. The hidden Markov models corresponding to neutral-mode speech are adapted to the whispered-mode speech data in the whispered regions detected by the proposed ratio method. The performance of this method is first evaluated on whispered speech data from the CHAINS corpus. The second set of experiments is conducted on the cell phone corpus of whispered speech, collected using a setup that is used commercially for handling public transactions. The proposed whisper speech recognition system exhibits reasonably better performance than several conventional methods. The results indicate the feasibility of a whispered speech recognition system for cell phone-based transactions.

  9. Clinical significance of hTERC gene amplification detection by FISH in the screening of cervical lesions.

    PubMed

    Zhang, Yuan; Wang, Xiaobei; Ma, Ling; Wang, Zehua; Hu, Lihua

    2009-06-01

    This study evaluated the clinical significance of hTERC gene amplification detection by fluorescence in situ hybridization (FISH) in the screening of cervical lesions. Cervical specimens from 50 high-risk patients were examined by liquid-based cytology. The patients whose cytological results were classified as ASCUS or above were subjected to subsequent colposcopic biopsies. Slides prepared from these 50 cervical specimens were analyzed for hTERC gene amplification using interphase FISH with the two-color hTERC probe. The results of the cytological analysis and those of subsequent biopsies, when available, were compared with the FISH-detected hTERC abnormalities. It was found that the positive rates of hTERC gene amplification in the NILM, ASCUS, LSIL, HSIL, and SCC groups were 0.00%, 28.57%, 57.14%, 100%, and 100%, respectively. The positive rates of hTERC gene amplification in the HSIL and SCC groups were significantly higher than those in the NILM, ASCUS and LSIL groups (all P<0.05). The mean percentages of cells with hTERC gene amplification in the NILM, ASCUS, LSIL, HSIL, and SCC groups were 0.00%, 10.50%, 36.00%, 79.00%, and 96.50%, respectively. Patients with HSIL or SCC cytological diagnoses had significantly higher mean percentages of cells with hTERC gene amplification than did patients with NILM, ASCUS or LSIL cytological diagnoses (all P<0.05). It was concluded that two-color interphase FISH could detect hTERC gene amplification to accurately distinguish HSIL from LSIL in cervical cells. It may be an adjunct to cytology screening, especially in high-risk patients.

  10. Weighted Feature Significance: A Simple, Interpretable Model of Compound Toxicity Based on the Statistical Enrichment of Structural Features

    PubMed Central

    Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.

    2009-01-01

    In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high-throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409
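
    A minimal sketch of the enrichment-then-additive-scoring idea on toy data: each binary structural feature is tested for enrichment among toxic training compounds (Fisher's exact test), significant features get -log10(p) weights, and a new compound's score is the sum of the weights of the features it carries. The data and the exact weighting scheme are placeholders; the paper's WFS weighting may differ in detail.

```python
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(3)

# Toy training set: binary feature matrix (compounds x structural features)
# with the first five features driving toxicity.
X = rng.integers(0, 2, size=(500, 40))
toxic = X[:, :5].sum(axis=1) + rng.normal(0, 1, size=500) > 3

weights = np.zeros(X.shape[1])
for j in range(X.shape[1]):
    table = [[np.sum(X[toxic, j] == 1), np.sum(X[toxic, j] == 0)],
             [np.sum(X[~toxic, j] == 1), np.sum(X[~toxic, j] == 0)]]
    _, p = fisher_exact(table, alternative="greater")
    if p < 0.05:                 # keep only significantly enriched features
        weights[j] = -np.log10(p)

# Score a new compound by adding up the weights of its features.
new_compound = rng.integers(0, 2, size=40)
print("toxicity score:", round(float(new_compound @ weights), 2))
```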

  11. Infant Statistical Learning

    PubMed Central

    Saffran, Jenny R.; Kirkham, Natasha Z.

    2017-01-01

    Perception involves making sense of a dynamic, multimodal environment. In the absence of mechanisms capable of exploiting the statistical patterns in the natural world, infants would face an insurmountable computational problem. Infant statistical learning mechanisms facilitate the detection of structure. These abilities allow the infant to compute across elements in their environmental input, extracting patterns for further processing and subsequent learning. In this selective review, we summarize findings that show that statistical learning is both a broad and flexible mechanism (supporting learning from different modalities across many different content areas) and input specific (shifting computations depending on the type of input and goal of learning). We suggest that statistical learning not only provides a framework for studying language development and object knowledge in constrained laboratory settings, but also allows researchers to tackle real-world problems, such as multilingualism, the role of ever-changing learning environments, and differential developmental trajectories. PMID:28793812

  12. Powerful Inference with the D-Statistic on Low-Coverage Whole-Genome Data.

    PubMed

    Soraggi, Samuele; Wiuf, Carsten; Albrechtsen, Anders

    2018-02-02

    The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat the problems of sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixtures. The power gain is most pronounced for low and medium sequencing depth (1-10×), and performances are as good as with perfectly called genotypes at a sequencing depth of 2×. We show the reliability of error correction in scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates. Copyright © 2018 Soraggi et al.
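
    For orientation, the classical single-base D-statistic that the paper generalizes is computed from genome-wide counts of the two discordant site patterns, with significance taken from a normal approximation. A sketch with hypothetical counts (real analyses use a block jackknife over the genome rather than the site-independence assumption used here):

```python
import numpy as np
from scipy.stats import norm

def d_stat(n_abba, n_baba):
    """Classical ABBA-BABA D-statistic with a normal approximation that
    assumes independent informative sites (real analyses use a block
    jackknife to account for linkage)."""
    d = (n_abba - n_baba) / (n_abba + n_baba)
    z = (n_abba - n_baba) / np.sqrt(n_abba + n_baba)
    return d, z, 2 * norm.sf(abs(z))

# Hypothetical counts of ABBA and BABA site patterns across a genome.
d, z, p = d_stat(n_abba=10824, n_baba=9921)
print(f"D = {d:.4f}, Z = {z:.2f}, p = {p:.2e}")
```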

  13. Powerful Inference with the D-Statistic on Low-Coverage Whole-Genome Data

    PubMed Central

    Soraggi, Samuele; Wiuf, Carsten; Albrechtsen, Anders

    2017-01-01

    The detection of ancient gene flow between human populations is an important issue in population genetics. A common tool for detecting ancient admixture events is the D-statistic. The D-statistic is based on the hypothesis of a genetic relationship that involves four populations, whose correctness is assessed by evaluating specific coincidences of alleles between the groups. When working with high-throughput sequencing data, calling genotypes accurately is not always possible; therefore, the D-statistic currently samples a single base from the reads of one individual per population. This implies ignoring much of the information in the data, an issue especially striking in the case of ancient genomes. We provide a significant improvement to overcome the problems of the D-statistic by considering all reads from multiple individuals in each population. We also apply type-specific error correction to combat the problems of sequencing errors, and show a way to correct for introgression from an external population that is not part of the supposed genetic relationship, and how this leads to an estimate of the admixture rate. We prove that the D-statistic is approximated by a standard normal distribution. Furthermore, we show that our method outperforms the traditional D-statistic in detecting admixtures. The power gain is most pronounced for low and medium sequencing depth (1–10×), and performances are as good as with perfectly called genotypes at a sequencing depth of 2×. We show the reliability of error correction in scenarios with simulated errors and ancient data, and correct for introgression in known scenarios to estimate the admixture rates. PMID:29196497

  14. Musicians' edge: A comparison of auditory processing, cognitive abilities and statistical learning.

    PubMed

    Mandikal Vasuki, Pragati Rao; Sharma, Mridula; Demuth, Katherine; Arciuli, Joanne

    2016-12-01

    It has been hypothesized that musical expertise is associated with enhanced auditory processing and cognitive abilities. Recent research has examined the relationship between musicians' advantage and implicit statistical learning skills. In the present study, we assessed a variety of auditory processing skills, cognitive processing skills, and statistical learning (auditory and visual forms) in age-matched musicians (N = 17) and non-musicians (N = 18). Musicians had significantly better performance than non-musicians on frequency discrimination and backward digit span. A key finding was that musicians had better auditory, but not visual, statistical learning than non-musicians. Performance on the statistical learning tasks was not correlated with performance on auditory and cognitive measures. Musicians' superior performance on auditory (but not visual) statistical learning suggests that musical expertise is associated with an enhanced ability to detect statistical regularities in auditory stimuli. Copyright © 2016 Elsevier B.V. All rights reserved.

  15. Multivariate statistical analysis strategy for multiple misfire detection in internal combustion engines

    NASA Astrophysics Data System (ADS)

    Hu, Chongqing; Li, Aihua; Zhao, Xingyang

    2011-02-01

    This paper proposes a multivariate statistical analysis approach to processing the instantaneous engine speed signal for the purpose of locating multiple misfire events in internal combustion engines. The state of each cylinder is described by a characteristic vector extracted from the instantaneous engine speed signal following a three-step procedure. These characteristic vectors are treated as the values of various process parameters of an engine cycle, so the occurrence of misfire events and the identity of misfiring cylinders can be determined by a principal component analysis (PCA) based pattern recognition methodology. The proposed algorithm can be implemented easily in practice because the threshold can be defined adaptively, without information about operating conditions. In addition, the effect of torsional vibration on the engine speed waveform manifests as an apparently 'super-powerful' cylinder, which is also isolated by the algorithm. The misfiring cylinder and the super-powerful cylinder are often adjacent in the firing sequence, so missed detections and false alarms can be avoided effectively by checking the relationship between the cylinders.
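
    A schematic of the PCA-based recognition step on synthetic characteristic vectors (the paper's three-step feature extraction from the speed signal is replaced by random placeholders, and the adaptive threshold here is a simple robust median/MAD rule, not necessarily the authors'):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)

# Synthetic characteristic vectors: one 6-dimensional vector per cylinder
# per engine cycle; cylinder 6 misfires, shifting its feature vector.
n_cycles, n_cyl = 200, 6
features = rng.normal(size=(n_cycles * n_cyl, 6))
misfiring = np.arange(n_cyl - 1, n_cycles * n_cyl, n_cyl)
features[misfiring] += np.array([3.0, -2.0, 1.5, 0.0, 0.0, 0.0])

# Project onto the leading principal components and flag vectors whose
# scores fall far from the bulk; the threshold adapts to the data itself.
scores = PCA(n_components=2).fit_transform(features)
dist = np.linalg.norm(scores - np.median(scores, axis=0), axis=1)
mad = np.median(np.abs(dist - np.median(dist)))
flagged = np.where(dist > np.median(dist) + 3 * mad)[0]
print(f"flagged {len(flagged)} vectors; cylinder of first few:",
      flagged[:5] % n_cyl + 1)
```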

  16. Cluster Size Statistic and Cluster Mass Statistic: Two Novel Methods for Identifying Changes in Functional Connectivity Between Groups or Conditions

    PubMed Central

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods – the cluster size statistic (CSS) and cluster mass statistic (CMS) – are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity. PMID:24906136

  17. Cluster size statistic and cluster mass statistic: two novel methods for identifying changes in functional connectivity between groups or conditions.

    PubMed

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods--the cluster size statistic (CSS) and cluster mass statistic (CMS)--are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73% N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.

  18. Determination of quality parameters from statistical analysis of routine TLD dosimetry data.

    PubMed

    German, U; Weinstein, M; Pelled, O

    2006-01-01

    Following the as low as reasonably achievable (ALARA) practice, there is a need to measure very low doses, of the same order of magnitude as the natural background and as the detection limits of the dosimetry system. The different contributions of the background signals to the total zero-dose reading of thermoluminescence dosemeter (TLD) cards were analysed using the common basic definitions of statistical indicators: the critical level (L(C)), the detection limit (L(D)) and the determination limit (L(Q)). These key statistical parameters for the system operated at NRC-Negev were quantified based on the history of readings of the calibration cards in use. The electronic noise seems to play a minor role, but the reading of the Teflon coating (without the presence of a TLD crystal) makes a significant contribution.
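
    Under Currie's conventional Gaussian formulation with alpha = beta = 0.05, the three indicators follow directly from the standard deviation of the blank (zero-dose) readings. A sketch with a simulated blank-card history (the multipliers are Currie's textbook values; the paper's exact estimation details may differ):

```python
import numpy as np

# Hypothetical history of zero-dose (blank) readings from calibration
# cards, in arbitrary reader units.
rng = np.random.default_rng(5)
blanks = rng.normal(loc=12.0, scale=1.5, size=400)

sigma0 = blanks.std(ddof=1)
L_C = 1.645 * sigma0   # critical level: decide "detected" (alpha = 0.05)
L_D = 3.29 * sigma0    # detection limit: true signal reliably detected (beta = 0.05)
L_Q = 10.0 * sigma0    # determination limit: ~10% relative precision
print(f"sigma0 = {sigma0:.2f}  L_C = {L_C:.2f}  L_D = {L_D:.2f}  L_Q = {L_Q:.2f}")
```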

  19. How Do Statistical Detection Methods Compare to Entropy Measures

    DTIC Science & Technology

    2012-08-28

    October 2001. It is known as the RS attack, or "Reliable Detection of LSB Steganography in Grayscale and Color Images". The algorithm they use is very... precise for the detection of pseudo-random LSB steganography. Its precision varies with the image, but its reference value is 0.005 bits per... Jessica Fridrich, Miroslav Goljan, Rui Du, "Detecting LSB Steganography in Color and Gray-Scale Images," IEEE Multimedia, vol. 8, no. 4, pp. 22-28, Oct

  20. Performance of cancer cluster Q-statistics for case-control residential histories

    PubMed Central

    Sloan, Chantel D.; Jacquez, Geoffrey M.; Gallagher, Carolyn M.; Ward, Mary H.; Raaschou-Nielsen, Ole; Nordsborg, Rikke Baastrup; Meliker, Jaymie R.

    2012-01-01

    Few investigations of health event clustering have evaluated residential mobility, though causative exposures for chronic diseases such as cancer often occur long before diagnosis. Recently developed Q-statistics incorporate human mobility into disease cluster investigations by quantifying space- and time-dependent nearest neighbor relationships. Using residential histories from two cancer case-control studies, we created simulated clusters to examine Q-statistic performance. Results suggest that the intersection of cases with significant clustering over their life course, Qi, with cases who are constituents of significant local clusters at given times, Qit, yielded the best performance, which improved with increasing cluster size. Upon comparison, a larger proportion of true positives was detected with Kulldorff's spatial scan method if the time of clustering was provided. We recommend using Q-statistics to identify when and where clustering may have occurred, followed by the scan method to localize the candidate clusters. Future work should investigate the generalizability of these findings. PMID:23149326

  1. Statistical Detection of Atypical Aircraft Flights

    NASA Technical Reports Server (NTRS)

    Statler, Irving; Chidester, Thomas; Shafto, Michael; Ferryman, Thomas; Amidan, Brett; Whitney, Paul; White, Amanda; Willse, Alan; Cooley, Scott; Jay, Joseph

    2006-01-01

    A computational method and software to implement the method have been developed to sift through vast quantities of digital flight data to alert human analysts to aircraft flights that are statistically atypical in ways that signify that safety may be adversely affected. On a typical day, there are tens of thousands of flights in the United States and several times that number throughout the world. Depending on the specific aircraft design, the volume of data collected by sensors and flight recorders can range from a few dozen to several thousand parameters per second during a flight. Whereas these data have long been utilized in investigating crashes, the present method is oriented toward helping to prevent crashes by enabling routine monitoring of flight operations to identify portions of flights that may be of interest with respect to safety issues.

  2. The score statistic of the LD-lod analysis: detecting linkage adaptive to linkage disequilibrium.

    PubMed

    Huang, J; Jiang, Y

    2001-01-01

    We study the properties of a modified lod score method for testing linkage that incorporates linkage disequilibrium (LD-lod). By examination of its score statistic, we show that the LD-lod score method adaptively combines two sources of information: (a) the IBD sharing score which is informative for linkage regardless of the existence of LD and (b) the contrast between allele-specific IBD sharing scores which is informative for linkage only in the presence of LD. We also consider the connection between the LD-lod score method and the transmission-disequilibrium test (TDT) for triad data and the mean test for affected sib pair (ASP) data. We show that, for triad data, the recessive LD-lod test is asymptotically equivalent to the TDT; and for ASP data, it is an adaptive combination of the TDT and the ASP mean test. We demonstrate that the LD-lod score method has relatively good statistical efficiency in comparison with the ASP mean test and the TDT for a broad range of LD and the genetic models considered in this report. Therefore, the LD-lod score method is an interesting approach for detecting linkage when the extent of LD is unknown, such as in a genome-wide screen with a dense set of genetic markers. Copyright 2001 S. Karger AG, Basel

  3. Radiation detection method and system using the sequential probability ratio test

    DOEpatents

    Nelson, Karl E [Livermore, CA; Valentine, John D [Redwood City, CA; Beauchamp, Brock R [San Ramon, CA

    2007-07-17

    A method and system using the Sequential Probability Ratio Test (SPRT) to enhance the detection of an elevated level of radiation by determining whether a set of observations is consistent with a specified model within given bounds of statistical significance. In particular, the SPRT is used in the present invention to maximize the range of detection by providing processing mechanisms for estimating the dynamic background radiation, adjusting the models to reflect the amount of background knowledge at the current point in time, analyzing the current sample using the models to determine statistical significance, and determining when the sample has returned to the expected background conditions.
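
    A minimal SPRT sketch for this setting: Poisson counts with a background rate under H0 and an elevated rate under H1, accumulated into a log-likelihood ratio that is checked against Wald's two thresholds. The rates and error targets are hypothetical; the patent adds background estimation and model adjustment on top of this core.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical one-second gross counts; background 10 cps, source adds 4 cps.
lam0, lam1 = 10.0, 14.0             # Poisson rates under H0 and H1
alpha, beta = 0.001, 0.01           # false alarm and missed detection targets
upper = np.log((1 - beta) / alpha)  # crossing up   -> accept H1 (alarm)
lower = np.log(beta / (1 - alpha))  # crossing down -> accept H0 (background)

llr, decision = 0.0, None
counts = rng.poisson(lam1, size=1000)   # simulate a source actually present
for t, n in enumerate(counts, start=1):
    # Poisson log-likelihood ratio contribution of one observation.
    llr += n * np.log(lam1 / lam0) - (lam1 - lam0)
    if llr >= upper:
        decision = ("alarm", t)
        break
    if llr <= lower:
        decision = ("background", t)
        break
print(decision)
```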

  4. Combining Statistical and Geometric Features for Colonic Polyp Detection in CTC Based on Multiple Kernel Learning

    PubMed Central

    Wang, Shijun; Yao, Jianhua; Petrick, Nicholas; Summers, Ronald M.

    2010-01-01

    Colon cancer is the second leading cause of cancer-related deaths in the United States. Computed tomographic colonography (CTC) combined with a computer aided detection system provides a feasible approach for improving colonic polyp detection and increasing the use of CTC for colon cancer screening. To distinguish true polyps from false positives, various features extracted from polyp candidates have been proposed. Most of these traditional features try to capture the shape information of polyp candidates or neighborhood knowledge about the surrounding structures (fold, colon wall, etc.). In this paper, we propose a new set of shape descriptors for polyp candidates based on statistical curvature information. These features, called histograms of curvature features, are rotation-, translation- and scale-invariant and can be treated as complementary to the existing feature set. Then, in order to make full use of the traditional geometric features (defined as group A) and the new statistical features (group B), which are highly heterogeneous, we employed a multiple kernel learning method based on semi-definite programming to learn an optimized classification kernel from the two groups of features. We conducted a leave-one-patient-out test on a CTC dataset which contained scans from 66 patients. Experimental results show that a support vector machine (SVM) based on the combined feature set and the semi-definite optimization kernel achieved higher FROC performance compared to SVMs using the two groups of features separately. At a false positive per scan rate of 5, the sensitivity of the SVM using the combined features improved from 0.77 (Group A) and 0.73 (Group B) to 0.83 (p ≤ 0.01). PMID:20953299

  5. Robust hypothesis tests for detecting statistical evidence of two-dimensional and three-dimensional interactions in single-molecule measurements

    NASA Astrophysics Data System (ADS)

    Calderon, Christopher P.; Weiss, Lucien E.; Moerner, W. E.

    2014-05-01

    Experimental advances have improved the two- (2D) and three-dimensional (3D) spatial resolution that can be extracted from in vivo single-molecule measurements. This enables researchers to quantitatively infer the magnitude and directionality of forces experienced by biomolecules in their native environment. Situations where such force information is relevant range from mitosis to directed transport of protein cargo along cytoskeletal structures. Models commonly applied to quantify single-molecule dynamics assume that effective forces and velocity in the x, y (or x, y, z) directions are statistically independent, but this assumption is physically unrealistic in many situations. We present a hypothesis testing approach capable of determining if there is evidence of statistical dependence between positional coordinates in experimentally measured trajectories; if the hypothesis of independence between spatial coordinates is rejected, then a new model accounting for 2D (3D) interactions can and should be considered. Our hypothesis testing technique is robust, meaning it can detect interactions, even if the noise statistics are not well captured by the model. The approach is demonstrated on control simulations and on experimental data (directed transport of intraflagellar transport protein 88 homolog in the primary cilium).
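
    One elementary version of such a test: compare, by likelihood ratio, a bivariate Gaussian model of the step increments with a full covariance (coupled x, y) against one with a diagonal covariance (independent x, y); twice the log-likelihood gap is asymptotically chi-squared with 1 degree of freedom. The trajectory is synthetic, and the paper's actual procedure is built to stay robust when the noise model is misspecified, which this bare-bones version is not:

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(7)

# Synthetic 2D step increments whose x and y components are correlated,
# e.g. motion biased along a diagonal cytoskeletal track.
cov = np.array([[1.0, 0.6], [0.6, 1.0]])
steps = rng.multivariate_normal([0.0, 0.0], cov, size=2000)

# LRT: full covariance (H1) vs diagonal covariance (H0) for Gaussian steps.
# Constant terms cancel, so only the log-determinants remain.
n = len(steps)
S = np.cov(steps, rowvar=False, bias=True)
ll_full = -0.5 * n * np.log(np.linalg.det(S))
ll_diag = -0.5 * n * np.log(S[0, 0] * S[1, 1])
lrt = 2 * (ll_full - ll_diag)           # equals -n * log(1 - r**2)
p = chi2.sf(lrt, df=1)
print(f"LRT = {lrt:.1f}, p = {p:.2e}")
```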

  6. A systematic review and meta-analysis of acute stroke unit care: What’s beyond the statistical significance?

    PubMed Central

    2013-01-01

    Background The benefits of stroke unit care in terms of reducing death, dependency and institutional care were demonstrated in a 2009 Cochrane review carried out by the Stroke Unit Trialists' Collaboration. Methods As requested by the Belgian health authorities, a systematic review and meta-analysis of the effect of acute stroke units was performed. Clinical trials mentioned in the original Cochrane review were included. In addition, an electronic database search of Medline, Embase, the Cochrane Central Register of Controlled Trials, and the Physiotherapy Evidence Database (PEDro) was conducted to identify trials published since 2006. Trials investigating acute stroke units compared to alternative care were eligible for inclusion. Study quality was appraised according to the criteria recommended by the Scottish Intercollegiate Guidelines Network (SIGN) and the GRADE system. In the meta-analysis, dichotomous outcomes were estimated by calculating odds ratios (OR) and continuous outcomes by calculating standardized mean differences. The weight of a study was calculated based on inverse variance. Results Evidence from eight trials comparing acute stroke unit and conventional care (general medical ward) was retained for the main synthesis and analysis. The findings from this study were broadly in line with the original Cochrane review: acute stroke units can improve survival and independence, as well as reduce the chance of institutional care and the length of inpatient stay. The improvement with stroke unit care on mortality was less conclusive and only reached a borderline level of significance (OR 0.84, 95% CI 0.70 to 1.00, P = 0.05). This improvement became statistically non-significant (OR 0.87, 95% CI 0.74 to 1.03, P = 0.12) when data from two unpublished trials (Goteborg-Ostra and Svendborg) were added to the analysis. After further adding two additional trials (Beijing, Stockholm) with very short observation periods (until discharge), the
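
    The pooled odds ratios quoted above come from inverse-variance weighting of per-trial log odds ratios. A sketch of that fixed-effect calculation with hypothetical trial counts (not the trials in the review):

```python
import numpy as np

# Hypothetical per-trial 2x2 counts: deaths/survivors in the stroke unit
# arm (a, b) and the conventional care arm (c, d).
trials = [(20, 80, 30, 70), (15, 110, 22, 101), (40, 200, 48, 190)]

log_ors, weights = [], []
for a, b, c, d in trials:
    log_or = np.log((a * d) / (b * c))
    var = 1 / a + 1 / b + 1 / c + 1 / d   # Woolf variance of the log OR
    log_ors.append(log_or)
    weights.append(1 / var)               # inverse-variance weight

pooled = np.average(log_ors, weights=weights)
se = 1 / np.sqrt(np.sum(weights))
lo, hi = pooled - 1.96 * se, pooled + 1.96 * se
print(f"pooled OR = {np.exp(pooled):.2f} "
      f"(95% CI {np.exp(lo):.2f} to {np.exp(hi):.2f})")
```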

  7. Detecting signals of drug-drug interactions in a spontaneous reports database.

    PubMed

    Thakrar, Bharat T; Grundschober, Sabine Borel; Doessegger, Lucette

    2007-10-01

    The spontaneous reports database is widely used for detecting signals of ADRs. We have extended the methodology to include the detection of signals of ADRs that are associated with drug-drug interactions (DDI). In particular, we have investigated two different statistical assumptions for detecting signals of DDI. Using the FDA's spontaneous reports database, we investigated two models, a multiplicative and an additive model, to detect signals of DDI. We applied the models to four known DDIs (methotrexate-diclofenac and bone marrow depression, simvastatin-ciclosporin and myopathy, ketoconazole-terfenadine and torsades de pointes, and cisapride-erythromycin and torsades de pointes) and to four drug-event combinations where there is currently no evidence of a DDI (fexofenadine-ketoconazole and torsades de pointes, methotrexate-rofecoxib and bone marrow depression, fluvastatin-ciclosporin and myopathy, and cisapride-azithromycin and torsades de pointes) and estimated the measure of interaction on the two scales. The additive model correctly identified all four known DDIs by giving a statistically significant (P < 0.05) positive measure of interaction. The multiplicative model identified the first two of the known DDIs as having a statistically significant or borderline significant (P < 0.1) positive measure of interaction, gave a nonsignificant positive trend for the third interaction (P = 0.27), and a negative trend for the last interaction. Both models correctly identified the four known non-interactions by estimating a negative measure of interaction. The spontaneous reports database is a valuable resource for detecting signals of DDIs. In particular, the additive model is more sensitive in detecting such signals. The multiplicative model may further help qualify the strength of the signal detected by the additive model.
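
    The two scales can be illustrated with the event's reporting proportions among reports carrying neither drug, each drug alone, and both drugs together (hypothetical numbers; the study's models additionally provide standard errors and significance tests):

```python
# Hypothetical proportions of reports mentioning the event among reports
# with: neither drug, drug A only, drug B only, both drugs.
r00, r10, r01, r11 = 0.010, 0.030, 0.025, 0.080

# Additive scale: excess risk beyond the sum of the individual excesses.
additive = r11 - (r10 + r01 - r00)
# Multiplicative scale: ratio beyond the product of the individual ratios.
multiplicative = (r11 / r00) / ((r10 / r00) * (r01 / r00))

print(f"additive interaction       = {additive:+.3f}")
print(f"multiplicative interaction = {multiplicative:.2f}")
```

    With these numbers the additive measure is clearly positive while the multiplicative one sits near 1, mirroring the abstract's point that the additive scale is the more sensitive of the two.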

  8. Detecting signals of drug–drug interactions in a spontaneous reports database

    PubMed Central

    Thakrar, Bharat T; Grundschober, Sabine Borel; Doessegger, Lucette

    2007-01-01

    Aims The spontaneous reports database is widely used for detecting signals of ADRs. We have extended the methodology to include the detection of signals of ADRs that are associated with drug–drug interactions (DDI). In particular, we have investigated two different statistical assumptions for detecting signals of DDI. Methods Using the FDA's spontaneous reports database, we investigated two models, a multiplicative and an additive model, to detect signals of DDI. We applied the models to four known DDIs (methotrexate-diclofenac and bone marrow depression, simvastatin-ciclosporin and myopathy, ketoconazole-terfenadine and torsades de pointes, and cisapride-erythromycin and torsades de pointes) and to four drug-event combinations where there is currently no evidence of a DDI (fexofenadine-ketoconazole and torsades de pointes, methotrexate-rofecoxib and bone marrow depression, fluvastatin-ciclosporin and myopathy, and cisapride-azithromycin and torsades de pointes) and estimated the measure of interaction on the two scales. Results The additive model correctly identified all four known DDIs by giving a statistically significant (P < 0.05) positive measure of interaction. The multiplicative model identified the first two of the known DDIs as having a statistically significant or borderline significant (P < 0.1) positive measure of interaction, gave a nonsignificant positive trend for the third interaction (P = 0.27), and a negative trend for the last interaction. Both models correctly identified the four known non-interactions by estimating a negative measure of interaction. Conclusions The spontaneous reports database is a valuable resource for detecting signals of DDIs. In particular, the additive model is more sensitive in detecting such signals. The multiplicative model may further help qualify the strength of the signal detected by the additive model. PMID:17506784

  9. Determining coding CpG islands by identifying regions significant for pattern statistics on Markov chains.

    PubMed

    Singer, Meromit; Engström, Alexander; Schönhuth, Alexander; Pachter, Lior

    2011-09-23

    Recent experimental and computational work confirms that CpGs can be unmethylated inside coding exons, thereby showing that codons may be subjected to both genomic and epigenomic constraint. It is therefore of interest to identify coding CpG islands (CCGIs) that are regions inside exons enriched for CpGs. The difficulty in identifying such islands is that coding exons exhibit sequence biases determined by codon usage and constraints that must be taken into account. We present a method for finding CCGIs that showcases a novel approach we have developed for identifying regions of interest that are significant (with respect to a Markov chain) for the counts of any pattern. Our method begins with the exact computation of tail probabilities for the number of CpGs in all regions contained in coding exons, and then applies a greedy algorithm for selecting islands from among the regions. We show that the greedy algorithm provably optimizes a biologically motivated criterion for selecting islands while controlling the false discovery rate. We applied this approach to the human genome (hg18) and annotated CpG islands in coding exons. The statistical criterion we apply to evaluating islands reduces the number of false positives in existing annotations, while our approach to defining islands reveals significant numbers of undiscovered CCGIs in coding exons. Many of these appear to be examples of functional epigenetic specialization in coding exons.

  10. Climate change and the detection of trends in annual runoff

    USGS Publications Warehouse

    McCabe, G.J.; Wolock, D.M.

    1997-01-01

    This study examines the statistical likelihood of detecting a trend in annual runoff given an assumed change in mean annual runoff, the underlying year-to-year variability in runoff, and serial correlation of annual runoff. Means, standard deviations, and lag-1 serial correlations of annual runoff were computed for 585 stream gages in the conterminous United States, and these statistics were used to compute the probability of detecting a prescribed trend in annual runoff. Assuming a linear 20% change in mean annual runoff over a 100 yr period and a significance level of 95%, the average probability of detecting a significant trend was 28% among the 585 stream gages. The largest probability of detecting a trend was in the northwestern U.S., the Great Lakes region, the northeastern U.S., the Appalachian Mountains, and parts of the northern Rocky Mountains. The smallest probability of trend detection was in the central and southwestern U.S., and in Florida. Low probabilities of trend detection were associated with low ratios of mean annual runoff to the standard deviation of annual runoff and with high lag-1 serial correlation in the data.
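
    The detection probabilities can be reproduced in principle by Monte Carlo: superimpose a prescribed linear change on AR(1) noise with a gage's mean, variability and lag-1 correlation, and count how often the regression slope comes out significant. The parameter values below are hypothetical, and the paper's analytic procedure may differ in detail:

```python
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(8)

def detection_probability(mean, cv, rho, change=0.20, years=100, n_sim=2000):
    """Fraction of simulations in which a linear trend equal to `change`
    of the mean over `years`, added to AR(1) noise with lag-1 correlation
    `rho`, yields a regression slope significant at the 95% level."""
    t = np.arange(years)
    sigma = cv * mean
    hits = 0
    for _ in range(n_sim):
        eps = np.empty(years)
        eps[0] = rng.normal(0, sigma)
        for i in range(1, years):
            eps[i] = rho * eps[i - 1] + rng.normal(0, sigma * np.sqrt(1 - rho**2))
        runoff = mean + change * mean * t / years + eps
        hits += linregress(t, runoff).pvalue < 0.05
    return hits / n_sim

# Hypothetical gage: high year-to-year variability, mild persistence.
print(detection_probability(mean=300.0, cv=0.4, rho=0.2))
```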

  11. Multi-scale structure and topological anomaly detection via a new network statistic: The onion decomposition.

    PubMed

    Hébert-Dufresne, Laurent; Grochow, Joshua A; Allard, Antoine

    2016-08-18

    We introduce a network statistic that measures structural properties at the micro-, meso-, and macroscopic scales, while still being easy to compute and interpretable at a glance. Our statistic, the onion spectrum, is based on the onion decomposition, which refines the k-core decomposition, a standard network fingerprinting method. The onion spectrum is exactly as easy to compute as the k-cores: It is based on the stages at which each vertex gets removed from a graph in the standard algorithm for computing the k-cores. Yet, the onion spectrum reveals much more information about a network, and at multiple scales; for example, it can be used to quantify node heterogeneity, degree correlations, centrality, and tree- or lattice-likeness. Furthermore, unlike the k-core decomposition, the combined degree-onion spectrum immediately gives a clear local picture of the network around each node which allows the detection of interesting subgraphs whose topological structure differs from the global network organization. This local description can also be leveraged to easily generate samples from the ensemble of networks with a given joint degree-onion distribution. We demonstrate the utility of the onion spectrum for understanding both static and dynamic properties on several standard graph models and on many real-world networks.
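
    A compact way to see the construction: peel the graph as in the standard k-core algorithm, recording for each vertex the pass in which it is removed; the per-layer counts form the onion spectrum. The sketch below spells out one common formulation (networkx also ships an onion_layers routine; the layer bookkeeping here may differ slightly from the paper's exact definition):

```python
import networkx as nx

def onion_layers(G):
    """Record, for every node, the peeling pass at which it is removed:
    each pass strips all nodes at the current minimum degree."""
    H = G.copy()
    layers, layer = {}, 0
    while len(H) > 0:
        layer += 1
        k = min(d for _, d in H.degree())
        peel = [n for n, d in H.degree() if d <= k]
        layers.update((n, layer) for n in peel)
        H.remove_nodes_from(peel)
    return layers

G = nx.karate_club_graph()
layers = onion_layers(G)
# Onion spectrum: how many nodes sit in each layer.
spectrum = {l: sum(1 for v in layers.values() if v == l)
            for l in sorted(set(layers.values()))}
print(spectrum)
```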

  12. Time series, periodograms, and significance

    NASA Astrophysics Data System (ADS)

    Hernandez, G.

    1999-05-01

    The geophysical literature shows a wide and conflicting usage of methods employed to extract meaningful information on coherent oscillations from measurements. This makes it difficult, if not impossible, to relate the findings reported by different authors. Therefore, we have undertaken a critical investigation of the tests and methodology used for determining the presence of statistically significant coherent oscillations in periodograms derived from time series. Statistical significance tests are only valid when performed on the independent frequencies present in a measurement. Both the number of possible independent frequencies in a periodogram and the significance tests are determined by the number of degrees of freedom, which is the number of true independent measurements present in the time series, rather than the number of sample points in the measurement. The number of degrees of freedom is an intrinsic property of the data, and it must be determined from the serial coherence of the time series. As part of this investigation, a detailed study has been performed which clearly illustrates the deleterious effects that the apparently innocent and commonly used processes of filtering, de-trending, and tapering of data have on periodogram analysis and the consequent difficulties in the interpretation of the statistical significance thus derived. For the sake of clarity, a specific example of actual field measurements containing unevenly-spaced measurements, gaps, etc., as well as synthetic examples, have been used to illustrate the periodogram approach, and pitfalls, leading to the (statistical) significance tests for the presence of coherent oscillations. Among the insights of this investigation are: (1) the concept of a time series being (statistically) band limited by its own serial coherence and thus having a critical sampling rate which defines one of the necessary requirements for the proper statistical design of an experiment; (2) the design of a critical
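
    A minimal periodogram-plus-significance sketch that makes the warning above concrete: the false-alarm probability of the highest peak depends on the number of independent frequencies M, and the routine guess M ≈ N/2 is only valid for white noise; for serially coherent data, M must be reduced to the true degrees of freedom. Synthetic unevenly sampled data:

```python
import numpy as np
from scipy.signal import lombscargle

rng = np.random.default_rng(9)

# Unevenly sampled synthetic series: one coherent oscillation plus noise.
t = np.sort(rng.uniform(0, 100, size=180))
y = np.sin(2 * np.pi * 0.2 * t) + rng.normal(0, 1.0, size=t.size)
y -= y.mean()

freqs_hz = np.linspace(0.01, 1.0, 2000)
# Dividing by the variance gives the approximately exponential-null
# (Scargle) normalization for white Gaussian noise.
power = lombscargle(t, y, 2 * np.pi * freqs_hz) / y.var()

# Naive false-alarm probability for the peak, assuming white noise and
# M independent frequencies; for coherent data M must come from the
# serial-coherence degrees of freedom instead.
M = t.size // 2
z = power.max()
fap = 1 - (1 - np.exp(-z)) ** M
print(f"peak at {freqs_hz[np.argmax(power)]:.3f} Hz, FAP ~ {fap:.1e}")
```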

  13. [Clinical significance of HPV L1 capsid protein detection in cervical exfoliated cells in high-risk HPV positive women].

    PubMed

    Wang, Jiajian; Tian, Qifang; Zhang, Su; Lyu, Liping; Dong, Jie; Lyu, Weiguo

    2015-04-01

    To explore the clinical significance of human papillomavirus L1 capsid protein detection in cervical exfoliated cells in high-risk HPV positive women. From November 2012 to June 2013, 386 high-risk HPV positive (detected by hybrid capture II) cases were enrolled as eligible women from Huzhou Maternity & Child Care Hospital and Women's Hospital, School of Medicine, Zhejiang University. All eligible women underwent liquid-based cytology (ThinPrep) followed by colposcopy. Biopsies were taken if indicated. Cervical exfoliated cells were collected for HPV L1 capsid protein detection by immunocytochemistry. Expression of HPV L1 capsid protein in groups with different histological diagnosis were compared, and the role of HPV L1 capsid protein detection in cervical exfoliated cells in cervical lesions screening was accessed. Total 386 enrolled eligible women were finally diagnosed histologically as follwed: 162 normal cervix, 94 low-grade squamous intraepithelial lesion (LSIL), 128 high-grade squamous intraepithelial lesion (HSIL) and 2 squamous cervical cancer (SCC). The positive expression rate of HPV L1 in HSIL+ (HSIL or worse) group was significantly lower than that in LSIL- (LSIL or better) group (19.2% vs 66.4%, P=0.000). While identifying HSIL+ in HPV positive cases and compared with cytology, HPV L1 detection resulted in significant higher sensitivity (80.77% vs 50.77%, P=0.000) and negative predictive value (NPV; 87.18% vs 76.47%, P=0.004), significant lower specificity (66.41% vs 81.25%, P=0.000), and comparable positive predictive value (PPV; 54.97% vs 57.89%, P=0.619). To identify HSIL+ in HPV-positive/cytology-negative women, the sensitivity, specificity, PPV, and NPV of HPV L1 detection were 87.50%, 61.54%, 41.18%, and 94.12% respectively, while 80.00%, 86.36%, 80.00% and 86.36% respectively in HPV-positive/atypical squamous cell of undetermined significance (ASCUS) women. HPV L1 capsid detection in cervical exfoliated cells have a role in cervical lesions

  14. Proper Image Subtraction—Optimal Transient Detection, Photometry, and Hypothesis Testing

    NASA Astrophysics Data System (ADS)

    Zackay, Barak; Ofek, Eran O.; Gal-Yam, Avishay

    2016-10-01

    Transient detection and flux measurement via image subtraction stand at the base of time domain astronomy. Due to the varying seeing conditions, the image subtraction process is non-trivial, and existing solutions suffer from a variety of problems. Starting from basic statistical principles, we develop the optimal statistic for transient detection, flux measurement, and any image-difference hypothesis testing. We derive a closed-form statistic that: (1) is mathematically proven to be the optimal transient detection statistic in the limit of background-dominated noise, (2) is numerically stable, (3) for accurately registered, adequately sampled images, does not leave subtraction or deconvolution artifacts, (4) allows automatic transient detection to the theoretical sensitivity limit by providing credible detection significance, (5) has uncorrelated white noise, (6) is a sufficient statistic for any further statistical test on the difference image, and, in particular, allows us to distinguish particle hits and other image artifacts from real transients, (7) is symmetric to the exchange of the new and reference images, (8) is at least an order of magnitude faster to compute than some popular methods, and (9) is straightforward to implement. Furthermore, we present extensions of this method that make it resilient to registration errors, color-refraction errors, and any noise source that can be modeled. In addition, we show that the optimal way to prepare a reference image is the proper image coaddition presented in Zackay & Ofek. We demonstrate this method on simulated data and real observations from the PTF data release 2. We provide an implementation of this algorithm in MATLAB and Python.
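
    For reference, the closed-form difference image at the heart of the method can be written, in Fourier space and up to notational details (quoted from memory of the published derivation; consult the paper for the definitions and flux-matching conventions):

```latex
% Proper image subtraction (Zackay, Ofek & Gal-Yam 2016), background-
% dominated limit: hats denote Fourier transforms, N and R the new and
% reference images, P_n and P_r their PSFs, F_n and F_r their flux zero
% points, and sigma_n, sigma_r their background noise levels.
\widehat{D} = \frac{F_r \widehat{P_r}\,\widehat{N} - F_n \widehat{P_n}\,\widehat{R}}
                   {\sqrt{\sigma_n^2 F_r^2 \bigl|\widehat{P_r}\bigr|^2
                        + \sigma_r^2 F_n^2 \bigl|\widehat{P_n}\bigr|^2}}
```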

  15. Statistics at a glance.

    PubMed

    Ector, Hugo

    2010-12-01

    I still remember my first book on statistics: "Elementary Statistics with Applications in Medicine and the Biological Sciences" by Frederick E. Croxton. For me, it was the start of a pursuit of understanding statistics in daily life and in medical practice. It was the first volume in a long row of books. In his introduction, Croxton claims that "nearly everyone involved in any aspect of medicine needs to have some knowledge of statistics". The reality is that for many clinicians, statistics are limited to "P < 0.05 = ok". I do not blame my colleagues who omit the paragraph on statistical methods. They have never had the opportunity to learn concise and clear descriptions of the key features. I have experienced how some authors can describe difficult methods in a well understandable language. Others fail completely. As a teacher, I tell my students that life is impossible without a basic knowledge of statistics. This feeling has resulted in an annual seminar of 90 minutes. This tutorial is the summary of that seminar, and a transcription of the best pages I have found.

  16. Statistical Algorithms Accounting for Background Density in the Detection of UXO Target Areas at DoD Munitions Sites

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Matzke, Brett D.; Wilson, John E.; Hathaway, J.

    2008-02-12

    Statistically defensible methods are presented for developing geophysical detector sampling plans and analyzing data for munitions response sites where unexploded ordnance (UXO) may exist. Detection methods for distinguishing areas of elevated anomaly density from background density are shown. Additionally, methods are described which aid in the choice of transect pattern and spacing to assure, with a stated degree of confidence, that a target area (TA) of specific size, shape, and anomaly density will be identified using the detection methods. Methods for evaluating the sensitivity of designs to variation in certain parameters are also discussed. The methods presented have been incorporated into the Visual Sample Plan (VSP) software (free at http://dqo.pnl.gov/vsp) and demonstrated at multiple sites in the United States. Application examples from actual transect designs and surveys from the previous two years are presented.

  17. Significance of clinical evaluation of the metacarpophalangeal joint in relation to synovial/bone pathology in rheumatoid and psoriatic arthritis detected by magnetic resonance imaging.

    PubMed

    Stone, Millicent A; White, Lawrence M; Gladman, Dafna D; Inman, Robert D; Chaya, Sam; Lax, Matthew; Salonen, David; Weber, Deborah A; Guthrie, Judy A; Pomeroy, Emma; Podbielski, Dominik; Keystone, Edward C

    2009-12-01

    Rheumatologists base many clinical decisions regarding the management of inflammatory joint diseases on joint counts performed at clinic. We investigated the reliability and accuracy of physically examining the metacarpophalangeal (MCP) joints to detect inflammatory synovitis using magnetic resonance imaging (MRI) as the gold standard. MCP joints 2 to 5 in both hands of 5 patients with rheumatoid arthritis (RA) and 5 with psoriatic arthritis (PsA) were assessed by 5 independent examiners for joint-line swelling (visually and by palpation); joint-line tenderness by palpation (tender joint count, TJC) and stress pain; and by MRI (1.5 Tesla superconducting magnet). Interrater reliability was assessed using kappa statistics, and agreement between examination and corresponding MRI assessment was assessed by Fisher's exact tests (p < 0.05 considered statistically significant). Interrater agreement was highest for visual assessment of swelling (kappa = 0.55-0.63), slight-fair for assessment of swelling by palpation (kappa = 0.19-0.41), and moderate (kappa = 0.41-0.58) for assessment of joint tenderness. In patients with RA, TJC, stress pain, and visual swelling assessment were strongly associated with MRI evaluation of synovitis. Visual swelling assessment demonstrated high specificity (> 0.8) and positive predictive value (= 0.8). For PsA, significant associations exist between TJC and MRI synovitis scores (p < 0.01) and stress pain and MRI edema scores (p < 0.04). Assessment of swelling by palpation was not significantly associated with synovitis or edema as determined by MRI in RA or PsA (p = 0.54-1.0). In inflammatory arthritis, disease activity in MCP joints can be reliably assessed at the bedside by examining for joint-line tenderness (TJC) and visual inspection for swelling. Clinical assessment may have to be complemented by other methods for evaluating disease activity in the joint, such as MRI, particularly in patients with PsA.

  18. A robust and efficient statistical method for genetic association studies using case and control samples from multiple cohorts

    PubMed Central

    2013-01-01

    Background The theoretical basis of genome-wide association studies (GWAS) is statistical inference of linkage disequilibrium (LD) between any polymorphic marker and a putative disease locus. Most methods widely implemented for such analyses are vulnerable to several key demographic factors, delivering poor statistical power for detecting genuine associations and a high false positive rate. Here, we present a likelihood-based statistical approach that accounts properly for the non-random nature of case–control samples with regard to the genotypic distribution at the loci in the populations under study, and that confers flexibility to test for genetic association in the presence of different confounding factors such as population structure and non-randomness of samples. Results We implemented this novel method, together with several popular methods from the GWAS literature, to re-analyze recently published Parkinson's disease (PD) case–control samples. The real data analysis and computer simulation show that the new method confers not only significantly improved statistical power for detecting associations but also robustness to the difficulties stemming from non-random sampling and genetic structure when compared to its rivals. In particular, the new method detected 44 significant SNPs within 25 chromosomal regions of size < 1 Mb, but only 6 SNPs in two of these regions were previously detected by trend-test-based methods. It discovered two SNPs located 1.18 Mb and 0.18 Mb from the PD candidate genes FGF20 and PARK8, respectively, without incurring false positive risk. Conclusions We developed a novel likelihood-based method which provides adequate estimation of LD and other population model parameters from case and control samples, eases the integration of such samples from multiple genetically divergent populations, and thus confers statistically robust and powerful GWAS analyses. On the basis of simulation studies and analysis of real datasets, we demonstrated these advantages of the new method.

  19. Statistical Study of Eruptive Filaments using Automated Detection and Tracking Technique

    NASA Astrophysics Data System (ADS)

    Joshi, Anand D.; Hanaoka, Yoichiro

    2017-08-01

    Solar filaments are dense, cool material suspended in the low solar corona. They can persist on the Sun for up to a few weeks, ending their lifetime either in gradual disappearance or in eruption. We have developed an automated detection and tracking technique to study such filament eruptions using full-disc Hα images. Various processing steps are applied before subjecting an image to segmentation, which extracts only the filaments. Further steps track the filaments between successive images, label them uniquely, and generate output that can be used for a comparative study. In this poster, we use this technique to carry out a statistical study of several erupting filaments, from which the common underlying properties of such eruptions can be derived. Details of the technique are also discussed in brief. Filament eruptions are closely associated with coronal mass ejections (CMEs), wherein a large mass from the corona is ejected into interplanetary space. If such a CME hits the Earth with a favourable orientation of the magnetic field, a geomagnetic storm can result, adversely affecting electronic infrastructure in space as well as on the ground. The derived properties of filament eruptions can be used in the future to predict an eruption on an almost real-time basis, thereby giving warning of an imminent storm.

  20. Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains.

    PubMed

    Xia, Li C; Ai, Dongmei; Cram, Jacob A; Liang, Xiaoyi; Fuhrman, Jed A; Sun, Fengzhu

    2015-09-21

    Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in the dynamics of biological systems. However, slow permutation procedures to evaluate the statistical significance of local trend scores have limited its application to high-throughput time series data, e.g., data from studies based on next-generation sequencing technology. By extending the theory for the tail probability of the range of a sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at series of >20 time points with no delay and >30 time points with a delay of at most three time steps), in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations, making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large-scale all-versus-all comparisons possible. We also propose a hybrid approach that integrates the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Plymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data, where we found interesting organism co-occurrence dynamic patterns. The software tool is integrated into the eLSA software package, which now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.
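
    At its core, a local trend score compares the step-by-step "shape" of two series: each series is reduced to a sequence of up/down/flat steps, and the score is the best-scoring contiguous run of co-changing (or anti-changing) steps. The Python sketch below illustrates this scoring for the no-delay case with a Kadane-style linear scan; it is an illustration of the idea only, and the eLSA package implements the actual scoring, delays, and the p-value approximation the abstract describes.

        import numpy as np

        def trend_series(x):
            # Map a time series to {-1, 0, +1} steps: its "shape" between time points.
            return np.sign(np.diff(np.asarray(x, dtype=float)))

        def local_trend_score(x, y):
            """Highest-scoring contiguous run of co-changing (or anti-changing)
            steps, found with a Kadane-style max-subarray scan in O(n)."""
            s = trend_series(x) * trend_series(y)
            best = cur_max = cur_min = 0.0
            for v in s:
                cur_max = max(v, cur_max + v)   # best co-changing run ending here
                cur_min = min(v, cur_min + v)   # best anti-changing run ending here
                best = max(best, cur_max, -cur_min)
            return best / len(s)                # normalized local trend score

    The significance question the paper addresses is exactly the cost of this scan under permutation: a permutation test re-runs it thousands of times per pair, whereas the proposed formulae approximate the tail probability directly.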

  1. Sensitivity of new detection method for ultra-low frequency gravitational waves with pulsar spin-down rate statistics

    NASA Astrophysics Data System (ADS)

    Yonemaru, Naoyuki; Kumamoto, Hiroki; Takahashi, Keitaro; Kuroyanagi, Sachiko

    2018-04-01

    A new detection method for ultra-low frequency gravitational waves (GWs) with a frequency much lower than the observational range of pulsar timing arrays (PTAs) was suggested in Yonemaru et al. (2016). In the PTA analysis, ultra-low frequency GWs (≲ 10^{-10} Hz), which evolve just linearly during the observation time span, are absorbed into the pulsar spin-down rates, since both have the same effect on the pulse arrival time. Therefore, such GWs cannot be detected by the conventional PTA method. However, the bias on the observed spin-down rates depends on the relative direction of the pulsar and the GW source, and shows a quadrupole pattern in the sky. Thus, if we divide the pulsars according to their position in the sky and examine the difference in the statistics of the spin-down rates, ultra-low frequency GWs from a single source can be detected. In this paper, we evaluate the potential of this method by Monte Carlo simulations and estimate the sensitivity, considering only the "Earth term", while the "pulsar term" acts like random noise for GW frequencies of 10^{-13}-10^{-10} Hz. We find that with 3,000 millisecond pulsars, which are expected to be discovered by a future survey with the Square Kilometre Array, GWs with an amplitude derivative of about 3 × 10^{-19} s^{-1} can in principle be detected. Implications for possible supermassive binary black holes in Sgr A* and M87 are also given.

  2. Consomic mouse strain selection based on effect size measurement, statistical significance testing and integrated behavioral z-scoring: focus on anxiety-related behavior and locomotion.

    PubMed

    Labots, M; Laarakker, M C; Ohl, F; van Lith, H A

    2016-06-29

    Selecting chromosome substitution strains (CSSs, also called consomic strains/lines) used in the search for quantitative trait loci (QTLs) consistently requires the identification of the respective phenotypic trait of interest and is usually based simply on a significant difference between a consomic and host strain. However, statistical significance as represented by P values does not necessarily indicate practical importance. We therefore propose a method that pays attention to both the statistical significance and the actual size of the observed effect. The present paper extends this approach and describes in more detail the use of effect size measures (Cohen's d; partial eta squared, η_p^2) together with the P value as statistical selection parameters for the chromosomal assignment of QTLs influencing anxiety-related behavior and locomotion in laboratory mice. The effect size measures were based on integrated behavioral z-scoring and were calculated in three experiments: (A) a complete consomic male mouse panel with A/J as the donor strain and C57BL/6J as the host strain. This panel, including host and donor strains, was analyzed in the modified Hole Board (mHB). The consomic line with chromosome 19 from A/J (CSS-19A) was selected since it showed increased anxiety-related behavior, but similar locomotion, compared to its host. (B) Following experiment A, female CSS-19A mice were compared with their C57BL/6J counterparts; however, no significant differences were found and effect sizes were close to zero. (C) A different consomic mouse strain (CSS-19PWD), with chromosome 19 from PWD/PhJ transferred onto the genetic background of C57BL/6J, was compared with its host strain. Here, in contrast with CSS-19A, overall anxiety was decreased in CSS-19PWD males compared to C57BL/6J, but locomotion was not affected. This new method shows an improved way to identify CSSs for QTL analysis of anxiety-related behavior using a combination of statistical significance testing and effect size measurement.

  3. Evidence for ultra-fast outflows in radio-quiet AGNs. I. Detection and statistical incidence of Fe K-shell absorption lines

    NASA Astrophysics Data System (ADS)

    Tombesi, F.; Cappi, M.; Reeves, J. N.; Palumbo, G. G. C.; Yaqoob, T.; Braito, V.; Dadina, M.

    2010-10-01

    Context. Blue-shifted Fe K absorption lines have been detected in recent years between 7 and 10 keV in the X-ray spectra of several radio-quiet AGNs. The derived blue-shifted velocities of the lines can often reach mildly relativistic values, up to 0.2-0.4c. These findings are important because they suggest the presence of a previously unknown massive and highly ionized absorbing material outflowing from their nuclei, possibly connected with accretion disk winds/outflows. Aims: The scope of the present work is to statistically quantify the parameters and incidence of blue-shifted Fe K absorption lines through a uniform analysis of a large sample of radio-quiet AGNs. This allows us to assess their global detection significance and to overcome any possible publication bias. Methods: We performed a blind search for narrow absorption features at energies greater than 6.4 keV in a sample of 42 radio-quiet AGNs observed with XMM-Newton. A simple uniform model composed of an absorbed power-law plus Gaussian emission and absorption lines provided a good fit to all the data sets. We derived the absorption line parameters and calculated their detailed detection significance, making use of the classical F-test and extensive Monte Carlo simulations. Results: We detect 36 narrow absorption lines in a total of 101 XMM-Newton EPIC pn observations. The number of absorption lines at rest-frame energies higher than 7 keV is 22. Their global probability of being generated by random fluctuations is very low, less than 3 × 10^{-8}, and their detection has been independently confirmed by a spectral analysis of the MOS data, with an associated random probability < 10^{-7}. We identify the lines as Fe XXV and Fe XXVI K-shell resonant absorption. They are systematically blue-shifted, with a velocity distribution ranging from zero up to ~0.3c, with a peak and mean value at ~0.1c. We detect variability of the lines in both EWs and blue-shifted velocities among different XMM-Newton observations

  4. A systematic Chandra study of Sgr A⋆: II. X-ray flare statistics

    NASA Astrophysics Data System (ADS)

    Yuan, Qiang; Wang, Q. Daniel; Liu, Siming; Wu, Kinwah

    2018-01-01

    The routinely flaring events from Sgr A⋆ trace dynamic, high-energy processes in the immediate vicinity of the supermassive black hole. We statistically study temporal and spectral properties, as well as fluence and duration distributions, of the flares detected by the Chandra X-ray Observatory from 1999 to 2012. The detection incompleteness and bias are carefully accounted for in determining these distributions. We find that the fluence distribution can be well characterized by a power law with a slope of 1.73^{+0.20}_{-0.19}, while the durations (τ in seconds) by a lognormal function with a mean log (τ)=3.39^{+0.27}_{-0.24} and an intrinsic dispersion σ =0.28^{+0.08}_{-0.06}. No significant correlation between the fluence and duration is detected. The apparent positive correlation, as reported previously, is mainly due to the detection bias (i.e. weak flares can be detected only when their durations are short). These results indicate that the simple self-organized criticality model has difficulties in explaining these flares. We further find that bright flares usually have asymmetric light curves with no statistically evident difference/preference between the rising and decaying phases in terms of their spectral/timing properties. Our spectral analysis shows that although a power-law model with a photon index of 2.0 ± 0.4 gives a satisfactory fit to the joint spectra of strong and weak flares, there is weak evidence for a softer spectrum of weaker flares. This work demonstrates the potential to use statistical properties of X-ray flares to probe their trigger and emission mechanisms, as well as the radiation propagation around the black hole.
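
    For readers who want to reproduce the flavour of such distribution fits, the slope of a power-law fluence distribution has a standard closed-form maximum-likelihood estimator (Clauset et al. 2009), and a lognormal duration distribution is characterized by the mean and dispersion of the log durations. The Python sketch below shows both; it ignores the detection incompleteness and bias corrections that the study itself carefully accounts for.

        import numpy as np

        def powerlaw_slope_mle(x, xmin):
            """MLE for the slope alpha of p(x) ~ x**(-alpha), x >= xmin
            (Clauset, Shalizi & Newman 2009)."""
            x = np.asarray(x, dtype=float)
            x = x[x >= xmin]
            return 1.0 + x.size / np.sum(np.log(x / xmin))

        def lognormal_duration_fit(tau):
            """Mean and intrinsic dispersion of log10(duration tau)."""
            logt = np.log10(np.asarray(tau, dtype=float))
            return logt.mean(), logt.std(ddof=1)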

  5. Data-Based Detection of Potential Terrorist Attacks: Statistical and Graphical Methods

    DTIC Science & Technology

    2010-06-01

  6. Significant Linkage for Tourette Syndrome in a Large French Canadian Family

    PubMed Central

    Mérette, Chantal; Brassard, Andrée; Potvin, Anne; Bouvier, Hélène; Rousseau, François; Émond, Claudia; Bissonnette, Luc; Roy, Marc-André; Maziade, Michel; Ott, Jurg; Caron, Chantal

    2000-01-01

    Family and twin studies provide strong evidence that genetic factors are involved in the transmission of Gilles de la Tourette syndrome (TS) and related psychiatric disorders. To detect the underlying susceptibility gene(s) for TS, we performed linkage analysis in one large French Canadian family (127 members) from the Charlevoix region, in which 20 family members were definitely affected by TS and 20 others showed related tic disorders. Using model-based linkage analysis, we observed a LOD score of 3.24 on chromosome 11 (11q23). This result was obtained in a multipoint approach involving marker D11S1377, the marker for which significant linkage disequilibrium with TS recently has been detected in an Afrikaner population. Altogether, 25 markers were studied, and, for level of significance, we derived a criterion that took into account the multiple testing arising from the use of three phenotype definitions and three modes of inheritance, a procedure that yielded a LOD score of 3.18. Hence, even after adjustment for multiple testing, the present study shows statistically significant evidence for genetic linkage with TS. PMID:10986045

  7. Detection and Evaluation of Spatio-Temporal Spike Patterns in Massively Parallel Spike Train Data with SPADE.

    PubMed

    Quaglio, Pietro; Yegenoglu, Alper; Torre, Emiliano; Endres, Dominik M; Grün, Sonja

    2017-01-01

    Repeated, precise sequences of spikes are largely considered a signature of activation of cell assemblies. These repeated sequences are commonly known under the name of spatio-temporal patterns (STPs). STPs are hypothesized to play a role in the communication of information in the computational process operated by the cerebral cortex. A variety of statistical methods for the detection of STPs have been developed and applied to electrophysiological recordings, but such methods scale poorly with the current size of available parallel spike train recordings (more than 100 neurons). In this work, we introduce a novel method capable of overcoming the computational and statistical limits of existing analysis techniques in detecting repeating STPs within massively parallel spike trains (MPST). We employ advanced data mining techniques to efficiently extract repeating sequences of spikes from the data. Then, we introduce and compare two alternative approaches to distinguish statistically significant patterns from chance sequences. The first approach uses a measure known as conceptual stability, of which we investigate a computationally cheap approximation for applications to such large data sets. The second approach is based on the evaluation of pattern statistical significance. In particular, we provide an extension to STPs of a method we recently introduced for the evaluation of statistical significance of synchronous spike patterns. The performance of the two approaches is evaluated in terms of computational load and statistical power on a variety of artificial data sets that replicate specific features of experimental data. Both methods provide an effective and robust procedure for detection of STPs in MPST data. The method based on significance evaluation shows the best overall performance, although at a higher computational cost. We name the novel procedure the spatio-temporal Spike PAttern Detection and Evaluation (SPADE) analysis.

  9. [The application of the prospective space-time statistic in early warning of infectious disease].

    PubMed

    Yin, Fei; Li, Xiao-Song; Feng, Zi-Jian; Ma, Jia-Qi

    2007-06-01

    To investigate the application of the prospective space-time scan statistic in the early detection of infectious disease outbreaks. The prospective space-time scan statistic was tested by mimicking daily prospective analyses of bacillary dysentery data of Chengdu city in 2005 (3212 cases in 102 towns and villages), and the results were compared with those of the purely temporal scan statistic. The prospective space-time scan statistic could give specific signals in both space and time. The results for June indicated that the prospective space-time scan statistic could promptly detect an outbreak that started from a local site, with a powerful early warning signal (P = 0.007). The purely temporal scan statistic signalled the outbreak two days later, and the signal was less powerful (P = 0.039). The prospective space-time scan statistic makes full use of the spatial and temporal information in infectious disease data and can promptly and effectively detect outbreaks that start from local sites. It could be an important tool for local and national CDCs in setting up early detection surveillance systems.
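
    The abstract does not spell out the machinery, but the prospective space-time scan statistic it applies is standard (Kulldorff 2001): scan over space-time cylinders ending "today", score each by a Poisson log-likelihood ratio, and judge the maximum against Monte Carlo replications. A minimal Python sketch of the per-cylinder score, assuming a Poisson model:

        import numpy as np

        def poisson_llr(c, E, C):
            """Kulldorff log-likelihood ratio for one space-time cylinder.
            c: observed cases inside, E: expected cases inside under the null,
            C: total observed cases. Only elevated risk (c > E) is scored,
            as appropriate for outbreak detection."""
            if c <= E:
                return 0.0
            return c * np.log(c / E) + (C - c) * np.log((C - c) / (C - E))

    The test statistic is the maximum of this score over all candidate cylinders; its p-value comes from ranking the observed maximum among maxima computed on datasets simulated under the null, which is how early-warning signals like the P = 0.007 above are obtained.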

  10. Autologous Doping with Cryopreserved Red Blood Cells – Effects on Physical Performance and Detection by Multivariate Statistics

    PubMed Central

    Malm, Christer B.; Khoo, Nelson S.; Granlund, Irene; Lindstedt, Emilia; Hult, Andreas

    2016-01-01

    The discovery of erythropoietin (EPO) simplified blood doping in sports, but improved detection methods for EPO have forced cheating athletes to return to blood transfusion. Autologous blood transfusion with cryopreserved red blood cells (RBCs) is the method of choice, because no valid method exists to accurately detect such an event. In endurance sports, it can be estimated that elite athletes improve performance by up to 3% with blood doping, regardless of method. Valid detection methods for autologous blood doping are important to maintain the credibility of athletic performances. Recreational male (N = 27) and female (N = 11) athletes served as Transfusion (N = 28) and Control (N = 10) subjects in two different transfusion settings. Hematological variables and physical performance were measured before donation of 450 or 900 mL whole blood, and until four weeks after re-infusion of the cryopreserved RBC fraction. Blood was analyzed for transferrin, iron, Hb, EVF, MCV, MCHC, reticulocytes, leucocytes and EPO. Repeated measures multivariate analysis of variance (MANOVA) and pattern recognition using Principal Component Analysis (PCA) and Orthogonal Projections to Latent Structures (OPLS) discriminant analysis (DA) were used to investigate differences between the Control and Transfusion groups over time. A significant increase in performance (15 ± 8%) and VO2max (17 ± 10%) (mean ± SD) could be measured 48 h after RBC re-infusion, and remained increased for up to four weeks in some subjects. In total, 533 blood samples were included in the study (Clean = 220, Transfused = 313). In response to blood transfusion, the largest change in hematological variables occurred 48 h after blood donation, when the Control and Transfused groups could be separated with OPLS-DA (R2 = 0.76/Q2 = 0.59). RBC re-infusion resulted in the best model (R2 = 0.40/Q2 = 0.10) at the first sampling point (48 h), predicting one false positive and one false negative. Overall, a 25% and 86% false positives ratio was
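
    OPLS-DA is not available in the common Python libraries, but ordinary PLS-DA (PLS regression against a 0/1 class label) captures the same idea of finding latent directions in the hematological variables that separate Clean from Transfused samples. A minimal, hedged sketch with synthetic placeholder data (the variable names and sizes here are illustrative, not the study's):

        import numpy as np
        from sklearn.cross_decomposition import PLSRegression
        from sklearn.preprocessing import StandardScaler

        rng = np.random.default_rng(0)
        X = rng.normal(size=(40, 9))   # placeholder for Hb, EVF, MCV, ... per sample
        y = np.repeat([0, 1], 20)      # 0 = Clean, 1 = Transfused (illustrative)

        Xs = StandardScaler().fit_transform(X)
        pls = PLSRegression(n_components=2).fit(Xs, y)
        scores = pls.transform(Xs)     # latent variables; plot to see group separation
        y_hat = (pls.predict(Xs).ravel() > 0.5).astype(int)

    In practice, model quality is summarized by R2/Q2 as in the abstract, with Q2 estimated by cross-validation rather than on the training data as this toy sketch would.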

  11. Statistical Tests of Reliability of NDE

    NASA Technical Reports Server (NTRS)

    Baaklini, George Y.; Klima, Stanley J.; Roth, Don J.; Kiser, James D.

    1987-01-01

    Capabilities of advanced material-testing techniques analyzed. Collection of four reports illustrates statistical method for characterizing flaw-detecting capabilities of sophisticated nondestructive evaluation (NDE). Method used to determine reliability of several state-of-the-art NDE techniques for detecting failure-causing flaws in advanced ceramic materials considered for use in automobiles, airplanes, and space vehicles.

  12. A statistical study of decaying kink oscillations detected using SDO/AIA

    NASA Astrophysics Data System (ADS)

    Goddard, C. R.; Nisticò, G.; Nakariakov, V. M.; Zimovets, I. V.

    2016-01-01

    Context. Despite intensive studies of kink oscillations of coronal loops in the last decade, a large-scale statistically significant investigation of the oscillation parameters has not been made using data from the Solar Dynamics Observatory (SDO). Aims: We carry out a statistical study of kink oscillations using extreme ultraviolet imaging data from a previously compiled catalogue. Methods: We analysed 58 kink oscillation events observed by the Atmospheric Imaging Assembly (AIA) on board SDO during its first four years of operation (2010-2014). Parameters of the oscillations, including the initial apparent amplitude, period, length of the oscillating loop, and damping are studied for 120 individual loop oscillations. Results: Analysis of the initial loop displacement and oscillation amplitude leads to the conclusion that the initial loop displacement prescribes the initial amplitude of oscillation in general. The period is found to scale with the loop length, and a linear fit of the data cloud gives a kink speed of Ck = (1330 ± 50) km s^{-1}. The main body of the data corresponds to kink speeds in the range Ck = (800-3300) km s^{-1}. Measurements of 52 exponential damping times were made, and it was noted that at least 21 of the damping profiles may be better approximated by a combination of non-exponential and exponential profiles rather than a purely exponential damping envelope. There are nine additional cases where the profile appears to be purely non-exponential and no damping time was measured. A scaling of the exponential damping time with the period is found, following the previously established linear scaling between these two parameters.

  13. MRI/US fusion-guided prostate biopsy allows for equivalent cancer detection with significantly fewer needle cores in biopsy-naive men

    PubMed Central

    Yarlagadda, Vidhush K.; Lai, Win Shun; Gordetsky, Jennifer B.; Porter, Kristin K.; Nix, Jeffrey W.; Thomas, John V.; Rais-Bahrami, Soroush

    2018-01-01

    PURPOSE We aimed to investigate the efficiency and cancer detection of magnetic resonance imaging (MRI)/ultrasonography (US) fusion-guided prostate biopsy in a cohort of biopsy-naive men compared with standard-of-care systematic extended sextant transrectal ultrasonography (TRUS)-guided biopsy. METHODS From 2014 to 2016, 72 biopsy-naive men referred for initial prostate cancer evaluation who underwent MRI of the prostate were prospectively evaluated. Retrospective review was performed on 69 patients with lesions suspicious for malignancy who underwent MRI/US fusion-guided biopsy in addition to systematic extended sextant biopsy. Biometric, imaging, and pathology data from both the MRI-targeted biopsies and systematic biopsies were analyzed and compared. RESULTS There were no significant differences in overall prostate cancer detection when comparing MRI-targeted biopsies to standard systematic biopsies (P = 0.39). Furthermore, there were no significant differences in the distribution of severity of cancers based on grade groups in cases with cancer detection (P = 0.68). However, significantly fewer needle cores were taken during the MRI/US fusion-guided biopsy compared with systematic biopsy (63% fewer cores sampled, P < 0.001). CONCLUSION In biopsy-naive men, MRI/US fusion-guided prostate biopsy offers equal prostate cancer detection compared with systematic TRUS-guided biopsy with significantly fewer tissue cores using the targeted technique. This approach can potentially reduce morbidity in the future if used instead of systematic biopsy without sacrificing the ability to detect prostate cancer, particularly in cases with higher grade disease. PMID:29770762

  14. A Statistical Model for Multilingual Entity Detection and Tracking

    DTIC Science & Technology

    2004-01-01

  15. Recovering incomplete data using Statistical Multiple Imputations (SMI): a case study in environmental chemistry.

    PubMed

    Mercer, Theresa G; Frostick, Lynne E; Walmsley, Anthony D

    2011-10-15

    This paper presents a statistical technique that can be applied to environmental chemistry data where missing values and limit-of-detection levels prevent the application of statistics. A working example is taken from an environmental leaching study that was set up to determine whether there were significant differences in levels of leached arsenic (As), chromium (Cr) and copper (Cu) between lysimeters containing preservative-treated wood waste and those containing untreated wood. Fourteen lysimeters were set up and left in natural conditions for 21 weeks. The resultant leachate was analysed by ICP-OES to determine the As, Cr and Cu concentrations. However, due to the variation inherent in each lysimeter combined with the limits of detection offered by ICP-OES, the collected quantitative data were somewhat incomplete. Initial data analysis was hampered by the number of 'missing values' in the data. To recover the dataset, the statistical tool of Statistical Multiple Imputation (SMI) was applied, and the data were re-analysed successfully. It was demonstrated that using SMI did not affect the variance in the data, but facilitated analysis of the complete dataset.
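
    The paper does not name a software package, but the workflow of statistical multiple imputation is generic: draw several plausible completions of the dataset, analyze each, and pool the results. A minimal Python sketch using scikit-learn's IterativeImputer (an assumption, not the authors' tool; values below the limit of detection are treated here simply as missing, whereas a fuller treatment would use censored-data imputation):

        import numpy as np
        from sklearn.experimental import enable_iterative_imputer  # noqa: F401
        from sklearn.impute import IterativeImputer

        rng = np.random.default_rng(1)
        X = rng.lognormal(size=(14, 3))        # 14 lysimeters x (As, Cr, Cu), synthetic
        X[rng.random(X.shape) < 0.2] = np.nan  # knock out ~20% of values as "missing"

        # m imputations drawn from the posterior, analyzed separately, then pooled.
        m = 5
        estimates = []
        for seed in range(m):
            imp = IterativeImputer(sample_posterior=True, random_state=seed)
            estimates.append(imp.fit_transform(X).mean(axis=0))
        pooled_mean = np.mean(estimates, axis=0)  # Rubin's rules: pool the m estimates

    Pooling across imputations, rather than imputing once, is what preserves the variance in the data, the property the abstract highlights.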

  16. Experimental and environmental factors affect spurious detection of ecological thresholds

    USGS Publications Warehouse

    Daily, Jonathan P.; Hitt, Nathaniel P.; Smith, David; Snyder, Craig D.

    2012-01-01

    Threshold detection methods are increasingly popular for assessing nonlinear responses to environmental change, but their statistical performance remains poorly understood. We simulated linear change in stream benthic macroinvertebrate communities and evaluated the performance of commonly used threshold detection methods based on model fitting (piecewise quantile regression [PQR]), data partitioning (nonparametric change point analysis [NCPA]), and a hybrid approach (significant zero crossings [SiZer]). We demonstrated that false detection of ecological thresholds (type I errors) and inferences on threshold locations are influenced by sample size, rate of linear change, and frequency of observations across the environmental gradient (i.e., sample-environment distribution, SED). However, the relative importance of these factors varied among statistical methods and between inference types. False detection rates were influenced primarily by user-selected parameters for PQR (τ) and SiZer (bandwidth) and secondarily by sample size (for PQR) and SED (for SiZer). In contrast, the location of reported thresholds was influenced primarily by SED. Bootstrapped confidence intervals for NCPA threshold locations revealed strong correspondence to SED. We conclude that the choice of statistical methods for threshold detection should be matched to experimental and environmental constraints to minimize false detection rates and avoid spurious inferences regarding threshold location.

  17. Planck 2015 results. XVI. Isotropy and statistics of the CMB

    NASA Astrophysics Data System (ADS)

    Planck Collaboration; Ade, P. A. R.; Aghanim, N.; Akrami, Y.; Aluri, P. K.; Arnaud, M.; Ashdown, M.; Aumont, J.; Baccigalupi, C.; Banday, A. J.; Barreiro, R. B.; Bartolo, N.; Basak, S.; Battaner, E.; Benabed, K.; Benoît, A.; Benoit-Lévy, A.; Bernard, J.-P.; Bersanelli, M.; Bielewicz, P.; Bock, J. J.; Bonaldi, A.; Bonavera, L.; Bond, J. R.; Borrill, J.; Bouchet, F. R.; Boulanger, F.; Bucher, M.; Burigana, C.; Butler, R. C.; Calabrese, E.; Cardoso, J.-F.; Casaponsa, B.; Catalano, A.; Challinor, A.; Chamballu, A.; Chiang, H. C.; Christensen, P. R.; Church, S.; Clements, D. L.; Colombi, S.; Colombo, L. P. L.; Combet, C.; Contreras, D.; Couchot, F.; Coulais, A.; Crill, B. P.; Cruz, M.; Curto, A.; Cuttaia, F.; Danese, L.; Davies, R. D.; Davis, R. J.; de Bernardis, P.; de Rosa, A.; de Zotti, G.; Delabrouille, J.; Désert, F.-X.; Diego, J. M.; Dole, H.; Donzelli, S.; Doré, O.; Douspis, M.; Ducout, A.; Dupac, X.; Efstathiou, G.; Elsner, F.; Enßlin, T. A.; Eriksen, H. K.; Fantaye, Y.; Fergusson, J.; Fernandez-Cobos, R.; Finelli, F.; Forni, O.; Frailis, M.; Fraisse, A. A.; Franceschi, E.; Frejsel, A.; Frolov, A.; Galeotta, S.; Galli, S.; Ganga, K.; Gauthier, C.; Ghosh, T.; Giard, M.; Giraud-Héraud, Y.; Gjerløw, E.; González-Nuevo, J.; Górski, K. M.; Gratton, S.; Gregorio, A.; Gruppuso, A.; Gudmundsson, J. E.; Hansen, F. K.; Hanson, D.; Harrison, D. L.; Henrot-Versillé, S.; Hernández-Monteagudo, C.; Herranz, D.; Hildebrandt, S. R.; Hivon, E.; Hobson, M.; Holmes, W. A.; Hornstrup, A.; Hovest, W.; Huang, Z.; Huffenberger, K. M.; Hurier, G.; Jaffe, A. H.; Jaffe, T. R.; Jones, W. C.; Juvela, M.; Keihänen, E.; Keskitalo, R.; Kim, J.; Kisner, T. S.; Knoche, J.; Kunz, M.; Kurki-Suonio, H.; Lagache, G.; Lähteenmäki, A.; Lamarre, J.-M.; Lasenby, A.; Lattanzi, M.; Lawrence, C. R.; Leonardi, R.; Lesgourgues, J.; Levrier, F.; Liguori, M.; Lilje, P. B.; Linden-Vørnle, M.; Liu, H.; López-Caniego, M.; Lubin, P. M.; Macías-Pérez, J. F.; Maggio, G.; Maino, D.; Mandolesi, N.; Mangilli, A.; Marinucci, D.; Maris, M.; Martin, P. G.; Martínez-González, E.; Masi, S.; Matarrese, S.; McGehee, P.; Meinhold, P. R.; Melchiorri, A.; Mendes, L.; Mennella, A.; Migliaccio, M.; Mikkelsen, K.; Mitra, S.; Miville-Deschênes, M.-A.; Molinari, D.; Moneti, A.; Montier, L.; Morgante, G.; Mortlock, D.; Moss, A.; Munshi, D.; Murphy, J. A.; Naselsky, P.; Nati, F.; Natoli, P.; Netterfield, C. B.; Nørgaard-Nielsen, H. U.; Noviello, F.; Novikov, D.; Novikov, I.; Oxborrow, C. A.; Paci, F.; Pagano, L.; Pajot, F.; Pant, N.; Paoletti, D.; Pasian, F.; Patanchon, G.; Pearson, T. J.; Perdereau, O.; Perotto, L.; Perrotta, F.; Pettorino, V.; Piacentini, F.; Piat, M.; Pierpaoli, E.; Pietrobon, D.; Plaszczynski, S.; Pointecouteau, E.; Polenta, G.; Popa, L.; Pratt, G. W.; Prézeau, G.; Prunet, S.; Puget, J.-L.; Rachen, J. P.; Rebolo, R.; Reinecke, M.; Remazeilles, M.; Renault, C.; Renzi, A.; Ristorcelli, I.; Rocha, G.; Rosset, C.; Rossetti, M.; Rotti, A.; Roudier, G.; Rubiño-Martín, J. A.; Rusholme, B.; Sandri, M.; Santos, D.; Savelainen, M.; Savini, G.; Scott, D.; Seiffert, M. D.; Shellard, E. P. S.; Souradeep, T.; Spencer, L. D.; Stolyarov, V.; Stompor, R.; Sudiwala, R.; Sunyaev, R.; Sutton, D.; Suur-Uski, A.-S.; Sygnet, J.-F.; Tauber, J. A.; Terenzi, L.; Toffolatti, L.; Tomasi, M.; Tristram, M.; Trombetti, T.; Tucci, M.; Tuovinen, J.; Valenziano, L.; Valiviita, J.; Van Tent, B.; Vielva, P.; Villa, F.; Wade, L. A.; Wandelt, B. D.; Wehus, I. K.; Yvon, D.; Zacchei, A.; Zibin, J. P.; Zonca, A.

    2016-09-01

    We test the statistical isotropy and Gaussianity of the cosmic microwave background (CMB) anisotropies using observations made by the Planck satellite. Our results are based mainly on the full Planck mission for temperature, but also include some polarization measurements. In particular, we consider the CMB anisotropy maps derived from the multi-frequency Planck data by several component-separation methods. For the temperature anisotropies, we find excellent agreement between results based on these sky maps over both a very large fraction of the sky and a broad range of angular scales, establishing that potential foreground residuals do not affect our studies. Tests of skewness, kurtosis, multi-normality, N-point functions, and Minkowski functionals indicate consistency with Gaussianity, while a power deficit at large angular scales is manifested in several ways, for example low map variance. The results of a peak statistics analysis are consistent with the expectations of a Gaussian random field. The "Cold Spot" is detected with several methods, including map kurtosis, peak statistics, and mean temperature profile. We thoroughly probe the large-scale dipolar power asymmetry, detecting it with several independent tests, and address the subject of a posteriori correction. Tests of directionality suggest the presence of angular clustering from large to small scales, but at a significance that is dependent on the details of the approach. We perform the first examination of polarization data, finding the morphology of stacked peaks to be consistent with the expectations of statistically isotropic simulations. Where they overlap, these results are consistent with the Planck 2013 analysis based on the nominal mission data and provide our most thorough view of the statistics of the CMB fluctuations to date.

  18. Formalized Conflicts Detection Based on the Analysis of Multiple Emails: An Approach Combining Statistics and Ontologies

    NASA Astrophysics Data System (ADS)

    Zakaria, Chahnez; Curé, Olivier; Salzano, Gabriella; Smaïli, Kamel

    In Computer Supported Cooperative Work (CSCW), it is crucial for project leaders to detect conflicting situations as early as possible. Generally, this task is performed manually by studying a set of documents exchanged between team members. In this paper, we propose a full-fledged automatic solution that identifies documents, subjects and actors involved in relational conflicts. Our approach detects conflicts in emails, probably the most popular type of document in CSCW, but the methods used can handle other text-based documents. These methods rely on the combination of statistical and ontological operations. The proposed solution is decomposed into several steps: (i) we enrich a simple negative emotion ontology with terms occurring in the corpus of emails, (ii) we categorize each conflicting email according to the concepts of this ontology, and (iii) we identify emails, subjects and team members involved in conflicting emails using possibilistic description logic and a set of proposed measures. Each of these steps is evaluated and validated on concrete examples. Moreover, the approach's framework is generic and can be easily adapted to domains other than conflicts, e.g. security issues, and extended with operations making use of our proposed set of measures.

  19. Automatic brain tumor detection in MRI: methodology and statistical validation

    NASA Astrophysics Data System (ADS)

    Iftekharuddin, Khan M.; Islam, Mohammad A.; Shaik, Jahangheer; Parra, Carlos; Ogg, Robert

    2005-04-01

    Automated brain tumor segmentation and detection are immensely important in medical diagnostics because they provide information on anatomical structures as well as potential abnormal tissue, which is necessary for appropriate surgical planning. In this work, we propose a novel automated brain tumor segmentation technique based on multiresolution texture information that combines fractal Brownian motion (fBm) and wavelet multiresolution analysis. Our wavelet-fractal technique combines the excellent multiresolution localization property of wavelets with the texture extraction of fractals. We prove the efficacy of our technique by successfully segmenting pediatric brain MR images (MRIs) from St. Jude Children's Research Hospital. We use a self-organizing map (SOM) as our clustering tool, wherein we exploit both pixel intensity and multiresolution texture features to obtain the segmented tumor. Our test results show that our technique successfully segments abnormal brain tissues in a set of T1 images. In the next step, we design a classifier using a Feed-Forward (FF) neural network to statistically validate the presence of tumor in MRI using both the multiresolution texture and pixel intensity features. We estimate the corresponding receiver operating characteristic (ROC) curve based on the true positive fractions and false positive fractions obtained from our classifier at different threshold values. An ROC, which can be considered a gold standard for the competence of a classifier, is obtained to ascertain the sensitivity and specificity of our classifier. We observe that at a threshold of 0.4 we achieve a true positive value of 1.0 (100%), sacrificing only 0.16 (16%) in false positive value, for the set of 50 T1 MRIs analyzed in this experiment.

  20. Causality in Statistical Power: Isomorphic Properties of Measurement, Research Design, Effect Size, and Sample Size.

    PubMed

    Heidel, R Eric

    2016-01-01

    Statistical power is the ability to detect a significant effect, given that the effect actually exists in a population. Like most statistical concepts, statistical power tends to induce cognitive dissonance in hepatology researchers. However, planning for statistical power by an a priori sample size calculation is of paramount importance when designing a research study. There are five specific empirical components that make up an a priori sample size calculation: the scale of measurement of the outcome, the research design, the magnitude of the effect size, the variance of the effect size, and the sample size. A framework grounded in the phenomenon of isomorphism, or interdependencies amongst different constructs with similar forms, will be presented to understand the isomorphic effects of decisions made on each of the five aforementioned components of statistical power.
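
    As a concrete instance of the five components feeding an a priori calculation, the sketch below solves for sample size in the simplest design (two independent groups, continuous outcome) using statsmodels; the effect size, alpha, and power values are illustrative choices, not prescriptions from the paper.

        from statsmodels.stats.power import TTestIndPower

        # Two-sample t-test: medium effect (Cohen's d = 0.5),
        # alpha = 0.05 (two-sided), desired power = 0.80.
        n_per_group = TTestIndPower().solve_power(effect_size=0.5,
                                                  alpha=0.05,
                                                  power=0.80,
                                                  alternative='two-sided')
        print(round(n_per_group))  # about 64 participants per group

    The isomorphism the paper describes is visible here: with everything else fixed, the required n scales roughly with 1/d^2, so halving the effect size to d = 0.25 roughly quadruples the sample size.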

  1. Antecedents of students' achievement in statistics

    NASA Astrophysics Data System (ADS)

    Awaludin, Izyan Syazana; Razak, Ruzanna Ab; Harris, Hezlin; Selamat, Zarehan

    2015-02-01

    The applications of statistics in most fields are vast. Many degree programmes at local universities require students to enroll in at least one statistics course. The standard of these courses varies across degree programmes because of students' diverse academic backgrounds, some of which are far from the field of statistics. The high failure rate in statistics courses among non-science stream students has been a concern every year. The purpose of this research is to investigate the antecedents of students' achievement in statistics. A total of 272 students participated in the survey. Multiple linear regression was applied to examine the relationship between the factors and achievement. We found that statistics anxiety was a significant predictor of students' achievement. We also found that students' age has a significant effect on achievement: older students are more likely to achieve lower scores in statistics. Students' level of study also has a significant impact on their achievement in statistics.

  2. A critique of Rasch residual fit statistics.

    PubMed

    Karabatsos, G

    2000-01-01

    In test analysis involving the Rasch model, a large degree of importance is placed on the "objective" measurement of individual abilities and item difficulties. The degree to which the objectivity properties are attained, of course, depends on the degree to which the data fit the Rasch model. It is therefore important to utilize fit statistics that accurately and reliably detect the person-item response inconsistencies that threaten the measurement objectivity of persons and items. Given this argument, it is somewhat surprising that far more emphasis is placed on the objective measurement of persons and items than on the measurement quality of Rasch fit statistics. This paper provides a critical analysis of the residual fit statistics of the Rasch model, arguably the most often used fit statistics, in an effort to illustrate that the task of Rasch fit analysis is not as simple and straightforward as it appears to be. The faulty statistical properties of the residual fit statistics allow neither a convenient nor a straightforward approach to Rasch fit analysis. For instance, given a residual fit statistic, the use of a single minimum critical value for misfit diagnosis across different testing situations, where the situations vary in sample and test properties, leads to both the overdetection and underdetection of misfit. To improve this situation, it is argued that psychometricians need to implement residual-free Rasch fit statistics that are based on the number of Guttman response errors, or use indices that are statistically optimal in detecting measurement disturbances.
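
    For reference, the residual fit statistics under critique are simple functions of standardized residuals in the dichotomous Rasch model. A compact Python sketch of the usual infit/outfit mean-squares, assuming person and item parameters have already been estimated:

        import numpy as np

        def rasch_residual_fit(X, theta, b):
            """Infit/outfit mean-squares for a dichotomous Rasch model.
            X: persons x items matrix of 0/1 responses,
            theta: estimated person abilities, b: estimated item difficulties."""
            P = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))  # expected scores
            W = P * (1.0 - P)                      # model variance of each response
            Z2 = (X - P) ** 2 / W                  # squared standardized residuals
            outfit = Z2.mean(axis=0)               # unweighted mean-square, per item
            infit = (W * Z2).sum(axis=0) / W.sum(axis=0)  # information-weighted
            return infit, outfit

    The paper's argument is that the null distributions of these quantities shift with sample size and test properties, so a single fixed critical value applied across testing situations will both over- and under-detect misfit.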

  3. Automated X-ray Flare Detection with GOES, 2003-2017: The Where of the Flare Catalog and Early Statistical Analysis

    NASA Astrophysics Data System (ADS)

    Loftus, K.; Saar, S. H.

    2017-12-01

    NOAA's Space Weather Prediction Center publishes the current definitive public soft X-ray flare catalog, derived using data from the X-ray Sensor (XRS) on the Geostationary Operational Environmental Satellites (GOES) series. However, this flare list has shortcomings for use in scientific analysis. Its detection algorithm has drawbacks (missing smaller flux events and poorly characterizing complex ones), and its event timing is imprecise (peak and end times are frequently marked incorrectly, and hence peak fluxes are underestimated). It also lacks explicit and regular spatial location data. We present a new database, "The Where of the Flare" catalog, which improves upon the precision of NOAA's current version, with more consistent and accurate spatial locations, timings, and peak fluxes. Our catalog also offers several new parameters per flare (e.g. background flux, integrated flux). We use data from the GOES Solar X-ray Imager (SXI) for spatial flare locating. Our detection algorithm is more sensitive to smaller flux events close to the background level and more precisely marks flare start/peak/end times so that integrated flux can be accurately calculated. It also decomposes complex events (with multiple overlapping flares) by constituent peaks. The catalog dates from the operation of the first SXI instrument in 2003 until the present. We give an overview of the detection algorithm's design, review the catalog's features, and discuss preliminary statistical analyses of light curve morphology, complex event decomposition, and integrated flux distribution. The Where of the Flare catalog will be useful in studying X-ray flare statistics and correlating X-ray flare properties with other observations. This work was supported by Contract #8100002705 from Lockheed-Martin to SAO in support of the science of NASA's IRIS mission.

  4. Statistical Signal Process in R Language in the Pharmacovigilance Programme of India.

    PubMed

    Kumar, Aman; Ahuja, Jitin; Shrivastava, Tarani Prakash; Kumar, Vipin; Kalaiselvan, Vivekanandan

    2018-05-01

    The Ministry of Health & Family Welfare, Government of India, initiated the Pharmacovigilance Programme of India (PvPI) in July 2010. The purpose of the PvPI is to collect data on adverse reactions due to medications, analyze them, and use the findings to recommend informed regulatory interventions, besides communicating risk to health care professionals and the public. The goal of the present study was to apply statistical tools in the R language to find relationships between drugs and adverse drug reactions (ADRs) for signal detection. Four statistical parameters were proposed for quantitative signal detection: IC025 (the lower bound of the 95% credibility interval of the information component), PRR and its lower bound PRR_lb, chi-square, and N11; we calculated these four values in R. We analyzed 78,983 drug-ADR combinations, with a total report count of 420,060. The calculation of the statistical parameters uses three variables: (1) N11 (the number of reports of a given drug-ADR combination), (2) N1. (the drug margin), and (3) N.1 (the ADR margin). The structure and calculation of these four statistical parameters in R are easily understandable. On the basis of the IC value (IC > 0), out of the 78,983 drug-ADR combinations we found 8,667 to be significantly associated. The calculation of statistical parameters in R is time saving and makes it easy to identify new signals in the Indian ICSR (Individual Case Safety Reports) database.
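
    The same disproportionality quantities are easy to express in any language; the sketch below mirrors the calculation in Python (rather than R, for consistency with the other examples in this collection), using the three margins named above. The unshrunk IC is shown; the IC025 used for signaling additionally applies Bayesian shrinkage and takes the lower 2.5% credibility bound.

        import numpy as np
        from scipy.stats import chi2_contingency

        def disproportionality(n11, n1_, n_1, N):
            """PRR and (unshrunk) information component for one drug-ADR pair.
            n11: reports with both drug and ADR, n1_: drug margin,
            n_1: ADR margin, N: total number of reports."""
            prr = (n11 / n1_) / ((n_1 - n11) / (N - n1_))
            ic = np.log2(n11 * N / (n1_ * n_1))   # IC025 would add shrinkage
            table = np.array([[n11, n1_ - n11],
                              [n_1 - n11, N - n1_ - n_1 + n11]])
            chi2 = chi2_contingency(table, correction=True)[0]
            return prr, ic, chi2

        # e.g. disproportionality(20, 1000, 500, 420060)
        # -> PRR ~ 17.5, IC ~ 4.1, plus the Yates-corrected chi-square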

  5. A Simple and Robust Statistical Test for Detecting the Presence of Recombination

    PubMed Central

    Bruen, Trevor C.; Philippe, Hervé; Bryant, David

    2006-01-01

    Recombination is a powerful evolutionary force that merges historically distinct genotypes. But the extent of recombination within many organisms is unknown, and even determining its presence within a set of homologous sequences is a difficult question. Here we develop a new statistic, Φw, that can be used to test for recombination. We show through simulation that our test can discriminate effectively between the presence and absence of recombination, even in diverse situations such as exponential growth (star-like topologies) and patterns of substitution rate correlation. A number of other tests, Max χ2, NSS, a coalescent-based likelihood permutation test (from LDHat), and correlation of linkage disequilibrium (both r2 and |D′|) with distance, all tend to underestimate the presence of recombination under strong population growth. Moreover, both Max χ2 and NSS falsely infer the presence of recombination under a simple model of mutation rate correlation. Results on empirical data show that our test can be used to detect recombination between closely as well as distantly related samples, regardless of the suspected rate of recombination. The results suggest that Φw is one of the best approaches to distinguish recurrent mutation from recombination in a wide variety of circumstances. PMID:16489234
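
    The statistic itself is built from site-pair compatibility. The Python sketch below implements a simplified relative of Φw for binary alignments: score incompatibility (the four-gamete test) between nearby site pairs, then assess significance by permuting site order, which destroys the spatial signal recombination leaves behind. The published test uses refined incompatibility and other refinements, so treat this as an illustration of the idea only.

        import numpy as np

        def incompatible(a, b):
            """Four-gamete test: sites a, b (0/1 sequences over taxa) are
            incompatible if all four haplotypes 00, 01, 10, 11 occur, which is
            impossible on a single tree without recombination (or recurrent
            mutation)."""
            return len({(x, y) for x, y in zip(a, b)}) == 4

        def phi_like(sites, window=10):
            """Mean incompatibility between nearby site pairs (cf. Phi_w)."""
            m = len(sites)
            vals = [incompatible(sites[i], sites[j])
                    for i in range(m) for j in range(i + 1, min(i + window, m))]
            return float(np.mean(vals))

        def phi_pvalue(sites, window=10, nperm=1000, seed=0):
            """Small p-values indicate recombination: neighbouring sites are
            more compatible than expected under a random site ordering."""
            rng = np.random.default_rng(seed)
            obs = phi_like(sites, window)
            perms = [phi_like([sites[k] for k in rng.permutation(len(sites))], window)
                     for _ in range(nperm)]
            return (1 + sum(p <= obs for p in perms)) / (nperm + 1)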

  6. A spatial scan statistic for nonisotropic two-level risk cluster.

    PubMed

    Li, Xiao-Zhou; Wang, Jin-Feng; Yang, Wei-Zhong; Li, Zhong-Jie; Lai, Sheng-Jie

    2012-01-30

    Spatial scan statistic methods are commonly used for geographical disease surveillance and cluster detection. The standard spatial scan statistic does not model any variability in the underlying risks of subregions belonging to a detected cluster. For a multilevel risk cluster, the isotonic spatial scan statistic can model a centralized high-risk kernel in the cluster. Because variations in disease risk are anisotropic, owing to different social, economic, or transport factors, the real high-risk kernel will not necessarily occupy the central place in the whole cluster area. We propose a spatial scan statistic for a nonisotropic two-level risk cluster, which can be used to detect a whole cluster and a noncentralized high-risk kernel within the cluster simultaneously. The performance of the three methods was evaluated through an intensive simulation study. Our proposed nonisotropic two-level method showed better power and geographical precision in two-level risk cluster scenarios, especially for a noncentralized high-risk kernel. Our proposed method is illustrated using hand-foot-mouth disease data from Pingdu City, Shandong, China, in May 2009, in comparison with two other methods. In this practical study, the nonisotropic two-level method was the only one to precisely detect a high-risk area within the detected whole cluster.

  7. Statistical analysis of atom probe data: detecting the early stages of solute clustering and/or co-segregation.

    PubMed

    Hyde, J M; Cerezo, A; Williams, T J

    2009-04-01

    Statistical analysis of atom probe data has improved dramatically in the last decade and it is now possible to determine the size, the number density and the composition of individual clusters or precipitates such as those formed in reactor pressure vessel (RPV) steels during irradiation. However, the characterisation of the onset of clustering or co-segregation is more difficult and has traditionally focused on the use of composition frequency distributions (for detecting clustering) and contingency tables (for detecting co-segregation). In this work, the authors investigate the possibility of directly examining the neighbourhood of each individual solute atom as a means of identifying the onset of solute clustering and/or co-segregation. The methodology involves comparing the mean observed composition around a particular type of solute with that expected from the overall composition of the material. The methodology has been applied to atom probe data obtained from several irradiated RPV steels. The results show that the new approach is more sensitive to fine scale clustering and co-segregation than that achievable using composition frequency distribution and contingency table analyses.
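
    The idea of directly examining the neighbourhood of each individual solute atom reduces to a nearest-neighbour composition comparison. A hedged Python sketch of that comparison follows (illustrative of the idea, not the authors' exact estimator; the element names and the choice of k neighbours are placeholders):

        import numpy as np
        from scipy.spatial import cKDTree

        def local_composition_excess(pos, species, center='Cu', probe='Ni', k=20):
            """Mean observed fraction of `probe` atoms among the k nearest
            neighbours of each `center` atom, minus the fraction expected from
            the overall composition (near zero if there is no clustering or
            co-segregation)."""
            pos = np.asarray(pos, dtype=float)
            species = np.asarray(species)
            tree = cKDTree(pos)
            centers = np.flatnonzero(species == center)
            # k+1 because each center atom is its own nearest neighbour
            _, idx = tree.query(pos[centers], k=k + 1)
            neigh = species[idx[:, 1:]]
            observed = (neigh == probe).mean()
            expected = (species == probe).mean()
            return observed - expected

    Comparing the observed local composition with the bulk expectation, rather than binning whole-volume compositions, is what makes this kind of test sensitive to the earliest, finest-scale stages of clustering.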

  8. Statistical Algorithms for Designing Geophysical Surveys to Detect UXO Target Areas

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    O'Brien, Robert F.; Carlson, Deborah K.; Gilbert, Richard O.

    2005-07-29

    The U.S. Department of Defense is in the process of assessing and remediating closed, transferred, and transferring military training ranges across the United States. Many of these sites have areas that are known to contain unexploded ordnance (UXO). Other sites or portions of sites are not expected to contain UXO, but some verification of this expectation using geophysical surveys is needed. Many sites are so large that it is often impractical and/or cost prohibitive to perform surveys over 100% of the site. In that case, it is particularly important to be explicit about the performance required of the survey. This article presents the statistical algorithms developed to support the design of geophysical surveys along transects (swaths) to find target areas (TAs) of anomalous geophysical readings that may indicate the presence of UXO. The algorithms described here determine 1) the spacing between transects that should be used for the surveys to achieve a specified probability of traversing the TA, 2) the probability of both traversing and detecting a TA of anomalous geophysical readings when the spatial density of anomalies within the TA is either uniform (unchanging over space) or has a bivariate normal distribution, and 3) the probability that a TA exists when it was not found by surveying along transects. These algorithms have been implemented in the Visual Sample Plan (VSP) software to develop cost-effective transect survey designs that meet performance objectives.
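
    The first quantity has a simple geometric core. A minimal sketch under idealized assumptions (circular target area, uniformly located centre, parallel transects), not the exact VSP formulas:

      def traverse_prob(spacing, radius, swath=0.0):
          """Probability that parallel transects of width `swath`, a distance
          `spacing` apart, cross a circular target area of the given radius
          whose centre is uniformly located between two transects."""
          return min(1.0, (2 * radius + swath) / spacing)

      def spacing_for(p_traverse, radius, swath=0.0):
          """Transect spacing that achieves a required traverse probability."""
          return (2 * radius + swath) / p_traverse

      def posterior_exists(prior, p_detect):
          """P(a TA exists | nothing found), by Bayes' rule, given the prior
          probability it exists and the survey's probability of finding it."""
          miss = 1.0 - p_detect
          return miss * prior / (miss * prior + (1.0 - prior))

      print(spacing_for(0.9, radius=20, swath=2))        # ~46.7 m for a 90% traverse chance
      print(posterior_exists(prior=0.5, p_detect=0.95))  # ~0.048 after a clean survey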

  9. Statistical Algorithms for Designing Geophysical Surveys to Detect UXO Target Areas

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    O'Brien, Robert F.; Carlson, Deborah K.; Gilbert, Richard O.

    2005-07-28

    The U.S. Department of Defense is in the process of assessing and remediating closed, transferred, and transferring military training ranges across the United States. Many of these sites have areas that are known to contain unexploded ordnance (UXO). Other sites or portions of sites are not expected to contain UXO, but some verification of this expectation using geophysical surveys is needed. Many sites are so large that it is often impractical and/or cost prohibitive to perform surveys over 100% of the site. In such cases, it is particularly important to be explicit about the performance required of the surveys. This article presents the statistical algorithms developed to support the design of geophysical surveys along transects (swaths) to find target areas (TAs) of anomalous geophysical readings that may indicate the presence of UXO. The algorithms described here determine (1) the spacing between transects that should be used for the surveys to achieve a specified probability of traversing the TA, (2) the probability of both traversing and detecting a TA of anomalous geophysical readings when the spatial density of anomalies within the TA is either uniform (unchanging over space) or has a bivariate normal distribution, and (3) the probability that a TA exists when it was not found by surveying along transects. These algorithms have been implemented in the Visual Sample Plan (VSP) software to develop cost-effective transect survey designs that meet performance objectives.

  10. Modeling And Detecting Anomalies In Scada Systems

    NASA Astrophysics Data System (ADS)

    Svendsen, Nils; Wolthusen, Stephen

    The detection of attacks and intrusions based on anomalies is hampered by the limits of specificity underlying the detection techniques. However, in the case of many critical infrastructure systems, domain-specific knowledge and models can impose constraints that potentially reduce error rates. At the same time, attackers can use their knowledge of system behavior to mask their manipulations, causing adverse effects to be observed only after a significant period of time. This paper describes elementary statistical techniques that can be applied to detect anomalies in critical infrastructure networks. A SCADA system employed in liquefied natural gas (LNG) production is used as a case study.

  11. Rainfall statistics, stationarity, and climate change

    NASA Astrophysics Data System (ADS)

    Sun, Fubao; Roderick, Michael L.; Farquhar, Graham D.

    2018-03-01

    There is a growing research interest in the detection of changes in hydrologic and climatic time series. Stationarity can be assessed using the autocorrelation function, but this is not yet common practice in hydrology and climate. Here, we use a global land-based gridded annual precipitation (hereafter P) database (1940–2009) and find that the lag 1 autocorrelation coefficient is statistically significant at around 14% of the global land surface, implying nonstationary behavior (90% confidence). In contrast, around 76% of the global land surface shows little or no change, implying stationary behavior. We use these results to assess change in the observed P over the most recent decade of the database. We find that the changes for most (84%) grid boxes are within the plausible bounds of no significant change at the 90% CI. The results emphasize the importance of adequately accounting for natural variability when assessing change.
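
    A minimal sketch of such a test, using the simplified large-sample null bound z/√n for the lag 1 coefficient (the paper's exact procedure may differ; the demo series is synthetic):

      import numpy as np
      from scipy.stats import norm

      def lag1_autocorr(x):
          """Sample lag 1 autocorrelation coefficient of a series."""
          d = np.asarray(x, dtype=float) - np.mean(x)
          return float((d[:-1] * d[1:]).sum() / (d * d).sum())

      def looks_nonstationary(x, conf=0.90):
          """True if |r1| exceeds the large-sample null bound z/sqrt(n)."""
          r1, n = lag1_autocorr(x), len(x)
          z = norm.ppf(0.5 + conf / 2.0)
          return r1, abs(r1) > z / np.sqrt(n)

      rng = np.random.default_rng(0)
      print(looks_nonstationary(rng.normal(1000, 200, 70)))  # 70 years of annual P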

  12. Construction of cosmic string induced temperature anisotropy maps with CMBFAST and statistical analysis

    NASA Astrophysics Data System (ADS)

    Simatos, N.; Perivolaropoulos, L.

    2001-01-01

    We use the publicly available code CMBFAST, as modified by Pogosian and Vachaspati, to simulate the effects of wiggly cosmic strings on the cosmic microwave background (CMB). Using the modified CMBFAST code, which takes into account vector modes and models wiggly cosmic strings by the one-scale model, we go beyond the angular power spectrum to construct CMB temperature maps with a resolution of a few degrees. The statistics of these maps are then studied using conventional and recently proposed statistical tests optimized for the detection of hidden temperature discontinuities induced by the Gott-Kaiser-Stebbins effect. We show, however, that these realistic maps cannot be distinguished in a statistically significant way from purely Gaussian maps with an identical power spectrum.

  13. Worry, Intolerance of Uncertainty, and Statistics Anxiety

    ERIC Educational Resources Information Center

    Williams, Amanda S.

    2013-01-01

    Statistics anxiety is a problem for most graduate students. This study investigates the relationship between intolerance of uncertainty, worry, and statistics anxiety. Intolerance of uncertainty was significantly related to worry, and worry was significantly related to three types of statistics anxiety. Six types of statistics anxiety were…

  14. Reducing statistics anxiety and enhancing statistics learning achievement: effectiveness of a one-minute strategy.

    PubMed

    Chiou, Chei-Chang; Wang, Yu-Min; Lee, Li-Tze

    2014-08-01

    Statistical knowledge is widely used in academia; however, statistics teachers struggle with the issue of how to reduce students' statistics anxiety and enhance students' statistics learning. This study assesses the effectiveness of a "one-minute paper strategy" in reducing students' statistics-related anxiety and in improving students' statistics-related achievement. Participants were 77 undergraduates from two classes enrolled in applied statistics courses. An experiment was implemented according to a pretest/posttest comparison group design. The quasi-experimental design showed that the one-minute paper strategy significantly reduced students' statistics anxiety and improved students' statistics learning achievement. The strategy was a better instructional tool than the textbook exercise for reducing students' statistics anxiety and improving students' statistics achievement.

  15. Testing earthquake prediction algorithms: Statistically significant advance prediction of the largest earthquakes in the Circum-Pacific, 1992-1997

    USGS Publications Warehouse

    Kossobokov, V.G.; Romashkova, L.L.; Keilis-Borok, V. I.; Healy, J.H.

    1999-01-01

    Algorithms M8 and MSc (i.e., the Mendocino Scenario) were used in a real-time intermediate-term research prediction of the strongest earthquakes in the Circum-Pacific seismic belt. Predictions are made by M8 first. Then, the areas of alarm are reduced by MSc at the cost that some earthquakes are missed in the second approximation of prediction. In 1992-1997, five earthquakes of magnitude 8 and above occurred in the test area: all of them were predicted by M8, and MSc identified correctly the locations of four of them. The space-time volume of the alarms is 36% and 18%, respectively, when estimated with a normalized product measure of empirical distribution of epicenters and uniform time. The statistical significance of the achieved results is beyond 99% both for M8 and MSc. For magnitude 7.5+, 10 out of 19 earthquakes were predicted by M8 in 40% and five were predicted by M8-MSc in 13% of the total volume considered. This implies a significance level of 81% for M8 and 92% for M8-MSc. The lower significance levels might result from a global change in seismic regime in 1993-1996, when the rate of the largest events doubled and all of them became exclusively normal or reversed faults. The predictions are fully reproducible; the algorithms M8 and MSc in complete formal definitions were published before we started our experiment [Keilis-Borok, V.I., Kossobokov, V.G., 1990. Premonitory activation of seismic flow: Algorithm M8, Phys. Earth and Planet. Inter. 61, 73-83; Kossobokov, V.G., Keilis-Borok, V.I., Smith, S.W., 1990. Localization of intermediate-term earthquake prediction, J. Geophys. Res., 95, 19763-19772; Healy, J.H., Kossobokov, V.G., Dewey, J.W., 1992. A test to evaluate the earthquake prediction algorithm, M8. U.S. Geol. Surv. OFR 92-401]. M8 is available from the IASPEI Software Library [Healy, J.H., Keilis-Borok, V.I., Lee, W.H.K. (Eds.), 1997. Algorithms for Earthquake Statistics and Prediction, Vol. 6. IASPEI Software Library]. © 1999 Elsevier

  16. RAId_DbS: Peptide Identification using Database Searches with Realistic Statistics

    PubMed Central

    Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo

    2007-01-01

    Background The key to mass-spectrometry-based proteomics is peptide identification. A major challenge in peptide identification is to obtain realistic E-values when assigning statistical significance to candidate peptides. Results Using a simple scoring scheme, we propose a database search method with theoretically characterized statistics. Taking into account possible skewness in the random variable distribution and the effect of finite sampling, we provide a theoretical derivation for the tail of the score distribution. For every experimental spectrum examined, we collect the scores of peptides in the database, and find good agreement between the collected score statistics and our theoretical distribution. Using Student's t-tests, we quantify the degree of agreement between the theoretical distribution and the score statistics collected. These t-tests may be used to measure the reliability of the reported statistics. When combined with the reported P-value for a peptide hit under a score distribution model, this new measure prevents exaggerated statistics. Another feature of RAId_DbS is its capability of detecting multiple co-eluted peptides. The peptide identification performance and statistical accuracy of RAId_DbS are assessed and compared with several other search tools. The executables and data related to RAId_DbS are freely available upon request. PMID:17961253

  17. Clinicopathological significance of c-MYC in esophageal squamous cell carcinoma.

    PubMed

    Lian, Yu; Niu, Xiangdong; Cai, Hui; Yang, Xiaojun; Ma, Haizhong; Ma, Shixun; Zhang, Yupeng; Chen, Yifeng

    2017-07-01

    Esophageal squamous cell carcinoma is one of the most common malignant tumors. The oncogene c-MYC is thought to be important in the initiation, promotion, and therapy resistance of cancer. In this study, we aimed to investigate the clinicopathologic role of c-MYC in esophageal squamous cell carcinoma tissue by discovering and analyzing c-MYC expression in a series of human esophageal tissues. A total of 95 esophageal squamous cell carcinoma samples were analyzed by western blotting and immunohistochemistry, and the correlation of c-MYC expression with clinicopathological features of esophageal squamous cell carcinoma patients was statistically analyzed. In most esophageal squamous cell carcinoma cases, c-MYC expression was positive in tumor tissues. The positive rate of c-MYC expression in tumor tissues was 61.05%, markedly higher than in adjacent normal tissues (8.42%, 8/92) and atypical hyperplasia tissues (19.75%, 16/95), a statistically significant difference among the three tissue types. Overexpression of c-MYC was detected in 61.05% (58/95) of esophageal squamous cell carcinomas and was significantly correlated with the degree of differentiation (p = 0.004). The positive rate of c-MYC expression was 40.0% in well-differentiated esophageal tissues, a statistically significant difference (p = 0.004). The positive rate was 41.5% in T1 + T2 versus 74.1% in T3 + T4 esophageal tissues (p = 0.001), and 45.0% in stage I + II versus 72.2% in stage III + IV esophageal tissues (p = 0.011), both statistically significant differences. The c-MYC expression strongly correlated with clinical staging (p = 0.011), differentiation degree (p = 0.004), lymph node metastasis (p = 0.003), and invasion depth (p = 0.001) of patients with esophageal squamous cell carcinoma. The c-MYC was

  18. Statistical analysis of nonmonotonic dose-response relationships: research design and analysis of nasal cell proliferation in rats exposed to formaldehyde.

    PubMed

    Gaylor, David W; Lutz, Werner K; Conolly, Rory B

    2004-01-01

    Statistical analyses of nonmonotonic dose-response curves are proposed, experimental designs to detect low-dose effects of J-shaped curves are suggested, and sample sizes are provided. For quantal data such as cancer incidence rates, much larger numbers of animals are required than for continuous data such as biomarker measurements. For example, 155 animals per dose group are required to have at least an 80% chance of detecting a decrease from a 20% incidence in controls to an incidence of 10% at a low dose. For a continuous measurement, only 14 animals per group are required to have at least an 80% chance of detecting a change of the mean by one standard deviation of the control group. Experimental designs based on three dose groups plus controls are discussed to detect nonmonotonicity or to estimate the zero equivalent dose (ZED), i.e., the dose that produces a response equal to the average response in the controls. Cell proliferation data in the nasal respiratory epithelium of rats exposed to formaldehyde by inhalation are used to illustrate the statistical procedures. Statistically significant departures from a monotonic dose response were obtained for time-weighted average labeling indices with an estimated ZED at a formaldehyde dose of 5.4 ppm, with a lower 95% confidence limit of 2.7 ppm. It is concluded that demonstration of a statistically significant biphasic dose-response curve, together with estimation of the resulting ZED, could serve as a point of departure in establishing a reference dose for low-dose risk assessment.
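
    The quoted sample sizes can be roughly reproduced with standard one-sided normal-approximation formulas; this is a back-of-envelope check, not the authors' exact calculation:

      from scipy.stats import norm

      def n_two_proportions(p1, p2, alpha=0.05, power=0.80):
          """Per-group n to detect a change from incidence p1 to p2
          (one-sided two-proportion z-test, normal approximation)."""
          za, zb = norm.ppf(1 - alpha), norm.ppf(power)
          pbar = (p1 + p2) / 2
          num = (za * (2 * pbar * (1 - pbar)) ** 0.5
                 + zb * (p1 * (1 - p1) + p2 * (1 - p2)) ** 0.5) ** 2
          return num / (p1 - p2) ** 2

      def n_mean_shift(delta_sd=1.0, alpha=0.05, power=0.80):
          """Per-group n to detect a mean shift of delta_sd control SDs."""
          za, zb = norm.ppf(1 - alpha), norm.ppf(power)
          return 2 * ((za + zb) / delta_sd) ** 2

      print(round(n_two_proportions(0.20, 0.10)))  # ~157, near the quoted 155
      print(round(n_mean_shift(1.0)))  # ~12; the quoted 14 likely adds a small-sample correction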

  19. A PLSPM-based test statistic for detecting gene-gene co-association in genome-wide association study with case-control design.

    PubMed

    Zhang, Xiaoshuai; Yang, Xiaowei; Yuan, Zhongshang; Liu, Yanxun; Li, Fangyu; Peng, Bin; Zhu, Dianwen; Zhao, Jinghua; Xue, Fuzhong

    2013-01-01

    For genome-wide association data analysis, two genes in any pathway, two SNPs in two linked gene regions, or two SNPs in two linked exons within one gene are often correlated with each other. We therefore proposed the concept of gene-gene co-association, which refers to effects arising not only from the traditional interaction under a nearly independent condition but also from the correlation between two genes. Furthermore, we constructed a novel statistic for detecting gene-gene co-association based on Partial Least Squares Path Modeling (PLSPM). Through simulation, the relationship between traditional interaction and co-association was highlighted under three different types of co-association. Both simulation and real data analysis demonstrated that the proposed PLSPM-based statistic has better performance than the single SNP-based logistic model, the PCA-based logistic model, and other gene-based methods.

  20. A PLSPM-Based Test Statistic for Detecting Gene-Gene Co-Association in Genome-Wide Association Study with Case-Control Design

    PubMed Central

    Zhang, Xiaoshuai; Yang, Xiaowei; Yuan, Zhongshang; Liu, Yanxun; Li, Fangyu; Peng, Bin; Zhu, Dianwen; Zhao, Jinghua; Xue, Fuzhong

    2013-01-01

    For genome-wide association data analysis, two genes in any pathway, two SNPs in two linked gene regions, or two SNPs in two linked exons within one gene are often correlated with each other. We therefore proposed the concept of gene-gene co-association, which refers to effects arising not only from the traditional interaction under a nearly independent condition but also from the correlation between two genes. Furthermore, we constructed a novel statistic for detecting gene-gene co-association based on Partial Least Squares Path Modeling (PLSPM). Through simulation, the relationship between traditional interaction and co-association was highlighted under three different types of co-association. Both simulation and real data analysis demonstrated that the proposed PLSPM-based statistic has better performance than the single SNP-based logistic model, the PCA-based logistic model, and other gene-based methods. PMID:23620809

  1. A significant-loophole-free test of Bell's theorem with entangled photons

    NASA Astrophysics Data System (ADS)

    Giustina, Marissa; Versteegh, Marijn A. M.; Wengerowsky, Sören; Handsteiner, Johannes; Hochrainer, Armin; Phelan, Kevin; Steinlechner, Fabian; Kofler, Johannes; Larsson, Jan-Åke; Abellán, Carlos; Amaya, Waldimar; Mitchell, Morgan W.; Beyer, Jörn; Gerrits, Thomas; Lita, Adriana E.; Shalm, Lynden K.; Nam, Sae Woo; Scheidl, Thomas; Ursin, Rupert; Wittmann, Bernhard; Zeilinger, Anton

    2017-10-01

    John Bell's theorem of 1964 states that local elements of physical reality, existing independently of measurement, are inconsistent with the predictions of quantum mechanics (Bell, J. S. (1964), Physics (College Park, Md.) 1, 195). Specifically, correlations between measurement results from distant entangled systems would be smaller than predicted by quantum physics; this is expressed in Bell's inequalities. Employing modifications of Bell's inequalities, many experiments have been performed that convincingly support the quantum predictions. Yet all such experiments rely on assumptions that provide loopholes for a local realist explanation of the measurements. Here we report an experiment with polarization-entangled photons that simultaneously closes the most significant of these loopholes. We used a highly efficient source of entangled photons, distributed them over a distance of 58.5 meters, and implemented rapid random setting generation and high-efficiency detection to observe a violation of a Bell inequality with high statistical significance. The probability that our results occurred by chance under local realism is less than 3.74×10⁻³¹, corresponding to an 11.5 standard deviation effect.

  2. A novel data-driven learning method for radar target detection in nonstationary environments

    DOE PAGES

    Akcakaya, Murat; Nehorai, Arye; Sen, Satyabrata

    2016-04-12

    Most existing radar algorithms are developed under the assumption that the environment (clutter) is stationary. However, in practice, the characteristics of the clutter can vary enormously depending on the radar-operational scenarios. If unaccounted for, these nonstationary variabilities may drastically hinder the radar performance. Therefore, to overcome such shortcomings, we develop a data-driven method for target detection in nonstationary environments. In this method, the radar dynamically detects changes in the environment and adapts to these changes by learning the new statistical characteristics of the environment and by intelligently updating its statistical detection algorithm. Specifically, we employ drift detection algorithms to detect changes in the environment, and incremental learning (particularly learning under concept drift) to learn the new statistical characteristics of the environment from the new radar data that become available in batches over a period of time. The newly learned environment characteristics are then integrated in the detection algorithm. Furthermore, we use Monte Carlo simulations to demonstrate that the developed method provides a significant improvement in the detection performance compared with detection techniques that are not aware of the environmental changes.
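
    As one concrete example of the drift-detection stage (the abstract does not commit to a specific algorithm; Page-Hinkley is a standard choice assumed here for illustration), a minimal sketch:

      class PageHinkley:
          """Minimal one-sided Page-Hinkley drift detector: signals when the
          cumulative positive deviation of the stream mean exceeds `threshold`;
          `delta` is the drift magnitude tolerated without alarm."""

          def __init__(self, delta=0.005, threshold=50.0):
              self.delta, self.threshold = delta, threshold
              self.n, self.mean, self.cum, self.cum_min = 0, 0.0, 0.0, 0.0

          def update(self, x):
              self.n += 1
              self.mean += (x - self.mean) / self.n          # running mean
              self.cum += x - self.mean - self.delta         # cumulative deviation
              self.cum_min = min(self.cum_min, self.cum)
              return self.cum - self.cum_min > self.threshold  # True => drift

      # Feed clutter statistics sample by sample; on a drift signal, the new
      # batch would be used to re-learn the detector's clutter model.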

  3. Computed statistics at streamgages, and methods for estimating low-flow frequency statistics and development of regional regression equations for estimating low-flow frequency statistics at ungaged locations in Missouri

    USGS Publications Warehouse

    Southard, Rodney E.

    2013-01-01

    The weather and precipitation patterns in Missouri vary considerably from year to year. In 2008, the statewide average rainfall was 57.34 inches and in 2012, the statewide average rainfall was 30.64 inches. This variability in precipitation and resulting streamflow in Missouri underlies the necessity for water managers and users to have reliable streamflow statistics and a means to compute select statistics at ungaged locations for a better understanding of water availability. Knowledge of surface-water availability is dependent on the streamflow data that have been collected and analyzed by the U.S. Geological Survey for more than 100 years at approximately 350 streamgages throughout Missouri. The U.S. Geological Survey, in cooperation with the Missouri Department of Natural Resources, computed streamflow statistics at streamgages through the 2010 water year, defined periods of drought, defined methods to estimate streamflow statistics at ungaged locations, and developed regional regression equations to compute selected streamflow statistics at ungaged locations. Streamflow statistics and flow durations were computed for 532 streamgages in Missouri and in neighboring States. For streamgages with more than 10 years of record, Kendall's tau was computed to evaluate for trends in streamflow data. If trends were detected, the variable length method was used to define the period of no trend: water years were removed from the beginning of the record for a streamgage until no trend was detected. Low-flow frequency statistics were then computed for the entire period of record and for the period of no trend if 10 or more years of record were available for each analysis. Three methods are presented for computing selected streamflow statistics at ungaged locations. The first method uses power curve equations developed for 28 selected streams in Missouri and neighboring States that have multiple streamgages on the same streams. Statistical

  4. HYPOTHESIS SETTING AND ORDER STATISTIC FOR ROBUST GENOMIC META-ANALYSIS.

    PubMed

    Song, Chi; Tseng, George C

    2014-01-01

    Meta-analysis techniques have been widely developed and applied in genomic applications, especially for combining multiple transcriptomic studies. In this paper, we propose an order statistic of p-values (rth ordered p-value, rOP) across combined studies as the test statistic. We illustrate different hypothesis settings that detect gene markers differentially expressed (DE) "in all studies", "in the majority of studies", or "in one or more studies", and specify rOP as a suitable method for detecting DE genes "in the majority of studies". We develop methods to estimate the parameter r in rOP for real applications. Statistical properties such as its asymptotic behavior and a one-sided testing correction for detecting markers of concordant expression changes are explored. Power calculation and simulation show better performance of rOP compared to classical Fisher's method, Stouffer's method, minimum p-value method and maximum p-value method under the focused hypothesis setting. Theoretically, rOP is found connected to the naïve vote counting method and can be viewed as a generalized form of vote counting with better statistical properties. The method is applied to three microarray meta-analysis examples including major depressive disorder, brain cancer and diabetes. The results demonstrate rOP as a more generalizable, robust and sensitive statistical framework to detect disease-related markers.
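
    The rOP statistic itself is compact: under the joint null, the rth smallest of n independent uniform p-values follows a Beta(r, n − r + 1) distribution, so the meta-analysis p-value is a single Beta CDF evaluation. A minimal sketch (the example numbers are illustrative, not from the paper):

      import numpy as np
      from scipy.stats import beta

      def rop_pvalue(pvals, r):
          """rOP meta-analysis p-value: the r-th smallest of n independent
          study p-values follows Beta(r, n - r + 1) under the joint null."""
          p = np.sort(np.asarray(pvals, dtype=float))
          n = len(p)
          return float(beta.cdf(p[r - 1], r, n - r + 1))

      # DE "in the majority" of 7 studies: choose r just over half, e.g. r = 4
      print(rop_pvalue([0.001, 0.004, 0.01, 0.03, 0.40, 0.55, 0.80], r=4))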

  5. Multivariate objective response detectors (MORD): statistical tools for multichannel EEG analysis during rhythmic stimulation.

    PubMed

    Felix, Leonardo Bonato; Miranda de Sá, Antonio Mauricio Ferreira Leite; Infantosi, Antonio Fernando Catelli; Yehia, Hani Camille

    2007-03-01

    The presence of cerebral evoked responses can be tested by using objective response detectors. They are statistical tests that provide a threshold above which responses can be assumed to have occurred. The detection power depends on the signal-to-noise ratio (SNR) of the response and the amount of data available. However, the correlation within the background noise could also affect the power of such detectors. For a fixed SNR, the detection can only be improved at the expense of using a longer stretch of signal. This can constitute a limitation, for instance, in monitored surgeries. Alternatively, multivariate objective response detection (MORD) could be used. This work applies two MORD techniques (multiple coherence and multiple component synchrony measure) to EEG data collected during intermittent photic stimulation. They were evaluated through Monte Carlo simulations, which also verified that correlation in the background noise reduces the detection rate. Considering the N EEG derivations as close as possible to the primary visual cortex, if N = 4, 6 or 8, multiple coherence leads to a statistically significant higher detection rate in comparison with the multiple component synchrony measure. With the former, the best performance was obtained with six signals (O1, O2, T5, T6, P3 and P4).

  6. A novel complete-case analysis to determine statistical significance between treatments in an intention-to-treat population of randomized clinical trials involving missing data.

    PubMed

    Liu, Wei; Ding, Jinhui

    2018-04-01

    The application of the intention-to-treat (ITT) principle to the analysis of clinical trials is challenged in the presence of missing outcome data. The consequences of stopping an assigned treatment in a withdrawn subject are unknown. It is difficult to make a single assumption about missing mechanisms for all clinical trials because drug responses in the human body involve complex biological networks, so data may be missing randomly or non-randomly. Currently there is no statistical method that can tell whether a difference between two treatments in the ITT population of a randomized clinical trial with missing data is significant at a pre-specified level. Making no assumptions about the missing mechanisms, we propose a generalized complete-case (GCC) analysis based on the data of completers. An evaluation of the impact of missing data on the ITT analysis reveals that a statistically significant GCC result implies a significant treatment effect in the ITT population at a pre-specified significance level unless, relative to the comparator, the test drug is poisonous to the non-completers as documented in their medical records. Applications of the GCC analysis are illustrated using literature data, and its properties and limits are discussed.

  7. Null-space and statistical significance of first-arrival traveltime inversion

    NASA Astrophysics Data System (ADS)

    Morozov, Igor B.

    2004-03-01

    The strong uncertainty inherent in the traveltime inversion of first arrivals from surface sources is usually removed by using a priori constraints or regularization. This leads to the null-space (data-independent model variability) being inadequately sampled, and consequently, model uncertainties may be underestimated in traditional (such as checkerboard) resolution tests. To measure the full null-space model uncertainties, we use unconstrained Monte Carlo inversion and examine the statistics of the resulting model ensembles. In an application to 1-D first-arrival traveltime inversion, the τ-p method is used to build a set of models that are equivalent to the IASP91 model within small, ~0.02 per cent, time deviations. The resulting velocity variances are much larger, ~2-3 per cent within the regions above the mantle discontinuities, and are interpreted as being due to the null-space. Depth-variant depth averaging is required for constraining the velocities within meaningful bounds, and the averaging scalelength could also be used as a measure of depth resolution. Velocity variances show structure-dependent, negative correlation with the depth-averaging scalelength. Neither the smoothest (Herglotz-Wiechert) nor the mean velocity-depth functions reproduce the discontinuities in the IASP91 model; however, the discontinuities can be identified by the increased null-space velocity (co-)variances. Although derived for a 1-D case, the above conclusions also relate to higher dimensions.

  8. Identifying clusters of active transportation using spatial scan statistics.

    PubMed

    Huang, Lan; Stinchcomb, David G; Pickle, Linda W; Dill, Jennifer; Berrigan, David

    2009-08-01

    There is an intense interest in the possibility that neighborhood characteristics influence active transportation such as walking or biking. The purpose of this paper is to illustrate how a spatial cluster identification method can evaluate the geographic variation of active transportation and identify neighborhoods with unusually high/low levels of active transportation. Self-reported walking/biking prevalence, demographic characteristics, street connectivity variables, and neighborhood socioeconomic data were collected from respondents to the 2001 California Health Interview Survey (CHIS; N=10,688) in Los Angeles County (LAC) and San Diego County (SDC). Spatial scan statistics were used to identify clusters of high or low prevalence (with and without age-adjustment) and the quantity of time spent walking and biking. The data, a subset from the 2001 CHIS, were analyzed in 2007-2008. Geographic clusters of significantly high or low prevalence of walking and biking were detected in LAC and SDC. Structural variables such as street connectivity and shorter block lengths are consistently associated with higher levels of active transportation, but associations between active transportation and socioeconomic variables at the individual and neighborhood levels are mixed. Only one cluster with less time spent walking and biking among walkers/bikers was detected in LAC, and this was of borderline significance. Age-adjustment affects the clustering pattern of walking/biking prevalence in LAC, but not in SDC. The use of spatial scan statistics to identify significant clustering of health behaviors such as active transportation adds to the more traditional regression analysis that examines associations between behavior and environmental factors by identifying specific geographic areas with unusual levels of the behavior independent of predefined administrative units.

  9. Identifying Clusters of Active Transportation Using Spatial Scan Statistics

    PubMed Central

    Huang, Lan; Stinchcomb, David G.; Pickle, Linda W.; Dill, Jennifer; Berrigan, David

    2009-01-01

    Background There is an intense interest in the possibility that neighborhood characteristics influence active transportation such as walking or biking. The purpose of this paper is to illustrate how a spatial cluster identification method can evaluate the geographic variation of active transportation and identify neighborhoods with unusually high/low levels of active transportation. Methods Self-reported walking/biking prevalence, demographic characteristics, street connectivity variables, and neighborhood socioeconomic data were collected from respondents to the 2001 California Health Interview Survey (CHIS; N=10,688) in Los Angeles County (LAC) and San Diego County (SDC). Spatial scan statistics were used to identify clusters of high or low prevalence (with and without age-adjustment) and the quantity of time spent walking and biking. The data, a subset from the 2001 CHIS, were analyzed in 2007–2008. Results Geographic clusters of significantly high or low prevalence of walking and biking were detected in LAC and SDC. Structural variables such as street connectivity and shorter block lengths are consistently associated with higher levels of active transportation, but associations between active transportation and socioeconomic variables at the individual and neighborhood levels are mixed. Only one cluster with less time spent walking and biking among walkers/bikers was detected in LAC, and this was of borderline significance. Age-adjustment affects the clustering pattern of walking/biking prevalence in LAC, but not in SDC. Conclusions The use of spatial scan statistics to identify significant clustering of health behaviors such as active transportation adds to the more traditional regression analysis that examines associations between behavior and environmental factors by identifying specific geographic areas with unusual levels of the behavior independent of predefined administrative units. PMID:19589451

  10. Solar Flare Hard X-ray Spikes Observed by RHESSI: a Statistical Study

    NASA Astrophysics Data System (ADS)

    Cheng, Jianxia; Qiu, J.; Ding, M.; Wang, H.

    2013-07-01

    Hard X-ray (HXR) spikes refer to fine time structures on timescales of milliseconds to seconds in high-energy HXR emission profiles during solar flare eruptions. We present a preliminary statistical investigation of temporal and spectral properties of HXR spikes. Using a three-sigma spike selection rule, we detected 184 spikes in 94 out of 322 flares with significant counts at given photon energies, from demodulated HXR light curves obtained by the Reuven Ramaty High Energy Solar Spectroscopic Imager (RHESSI). About one fifth of these spikes are also detected at photon energies higher than 100 keV. The statistical properties of the spikes are as follows. (1) HXR spikes are produced in both impulsive flares and long-duration flares with nearly the same occurrence rates. Ninety percent of the spikes occur during the rise phase of the flares, and about 70% occur around the peak times of the flares. (2) The time durations of the spikes vary from 0.2 to 2 s, with a mean of 1.0 s, independent of photon energy. The spikes exhibit symmetric time profiles with no significant difference between rise and decay times. (3) Among the most energetic spikes, nearly all have harder count spectra than their underlying slow-varying components. There is also a weak indication that spikes exhibiting time lags in high-energy emissions tend to have harder spectra than spikes with time lags in low-energy emissions.
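
    A minimal sketch of a three-sigma spike selection rule of this kind (the running-median baseline, window width, and Poisson noise model are assumptions for illustration; the paper works on demodulated RHESSI light curves):

      import numpy as np

      def find_spikes(counts, width=11, nsigma=3.0):
          """Flag bins that exceed a running-median baseline by `nsigma`
          Poisson standard deviations; returns the spike bin indices."""
          counts = np.asarray(counts, dtype=float)
          half = width // 2
          baseline = np.array([np.median(counts[max(0, i - half):i + half + 1])
                               for i in range(len(counts))])
          sigma = np.sqrt(np.maximum(baseline, 1.0))   # Poisson noise estimate
          return np.where(counts > baseline + nsigma * sigma)[0]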

  11. WordCluster: detecting clusters of DNA words and genomic elements

    PubMed Central

    2011-01-01

    Background Many k-mers (or DNA words) and genomic elements are known to be spatially clustered in the genome. Well established examples are genes, TFBSs, CpG dinucleotides, microRNA genes and ultra-conserved non-coding regions. Currently, no algorithm exists to find these clusters in a statistically comprehensible way. The detection of clustering often relies on densities and sliding-window approaches or arbitrarily chosen distance thresholds. Results We introduce here an algorithm to detect clusters of DNA words (k-mers), or any other genomic element, based on the distance between consecutive copies and an assigned statistical significance. We implemented the method into a web server connected to a MySQL backend, which also determines the co-localization with gene annotations. We demonstrate the usefulness of this approach by detecting the clusters of CAG/CTG (cytosine contexts that can be methylated in undifferentiated cells), showing that the degree of methylation varies drastically between the inside and the outside of the clusters. As another example, we used WordCluster to search for statistically significant clusters of olfactory receptor (OR) genes in the human genome. Conclusions WordCluster seems to predict biologically meaningful clusters of DNA words (k-mers) and genomic entities. The implementation of the method into a web server is available at http://bioinfo2.ugr.es/wordCluster/wordCluster.php including additional features like the detection of co-localization with gene regions or the annotation enrichment tool for functional analysis of overlapped genes. PMID:21261981

  12. Statistical process control based chart for information systems security

    NASA Astrophysics Data System (ADS)

    Khan, Mansoor S.; Cui, Lirong

    2015-07-01

    Intrusion detection systems have a highly significant role in securing computer networks and information systems. To assure the reliability and quality of computer networks and information systems, it is highly desirable to develop techniques that detect intrusions into information systems. We put forward the application of statistical process control (SPC) to the detection of intrusions in computer networks and information systems. In this article we propose an exponentially weighted moving average (EWMA) type quality monitoring scheme. Our proposed scheme has only one parameter, which differentiates it from past versions. We construct the control limits for the proposed scheme and investigate their effectiveness. We provide an industrial example for clarity for practitioners. We compare the proposed scheme with existing EWMA schemes and the p chart, and finally provide some recommendations for future work.
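
    For reference, the classic EWMA control chart that such schemes build on can be sketched as follows (standard textbook formulas; the paper's one-parameter variant differs in its construction):

      import numpy as np

      def ewma_chart(x, lam=0.2, L=3.0, mu0=None, sigma0=None):
          """EWMA statistic z_i = lam*x_i + (1-lam)*z_{i-1} with the exact
          time-varying limits mu0 +/- L*sigma0*sqrt(lam/(2-lam)*(1-(1-lam)^(2i)));
          points outside the limits are flagged (here: possible intrusions)."""
          x = np.asarray(x, dtype=float)
          mu0 = x.mean() if mu0 is None else mu0
          sigma0 = x.std(ddof=1) if sigma0 is None else sigma0
          z, prev = np.empty_like(x), mu0
          for i, xi in enumerate(x):
              prev = z[i] = lam * xi + (1 - lam) * prev
          t = np.arange(1, len(x) + 1)
          hw = L * sigma0 * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * t)))
          flags = np.where((z < mu0 - hw) | (z > mu0 + hw))[0]
          return z, mu0 - hw, mu0 + hw, flags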

  13. A study of two unsupervised data driven statistical methodologies for detecting and classifying damages in structural health monitoring

    NASA Astrophysics Data System (ADS)

    Tibaduiza, D.-A.; Torres-Arredondo, M.-A.; Mujica, L. E.; Rodellar, J.; Fritzen, C.-P.

    2013-12-01

    This article is concerned with the practical use of Multiway Principal Component Analysis (MPCA), Discrete Wavelet Transform (DWT), Squared Prediction Error (SPE) measures and Self-Organizing Maps (SOM) to detect and classify damages in mechanical structures. The formalism is based on a distributed piezoelectric active sensor network for the excitation and detection of structural dynamic responses. Statistical models are built using PCA when the structure is known to be healthy, either directly from the dynamic responses or from wavelet coefficients at different scales representing time-frequency information. Different damages on the tested structures are simulated by adding masses at different positions. The data from the structure in different states (damaged or not) are then projected into the different principal component models by each actuator in order to obtain the input feature vectors for a SOM from the scores and the SPE measures. An aircraft fuselage from an Airbus A320 and a multi-layered carbon fiber reinforced plastic (CFRP) plate are used as examples to test the approaches. Results are presented, compared and discussed in order to determine their potential in structural health monitoring. These results showed that all the simulated damages were detectable and the selected features proved capable of separating all damage conditions from the undamaged state for both approaches.
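
    A minimal sketch of the PCA baseline-model and SPE stage only (the MPCA unfolding, DWT features, and SOM classifier are omitted; variable names are illustrative):

      import numpy as np

      def fit_pca(X_healthy, n_comp):
          """PCA model of the healthy baseline (rows = experiments)."""
          mu = X_healthy.mean(axis=0)
          _, _, Vt = np.linalg.svd(X_healthy - mu, full_matrices=False)
          return mu, Vt[:n_comp]                     # retained loadings P

      def scores_and_spe(X, mu, P):
          """Projections onto the healthy model plus the squared prediction
          error (SPE): the part of each observation the model cannot explain."""
          Xc = X - mu
          scores = Xc @ P.T
          resid = Xc - scores @ P
          return scores, (resid ** 2).sum(axis=1)

      # Scores and SPE per actuator would then form the feature vectors for the SOM.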

  14. Limitations of Poisson statistics in describing radioactive decay.

    PubMed

    Sitek, Arkadiusz; Celler, Anna M

    2015-12-01

    The assumption that nuclear decays are governed by Poisson statistics is an approximation. This approximation becomes unjustified when data acquisition times longer than or even comparable with the half-lives of the radioisotope in the sample are considered. In this work, the limits of the Poisson-statistics approximation are investigated. The formalism for the statistics of radioactive decay based on the binomial distribution is derived. The theoretical factor describing the deviation of the variance of the number of decays predicted by the Poisson distribution from the true variance is defined and investigated for several commonly used radiotracers such as (18)F, (15)O, (82)Rb, (13)N, (99m)Tc, (123)I, and (201)Tl. The variance of the number of decays estimated using the Poisson distribution is significantly different from the true variance for a 5-minute observation time of (11)C, (15)O, (13)N, and (82)Rb. Durations of nuclear medicine studies often are relatively long; they may be even a few times longer than the half-lives of some short-lived radiotracers. Our study shows that in such situations Poisson statistics are unsuitable and should not be applied to describe the statistics of the number of decays in radioactive samples. However, the above statement does not directly apply to counting statistics at the level of event detection. Low sensitivities of detectors which are used in imaging studies make the Poisson approximation near perfect. Copyright © 2015 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
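
    The deviation factor has a closed form: if N0 atoms each decay within time t with probability p = 1 − exp(−λt), the number of decays is binomial with variance N0·p·(1 − p), whereas the Poisson approximation gives N0·p, so the true variance is a factor (1 − p) = exp(−λt) smaller. A quick check for common radiotracers (half-lives taken from standard tables, not from the paper):

      import numpy as np

      def variance_ratio(half_life_min, t_min):
          """True (binomial) variance over the Poisson approximation:
          (1 - p) = exp(-lambda * t), with lambda = ln 2 / half-life."""
          return np.exp(-np.log(2) / half_life_min * t_min)

      for iso, t_half in [("C-11", 20.4), ("O-15", 2.04), ("Rb-82", 1.26),
                          ("F-18", 109.8), ("Tc-99m", 6.01 * 60)]:
          print(f"{iso}: 5-min decay-count variance is "
                f"{variance_ratio(t_half, 5):.1%} of the Poisson value")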

  15. Observing fermionic statistics with photons in arbitrary processes

    PubMed Central

    Matthews, Jonathan C. F.; Poulios, Konstantinos; Meinecke, Jasmin D. A.; Politi, Alberto; Peruzzo, Alberto; Ismail, Nur; Wörhoff, Kerstin; Thompson, Mark G.; O'Brien, Jeremy L.

    2013-01-01

    Quantum mechanics defines two classes of particles, bosons and fermions, whose exchange statistics fundamentally dictate quantum dynamics. Here we develop a scheme that uses entanglement to directly observe the correlated detection statistics of any number of fermions in any physical process. This approach relies on sending each of the entangled particles through identical copies of the process; by controlling a single phase parameter in the entangled state, the correlated detection statistics can be continuously tuned between bosonic and fermionic statistics. We implement this scheme via two entangled photons shared across the polarisation modes of a single photonic chip to directly mimic the fermion, boson and intermediate behaviour of two particles undergoing a continuous time quantum walk. The ability to simulate fermions with photons is likely to have applications for verifying boson scattering and for observing particle correlations in analogue simulation using any physical platform that can prepare the entangled state prescribed here. PMID:23531788

  16. The Significance of Microbe-Mineral-Biomarker Interactions in the Detection of Life on Mars and Beyond.

    PubMed

    Röling, Wilfred F M; Aerts, Joost W; Patty, C H Lucas; ten Kate, Inge Loes; Ehrenfreund, Pascale; Direito, Susana O L

    2015-06-01

    The detection of biomarkers plays a central role in our effort to establish whether there is, or was, life beyond Earth. In this review, we address the importance of considering mineralogy in relation to the selection of locations and biomarker detection methodologies with characteristics most promising for exploration. We review relevant mineral-biomarker and mineral-microbe interactions. The local mineralogy on a particular planet reflects its past and current environmental conditions and allows a habitability assessment by comparison with life under extreme conditions on Earth. The type of mineral significantly influences the potential abundances and types of biomarkers and of the microorganisms containing these biomarkers. The strong adsorptive power of some minerals aids in the preservation of biomarkers and may have been important in the origin of life. On the other hand, this strong adsorption, as well as the oxidizing properties of minerals, can interfere with efficient extraction and detection of biomarkers. Differences in mechanisms of adsorption and in properties of minerals and biomarkers suggest that it will be difficult to design a single extraction procedure for a wide range of biomarkers. While samples on Mars can be used for direct detection of biomarkers such as nucleic acids, amino acids, and lipids, on other planetary bodies we must rely on remote spectrometric detection of biosignatures. The interpretation of spectral signatures of photosynthesis can also be affected by local mineralogy. We identify current gaps in our knowledge and indicate how they may be filled to improve the chances of detecting biomarkers on Mars and beyond.

  17. Planck 2015 results: XVI. Isotropy and statistics of the CMB

    DOE PAGES

    Ade, P. A. R.; Aghanim, N.; Akrami, Y.; ...

    2016-09-20

    In this paper, we test the statistical isotropy and Gaussianity of the cosmic microwave background (CMB) anisotropies using observations made by the Planck satellite. Our results are based mainly on the full Planck mission for temperature, but also include some polarization measurements. In particular, we consider the CMB anisotropy maps derived from the multi-frequency Planck data by several component-separation methods. For the temperature anisotropies, we find excellent agreement between results based on these sky maps over both a very large fraction of the sky and a broad range of angular scales, establishing that potential foreground residuals do not affect our studies. Tests of skewness, kurtosis, multi-normality, N-point functions, and Minkowski functionals indicate consistency with Gaussianity, while a power deficit at large angular scales is manifested in several ways, for example low map variance. The results of a peak statistics analysis are consistent with the expectations of a Gaussian random field. The “Cold Spot” is detected with several methods, including map kurtosis, peak statistics, and mean temperature profile. We thoroughly probe the large-scale dipolar power asymmetry, detecting it with several independent tests, and address the subject of a posteriori correction. Tests of directionality suggest the presence of angular clustering from large to small scales, but at a significance that is dependent on the details of the approach. We perform the first examination of polarization data, finding the morphology of stacked peaks to be consistent with the expectations of statistically isotropic simulations. Finally, where they overlap, these results are consistent with the Planck 2013 analysis based on the nominal mission data and provide our most thorough view of the statistics of the CMB fluctuations to date.

  18. Sequential detection of influenza epidemics by the Kolmogorov-Smirnov test

    PubMed Central

    2012-01-01

    Background Influenza is a well known and common human respiratory infection, causing significant morbidity and mortality every year. Despite influenza variability, fast and reliable outbreak detection is required for health resource planning. Clinical health records, as published by the Diagnosticat database in Catalonia, host useful data for probabilistic detection of influenza outbreaks. Methods This paper proposes a statistical method to detect influenza epidemic activity. Non-epidemic incidence rates are modeled with the exponential distribution, and the maximum likelihood estimate of the decay factor λ is calculated. The sequential detection algorithm updates the parameter as new data become available. Binary epidemic detection of weekly incidence rates is assessed by a Kolmogorov-Smirnov test on the absolute difference between the empirical distribution function and the cumulative density function of the estimated exponential distribution, with significance level 0 ≤ α ≤ 1. Results The main advantage with respect to other approaches is the adoption of a statistically meaningful test, which provides an indicator of epidemic activity with an associated probability. The detection algorithm was initiated with parameter λ0 = 3.8617 estimated from the training sequence (corresponding to non-epidemic incidence rates of the 2008-2009 influenza season) and sequentially updated. The Kolmogorov-Smirnov test detected the following weeks as epidemic: weeks 50–10 (2008-2009 season), weeks 38–50 (2009-2010 season), weeks 50–9 (2010-2011 season), and weeks 3–12 of the current 2011-2012 season. Conclusions Real medical data were used to assess the validity of the approach, as well as to construct a realistic statistical model of weekly influenza incidence rates in non-epidemic periods. For the tested data, the results confirmed the ability of the algorithm to detect the start and the end of epidemic periods. In general, the proposed test could be applied to other data
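
    A minimal sketch of the testing step (simplified: the window-based test and function names are assumptions; the published algorithm folds weeks judged non-epidemic back into the training data and re-estimates λ sequentially):

      import numpy as np
      from scipy.stats import kstest

      def fit_lambda(nonepidemic_rates):
          """MLE of the exponential decay factor from non-epidemic weeks."""
          return 1.0 / float(np.mean(nonepidemic_rates))

      def is_epidemic(recent_rates, lam, alpha=0.05):
          """KS test of recent weekly incidence against Exp(lam); rejection
          is read as epidemic activity, with p as the associated probability."""
          stat, p = kstest(recent_rates, "expon", args=(0, 1.0 / lam))
          return p < alpha, p

      # Sequential use: weeks judged non-epidemic are appended to the
      # training set and lambda is re-estimated before the next test.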

  19. The Effect of Local Orientation Change on the Detection of Contours Defined by Constant Curvature: Psychophysics and Image Statistics.

    PubMed

    Khuu, Sieu K; Cham, Joey; Hayes, Anthony

    2016-01-01

    In the present study, we investigated the detection of contours defined by constant curvature and the statistics of curved contours in natural scenes. In Experiment 1, we examined the degree to which human sensitivity to contours is affected by changing the curvature angle and by disrupting contour curvature continuity through varying the orientation of end elements. We find that (1) changing the angle of contour curvature decreased detection performance, while (2) end elements oriented in the direction (i.e., clockwise) of curvature facilitated contour detection regardless of the curvature angle of the contour. In Experiment 2 we further established that the effect of end-element orientation on contour detection depended not only on the orientation (collinear or cocircular) but also on the spatial separation of the end elements from the contour, and on whether the contour shape was curved or not (i.e., C-shaped or S-shaped). Increasing the spatial separation of end elements reduced contour detection performance regardless of their orientation or the contour shape. However, at small separations, cocircular end elements facilitated the detection of C-shaped contours, but not S-shaped contours. The opposite result was observed for collinear end elements, which improved the detection of S-shaped, but not C-shaped, contours. These dissociative results confirmed that the visual system specifically codes contour curvature, but the association of contour elements occurs locally. Finally, we undertook an analysis of natural images that mapped contours with a constant angular change and determined the frequency of occurrence of end elements with different orientations. Analogous to our behavioral data, this image analysis revealed that the mapped end elements of constantly curved contours are likely to be oriented clockwise to the angle of curvature. Our findings indicate that the visual system is selectively sensitive to contours defined by constant curvature and that this might reflect

  20. Relationship between perceptual learning in speech and statistical learning in younger and older adults

    PubMed Central

    Neger, Thordis M.; Rietveld, Toni; Janse, Esther

    2014-01-01

    Within a few sentences, listeners learn to understand severely degraded speech such as noise-vocoded speech. However, individuals vary in the amount of such perceptual learning and it is unclear what underlies these differences. The present study investigates whether perceptual learning in speech relates to statistical learning, as sensitivity to probabilistic information may aid identification of relevant cues in novel speech input. If statistical learning and perceptual learning (partly) draw on the same general mechanisms, then statistical learning in a non-auditory modality using non-linguistic sequences should predict adaptation to degraded speech. In the present study, 73 older adults (aged over 60 years) and 60 younger adults (aged between 18 and 30 years) performed a visual artificial grammar learning task and were presented with 60 meaningful noise-vocoded sentences in an auditory recall task. Within age groups, sentence recognition performance over exposure was analyzed as a function of statistical learning performance, and other variables that may predict learning (i.e., hearing, vocabulary, attention switching control, working memory, and processing speed). Younger and older adults showed similar amounts of perceptual learning, but only younger adults showed significant statistical learning. In older adults, improvement in understanding noise-vocoded speech was constrained by age. In younger adults, amount of adaptation was associated with lexical knowledge and with statistical learning ability. Thus, individual differences in general cognitive abilities explain listeners' variability in adapting to noise-vocoded speech. Results suggest that perceptual and statistical learning share mechanisms of implicit regularity detection, but that the ability to detect statistical regularities is impaired in older adults if visual sequences are presented quickly. PMID:25225475

  1. Relationship between perceptual learning in speech and statistical learning in younger and older adults.

    PubMed

    Neger, Thordis M; Rietveld, Toni; Janse, Esther

    2014-01-01

    Within a few sentences, listeners learn to understand severely degraded speech such as noise-vocoded speech. However, individuals vary in the amount of such perceptual learning and it is unclear what underlies these differences. The present study investigates whether perceptual learning in speech relates to statistical learning, as sensitivity to probabilistic information may aid identification of relevant cues in novel speech input. If statistical learning and perceptual learning (partly) draw on the same general mechanisms, then statistical learning in a non-auditory modality using non-linguistic sequences should predict adaptation to degraded speech. In the present study, 73 older adults (aged over 60 years) and 60 younger adults (aged between 18 and 30 years) performed a visual artificial grammar learning task and were presented with 60 meaningful noise-vocoded sentences in an auditory recall task. Within age groups, sentence recognition performance over exposure was analyzed as a function of statistical learning performance, and other variables that may predict learning (i.e., hearing, vocabulary, attention switching control, working memory, and processing speed). Younger and older adults showed similar amounts of perceptual learning, but only younger adults showed significant statistical learning. In older adults, improvement in understanding noise-vocoded speech was constrained by age. In younger adults, amount of adaptation was associated with lexical knowledge and with statistical learning ability. Thus, individual differences in general cognitive abilities explain listeners' variability in adapting to noise-vocoded speech. Results suggest that perceptual and statistical learning share mechanisms of implicit regularity detection, but that the ability to detect statistical regularities is impaired in older adults if visual sequences are presented quickly.

  2. Rainfall statistics, stationarity, and climate change.

    PubMed

    Sun, Fubao; Roderick, Michael L; Farquhar, Graham D

    2018-03-06

    There is a growing research interest in the detection of changes in hydrologic and climatic time series. Stationarity can be assessed using the autocorrelation function, but this is not yet common practice in hydrology and climate. Here, we use a global land-based gridded annual precipitation (hereafter P) database (1940-2009) and find that the lag 1 autocorrelation coefficient is statistically significant at around 14% of the global land surface, implying nonstationary behavior (90% confidence). In contrast, around 76% of the global land surface shows little or no change, implying stationary behavior. We use these results to assess change in the observed P over the most recent decade of the database. We find that the changes for most (84%) grid boxes are within the plausible bounds of no significant change at the 90% CI. The results emphasize the importance of adequately accounting for natural variability when assessing change. Copyright © 2018 the Author(s). Published by PNAS.

  3. Radiologists' confidence in detecting abnormalities on chest images and their subjective judgments of image quality

    NASA Astrophysics Data System (ADS)

    King, Jill L.; Gur, David; Rockette, Howard E.; Curtin, Hugh D.; Obuchowski, Nancy A.; Thaete, F. Leland; Britton, Cynthia A.; Metz, Charles E.

    1991-07-01

    The relationship between subjective judgments of image quality for the performance of specific detection tasks and radiologists' confidence level in arriving at correct diagnoses was investigated in two studies in which 12 readers, using a total of three different display environments, interpreted a series of 300 PA chest images. The modalities used were conventional films, laser-printed films, and high-resolution CRT display of digitized images. For the detection of interstitial disease, nodules, and pneumothoraces, there was no statistically significant correlation (Spearman rho) between subjective ratings of quality and radiologists' confidence in detecting these abnormalities. However, in each study, for all modalities and all readers but one, a small but statistically significant correlation was found between the radiologists' ability to correctly and confidently rule out interstitial disease and their subjective ratings of image quality.
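

    The correlation reported here is a Spearman rank correlation between two ordinal ratings. A sketch with simulated ratings; the variable names and data are illustrative only:

        import numpy as np
        from scipy.stats import spearmanr

        rng = np.random.default_rng(1)
        quality_rating = rng.integers(1, 6, size=300)                  # subjective image quality, 1-5
        confidence = quality_rating + rng.normal(0.0, 2.0, size=300)   # reader confidence, weakly related

        rho, p_value = spearmanr(quality_rating, confidence)
        print(f"Spearman rho = {rho:.2f}, p = {p_value:.3g}")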

  4. Cloud Statistics and Discrimination in the Polar Regions

    NASA Astrophysics Data System (ADS)

    Chan, M.; Comiso, J. C.

    2012-12-01

    Despite their important role in the climate system, cloud cover and its statistics are poorly known, especially in the polar regions, where clouds are difficult to discriminate from snow-covered surfaces. The advent of the A-train, which includes the Aqua/MODIS, CALIPSO/CALIOP, and CloudSat/CPR sensors, has provided an opportunity to improve our ability to accurately characterize the cloud cover. MODIS provides global coverage at relatively good temporal and spatial resolution, while CALIOP and CPR provide limited nadir sampling but accurate characterization of the vertical structure and phase of the cloud cover. Over the polar regions, cloud detection from a passive sensor like MODIS is challenging because of the presence of cold and highly reflective surfaces such as snow, sea ice, glaciers, and ice sheets, which have surface signatures similar to those of clouds. On the other hand, active sensors such as CALIOP and CPR are not only very sensitive to the presence of clouds but can also provide information about their microphysical characteristics. However, these nadir-looking sensors have sparse spatial coverage, and their global data can have spatial gaps of up to 100 km. We developed a polar cloud detection system for MODIS that is trained using collocated data from CALIOP and CPR. In particular, we employ a machine learning system that reads the radiative profile observed by MODIS and determines whether the field of view is cloudy or clear. Results have shown that the improved cloud detection scheme performs better than typical cloud mask algorithms on a validation data set not used for training. A one-year data set was generated, and results indicate that daytime cloud detection accuracies improved from 80.1% to 92.6% (over sea ice) and from 71.2% to 87.4% (over ice sheet), with CALIOP data used as the baseline. Significant improvements are also observed during nighttime, where cloud detection accuracies increase by 19.8% (over sea ice) and 11.6% (over ice sheet).

  5. Statistical Limits to Super Resolution

    NASA Astrophysics Data System (ADS)

    Lucy, L. B.

    1992-08-01

    The limits imposed by photon statistics on the degree to which Rayleigh's resolution limit for diffraction-limited images can be surpassed by applying image restoration techniques are investigated. An approximate statistical theory is given for the number of detected photons required in the image of an unresolved pair of equal point sources in order that its information content allows in principle resolution by restoration. This theory is confirmed by numerical restoration experiments on synthetic images, and quantitative limits are presented for restoration of diffraction-limited images formed by slit and circular apertures.

  6. [Relationship between factor V Leiden mutation and Chinese Budd-Chiari syndrome and its clinical significance].

    PubMed

    Feng, B; Xu, K; Jiang, H

    2000-05-01

    To investigate the relationship between the factor V Leiden (FvL) mutation and Chinese sporadic and familial Budd-Chiari syndrome (BCS), and to explore the significance of the FvL mutation in the etiology of BCS. Twenty-five patients with sporadic BCS, 6 patients with familial BCS (from families A and B), 39 members of families A and B, and 31 healthy persons were tested for the FvL mutation by PCR. In addition, the two families were examined for related etiologies of BCS. The FvL mutation was detected in 4 of the 6 patients with familial BCS and in 2 family members: AIII(7,11,15), BII(10), AII(2) and BIII(5) carried the mutation, and the mutation was heterozygous in each case. Transmission of the FvL mutation in the two pedigrees was compatible with Mendelian inheritance. The frequency of the FvL mutation showed no statistically significant difference between the 31 BCS patients and the 31 healthy persons, but the difference between the familial BCS patients and the healthy persons was statistically significant. Ten persons in family A had varicose veins of the lower extremities, a pattern compatible with autosomal dominant inheritance. The FvL mutation is related to Chinese familial BCS but not to Chinese sporadic BCS, and may be an underlying cause of familial BCS. Varicose veins of the lower extremities may be one of the etiologies of familial BCS.

  7. SU-D-BRD-07: Evaluation of the Effectiveness of Statistical Process Control Methods to Detect Systematic Errors For Routine Electron Energy Verification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parker, S

    2015-06-15

    Purpose: To evaluate the ability of statistical process control methods to detect systematic errors when using a two dimensional (2D) detector array for routine electron beam energy verification. Methods: Electron beam energy constancy was measured using an aluminum wedge and a 2D diode array on four linear accelerators. Process control limits were established. Measurements were recorded in control charts and compared with both calculated process control limits and TG-142 recommended specification limits. The data were tested for normality, process capability and process acceptability. Additional measurements were recorded while systematic errors were intentionally introduced. Systematic errors included shifts in the alignment of the wedge, incorrect orientation of the wedge, and incorrect array calibration. Results: Control limits calculated for each beam were smaller than the recommended specification limits. Process capability and process acceptability ratios were greater than one in all cases. All data were normally distributed. Shifts in the alignment of the wedge were most apparent for low energies. The smallest shift (0.5 mm) was detectable using process control limits in some cases, while the largest shift (2 mm) was detectable using specification limits in only one case. The wedge orientation tested did not affect the measurements, as this did not affect the thickness of aluminum over the detectors of interest. Array calibration dependence varied with energy and selected array calibration; 6 MeV was the least sensitive to array calibration selection, while 16 MeV was the most sensitive. Conclusion: Statistical process control methods demonstrated that the data distribution was normally distributed, the process was capable of meeting specifications, and the process was centered within the specification limits. Though not all systematic errors were distinguishable from random errors, process control limits increased the ability to detect systematic errors.
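
    A sketch of the basic statistical-process-control computations involved: individuals-chart control limits from the average moving range, and a process capability ratio against specification limits. The data and specification values below are illustrative, not the study's:

        import numpy as np

        def individuals_chart_limits(x):
            # X-chart limits: center +/- 2.66 * average moving range.
            x = np.asarray(x, dtype=float)
            mr_bar = np.mean(np.abs(np.diff(x)))
            center = x.mean()
            return center - 2.66 * mr_bar, center, center + 2.66 * mr_bar

        rng = np.random.default_rng(2)
        energy_metric = rng.normal(0.0, 0.2, size=30)     # e.g., wedge-derived energy-shift readings

        lcl, center, ucl = individuals_chart_limits(energy_metric)
        spec_lo, spec_hi = -2.0, 2.0                      # specification limits (illustrative)
        cp = (spec_hi - spec_lo) / (6 * energy_metric.std(ddof=1))   # capability ratio; > 1 is capable
        flagged = np.where((energy_metric < lcl) | (energy_metric > ucl))[0]
        print(f"LCL = {lcl:.2f}, UCL = {ucl:.2f}, Cp = {cp:.2f}, out-of-control points: {flagged}")

    Because the control limits sit inside the specification limits, a point can breach the control chart, flagging a systematic error, well before the specification itself is violated, which is the behavior the study exploits.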

  8. The Significance of Microbe-Mineral-Biomarker Interactions in the Detection of Life on Mars and Beyond

    PubMed Central

    Aerts, Joost W.; Patty, C.H. Lucas; ten Kate, Inge Loes; Ehrenfreund, Pascale; Direito, Susana O.L.

    2015-01-01

    Abstract The detection of biomarkers plays a central role in our effort to establish whether there is, or was, life beyond Earth. In this review, we address the importance of considering mineralogy in relation to the selection of locations and biomarker detection methodologies with characteristics most promising for exploration. We review relevant mineral-biomarker and mineral-microbe interactions. The local mineralogy on a particular planet reflects its past and current environmental conditions and allows a habitability assessment by comparison with life under extreme conditions on Earth. The type of mineral significantly influences the potential abundances and types of biomarkers and microorganisms containing these biomarkers. The strong adsorptive power of some minerals aids in the preservation of biomarkers and may have been important in the origin of life. On the other hand, this strong adsorption as well as oxidizing properties of minerals can interfere with efficient extraction and detection of biomarkers. Differences in mechanisms of adsorption and in properties of minerals and biomarkers suggest that it will be difficult to design a single extraction procedure for a wide range of biomarkers. While on Mars samples can be used for direct detection of biomarkers such as nucleic acids, amino acids, and lipids, on other planetary bodies remote spectrometric detection of biosignatures has to be relied upon. The interpretation of spectral signatures of photosynthesis can also be affected by local mineralogy. We identify current gaps in our knowledge and indicate how they may be filled to improve the chances of detecting biomarkers on Mars and beyond. Key Words: DNA—Lipids—Photosynthesis—Extremophiles—Mineralogy—Subsurface. Astrobiology 15, 492–507. PMID:26060985

  9. Statistical evaluation of vibration analysis techniques

    NASA Technical Reports Server (NTRS)

    Milner, G. Martin; Miller, Patrice S.

    1987-01-01

    An evaluation methodology is presented for a selection of candidate vibration analysis techniques applicable to machinery representative of the environmental control and life support system of advanced spacecraft; illustrative results are given. Attention is given to the statistical analysis of small sample experiments, the quantification of detection performance for diverse techniques through the computation of probability of detection versus probability of false alarm, and the quantification of diagnostic performance.

  10. [Optimized application of nested PCR method for detection of malaria].

    PubMed

    Yao-Guang, Z; Li, J; Zhen-Yu, W; Li, C

    2017-04-28

    Objective To optimize the application of the nested PCR method for the detection of malaria in working practice, so as to improve the efficiency of malaria detection. Methods A PCR premixing solution, internal primers for further amplification, and newly designed primers targeting the two Plasmodium ovale subspecies were employed to optimize the reaction system, reaction conditions, and P. ovale-specific primers on the basis of the routine nested PCR. The specificity and sensitivity of the optimized method were then analyzed. Positive blood samples and examination samples of malaria were detected by the routine nested PCR and the optimized method simultaneously, and the detection results were compared and analyzed. Results The optimized method showed good specificity, and its sensitivity could reach the pg to fg level. When the two methods were used to detect the same positive malarial blood samples simultaneously, the PCR products of the two methods showed no significant difference, but non-specific amplification was obviously reduced, the detection rate of P. ovale subspecies improved, and the overall specificity also increased with the optimized method. The detection results of 111 malarial blood samples showed that the sensitivity and specificity of the routine nested PCR were 94.57% and 86.96%, respectively, while those of the optimized method were both 93.48%; the difference between the two methods was not statistically significant for sensitivity (P > 0.05) but was statistically significant for specificity (P < 0.05). Conclusion The optimized PCR improves specificity without reducing sensitivity relative to the routine nested PCR, and it can also reduce costs and increase the efficiency of malaria detection by requiring fewer experimental steps.
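
    The abstract does not name the test used to compare the two methods on the same 111 samples; for paired diagnostic results like these, McNemar's test on the discordant pairs is one standard choice. A sketch with illustrative counts, not the study's data:

        from statsmodels.stats.contingency_tables import mcnemar

        # Paired 2x2 table: rows = routine nested PCR (+/-), columns = optimized method (+/-).
        table = [[80, 2],
                 [7, 22]]

        result = mcnemar(table, exact=True)   # exact binomial test on the 2 + 7 discordant pairs
        print(f"statistic = {result.statistic}, p = {result.pvalue:.3f}")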

  11. Clinical significance of CD44 expression in children with hepatoblastoma.

    PubMed

    Cai, H-Y; Yu, B; Feng, Z-C; Qi, X; Wei, X-J

    2015-10-27

    The aim of this study was to investigate the expression of CD44 and its clinical significance in children suffering from hepatoblastoma (HB). CD44 expression was detected by immunohistochemical staining in 30 samples from children with hepatoblastoma and in 10 normal liver tissue samples from normal children. The data were analyzed with the chi-square test using SPSS (v.11.0). The rate of CD44 expression was significantly higher in hepatoblastoma tissues (66.7%) than in normal liver tissues (χ2 = 4.848, P < 0.05), and significantly higher in children with stage III or IV hepatoblastoma (83.3%) than in children with stage I or II hepatoblastoma (41.7%; χ2 = 5.625, P < 0.05). Therefore, CD44 expression might play an important role in the pathogenesis, progression, and prognosis of HB in children.
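
    The reported comparison is a chi-square test on a 2x2 contingency table. A sketch using the reported 20/30 (66.7%) positive rate in hepatoblastoma; the normal-tissue split is an assumed illustration:

        from scipy.stats import chi2_contingency

        # Rows: hepatoblastoma vs. normal liver; columns: CD44-positive vs. CD44-negative.
        table = [[20, 10],
                 [2, 8]]

        chi2, p, dof, expected = chi2_contingency(table, correction=False)
        print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")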

  12. diffHic: a Bioconductor package to detect differential genomic interactions in Hi-C data.

    PubMed

    Lun, Aaron T L; Smyth, Gordon K

    2015-08-19

    Chromatin conformation capture with high-throughput sequencing (Hi-C) is a technique that measures the in vivo intensity of interactions between all pairs of loci in the genome. Most conventional analyses of Hi-C data focus on the detection of statistically significant interactions. However, an alternative strategy involves identifying significant changes in the interaction intensity (i.e., differential interactions) between two or more biological conditions. This is more statistically rigorous and may provide more biologically relevant results. Here, we present the diffHic software package for the detection of differential interactions from Hi-C data. diffHic provides methods for read pair alignment and processing, counting into bin pairs, filtering out low-abundance events and normalization of trended or CNV-driven biases. It uses the statistical framework of the edgeR package to model biological variability and to test for significant differences between conditions. Several options for the visualization of results are also included. The use of diffHic is demonstrated with real Hi-C data sets. Performance against existing methods is also evaluated with simulated data. On real data, diffHic is able to successfully detect interactions with significant differences in intensity between biological conditions. It also compares favourably to existing software tools on simulated data sets. These results suggest that diffHic is a viable approach for differential analyses of Hi-C data.

  13. Analysis of variance to assess statistical significance of Laplacian estimation accuracy improvement due to novel variable inter-ring distances concentric ring electrodes.

    PubMed

    Makeyev, Oleksandr; Joe, Cody; Lee, Colin; Besio, Walter G

    2017-07-01

    Concentric ring electrodes have shown promise in non-invasive electrophysiological measurement demonstrating their superiority to conventional disc electrodes, in particular, in accuracy of Laplacian estimation. Recently, we have proposed novel variable inter-ring distances concentric ring electrodes. Analytic and finite element method modeling results for linearly increasing distances electrode configurations suggested they may decrease the truncation error resulting in more accurate Laplacian estimates compared to currently used constant inter-ring distances configurations. This study assesses statistical significance of Laplacian estimation accuracy improvement due to novel variable inter-ring distances concentric ring electrodes. Full factorial design of analysis of variance was used with one categorical and two numerical factors: the inter-ring distances, the electrode diameter, and the number of concentric rings in the electrode. The response variables were the Relative Error and the Maximum Error of Laplacian estimation computed using a finite element method model for each of the combinations of levels of three factors. Effects of the main factors and their interactions on Relative Error and Maximum Error were assessed and the obtained results suggest that all three factors have statistically significant effects in the model confirming the potential of using inter-ring distances as a means of improving accuracy of Laplacian estimation.
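
    A compressed sketch of the kind of full factorial ANOVA described, with the three factors and a Relative Error response; the synthetic response surface is an assumption standing in for the finite element model outputs:

        import numpy as np
        import pandas as pd
        import statsmodels.api as sm
        from statsmodels.formula.api import ols

        rng = np.random.default_rng(10)
        grid = pd.MultiIndex.from_product(
            [["constant", "linear"], [1.0, 2.0, 3.0], [4, 5, 6]],
            names=["distances", "diameter", "rings"]).to_frame(index=False)
        # Synthetic Relative Error: smaller for linearly increasing inter-ring distances.
        grid["rel_error"] = (0.5 - 0.1 * (grid["distances"] == "linear")
                             + 0.02 * grid["diameter"] + rng.normal(0, 0.02, len(grid)))

        model = ols("rel_error ~ C(distances) * diameter * rings", data=grid).fit()
        print(sm.stats.anova_lm(model, typ=2))   # main effects and interactions with p-values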

  14. Detecting trends in raptor counts: power and type I error rates of various statistical tests

    USGS Publications Warehouse

    Hatfield, J.S.; Gould, W.R.; Hoover, B.A.; Fuller, M.R.; Lindquist, E.L.

    1996-01-01

    We conducted simulations that estimated power and type I error rates of statistical tests for detecting trends in raptor population count data collected from a single monitoring site. Results of the simulations were used to help analyze count data of bald eagles (Haliaeetus leucocephalus) from 7 national forests in Michigan, Minnesota, and Wisconsin during 1980-1989. Seven statistical tests were evaluated, including simple linear regression on the log scale and linear regression with a permutation test. Using 1,000 replications each, we simulated n = 10 and n = 50 years of count data and trends ranging from -5 to 5% change/year. We evaluated the tests at 3 critical levels (alpha = 0.01, 0.05, and 0.10) for both upper- and lower-tailed tests. Exponential count data were simulated by adding sampling error with a coefficient of variation of 40% from either a log-normal or autocorrelated log-normal distribution. Not surprisingly, tests performed with 50 years of data were much more powerful than tests with 10 years of data. Positive autocorrelation inflated alpha-levels upward from their nominal levels, making the tests less conservative and more likely to reject the null hypothesis of no trend. Of the tests studied, Cox and Stuart's test and Pollard's test clearly had lower power than the others. Surprisingly, the linear regression t-test, Collins' linear regression permutation test, and the nonparametric Lehmann's and Mann's tests all had similar power in our simulations. Analyses of the count data suggested that bald eagles had increasing trends on at least 2 of the 7 national forests during 1980-1989.
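
    A condensed version of the simulation the authors describe, for one of the seven tests (the upper-tailed linear-regression t-test on log counts), with lognormal sampling error at a 40% coefficient of variation; details such as the baseline count are assumptions:

        import numpy as np
        from scipy.stats import linregress

        def simulate_power(trend=0.05, n_years=10, cv=0.40, alpha=0.05, n_reps=1000, seed=0):
            # Fraction of replicates in which the upper-tailed t-test detects the trend.
            rng = np.random.default_rng(seed)
            sigma = np.sqrt(np.log(1.0 + cv**2))      # lognormal sigma matching the target CV
            years = np.arange(n_years)
            hits = 0
            for _ in range(n_reps):
                mean_counts = 100.0 * (1.0 + trend) ** years
                counts = mean_counts * rng.lognormal(-sigma**2 / 2.0, sigma, n_years)
                res = linregress(years, np.log(counts))
                if res.slope > 0 and res.pvalue / 2.0 < alpha:   # one-tailed rejection
                    hits += 1
            return hits / n_reps

        print("power with 10 years:", simulate_power(n_years=10))
        print("power with 50 years:", simulate_power(n_years=50))

    The jump in power from 10 to 50 years mirrors the paper's central finding; adding autocorrelation to the error process would reproduce the inflated type I error rates they report.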

  15. Clinical significance of detecting circulating tumor cells in colorectal cancer using subtraction enrichment and immunostaining-fluorescence in situ hybridization (SE-iFISH)

    PubMed Central

    Shen, Zhen; Jing, Yan; Lu, Haibo; Li, Heng; Yang, Xiaoye; Cui, Xiangbin; Li, Yuqing; Lou, Zheng; Liu, Peng; Zhang, Cun; Zhang, Wei

    2017-01-01

    Circulating tumor cells (CTC) are useful in early detection of colorectal cancer. This study described a newly developed platform, integrated subtraction enrichment and immunostaining-fluorescence in situ hybridization (SE-iFISH), to assess CTCs in colorectal cancer. CTCs were detected by SE-iFISH in 40 of 44 preoperative colorectal cancer patients, yielding a sensitivity of 90.9%, which was significantly higher than that of the CellSearch system (90.9% vs. 43.2%, P=0.033). No significant association was found between tumor stage, survival and preoperative CTC number. CTCs were detected in 10 colorectal cancer patients one week after surgery; seven patients with decreased CTC numbers (compared with preoperative CTC number) were free of recurrence, whereas two of the three patients with increased CTC numbers had tumor recurrence. Moreover, CTCs were detected in 34 colorectal cancer patients three months after surgery; patients with CTC < 2 at three months after surgery had significantly longer progression-free survival than those with CTC ≥ 2 (P=0.019), and patients with decreased CTC number (compared with preoperative CTC number) had significantly longer progression-free survival than those with increased CTC number (P=0.003). In conclusion, CTCs could be detected in various stages of colorectal cancer using SE-iFISH. Dynamic monitoring of CTC numbers could predict recurrence and prognosis. PMID:28423493

  16. Statistical and operational considerations for designs for x-ray tomographic spectrophotometry to detect, localize, and classify foreign objects in various systems

    NASA Astrophysics Data System (ADS)

    Fennelly, Alphonsus J.; Fry, Edward L.; Zukic, Muamer; Wilson, Michele M.; Janik, Tadeusz J.; Torr, Douglas G.

    1994-11-01

    In six companion papers we discuss a capability for x-ray tomographic spectrophotometry at three energy ranges to observe foreign objects in various systems using a novel x-ray optical and photometric approach. We describe new types of thin-film x-ray reflecting filters that provide energy-specific optical trains, inserted into existing x-ray interrogation systems. This is complemented by performing topographic imaging at a few, to several, energies in each case, which provides a full topographic and spectrophotometric analysis. Foreign objects can then be detected, localized, discriminated, and classified, so that they may be dealt with by excision and replacement with benign system elements. We analyze statistical and operational concerns leading to the design of three systems. The first operates at x-ray energies of 1-10 keV and deals with defects in microelectronic integrated circuits. The second operates at x-ray energies of 10-30 keV and deals with defects in human tissue; the chemical specificity and image resolution of this system will allow identification, localization, and mensuration of tumors without the need of biopsy. The third system, on which this discussion concentrates, operates at x-ray energies of 30-70 keV and deals with the presence of explosive devices in transportation systems, and of contraband materials and objects in luggage and cargo. We present an analysis of the statistical features of the detection problem in these types of systems, discussing the operational constraints that limit system performance. After considering the multivariate, multisignature approach to the problem, we discuss the tomographic and spectrophotometric approach, which yields a better solution to the detection problem within the operational constraints.

  17. Detecting Visually Observable Disease Symptoms from Faces.

    PubMed

    Wang, Kuan; Luo, Jiebo

    2016-12-01

    Recent years have witnessed an increasing interest in the application of machine learning to clinical informatics and healthcare systems. A significant amount of research has been done on healthcare systems based on supervised learning. In this study, we present a generalized solution to detect visually observable symptoms on faces using semi-supervised anomaly detection combined with machine vision algorithms. We rely on disease-related statistical facts to detect abnormalities and classify them into multiple categories to narrow down the possible medical causes of the detected abnormalities. Our method contrasts with most existing approaches, which are limited by the availability of labeled training data required for supervised learning, and therefore offers the major advantage of flagging any unusual and visually observable symptoms.
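
    The paper's exact model is not specified above; a common way to realize the semi-supervised anomaly-detection idea is to train a detector only on normal examples and flag departures. A sketch using an isolation forest, where the 16-dimensional feature vectors are assumed stand-ins for machine-vision face descriptors:

        import numpy as np
        from sklearn.ensemble import IsolationForest

        rng = np.random.default_rng(11)
        normal_faces = rng.normal(0.0, 1.0, size=(500, 16))      # features from healthy faces
        test_faces = np.vstack([rng.normal(0.0, 1.0, (5, 16)),   # 5 normal-looking test faces
                                rng.normal(4.0, 1.0, (3, 16))])  # 3 with unusual appearance

        detector = IsolationForest(random_state=0).fit(normal_faces)
        print(detector.predict(test_faces))   # -1 flags anomalous (possibly symptomatic) faces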

  18. A statistical assessment of pesticide pollution in surface waters using environmental monitoring data: Chlorpyrifos in Central Valley, California.

    PubMed

    Wang, Dan; Singhasemanon, Nan; Goh, Kean S

    2016-11-15

    Pesticides are routinely monitored in surface waters, and the resulting data are analyzed to assess whether their use will damage aquatic ecosystems. However, the utility of the monitoring data is limited by insufficient temporal and spatial sampling coverage and by the inability to detect and quantify trace concentrations. This study developed a novel assessment procedure that addresses those limitations by combining 1) statistical methods capable of extracting information from concentrations below changing detection limits, 2) statistical resampling techniques that account for uncertainties rooted in the non-detects and insufficient/irregular sampling coverage, and 3) multiple lines of evidence that improve confidence in the final conclusion. This procedure was demonstrated by an assessment of chlorpyrifos monitoring data in surface waters of California's Central Valley (2005-2013). We detected a significant downward trend in the concentrations, which cannot be observed by commonly-used statistical approaches. We assessed that the aquatic risk was low using a probabilistic method that works with non-detects and has the ability to differentiate indicator groups with varying sensitivity. In addition, we showed that the frequency of exceedance over ambient aquatic life water quality criteria was affected by pesticide use, precipitation and irrigation demand in certain periods anteceding the water sampling events. Copyright © 2016 Elsevier B.V. All rights reserved.

  19. Measuring and statistically testing the size of the effect of a chemical compound on a continuous in-vitro pharmacological response through a new statistical model of response detection limit

    PubMed Central

    Diaz, Francisco J.; McDonald, Peter R.; Pinter, Abraham; Chaguturu, Rathnam

    2018-01-01

    Biomolecular screening research frequently searches for the chemical compounds that are most likely to make a biochemical or cell-based assay system produce a strong continuous response. Several doses are tested with each compound, and it is assumed that, if there is a dose-response relationship, the relationship follows a monotonic curve, usually a version of the median-effect equation. However, the null hypothesis of no relationship cannot be statistically tested using this equation. We used a linearized version of this equation to define a measure of pharmacological effect size, and used this measure to rank the investigated compounds in order of their overall capability to produce strong responses. The null hypothesis that none of the examined doses of a particular compound produced a strong response can be tested with this approach. The proposed approach is based on a new statistical model of the important concept of response detection limit, a concept that is usually neglected in the analysis of dose-response data with continuous responses. The methodology is illustrated with data from a study searching for compounds that neutralize the infection of brain glioblastoma cells by a human immunodeficiency virus. PMID:24905187
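
    The linearization referred to is the standard log form of the median-effect equation; with f_a the fraction affected at dose D, f_u = 1 - f_a the fraction unaffected, D_m the median-effect dose, and m the slope, it reads, in LaTeX:

        \log\left(\frac{f_a}{f_u}\right) = m\,\log D - m\,\log D_m

    The effect-size measure can then be read off the fitted line, and responses falling below the assay's resolvable level would enter as censored observations under a detection-limit model of the kind the abstract describes.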

  20. Exoplanets, Exo-Solar Life, and Human Significance

    NASA Technical Reports Server (NTRS)

    Wiseman, Jennifer

    2011-01-01

    With the recent detection of over 500 extrasolar planets, the existence of "other worlds", perhaps even other Earths, is no longer in the realm of science fiction. The study of exoplanets has rapidly moved from an activity on the fringe of astronomy to one of the highest priorities of the world's astronomical programs. Actual images of extrasolar planets were revealed over the past two years for the first time. NASA's Hubble Space Telescope is already characterizing the atmospheres of Jupiter-like planets in other systems. And the recent launch of the NASA Kepler space telescope is enabling the first statistical assessment of how common solar systems like our own really are. As we begin to characterize these "other worlds" and assess their habitability, the question of the significance and uniqueness of life on Earth will impact our society as never before. I will provide a comprehensive overview of the techniques and status of exoplanet detection, followed by reflections on the societal impact of finding out that Earths are common, or rare. Will finding other potentially habitable planets create another "Copernican Revolution"? Will perceptions of the significance of life on Earth change when we find other Earth-like planets? I will discuss the plans of the scientific community for future telescopes that will be able to survey our solar neighborhood for Earth-like planets, study their atmospheres, and search for biological signs of life.

  1. A generalized Grubbs-Beck test statistic for detecting multiple potentially influential low outliers in flood series

    USGS Publications Warehouse

    Cohn, T.A.; England, J.F.; Berenbrock, C.E.; Mason, R.R.; Stedinger, J.R.; Lamontagne, J.R.

    2013-01-01

    The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as “less-than” values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.
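
    For reference, a sketch of the classic single Grubbs-Beck low-outlier screen that this paper generalizes: flows whose log falls more than K_n sample standard deviations below the mean are flagged. The closed-form K_n expression below is a commonly used fit to the tabulated 10%-level critical values, an assumption standing in for the paper's multiple-outlier procedure:

        import numpy as np

        def grubbs_beck_threshold(flows, kn):
            # One-sided low-outlier threshold on log10 flows.
            logq = np.log10(np.asarray(flows, dtype=float))
            return 10.0 ** (logq.mean() - kn * logq.std(ddof=1))

        n = 40
        kn = -0.9043 + 3.345 * np.sqrt(np.log10(n)) - 0.4046 * np.log10(n)  # approximate K_n(10%)

        rng = np.random.default_rng(3)
        flows = 10.0 ** rng.normal(2.0, 0.3, size=n)      # synthetic annual peak flows
        threshold = grubbs_beck_threshold(flows, kn)
        print(f"threshold = {threshold:.1f}; flagged low outliers: {np.sum(flows < threshold)}")

    Flagged values would then be recoded as "less-than" observations for a censored-data fit such as the Expected Moments Algorithm.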

  2. A generalized Grubbs-Beck test statistic for detecting multiple potentially influential low outliers in flood series

    NASA Astrophysics Data System (ADS)

    Cohn, T. A.; England, J. F.; Berenbrock, C. E.; Mason, R. R.; Stedinger, J. R.; Lamontagne, J. R.

    2013-08-01

    The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as "less-than" values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.

  3. Evaluating statistical and clinical significance of intervention effects in single-case experimental designs: an SPSS method to analyze univariate data.

    PubMed

    Maric, Marija; de Haan, Else; Hogendoorn, Sanne M; Wolters, Lidewij H; Huizenga, Hilde M

    2015-03-01

    Single-case experimental designs are useful methods in clinical research practice to investigate individual client progress. Their proliferation might have been hampered by methodological challenges such as the difficulty applying existing statistical procedures. In this article, we describe a data-analytic method to analyze univariate (i.e., one symptom) single-case data using the common package SPSS. This method can help the clinical researcher to investigate whether an intervention works as compared with a baseline period or another intervention type, and to determine whether symptom improvement is clinically significant. First, we describe the statistical method in a conceptual way and show how it can be implemented in SPSS. Simulation studies were performed to determine the number of observation points required per intervention phase. Second, to illustrate this method and its implications, we present a case study of an adolescent with anxiety disorders treated with cognitive-behavioral therapy techniques in an outpatient psychotherapy clinic, whose symptoms were regularly assessed before each session. We provide a description of the data analyses and results of this case study. Finally, we discuss the advantages and shortcomings of the proposed method. Copyright © 2014. Published by Elsevier Ltd.

  4. Clinical significance of the serum biomarker index detection in children with Henoch-Schonlein purpura.

    PubMed

    Purevdorj, Narangerel; Mu, Yun; Gu, Yajun; Zheng, Fang; Wang, Ran; Yu, Jinwei; Sun, Xuguo

    2018-02-01

    To explore a panel of serum biomarkers for the laboratory diagnosis of pediatric Henoch-Schönlein purpura (HSP). Blood white blood cell (WBC) counts and serum levels of serum amyloid A (SAA), interleukin 6 (IL-6), immunoglobulin A (IgA), immunoglobulin G (IgG), immunoglobulin M (IgM), immunoglobulin E (IgE), C-reactive protein (CRP), complement component 3 (C3), complement component 4 (C4), and anti-streptolysin O (ASO) were measured in 127 patients with HSP, 110 septicemia patients, and 121 healthy volunteers. The diagnostic ability of the biomarkers selected from the HSP and septicemia patients was analyzed by ROC curve, and a calculation model was designed to derive a biomarker index for the laboratory diagnosis of HSP and for the differential diagnosis between HSP and septicemia. The levels of WBC, CRP, IL-6 and SAA in the septicemia patients were significantly higher than those in the control group (p<0.05). Compared with the healthy individuals, serum levels of WBC, CRP, IL-6, SAA, IgA and IgM were significantly increased in patients with HSP (p<0.05). The area under the curve (AUC) of SAA, IgA, IgM, WBC, IL-6, and CRP in the patients with HSP was 0.964, 0.855, 0.849, 0.787, 0.765, and 0.622, respectively; the corresponding values in the septicemia patients were 0.700, 0.428, 0.689, 0.682, 0.891, and 0.853. The biomarker index was defined as SAA + IgA/4000 + IgM/4000 × 0.4 × (mean CRP value / CRPi). The biomarker index in HSP patients was significantly higher than that of the healthy controls, whereas in septicemia patients it was significantly lower than the control. Thus the biomarker index is elevated in HSP but decreased in infectious disease as represented by septicemia, and its detection could exclude the interference of infection when used as an auxiliary examination for HSP patients. Copyright © 2017 The Canadian Society of Clinical Chemists. Published by Elsevier Inc.

  5. Performance studies of GooFit on GPUs vs RooFit on CPUs while estimating the statistical significance of a new physical signal

    NASA Astrophysics Data System (ADS)

    Di Florio, Adriano

    2017-10-01

    In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks, with the purpose of estimating the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B+ → J/ψϕK+. GooFit is an open data-analysis tool under development that interfaces ROOT/RooFit to the CUDA platform on nVidia GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performances with respect to the RooFit application parallelised on multiple CPUs by means of the PROOF-Lite tool. The considerable resulting speed-up, evident when comparing concurrent GooFit processes allowed by the CUDA Multi Process Service with a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood-ratio test statistic in different situations in which the Wilks theorem may or may not apply because its regularity conditions are not satisfied.

  6. Toward "Constructing" the Concept of Statistical Power: An Optical Analogy.

    ERIC Educational Resources Information Center

    Rogers, Bruce G.

    This paper presents a visual analogy that may be used by instructors to teach the concept of statistical power in statistical courses. Statistical power is mathematically defined as the probability of rejecting a null hypothesis when that null is false, or, equivalently, the probability of detecting a relationship when it exists. The analogy…

  7. The use of higher-order statistics in rapid object categorization in natural scenes.

    PubMed

    Banno, Hayaki; Saiki, Jun

    2015-02-04

    We can rapidly and efficiently recognize many types of objects embedded in complex scenes. What information supports this object recognition is a fundamental question for understanding our visual processing. We investigated the eccentricity-dependent role of shape and statistical information in ultrarapid object categorization, using the higher-order statistics proposed by Portilla and Simoncelli (2000). Textures synthesized by their algorithm have the same higher-order statistics as the originals, while the global shapes are destroyed. We used the synthesized textures to manipulate the availability of shape information separately from the statistics. We hypothesized that shape makes a greater contribution to central vision than to peripheral vision and that statistics show the opposite pattern. The results did not show contributions clearly biased by eccentricity. Statistical information made a robust contribution not only in peripheral but also in central vision, and the results supported a contribution of shape in both central and peripheral vision. Further experiments revealed some interesting properties of the statistics: they are available for only a limited time, they are attributable to the presence or absence of animals even when shape is destroyed, and they predict how easily humans detect animals in original images. Our data suggest that, under the time constraint of categorical processing, higher-order statistics underlie our strong performance in rapid categorization, irrespective of eccentricity. © 2015 ARVO.

  8. Solar flare hard X-ray spikes observed by RHESSI: a statistical study

    NASA Astrophysics Data System (ADS)

    Cheng, J. X.; Qiu, J.; Ding, M. D.; Wang, H.

    2012-11-01

    Context. Hard X-ray (HXR) spikes refer to fine time structures on timescales of seconds to milliseconds in high-energy HXR emission profiles during solar flare eruptions. Aims: We present a preliminary statistical investigation of temporal and spectral properties of HXR spikes. Methods: Using a three-sigma spike selection rule, we detected 184 spikes in 94 out of 322 flares with significant counts at given photon energies, which were detected from demodulated HXR light curves obtained by the Reuven Ramaty High Energy Solar Spectroscopic Imager (RHESSI). About one fifth of these spikes are also detected at photon energies higher than 100 keV. Results: The statistical properties of the spikes are as follows. (1) HXR spikes are produced in both impulsive flares and long-duration flares with nearly the same occurrence rates. Ninety percent of the spikes occur during the rise phase of the flares, and about 70% occur around the peak times of the flares. (2) The time durations of the spikes vary from 0.2 to 2 s, with the mean being 1.0 s, which is not dependent on photon energies. The spikes exhibit symmetric time profiles with no significant difference between rise and decay times. (3) Among the most energetic spikes, nearly all of them have harder count spectra than their underlying slow-varying components. There is also a weak indication that spikes exhibiting time lags in high-energy emissions tend to have harder spectra than spikes with time lags in low-energy emissions.
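
    A minimal sketch of a three-sigma spike selection of the kind described, using a running-median baseline as a stand-in for the slow-varying component (the baseline model and synthetic light curve are assumptions):

        import numpy as np

        def detect_spikes(flux, window=21, nsigma=3.0):
            # Flag samples more than nsigma standard deviations above a running median.
            pad = window // 2
            padded = np.pad(flux, pad, mode="edge")
            baseline = np.array([np.median(padded[i:i + window]) for i in range(len(flux))])
            resid = flux - baseline
            return np.where(resid > nsigma * resid.std(ddof=1))[0]

        rng = np.random.default_rng(4)
        light_curve = rng.poisson(100, size=500).astype(float)   # demodulated counts
        light_curve[250:252] += 80.0                             # injected short spike
        print("spike indices:", detect_spikes(light_curve))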

  9. Relative risk estimates from spatial and space-time scan statistics: Are they biased?

    PubMed Central

    Prates, Marcos O.; Kulldorff, Martin; Assunção, Renato M.

    2014-01-01

    The purely spatial and space-time scan statistics have been successfully used by many scientists to detect and evaluate geographical disease clusters. Although the scan statistic has high power in correctly identifying a cluster, no study has considered the estimates of the relative risk within the detected cluster. In this paper we evaluate whether there is any bias in these estimated relative risks. Intuitively, one may expect the estimated relative risk to have an upward bias, since the scan statistic cherry-picks high-rate areas to include in the cluster. We show that this intuition is correct for clusters with low statistical power, but with medium to high power the bias becomes negligible. The same behaviour is not observed for the prospective space-time scan statistic, where there is an increasingly conservative downward bias of the relative risk as the power to detect the cluster increases. PMID:24639031
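
    The statistic at issue is Kulldorff's Poisson scan likelihood ratio; for a single candidate cluster it, together with the relative-risk estimate whose bias the paper studies, reduces to a few lines (numbers illustrative):

        import numpy as np

        def poisson_scan_llr(c, e_c, c_total):
            # Log-likelihood ratio for one candidate cluster with c observed and e_c
            # expected cases, out of c_total cases overall (total expected = c_total).
            if c <= e_c:
                return 0.0                      # only elevated-rate clusters are of interest
            c_out, e_out = c_total - c, c_total - e_c
            return c * np.log(c / e_c) + c_out * np.log(c_out / e_out)

        c, e_c, c_total = 30, 15.0, 200
        rr_inside = (c / e_c) / ((c_total - c) / (c_total - e_c))   # estimated relative risk
        print(f"LLR = {poisson_scan_llr(c, e_c, c_total):.2f}, RR estimate = {rr_inside:.2f}")

    In a full scan, the maximum LLR over all candidate windows is compared with Monte Carlo replicates generated under the null; it is the conditioning on this maximization that produces the upward bias in the RR estimate for low-power clusters.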

  10. Statistical power analysis of cardiovascular safety pharmacology studies in conscious rats.

    PubMed

    Bhatt, Siddhartha; Li, Dingzhou; Flynn, Declan; Wisialowski, Todd; Hemkens, Michelle; Steidl-Nichols, Jill

    2016-01-01

    Cardiovascular (CV) toxicity and related attrition are a major challenge for novel therapeutic entities, and identifying CV liability early is critical for effective derisking. CV safety pharmacology studies in rats are a valuable tool for early investigation of CV risk. Thorough understanding of data analysis techniques and statistical power of these studies is currently lacking and is imperative for enabling sound decision-making. Data from 24 crossover and 12 parallel design CV telemetry rat studies were used for statistical power calculations. Average values of telemetry parameters (heart rate, blood pressure, body temperature, and activity) were logged every 60 s (from 1 h predose to 24 h post-dose) and reduced to 15-min mean values. These data were subsequently binned into super intervals for statistical analysis. A repeated measures analysis of variance was used for statistical analysis of crossover studies, and a repeated measures analysis of covariance was used for parallel studies. Statistical power analysis was performed to generate power curves and establish relationships between detectable CV (blood pressure and heart rate) changes and statistical power. Additionally, data from a crossover CV study with phentolamine at 4, 20 and 100 mg/kg are reported as a representative example of data analysis methods. Phentolamine produced a CV profile characteristic of alpha adrenergic receptor antagonism, evidenced by a dose-dependent decrease in blood pressure and reflex tachycardia. Detectable blood pressure changes at 80% statistical power for crossover studies (n=8) were 4-5 mmHg. For parallel studies (n=8), detectable changes at 80% power were 6-7 mmHg. Detectable heart rate changes for both study designs were 20-22 bpm. Based on our results, the conscious rat CV model is a sensitive tool to detect and mitigate CV risk in early safety studies. Furthermore, these results will enable informed selection of appropriate models and study design for early-stage CV studies.
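
    The headline numbers can be approximated with a standard paired-design power calculation. A sketch assuming an SD of within-animal treatment differences of 4 mmHg, an assumed value not reported above:

        from statsmodels.stats.power import TTestPower

        sd_diff = 4.0   # assumed SD of paired blood-pressure differences, mmHg
        # Solve for the standardized effect size detectable at 80% power, n = 8, alpha = 0.05.
        d = TTestPower().solve_power(nobs=8, alpha=0.05, power=0.80, alternative="two-sided")
        print(f"detectable change ~ {d * sd_diff:.1f} mmHg")

    With this assumed SD the result lands in the 4-5 mmHg range quoted for crossover studies; the parallel-design case would use an independent-samples power model instead.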

  11. Fisher statistics for analysis of diffusion tensor directional information.

    PubMed

    Hutchinson, Elizabeth B; Rutecki, Paul A; Alexander, Andrew L; Sutula, Thomas P

    2012-04-30

    A statistical approach is presented for the quantitative analysis of diffusion tensor imaging (DTI) directional information using Fisher statistics, which were originally developed for the analysis of vectors in the field of paleomagnetism. In this framework, descriptive and inferential statistics have been formulated based on the Fisher probability density function, a spherical analogue of the normal distribution. The Fisher approach was evaluated for investigation of rat brain DTI maps to characterize tissue orientation in the corpus callosum, fornix, and hilus of the dorsal hippocampal dentate gyrus, and to compare directional properties in these regions following status epilepticus (SE) or traumatic brain injury (TBI) with values in healthy brains. Direction vectors were determined for each region of interest (ROI) for each brain sample and Fisher statistics were applied to calculate the mean direction vector and variance parameters in the corpus callosum, fornix, and dentate gyrus of normal rats and rats that experienced TBI or SE. Hypothesis testing was performed by calculation of Watson's F-statistic and associated p-value giving the likelihood that grouped observations were from the same directional distribution. In the fornix and midline corpus callosum, no directional differences were detected between groups, however in the hilus, significant (p<0.0005) differences were found that robustly confirmed observations that were suggested by visual inspection of directionally encoded color DTI maps. The Fisher approach is a potentially useful analysis tool that may extend the current capabilities of DTI investigation by providing a means of statistical comparison of tissue structural orientation. Copyright © 2012 Elsevier B.V. All rights reserved.
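
    The core Fisher computations (mean direction, resultant length R, and concentration parameter kappa) are compact. A sketch on synthetic unit vectors, noting that DTI directions are actually axial data, which the paper's framework accounts for:

        import numpy as np

        def fisher_stats(vectors):
            # Mean direction, resultant length R, and kappa = (N - 1) / (N - R).
            v = np.asarray(vectors, dtype=float)
            v = v / np.linalg.norm(v, axis=1, keepdims=True)
            resultant = v.sum(axis=0)
            r = np.linalg.norm(resultant)
            n = len(v)
            return resultant / r, r, (n - 1) / (n - r)

        rng = np.random.default_rng(5)
        sample = np.array([1.0, 0.0, 0.0]) + 0.15 * rng.normal(size=(20, 3))  # clustered directions
        mean_dir, r, kappa = fisher_stats(sample)
        print("mean direction:", np.round(mean_dir, 3), "| kappa ~", round(kappa, 1))

    Large kappa indicates tightly concentrated orientations; group comparisons then proceed through Watson's F-statistic on the resultant lengths.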

  12. Significance of noisy signals in periodograms

    NASA Astrophysics Data System (ADS)

    Süveges, Maria

    2015-08-01

    The detection of tiny periodic signals in noisy and irregularly sampled time series is a challenging task. Once a small peak is found in the periodogram, the next step is to see how probable it is that pure noise produced a peak so extreme - that is to say, to compute its False Alarm Probability (FAP). This useful measure quantifies the statistical plausibility of the found signal among the noise. However, its derivation from statistical principles is very hard due to the specificities of astronomical periodograms, such as oversampling and the ensuing strong correlation among its values at different frequencies. I will present a method to compute the FAP based on extreme-value statistics (Süveges 2014), and compare it to two other methods, proposed by Baluev (2008) and by Paltani (2004) and Schwarzenberg-Czerny (2012), on signals with various signal shapes and at different signal-to-noise ratios.
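
    For orientation, astropy exposes Baluev's (2008) analytic FAP bound (and a bootstrap alternative) directly on the Lomb-Scargle periodogram; the extreme-value method of Süveges is not part of astropy, so this sketch only illustrates the quantity being computed:

        import numpy as np
        from astropy.timeseries import LombScargle

        rng = np.random.default_rng(6)
        t = np.sort(rng.uniform(0.0, 100.0, 300))          # irregular sampling
        y = 0.05 * np.sin(2 * np.pi * t / 3.7) + rng.normal(0.0, 1.0, t.size)  # tiny signal + noise

        ls = LombScargle(t, y)
        frequency, power = ls.autopower()
        fap = ls.false_alarm_probability(power.max(), method="baluev")
        print(f"peak power = {power.max():.3f}, FAP = {fap:.3g}")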

  13. Detecting changes in ultrasound backscattered statistics by using Nakagami parameters: Comparisons of moment-based and maximum likelihood estimators.

    PubMed

    Lin, Jen-Jen; Cheng, Jung-Yu; Huang, Li-Fei; Lin, Ying-Hsiu; Wan, Yung-Liang; Tsui, Po-Hsiang

    2017-05-01

    The Nakagami distribution is an approximation useful to the statistics of ultrasound backscattered signals for tissue characterization. The choice of estimator may affect the Nakagami parameter in the detection of changes in backscattered statistics. In particular, the moment-based estimator (MBE) and the maximum likelihood estimator (MLE) are the two primary methods used to estimate the Nakagami parameters of ultrasound signals. This study explored the effects of the MBE and different MLE approximations on Nakagami parameter estimation. Ultrasound backscattered signals of different scatterer number densities were generated using a simulation model, and phantom experiments and measurements of human liver tissues were also conducted to acquire real backscattered echoes. Envelope signals were employed to estimate the Nakagami parameters by using the MBE, the first- and second-order approximations of the MLE (MLE1 and MLE2, respectively), and the Greenwood approximation (MLEgw) for comparison. The simulation results demonstrated that, compared with the MBE and MLE1, the MLE2 and MLEgw enabled more stable parameter estimation with small sample sizes. Notably, the required data length of the envelope signal was 3.6 times the pulse length. The phantom and tissue measurement results also showed that the Nakagami parameters estimated using MLE2 and MLEgw could simultaneously differentiate various scatterer concentrations with lower standard deviations and reliably reflect physical meanings associated with the backscattered statistics. Therefore, MLE2 and MLEgw are suggested as estimators for the development of Nakagami-based methodologies for ultrasound tissue characterization. Copyright © 2017 Elsevier B.V. All rights reserved.
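
    The moment-based estimator in question has a closed form, m = (E[R^2])^2 / Var(R^2), while the MLE must be found numerically. A sketch comparing the two on simulated envelope data, with scipy's generic numerical fit standing in for the specific MLE approximations studied:

        import numpy as np
        from scipy.stats import nakagami

        def nakagami_mbe(envelope):
            # Moment-based estimate of the Nakagami m parameter.
            r2 = np.asarray(envelope, dtype=float) ** 2
            mu2 = r2.mean()
            return mu2**2 / np.mean((r2 - mu2) ** 2)

        true_m = 0.8                          # pre-Rayleigh statistics (m < 1)
        envelope = nakagami.rvs(true_m, scale=1.0, size=2000, random_state=7)

        m_mle, _, _ = nakagami.fit(envelope, floc=0)   # numerical maximum likelihood
        print(f"MBE m = {nakagami_mbe(envelope):.3f}, MLE m = {m_mle:.3f}, true m = {true_m}")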

  14. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression

    PubMed Central

    Dipnall, Joanna F.

    2016-01-01

    Background Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. Methods The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009–2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. Results After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers was selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin and the Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). Conclusion The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology.

  15. Fusing Data Mining, Machine Learning and Traditional Statistics to Detect Biomarkers Associated with Depression.

    PubMed

    Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny

    2016-01-01

    Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers was selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin and the Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology.
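
    A compressed sketch of the three-step pipeline on synthetic data: a boosted model screens the 67 biomarkers down to a candidate set, and a conventional logistic regression estimates odds ratios for the survivors. Multiple imputation, survey weights, and confounder adjustment are omitted here for brevity:

        import numpy as np
        import statsmodels.api as sm
        from sklearn.datasets import make_classification
        from sklearn.ensemble import GradientBoostingClassifier

        # Synthetic stand-in for one imputed NHANES-style data set with 67 biomarkers.
        X, y = make_classification(n_samples=1000, n_features=67, n_informative=5, random_state=0)

        # Step 2: machine-learning screen by boosted-model feature importance.
        gbm = GradientBoostingClassifier(random_state=0).fit(X, y)
        top = np.argsort(gbm.feature_importances_)[::-1][:21]   # keep ~21 candidates, as in the study

        # Step 3: traditional logistic regression on the screened subset.
        logit = sm.Logit(y, sm.add_constant(X[:, top])).fit(disp=0)
        print("odds ratios:", np.round(np.exp(logit.params[1:]), 2))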

  16. Statistical characteristics of the sequential detection of signals in correlated noise

    NASA Astrophysics Data System (ADS)

    Averochkin, V. A.; Baranov, P. E.

    1985-10-01

    A solution is given to the problem of determining the distribution of the duration of the sequential two-threshold Wald rule for the time-discrete detection of determinate and Gaussian correlated signals on a background of Gaussian correlated noise. Expressions are obtained for the joint probability densities of the likelihood ratio logarithms, and an analysis is made of the effect of correlation and SNR on the duration distribution and the detection efficiency. Comparison is made with Neumann-Pearson detection.

  17. Statistical Significance and Baseline Monitoring.

    DTIC Science & Technology

    1984-07-01

    Observed versus nominal α levels were compared for multivariate tests of the data sets (50 runs of 4 groups each), based on the cumulative proportion of observations found at each nominal level. Observed α values are always higher than the nominal levels; virtually all nominal α levels are below 0.20. In other words, the discriminant analysis models

  18. Using statistical anomaly detection models to find clinical decision support malfunctions.

    PubMed

    Ray, Soumi; McEvoy, Dustin S; Aaron, Skye; Hickman, Thu-Trang; Wright, Adam

    2018-05-11

    Malfunctions in Clinical Decision Support (CDS) systems occur due to a multitude of reasons, and often go unnoticed, leading to potentially poor outcomes. Our goal was to identify malfunctions within CDS systems. We evaluated 6 anomaly detection models: (1) Poisson Changepoint Model, (2) Autoregressive Integrated Moving Average (ARIMA) Model, (3) Hierarchical Divisive Changepoint (HDC) Model, (4) Bayesian Changepoint Model, (5) Seasonal Hybrid Extreme Studentized Deviate (SHESD) Model, and (6) E-Divisive with Median (EDM) Model and characterized their ability to find known anomalies. We analyzed 4 CDS alerts with known malfunctions from the Longitudinal Medical Record (LMR) and Epic® (Epic Systems Corporation, Madison, WI, USA) at Brigham and Women's Hospital, Boston, MA. The 4 rules recommend lead testing in children, aspirin therapy in patients with coronary artery disease, pneumococcal vaccination in immunocompromised adults and thyroid testing in patients taking amiodarone. Poisson changepoint, ARIMA, HDC, Bayesian changepoint and the SHESD model were able to detect anomalies in an alert for lead screening in children and in an alert for pneumococcal conjugate vaccine in immunocompromised adults. EDM was able to detect anomalies in an alert for monitoring thyroid function in patients on amiodarone. Malfunctions/anomalies occur frequently in CDS alert systems. It is important to be able to detect such anomalies promptly. Anomaly detection models are useful tools to aid such detections.
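
    Of the six models, the Poisson changepoint idea is the simplest to illustrate: scan for the day at which a constant alert rate most plausibly shifts, and score the shift with a likelihood ratio. A sketch on synthetic daily alert counts (the study's models were applied to production alert-firing data):

        import numpy as np
        from scipy.stats import poisson

        def poisson_changepoint(counts, min_seg=5):
            # Best single changepoint and its likelihood-ratio statistic vs. a constant rate.
            counts = np.asarray(counts)
            null_ll = poisson.logpmf(counts, counts.mean()).sum()
            best_k, best_ll = None, -np.inf
            for k in range(min_seg, len(counts) - min_seg):
                ll = (poisson.logpmf(counts[:k], counts[:k].mean()).sum()
                      + poisson.logpmf(counts[k:], counts[k:].mean()).sum())
                if ll > best_ll:
                    best_k, best_ll = k, ll
            return best_k, 2.0 * (best_ll - null_ll)

        rng = np.random.default_rng(8)
        daily_alerts = np.concatenate([rng.poisson(40, 60), rng.poisson(5, 30)])  # malfunction at day 60
        print(poisson_changepoint(daily_alerts))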

  19. Assessing significance in a Markov chain without mixing

    PubMed Central

    Chikina, Maria; Frieze, Alan; Pegden, Wesley

    2017-01-01

    We present a statistical test to detect that a presented state of a reversible Markov chain was not chosen from a stationary distribution. In particular, given a value function for the states of the Markov chain, we would like to show rigorously that the presented state is an outlier with respect to the values, by establishing a p value under the null hypothesis that it was chosen from a stationary distribution of the chain. A simple heuristic used in practice is to sample ranks of states from long random trajectories on the Markov chain and compare these with the rank of the presented state; if the presented state is a 0.1% outlier compared with the sampled ranks (its rank is in the bottom 0.1% of sampled ranks), then this observation should correspond to a p value of 0.001. This significance is not rigorous, however, without good bounds on the mixing time of the Markov chain. Our test is the following: Given the presented state in the Markov chain, take a random walk from the presented state for any number of steps. We prove that observing that the presented state is an ε-outlier on the walk is significant at p = √(2ε) under the null hypothesis that the state was chosen from a stationary distribution. We assume nothing about the Markov chain beyond reversibility and show that significance at p ≈ √ε is best possible in general. We illustrate the use of our test with a potential application to the rigorous detection of gerrymandering in Congressional districting. PMID:28246331
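
    A sketch of the test as stated: walk from the presented state, compute the fraction ε of visited states whose value is at least as extreme, and report √(2ε). The toy chain, a reflecting random walk on {0,...,100} (which is reversible), is purely illustrative:

        import numpy as np

        def eps_outlier_pvalue(chain_step, value, x0, n_steps=20000, seed=9):
            # p = sqrt(2 * eps), where eps is the rank of value(x0) among states on the walk.
            rng = np.random.default_rng(seed)
            v0, x = value(x0), x0
            n_at_or_below = 0
            for _ in range(n_steps):
                x = chain_step(x, rng)
                n_at_or_below += value(x) <= v0
            eps = (1 + n_at_or_below) / (n_steps + 1)
            return np.sqrt(2.0 * eps)

        def step(state, rng):                 # reflecting +/-1 random walk on {0,...,100}
            return min(100, max(0, state + rng.choice((-1, 1))))

        print("p-value bound:", eps_outlier_pvalue(step, lambda s: s, x0=0))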

  20. Visualization and statistical comparisons of microbial communities using R packages on Phylochip data.

    PubMed

    Holmes, Susan; Alekseyenko, Alexander; Timme, Alden; Nelson, Tyrrell; Pasricha, Pankaj Jay; Spormann, Alfred

    2011-01-01

    This article explains the statistical and computational methodology used to analyze species abundances collected using the LBNL PhyloChip in a study of Irritable Bowel Syndrome (IBS) in rats. Some tools already available for the analysis of ordinary microarray data are useful in this type of statistical analysis. For instance, in correcting for multiple testing we use familywise error rate control and step-down tests (available in the multtest package). Once the most significant species are chosen, we use the hypergeometric tests familiar from testing GO categories to test specific phyla and families. We provide examples of normalization, multivariate projections, batch effect detection and integration of phylogenetic covariation, as well as tree equalization and robustification methods.
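
    The hypergeometric enrichment step can be reproduced in a few lines; the sketch below uses Python's scipy rather than the R tooling the article describes, and all counts are invented for illustration.

        from scipy.stats import hypergeom

        # Illustrative numbers: of N probes on the array, K belong to the
        # phylum of interest; a set of n significant probes contains k of them.
        N, K, n, k = 8000, 400, 120, 18
        # P(X >= k): enrichment p value for the phylum in the significant set
        p = hypergeom.sf(k - 1, N, K, n)
        print(f"enrichment p = {p:.3g}")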

  1. A STATISTICAL SURVEY OF DIOXIN-LIKE COMPOUNDS IN ...

    EPA Pesticide Factsheets

    The USEPA and the USDA completed the first statistically designed survey of the occurrence and concentration of dibenzo-p-dioxins (CDDs), dibenzofurans (CDFs), and coplanar polychlorinated biphenyls (PCBs) in the fat of beef animals raised for human consumption in the United States. Back fat was sampled from 63 carcasses at federally inspected slaughter establishments nationwide. The sample design called for sampling beef animal classes in proportion to national annual slaughter statistics. All samples were analyzed using a modification of EPA method 1613, using isotope dilution, high-resolution GC/MS to determine the rate of occurrence of 2,3,7,8-substituted CDDs/CDFs/PCBs. The method detection limits ranged from 0.05 ng/kg for TCDD to 3 ng/kg for OCDD. The results of this survey showed a mean concentration (reported as I-TEQ, lipid adjusted) in U.S. beef animals of 0.35 ng/kg and 0.89 ng/kg for CDD/CDF TEQs when non-detects are treated as zero or assigned a value of one-half the detection limit, respectively, and 0.51 ng/kg for coplanar PCB TEQs under both non-detect conventions.
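
    The two non-detect conventions reduce to a one-line substitution each; a small sketch with invented numbers, not the survey's data:

        import numpy as np

        # Measured TEQ values in ng/kg; np.nan marks a non-detect whose
        # detection limit is held in the parallel array (values illustrative)
        teq = np.array([0.8, np.nan, 1.2, np.nan, 0.4])
        dl  = np.array([0.1, 0.2,    0.1, 0.3,    0.1])

        nd = np.isnan(teq)
        mean_nd0    = np.where(nd, 0.0,    teq).mean()   # non-detects as zero
        mean_ndhalf = np.where(nd, dl / 2, teq).mean()   # non-detects as DL/2
        print(mean_nd0, mean_ndhalf)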

  2. Adjusted scaling of FDG positron emission tomography images for statistical evaluation in patients with suspected Alzheimer's disease.

    PubMed

    Buchert, Ralph; Wilke, Florian; Chakrabarti, Bhismadev; Martin, Brigitte; Brenner, Winfried; Mester, Janos; Clausen, Malte

    2005-10-01

    Statistical parametric mapping (SPM) has gained increasing acceptance for the voxel-based statistical evaluation of brain positron emission tomography (PET) with the glucose analog 2-[18F]-fluoro-2-deoxy-D-glucose (FDG) in patients with suspected Alzheimer's disease (AD). To increase the sensitivity for detection of local changes, individual differences in total brain FDG uptake are usually compensated for by proportional scaling. However, in cases of extensive hypometabolic areas, proportional scaling overestimates scaled uptake, which may cause significant underestimation of the extent of hypometabolic areas by the statistical test. To detect this problem, the authors tested for hypermetabolism. In patients with no visual evidence of true focal hypermetabolism, significant clusters of hypermetabolism in the presence of extended hypometabolism were interpreted as false-positive findings, indicating relevant overestimation of scaled uptake. In this case, scaled uptake was reduced step by step until there were no more significant clusters of hypermetabolism. In 22 consecutive patients with suspected AD, proportional scaling resulted in relevant overestimation of scaled uptake in 9 patients. Scaled uptake had to be reduced by 11.1% +/- 5.3% in these cases to eliminate the artifacts. Adjusted scaling resulted in extension of existing, and appearance of new, clusters of hypometabolism. The total volume of the additional voxels with significant hypometabolism depended linearly on the extent of the additional scaling and was 202 +/- 118 mL on average. Adjusted scaling helps to identify characteristic metabolic patterns in patients with suspected AD and is expected to increase the specificity of FDG PET in this group of patients.
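
    A rough sketch of the adjusted-scaling loop as described: lower the scaling factor stepwise until the hypermetabolism test comes back empty. The hyper_test callback here is a hypothetical stand-in for SPM's voxel-wise cluster test, and the uptake values are synthetic.

        import numpy as np

        def adjusted_scaling(uptake, hyper_test, step=0.01):
            """Lower the global scaling factor in 1% steps until the test
            for hypermetabolism clusters comes back empty (sketch only)."""
            factor = 1.0
            while hyper_test(uptake * factor):
                factor -= step
            return factor

        # stand-in test: flag "hypermetabolism" while mean scaled uptake > 1
        rng = np.random.default_rng(2)
        uptake = rng.normal(1.08, 0.05, size=10_000)   # synthetic voxel values
        print(adjusted_scaling(uptake, lambda u: u.mean() > 1.0))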

  3. Enhancing Dairy Manufacturing through customer feedback: A statistical approach

    NASA Astrophysics Data System (ADS)

    Vineesh, D.; Anbuudayasankar, S. P.; Narassima, M. S.

    2018-02-01

    Dairy products have become an integral part of the habitual diet. This study investigates consumers' satisfaction with dairy products so as to provide the manufacturers with useful inputs for enriching the quality of the products delivered. The study involved consumers of dairy products from various demographic backgrounds across South India. The questionnaire focussed on quality aspects of the dairy products as well as the service provided. A customer satisfaction model was developed based on the various factors identified, with robust hypotheses governing use of the product. The developed model proved statistically significant, passing the required statistical tests for reliability, construct validity and interdependency between the constructs. Major concerns detected were with the fat content, taste and odour of packaged milk. A minor proportion of respondents (15.64%) were dissatisfied with the quality of service provided, another issue to be addressed to eliminate the sense of dissatisfaction in the minds of consumers.

  4. A nonparametric spatial scan statistic for continuous data.

    PubMed

    Jung, Inkyung; Cho, Ho Jin

    2015-10-20

    Spatial scan statistics are widely used for spatial cluster detection, and several parametric models exist. For continuous data, a normal-based scan statistic can be used. However, the performance of that model has not been fully evaluated for non-normal data. We propose a nonparametric spatial scan statistic based on the Wilcoxon rank-sum test statistic and compare the performance of the method with that of parametric models via a simulation study under various scenarios. The nonparametric method outperforms the normal-based scan statistic in terms of power and accuracy in almost all cases considered in the simulation study. The proposed nonparametric spatial scan statistic is therefore an excellent alternative to the normal model for continuous data and is especially useful for data following skewed or heavy-tailed distributions.
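
    A minimal sketch of a Wilcoxon-based scan, assuming circular windows centered on the data points. Unlike the proposed method, it applies no Monte Carlo adjustment for the many overlapping windows, so the best window's p value serves only to rank candidates. Data are synthetic and skewed, with one planted cluster.

        import numpy as np
        from scipy.stats import ranksums

        def rank_scan(coords, values, centers, radius):
            """Rank-test values inside each circular window against those
            outside; return the window with the smallest (unadjusted) p."""
            best = None
            for c in centers:
                inside = np.linalg.norm(coords - c, axis=1) <= radius
                if 1 < inside.sum() < len(values) - 1:
                    stat, p = ranksums(values[inside], values[~inside])
                    if best is None or p < best[1]:
                        best = (c, p)
            return best

        rng = np.random.default_rng(3)
        coords = rng.uniform(0, 10, size=(300, 2))
        values = rng.lognormal(0, 1, 300)             # skewed, non-normal data
        values[np.linalg.norm(coords - 5, axis=1) < 1.5] += 3.0  # planted cluster
        print(rank_scan(coords, values, centers=coords, radius=1.5))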

  5. A Methodology for Determining Statistical Performance Compliance for Airborne Doppler Radar with Forward-Looking Turbulence Detection Capability

    NASA Technical Reports Server (NTRS)

    Bowles, Roland L.; Buck, Bill K.

    2009-01-01

    The objective of the research developed and presented in this document was to statistically assess turbulence hazard detection performance employing airborne pulse Doppler radar systems. The FAA certification methodology for forward-looking airborne turbulence radars will require estimating the probabilities of missed and false hazard indications under operational conditions. Analytical approaches must be used due to the near impossibility of obtaining sufficient statistics experimentally. This report describes an end-to-end analytical technique for estimating these probabilities for Enhanced Turbulence (E-Turb) Radar systems under noise-limited conditions, for a variety of aircraft types, as defined in FAA TSO-C134. This technique provides one means, but not the only means, by which an applicant can demonstrate compliance with the FAA-directed ATDS Working Group performance requirements. Turbulence hazard algorithms were developed that derived predictive estimates of aircraft hazards from basic radar observables. These algorithms were designed to prevent false turbulence indications while accurately predicting areas of elevated turbulence risk to aircraft, passengers, and crew; they were successfully flight tested on a NASA B757-200 and a Delta Air Lines B737-800. Application of this methodology for calculating the probability of missed and false hazard indications, taking into account the effects of the various algorithms used, is demonstrated for representative transport aircraft and radar performance characteristics.

  6. Statistical analysis and digital processing of the Mössbauer spectra

    NASA Astrophysics Data System (ADS)

    Prochazka, Roman; Tucek, Pavel; Tucek, Jiri; Marek, Jaroslav; Mashlan, Miroslav; Pechousek, Jiri

    2010-02-01

    This work focuses on the use of statistical methods and the development of filtration procedures for signal processing in Mössbauer spectroscopy. Statistical tools for noise filtering in measured spectra are used in many scientific areas. Here, the use of a purely statistical approach to the filtration of accumulated Mössbauer spectra is described. In Mössbauer spectroscopy, the noise can be considered a Poisson statistical process, with a Gaussian distribution for high numbers of observations. This noise is a superposition of non-resonant photon counting, electronic noise (from the γ-ray detection and discrimination units), and effects of velocity-system quality, which can be characterized by velocity nonlinearities. The possibility of a noise-reducing process using a newly designed statistical filter procedure is described. This mathematical procedure improves the signal-to-noise ratio and thus makes it easier to determine the hyperfine parameters of the given Mössbauer spectra. The filter procedure is based on a periodogram method that makes it possible to identify the statistically important components in the spectral domain. The significance level for these components is then feedback-controlled using the results of a correlation coefficient test. The theoretical correlation coefficient level corresponding to the spectrum resolution is estimated. The correlation coefficient test is based on comparison of the theoretical and experimental correlation coefficients given by the Spearman method. The correctness of this solution was analyzed by a series of statistical tests and confirmed on many spectra measured with increasing statistical quality for a given sample (absorber). The effect of this filter procedure depends on the signal-to-noise ratio, and the applicability of the method is subject to binding conditions.
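
    A crude stand-in for the described filtration, assuming a simple keep-the-largest-components rule in place of the paper's correlation-coefficient-controlled selection; the Lorentzian doublet and Poisson counting noise are synthetic.

        import numpy as np

        def periodogram_filter(spectrum, keep_fraction=0.02):
            """Keep only the largest-power Fourier components of an
            accumulated spectrum and transform back."""
            coeffs = np.fft.rfft(spectrum)
            power = np.abs(coeffs) ** 2
            cutoff = np.quantile(power, 1 - keep_fraction)
            coeffs[power < cutoff] = 0.0
            return np.fft.irfft(coeffs, n=len(spectrum))

        # synthetic Lorentzian doublet on a flat baseline, Poisson noise
        x = np.linspace(-10, 10, 1024)
        clean = 1e5 - 4e3 / (1 + (x - 2)**2) - 4e3 / (1 + (x + 2)**2)
        noisy = np.random.default_rng(4).poisson(clean).astype(float)
        print(np.std(noisy - clean), np.std(periodogram_filter(noisy) - clean))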

  7. Study Designs and Statistical Analyses for Biomarker Research

    PubMed Central

    Gosho, Masahiko; Nagashima, Kengo; Sato, Yasunori

    2012-01-01

    Biomarkers are becoming increasingly important for streamlining drug discovery and development. In addition, biomarkers are widely expected to be used as a tool for disease diagnosis, personalized medication, and surrogate endpoints in clinical research. In this paper, we highlight several important aspects related to study design and statistical analysis for clinical research incorporating biomarkers. We describe the typical and current study designs for exploring, detecting, and utilizing biomarkers. Furthermore, we introduce statistical issues such as confounding and multiplicity for statistical tests in biomarker research. PMID:23012528

  8. Meteor trail footprint statistics

    NASA Astrophysics Data System (ADS)

    Mui, S. Y.; Ellicott, R. C.

    Footprint statistics derived from field-test data are presented. The statistics are the probability that two receivers will lie in the same footprint. The dependence of the footprint statistics on the transmitter range, link orientation, and antenna polarization is examined. Empirical expressions for the footprint statistics are presented. The need to distinguish the instantaneous footprint, which is the area illuminated at a particular instant, from the composite footprint, which is the total area illuminated during the lifetime of the meteor trail, is explained. The statistics for the instantaneous and composite footprints have been found to be similar. The only significant difference lies in the parameter that represents the probability of two colocated receivers being in the same footprint. The composite footprint statistics can be used to calculate the space diversity gain of a multiple-receiver system. The instantaneous footprint statistics are useful in the evaluation of the interference probability in a network of meteor burst communication nodes.

  9. Applying Statistical Process Control to Clinical Data: An Illustration.

    ERIC Educational Resources Information Center

    Pfadt, Al; And Others

    1992-01-01

    Principles of statistical process control are applied to a clinical setting through the use of control charts to detect changes, as part of treatment planning and clinical decision-making processes. The logic of control chart analysis is derived from principles of statistical inference. Sample charts offer examples of evaluating baselines and…
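
    A minimal individuals-chart sketch in the spirit of the article, using the sample standard deviation for the limits rather than the conventional moving-range estimate; the baseline values are invented.

        import numpy as np

        def shewhart_limits(baseline):
            """Center line and 3-sigma control limits from a baseline phase
            (simplified: sample SD instead of the moving-range estimate)."""
            center = np.mean(baseline)
            sigma = np.std(baseline, ddof=1)
            return center - 3 * sigma, center, center + 3 * sigma

        baseline = [14, 12, 15, 13, 14, 16, 13, 12, 15, 14]  # illustrative
        lcl, cl, ucl = shewhart_limits(baseline)
        new_obs = 22
        print(f"LCL={lcl:.1f} CL={cl:.1f} UCL={ucl:.1f}, "
              f"signal={new_obs > ucl}")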

  10. Interpreting the gamma statistic in phylogenetic diversification rate studies: a rate decrease does not necessarily indicate an early burst.

    PubMed

    Fordyce, James A

    2010-07-23

    Phylogenetic hypotheses are increasingly being used to elucidate historical patterns of diversification-rate variation. Hypothesis testing is often conducted by comparing the observed vector of branching times to a null, pure-birth expectation. A popular method for inferring a decrease in speciation rate, which might suggest an early burst of diversification followed by a decrease in diversification rate, is the gamma statistic. Using simulations under varying conditions, I examine the sensitivity of gamma to the distribution of the most recent branching times. Using an exploratory data analysis tool for lineage-through-time plots, tree deviation, I identified trees with a significant gamma statistic that do not appear to have the characteristic early accumulation of lineages consistent with an early, rapid rate of cladogenesis. I further investigated the sensitivity of the gamma statistic to recent diversification by examining the consequences of failing to simulate the full time interval following the most recent cladogenic event. The power of gamma to detect a rate decrease at varying times was assessed for simulated trees with an initial high rate of diversification followed by a relatively low rate. The gamma statistic is extraordinarily sensitive to recent diversification rates and does not necessarily detect early bursts of diversification. This was true for trees of various sizes and degrees of completeness of taxon sampling. The gamma statistic had greater power to detect recent decreases in diversification rate than early bursts of diversification. Caution should be exercised when interpreting the gamma statistic as an indication of early, rapid diversification.
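
    The gamma statistic itself is short to compute from internode intervals; the sketch below follows the Pybus and Harvey (2000) formula, with invented intervals whose short-near-the-root pattern yields the negative gamma associated with an early burst.

        import numpy as np

        def gamma_statistic(intervals):
            """Pybus & Harvey (2000) gamma from internode intervals g_k,
            where g_k is the time during which exactly k lineages existed
            (k = 2..n, so a tree with n tips supplies n-1 intervals)."""
            n = len(intervals) + 1
            ks = np.arange(2, n + 1)
            weighted = ks * np.asarray(intervals)   # k * g_k
            T = weighted.sum()
            partial = np.cumsum(weighted)[:-1]      # partial sums, i = 2..n-1
            return (partial.mean() - T / 2) / (T * np.sqrt(1 / (12 * (n - 2))))

        # short intervals near the root, long ones near the tips: a rate
        # slowdown (early burst), so gamma should come out negative
        intervals = [0.05, 0.1, 0.2, 0.5, 1.0, 2.0, 2.5, 3.0]
        print(gamma_statistic(intervals))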

  11. On the Statistical Properties of Cospectra

    NASA Astrophysics Data System (ADS)

    Huppenkothen, D.; Bachetti, M.

    2018-05-01

    In recent years, the cross-spectrum has received considerable attention as a means of characterizing the variability of astronomical sources as a function of wavelength. The cospectrum has only recently been understood as a means of mitigating instrumental effects dependent on temporal frequency in astronomical detectors, as well as a method of characterizing the coherent variability in two wavelength ranges on different timescales. In this paper, we lay out the statistical foundations of the cospectrum, starting with the simplest case of detecting a periodic signal in the presence of white noise, under the assumption that the same source is observed simultaneously in independent detectors in the same energy range. This case is especially relevant for detecting faint X-ray pulsars in detectors heavily affected by instrumental effects, including NuSTAR, Astrosat, and IXPE, which allow for even sampling and where the cospectrum can act as an effective way to mitigate dead time. We show that the statistical distributions of both single and averaged cospectra differ considerably from those for standard periodograms. While a single cospectrum follows a Laplace distribution exactly, averaged cospectra are approximated by a Gaussian distribution only for more than ∼30 averaged segments, dependent on the number of trials. We provide an instructive example of a quasi-periodic oscillation in NuSTAR and show that applying standard periodogram statistics leads to underestimated tail probabilities for period detection. We also demonstrate the application of these distributions to a NuSTAR observation of the X-ray pulsar Hercules X-1.
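
    The Laplace result for a single cospectrum is easy to check numerically: for two independent white-noise series, each cospectral bin is a sum of two Gaussian products, and its excess kurtosis should sit near 3 (Laplace) rather than 0 (Gaussian). The simulation below assumes pure noise, i.e. the null case with no shared signal.

        import numpy as np
        from scipy.stats import kurtosis

        rng = np.random.default_rng(5)
        n = 512
        vals = []
        for _ in range(2000):
            a = np.fft.rfft(rng.normal(size=n))   # detector 1, white noise
            b = np.fft.rfft(rng.normal(size=n))   # detector 2, independent
            cospec = (a * np.conj(b)).real        # cospectrum
            vals.append(cospec[1:-1])             # drop DC and Nyquist bins
        vals = np.concatenate(vals)
        print(kurtosis(vals))   # ~3 for Laplace, versus 0 for a Gaussian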

  12. Calculation of the detection limit in radiation measurements with systematic uncertainties

    NASA Astrophysics Data System (ADS)

    Kirkpatrick, J. M.; Russ, W.; Venkataraman, R.; Young, B. M.

    2015-06-01

    The detection limit (LD) or Minimum Detectable Activity (MDA) is an a priori evaluation of assay sensitivity intended to quantify the suitability of an instrument or measurement arrangement for the needs of a given application. Traditional approaches as pioneered by Currie rely on Gaussian approximations to yield simple, closed-form solutions, and neglect the effects of systematic uncertainties in the instrument calibration. These approximations are applicable over a wide range of applications, but are of limited use in low-count applications, when high confidence values are required, or when systematic uncertainties are significant. One proposed modification to the Currie formulation attempts to account for systematic uncertainties within a Gaussian framework. We have previously shown that this approach results in an approximation formula that works best only for small values of the relative systematic uncertainty, for which the modification of Currie's method is least necessary, and that it significantly overestimates the detection limit or gives infinite or otherwise non-physical results for larger systematic uncertainties, where such a correction would be most useful. We have developed an alternative approach for calculating detection limits based on realistic statistical modeling of the counting distributions, which accurately represents statistical and systematic uncertainties. Instead of a closed-form solution, numerical and iterative methods are used to evaluate the result. Accurate detection limits can be obtained by this method for the general case.
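
    For contrast with the paper's numerical approach, the traditional Gaussian closed form is the well-known Currie pair L_C = 2.33√B and L_D = 2.71 + 4.65√B (paired counting of sample and blank, α = β = 0.05); this is exactly the approximation that, per the abstract, breaks down for low counts or large systematic uncertainties.

        import numpy as np

        def currie_limits(background_counts):
            """Classic Currie critical level and detection limit, in counts,
            for paired counting with alpha = beta = 0.05."""
            b = background_counts
            l_c = 2.33 * np.sqrt(b)          # critical (decision) level
            l_d = 2.71 + 4.65 * np.sqrt(b)   # a priori detection limit
            return l_c, l_d

        print(currie_limits(100))   # roughly (23.3, 49.2) counts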

  13. Data processing of qualitative results from an interlaboratory comparison for the detection of "Flavescence dorée" phytoplasma: How the use of statistics can improve the reliability of the method validation process in plant pathology.

    PubMed

    Chabirand, Aude; Loiseau, Marianne; Renaudin, Isabelle; Poliakoff, Françoise

    2017-01-01

    A working group established in the framework of the EUPHRESCO European collaborative project aimed to compare and validate diagnostic protocols for the detection of "Flavescence dorée" (FD) phytoplasma in grapevines. Seven molecular protocols were compared in an interlaboratory test performance study in which each laboratory analyzed the same panel of samples, consisting of DNA extracts prepared by the organizing laboratory. The tested molecular methods consisted of universal and group-specific real-time and end-point nested PCR tests. Different statistical approaches were applied to this collaborative study. First, the standard statistical approach was used: samples known to be positive and samples known to be negative were analyzed, and the proportions of false-positive and false-negative results were reported to calculate diagnostic specificity and sensitivity, respectively. This approach was supplemented by the calculation of repeatability and reproducibility for qualitative methods based on the notions of accordance and concordance. Other, newer approaches were also implemented, based on the probability-of-detection model on the one hand and on Bayes' theorem on the other. These various statistical approaches are complementary and give consistent results. Their combination, and in particular the introduction of the new statistical approaches, gives overall information on the performance and limitations of the different methods, and is particularly useful for selecting the most appropriate detection scheme with regard to the prevalence of the pathogen. Three real-time PCR protocols (methods M4, M5 and M6, developed respectively by Hren (2007), by Pelletier (2009), and from patented oligonucleotides) achieved the highest levels of performance for FD phytoplasma detection. This paper also addresses the issue of indeterminate results and the identification of outlier results. The statistical tools presented in this paper and their
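
    The standard-approach quantities reduce to simple ratios, and Bayes' theorem then converts them into a predictive value at a given prevalence; the tallies below are invented, not the study's.

        def diagnostic_performance(tp, fn, tn, fp):
            """Diagnostic sensitivity and specificity from a panel of
            known-positive and known-negative samples."""
            sensitivity = tp / (tp + fn)
            specificity = tn / (tn + fp)
            return sensitivity, specificity

        # illustrative tallies pooled over laboratories for one protocol
        sens, spec = diagnostic_performance(tp=92, fn=8, tn=57, fp=3)

        # Bayes' theorem: positive predictive value at 5% prevalence
        prev = 0.05
        ppv = sens * prev / (sens * prev + (1 - spec) * (1 - prev))
        print(f"sensitivity={sens:.2f} specificity={spec:.2f} PPV={ppv:.2f}")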

  14. Statistics of Dark Matter Halos from Gravitational Lensing.

    PubMed

    Jain, B.; Van Waerbeke, L.

    2000-02-10

    We present a new approach to measure the mass function of dark matter halos and to discriminate models with differing values of Omega through weak gravitational lensing. We measure the distribution of peaks from simulated lensing surveys and show that the lensing signal due to dark matter halos can be detected for a wide range of peak heights. Even when the signal-to-noise ratio is well below the limit for detection of individual halos, projected halo statistics can be constrained for halo masses spanning galactic to cluster halos. The use of peak statistics relies on an analytical model of the noise due to the intrinsic ellipticities of source galaxies. The noise model has been shown to accurately describe simulated data for a variety of input ellipticity distributions. We show that the measured peak distribution has distinct signatures of gravitational lensing, and its non-Gaussian shape can be used to distinguish models with different values of Omega. The use of peak statistics is complementary to the measurement of field statistics, such as the ellipticity correlation function, and is possibly not susceptible to the same systematic errors.

  15. The Importance of Teaching Power in Statistical Hypothesis Testing

    ERIC Educational Resources Information Center

    Olinsky, Alan; Schumacher, Phyllis; Quinn, John

    2012-01-01

    In this paper, we discuss the importance of teaching power considerations in statistical hypothesis testing. Statistical power analysis determines the ability of a study to detect a meaningful effect size, where the effect size is the difference between the hypothesized value of the population parameter under the null hypothesis and the true value…
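
    As a concrete companion to the point about power, a normal-approximation power curve for a two-sided one-sample test takes only a few lines; the effect size and sample sizes here are arbitrary.

        import numpy as np
        from scipy.stats import norm

        def power_one_sample_z(effect_size, n, alpha=0.05):
            """Approximate power of a two-sided one-sample z test: the
            probability of rejecting H0 when the true standardized effect
            (difference over sigma) is effect_size."""
            z_crit = norm.ppf(1 - alpha / 2)
            shift = effect_size * np.sqrt(n)
            return norm.sf(z_crit - shift) + norm.cdf(-z_crit - shift)

        for n in (10, 30, 100):
            print(n, round(power_one_sample_z(0.5, n), 3))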

  16. Comparison of four different methods for detection of biofilm formation by uropathogens.

    PubMed

    Panda, Pragyan Swagatika; Chaudhary, Uma; Dube, Surya K

    2016-01-01

    Urinary tract infection (UTI) is one of the most common infectious diseases encountered in clinical practice. Emerging resistance of uropathogens to antimicrobial agents due to biofilm formation is a matter of concern when treating symptomatic UTI. However, studies comparing different methods for detection of biofilm formation by uropathogens are scarce. The aim was to compare four different methods for detection of biofilm formation by uropathogens, in a prospective observational study conducted in a tertiary care hospital. A total of 300 isolates from urinary samples were analyzed for biofilm formation by four methods: the tissue culture plate (TCP) method, the tube method (TM), the Congo Red Agar (CRA) method and the modified CRA (MCRA) method. The chi-square test was applied when two or more sets of variables were compared, with P < 0.05 considered statistically significant. Taking TCP as the gold-standard method for this study, the other statistical parameters were calculated relative to it. The rates of biofilm detection were 45.6% by TCP, 39.3% by TM, and 11% each by CRA and MCRA. The difference between TCP and CRA/MCRA was significant, but not that between TCP and TM. There was no difference in the rate of biofilm detection between CRA and MCRA for other isolates, but MCRA was superior to CRA for detection of staphylococcal biofilm formation. The TCP method is the ideal method for detection of bacterial biofilm formation by uropathogens; MCRA is superior to CRA only for detection of staphylococcal biofilm formation.
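
    The chi-square comparison of two detection rates is a 2x2 contingency test; the table below rebuilds approximate counts from the reported percentages (45.6% and 11% of 300 isolates), so treat it as an illustration rather than the study's analysis.

        from scipy.stats import chi2_contingency

        #               biofilm+  biofilm-
        table = [[137, 163],    # TCP: ~45.6% of 300
                 [ 33, 267]]    # CRA: ~11% of 300
        chi2, p, dof, expected = chi2_contingency(table)
        print(f"chi2={chi2:.1f}, p={p:.2g}")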

  17. Hyperspectral target detection using heavy-tailed distributions

    NASA Astrophysics Data System (ADS)

    Willis, Chris J.

    2009-09-01

    One promising approach to target detection in hyperspectral imagery exploits a statistical mixture model to represent scene content at a pixel level. The process then goes on to look for pixels which are rare, when judged against the model, and marks them as anomalies. It is assumed that military targets will themselves be rare and therefore likely to be detected amongst these anomalies. For the typical assumption of multivariate Gaussianity for the mixture components, the presence of the anomalous pixels within the training data will have a deleterious effect on the quality of the model. In particular, the derivation process itself is adversely affected by the attempt to accommodate the anomalies within the mixture components. This will bias the statistics of at least some of the components away from their true values and towards the anomalies. In many cases this will result in a reduction in the detection performance and an increased false alarm rate. This paper considers the use of heavy-tailed statistical distributions within the mixture model. Such distributions are better able to account for anomalies in the training data within the tails of their distributions, and the balance of the pixels within their central masses. This means that an improved model of the majority of the pixels in the scene may be produced, ultimately leading to a better anomaly detection result. The anomaly detection techniques are examined using both synthetic data and hyperspectral imagery with injected anomalous pixels. A range of results is presented for the baseline Gaussian mixture model and for models accommodating heavy-tailed distributions, for different parameterizations of the algorithms. These include scene understanding results, anomalous pixel maps at given significance levels and Receiver Operating Characteristic curves.
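
    As a baseline for the Gaussian case the paper critiques, the sketch below fits a Gaussian mixture to synthetic "pixels" and flags the lowest-likelihood ones as anomalies; swapping in heavy-tailed (e.g., Student-t) components, the paper's actual remedy, is not shown. It assumes scikit-learn is available.

        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(6)
        # synthetic scene: two background classes plus 20 anomalous spectra
        background = np.vstack([rng.normal(0, 1, (990, 5)),
                                rng.normal(3, 1, (990, 5))])
        anomalies = rng.normal(8, 0.5, (20, 5))
        pixels = np.vstack([background, anomalies])

        gmm = GaussianMixture(n_components=2, random_state=0).fit(pixels)
        scores = gmm.score_samples(pixels)       # per-pixel log-likelihood
        threshold = np.quantile(scores, 0.01)    # flag the rarest 1%
        print((scores[-20:] < threshold).mean()) # fraction of anomalies caught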

  18. A statistical study of whistler waves observed by Van Allen Probes (RBSP) and lightning detected by WWLLN

    NASA Astrophysics Data System (ADS)

    Zheng, Hao; Holzworth, Robert H.; Brundell, James B.; Jacobson, Abram R.; Wygant, John R.; Hospodarsky, George B.; Mozer, Forrest S.; Bonnell, John

    2016-03-01

    Lightning-generated whistler waves are electromagnetic plasma waves in the very low frequency (VLF) band, which play an important role in the dynamics of radiation belt particles. In this paper, we statistically analyze simultaneous waveform data from the Van Allen Probes (Radiation Belt Storm Probes, RBSP) and global lightning data from the World Wide Lightning Location Network (WWLLN). Data were obtained between July and September 2013 and between March and April 2014. For each day during these periods, we predicted the most probable 10 min for which each of the two RBSP satellites would be magnetically conjugate to lightning-producing regions. The prediction method uses integrated WWLLN stroke data for that day obtained during the three previous years. Using these predicted times for magnetic conjugacy to regions of lightning activity, we recorded high-time-resolution, burst-mode waveform data. Here we show that whistlers are observed by the satellites in more than 80% of the downloaded waveform data. About 22.9% of the whistlers observed by RBSP are one-to-one coincident with source lightning strokes detected by WWLLN. A further 40.1% of the whistlers are found to be one-to-one coincident with lightning if source regions are extended out to 2000 km from the satellites' footpoints. Lightning strokes with far-field radiated VLF energy larger than about 100 J are able to generate a detectable whistler wave in the inner magnetosphere. One-to-one coincidences between whistlers observed by RBSP and lightning strokes detected by WWLLN are clearly shown in the L shell range L = 1-3. Nose whistlers observed in July 2014 show that it may be possible to extend this coincidence to the region L ≥ 4.

  19. The color of sea level: Importance of spatial variations in spectral shape for assessing the significance of trends

    NASA Astrophysics Data System (ADS)

    Hughes, Chris W.; Williams, Simon D. P.

    2010-10-01

    We investigate spatial variations in the shape of the spectrum of sea level variability based on a homogeneously sampled 12 year gridded altimeter data set. We present a method of plotting spectral information as color, focusing on periods between 2 and 24 weeks, which shows that significant spatial variations in the spectral shape exist and contain useful dynamical information. Using the Bayesian Information Criterion, we determine that, typically, a fifth-order autoregressive model is needed to capture the structure in the spectrum. Using this model, we show that statistical errors in fitted local trends range between less than 1 and more than 5 times what would be calculated assuming "white" noise, and that the time needed to detect a 1 mm/yr trend ranges between about 5 years and many decades. For global mean sea level, the statistical error reduces to 0.1 mm/yr over 12 years, with only 2 years needed to detect a 1 mm/yr trend. We find significant regional differences in trend from the global mean. The patterns of these regional differences are indicative of a sea level trend dominated by dynamical ocean processes over this period.
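
    Order selection by the Bayesian Information Criterion can be sketched with statsmodels' AutoReg on a synthetic red-noise series (true order 2 here); this stands in for, and is much simpler than, the paper's per-grid-point fits to altimeter data.

        import numpy as np
        from statsmodels.tsa.ar_model import AutoReg

        rng = np.random.default_rng(7)
        x = np.zeros(624)                  # ~12 years of weekly samples
        for t in range(2, 624):            # AR(2) "red" noise
            x[t] = 0.8 * x[t - 1] - 0.2 * x[t - 2] + rng.normal()

        # fit AR(p) for a range of orders and keep the BIC-minimizing one
        bics = {p: AutoReg(x, lags=p).fit().bic for p in range(1, 9)}
        best = min(bics, key=bics.get)
        print(best, {p: round(b, 1) for p, b in bics.items()})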

  20. Detection methods for non-Gaussian gravitational wave stochastic backgrounds

    NASA Astrophysics Data System (ADS)

    Drasco, Steve; Flanagan, Éanna É.

    2003-04-01

    A gravitational wave stochastic background can be produced by a collection of independent gravitational wave events. There are two classes of such backgrounds, one for which the ratio of the average time between events to the average duration of an event is small (i.e., many events are on at once), and one for which the ratio is large. In the first case the signal is continuous, sounds something like a constant hiss, and has a Gaussian probability distribution. In the second case, the discontinuous or intermittent signal sounds something like popcorn popping, and is described by a non-Gaussian probability distribution. In this paper we address the issue of finding an optimal detection method for such a non-Gaussian background. As a first step, we examine the idealized situation in which the event durations are short compared to the detector sampling time, so that the time structure of the events cannot be resolved, and we assume white, Gaussian noise in two collocated, aligned detectors. For this situation we derive an appropriate version of the maximum likelihood detection statistic. We compare the performance of this statistic to that of the standard cross-correlation statistic both analytically and with Monte Carlo simulations. In general the maximum likelihood statistic performs better than the cross-correlation statistic when the stochastic background is sufficiently non-Gaussian, resulting in a gain factor in the minimum gravitational-wave energy density necessary for detection. This gain factor ranges roughly between 1 and 3, depending on the duty cycle of the background, for realistic observing times and signal strengths for both ground and space based detectors. The computational cost of the statistic, although significantly greater than that of the cross-correlation statistic, is not unreasonable. Before the statistic can be used in practice with real detector data, further work is required to generalize our analysis to accommodate separated, misaligned
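
    A minimal sketch of the cross-correlation statistic used as the paper's baseline comparison, applied to a synthetic intermittent background with a 1% duty cycle seen by two collocated detectors with independent white noise; the maximum likelihood statistic derived in the paper is not reproduced here.

        import numpy as np

        def cross_correlation_stat(h1, h2):
            """Standard cross-correlation statistic for two collocated,
            aligned detectors: the mean product of their outputs."""
            return np.mean(h1 * h2)

        rng = np.random.default_rng(8)
        n = 100_000
        burst_on = rng.random(n) < 0.01                # 1% duty cycle
        signal = 2.0 * rng.normal(size=n) * burst_on   # popcorn-like bursts
        h1 = signal + rng.normal(size=n)               # independent noise
        h2 = signal + rng.normal(size=n)
        print(cross_correlation_stat(h1, h2))          # ~0.04, above noise
        print(cross_correlation_stat(rng.normal(size=n),
                                     rng.normal(size=n)))  # null case, ~0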