On detection and assessment of statistical significance of Genomic Islands
Chatterjee, Raghunath; Chaudhuri, Keya; Chaudhuri, Probal
2008-01-01
Background Many of the available methods for detecting Genomic Islands (GIs) in prokaryotic genomes use markers such as transposons, proximal tRNAs, flanking repeats etc., or they use other supervised techniques requiring training datasets. Most of these methods are primarily based on the biases in GC content or codon and amino acid usage of the islands. However, these methods either do not use any formal statistical test of significance or use statistical tests for which the critical values and the P-values are not adequately justified. We propose a method, which is unsupervised in nature and uses Monte-Carlo statistical tests based on randomly selected segments of a chromosome. Such tests are supported by precise statistical distribution theory, and consequently, the resulting P-values are quite reliable for making the decision. Results Our algorithm (named Design-Island, an acronym for Detection of Statistically Significant Genomic Island) runs in two phases. Some 'putative GIs' are identified in the first phase, and those are refined into smaller segments containing horizontally acquired genes in the refinement phase. This method is applied to Salmonella typhi CT18 genome leading to the discovery of several new pathogenicity, antibiotic resistance and metabolic islands that were missed by earlier methods. Many of these islands contain mobile genetic elements like phage-mediated genes, transposons, integrase and IS elements confirming their horizontal acquirement. Conclusion The proposed method is based on statistical tests supported by precise distribution theory and reliable P-values along with a technique for visualizing statistically significant islands. The performance of our method is better than many other well known methods in terms of their sensitivity and accuracy, and in terms of specificity, it is comparable to other methods. PMID:18380895
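The Monte-Carlo test on randomly selected chromosome segments described in this abstract can be illustrated with a minimal sketch. It is not the Design-Island algorithm: the abstract does not name the test statistic, so GC-content deviation is assumed here, and the function names, the toy genome, and the 999-draw default are all hypothetical.

```python
import random


def gc_content(seq):
    """Fraction of G or C bases in a sequence."""
    return sum(base in "GC" for base in seq) / len(seq)


def monte_carlo_p_value(genome, start, length, n_draws=999, seed=0):
    """Monte-Carlo P-value for one window against random same-length segments.

    Assumed statistic: absolute deviation of the window's GC content from the
    genome-wide GC content (an illustrative choice, not taken from the paper).
    """
    rng = random.Random(seed)
    genome_gc = gc_content(genome)
    observed = abs(gc_content(genome[start:start + length]) - genome_gc)
    hits = 0
    for _ in range(n_draws):
        pos = rng.randrange(len(genome) - length + 1)
        deviation = abs(gc_content(genome[pos:pos + length]) - genome_gc)
        if deviation >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate strictly positive.
    return (hits + 1) / (n_draws + 1)


def make_toy_genome(seed=42):
    """Hypothetical test genome: random background with a GC-rich insert."""
    rng = random.Random(seed)
    background = "".join(rng.choice("ACGT") for _ in range(20000))
    island = "GC" * 250  # 500 bases, placed at position 5000
    return background[:5000] + island + background[5000:]


genome = make_toy_genome()
print(monte_carlo_p_value(genome, 5000, 500))   # window on the inserted island
print(monte_carlo_p_value(genome, 12000, 500))  # ordinary background window
```

The window covering the insert should receive a much smaller P-value than the background window, since very few randomly placed segments deviate as strongly from the genome-wide composition.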
Detecting Statistically Significant Communities of Triangle Motifs in Undirected Networks
2015-03-16
Perry et al. [6] by developing a statistical framework that supports the detection of triangle motif-based clusters in complex networks. The specific...triangle motif-based clustering. 2. Developed an algorithm for clustering undirected networks, where the triangle configuration was used as the basis...for forming clusters. 3. Developed a C++ implementation of the proposed clustering framework. 15. SUBJECT TERMS EOARD, Operations Research, Networks
ERIC Educational Resources Information Center
Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza
2014-01-01
This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
Lack of Statistical Significance
ERIC Educational Resources Information Center
Kehle, Thomas J.; Bray, Melissa A.; Chafouleas, Sandra M.; Kawano, Takuji
2007-01-01
Criticism has been leveled against the use of statistical significance testing (SST) in many disciplines. However, the field of school psychology has been largely devoid of critiques of SST. Inspection of the primary journals in school psychology indicated numerous examples of SST with nonrandom samples and/or samples of convenience. In this…
Statistical Significance Testing.
ERIC Educational Resources Information Center
McLean, James E., Ed.; Kaufman, Alan S., Ed.
1998-01-01
The controversy about the use or misuse of statistical significance testing has become the major methodological issue in educational research. This special issue contains three articles that explore the controversy, three commentaries on these articles, an overall response, and three rejoinders by the first three authors. They are: (1)…
de Ridder, Jeroen; Uren, Anthony; Kool, Jaap; Reinders, Marcel; Wessels, Lodewyk
2006-01-01
Retroviral insertional mutagenesis screens, which identify genes involved in tumor development in mice, have yielded a substantial number of retroviral integration sites, and this number is expected to grow substantially due to the introduction of high-throughput screening techniques. The data of various retroviral insertional mutagenesis screens are compiled in the publicly available Retroviral Tagged Cancer Gene Database (RTCGD). Integrally analyzing these screens for the presence of common insertion sites (CISs, i.e., regions in the genome that have been hit by viral insertions in multiple independent tumors significantly more than expected by chance) requires an approach that corrects for the increased probability of finding false CISs as the amount of available data increases. Moreover, significance estimates of CISs should be established taking into account both the noise, arising from the random nature of the insertion process, as well as the bias, stemming from preferential insertion sites present in the genome and the data retrieval methodology. We introduce a framework, the kernel convolution (KC) framework, to find CISs in a noisy and biased environment using a predefined significance level while controlling the family-wise error (FWE) (the probability of detecting false CISs). Where previous methods use one, two, or three predetermined fixed scales, our method is capable of operating at any biologically relevant scale. This creates the possibility to analyze the CISs in a scale space by varying the width of the CISs, providing new insights in the behavior of CISs across multiple scales. Our method also features the possibility of including models for background bias. Using simulated data, we evaluate the KC framework using three kernel functions, the Gaussian, triangular, and rectangular kernel function. We applied the Gaussian KC to the data from the combined set of screens in the RTCGD and found that 53% of the CISs do not reach the significance
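The idea of smoothing insertion positions with a kernel and controlling the family-wise error can be sketched as follows. This is an illustration under stated assumptions, not the authors' KC framework: the null model is uniform random insertion (the background-bias models mentioned in the abstract are not included), the threshold is taken from the distribution of the maximum smoothed value over simulated null data sets, and all names and parameter values are hypothetical.

```python
import math
import random


def kernel_density(insertions, grid, width):
    """Sum of Gaussian kernels of the given width, evaluated at each grid point."""
    return [
        sum(math.exp(-0.5 * ((g - x) / width) ** 2) for x in insertions)
        for g in grid
    ]


def fwe_threshold(n_insertions, genome_length, grid, width,
                  alpha=0.05, n_perm=200, seed=0):
    """Threshold exceeded anywhere on the grid in roughly a fraction alpha of
    null data sets (assumed null: insertions uniform over the genome)."""
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_perm):
        null = [rng.uniform(0, genome_length) for _ in range(n_insertions)]
        maxima.append(max(kernel_density(null, grid, width)))
    maxima.sort()
    return maxima[min(n_perm - 1, int((1 - alpha) * n_perm))]


def significant_positions(insertions, genome_length, width, step=5.0, **kwargs):
    """Grid positions whose smoothed insertion count exceeds the threshold."""
    grid = [i * step for i in range(int(genome_length / step) + 1)]
    density = kernel_density(insertions, grid, width)
    threshold = fwe_threshold(len(insertions), genome_length, grid, width, **kwargs)
    return [g for g, d in zip(grid, density) if d > threshold]


# Hypothetical data: 30 scattered insertions plus a tight cluster near 500.
rng = random.Random(1)
data = [rng.uniform(0, 1000) for _ in range(30)]
data += [495 + 0.7 * i for i in range(15)]
print(significant_positions(data, genome_length=1000, width=10.0))
```

Changing `width` corresponds to examining candidate CISs at a different scale; the threshold has to be recomputed for each width because the null maximum depends on it.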
Statistically significant relational data mining
Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.
2014-02-01
This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.
NASA Astrophysics Data System (ADS)
Baluev, Roman V.
2013-11-01
We consider the 'multifrequency' periodogram, in which the putative signal is modelled as a sum of two or more sinusoidal harmonics with independent frequencies. It is useful in cases when the data may contain several periodic components, especially when their interaction with each other and with the data sampling patterns might produce misleading results. Although the multifrequency statistic itself was constructed earlier, for example by G. Foster in his CLEANest algorithm, its probabilistic properties (the detection significance levels) are still poorly known and much of what is deemed known is not rigorous. These detection levels are nonetheless important for data analysis. We argue that to prove the simultaneous existence of all n components revealed in a multiperiodic variation, it is mandatory to apply at least 2n - 1 significance tests, among which most involve various multifrequency statistics, and only n tests are single-frequency ones. The main result of this paper is an analytic estimation of the statistical significance of the frequency tuples that the multifrequency periodogram can reveal. Using the theory of extreme values of random fields (the generalized Rice method), we find a useful approximation to the relevant false alarm probability. For the double-frequency periodogram, this approximation is given by the elementary formula (π/16) W^2 e^{-z} z^2, where W denotes the normalized width of the settled frequency range, and z is the observed periodogram maximum. We carried out intensive Monte Carlo simulations to show that the practical quality of this approximation is satisfactory. A similar analytic expression for the general multifrequency periodogram is also given, although with less numerical verification.
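The double-frequency false alarm probability quoted in this abstract is simple enough to evaluate directly. The sketch below assumes the formula reads (π/16) W^2 e^{-z} z^2, with W the normalized frequency-range width and z the observed periodogram maximum; the function name and the clipping at 1 are additions made here for convenience, and the input values are hypothetical.

```python
import math


def double_frequency_fap(z, w):
    """Approximate false alarm probability, (pi/16) * W^2 * exp(-z) * z^2.

    The raw expression can exceed 1 for small z, where it is no longer a
    meaningful probability, so the result is clipped to 1 (an assumption of
    this sketch, not part of the quoted formula).
    """
    return min(1.0, (math.pi / 16.0) * w ** 2 * math.exp(-z) * z ** 2)


# Hypothetical inputs: frequency range of normalized width 100, peaks of
# increasing height.
for z in (15.0, 20.0, 25.0):
    print(z, double_frequency_fap(z, w=100.0))
```

Because z^2 e^{-z} decreases for z > 2, higher periodogram peaks always map to smaller false alarm probabilities in the regime where the approximation is useful.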
Statistical significance of the gallium anomaly
Giunti, Carlo; Laveder, Marco
2011-06-15
We calculate the statistical significance of the anomalous deficit of electron neutrinos measured in the radioactive source experiments of the GALLEX and SAGE solar neutrino detectors, taking into account the uncertainty of the detection cross section. We found that the statistical significance of the anomaly is ≈3.0σ. A fit of the data in terms of neutrino oscillations favors at ≈2.7σ short-baseline electron neutrino disappearance with respect to the null hypothesis of no oscillations.
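Significances quoted in units of σ can be translated into P-values once a convention is fixed. The sketch below assumes the common two-sided Gaussian convention; the abstract does not state which convention the authors used, and the function name is hypothetical.

```python
import math


def sigma_to_two_sided_p(n_sigma):
    """Two-sided Gaussian tail probability for a deviation of n_sigma."""
    return math.erfc(n_sigma / math.sqrt(2.0))


for n_sigma in (2.7, 3.0):
    print(n_sigma, sigma_to_two_sided_p(n_sigma))
```

Under a one-sided convention the values would be half as large, so the convention should always be reported alongside the number of σ.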
Statistical significance versus clinical relevance.
van Rijn, Marieke H C; Bech, Anneke; Bouyer, Jean; van den Brand, Jan A J G
2017-04-01
In March this year, the American Statistical Association (ASA) posted a statement on the correct use of P-values, in response to a growing concern that the P-value is commonly misused and misinterpreted. We aim to translate these warnings given by the ASA into a language more easily understood by clinicians and researchers without a deep background in statistics. Moreover, we intend to illustrate the limitations of P-values, even when used and interpreted correctly, and bring more attention to the clinical relevance of study findings using two recently reported studies as examples. We argue that P-values are often misinterpreted. A common mistake is saying that P < 0.05 means that the null hypothesis is false, and P ≥0.05 means that the null hypothesis is true. The correct interpretation of a P-value of 0.05 is that if the null hypothesis were indeed true, a similar or more extreme result would occur 5% of the times upon repeating the study in a similar sample. In other words, the P-value informs about the likelihood of the data given the null hypothesis and not the other way around. A possible alternative related to the P-value is the confidence interval (CI). It provides more information on the magnitude of an effect and the imprecision with which that effect was estimated. However, there is no magic bullet to replace P-values and stop erroneous interpretation of scientific results. Scientists and readers alike should make themselves familiar with the correct, nuanced interpretation of statistical tests, P-values and CIs. © The Author 2017. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.
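The interpretation given in this abstract, that under a true null hypothesis a result with P < 0.05 turns up in about 5% of repeated studies, can be checked with a small simulation. This is a sketch under assumptions chosen here: two groups drawn from the same standard normal distribution, a two-sample z-test with known unit variance, and hypothetical function names and sample sizes.

```python
import math
import random


def two_sided_p(z):
    """Two-sided P-value of a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def false_positive_rate(n_studies=5000, n=30, alpha=0.05, seed=1):
    """Fraction of simulated null studies that reach P < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        a = [rng.gauss(0.0, 1.0) for _ in range(n)]
        b = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Difference of means has variance 2/n when both groups have unit variance.
        z = (sum(a) / n - sum(b) / n) / math.sqrt(2.0 / n)
        if two_sided_p(z) < alpha:
            hits += 1
    return hits / n_studies


print(false_positive_rate())
```

The printed fraction should lie close to 0.05. It says nothing about the probability that the null hypothesis is true, which is exactly the misreading the abstract warns against.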
Common pitfalls in statistical analysis: Clinical versus statistical significance
Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc
2015-01-01
In clinical research, study results that are statistically significant are often interpreted as being clinically important. While statistical significance indicates the reliability of the study results, clinical significance reflects their impact on clinical practice. The third article in this series exploring pitfalls in statistical analysis clarifies the importance of differentiating between statistical significance and clinical significance. PMID:26229754
Significant results: statistical or clinical?
2016-01-01
The null hypothesis significance test method is popular in biological and medical research. Many researchers use this method without an exact understanding of it, although it has both merits and shortcomings. Readers will learn its shortcomings, as well as several complementary or alternative methods, such as the estimated effect size and the confidence interval. PMID:27066201
Statistical Significance of Threading Scores
Fayyaz Movaghar, Afshin; Launay, Guillaume; Schbath, Sophie; Gibrat, Jean-François
2012-01-01
Abstract We present a general method for assessing threading score significance. The threading score of a protein sequence, threaded onto a given structure, should be compared with the threading score distribution of a random amino-acid sequence of the same length, threaded onto the same structure; small p-values point to significantly high scores. We claim that, due to general protein contact map properties, this reference distribution is a Weibull extreme value distribution whose parameters depend on the threading method, the structure, the length of the query and the random sequence simulation model used. These parameters can be estimated off-line with simulated sequence samples, for different sequence lengths. They can further be interpolated at the exact length of a query, enabling the quick computation of the p-value. PMID:22149633
de Ronde, Jorma J; Klijn, Christiaan; Velds, Arno; Holstege, Henne; Reinders, Marcel Jt; Jonkers, Jos; Wessels, Lodewyk Fa
2010-11-11
Most approaches used to find recurrent or differential DNA Copy Number Alterations (CNA) in array Comparative Genomic Hybridization (aCGH) data from groups of tumour samples depend on the discretization of the aCGH data to gain, loss or no-change states. This causes loss of valuable biological information in tumour samples, which are frequently heterogeneous. We have previously developed an algorithm, KC-SMART, that bases its estimate of the magnitude of the CNA at a given genomic location on kernel convolution (Klijn et al., 2008). This accounts for the intensity of the probe signal, its local genomic environment and the signal distribution across multiple samples. Here we extend the approach to allow comparative analyses of two groups of samples and introduce the R implementation of these two approaches. The comparative module allows for a supervised analysis to be performed, to enable the identification of regions that are differentially aberrated between two user-defined classes. We analyzed data from a series of B- and T-cell lymphomas and were able to retrieve all positive control regions (VDJ regions) in addition to a number of new regions. A t-test employing segmented data, that we implemented, was also able to locate all the positive control regions and a number of new regions but these regions were highly fragmented. KC-SMARTR offers recurrent CNA and class specific CNA detection, at different genomic scales, in a single package without the need for additional segmentation. It is memory efficient and runs on a wide range of machines. Most importantly, it does not rely on data discretization and therefore maximally exploits the biological information in the aCGH data. The program is freely available from the Bioconductor website http://www.bioconductor.org/ under the terms of the GNU General Public License.
Finding Statistically Significant Communities in Networks
Lancichinetti, Andrea; Radicchi, Filippo; Ramasco, José J.; Fortunato, Santo
2011-01-01
Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a great need for multi-purpose techniques able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks while accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure of partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method performs comparably to the best existing algorithms on artificial benchmark graphs. Several applications on real networks are shown as well. OSLOM is implemented in freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks. PMID:21559480
Social significance of community structure: Statistical view
NASA Astrophysics Data System (ADS)
Li, Hui-Jia; Daniels, Jasmine J.
2015-01-01
Community structure analysis is a powerful tool for social networks that can simplify their topological and functional analysis considerably. However, since community detection methods have random factors and real social networks obtained from complex systems always contain error edges, evaluating the significance of a partitioned community structure is an urgent and important question. In this paper, integrating the specific characteristics of real society, we present a framework to analyze the significance of a social community. The dynamics of social interactions are modeled by identifying social leaders and corresponding hierarchical structures. Instead of a direct comparison with the average outcome of a random model, we compute the similarity of a given node with the leader by the number of common neighbors. To determine the membership vector, an efficient community detection algorithm is proposed based on the position of the nodes and their corresponding leaders. Then, using a log-likelihood score, the tightness of the community can be derived. Based on the distribution of community tightness, we establish a connection between p-value theory and network analysis, and then we obtain a significance measure of statistical form. Finally, the framework is applied to both benchmark networks and real social networks. Experimental results show that our work can be used in many fields, such as determining the optimal number of communities, analyzing the social significance of a given community, comparing the performance among various algorithms, etc.
Harris, Joshua D; Brand, Jefferson C; Cote, Mark P; Faucett, Scott C; Dhawan, Aman
2017-06-01
Patient-reported outcomes (PROs) are increasingly being used in today's rapidly evolving health care environment. The value of care provision emphasizes the highest quality of care at the lowest cost. Quality is in the eye of the beholder, with different stakeholders prioritizing different components of the value equation. At the center of the discussion are the patients and their quantification of outcome via PROs. There are hundreds of different PRO questionnaires that may ascertain an individual's overall general health, quality of life, activity level, or determine a body part-, joint-, or disease-specific outcome. As providers and patients increasingly measure outcomes, there exists greater potential to identify significant differences across time points due to an intervention. In other words, if you compare groups enough, you are bound to eventually detect a significant difference. However, the characterization of significance is not purely dichotomous, as a statistically significant outcome may not be clinically relevant. Statistical significance is the direct result of a mathematical equation, irrelevant to the patient experience. In clinical research, despite detecting statistically significant pre- and post-treatment differences, patients may or may not be able to perceive those differences. Thresholds exist to delineate whether those differences are clinically important or relevant to patients. PROs are unique, with distinct parameters of clinical importance for each outcome score. This review highlights the most common PROs in clinical research and discusses the salient pearls and pitfalls. In particular, it stresses the difference between statistical and clinical relevance and the concepts of minimal clinically important difference and patient acceptable symptom state. Researchers and clinicians should consider clinical importance in addition to statistical significance when interpreting and reporting investigation results. Copyright © 2017
The insignificance of statistical significance testing
Johnson, Douglas H.
1999-01-01
Despite their use in scientific journals such as The Journal of Wildlife Management, statistical hypothesis tests add very little value to the products of research. Indeed, they frequently confuse the interpretation of data. This paper describes how statistical hypothesis tests are often viewed, and then contrasts that interpretation with the correct one. I discuss the arbitrariness of P-values, conclusions that the null hypothesis is true, power analysis, and distinctions between statistical and biological significance. Statistical hypothesis testing, in which the null hypothesis about the properties of a population is almost always known a priori to be false, is contrasted with scientific hypothesis testing, which examines a credible null hypothesis about phenomena in nature. More meaningful alternatives are briefly outlined, including estimation and confidence intervals for determining the importance of factors, decision theory for guiding actions in the face of uncertainty, and Bayesian approaches to hypothesis testing and other statistical practices.
Statistical Significance vs. Practical Significance: An Exploration through Health Education
ERIC Educational Resources Information Center
Rosen, Brittany L.; DeMaria, Andrea L.
2012-01-01
The purpose of this paper is to examine the differences between statistical and practical significance, including strengths and criticisms of both methods, as well as provide information surrounding the application of various effect sizes and confidence intervals within health education research. Provided are recommendations, explanations and…
Understanding Statistical Significance: A Conceptual History.
ERIC Educational Resources Information Center
Little, Joseph
2001-01-01
Considers how if literacy is envisioned as a sort of competence in a set of social and intellectual practices, then scientific literacy must encompass the realization that "statistical significance," the cardinal arbiter of social scientific knowledge, was not born out of an immanent logic of mathematics but socially constructed and…
Tests of Statistical Significance Made Sound
ERIC Educational Resources Information Center
Haig, Brian D.
2017-01-01
This article considers the nature and place of tests of statistical significance (ToSS) in science, with particular reference to psychology. Despite the enormous amount of attention given to this topic, psychology's understanding of ToSS remains deficient. The major problem stems from a widespread and uncritical acceptance of null hypothesis…
Determining the Statistical Significance of Relative Weights
ERIC Educational Resources Information Center
Tonidandel, Scott; LeBreton, James M.; Johnson, Jeff W.
2009-01-01
Relative weight analysis is a procedure for estimating the relative importance of correlated predictors in a regression equation. Because the sampling distribution of relative weights is unknown, researchers using relative weight analysis are unable to make judgments regarding the statistical significance of the relative weights. J. W. Johnson…
Statistical significance testing and clinical trials.
Krause, Merton S
2011-09-01
The efficacy of treatments is better expressed for clinical purposes in terms of these treatments' outcome distributions and their overlapping rather than in terms of the statistical significance of these distributions' mean differences, because clinical practice is primarily concerned with the outcome of each individual client rather than with the mean of the variety of outcomes in any group of clients. Reports of the obtained outcome distributions for the comparison groups of all competently designed and executed randomized clinical trials should be publicly available no matter what the statistical significance of the mean differences among these groups, because all of these studies' outcome distributions provide clinically useful information about the efficacy of the treatments compared.
Systematic identification of statistically significant network measures
NASA Astrophysics Data System (ADS)
Ziv, Etay; Koytcheff, Robin; Middendorf, Manuel; Wiggins, Chris
2005-01-01
We present a graph embedding space (i.e., a set of measures on graphs) for performing statistical analyses of networks. Key improvements over existing approaches include discovery of “motif hubs” (multiple overlapping significant subgraphs), computational efficiency relative to subgraph census, and flexibility (the method is easily generalizable to weighted and signed graphs). The embedding space is based on scalars, functionals of the adjacency matrix representing the network. Scalars are global, involving all nodes; although they can be related to subgraph enumeration, there is not a one-to-one mapping between scalars and subgraphs. Improvements in network randomization and significance testing—we learn the distribution rather than assuming Gaussianity—are also presented. The resulting algorithm establishes a systematic approach to the identification of the most significant scalars and suggests machine-learning techniques for network classification.
Detection of significant protein coevolution.
Ochoa, David; Juan, David; Valencia, Alfonso; Pazos, Florencio
2015-07-01
The evolution of proteins cannot be fully understood without taking into account the coevolutionary linkages entangling them. From a practical point of view, coevolution between protein families has been used as a way of detecting protein interactions and functional relationships from genomic information. The most common approach to inferring protein coevolution involves the quantification of phylogenetic tree similarity using a family of methodologies termed mirrortree. In spite of their success, a fundamental problem of these approaches is the lack of an adequate statistical framework to assess the significance of a given coevolutionary score (tree similarity). As a consequence, a number of ad hoc filters and arbitrary thresholds are required in an attempt to obtain a final set of confident coevolutionary signals. In this work, we developed a method for associating confidence estimators (P values) to the tree-similarity scores, using a null model specifically designed for the tree comparison problem. We show how this approach largely improves the quality and coverage (number of pairs that can be evaluated) of the detected coevolution in all the stages of the mirrortree workflow, independently of the starting genomic information. This not only leads to a better understanding of protein coevolution and its biological implications, but also makes it possible to obtain a highly reliable and comprehensive network of predicted interactions, as well as information on the substructure of macromolecular complexes using only genomic information. The software and datasets used in this work are freely available at: http://csbg.cnb.csic.es/pMT/. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved.
Assessing statistical significance in causal graphs
2012-01-01
Background Causal graphs are an increasingly popular tool for the analysis of biological datasets. In particular, signed causal graphs--directed graphs whose edges additionally have a sign denoting upregulation or downregulation--can be used to model regulatory networks within a cell. Such models allow prediction of downstream effects of regulation of biological entities; conversely, they also enable inference of causative agents behind observed expression changes. However, due to their complex nature, signed causal graph models present special challenges with respect to assessing statistical significance. In this paper we frame and solve two fundamental computational problems that arise in practice when computing appropriate null distributions for hypothesis testing. Results First, we show how to compute a p-value for agreement between observed and model-predicted classifications of gene transcripts as upregulated, downregulated, or neither. Specifically, how likely are the classifications to agree to the same extent under the null distribution of the observed classification being randomized? This problem, which we call "Ternary Dot Product Distribution" owing to its mathematical form, can be viewed as a generalization of Fisher's exact test to ternary variables. We present two computationally efficient algorithms for computing the Ternary Dot Product Distribution and investigate its combinatorial structure analytically and numerically to establish computational complexity bounds. Second, we develop an algorithm for efficiently performing random sampling of causal graphs. This enables p-value computation under a different, equally important null distribution obtained by randomizing the graph topology but keeping fixed its basic structure: connectedness and the positive and negative in- and out-degrees of each vertex. We provide an algorithm for sampling a graph from this distribution uniformly at random. We also highlight theoretical challenges unique to signed
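The first question posed in this abstract, how likely the observed and predicted ternary classifications are to agree to the same extent when the observed classification is randomized, can be approximated by brute-force permutation. The sketch below is a Monte Carlo stand-in, not the exact Ternary Dot Product Distribution algorithms of the paper; it assumes labels encoded as +1 (upregulated), -1 (downregulated) and 0 (neither), and all names are hypothetical.

```python
import random


def agreement_score(predicted, observed):
    """Dot product of two ternary vectors: +1 per agreement, -1 per conflict."""
    return sum(p * o for p, o in zip(predicted, observed))


def permutation_p_value(predicted, observed, n_perm=999, seed=0):
    """Monte Carlo P-value for the score when observed labels are shuffled.

    Shuffling keeps the counts of +1, -1 and 0 in the observed vector fixed,
    which is the randomization assumed in this sketch.
    """
    rng = random.Random(seed)
    actual = agreement_score(predicted, observed)
    shuffled = list(observed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if agreement_score(predicted, shuffled) >= actual:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Hypothetical example: 40 transcripts, model predictions matched exactly.
predicted = [1] * 10 + [-1] * 10 + [0] * 20
observed = list(predicted)
print(permutation_p_value(predicted, observed))
```

An exact computation, as pursued in the paper, avoids the resolution limit of 1/(n_perm + 1) that any permutation estimate has.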
Statistical Significance of Clustering using Soft Thresholding
Huang, Hanwen; Liu, Yufeng; Yuan, Ming; Marron, J. S.
2015-01-01
Clustering methods have led to a number of important discoveries in bioinformatics and beyond. A major challenge in their use is determining which clusters represent important underlying structure, as opposed to spurious sampling artifacts. This challenge is especially serious, and very few methods are available, when the data are very high in dimension. Statistical Significance of Clustering (SigClust) is a recently developed cluster evaluation tool for high dimensional low sample size data. An important component of the SigClust approach is the very definition of a single cluster as a subset of data sampled from a multivariate Gaussian distribution. The implementation of SigClust requires the estimation of the eigenvalues of the covariance matrix for the null multivariate Gaussian distribution. We show that the original eigenvalue estimation can lead to a test that suffers from severe inflation of type-I error, in the important case where there are a few very large eigenvalues. This paper addresses this critical challenge using a novel likelihood based soft thresholding approach to estimate these eigenvalues, which leads to a much improved SigClust. Major improvements in SigClust performance are shown by both mathematical analysis, based on the new notion of Theoretical Cluster Index, and extensive simulation studies. Applications to some cancer genomic data further demonstrate the usefulness of these improvements. PMID:26755893
Significant Statistics: Viewed with a Contextual Lens
ERIC Educational Resources Information Center
Tait-McCutcheon, Sandi
2010-01-01
This paper examines the pedagogical and organisational changes three lead teachers made to their statistics teaching and learning programs. The lead teachers posed the research question: What would the effect of contextually integrating statistical investigations and literacies into other curriculum areas be on student achievement? By finding the…
[Significance of medical statistics in insurance medicine].
Becher, J
2001-03-01
Knowledge of medical statistics is of great benefit to every insurance medical officer, as it facilitates communication with actuaries, allows officers to make their own calculations, and is the basis for correctly interpreting medical journals. Only about 20% of original work in medicine today is published without statistics or with only descriptive statistics, and this share is falling. The reader of medical publications should be in a position to make a critical analysis of the methodology and content, since one cannot always rely on the conclusions drawn by the authors: statistical errors appear very frequently in medical publications. Due to the specific methodological features involved, the assessment of meta-analyses demands special attention. The number of published meta-analyses has risen 40-fold over the last ten years. Important examples of the practical use of statistical methods in insurance medicine include estimating extramortality from published survival analyses and evaluating diagnostic test results. The purpose of this article is to highlight statistical problems and issues of relevance to insurance medicine and to establish the bases for understanding them.
Geometry of statistical target detection
NASA Astrophysics Data System (ADS)
Basener, William F.; Allen, Brian; Bretney, Kristen
2017-01-01
This paper presents an investigation into the underlying geometry and performance of various statistical target detection algorithms for hyperspectral imagery, presents results from algorithm testing, and investigates general trends and observable principles for understanding performance. Over the variety of detection algorithms, there is no universally best performing algorithm. In our test, often top performing algorithms on one class of targets obtain mediocre results on another class of targets. However, there are two clear trends: quadratic detectors such as ACE generally performed better than linear ones, especially for subpixel targets (our top 15 scoring algorithms were quadratic detectors), and using anomaly detection to prescreen image spectra improved the performance of the quadratic detectors (8 of our top 9 scoring algorithms used anomaly prescreening). We also demonstrate that simple combinations of detection algorithms can outperform single algorithms in practice. In our derivation of detection algorithms, we provide exposition on the underlying mathematical geometry of the algorithms. That geometry is then used to investigate differences in algorithm performance. Tests are conducted using imagery and targets freely available online. The imagery was acquired over Cooke City, Montana, a small town near Yellowstone National Park, using the HyMap V/NIR/SWIR sensor with 126 spectral bands. There are three vehicle and four fabric targets located in the town and surrounding area.
Statistical methodology for pathogen detection.
Ogliari, Paulo José; de Andrade, Dalton Francisco; Pacheco, Juliano Anderson; Franchin, Paulo Rogério; Batista, Cleide Rosana Vieira
2007-08-01
The main goal of the present study was to discuss the application of the McNemar test to the comparison of proportions in dependent samples. Data were analyzed from studies conducted to verify the suitability of replacing a conventional method with a new one for identifying the presence of Salmonella. It is shown that, in most situations, the McNemar test does not provide all the elements required by the microbiologist to make a final decision and that appropriate functions of the proportions need to be considered. Sample sizes suitable to guarantee a test with a high power in the detection of significant differences regarding the problem studied are obtained by simulation. Examples of functions that are of great value to the microbiologist are presented.
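For readers unfamiliar with the McNemar test discussed in this abstract, the following is a minimal sketch of its exact (binomial) form for paired detection results. The function name `mcnemar_exact` and the example counts are hypothetical and are not taken from the study; the sketch only illustrates the standard test, not the additional functions of proportions the authors recommend.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant pair counts.

    b: samples positive by the conventional method only (hypothetical count)
    c: samples positive by the new method only (hypothetical count)
    Under H0 (equal detection proportions) the discordant pairs split
    as Binomial(b + c, 0.5); concordant pairs carry no information.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence against H0
    k = min(b, c)
    # Probability of a split at least as uneven as observed, in one tail.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Illustrative counts only: 1 sample detected by the old method alone,
# 9 samples detected by the new method alone.
p_value = mcnemar_exact(1, 9)
```

Because only the discordant pairs enter the calculation, the p-value says nothing about how large the difference in detection proportions is, which is consistent with the abstract's point that the test alone does not give the microbiologist everything needed for a decision.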
Statistics by Example, Detecting Patterns.
ERIC Educational Resources Information Center
Mosteller, Frederick; And Others
This booklet is part of a series of four pamphlets, each intended to stand alone, which provide problems in probability and statistics at the secondary school level. Twelve different real-life examples (written by professional statisticians and experienced teachers) have been collected in this booklet to illustrate the ideas of mean, variation,…
The Use of Meta-Analytic Statistical Significance Testing
ERIC Educational Resources Information Center
Polanin, Joshua R.; Pigott, Terri D.
2015-01-01
Meta-analysis multiplicity, the concept of conducting multiple tests of statistical significance within one review, is an underdeveloped literature. We address this issue by considering how Type I errors can impact meta-analytic results, suggest how statistical power may be affected through the use of multiplicity corrections, and propose how…
Advances in Testing the Statistical Significance of Mediation Effects
ERIC Educational Resources Information Center
Mallinckrodt, Brent; Abraham, W. Todd; Wei, Meifen; Russell, Daniel W.
2006-01-01
P. A. Frazier, A. P. Tix, and K. E. Barron (2004) highlighted a normal theory method popularized by R. M. Baron and D. A. Kenny (1986) for testing the statistical significance of indirect effects (i.e., mediator variables) in multiple regression contexts. However, simulation studies suggest that this method lacks statistical power relative to some…
Testing the Difference of Correlated Agreement Coefficients for Statistical Significance
ERIC Educational Resources Information Center
Gwet, Kilem L.
2016-01-01
This article addresses the problem of testing the difference between two correlated agreement coefficients for statistical significance. A number of authors have proposed methods for testing the difference between two correlated kappa coefficients, which require either the use of resampling methods or the use of advanced statistical modeling…
Reviewer Bias for Statistically Significant Results: A Reexamination.
ERIC Educational Resources Information Center
Fagley, N. S.; McKinney, I. Jean
1983-01-01
Reexamines the article by Atkinson, Furlong, and Wampold (1982) and questions their conclusion that reviewers were biased toward statistically significant results. A statistical power analysis shows the power of their bogus study was low. Low power in a study reporting nonsignificant findings is a valid reason for recommending not to publish.…
Statistical Significance Testing Should Be Discontinued in Mathematics Education Research.
ERIC Educational Resources Information Center
Menon, Rama
1993-01-01
Discusses five common myths about statistical significance testing (SST), the possible erroneous and harmful contributions of SST to educational research, and suggested alternatives to SST for mathematics education research. (Contains 61 references.) (MKR)
A Tutorial on Hunting Statistical Significance by Chasing N
Szucs, Denes
2016-01-01
There is increasing concern about the replicability of studies in psychology and cognitive neuroscience. Hidden data dredging (also called p-hacking) is a major contributor to this crisis because it substantially increases Type I error resulting in a much larger proportion of false positive findings than the usually expected 5%. In order to build better intuition to avoid, detect and criticize some typical problems, here I systematically illustrate the large impact of some easy to implement and so, perhaps frequent data dredging techniques on boosting false positive findings. I illustrate several forms of two special cases of data dredging. First, researchers may violate the data collection stopping rules of null hypothesis significance testing by repeatedly checking for statistical significance with various numbers of participants. Second, researchers may group participants post hoc along potential but unplanned independent grouping variables. The first approach ‘hacks’ the number of participants in studies, the second approach ‘hacks’ the number of variables in the analysis. I demonstrate the high amount of false positive findings generated by these techniques with data from true null distributions. I also illustrate that it is extremely easy to introduce strong bias into data by very mild selection and re-testing. Similar, usually undocumented data dredging steps can easily lead to having 20–50%, or more false positives. PMID:27713723
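The first data dredging technique described in this abstract, repeatedly checking for significance as participants are added, can be illustrated with a small simulation. This is a sketch under simplifying assumptions that are not from the paper: data are standard normal with a true mean of zero, the test is a two-sided z-test with known unit variance, and the function name and parameters are hypothetical.

```python
import random
from math import sqrt


def false_positive_rate(n_min, n_max, step, n_sims=2000, z_crit=1.96, seed=0):
    """Fraction of null experiments declared significant at any look.

    Each simulated experiment draws up to n_max observations from N(0, 1),
    so the null hypothesis (mean = 0) is true by construction. A z-test is
    run at n_min, n_min + step, ... and the experiment stops at the first
    look where |z| exceeds z_crit.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        total = 0.0
        for n in range(1, n_max + 1):
            total += rng.gauss(0.0, 1.0)
            if n >= n_min and (n - n_min) % step == 0:
                # z = sample mean / (1 / sqrt(n)) = total / sqrt(n)
                if abs(total / sqrt(n)) > z_crit:
                    hits += 1
                    break
    return hits / n_sims


# A single planned test at n = 100 versus testing after every
# participant from n = 10 up to n = 100.
fixed = false_positive_rate(n_min=100, n_max=100, step=1)
peeking = false_positive_rate(n_min=10, n_max=100, step=1)
```

With a single planned look, the rate should sit near the nominal 5%; with a look after every added participant it is expected to be several times higher, which is the inflation the tutorial warns about.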
Shukla, R.; Yu Daohai; Fulk, F.
1995-12-31
Short-term toxicity tests with aquatic organisms are a valuable measurement tool in the assessment of the toxicity of effluents, environmental samples and single chemicals. Currently toxicity tests are utilized in a wide range of US EPA regulatory activities including effluent discharge compliance. In the current approach for determining the No Observed Effect Concentration, an effluent concentration is presumed safe if there is no statistically significant difference in toxicant response versus control response. The conclusion of a safe concentration may be due to the fact that it truly is safe, or alternatively, that the ability of the statistical test to detect an effect, given its existence, is inadequate. Results of research of a new statistical approach, the basis of which is to move away from a demonstration of no difference to a demonstration of equivalence, will be discussed. The concept of observed confidence distributions, first suggested by Cox, is proposed as a measure of the strength of evidence for practically equivalent responses between a given effluent concentration and the control. The research included determination of intervals of practically equivalent responses as a function of the variability of control response. The approach is illustrated using reproductive data from tests with Ceriodaphnia dubia and survival and growth data from tests with fathead minnow. The data are from the US EPA's National Reference Toxicant Database.
The questioned p value: clinical, practical and statistical significance.
Jiménez-Paneque, Rosa
2016-09-09
The use of the p-value and statistical significance has been questioned from the early 1980s until today. Much has been discussed about it in the field of statistics and its applications, especially in Epidemiology and Public Health. As a matter of fact, the p-value and its equivalent, statistical significance, are difficult concepts to grasp for the many health professionals who are in some way involved in research applied to their work areas. However, its meaning should be clear in intuitive terms although it is based on theoretical concepts of the field of Statistics. This paper attempts to present the p-value as a concept that applies to everyday life and is therefore intuitively simple, but whose proper use cannot be separated from theoretical and methodological elements of inherent complexity. The reasons behind the criticism received by the p-value and its isolated use are intuitively explained, mainly the need to demarcate statistical significance from clinical significance, and some of the recommended remedies for these problems are approached as well. It finally refers to the current trend to vindicate the p-value, appealing to the convenience of its use in certain situations, and to the recent statement of the American Statistical Association in this regard.
ERIC Educational Resources Information Center
Mittag, Kathleen C.; Thompson, Bruce
2000-01-01
Surveyed AERA members regarding their perceptions of: statistical issues and statistical significance testing; the general linear model; stepwise methods; score reliability; type I and II errors; sample size; statistical probabilities as exclusive measures of effect size; p values as direct measures of result value; and p values evaluating…
Statistical significance test for transition matrices of atmospheric Markov chains
NASA Technical Reports Server (NTRS)
Vautard, Robert; Mo, Kingtse C.; Ghil, Michael
1990-01-01
Low-frequency variability of large-scale atmospheric dynamics can be represented schematically by a Markov chain of multiple flow regimes. This Markov chain contains useful information for the long-range forecaster, provided that the statistical significance of the associated transition matrix can be reliably tested. Monte Carlo simulation yields a very reliable significance test for the elements of this matrix. The results of this test agree with previously used empirical formulae when each cluster of maps identified as a distinct flow regime is sufficiently large and when they all contain a comparable number of maps. Monte Carlo simulation provides a more reliable way to test the statistical significance of transitions to and from small clusters. It can determine the most likely transitions, as well as the most unlikely ones, with a prescribed level of statistical significance.
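The idea of a Monte Carlo significance test for transition counts can be sketched as follows. This is an illustration only: the null model used here (shuffling the regime labels, which keeps each regime's frequency but destroys temporal order) is an assumption and is not necessarily the null model the authors used, and all names are hypothetical.

```python
import random
from collections import Counter


def transition_counts(seq):
    """Count transitions (i -> j) between consecutive regime labels."""
    return Counter(zip(seq, seq[1:]))


def monte_carlo_transition_pvalues(seq, n_shuffles=1000, seed=0):
    """One-sided Monte Carlo p-values for unusually frequent transitions.

    Assumed null model: the observed labels in random order.
    The p-value of (i, j) is the share of shuffles whose i -> j count is
    at least the observed count, with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = transition_counts(seq)
    states = sorted(set(seq))
    exceed = Counter()
    shuffled = list(seq)
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        null = transition_counts(shuffled)
        for i in states:
            for j in states:
                if null[(i, j)] >= observed[(i, j)]:
                    exceed[(i, j)] += 1
    return {
        (i, j): (exceed[(i, j)] + 1) / (n_shuffles + 1)
        for i in states
        for j in states
    }


# Toy sequence with a strictly cyclic regime order A -> B -> C -> A ...
regimes = ["A", "B", "C"] * 30
pvals = monte_carlo_transition_pvalues(regimes)
```

In the toy sequence the A-to-B transition occurs far more often than shuffled sequences produce, so its p-value is small, while A-to-A never occurs and gets a p-value of 1. Testing for unusually rare transitions would use the opposite tail (`<=` instead of `>=`).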
Robust statistical methods for automated outlier detection
NASA Technical Reports Server (NTRS)
Jee, J. R.
1987-01-01
The computational challenge of automating outlier, or blunder point, detection in radio metric data requires the use of nonstandard statistical methods because the outliers have a deleterious effect on standard least squares methods. The particular nonstandard methods most applicable to the task are the robust statistical techniques that have undergone intense development since the 1960s. These new methods are by design more resistant to the effects of outliers than standard methods. Because the topic may be unfamiliar, a brief introduction to the philosophy and methods of robust statistics is presented. Then the application of these methods to the automated outlier detection problem is detailed for some specific examples encountered in practice.
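As a small illustration of why robust statistics resist outliers, the sketch below flags blunder points using the median and the median absolute deviation (MAD) instead of the mean and standard deviation. This is one common robust rule, not the specific method of the paper; the 0.6745 scaling and the 3.5 cutoff follow a widely used convention for the modified z-score, and the residual values are invented for the example.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return indices of points whose modified z-score exceeds threshold.

    The modified z-score is 0.6745 * |x - median| / MAD. Both the median
    and the MAD are barely moved by a few gross errors, unlike the mean
    and standard deviation used in least squares.
    """
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        return []  # more than half the points are identical: rule undefined
    return [
        i
        for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]


# Hypothetical residuals with one blunder point at index 5.
residuals = [0.1, -0.2, 0.05, 0.3, -0.1, 12.0, 0.0, -0.15]
flagged = mad_outliers(residuals)
```

A mean-and-standard-deviation rule applied to the same data would have both statistics inflated by the blunder itself, which is the deleterious effect on standard methods that the abstract describes.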
Interpretation of Statistical Significance Testing: A Matter of Perspective.
ERIC Educational Resources Information Center
McClure, John; Suen, Hoi K.
1994-01-01
This article compares three models that have been the foundation for approaches to the analysis of statistical significance in early childhood research--the Fisherian and the Neyman-Pearson models (both considered "classical" approaches), and the Bayesian model. The article concludes that all three models have a place in the analysis of research…
Statistical Significance and Effect Size: Two Sides of a Coin.
ERIC Educational Resources Information Center
Fan, Xitao
This paper suggests that statistical significance testing and effect size are two sides of the same coin; they complement each other, but do not substitute for one another. Good research practice requires that both should be taken into consideration to make sound quantitative decisions. A Monte Carlo simulation experiment was conducted, and a…
Your Chi-Square Test Is Statistically Significant: Now What?
ERIC Educational Resources Information Center
Sharpe, Donald
2015-01-01
Applied researchers have employed chi-square tests for more than one hundred years. This paper addresses the question of how one should follow a statistically significant chi-square test result in order to determine the source of that result. Four approaches were evaluated: calculating residuals, comparing cells, ransacking, and partitioning. Data…
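Of the four follow-up approaches named in this abstract, calculating residuals is the simplest to show in code. The sketch below computes adjusted standardized residuals for a contingency table using the standard formula; the function name and the example table are hypothetical, and the code assumes every row and column total is non-zero and smaller than the grand total.

```python
from math import sqrt


def adjusted_residuals(table):
    """Adjusted standardized residuals for an r x c table of counts.

    residual_ij = (O_ij - E_ij) / sqrt(E_ij * (1 - row_i/n) * (1 - col_j/n))
    where E_ij = row_i * col_j / n. Under independence each residual is
    approximately standard normal, so large absolute values point to the
    cells that drive a significant chi-square result.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append((observed - expected) / sqrt(variance))
        result.append(out_row)
    return result


# Hypothetical 2 x 2 table of counts.
residual_table = adjusted_residuals([[30, 10], [10, 30]])
```

When many cells are inspected this way, some correction for multiple comparisons is usually advisable before declaring individual cells significant.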
Assigning statistical significance to proteotypic peptides via database searches.
Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo
2011-02-01
Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId's knowledge database to include proteotypic information, utilized RAId's statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId's programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem since all annotated modifications, even those that occurred within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless of whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches. Published by Elsevier B.V.
Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc
2015-01-01
In the second part of a series on pitfalls in statistical analysis, we look at various ways in which a statistically significant study result can be expressed. We debunk some of the myths regarding the ‘P’ value, explain the importance of ‘confidence intervals’ and clarify the importance of including both values in a paper. PMID:25878958
Statistical Fault Detection & Diagnosis Expert System
Wegerich, Stephan
1996-12-18
STATMON is an expert system that performs real-time fault detection and diagnosis of redundant sensors in any industrial process requiring high reliability. After a training period performed during normal operation, the expert system monitors the statistical properties of the incoming signals using a pattern recognition test. If the test determines that statistical properties of the signals have changed, the expert system performs a sequence of logical steps to determine which sensor or machine component has degraded.
Advances in Significance Testing for Cluster Detection
NASA Astrophysics Data System (ADS)
Coleman, Deidra Andrea
Over the past two decades, much attention has been given to data driven project goals such as the Human Genome Project and the development of syndromic surveillance systems. A major component of these types of projects is analyzing the abundance of data. Detecting clusters within the data can be beneficial as it can lead to the identification of specified sequences of DNA nucleotides that are related to important biological functions or the locations of epidemics such as disease outbreaks or bioterrorism attacks. Cluster detection techniques require efficient and accurate hypothesis testing procedures. In this dissertation, we improve upon the hypothesis testing procedures for cluster detection by enhancing distributional theory and providing an alternative method for spatial cluster detection using syndromic surveillance data. In Chapter 2, we provide an efficient method to compute the exact distribution of the number and coverage of h-clumps of a collection of words. This method involves defining a Markov chain using a minimal deterministic automaton to reduce the number of states needed for computation. We allow words of the collection to contain other words of the collection making the method more general. We use our method to compute the distributions of the number and coverage of h-clumps in the Chi motif of H. influenza. In Chapter 3, we provide an efficient algorithm to compute the exact distribution of multiple window discrete scan statistics for higher-order, multi-state Markovian sequences. This algorithm involves defining a Markov chain to efficiently keep track of probabilities needed to compute p-values of the statistic. We use our algorithm to identify cases where the available approximation does not perform well. We also use our algorithm to detect unusual clusters of made free throw shots by National Basketball Association players during the 2009-2010 regular season. In Chapter 4, we give a procedure to detect outbreaks using syndromic
Estimation of the geochemical threshold and its statistical significance
Miesch, A.T.
1981-01-01
A statistic is proposed for estimating the geochemical threshold and its statistical significance, or it may be used to identify a group of extreme values that can be tested for significance by other means. The statistic is the maximum gap between adjacent values in an ordered array after each gap has been adjusted for the expected frequency. The values in the ordered array are geochemical values transformed by either ln(?? - ??) or ln(?? - ??) and then standardized so that the mean is zero and the variance is unity. The expected frequency is taken from a fitted normal curve with unit area. The midpoint of an adjusted gap that exceeds the corresponding critical value may be taken as an estimate of the geochemical threshold, and the associated probability indicates the likelihood that the threshold separates two geochemical populations. The adjusted gap test may fail to identify threshold values if the variation tends to be continuous from background values to the higher values that reflect mineralized ground. However, the test will serve to identify other anomalies that may be too subtle to have been noted by other means.
Beyond Statistical Significance: Implications of Network Structure on Neuronal Activity
Vlachos, Ioannis; Aertsen, Ad; Kumar, Arvind
2012-01-01
It is a common and good practice in experimental sciences to assess the statistical significance of measured outcomes. For this, the probability of obtaining the actual results is estimated under the assumption of an appropriately chosen null-hypothesis. If this probability is smaller than some threshold, the results are deemed statistically significant and the researchers are content in having revealed, within their own experimental domain, a “surprising” anomaly, possibly indicative of a hitherto hidden fragment of the underlying “ground-truth”. What is often neglected, though, is the actual importance of these experimental outcomes for understanding the system under investigation. We illustrate this point by giving practical and intuitive examples from the field of systems neuroscience. Specifically, we use the notion of embeddedness to quantify the impact of a neuron's activity on its downstream neurons in the network. We show that the network response strongly depends on the embeddedness of stimulated neurons and that embeddedness is a key determinant of the importance of neuronal activity on local and downstream processing. We extrapolate these results to other fields in which networks are used as a theoretical framework. PMID:22291581
Statistical keyword detection in literary corpora
NASA Astrophysics Data System (ADS)
Herrera, J. P.; Pury, P. A.
2008-05-01
Understanding the complexity of human language requires an appropriate analysis of the statistical distribution of words in texts. We consider the information retrieval problem of detecting and ranking the relevant words of a text by means of statistical information referring to the spatial use of the words. Shannon's entropy of information is used as a tool for automatic keyword extraction. By using The Origin of Species by Charles Darwin as a representative text sample, we show the performance of our detector and compare it with other proposals in the literature. The random shuffled text receives special attention as a tool for calibrating the ranking indices.
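The general idea of using Shannon's entropy on the spatial distribution of a word can be sketched as follows. This is a simplified illustration, not the authors' ranking index: the text is cut into equal-sized parts (the number of parts is an arbitrary assumption), and the entropy of a word's occurrences across the parts is normalized to lie between 0 and 1. All names and the toy text are hypothetical.

```python
from math import log


def spatial_entropy(tokens, word, n_parts=8):
    """Normalized Shannon entropy of a word's occurrences across text parts.

    Values near 1 mean the word is spread evenly through the text, as
    function words tend to be; values near 0 mean its occurrences are
    concentrated in a few parts, as topical words tend to be.
    """
    counts = [0] * n_parts
    for pos, tok in enumerate(tokens):
        if tok == word:
            counts[pos * n_parts // len(tokens)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError(f"{word!r} does not occur in the text")
    probs = [c / total for c in counts if c]
    return -sum(p * log(p) for p in probs) / log(n_parts)


# Toy text: "the" appears once in every part, "whale" only in part 3.
tokens = []
for part in range(8):
    filler = ["whale"] * 8 if part == 3 else ["sea"] * 8
    tokens += ["the"] + filler + ["and"]

evenly_spread = spatial_entropy(tokens, "the")
clustered = spatial_entropy(tokens, "whale")
```

Rare words also produce low entropy purely by chance, which is one reason a baseline such as the randomly shuffled text mentioned in the abstract is needed before low entropy is read as relevance.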
Sibling Competition & Growth Tradeoffs. Biological vs. Statistical Significance
Kramer, Karen L.; Veile, Amanda; Otárola-Castillo, Erik
2016-01-01
Early childhood growth has many downstream effects on future health and reproduction and is an important measure of offspring quality. While a tradeoff between family size and child growth outcomes is theoretically predicted in high-fertility societies, empirical evidence is mixed. This is often attributed to phenotypic variation in parental condition. However, inconsistent study results may also arise because family size confounds the potentially differential effects that older and younger siblings can have on young children’s growth. Additionally, inconsistent results might reflect that the biological significance associated with different growth trajectories is poorly understood. This paper addresses these concerns by tracking children’s monthly gains in height and weight from weaning to age five in a high fertility Maya community. We predict that: 1) as an aggregate measure family size will not have a major impact on child growth during the post weaning period; 2) competition from young siblings will negatively impact child growth during the post weaning period; 3) however because of their economic value, older siblings will have a negligible effect on young children’s growth. Accounting for parental condition, we use linear mixed models to evaluate the effects that family size, younger and older siblings have on children’s growth. Congruent with our expectations, it is younger siblings who have the most detrimental effect on children’s growth. While we find statistical evidence of a quantity/quality tradeoff effect, the biological significance of these results is negligible in early childhood. Our findings help to resolve why quantity/quality studies have had inconsistent results by showing that sibling competition varies with sibling age composition, not just family size, and that biological significance is distinct from statistical significance. PMID:26938742
Statistical significance across multiple optimization models for community partition
NASA Astrophysics Data System (ADS)
Li, Ju; Li, Hui-Jia; Mao, He-Jin; Chen, Junhua
2016-05-01
The study of community structure is an important problem in a wide range of applications and can deepen our understanding of real network systems. However, because real networks contain random factors and erroneous edges, how to measure the significance of community structure efficiently is a crucial question. In this paper, we present a novel statistical framework for computing the significance of community structure across multiple optimization methods. Unlike the usual approaches, we calculate the similarity between a given node and its leader and employ the distribution of link tightness to derive the significance score, instead of making a direct comparison to a randomized model. Based on the distribution of community tightness, a new “p-value” form of significance measure is proposed for community structure analysis. Specifically, the well-known approaches and their corresponding quality functions are unified into a novel general formulation, which facilitates a detailed comparison across them. To determine the positions of leaders and their corresponding followers, an efficient algorithm is proposed based on spectral theory. Finally, we apply the significance analysis to some well-known benchmark networks, and the good performance verifies the effectiveness and efficiency of our framework.
Statistical modeling approach for detecting generalized synchronization
NASA Astrophysics Data System (ADS)
Schumacher, Johannes; Haslinger, Robert; Pipa, Gordon
2012-05-01
Detecting nonlinear correlations between time series presents a hard problem for data analysis. We present a generative statistical modeling method for detecting nonlinear generalized synchronization. Truncated Volterra series are used to approximate functional interactions. The Volterra kernels are modeled as linear combinations of basis splines, whose coefficients are estimated via l1 and l2 regularized maximum likelihood regression. The regularization manages the high number of kernel coefficients and allows feature selection strategies yielding sparse models. The method's performance is evaluated on different coupled chaotic systems in various synchronization regimes and analytical results for detecting m:n phase synchrony are presented. Experimental applicability is demonstrated by detecting nonlinear interactions between neuronal local field potentials recorded in different parts of macaque visual cortex.
Fostering Students' Statistical Literacy through Significant Learning Experience
ERIC Educational Resources Information Center
Krishnan, Saras
2015-01-01
A major objective of statistics education is to develop students' statistical literacy that enables them to be educated users of data in context. Teaching statistics in today's educational settings is not an easy feat because teachers have a huge task in keeping up with the demands of the new generation of learners. The present day students have…
Strategies for identifying statistically significant dense regions in microarray data.
Yip, Andy M; Ng, Michael K; Wu, Edmond H; Chan, Tony F
2007-01-01
We propose and study the notion of dense regions for the analysis of categorized gene expression data and present some searching algorithms for discovering them. The algorithms can be applied to any categorical data matrices derived from gene expression level matrices. We demonstrate that dense regions are simple but useful and statistically significant patterns that can be used to 1) identify genes and/or samples of interest and 2) eliminate genes and/or samples corresponding to outliers, noise, or abnormalities. Some theoretical studies on the properties of the dense regions are presented which allow us to characterize dense regions into several classes and to derive tailor-made algorithms for different classes of regions. Moreover, an empirical simulation study on the distribution of the size of dense regions is carried out which is then used to assess the significance of dense regions and to derive effective pruning methods to speed up the searching algorithms. Real microarray data sets are employed to test our methods. Comparisons with six other well-known clustering algorithms using synthetic and real data are also conducted which confirm the superiority of our methods in discovering dense regions. The DRIFT code and a tutorial are available as supplemental material, which can be found on the Computer Society Digital Library at http://computer.org/tcbb/archives.htm.
Statistical significance of spectral lag transition in GRB 160625B
NASA Astrophysics Data System (ADS)
Ganguly, Shalini; Desai, Shantanu
2017-09-01
Recently Wei et al. [1] have found evidence for a transition from positive time lags to negative time lags in the spectral lag data of GRB 160625B. They have fit these observed lags to a sum of two components: an assumed functional form for intrinsic time lag due to astrophysical mechanisms and an energy-dependent speed of light due to quadratic and linear Lorentz invariance violation (LIV) models. Here, we examine the statistical significance of the evidence for a transition to negative time lags. Such a transition, even if present in GRB 160625B, cannot be due to an energy-dependent speed of light, as this would contradict previous limits by some 3-4 orders of magnitude, and must therefore be of intrinsic astrophysical origin. We use three different model comparison techniques: a frequentist test and two information-based criteria (AIC and BIC). From the frequentist model comparison test, we find that the evidence for a transition in the spectral lag data is favored at 3.05σ and 3.74σ for the linear and quadratic models, respectively. We find that ΔAIC and ΔBIC have values ≳ 10 for the spectral lag transition that was motivated as being due to the quadratic Lorentz invariance violating model, pointing to “decisive evidence”. We note, however, that none of the three models (including the model of intrinsic astrophysical emission) provides a good fit to the data.
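The two information criteria named in this abstract can be illustrated with a minimal sketch. It assumes Gaussian errors, so that minus twice the maximized log-likelihood equals the fit's chi-square up to a constant that cancels in the differences. The chi-square values, parameter counts, and number of data points below are hypothetical and are not taken from the paper.

```python
import math

def aic(chi2, k):
    """Akaike information criterion for a fit with chi-square chi2 and k free parameters."""
    return chi2 + 2 * k

def bic(chi2, k, n):
    """Bayesian information criterion; n is the number of data points."""
    return chi2 + k * math.log(n)

# Hypothetical fit results (illustrative only).
n_points = 37
chi2_null, k_null = 80.0, 2   # model without a lag transition
chi2_alt, k_alt = 50.0, 3     # model with a lag transition

# Positive differences favor the alternative (transition) model.
delta_aic = aic(chi2_null, k_null) - aic(chi2_alt, k_alt)
delta_bic = bic(chi2_null, k_null, n_points) - bic(chi2_alt, k_alt, n_points)

print(f"Delta AIC = {delta_aic:.2f}, Delta BIC = {delta_bic:.2f}")
```

BIC penalizes the extra parameter more heavily than AIC once ln(n) exceeds 2, so ΔBIC is smaller than ΔAIC here.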
Statistical fingerprinting for malware detection and classification
Prowell, Stacy J.; Rathgeb, Christopher T.
2015-09-15
A system detects malware in a computing architecture with an unknown pedigree. The system includes a first computing device having a known pedigree and operating free of malware. The first computing device executes a series of instrumented functions that, when executed, provide a statistical baseline that is representative of the time it takes the software application to run on a computing device having a known pedigree. A second computing device executes a second series of instrumented functions that, when executed, provides an actual time that is representative of the time the known software application runs on the second computing device. The system detects malware when there is a difference in execution times between the first and the second computing devices.
Chládek, J; Brázdil, M; Halámek, J; Plešinger, F; Jurák, P
2013-01-01
We present an off-line analysis procedure for exploring brain activity recorded from intra-cerebral electroencephalographic data (SEEG). The objective is to determine the statistical differences between different types of stimulations in the time-frequency domain. The procedure is based on computing relative signal power change and subsequent statistical analysis. An example of characteristic statistically significant event-related de/synchronization (ERD/ERS) detected across different frequency bands following different oddball stimuli is presented. The method is used for off-line functional classification of different brain areas.
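The relative signal power change mentioned here is commonly expressed as a percentage of a pre-stimulus baseline. The sketch below uses that common convention, which is an assumption and may differ from the authors' exact procedure; the function name and the sample power values are hypothetical.

```python
def relative_power_change(baseline_power, event_power):
    """Percent change in band power relative to a baseline interval.

    Negative values indicate a power decrease (ERD),
    positive values a power increase (ERS).
    """
    if baseline_power <= 0:
        raise ValueError("baseline power must be positive")
    return 100.0 * (event_power - baseline_power) / baseline_power

# Hypothetical band powers in arbitrary units.
print(relative_power_change(4.0, 3.0))  # power drops below baseline
print(relative_power_change(4.0, 6.0))  # power rises above baseline
```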
Kepler Planet Detection Metrics: Statistical Bootstrap Test
NASA Technical Reports Server (NTRS)
Jenkins, Jon M.; Burke, Christopher J.
2016-01-01
This document describes the data produced by the Statistical Bootstrap Test over the final three Threshold Crossing Event (TCE) deliveries to NExScI: SOC 9.1 (Q1-Q16) (Tenenbaum et al. 2014), SOC 9.2 (Q1-Q17) aka DR24 (Seader et al. 2015), and SOC 9.3 (Q1-Q17) aka DR25 (Twicken et al. 2016). The last few years have seen significant improvements in the SOC science data processing pipeline, leading to higher quality light curves and more sensitive transit searches. The statistical bootstrap analysis results presented here and the numerical results archived at NASA's Exoplanet Science Institute (NExScI) bear witness to these software improvements. This document attempts to introduce and describe the main features and differences between these three data sets as a consequence of the software changes.
Conducting tests for statistically significant differences using forest inventory data
James A. Westfall; Scott A. Pugh; John W. Coulston
2013-01-01
Many forest inventory and monitoring programs are based on a sample of ground plots from which estimates of forest resources are derived. In addition to evaluating metrics such as number of trees or amount of cubic wood volume, it is often desirable to make comparisons between resource attributes. To properly conduct statistical tests for differences, it is imperative...
Statistical detection of systematic election irregularities.
Klimek, Peter; Yegorov, Yuri; Hanel, Rudolf; Thurner, Stefan
2012-10-09
Democratic societies are built around the principle of free and fair elections, and that each citizen's vote should count equally. National elections can be regarded as large-scale social experiments, where people are grouped into usually large numbers of electoral districts and vote according to their preferences. The large number of samples implies statistical consequences for the polling results, which can be used to identify election irregularities. Using a suitable data representation, we find that vote distributions of elections with alleged fraud show a kurtosis substantially exceeding the kurtosis of normal elections, depending on the level of data aggregation. As an example, we show that reported irregularities in recent Russian elections are, indeed, well-explained by systematic ballot stuffing. We develop a parametric model quantifying the extent to which fraudulent mechanisms are present. We formulate a parametric test detecting these statistical properties in election results. Remarkably, this technique produces robust outcomes with respect to the resolution of the data and therefore, allows for cross-country comparisons. PMID:23010929
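The kurtosis comparison described in this abstract can be illustrated with a small sketch. It computes the population excess kurtosis (fourth standardized moment minus 3) with the standard library only. The two data sets are hypothetical and are not election data; the authors' data representation and parametric model are not reproduced here.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a Gaussian)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3.0

# Hypothetical samples: one flat, one with a few extreme units in the tails.
flat = [1, 2, 3, 4, 5]
heavy_tailed = [0] * 8 + [10, -10]

print(excess_kurtosis(flat))          # below zero: lighter tails than a Gaussian
print(excess_kurtosis(heavy_tailed))  # above zero: heavier tails than a Gaussian
```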
Tipping points in the arctic: eyeballing or statistical significance?
Carstensen, Jacob; Weydmann, Agata
2012-02-01
Arctic ecosystems have experienced and are projected to experience continued large increases in temperature and declines in sea ice cover. It has been hypothesized that small changes in ecosystem drivers can fundamentally alter ecosystem functioning, and that this might be particularly pronounced for Arctic ecosystems. We present a suite of simple statistical analyses to identify changes in the statistical properties of data, emphasizing that changes in the standard error should be considered in addition to changes in mean properties. The methods are exemplified using sea ice extent, and suggest that the loss rate of sea ice accelerated by a factor of ~5 in 1996, as reported in other studies, but increases in random fluctuations, as an early warning signal, were already observed in 1990. We recommend employing the proposed methods more systematically for analyzing tipping points to document effects of climate change in the Arctic.
Infants with Williams syndrome detect statistical regularities in continuous speech.
Cashon, Cara H; Ha, Oh-Ryeong; Graf Estes, Katharine; Saffran, Jenny R; Mervis, Carolyn B
2016-09-01
Williams syndrome (WS) is a rare genetic disorder associated with delays in language and cognitive development. The reasons for the language delay are unknown. Statistical learning is a domain-general mechanism recruited for early language acquisition. In the present study, we investigated whether infants with WS were able to detect the statistical structure in continuous speech. Eighteen 8- to 20-month-olds with WS were familiarized with 2 min of a continuous stream of synthesized nonsense words; the statistical structure of the speech was the only cue to word boundaries. They were tested on their ability to discriminate statistically-defined "words" and "part-words" (which crossed word boundaries) in the artificial language. Despite significant cognitive and language delays, infants with WS were able to detect the statistical regularities in the speech stream. These findings suggest that an inability to track the statistical properties of speech is unlikely to be the primary basis for the delays in the onset of language observed in infants with WS. These results provide the first evidence of statistical learning by infants with developmental delays.
Statistical downscaling rainfall using artificial neural network: significantly wetter Bangkok?
NASA Astrophysics Data System (ADS)
Vu, Minh Tue; Aribarg, Thannob; Supratid, Siriporn; Raghavan, Srivatsan V.; Liong, Shie-Yui
2016-11-01
Artificial neural network (ANN) is an established technique with a flexible mathematical structure that is capable of identifying complex nonlinear relationships between input and output data. The present study utilizes ANN as a method of statistically downscaling global climate models (GCMs) during the rainy season at meteorological site locations in Bangkok, Thailand. The study illustrates the applications of the feed forward back propagation using large-scale predictor variables derived from both the ERA-Interim reanalyses data and present day/future GCM data. The predictors are first selected over different grid boxes surrounding Bangkok region and then screened by using principal component analysis (PCA) to filter the best correlated predictors for ANN training. The reanalyses downscaled results of the present day climate show good agreement against station precipitation with a correlation coefficient of 0.8 and a Nash-Sutcliffe efficiency of 0.65. The final downscaled results for four GCMs show an increasing trend of precipitation for rainy season over Bangkok by the end of the twenty-first century. The extreme values of precipitation determined using statistical indices show strong increases of wetness. These findings will be useful for policy makers in pondering adaptation measures due to flooding such as whether the current drainage network system is sufficient to meet the changing climate and to plan for a range of related adaptation/mitigation measures.
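The Nash-Sutcliffe efficiency quoted in this abstract compares the squared error of a simulation with the variance of the observations around their mean. A minimal sketch follows; the rainfall values are hypothetical and unrelated to the Bangkok station data.

```python
def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 - SSE / sum of squared deviations of observations.

    1 is a perfect match; 0 means the simulation is no better than
    predicting the observed mean; negative values are worse than the mean.
    """
    if len(observed) != len(simulated):
        raise ValueError("series must have equal length")
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    spread = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / spread

# Hypothetical monthly rainfall totals (mm).
obs = [120.0, 180.0, 240.0, 310.0, 150.0]
sim = [130.0, 170.0, 250.0, 290.0, 160.0]
print(round(nash_sutcliffe(obs, sim), 3))
```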
Statistical significance of seasonal warming/cooling trends.
Ludescher, Josef; Bunde, Armin; Schellnhuber, Hans Joachim
2017-04-11
The question whether a seasonal climate trend (e.g., the increase of summer temperatures in Antarctica in the last decades) is of anthropogenic or natural origin is of great importance for mitigation and adaption measures alike. The conventional significance analysis assumes that (i) the seasonal climate trends can be quantified by linear regression, (ii) the different seasonal records can be treated as independent records, and (iii) the persistence in each of these seasonal records can be characterized by short-term memory described by an autoregressive process of first order. Here we show that assumption ii is not valid, due to strong intraannual correlations by which different seasons are correlated. We also show that, even in the absence of correlations, for Gaussian white noise, the conventional analysis leads to a strong overestimation of the significance of the seasonal trends, because multiple testing has not been taken into account. In addition, when the data exhibit long-term memory (which is the case in most climate records), assumption iii leads to a further overestimation of the trend significance. Combining Monte Carlo simulations with the Holm-Bonferroni method, we demonstrate how to obtain reliable estimates of the significance of the seasonal climate trends in long-term correlated records. For an illustration, we apply our method to representative temperature records from West Antarctica, which is one of the fastest-warming places on Earth and belongs to the crucial tipping elements in the Earth system.
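The multiple-testing correction named in this abstract, the Holm-Bonferroni step-down procedure, can be sketched as follows. The four p-values stand in for four seasonal trend tests and are hypothetical; in the paper the p-values come from Monte Carlo simulations of long-term correlated records, which this sketch does not attempt to reproduce.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the null hypothesis is rejected.

    Step-down rule: sort p-values ascending and compare the one at
    zero-based rank r with alpha / (m - r); stop at the first failure.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject

# Hypothetical p-values for four seasonal trends.
seasonal_p = [0.01, 0.04, 0.03, 0.005]
print(holm_bonferroni(seasonal_p))
```

Note that 0.03 and 0.04 would each pass an uncorrected 0.05 threshold, which is the overestimation of significance the abstract warns about.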
Wilkinson, Michael
2014-03-01
Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.
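The a priori sample size calculation mentioned in this abstract can be sketched for the simplest case: a two-sided comparison of two independent group means with a standardized effect size. The sketch uses the normal approximation, which is an assumption made here for brevity; exact calculations based on the t distribution give slightly larger numbers. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size is the standardized mean difference (Cohen's d);
    alpha is the type I error rate and 1 - power the type II error rate.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

print(sample_size_per_group(0.5))  # 63 under the normal approximation
```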
Assessing statistical significance in multivariable genome wide association analysis
Buzdugan, Laura; Kalisch, Markus; Navarro, Arcadi; Schunk, Daniel; Fehr, Ernst; Bühlmann, Peter
2016-01-01
Motivation: Although Genome Wide Association Studies (GWAS) genotype a very large number of single nucleotide polymorphisms (SNPs), the data are often analyzed one SNP at a time. The low predictive power of single SNPs, coupled with the high significance threshold needed to correct for multiple testing, greatly decreases the power of GWAS. Results: We propose a procedure in which all the SNPs are analyzed in a multiple generalized linear model, and we show its use for extremely high-dimensional datasets. Our method yields P-values for assessing significance of single SNPs or groups of SNPs while controlling for all other SNPs and the family wise error rate (FWER). Thus, our method tests whether or not a SNP carries any additional information about the phenotype beyond that available by all the other SNPs. This rules out spurious correlations between phenotypes and SNPs that can arise from marginal methods because the ‘spuriously correlated’ SNP merely happens to be correlated with the ‘truly causal’ SNP. In addition, the method offers a data driven approach to identifying and refining groups of SNPs that jointly contain informative signals about the phenotype. We demonstrate the value of our method by applying it to the seven diseases analyzed by the Wellcome Trust Case Control Consortium (WTCCC). We show, in particular, that our method is also capable of finding significant SNPs that were not identified in the original WTCCC study, but were replicated in other independent studies. Availability and implementation: Reproducibility of our research is supported by the open-source Bioconductor package hierGWAS. Contact: peter.buehlmann@stat.math.ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153677
Evaluating clinical significance: incorporating robust statistics with normative comparison tests.
van Wieringen, Katrina; Cribbie, Robert A
2014-05-01
The purpose of this study was to evaluate a modified test of equivalence for conducting normative comparisons when distribution shapes are non-normal and variances are unequal. A Monte Carlo study was used to compare the empirical Type I error rates and power of the proposed Schuirmann-Yuen test of equivalence, which utilizes trimmed means, with that of the previously recommended Schuirmann and Schuirmann-Welch tests of equivalence when the assumptions of normality and variance homogeneity are satisfied, as well as when they are not satisfied. The empirical Type I error rates of the Schuirmann-Yuen were much closer to the nominal α level than those of the Schuirmann or Schuirmann-Welch tests, and the power of the Schuirmann-Yuen was substantially greater than that of the Schuirmann or Schuirmann-Welch tests when distributions were skewed or outliers were present. The Schuirmann-Yuen test is recommended for assessing clinical significance with normative comparisons.
Lies, damned lies and statistics: Clinical importance versus statistical significance in research.
Mellis, Craig
2017-02-28
Correctly performed and interpreted statistics play a crucial role for both those who 'produce' clinical research, and for those who 'consume' this research. Unfortunately, however, there are many misunderstandings and misinterpretations of statistics by both groups. In particular, there is a widespread lack of appreciation for the severe limitations with p values. This is a particular problem with small sample sizes and low event rates - common features of many published clinical trials. These issues have resulted in increasing numbers of false positive clinical trials (false 'discoveries'), and the well-publicised inability to replicate many of the findings. While chance clearly plays a role in these errors, many more are due to either poorly performed or badly misinterpreted statistics. Consequently, it is essential that whenever p values appear, these need be accompanied by both 95% confidence limits and effect sizes. These will enable readers to immediately assess the plausible range of results, and whether or not the effect is clinically meaningful.
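The recommendation to report confidence limits and effect sizes alongside p values can be illustrated with a short sketch for two independent groups. It assumes a normal approximation (z = 1.96) for the 95% interval, which is too narrow for small samples where a t quantile would be needed. The function name and the data are hypothetical.

```python
import math
import statistics

def mean_difference_summary(group_a, group_b, z=1.96):
    """Return (difference in means, approximate 95% CI, Cohen's d)."""
    n_a, n_b = len(group_a), len(group_b)
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    se = math.sqrt(var_a / n_a + var_b / n_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return diff, (diff - z * se, diff + z * se), diff / pooled_sd

# Hypothetical outcome scores for a treatment and a control group.
treated = [2.0, 4.0, 6.0]
control = [1.0, 3.0, 5.0]
diff, (low, high), d = mean_difference_summary(treated, control)
print(f"difference = {diff:.2f}, 95% CI = ({low:.2f}, {high:.2f}), d = {d:.2f}")
```

With such a small sample the interval spans zero even though the standardized effect is moderate, which is the kind of information a bare p value hides.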
Statistical significance of the rich-club phenomenon in complex networks
NASA Astrophysics Data System (ADS)
Jiang, Zhi-Qiang; Zhou, Wei-Xing
2008-04-01
We propose that the rich-club phenomenon in complex networks should be defined in the spirit of bootstrapping, in which a null model is adopted to assess the statistical significance of the rich-club detected. Our method can serve as a definition of the rich-club phenomenon and is applied to analyze three real networks and three model networks. The results show significant improvement compared with previously reported results. We report a dilemma with an exceptional example, showing that there does not exist an omnipotent definition for the rich-club phenomenon.
Innovative Statistical Inference for Anomaly Detection in Hyperspectral Imagery
2004-09-01
Innovative Statistical Inference for Anomaly Detection in Hyperspectral Imagery, by Dalton Rosario. ARL-TR-3339, September 2004. Sensors and Electron Devices... the effectiveness of both algorithms. Subject terms: hyperspectral anomaly detection, large sample theory.
Timescales for detecting a significant acceleration in sea level rise
Haigh, Ivan D.; Wahl, Thomas; Rohling, Eelco J.; Price, René M.; Pattiaratchi, Charitha B.; Calafat, Francisco M.; Dangendorf, Sönke
2014-01-01
There is observational evidence that global sea level is rising and there is concern that the rate of rise will increase, significantly threatening coastal communities. However, considerable debate remains as to whether the rate of sea level rise is currently increasing and, if so, by how much. Here we provide new insights into sea level accelerations by applying the main methods that have been used previously to search for accelerations in historical data, to identify the timings (with uncertainties) at which accelerations might first be recognized in a statistically significant manner (if not apparent already) in sea level records that we have artificially extended to 2100. We find that the most important approach to earliest possible detection of a significant sea level acceleration lies in improved understanding (and subsequent removal) of interannual to multidecadal variability in sea level records. PMID:24728012
Damage detection in mechanical structures using extreme value statistic.
Worden, K.; Allen, D. W.; Sohn, H.; Farrar, C. R.
2002-01-01
The first and most important objective of any damage identification algorithm is to ascertain with confidence whether damage is present or not. Many methods have been proposed for damage detection based on ideas of novelty detection founded in pattern recognition and multivariate statistics. The philosophy of novelty detection is simple. Features are first extracted from a baseline system to be monitored, and subsequent data are then compared to see if the new features are outliers, which significantly depart from the rest of the population. In damage diagnosis problems, the assumption is that outliers are generated from a damaged condition of the monitored system. This damage classification necessitates the establishment of a decision boundary. Choosing this threshold value is often based on the assumption that the parent distribution of the data is Gaussian in nature. While the problem of novelty detection focuses attention on the outliers or extreme values of the data, i.e. those points in the tails of the distribution, threshold selection using the normality assumption weighs the central population of the data. Therefore, this normality assumption might impose potentially misleading behavior on damage classification, and is likely to lead the damage diagnosis astray. In this paper, extreme value statistics is integrated with novelty detection to specifically model the tails of the distribution of interest. Finally, the proposed technique is demonstrated on simulated numerical data and time series data measured from an eight degree-of-freedom spring-mass system.
Fault Diagnostics Using Statistical Change Detection in the Bispectral Domain
NASA Astrophysics Data System (ADS)
Eugene Parker, B.; Ware, H. A.; Wipf, D. P.; Tompkins, W. R.; Clark, B. R.; Larson, E. C.; Vincent Poor, H.
2000-07-01
It is widely accepted that structural defects in rotating machinery components (e.g. bearings and gears) can be detected through monitoring of vibration and/or sound emissions. Traditional diagnostic vibration analysis attempts to match spectral lines with a priori-known defect frequencies that are characteristic of the affected machinery components. Emphasis herein is on use of bispectral-based statistical change detection algorithms for machinery health monitoring. The bispectrum, a third-order statistic, helps identify pairs of phase-related spectral components, which is useful for fault detection and isolation. In particular, the bispectrum helps sort through the clutter of usual (second-order) vibration spectra to extract useful information associated with the health of particular components. Seeded and non-seeded helicopter gearbox fault results (CH-46E and CH-47D, respectively) show that bispectral algorithms can detect faults at the level of an individual component (i.e. bearings or gears). Fault isolation is implicit with detection based on characteristic a priori-known defect frequencies. Important attributes of the bispectral SCD approach include: (1) it does not require a priori training data as is needed for traditional pattern-classifier-based approaches (and thereby avoids the significant time and cost investments necessary to obtain such data); (2) being based on higher-order moment-based energy detection, it makes no assumptions about the statistical model of the bispectral sequences that are generated; (3) it is operating-regime independent (i.e. works across different operating conditions, flight regimes, torque levels, etc., without knowledge of same); (4) it can be used to isolate faults to the level of specific machinery components (e.g. bearings and gears); and (5) it can be implemented using relatively inexpensive computer hardware, since only low-frequency vibrations need to be processed. The bispectral SCD algorithm thus represents a
Sun, Guoli; Krasnitz, Alexander
2014-11-19
One of the most common goals of hierarchical clustering is finding those branches of a tree that form quantifiably distinct data subtypes. Achieving this goal in a statistically meaningful way requires (a) a measure of distinctness of a branch and (b) a test to determine the significance of the observed measure, applicable to all branches and across multiple scales of dissimilarity. We formulate a method termed Tree Branches Evaluated Statistically for Tightness (TBEST) for identifying significantly distinct tree branches in hierarchical clusters. For each branch of the tree a measure of distinctness, or tightness, is defined as a rational function of heights, both of the branch and of its parent. A statistical procedure is then developed to determine the significance of the observed values of tightness. We test TBEST as a tool for tree-based data partitioning by applying it to five benchmark datasets, one of them synthetic and the other four each from a different area of biology. For each dataset there is a well-defined partition of the data into classes. In all test cases TBEST performs on par with or better than the existing techniques. Based on our benchmark analysis, TBEST is a tool of choice for detection of significantly distinct branches in hierarchical trees grown from biological data. An R language implementation of the method is available from the Comprehensive R Archive Network: http://www.cran.r-project.org/web/packages/TBEST/index.html.
Detecting Statistically Significant Communities of Triangle Motifs in Undirected Networks
2016-04-26
Contents: 5.2 Author’s Personal Facebook Network; 6 Conclusions; 7 Appendices ... List of figures: 17 Unclustered Facebook network; 18 Stem plot of...degree-ordered vertices versus the degree for Facebook network; 19 Output of proposed algorithm implemented in C++ and applied to
Statistical Signal Processing Research for Landmine Detection
2012-03-02
Duke University, Durham, NC 27705. Report Documentation Page; report type: Final Report. ... Geoscience and Remote Sensing Symposium (IGARSS), Vancouver, BC, pp. 874-877, July 2011. (2) Tantum, S. L., Morton, K. D. Jr., Torrione, P. A. ... Physics-based features for contextual factors affecting landmine detection with ground-penetrating radar,” SPIE Defense and Security Symposium, April
Statistical Detection of Atypical Aircraft Flights
NASA Technical Reports Server (NTRS)
Statler, Irving; Chidester, Thomas; Shafto, Michael; Ferryman, Thomas; Amidan, Brett; Whitney, Paul; White, Amanda; Willse, Alan; Cooley, Scott; Jay, Joseph; Rosenthal, Loren; Swickard, Andrea; Bates, Derrick; Scherrer, Chad; Webb, Bobbie-Jo; Lawrence, Robert; Mosbrucker, Chris; Prothero, Gary; Andrei, Adi; Romanowski, Tim; Robin, Daniel; Prothero, Jason; Lynch, Robert; Lowe, Michael
2006-01-01
A computational method and software to implement the method have been developed to sift through vast quantities of digital flight data to alert human analysts to aircraft flights that are statistically atypical in ways that signify that safety may be adversely affected. On a typical day, there are tens of thousands of flights in the United States and several times that number throughout the world. Depending on the specific aircraft design, the volume of data collected by sensors and flight recorders can range from a few dozen to several thousand parameters per second during a flight. Whereas these data have long been utilized in investigating crashes, the present method is oriented toward helping to prevent crashes by enabling routine monitoring of flight operations to identify portions of flights that may be of interest with respect to safety issues.
ERIC Educational Resources Information Center
Monterde-i-Bort, Hector; Frias-Navarro, Dolores; Pascual-Llobell, Juan
2010-01-01
The empirical study we present here deals with a pedagogical issue that has not been thoroughly explored up until now in our field. Previous empirical studies in other sectors have identified the opinions of researchers about this topic, showing that completely unacceptable interpretations have been made of significance tests and other statistical…
A decision surface-based taxonomy of detection statistics
NASA Astrophysics Data System (ADS)
Bouffard, François
2012-09-01
Current and past literature on the topic of detection statistics - in particular those used in hyperspectral target detection - can be intimidating for newcomers, especially given the huge number of detection tests described in the literature. Detection tests for hyperspectral measurements, such as those generated by dispersive or Fourier transform spectrometers used in remote sensing of atmospheric contaminants, are of paramount importance if any level of analysis automation is to be achieved. The detection statistics used in hyperspectral target detection are generally borrowed and adapted from other fields such as radar signal processing or acoustics. Consequently, although remarkable efforts have been made to clarify and categorize the vast number of available detection tests, understanding their differences, similarities, limits and other intricacies is still an exacting journey. Reasons for this state of affairs include heterogeneous nomenclature and mathematical notation, probably due to the multiple origins of hyperspectral target detection formalisms. Attempts at sorting out detection statistics using ambiguously defined properties may also cause more harm than good. Ultimately, a detection statistic is entirely characterized by its decision boundary. Thus, we propose to catalogue detection statistics according to the shape of their decision surfaces, which greatly simplifies this taxonomy exercise. We make a distinction between the topology resulting from the mathematical formulation of the statistic and mere parameters that adjust the boundary's precise shape, position and orientation. Using this simple approach, similarities between various common detection statistics are found, limit cases are reduced to simpler statistics, and a general understanding of the available detection tests and their properties becomes much easier to achieve.
Understanding the Sampling Distribution and Its Use in Testing Statistical Significance.
ERIC Educational Resources Information Center
Breunig, Nancy A.
Despite the increasing criticism of statistical significance testing by researchers, particularly in the publication of the 1994 American Psychological Association's style manual, statistical significance test results are still popular in journal articles. For this reason, it remains important to understand the logic of inferential statistics. A…
Steganography forensics method for detecting least significant bit replacement attack
NASA Astrophysics Data System (ADS)
Wang, Xiaofeng; Wei, Chengcheng; Han, Xiao
2015-01-01
We present an image forensics method to detect least significant bit replacement steganography attack. The proposed method provides fine-grained forensics features by using the hierarchical structure that combines pixels correlation and bit-planes correlation. This is achieved via bit-plane decomposition and difference matrices between the least significant bit-plane and each one of the others. Generated forensics features provide the susceptibility (changeability) that will be drastically altered when the cover image is embedded with data to form a stego image. We developed a statistical model based on the forensics features and used least square support vector machine as a classifier to distinguish stego images from cover images. Experimental results show that the proposed method provides the following advantages. (1) The detection rate is noticeably higher than that of some existing methods. (2) It has the expected stability. (3) It is robust for content-preserving manipulations, such as JPEG compression, adding noise, filtering, etc. (4) The proposed method provides satisfactory generalization capability.
Detecting cell death with optical coherence tomography and envelope statistics
NASA Astrophysics Data System (ADS)
Farhat, Golnaz; Yang, Victor X. D.; Czarnota, Gregory J.; Kolios, Michael C.
2011-02-01
Currently no standard clinical or preclinical noninvasive method exists to monitor cell death based on morphological changes at the cellular level. In our past work we have demonstrated that quantitative high frequency ultrasound imaging can detect cell death in vitro and in vivo. In this study we apply quantitative methods previously used with high frequency ultrasound to optical coherence tomography (OCT) to detect cell death. The ultimate goal of this work is to use these methods for optically-based clinical and preclinical cancer treatment monitoring. Optical coherence tomography data were acquired from acute myeloid leukemia cells undergoing three modes of cell death. Significant increases in integrated backscatter were observed for cells undergoing apoptosis and mitotic arrest, while necrotic cells induced a decrease. These changes appear to be linked to structural changes observed in histology obtained from the cell samples. Signal envelope statistics were analyzed from fittings of the generalized gamma distribution to histograms of envelope intensities. The parameters from this distribution demonstrated sensitivities to morphological changes in the cell samples. These results indicate that OCT integrated backscatter and first order envelope statistics can be used to detect and potentially differentiate between modes of cell death in vitro.
Detection of bearing damage by statistic vibration analysis
NASA Astrophysics Data System (ADS)
Sikora, E. A.
2016-04-01
The condition of bearings, which are essential components in mechanisms, is crucial to safety. The analysis of the bearing vibration signal, which is always contaminated by certain types of noise, is a very important standard for mechanical condition diagnosis of the bearing and mechanical failure phenomenon. In this paper the method of rolling bearing fault detection by statistical analysis of vibration is proposed to filter out Gaussian noise contained in a raw vibration signal. The results of experiments show that the vibration signal can be significantly enhanced by application of the proposed method. Besides, the proposed method is used to analyse real acoustic signals of a bearing with inner race and outer race faults, respectively. The values of attributes are determined according to the degree of the fault. The results confirm that the periods between the transients, which represent bearing fault characteristics, can be successfully detected.
Configurational Statistics of Magnetic Bead Detection with Magnetoresistive Sensors.
Henriksen, Anders Dahl; Ley, Mikkel Wennemoes Hvitfeld; Flyvbjerg, Henrik; Hansen, Mikkel Fougt
2015-01-01
Magnetic biosensors detect magnetic beads that, mediated by a target, have bound to a functionalized area. This area is often larger than the area of the sensor. Both the sign and magnitude of the average magnetic field experienced by the sensor from a magnetic bead depends on the location of the bead relative to the sensor. Consequently, the signal from multiple beads also depends on their locations. Thus, a given coverage of the functionalized area with magnetic beads does not result in a given detector response, except on the average, over many realizations of the same coverage. We present a systematic theoretical analysis of how this location-dependence affects the sensor response. The analysis is done for beads magnetized by a homogeneous in-plane magnetic field. We determine the expected value and standard deviation of the sensor response for a given coverage, as well as the accuracy and precision with which the coverage can be determined from a single sensor measurement. We show that statistical fluctuations between samples may reduce the sensitivity and dynamic range of a sensor significantly when the functionalized area is larger than the sensor area. Hence, the statistics of sampling is essential to sensor design. For illustration, we analyze three important published cases for which statistical fluctuations are dominant, significant, and insignificant, respectively. PMID:26496495
Using Person Fit Statistics to Detect Outliers in Survey Research.
Felt, John M; Castaneda, Ruben; Tiemensma, Jitske; Depaoli, Sarah
2017-01-01
Context: When working with health-related questionnaires, outlier detection is important. However, traditional methods of outlier detection (e.g., boxplots) can miss participants with "atypical" responses to the questions that otherwise have similar total (subscale) scores. In addition to detecting outliers, it can be of clinical importance to determine the reason for the outlier status or "atypical" response. Objective: The aim of the current study was to illustrate how to derive person fit statistics for outlier detection through a statistical method examining person fit with a health-based questionnaire. Design and Participants: Patients treated for Cushing's syndrome (n = 394) were recruited from the Cushing's Support and Research Foundation's (CSRF) listserv and Facebook page. Main Outcome Measure: Patients were directed to an online survey containing the CushingQoL (English version). A two-dimensional graded response model was estimated, and person fit statistics were generated using the Zh statistic. Results: Conventional outlier detection methods revealed no outliers reflecting extreme scores on the subscales of the CushingQoL. However, person fit statistics identified 18 patients with "atypical" response patterns, which would have been otherwise missed (|Zh| > 2.00). Conclusion: While the conventional methods of outlier detection indicated no outliers, person fit statistics identified several patients with "atypical" response patterns who otherwise appeared average. Person fit statistics allow researchers to delve further into the underlying problems experienced by these "atypical" patients treated for Cushing's syndrome. Annotated code is provided to aid other researchers in using this method.
Why Are People Bad at Detecting Randomness? A Statistical Argument
ERIC Educational Resources Information Center
Williams, Joseph J.; Griffiths, Thomas L.
2013-01-01
Errors in detecting randomness are often explained in terms of biases and misconceptions. We propose and provide evidence for an account that characterizes the contribution of the inherent statistical difficulty of the task. Our account is based on a Bayesian statistical analysis, focusing on the fact that a random process is a special case of…
A Review of Post-1994 Literature on Whether Statistical Significance Tests Should Be Banned.
ERIC Educational Resources Information Center
Sullivan, Jeremy R.
This paper summarizes the literature regarding statistical significance testing with an emphasis on: (1) the post-1994 literature in various disciplines; (2) alternatives to statistical significance testing; and (3) literature exploring why researchers have demonstrably failed to be influenced by the 1994 American Psychological Association…
"What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"
ERIC Educational Resources Information Center
Ozturk, Elif
2012-01-01
The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…
ERIC Educational Resources Information Center
Norris, John M.
2015-01-01
Traditions of statistical significance testing in second language (L2) quantitative research are strongly entrenched in how researchers design studies, select analyses, and interpret results. However, statistical significance tests using "p" values are commonly misinterpreted by researchers, reviewers, readers, and others, leading to…
The Historical Growth of Statistical Significance Testing in Psychology--and Its Future Prospects.
ERIC Educational Resources Information Center
Hubbard, Raymond; Ryan, Patricia A.
2000-01-01
Examined the historical growth in the popularity of statistical significance testing using a random sample of data from 12 American Psychological Association journals. Results replicate and extend findings from a study that used only one such journal. Discusses the role of statistical significance testing and the use of replication and…
Deng, Nina; Allison, Jeroan J; Fang, Hua Julia; Ash, Arlene S; Ware, John E
2013-05-31
Background Relative validity (RV), a ratio of ANOVA F-statistics, is often used to compare the validity of patient-reported outcome (PRO) measures. We used the bootstrap to establish the statistical significance of the RV and to identify key factors affecting its significance. Methods Based on responses from 453 chronic kidney disease (CKD) patients to 16 CKD-specific and generic PRO measures, RVs were computed to determine how well each measure discriminated across clinically-defined groups of patients compared to the most discriminating (reference) measure. Statistical significance of RV was quantified by the 95% bootstrap confidence interval. Simulations examined the effects of sample size, denominator F-statistic, correlation between comparator and reference measures, and number of bootstrap replicates. Results The statistical significance of the RV increased as the magnitude of denominator F-statistic increased or as the correlation between comparator and reference measures increased. A denominator F-statistic of 57 conveyed sufficient power (80%) to detect an RV of 0.6 for two measures correlated at r = 0.7. Larger denominator F-statistics or higher correlations provided greater power. Larger sample size with a fixed denominator F-statistic or more bootstrap replicates (beyond 500) had minimal impact. Conclusions The bootstrap is valuable for establishing the statistical significance of RV estimates. A reasonably large denominator F-statistic (F > 57) is required for adequate power when using the RV to compare the validity of measures with small or moderate correlations (r < 0.7). Substantially greater power can be achieved when comparing measures of a very high correlation (r > 0.9). PMID:23721463
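The bootstrap procedure described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names (`anova_f`, `relative_validity`, `bootstrap_rv_ci`), the percentile interval, and the default of 500 replicates are assumptions chosen for the example. Patients are resampled with replacement, the RV (comparator F divided by reference F) is recomputed on each resample, and the 2.5th and 97.5th percentiles give an approximate 95% interval.

```python
import random
from statistics import mean


def anova_f(values, groups):
    """One-way ANOVA F-statistic for `values` split by the labels in `groups`."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    grand = mean(values)
    k, n = len(by_group), len(values)
    ss_between = sum(len(vs) * (mean(vs) - grand) ** 2 for vs in by_group.values())
    ss_within = sum((v - mean(vs)) ** 2 for vs in by_group.values() for v in vs)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def relative_validity(comparator, reference, groups):
    """RV = F-statistic of the comparator measure / F-statistic of the reference."""
    return anova_f(comparator, groups) / anova_f(reference, groups)


def bootstrap_rv_ci(comparator, reference, groups, n_boot=500, level=0.95, seed=0):
    """Percentile bootstrap interval for the RV (hypothetical helper)."""
    rng = random.Random(seed)
    n = len(groups)
    rvs = []
    while len(rvs) < n_boot:
        # Resample patients (rows) with replacement, keeping scores and group together.
        idx = [rng.randrange(n) for _ in range(n)]
        g = [groups[i] for i in idx]
        if len(set(g)) < 2:
            continue  # an F-statistic needs at least two groups
        c = [comparator[i] for i in idx]
        r = [reference[i] for i in idx]
        rvs.append(relative_validity(c, r, g))
    rvs.sort()
    lo = rvs[int((1 - level) / 2 * n_boot)]
    hi = rvs[int((1 + level) / 2 * n_boot) - 1]
    return lo, hi
```

Under this sketch, an RV would be read as significantly different from 1 when the interval excludes 1; whether the study used a percentile interval or another bootstrap interval is not stated in the abstract.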
Detection of small target using recursive higher order statistics
NASA Astrophysics Data System (ADS)
Hou, Wang; Sun, Hongyuan; Lei, Zhihui
2014-02-01
In this paper, a recursive higher order statistics algorithm is proposed for small target detection in temporal domain. Firstly, the background of image sequence is normalized. Then, the higher order statistics are recursively solved in image sequence to obtain the feature image. Finally, the feature image is segmented with threshold to detect the small target. To validate the algorithm proposed in this paper, five simulated and one semi-simulation image sequences are created. The ROC curves are employed for evaluation of experimental results. Experiment results show that our method is very effective for small target detection.
Crow, C.J.
1985-01-01
Middle Ordovician age Chickamauga Group carbonates crop out along the Birmingham and Murphrees Valley anticlines in central Alabama. The macrofossil contents on exposed surfaces of seven bioherms have been counted to determine their various paleontologic characteristics. Twelve groups of organisms are present in these bioherms. Dominant organisms include bryozoans, algae, brachiopods, sponges, pelmatozoans, stromatoporoids and corals. Minor accessory fauna include predators, scavengers and grazers such as gastropods, ostracods, trilobites, cephalopods and pelecypods. Vertical and horizontal niche zonation has been detected for some of the bioherm dwelling fauna. No one bioherm of those studied exhibits all 12 groups of organisms; rather, individual bioherms display various subsets of the total diversity. Statistical treatment (G-test) of the diversity data indicates a lack of statistical homogeneity of the bioherms, both within and between localities. Between-locality population heterogeneity can be ascribed to differences in biologic responses to such gross environmental factors as water depth and clarity, and energy levels. At any one locality, gross aspects of the paleoenvironments are assumed to have been more uniform. Significant differences among bioherms at any one locality may have resulted from patchy distribution of species populations, differential preservation and other factors.
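As an illustration of the G-test of homogeneity mentioned above, the following sketch computes G = 2 Σ O ln(O/E) for a table of counts with one row per bioherm and one column per organism group. The function name and table layout are assumptions made for the example; the abstract does not describe how the authors organized their counts.

```python
import math


def g_test_homogeneity(table):
    """Return (G, degrees of freedom) for a rows-by-columns table of counts.

    Expected counts come from the row and column totals, as in a
    chi-square test of homogeneity; cells with zero observations
    contribute nothing to the sum.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / total
                g += observed * math.log(observed / expected)
    g *= 2.0
    df = (len(table) - 1) * (len(col_totals) - 1)
    return g, df


# Invented counts, for illustration only (not data from the study):
# two bioherms (rows) by three organism groups (columns).
g, df = g_test_homogeneity([[30, 12, 8], [10, 25, 15]])
```

G is then compared against a chi-square distribution with `df` degrees of freedom; a large G indicates that the bioherms do not share a common composition.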
Gehrmann, Thies; Reinders, Marcel J.T.
2015-01-01
Background: With more and more genomes being sequenced, detecting synteny between genomes becomes more and more important. However, for microorganisms the genomic divergence quickly becomes large, resulting in different codon usage and shuffling of gene order and gene elements such as exons. Results: We present Proteny, a methodology to detect synteny between diverged genomes. It operates on the amino acid sequence level to be insensitive to codon usage adaptations and clusters groups of exons disregarding order to handle diversity in genomic ordering between genomes. Furthermore, Proteny assigns significance levels to the syntenic clusters such that they can be selected on statistical grounds. Finally, Proteny provides novel ways to visualize results at different scales, facilitating the exploration and interpretation of syntenic regions. We test the performance of Proteny on a standard ground truth dataset, and we illustrate the use of Proteny on two closely related genomes (two different strains of Aspergillus niger) and on two distant genomes (two species of Basidiomycota). In comparison to other tools, we find that Proteny finds clusters with more true homologies in fewer clusters that contain more genes, i.e. Proteny is able to identify a more consistent synteny. Further, we show how genome rearrangements, assembly errors, gene duplications and the conservation of specific genes can be easily studied with Proteny. Availability and implementation: Proteny is freely available at the Delft Bioinformatics Lab website http://bioinformatics.tudelft.nl/dbl/software. Contact: t.gehrmann@tudelft.nl Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26116928
Statistical power for detecting trends with applications to seabird monitoring
Hatch, Shyla A.
2003-01-01
Power analysis is helpful in defining goals for ecological monitoring and evaluating the performance of ongoing efforts. I examined detection standards proposed for population monitoring of seabirds using two programs (MONITOR and TRENDS) specially designed for power analysis of trend data. Neither program models within- and among-years components of variance explicitly and independently, thus an error term that incorporates both components is an essential input. Residual variation in seabird counts consisted of day-to-day variation within years and unexplained variation among years in approximately equal parts. The appropriate measure of error for power analysis is the standard error of estimation (S.E.est) from a regression of annual means against year. Replicate counts within years are helpful in minimizing S.E.est but should not be treated as independent samples for estimating power to detect trends. Other issues include a choice of assumptions about variance structure and selection of an exponential or linear model of population change. Seabird count data are characterized by strong correlations between S.D. and mean, thus a constant CV model is appropriate for power calculations. Time series were fit about equally well with exponential or linear models, but log transformation ensures equal variances over time, a basic assumption of regression analysis. Using sample data from seabird monitoring in Alaska, I computed the number of years required (with annual censusing) to detect trends of -1.4% per year (50% decline in 50 years) and -2.7% per year (50% decline in 25 years). At α=0.05 and a desired power of 0.9, estimated study intervals ranged from 11 to 69 years depending on species, trend, software, and study design. Power to detect a negative trend of 6.7% per year (50% decline in 10 years) is suggested as an alternative standard for seabird monitoring that achieves a reasonable match between statistical and biological significance.
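The annual trend rates quoted in this abstract follow from compounding a constant proportional change (the exponential model of population change). A short sketch of that arithmetic, with a hypothetical helper name:

```python
def annual_rate(total_change, years):
    """Constant annual proportional change that compounds to `total_change`.

    total_change = -0.5 means a 50% decline over the whole period.
    """
    return (1.0 + total_change) ** (1.0 / years) - 1.0


for years in (50, 25, 10):
    # Prints -1.4, -2.7 and -6.7 (percent per year) for a 50% decline.
    print(years, round(100 * annual_rate(-0.5, years), 1))
```

These values match the three detection standards discussed in the abstract: 50% declines over 50, 25, and 10 years.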
ERIC Educational Resources Information Center
Mittag, Kathleen C
A national survey of a stratified random sample of members of the American Educational Research Association was undertaken to explore perceptions of contemporary statistical issues, and especially of statistical significance tests. The 225 actual respondents were found to be reasonably representative of the population from which the sample was…
Application of Scan Statistics to Detect Suicide Clusters in Australia
Cheung, Yee Tak Derek; Spittal, Matthew J.; Williamson, Michelle Kate; Tung, Sui Jay; Pirkis, Jane
2013-01-01
Background Suicide clustering occurs when multiple suicide incidents take place in a small area or/and within a short period of time. In spite of the multi-national research attention and particular efforts in preparing guidelines for tackling suicide clusters, the broader picture of epidemiology of suicide clustering remains unclear. This study aimed to develop techniques in using scan statistics to detect clusters, with the detection of suicide clusters in Australia as example. Methods and Findings Scan statistics was applied to detect clusters among suicides occurring between 2004 and 2008. Manipulation of parameter settings and change of area for scan statistics were performed to remedy shortcomings in existing methods. In total, 243 suicides out of 10,176 (2.4%) were identified as belonging to 15 suicide clusters. These clusters were mainly located in the Northern Territory, the northern part of Western Australia, and the northern part of Queensland. Among the 15 clusters, 4 (26.7%) were detected by both national and state cluster detections, 8 (53.3%) were only detected by the state cluster detection, and 3 (20%) were only detected by the national cluster detection. Conclusions These findings illustrate that the majority of spatial-temporal clusters of suicide were located in the inland northern areas, with socio-economic deprivation and higher proportions of indigenous people. Discrepancies between national and state/territory cluster detection by scan statistics were due to the contrast of the underlying suicide rates across states/territories. Performing both small-area and large-area analyses, and applying multiple parameter settings may yield the maximum benefits for exploring clusters. PMID:23342098
A network-based method to assess the statistical significance of mild co-regulation effects.
Horvát, Emőke-Ágnes; Zhang, Jitao David; Uhlmann, Stefan; Sahin, Özgür; Zweig, Katharina Anna
2013-01-01
Recent development of high-throughput, multiplexing technology has initiated projects that systematically investigate interactions between two types of components in biological networks, for instance transcription factors and promoter sequences, or microRNAs (miRNAs) and mRNAs. In terms of network biology, such screening approaches primarily attempt to elucidate relations between biological components of two distinct types, which can be represented as edges between nodes in a bipartite graph. However, it is often desirable not only to determine regulatory relationships between nodes of different types, but also to understand the connection patterns of nodes of the same type. Especially interesting is the co-occurrence of two nodes of the same type, i.e., the number of their common neighbours, which current high-throughput screening analysis fails to address. The co-occurrence gives the number of circumstances under which both of the biological components are influenced in the same way. Here we present SICORE, a novel network-based method to detect pairs of nodes with a statistically significant co-occurrence. We first show the stability of the proposed method on artificial data sets: when randomly adding and deleting observations we obtain reliable results even with noise exceeding the expected level in large-scale experiments. Subsequently, we illustrate the viability of the method based on the analysis of a proteomic screening data set to reveal regulatory patterns of human microRNAs targeting proteins in the EGFR-driven cell cycle signalling system. Since statistically significant co-occurrence may indicate functional synergy and the mechanisms underlying canalization, and thus hold promise in drug target identification and therapeutic development, we provide a platform-independent implementation of SICORE with a graphical user interface as a novel tool in the arsenal of high-throughput screening analysis.
PMID:24039936
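The co-occurrence statistic described in this abstract (the number of common neighbours of two same-type nodes in a bipartite graph) can be sketched as follows. This is a minimal illustration only: the node names and edges are hypothetical toy data, and the significance assessment that SICORE performs is not described in the abstract and is not reproduced here.

```python
from itertools import combinations


def cooccurrence(neighbours):
    """Count common neighbours for every pair of same-type nodes.

    `neighbours` maps each node on one side of the bipartite graph
    (for example a miRNA) to the set of nodes it is linked to on the
    other side (for example proteins).
    """
    counts = {}
    for u, v in combinations(sorted(neighbours), 2):
        counts[(u, v)] = len(neighbours[u] & neighbours[v])
    return counts


# Hypothetical toy data, not taken from the paper.
edges = {
    "miR-a": {"P1", "P2", "P3"},
    "miR-b": {"P2", "P3", "P4"},
    "miR-c": {"P5"},
}

for pair, shared in cooccurrence(edges).items():
    print(pair, shared)
```

Whether a given count is surprisingly large depends on the degrees of the two nodes, which is why a raw count alone is not a significance statement.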
NASA Astrophysics Data System (ADS)
Vermeesch, Pieter
2011-02-01
In my Eos Forum of 24 November 2009 (90(47), 443), I used the chi-square test to reject the null hypothesis that earthquakes occur independent of the weekday to make the point that statistical significance should not be confused with geological significance. Of the five comments on my article, only the one by Sornette and Pisarenko [2011] disputes this conclusion, while the remaining comments take issue with certain aspects of the geophysical case study. In this reply I will address all of these points, after providing some necessary further background about statistical tests. Two types of error can result from a hypothesis test. A Type I error occurs when a true null hypothesis is erroneously rejected by chance. A Type II error occurs when a false null hypothesis is erroneously accepted by chance. By definition, the p value is the probability, under the null hypothesis, of obtaining a test statistic at least as extreme as the one observed. In other words, the smaller the p value, the lower the probability that a Type I error has been made. In light of the exceedingly small p value of the earthquake data set, Tseng and Chen's [2011] assertion that a Type I error has been committed is clearly wrong. How about Type II errors?
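The kind of test discussed in this reply can be sketched as a chi-square goodness-of-fit test of weekday counts against a uniform distribution. The counts below are hypothetical and are not the earthquake data from the article; the closed-form survival function used here is valid only for an even number of degrees of freedom (seven weekdays give six).

```python
import math


def chi_square_uniform(counts):
    """Chi-square statistic for observed counts against equal expected counts."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


def chi2_sf_even_df(x, df):
    """P(X >= x) for a chi-square variable with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Hypothetical weekday counts (Monday to Sunday), for illustration only.
small = [100, 103, 98, 101, 99, 102, 97]
large = [c * 1000 for c in small]  # same proportions, 1000 times more events

for label, counts in (("small sample", small), ("large sample", large)):
    stat = chi_square_uniform(counts)
    p_value = chi2_sf_even_df(stat, df=len(counts) - 1)
    print(label, round(stat, 3), p_value)
```

Both samples have identical weekday proportions, yet the larger one yields a far smaller p value, which is the distinction between statistical and practical significance that the reply draws.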
NASA Technical Reports Server (NTRS)
Xu, Kuan-Man
2006-01-01
A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
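A bootstrap comparison of two summary histograms using the Euclidean distance might look like the sketch below. The resampling scheme (pooling the individual histograms of both groups and resampling with replacement under the null hypothesis of no difference) is an assumption made for illustration; the abstract does not specify the exact procedure, and the toy histograms are hypothetical.

```python
import math
import random


def summary_histogram(hists):
    """Sum individual histograms bin by bin and normalise to unit total."""
    total = [sum(column) for column in zip(*hists)]
    n = sum(total)
    return [t / n for t in total]


def euclidean(p, q):
    """Euclidean distance between two normalised histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def bootstrap_pvalue(group_a, group_b, n_boot=2000, seed=0):
    """Estimate how often a pooled resample gives a distance >= the observed one."""
    rng = random.Random(seed)
    observed = euclidean(summary_histogram(group_a), summary_histogram(group_b))
    pooled = group_a + group_b
    exceed = 0
    for _ in range(n_boot):
        # Resample whole individual histograms, keeping the group sizes fixed.
        a = [rng.choice(pooled) for _ in group_a]
        b = [rng.choice(pooled) for _ in group_b]
        if euclidean(summary_histogram(a), summary_histogram(b)) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_boot + 1)


# Hypothetical individual histograms (3 bins each), for illustration only.
group_a = [[5, 3, 2], [6, 2, 2], [4, 4, 2], [5, 3, 2]]
group_b = [[2, 3, 5], [2, 4, 4], [3, 2, 5], [2, 3, 5]]

distance, p_value = bootstrap_pvalue(group_a, group_b)
print(distance, p_value)
```

The other two distance statistics named in the abstract could be substituted for `euclidean` without changing the resampling loop.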
Almost all articles on cancer prognostic markers report statistically significant results.
Kyzas, Panayiotis A; Denaxa-Kyza, Despina; Ioannidis, John P A
2007-11-01
We aimed to understand the extent of the pursuit for statistically significant results in the prognostic literature of cancer. We evaluated 340 articles included in prognostic marker meta-analyses (Database 1) and 1575 articles on cancer prognostic markers published in 2005 (Database 2). For each article, we examined whether the abstract reported any statistically significant prognostic effect for any marker and any outcome ('positive' articles). 'Negative' articles were further examined for statements made by the investigators to overcome the absence of prognostic statistical significance. We also examined how the articles of Database 1 had presented the relative risks that were included in the respective meta-analyses. 'Positive' prognostic articles comprised 90.6% and 95.8% in Databases 1 and 2, respectively. Most of the 'negative' prognostic articles claimed significance for other analyses, expanded on non-significant trends or offered apologies that were occasionally remote from the original study aims. Only five articles in Database 1 (1.5%) and 21 in Database 2 (1.3%) were fully 'negative' for all presented results in the abstract and without efforts to expand on non-significant trends or to defend the importance of the marker with other arguments. Of the statistically non-significant relative risks in the meta-analyses, 25% had been presented as statistically significant in the primary papers using different analyses compared with the respective meta-analysis. We conclude that almost all articles on cancer prognostic marker studies highlight some statistically significant results. Under strong reporting bias, statistical significance loses its discriminating ability for the importance of prognostic markers.
Cheng, Chia-Ying; Huang, Chung-Yuan; Sun, Chuen-Tsai
2008-02-01
A major task for postgenomic systems biology researchers is to systematically catalogue molecules and their interactions within living cells. Advancements in complex-network theory are being made toward uncovering organizing principles that govern cell formation and evolution, but we lack understanding of how molecules and their interactions determine how complex systems function. Molecular bridge motifs include isolated motifs that neither interact nor overlap with others, whereas brick motifs act as network foundations that play a central role in defining global topological organization. To emphasize their structural organizing and evolutionary characteristics, we define bridge motifs as consisting of weak links only and brick motifs as consisting of strong links only, then propose a method for performing two tasks simultaneously, which are as follows: 1) detecting global statistical features and local connection structures in biological networks and 2) locating functionally and statistically significant network motifs. To further understand the role of biological networks in system contexts, we examine functional and topological differences between bridge and brick motifs for predicting biological network behaviors and functions. After observing brick motif similarities between E. coli and S. cerevisiae, we note that bridge motifs differentiate C. elegans from Drosophila and sea urchin in three types of networks. Similarities (differences) in bridge and brick motifs imply similar (different) key circuit elements in the three organisms. We suggest that motif-content analyses can provide researchers with global and local data for real biological networks and assist in the search for either isolated or functionally and topologically overlapping motifs when investigating and comparing biological system functions and behaviors.
Statistical detection of EEG synchrony using empirical bayesian inference.
Singh, Archana K; Asoh, Hideki; Takeda, Yuji; Phillips, Steven
2015-01-01
There is growing interest in understanding how the brain utilizes synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV) between EEG signals is a popular method for quantifying such synchronizations and elucidating their role in cognitive tasks. However, high dimensionality in PLV data incurs a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR) suffer severe loss of power, because they fail to exploit the complex dependence structure between hypotheses that vary in spectral, temporal and spatial dimensions. Previously, we showed that a hierarchical FDR and optimal discovery procedures could be effectively applied for PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new Empirical Bayes perspective and propose the application of the local FDR method (locFDR; Efron, 2001) for PLV synchrony analysis to compute FDR as a posterior probability that an observed statistic belongs to a null hypothesis. We demonstrate the application of Efron's Empirical Bayes approach for PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both simulation and experimental analyses. Our simulation results showed that locFDR can effectively control false positives without compromising the power of PLV synchrony inference. Applying locFDR to the experimental data yielded more significant discoveries than our previously proposed methods, whereas the standard FDR method failed to detect any significant discoveries.
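The posterior probability mentioned in this abstract is usually written in the two-groups form below. The symbols are standard notation rather than anything stated in the abstract: \pi_0 is the prior proportion of null hypotheses, f_0 the null density of the test statistic z, f_1 the non-null density, and f the resulting mixture density.

```latex
\mathrm{fdr}(z) \;=\; \Pr(\text{null} \mid z) \;=\; \frac{\pi_0\, f_0(z)}{f(z)},
\qquad
f(z) \;=\; \pi_0\, f_0(z) + (1 - \pi_0)\, f_1(z)
```

A statistic is reported as a discovery when its local fdr falls below a chosen cutoff; the cutoff itself is a user choice and is not given in the abstract.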
A Computer Program for Detection of Statistical Outliers
ERIC Educational Resources Information Center
Pascale, Pietro J.; Lovas, Charles M.
1976-01-01
Presents a Fortran program which computes the rejection criteria of ten procedures for detecting outlying observations. These criteria are defined on comment cards. Journal sources for the statistical equations are listed. After applying rejection rules, the program calculates the mean and standard deviation of the censored sample. (Author/RC)
Statistical Studies on Sequential Probability Ratio Test for Radiation Detection
Warnick Kernan, Ding Yuan, et al.
2007-07-01
A Sequential Probability Ratio Test (SPRT) algorithm helps to increase the reliability and speed of radiation detection. This algorithm is further improved to reduce spatial gap and false alarm. SPRT, using Last-in-First-Elected-Last-Out (LIFELO) technique, reduces the error between the radiation measured and resultant alarm. Statistical analysis determines the reduction of spatial error and false alarm.
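For orientation, a basic Wald sequential probability ratio test on Poisson count data can be sketched as follows. This is the textbook SPRT only, assuming known background and source-plus-background count rates; it does not implement the LIFELO technique, whose details the abstract does not give, and all rates and counts are hypothetical.

```python
import math


def sprt_poisson(counts, rate0, rate1, alpha=0.01, beta=0.01):
    """Wald SPRT for Poisson counts: H0 mean `rate0` versus H1 mean `rate1`.

    Returns a decision ("alarm", "background" or "undecided") and the
    number of counting intervals consumed before the decision.
    """
    upper = math.log((1 - beta) / alpha)   # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))   # accept H0 at or below this
    llr = 0.0
    for i, n in enumerate(counts, start=1):
        # Log-likelihood ratio contribution of one Poisson observation.
        llr += n * math.log(rate1 / rate0) - (rate1 - rate0)
        if llr >= upper:
            return "alarm", i
        if llr <= lower:
            return "background", i
    return "undecided", len(counts)


# Hypothetical counts per interval: background rate 5, source rate 10.
print(sprt_poisson([4, 4, 4, 4], rate0=5.0, rate1=10.0))
print(sprt_poisson([20, 20, 20], rate0=5.0, rate1=10.0))
```

The sequential form stops as soon as the evidence is strong enough in either direction, which is where the speed advantage over a fixed-length test comes from.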
A computationally efficient order statistics based outlier detection technique for EEG signals.
Giri, Bapun K; Sarkar, Soumajyoti; Mazumder, Satyaki; Das, Koel
2015-01-01
Detecting artifacts in EEG data produced by muscle activity, eye blinks and electrical noise is a common and important problem in EEG applications. We present a novel outlier detection method based on order statistics. We propose a two-step procedure comprising detection of noisy EEG channels followed by detection of noisy epochs in the outlier channels. The performance of our method is tested systematically using simulated and real EEG data. Our technique produces a significant improvement in detecting EEG artifacts over the state-of-the-art outlier detection technique used in EEG applications. The proposed method can serve as a general outlier detection tool for different types of noisy signals.
Statistical methods for detecting periodic fragments in DNA sequence data
2011-01-01
Background Period 10 dinucleotides are structurally and functionally validated factors that influence the ability of DNA to form nucleosomes, histone core octamers. Robust identification of periodic signals in DNA sequences is therefore required to understand nucleosome organisation in genomes. While various techniques for identifying periodic components in genomic sequences have been proposed or adopted, the requirements for such techniques have not been considered in detail and confirmatory testing for a priori specified periods has not been developed. Results We compared the estimation accuracy and suitability for confirmatory testing of autocorrelation, discrete Fourier transform (DFT), integer period discrete Fourier transform (IPDFT) and a previously proposed Hybrid measure. A number of different statistical significance procedures were evaluated but a blockwise bootstrap proved superior. When applied to synthetic data whose period-10 signal had been eroded, or for which the signal was approximately period-10, the Hybrid technique exhibited superior properties during exploratory period estimation. In contrast, confirmatory testing using the blockwise bootstrap procedure identified IPDFT as having the greatest statistical power. These properties were validated on yeast sequences defined from a ChIP-chip study where the Hybrid metric confirmed the expected dominance of period-10 in nucleosome associated DNA but IPDFT identified more significant occurrences of period-10. Application to the whole genomes of yeast and mouse identified ~ 21% and ~ 19% respectively of these genomes as spanned by period-10 nucleosome positioning sequences (NPS). Conclusions For estimating the dominant period, we find the Hybrid period estimation method empirically to be the most effective for both eroded and approximate periodicity. The blockwise bootstrap was found to be effective as a significance measure, performing particularly well in the problem of period detection in the
A higher-order-statistics-based approach to face detection
NASA Astrophysics Data System (ADS)
Li, Chunming; Li, Yushan; Wu, Ruihong; Li, Qiuming; Zhuang, Qingde; Zhang, Zhan
2005-02-01
A face detection method based on higher order statistics is proposed in this paper. Firstly, an object model and a noise model are established to extract the moving object from the background, exploiting the fact that higher order statistics are insensitive to Gaussian noise. Secondly, an improved Sobel operator is used to extract the edge image of the moving object, and a projection function is used to detect the face in the edge image. Lastly, the PCA (Principal Component Analysis) method is used for face recognition. The performance of the system is evaluated on real video sequences. It is shown that the proposed method is simple and robust for the detection of human faces in video sequences.
Detection of Doppler Microembolic Signals Using High Order Statistics
Geryes, Maroun; Hassan, Walid; Mcheick, Ali
2016-01-01
Robust detection of the smallest circulating cerebral microemboli is an efficient way of preventing strokes, which are the second leading cause of mortality worldwide. Transcranial Doppler ultrasound is widely considered the most convenient system for the detection of microemboli. The most common standard detection is achieved through the Doppler energy signal and depends on an empirically set constant threshold. On the other hand, in the past few years, higher order statistics have been an extensive field of research as they represent descriptive statistics that can be used to detect signal outliers. In this study, we propose new types of microembolic detectors based on the windowed calculation of the third-moment skewness and fourth-moment kurtosis of the energy signal. During embolus-free periods the distribution of the energy is not altered and the skewness and kurtosis signals do not exhibit any peak values. In the presence of emboli, the energy distribution is distorted and the skewness and kurtosis signals exhibit peaks corresponding to those emboli. Applied to real signals, the detection of microemboli through the skewness and kurtosis signals outperformed the detection through standard methods. The sensitivities and specificities reached 78% and 91% and 80% and 90% for the skewness and kurtosis detectors, respectively. PMID:28096889
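The windowed skewness and kurtosis signals described in this abstract can be illustrated with the sketch below. It assumes non-overlapping details such as window width and step are free choices (here a step of one sample), uses population moments, and runs on a hypothetical toy signal; the detection thresholds used in the study are not given in the abstract.

```python
def skewness_kurtosis(window):
    """Return (skewness, kurtosis) of a window using population moments.

    A constant window has zero variance; (0.0, 0.0) is returned in that
    case as a convention of this sketch.
    """
    n = len(window)
    mean = sum(window) / n
    m2 = sum((x - mean) ** 2 for x in window) / n
    if m2 == 0:
        return 0.0, 0.0
    m3 = sum((x - mean) ** 3 for x in window) / n
    m4 = sum((x - mean) ** 4 for x in window) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2


def sliding_moments(signal, width):
    """Skewness and kurtosis of every length-`width` window of the signal."""
    return [
        skewness_kurtosis(signal[i:i + width])
        for i in range(len(signal) - width + 1)
    ]


# Hypothetical energy signal: a steady background with one short burst.
energy = [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 9.0, 1.0, 1.1, 0.9, 1.0]

for start, (skew, kurt) in enumerate(sliding_moments(energy, width=5)):
    print(start, round(skew, 2), round(kurt, 2))
```

Windows that contain the burst show a pronounced positive skewness and an elevated kurtosis, while background-only windows stay near zero skewness.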
Wang, Bo; Shi, Zhanquan; Weber, Georg F.
2015-01-01
Nuclear magnetic resonance (NMR) spectroscopy-based metabonomics is of growing importance for discovery of human disease biomarkers. Identification and validation of disease biomarkers using statistical significance analysis (SSA) is critical for translation to clinical practice. SSA is performed by assessing a null hypothesis test using a derivative of the Student’s t test, e.g., a Welch’s t test. Choosing how to correct the significance level for rejecting null hypotheses in the case of multiple testing to maintain a constant family-wise type I error rate is a common problem in such tests. The multiple testing problem arises because the likelihood of falsely rejecting the null hypothesis, i.e., a false positive, grows as the number of tests applied to the same data set increases. Several methods have been introduced to address this problem. Bonferroni correction (BC) assumes all variables are independent and therefore sacrifices sensitivity for detecting true positives in partially dependent data sets. False discovery rate (FDR) methods are more sensitive than BC but uniformly ascribe highest stringency to lowest p value variables. Here, we introduce standard deviation step down (SDSD), which is more sensitive and appropriate than BC for partially dependent data sets. Sensitivity and type I error rate of SDSD can be adjusted based on the degree of variable dependency. SDSD generates fundamentally different profiles of critical p values compared with FDR methods potentially leading to reduced type II error rates. SDSD is increasingly sensitive for more concentrated metabolites. SDSD is demonstrated using NMR-based metabonomics data collected on three different breast cancer cell line extracts. PMID:24026514
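The two established corrections contrasted in this abstract, Bonferroni and a false discovery rate procedure, can be compared on a small example. The FDR procedure shown is the Benjamini-Hochberg step-up rule, chosen here as the most common FDR method; SDSD itself is not implemented because the abstract does not define it, and the p values are hypothetical.

```python
def bonferroni(pvals, alpha=0.05):
    """Reject each hypothesis whose p value is at most alpha / m."""
    m = len(pvals)
    return [p <= alpha / m for p in pvals]


def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    # Find the largest rank k with p_(k) <= alpha * k / m.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject


# Hypothetical p values from ten tests, for illustration only.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]

print("Bonferroni rejections:", sum(bonferroni(pvals)))
print("Benjamini-Hochberg rejections:", sum(benjamini_hochberg(pvals)))
```

On these values the Bonferroni threshold of 0.005 admits one test while the step-up rule admits two, which illustrates the sensitivity difference the abstract refers to.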
Baldi, Pierre
2010-01-01
As repositories of chemical molecules continue to expand and become more open, it becomes increasingly important to develop tools to search them efficiently and assess the statistical significance of chemical similarity scores. Here we develop a general framework for understanding, modeling, predicting, and approximating the distribution of chemical similarity scores and its extreme values in large databases. The framework can be applied to different chemical representations and similarity measures but is demonstrated here using the most common binary fingerprints with the Tanimoto similarity measure. After introducing several probabilistic models of fingerprints, including the Conditional Gaussian Uniform model, we show that the distribution of Tanimoto scores can be approximated by the distribution of the ratio of two correlated Normal random variables associated with the corresponding unions and intersections. This remains true also when the distribution of similarity scores is conditioned on the size of the query molecules in order to derive more fine-grained results and improve chemical retrieval. The corresponding extreme value distributions for the maximum scores are approximated by Weibull distributions. From these various distributions and their analytical forms, Z-scores, E-values, and p-values are derived to assess the significance of similarity scores. In addition, the framework allows one to predict also the value of standard chemical retrieval metrics, such as Sensitivity and Specificity at fixed thresholds, or ROC (Receiver Operating Characteristic) curves at multiple thresholds, and to detect outliers in the form of atypical molecules. Numerous and diverse experiments carried out in part with large sets of molecules from the ChemDB show remarkable agreement between theory and empirical results. PMID:20540577
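The Tanimoto similarity on binary fingerprints that this framework builds on is the ratio of shared on-bits to the union of on-bits. The sketch below represents each fingerprint as the set of its on-bit positions; the fingerprints are hypothetical, and the value returned for two empty fingerprints is a convention of this sketch.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two binary fingerprints given as sets of on-bits."""
    union = len(a | b)
    if union == 0:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / union


# Hypothetical fingerprints, for illustration only.
query = {1, 2, 3}
database = {
    "mol-1": {2, 3, 4},
    "mol-2": {1, 2, 3},
    "mol-3": {7, 8},
}

scores = {name: tanimoto(query, bits) for name, bits in database.items()}
print(scores)
```

The ratio form, intersection over union, is what motivates the abstract's approximation of the score distribution by a ratio of two correlated random variables.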
Wang, Bo; Shi, Zhanquan; Weber, Georg F; Kennedy, Michael A
2013-10-01
Nuclear magnetic resonance (NMR) spectroscopy-based metabonomics is of growing importance for discovery of human disease biomarkers. Identification and validation of disease biomarkers using statistical significance analysis (SSA) is critical for translation to clinical practice. SSA is performed by assessing a null hypothesis test using a derivative of the Student's t test, e.g., a Welch's t test. Choosing how to correct the significance level for rejecting null hypotheses in the case of multiple testing to maintain a constant family-wise type I error rate is a common problem in such tests. The multiple testing problem arises because the likelihood of falsely rejecting the null hypothesis, i.e., a false positive, grows as the number of tests applied to the same data set increases. Several methods have been introduced to address this problem. Bonferroni correction (BC) assumes all variables are independent and therefore sacrifices sensitivity for detecting true positives in partially dependent data sets. False discovery rate (FDR) methods are more sensitive than BC but uniformly ascribe highest stringency to lowest p value variables. Here, we introduce standard deviation step down (SDSD), which is more sensitive and appropriate than BC for partially dependent data sets. Sensitivity and type I error rate of SDSD can be adjusted based on the degree of variable dependency. SDSD generates fundamentally different profiles of critical p values compared with FDR methods potentially leading to reduced type II error rates. SDSD is increasingly sensitive for more concentrated metabolites. SDSD is demonstrated using NMR-based metabonomics data collected on three different breast cancer cell line extracts.
Coulson, Melissa; Healey, Michelle; Fidler, Fiona; Cumming, Geoff
2010-01-01
A statistically significant result, and a non-significant result may differ little, although significance status may tempt an interpretation of difference. Two studies are reported that compared interpretation of such results presented using null hypothesis significance testing (NHST), or confidence intervals (CIs). Authors of articles published in psychology, behavioral neuroscience, and medical journals were asked, via email, to interpret two fictitious studies that found similar results, one statistically significant, and the other non-significant. Responses from 330 authors varied greatly, but interpretation was generally poor, whether results were presented as CIs or using NHST. However, when interpreting CIs respondents who mentioned NHST were 60% likely to conclude, unjustifiably, the two results conflicted, whereas those who interpreted CIs without reference to NHST were 95% likely to conclude, justifiably, the two results were consistent. Findings were generally similar for all three disciplines. An email survey of academic psychologists confirmed that CIs elicit better interpretations if NHST is not invoked. Improved statistical inference can result from encouragement of meta-analytic thinking and use of CIs but, for full benefit, such highly desirable statistical reform requires also that researchers interpret CIs without recourse to NHST. PMID:21607077
A Generative Statistical Algorithm for Automatic Detection of Complex Postures
Amit, Yali; Biron, David
2015-01-01
This paper presents a method for automated detection of complex (non-self-avoiding) postures of the nematode Caenorhabditis elegans and its application to analyses of locomotion defects. Our approach is based on progressively detailed statistical models that enable detection of the head and the body even in cases of severe coilers, where data from traditional trackers is limited. We restrict the input available to the algorithm to a single digitized frame, such that manual initialization is not required and the detection problem becomes embarrassingly parallel. Consequently, the proposed algorithm does not propagate detection errors and naturally integrates in a “big data” workflow used for large-scale analyses. Using this framework, we analyzed the dynamics of postures and locomotion of wild-type animals and mutants that exhibit severe coiling phenotypes. Our approach can readily be extended to additional automated tracking tasks such as tracking pairs of animals (e.g., for mating assays) or different species. PMID:26439258
A statistical modeling approach for detecting generalized synchronization
Schumacher, Johannes; Haslinger, Robert; Pipa, Gordon
2012-01-01
Detecting nonlinear correlations between time series presents a hard problem for data analysis. We present a generative statistical modeling method for detecting nonlinear generalized synchronization. Truncated Volterra series are used to approximate functional interactions. The Volterra kernels are modeled as linear combinations of basis splines, whose coefficients are estimated via l1 and l2 regularized maximum likelihood regression. The regularization manages the high number of kernel coefficients and allows feature selection strategies yielding sparse models. The method's performance is evaluated on different coupled chaotic systems in various synchronization regimes and analytical results for detecting m:n phase synchrony are presented. Experimental applicability is demonstrated by detecting nonlinear interactions between neuronal local field potentials recorded in different parts of macaque visual cortex. PMID:23004851
Detection of reflecting surfaces by a statistical model
NASA Astrophysics Data System (ADS)
He, Qiang; Chu, Chee-Hung H.
2009-02-01
Remote sensing is widely used to assess the destruction from natural disasters and to plan relief and recovery operations. How to automatically extract useful features and segment interesting objects from digital images, including remote sensing imagery, has become a critical task for image understanding. Unfortunately, current research on automated feature extraction ignores contextual information. As a result, the fidelity of populating attributes corresponding to interesting features and objects cannot be satisfied. In this paper, we present an exploration of meaningful object extraction integrating reflecting surfaces. Detection of specular reflecting surfaces can be useful in target identification and can then be applied to environmental monitoring, disaster prediction and analysis, military, and counter-terrorism. Our method is based on a statistical model that captures the statistical properties of specular reflecting surfaces. The reflecting surfaces are then detected through cluster analysis.
Statistically normalized coherent change detection for synthetic aperture sonar imagery
NASA Astrophysics Data System (ADS)
G-Michael, Tesfaye; Tucker, J. D.; Roberts, Rodney G.
2016-05-01
Coherent Change Detection (CCD) is a process of highlighting an area of activity in scenes (seafloor) under survey and generated from pairs of synthetic aperture sonar (SAS) images of approximately the same location observed at two different time instances. The problem of CCD and subsequent anomaly feature extraction/detection is complicated due to several factors such as the presence of random speckle pattern in the images, changing environmental conditions, and platform instabilities. These complications make the detection of weak target activities even more difficult. Typically, the degree of similarity between two images measured at each pixel location is the coherence between the complex pixel values in the two images. Higher coherence indicates little change in the scene represented by the pixel and lower coherence indicates change activity in the scene. Such a coherence estimation scheme based on the pixel intensity correlation is an ad-hoc procedure where the effectiveness of the change detection is determined by the choice of threshold, which can lead to high false alarm rates. In this paper, we propose a novel approach for anomalous change pattern detection using the statistical normalized coherence and multi-pass coherent processing. This method may be used to mitigate shadows by reducing the false alarms resulting in the coherent map due to speckles and shadows. Test results of the proposed methods on a data set of SAS images will be presented, illustrating the effectiveness of the normalized coherence in terms of statistics from multi-pass survey of the same scene.
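The conventional coherence between complex pixel values that this abstract takes as its starting point is commonly estimated over a local window as the magnitude of the normalised complex cross-correlation. The sketch below shows that standard estimator only, on hypothetical pixel values; the statistically normalized coherence proposed in the paper is not specified in the abstract and is not implemented here.

```python
import cmath
import math


def coherence(a, b):
    """Sample coherence magnitude between two windows of complex pixels.

    Returns a value in [0, 1]; 0.0 is returned if either window has no
    energy, as a convention of this sketch.
    """
    cross = sum(x * y.conjugate() for x, y in zip(a, b))
    energy = sum(abs(x) ** 2 for x in a) * sum(abs(y) ** 2 for y in b)
    if energy == 0:
        return 0.0
    return abs(cross) / math.sqrt(energy)


# Hypothetical complex pixel windows, for illustration only.
reference = [complex(1, 1), complex(2, -1), complex(0.5, 0.3), complex(-1, 2)]
# Same scene with a constant phase rotation: coherence stays at 1.
rotated = [z * cmath.exp(0.3j) for z in reference]
# Changed scene: unrelated pixel values.
changed = [complex(-2, 0.1), complex(0.3, 1.5), complex(1, -1), complex(0.2, 0.2)]

print(coherence(reference, rotated))
print(coherence(reference, changed))
```

Change detection then amounts to thresholding this value per window, which is the threshold sensitivity the abstract identifies as a source of false alarms.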
Statistical Mechanics of the Community Detection Problem: Theory and Application
NASA Astrophysics Data System (ADS)
Hu, Dandan
We study phase transitions in spin glass type systems and in related computational problems. In the current work, we focus on the "community detection" problem when cast in terms of a general Potts spin glass type problem. We report on phase transitions between solvable and unsolvable regimes. The solvable region may further split into easy and hard phases. Spin glass type phase transitions appear at both low and high temperatures. Low temperature transitions correspond to an order by disorder type effect wherein fluctuations render the system ordered or solvable. Separate transitions appear at higher temperatures into a disordered (or an unsolvable) phase. Different sorts of randomness lead to disparate behaviors. We illustrate the spin glass character of both transitions and report on memory effects. We further relate Potts type spin systems to mechanical analogs and suggest how chaotic-type behavior in general thermodynamic systems can indeed naturally arise in hard-computational problems and spin-glasses. In this work, we also examine large networks (with a power law distribution in cluster size) that have a large number of communities. We infer that large systems at a constant ratio of q to the number of nodes N asymptotically tend toward insolvability in the limit of large N for any positive temperature. We further employ multivariate Tutte polynomials to show that increasing q emulates increasing T for a general Potts model, leading to a similar stability region at low T. We further apply the replica inference based Potts model method to unsupervised image segmentation on multiple scales. This approach was inspired by the statistical mechanics problem of "community detection" and its phase diagram. The problem is cast as identifying tightly bound clusters against a background. Within our multiresolution approach, we compute information theory based correlations among multiple solutions of the same graph over a range of resolutions. Significant multiresolution
Statistical feature selection for enhanced detection of brain tumor
NASA Astrophysics Data System (ADS)
Chaddad, Ahmad; Colen, Rivka R.
2014-09-01
Feature-based methods are widely used in brain tumor recognition systems. Robust early cancer detection is one of the most valuable uses of image processing tools. Specifically, statistical features, such as geometric mean, harmonic mean, mean excluding outliers, median, percentiles, skewness and kurtosis, have been extracted from brain tumor glioma to aid in discriminating two levels, namely Level I and Level II, using the fluid attenuated inversion recovery (FLAIR) sequence in the diagnosis of brain tumor. Statistical features describe the major characteristics of each glioma level, which is an important step in evaluating the heterogeneity of cancer area pixels. In this paper, we address the task of feature selection to identify the relevant subset of features in the statistical domain, while discarding those that are either redundant or confusing, thereby improving the performance of the feature-based scheme to distinguish between Level I and Level II. We apply a Decision Structure algorithm to find the optimal combination of nonhomogeneity based statistical features for the problem at hand. We employ a Naïve Bayes classifier to evaluate the performance of the optimal statistical feature based scheme in terms of its glioma Level I and Level II discrimination capability, using real data collected from 17 patients with glioblastoma multiforme (GBM). The dataset was provided by MD Anderson Cancer Center and acquired on a 3 Tesla MR imaging system. For the specific data analyzed, it is shown that the identified dominant features yield higher classification accuracy, with a lower number of false alarms and missed detections, compared to the full statistical based feature set. This work proposes and analyzes specific GBM types, Level I and Level II, and the dominant features are considered as an aid to prognostic indicators. These features were selected automatically to be better able to determine prognosis from classical imaging studies.
[Tests of statistical significance in three biomedical journals: a critical review].
Sarria Castro, Madelaine; Silva Ayçaguer, Luis Carlos
2004-05-01
To describe the use of conventional tests of statistical significance and the current trends shown by their use in three biomedical journals read in Spanish-speaking countries. All descriptive or explanatory original articles published in the five-year period of 1996 through 2000 were reviewed in three journals: Revista Cubana de Medicina General Integral [Cuban Journal of Comprehensive General Medicine], Revista Panamericana de Salud Pública/Pan American Journal of Public Health, and Medicina Clínica [Clinical Medicine] (which is published in Spain). In the three journals that were reviewed various shortcomings were found in their use of hypothesis tests based on P values and in the limited use of new tools that have been suggested for use in their place: confidence intervals (CIs) and Bayesian inference. The basic findings of our research were: minimal use of CIs, as either a complement to significance tests or as the only statistical tool; mentions of a small sample size as a possible explanation for the lack of statistical significance; a predominant use of rigid alpha values; a lack of uniformity in the presentation of results; and improper reference in the research conclusions to the results of hypothesis tests. Our results indicate the lack of compliance by authors and editors with accepted standards for the use of tests of statistical significance. The findings also highlight that the stagnant use of these tests continues to be a common practice in the scientific literature.
ERIC Educational Resources Information Center
Snyder, Patricia; Lawson, Stephen
Magnitude of effect measures (MEMs), when adequately understood and correctly used, are important aids for researchers who do not want to rely solely on tests of statistical significance in substantive result interpretation. The MEM tells how much of the dependent variable can be controlled, predicted, or explained by the independent variables.…
Interpreting Statistical Significance Test Results: A Proposed New "What If" Method.
ERIC Educational Resources Information Center
Kieffer, Kevin M.; Thompson, Bruce
As the 1994 publication manual of the American Psychological Association emphasized, "p" values are affected by sample size. As a result, it can be helpful to interpret the results of statistical significance tests in a sample size context by conducting so-called "what if" analyses. However, these methods can be inaccurate…
Recent Literature on Whether Statistical Significance Tests Should or Should Not Be Banned.
ERIC Educational Resources Information Center
Deegear, James
This paper summarizes the literature regarding statistical significance testing, with an emphasis on recent literature in various disciplines and literature exploring why researchers have demonstrably failed to be influenced by the American Psychological Association publication manual's encouragement to report effect sizes. Also considered are…
Statistical Significance of the Trends in Monthly Heavy Precipitation Over the US
Mahajan, Salil; North, Dr. Gerald R.; Saravanan, Dr. R.; Genton, Dr. Marc G.
2012-01-01
Trends in monthly heavy precipitation, defined by a return period of one year, are assessed for statistical significance in observations and Global Climate Model (GCM) simulations over the contiguous United States using Monte Carlo non-parametric and parametric bootstrapping techniques. The results from the two Monte Carlo approaches are found to be similar to each other, and also to the traditional non-parametric Kendall's τ test, implying the robustness of the approach. Two different observational data-sets are employed to test for trends in monthly heavy precipitation and are found to exhibit consistent results. Both data-sets demonstrate upward trends, one of which is found to be statistically significant at the 95% confidence level. Upward trends similar to observations are observed in some climate model simulations of the twentieth century, but their statistical significance is marginal. For projections of the twenty-first century, a statistically significant upward trend is observed in most of the climate models analyzed. The change in the simulated precipitation variance appears to be more important in the twenty-first century projections than changes in the mean precipitation. Stochastic fluctuations of the climate system are found to dominate monthly heavy precipitation, as some GCM simulations show a downward trend even in the twenty-first century projections when the greenhouse gas forcings are strong.
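A minimal sketch of a non-parametric Monte Carlo trend test of the kind this abstract describes. It is not the authors' procedure: the least-squares slope as the trend statistic, the resampling scheme (drawing values with replacement so that time order is destroyed under the null of no trend), and the names `ols_slope` and `bootstrap_trend_pvalue` are all assumptions made for illustration, and the series is synthetic.

```python
import random

def ols_slope(y):
    """Least-squares slope of y against its time index 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den

def bootstrap_trend_pvalue(y, n_boot=2000, seed=0):
    """Two-sided Monte Carlo p-value for the observed slope.

    Null hypothesis (assumed here): values are exchangeable in time,
    so resampling them with replacement yields trend-free series.
    """
    rng = random.Random(seed)
    observed = ols_slope(y)
    exceed = 0
    for _ in range(n_boot):
        resampled = [rng.choice(y) for _ in y]
        if abs(ols_slope(resampled)) >= abs(observed):
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_boot + 1)

# Synthetic example: a weak upward trend buried in unit-variance noise.
rng = random.Random(1)
series = [0.05 * t + rng.gauss(0.0, 1.0) for t in range(60)]
slope, p_value = bootstrap_trend_pvalue(series)
print(f"slope = {slope:.4f}, Monte Carlo p = {p_value:.4f}")
```

Resampling individual values ignores serial correlation; for autocorrelated climate series a block bootstrap would be the more cautious choice.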
ERIC Educational Resources Information Center
Thompson, Bruce
This paper evaluates the logic underlying various criticisms of statistical significance testing and makes specific recommendations for scientific and editorial practice that might better increase the knowledge base. Reliance on the traditional hypothesis testing model has led to a major bias against nonsignificant results and to misinterpretation…
Alphas and Asterisks: The Development of Statistical Significance Testing Standards in Sociology
ERIC Educational Resources Information Center
Leahey, Erin
2005-01-01
In this paper, I trace the development of statistical significance testing standards in sociology by analyzing data from articles published in two prestigious sociology journals between 1935 and 2000. I focus on the role of two key elements in the diffusion literature, contagion and rationality, as well as the role of institutional factors. I…
Statistical Significance Testing in "Educational and Psychological Measurement" and Other Journals.
ERIC Educational Resources Information Center
Daniel, Larry G.
Statistical significance tests (SSTs) have been the object of much controversy among social scientists. Proponents have hailed SSTs as an objective means for minimizing the likelihood that chance factors have contributed to research results. Critics have both questioned the logic underlying SSTs and bemoaned the widespread misapplication and…
ERIC Educational Resources Information Center
Linting, Marielle; van Os, Bart Jan; Meulman, Jacqueline J.
2011-01-01
In this paper, the statistical significance of the contribution of variables to the principal components in principal components analysis (PCA) is assessed nonparametrically by the use of permutation tests. We compare a new strategy to a strategy used in previous research consisting of permuting the columns (variables) of a data matrix…
Weighing the costs of different errors when determining statistical significance during monitoring
USDA-ARS's Scientific Manuscript database
Selecting appropriate significance levels when constructing confidence intervals and performing statistical analyses with rangeland monitoring data is not a straightforward process. This process is burdened by the conventional selection of “95% confidence” (i.e., Type I error rate, α = 0.05) as the d...
ERIC Educational Resources Information Center
Spinella, Sarah
2011-01-01
As result replicability is essential to science and difficult to achieve through external replicability, the present paper notes the insufficiency of null hypothesis statistical significance testing (NHSST) and explains the bootstrap as a plausible alternative, with a heuristic example to illustrate the bootstrap method. The bootstrap relies on…
Hulshizer, Randall; Blalock, Eric M
2007-01-01
Background Researchers using RNA expression microarrays in experimental designs with more than two treatment groups often identify statistically significant genes with ANOVA approaches. However, the ANOVA test does not discriminate which of the multiple treatment groups differ from one another. Thus, post hoc tests, such as linear contrasts, template correlations, and pairwise comparisons are used. Linear contrasts and template correlations work extremely well, especially when the researcher has a priori information pointing to a particular pattern/template among the different treatment groups. Further, all pairwise comparisons can be used to identify particular, treatment group-dependent patterns of gene expression. However, these approaches are biased by the researcher's assumptions, and some treatment-based patterns may fail to be detected using these approaches. Finally, different patterns may have different probabilities of occurring by chance, importantly influencing researchers' conclusions about a pattern and its constituent genes. Results We developed a four step, post hoc pattern matching (PPM) algorithm to automate single channel gene expression pattern identification/significance. First, 1-Way Analysis of Variance (ANOVA), coupled with post hoc 'all pairwise' comparisons, is calculated for all genes. Second, for each ANOVA-significant gene, all pairwise contrast results are encoded to create unique pattern ID numbers. The number of genes found in each pattern in the data is identified as that pattern's 'actual' frequency. Third, using Monte Carlo simulations, those patterns' frequencies are estimated in random data ('random' gene pattern frequency). Fourth, a Z-score for overrepresentation of the pattern is calculated ('actual' against 'random' gene pattern frequencies). We wrote a Visual Basic program (StatiGen) that automates the PPM procedure, constructs an Excel workbook with standardized graphs of overrepresented patterns, and lists of the genes comprising
A statistical analysis of the detection limits of fast photometry
NASA Astrophysics Data System (ADS)
Mary, D. L.
2006-06-01
This work investigates the statistical limits for the detection of stellar variability using ground based fast photometry. We show that when sky transparency variations are very low or have been efficiently removed from the raw light curve, the overall noise is of a Mixed Poisson (MP) nature (photon noise mixed by scintillation). As a consequence, three regimes appear for the detection of photometric variations depending on the star's brightness (scintillation, scintillation and photon noise, photon noise and sky background). The proposed analysis is mainly applied to the Indian sites of Manora Peak (existing 104 cm telescope) and Devasthal (future 1 m automated telescope, and 3 m telescope project). As shown by some examples, it can be applied to any site with the corresponding parameters. For 1 m class telescopes at an altitude of about 2000 m, the frontier magnitudes between the different detection regimes are about 10 mag and 15 mag. By analysing the corresponding statistics of the MP noise periodogram, the minimum amplitude variation that one can detect with a given confidence level is evaluated for each observational setting. For example, with a 3 m telescope at about 2500 m, ≈120 μmag variations would be detected in 2 h with a 99% confidence level for stars brighter than magnitude 12. For a star of 15th magnitude, ≈400 μmag oscillations would still be detected at that level. These detection limits are discussed in the light of observations obtained in Manora peak, and compared to results obtained at different astronomical sites.
2010-01-01
Background The null hypothesis significance test (NHST) is the most frequently used statistical method, although its inferential validity has been widely criticized since its introduction. In 1988, the International Committee of Medical Journal Editors (ICMJE) warned against sole reliance on NHST to substantiate study conclusions and suggested supplementary use of confidence intervals (CI). Our objective was to evaluate the extent and quality in the use of NHST and CI, both in English and Spanish language biomedical publications between 1995 and 2006, taking into account the International Committee of Medical Journal Editors recommendations, with particular focus on the accuracy of the interpretation of statistical significance and the validity of conclusions. Methods Original articles published in three English and three Spanish biomedical journals in three fields (General Medicine, Clinical Specialties and Epidemiology - Public Health) were considered for this study. Papers published in 1995-1996, 2000-2001, and 2005-2006 were selected through a systematic sampling method. After excluding the purely descriptive and theoretical articles, analytic studies were evaluated for their use of NHST with P-values and/or CI for interpretation of statistical "significance" and "relevance" in study conclusions. Results Among 1,043 original papers, 874 were selected for detailed review. The exclusive use of P-values was less frequent in English language publications as well as in Public Health journals; overall such use decreased from 41% in 1995-1996 to 21% in 2005-2006. While the use of CI increased over time, the "significance fallacy" (to equate statistical and substantive significance) appeared very often, mainly in journals devoted to clinical specialties (81%). In papers originally written in English and Spanish, 15% and 10%, respectively, mentioned statistical significance in their conclusions. Conclusions Overall, results of our review show some improvements in
Statistical significance of variables driving systematic variation in high-dimensional data
Chung, Neo Christopher; Storey, John D.
2015-01-01
Motivation: There are a number of well-established methods such as principal component analysis (PCA) for automatically capturing systematic variation due to latent variables in large-scale genomic data. PCA and related methods may directly provide a quantitative characterization of a complex biological variable that is otherwise difficult to precisely define or model. An unsolved problem in this context is how to systematically identify the genomic variables that are drivers of systematic variation captured by PCA. Principal components (PCs) (and other estimates of systematic variation) are directly constructed from the genomic variables themselves, making measures of statistical significance artificially inflated when using conventional methods due to over-fitting. Results: We introduce a new approach called the jackstraw that allows one to accurately identify genomic variables that are statistically significantly associated with any subset or linear combination of PCs. The proposed method can greatly simplify complex significance testing problems encountered in genomics and can be used to identify the genomic variables significantly associated with latent variables. Using simulation, we demonstrate that our method attains accurate measures of statistical significance over a range of relevant scenarios. We consider yeast cell-cycle gene expression data, and show that the proposed method can be used to straightforwardly identify genes that are cell-cycle regulated with an accurate measure of statistical significance. We also analyze gene expression data from post-trauma patients, allowing the gene expression data to provide a molecularly driven phenotype. Using our method, we find a greater enrichment for inflammatory-related gene sets compared to the original analysis that uses a clinically defined, although likely imprecise, phenotype. The proposed method provides a useful bridge between large-scale quantifications of systematic variation and gene
Discrete Fourier Transform: statistical effect size and significance of Fourier components.
NASA Astrophysics Data System (ADS)
Crockett, Robin
2016-04-01
A key analytical technique in the context of investigating cyclic/periodic features in time-series (and other sequential data) is the Discrete (Fast) Fourier Transform (DFT/FFT). However, assessment of the statistical effect-size and significance of the Fourier components in the DFT/FFT spectrum can be subjective and variable. This presentation will outline an approach and method for the statistical evaluation of the effect-size and significance of individual Fourier components from their DFT/FFT coefficients. The effect size is determined in terms of the proportions of the variance in the time-series that individual components account for. The statistical significance is determined using an hypothesis-test / p-value approach with respect to a null hypothesis that the time-series has no linear dependence on a given frequency (of a Fourier component). This approach also allows spectrograms to be presented in terms of these statistical parameters. The presentation will use sunspot cycles as an illustrative example.
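The effect-size idea in this abstract (the proportion of the time-series variance that each Fourier component accounts for) can be sketched as follows. This is only an illustration under assumptions: the function name `variance_fractions` and the synthetic two-component signal are hypothetical, a direct O(N²) DFT is used in place of an FFT to stay within the standard library, and the significance test described in the abstract is not reproduced.

```python
import cmath
import math

def variance_fractions(x):
    """Fraction of the variance of x explained by each Fourier frequency.

    Returns {k: fraction} for k = 1 .. N//2. By Parseval's theorem the
    fractions sum to 1 once the mean (k = 0) has been removed.
    """
    n = len(x)
    mean = sum(x) / n
    centred = [v - mean for v in x]
    total = sum(v * v for v in centred)  # equals n * variance
    fractions = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centred))
        # Frequencies k and n - k are conjugate pairs for real data and are
        # counted together, except the Nyquist term when n is even.
        weight = 1 if (n % 2 == 0 and k == n // 2) else 2
        fractions[k] = weight * abs(coeff) ** 2 / (n * total)
    return fractions

# Synthetic series: amplitude 3 at 4 cycles and amplitude 1 at 10 cycles.
n = 64
signal = [3.0 * math.sin(2 * math.pi * 4 * t / n)
          + 1.0 * math.cos(2 * math.pi * 10 * t / n) for t in range(n)]
fractions = variance_fractions(signal)
for k in (4, 10):
    print(f"frequency index {k}: {fractions[k]:.3f} of the variance")
```

A sinusoid of amplitude A contributes A²/2 to the variance, so the two components here account for 4.5/5 = 0.9 and 0.5/5 = 0.1 of the total.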
Seismicity driven by transient aseismic processes: Detection and statistical modeling
NASA Astrophysics Data System (ADS)
Hainzl, S.; Marsan, D.
2012-04-01
It is widely accepted that Coulomb failure stress variations underlie earthquake activity. Usually two components of stress variations are considered, the slow and stationary stress build-up due to tectonic forcing and static stress changes related to earthquake occurrences. In this case, the epidemic-type aftershock sequence (ETAS) model has been shown to describe successfully the spatiotemporal evolution of the statistical properties of seismicity. However, in many cases, seismicity might be locally dominated by stress changes related to transient aseismic processes such as magma intrusion, fluid flow or slow slip events which are not directly observable in general. Therefore, it is important to account for those potential transients, firstly to avoid erroneous model fitting leading to biased forecasts and secondly to retrieve important information about the underlying transient processes. In this work, we apply a recently developed methodology to identify the time-dependent background term, which is based on iteratively applying an ETAS-based declustering where the size of the internally applied smoothing filter is set by the Akaike information criterion. This procedure is shown to work well for synthetic data sets. We find that the estimated model parameters are biased if the time-dependence is not taken into account. In particular, the alpha-value describing the magnitude-dependence of the trigger potential can be strongly underestimated if transients are ignored. Low alpha-values have been previously found to indicate swarm activity which is often related to transient processes. Thus observed anomalous alpha-values might refer to transient forcing rather than to differences in the earthquake-earthquake trigger mechanism. To explore this, we apply the procedure systematically to earthquake clusters detected in Southern California and to earthquake swarm data in Vogtland/Western Bohemia. We identify clusters with significant transient forcing and show
Simulated performance of an order statistic threshold strategy for detection of narrowband signals
NASA Technical Reports Server (NTRS)
Satorius, E.; Brady, R.; Deich, W.; Gulkis, S.; Olsen, E.
1988-01-01
The application of order statistics to signal detection is becoming an increasingly active area of research. This is due to the inherent robustness of rank estimators in the presence of large outliers that would significantly degrade more conventional mean-level-based detection systems. A detection strategy is presented in which the threshold estimate is obtained using order statistics. The performance of this algorithm in the presence of simulated interference and broadband noise is evaluated. In this way, the robustness of the proposed strategy in the presence of the interference can be fully assessed as a function of the interference, noise, and detector parameters.
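To illustrate why a rank-based threshold is robust to large outliers, here is a small sketch comparing a mean-level threshold with one derived from the sample median. It is an assumption-laden toy, not the algorithm evaluated in the paper: the exponential noise-power model, the factor of 12, the bin indices, and the helper names are all hypothetical choices.

```python
import math
import random
import statistics

def mean_level_threshold(powers, factor):
    """Threshold set as a multiple of the mean bin power."""
    return factor * statistics.fmean(powers)

def order_statistic_threshold(powers, factor):
    """Threshold set from the median, an order statistic.

    Assumes exponentially distributed noise power, for which the median
    equals mean * ln 2, so median / ln 2 estimates the noise mean.
    """
    noise_level = statistics.median(powers) / math.log(2)
    return factor * noise_level

def detect(powers, threshold):
    """Indices of bins whose power exceeds the threshold."""
    return [i for i, p in enumerate(powers) if p > threshold]

# Simulated spectrum: unit-mean broadband noise in 256 bins,
# strong interference in five bins, and a weak narrowband signal.
rng = random.Random(42)
powers = [rng.expovariate(1.0) for _ in range(256)]
for i in (20, 21, 22, 23, 24):
    powers[i] = 500.0   # interference
powers[100] = 25.0      # weak signal of interest

factor = 12.0
mean_hits = detect(powers, mean_level_threshold(powers, factor))
os_hits = detect(powers, order_statistic_threshold(powers, factor))
print("mean-level detections:     ", mean_hits)
print("order-statistic detections:", os_hits)
```

The interference inflates the mean by roughly a factor of ten, so the mean-level threshold rises above the weak signal, while the median barely moves and bin 100 is still detected.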
The Effects of Electrode Impedance on Data Quality and Statistical Significance in ERP Recordings
Kappenman, Emily S.; Luck, Steven J.
2010-01-01
To determine whether data quality is meaningfully reduced by high electrode impedance, EEG was recorded simultaneously from low- and high-impedance electrode sites during an oddball task. Low-frequency noise was found to be increased at high-impedance sites relative to low-impedance sites, especially when the recording environment was warm and humid. The increased noise at the high-impedance sites caused an increase in the number of trials needed to obtain statistical significance in analyses of P3 amplitude, but this could be partially mitigated by high-pass filtering and artifact rejection. High electrode impedance did not reduce statistical power for the N1 wave unless the recording environment was warm and humid. Thus, high electrode impedance may increase noise and decrease statistical power under some conditions, but these effects can be reduced by using a cool and dry recording environment and appropriate signal processing methods. PMID:20374541
Jakobsen, Janus Christian; Wetterslev, Jørn; Winkel, Per; Lange, Theis; Gluud, Christian
2014-11-21
Thresholds for statistical significance when assessing meta-analysis results are being insufficiently demonstrated by traditional 95% confidence intervals and P-values. Assessment of intervention effects in systematic reviews with meta-analysis deserves greater rigour. Methodologies for assessing statistical and clinical significance of intervention effects in systematic reviews were considered. Balancing simplicity and comprehensiveness, an operational procedure was developed, based mainly on The Cochrane Collaboration methodology and the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) guidelines. We propose an eight-step procedure for better validation of meta-analytic results in systematic reviews: (1) Obtain the 95% confidence intervals and the P-values from both fixed-effect and random-effects meta-analyses and report the most conservative results as the main results. (2) Explore the reasons behind substantial statistical heterogeneity using subgroup and sensitivity analyses (see step 6). (3) To take account of problems with multiplicity, adjust the thresholds for significance according to the number of primary outcomes. (4) Calculate required information sizes (≈ the a priori required number of participants for a meta-analysis to be conclusive) for all outcomes and analyse each outcome with trial sequential analysis. Report whether the trial sequential monitoring boundaries for benefit, harm, or futility are crossed. (5) Calculate Bayes factors for all primary outcomes. (6) Use subgroup analyses and sensitivity analyses to assess the potential impact of bias on the review results. (7) Assess the risk of publication bias. (8) Assess the clinical significance of the statistically significant review results. If followed, the proposed eight-step procedure will increase the validity of assessments of intervention effects in systematic reviews of randomised clinical trials.
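Step 1 of the procedure can be sketched in a few lines. The sketch below is an illustration under stated assumptions, not the authors' software: it uses inverse-variance pooling, the DerSimonian-Laird estimator for the between-trial variance, a normal approximation for the confidence interval and P-value, and it reads "most conservative" as "the model with the larger standard error". The trial effects and variances are invented.

```python
from math import sqrt
from statistics import NormalDist

def pool(effects, variances, tau2=0.0):
    """Inverse-variance pooled estimate; tau2 = 0 gives the fixed-effect model."""
    weights = [1.0 / (v + tau2) for v in variances]
    estimate = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    se = sqrt(1.0 / sum(weights))
    p = 2.0 * (1.0 - NormalDist().cdf(abs(estimate / se)))
    ci = (estimate - 1.96 * se, estimate + 1.96 * se)
    return {"estimate": estimate, "se": se, "ci": ci, "p": p}

def dersimonian_laird_tau2(effects, variances):
    """Method-of-moments between-trial variance (needs at least two trials)."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    c = sum(w) - sum(wi * wi for wi in w) / sum(w)
    return max(0.0, (q - (len(effects) - 1)) / c)

def most_conservative(effects, variances):
    """Return (model name, result) for the model with the larger standard error."""
    fixed = pool(effects, variances)
    tau2 = dersimonian_laird_tau2(effects, variances)
    random_fx = pool(effects, variances, tau2)
    candidates = [("fixed-effect", fixed), ("random-effects", random_fx)]
    return max(candidates, key=lambda item: item[1]["se"])

# Hypothetical log odds ratios and their variances from four trials.
effects = [-0.40, -0.10, -0.65, 0.05]
variances = [0.02, 0.03, 0.05, 0.04]
name, result = most_conservative(effects, variances)
print(name, result)
```

Because the between-trial variance is never negative, the random-effects standard error is never smaller than the fixed-effect one under this estimator; the two coincide when no heterogeneity is detected.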
Dechartres, Agnes; Bond, Elizabeth G; Scheer, Jordan; Riveros, Carolina; Atal, Ignacio; Ravaud, Philippe
2016-11-30
Publication bias and other reporting bias have been well documented for journal articles, but no study has evaluated the nature of results posted at ClinicalTrials.gov. We aimed to assess how many randomized controlled trials (RCTs) with results posted at ClinicalTrials.gov report statistically significant results and whether the proportion of trials with significant results differs when no treatment effect estimate or p-value is posted. We searched ClinicalTrials.gov in June 2015 for all studies with results posted. We included completed RCTs with a superiority hypothesis and considered results for the first primary outcome with results posted. For each trial, we assessed whether a treatment effect estimate and/or p-value was reported at ClinicalTrials.gov and if yes, whether results were statistically significant. If no treatment effect estimate or p-value was reported, we calculated the treatment effect and corresponding p-value using results per arm posted at ClinicalTrials.gov when sufficient data were reported. From the 17,536 studies with results posted at ClinicalTrials.gov, we identified 2823 completed phase 3 or 4 randomized trials with a superiority hypothesis. Of these, 1400 (50%) reported a treatment effect estimate and/or p-value. Results were statistically significant for 844 trials (60%), with a median p-value of 0.01 (Q1-Q3: 0.001-0.26). For the 1423 trials with no treatment effect estimate or p-value posted, we could calculate the treatment effect and corresponding p-value using results reported per arm for 929 (65%). For 494 trials (35%), p-values could not be calculated mainly because of insufficient reporting, censored data, or repeated measurements over time. For the 929 trials we could calculate p-values, we found statistically significant results for 342 (37%), with a median p-value of 0.19 (Q1-Q3: 0.005-0.59). Half of the trials with results posted at ClinicalTrials.gov reported a treatment effect estimate and/or p-value, with significant
Statistical detection of nanoparticles in cells by darkfield microscopy.
Gnerucci, Alessio; Romano, Giovanni; Ratto, Fulvio; Centi, Sonia; Baccini, Michela; Santosuosso, Ugo; Pini, Roberto; Fusi, Franco
2016-07-01
In the fields of nanomedicine, biophotonics and radiation therapy, nanoparticle (NP) detection in cell models often represents a fundamental step for many in vivo studies. One common question is whether NPs have or have not interacted with cells. In this context, we propose an imaging based technique to detect the presence of NPs in eukaryotic cells. Darkfield images of cell cultures at low magnification (10×) are acquired in different spectral ranges and recombined so as to enhance the contrast due to the presence of NPs. Image analysis is applied to extract cell-based parameters (i.e. mean intensity), which are further analyzed by statistical tests (Student's t-test, permutation test) in order to obtain a robust detection method. By means of a statistical sample size analysis, the sensitivity of the whole methodology is quantified in terms of the minimum cell number that is needed to identify the presence of NPs. The method is presented in the case of HeLa cells incubated with gold nanorods labeled with anti-CA125 antibodies, which exploits the overexpression of CA125 in ovarian cancers. Control cases are considered as well, including PEG-coated NPs and HeLa cells without NPs. Copyright © 2016. Published by Elsevier Ltd.
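The permutation test mentioned above can be sketched as follows for a per-cell parameter such as mean intensity. The data are synthetic and the group sizes, intensity values, and function name `permutation_test` are assumptions for illustration only; the paper's actual image-analysis pipeline is not reproduced.

```python
import random

def permutation_test(group_a, group_b, n_perm=5000, seed=0):
    """Two-sided permutation p-value for a difference in group means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    count = 0
    for _ in range(n_perm):
        # Randomly reassign cells to the two groups, keeping group sizes.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / n_a - sum(perm_b) / len(perm_b))
        if diff >= observed:
            count += 1
    # Add-one correction: the observed labelling counts as one permutation.
    return (count + 1) / (n_perm + 1)

# Synthetic per-cell mean intensities (arbitrary units).
rng = random.Random(7)
cells_with_np = [rng.gauss(120.0, 10.0) for _ in range(40)]
control_cells = [rng.gauss(100.0, 10.0) for _ in range(40)]
p_value = permutation_test(cells_with_np, control_cells)
print(f"permutation p-value: {p_value:.4f}")
```

Repeating the test on random subsets of decreasing size is one simple way to explore the minimum number of cells needed for detection, in the spirit of the sample size analysis the abstract describes.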
Robust Statistical Detection of Power-Law Cross-Correlation.
Blythe, Duncan A J; Nikulin, Vadim V; Müller, Klaus-Robert
2016-06-02
We show that widely used approaches in statistical physics incorrectly indicate the existence of power-law cross-correlations between financial stock market fluctuations measured over several years and the neuronal activity of the human brain lasting for only a few minutes. While such cross-correlations are nonsensical, no current methodology allows them to be reliably discarded, leaving researchers at greater risk when the spurious nature of cross-correlations is not clear from the unrelated origin of the time series and rather requires careful statistical estimation. Here we propose a theory and method (PLCC-test) which allows us to rigorously and robustly test for power-law cross-correlations, correctly detecting genuine and discarding spurious cross-correlations, thus establishing meaningful relationships between processes in complex physical systems. Our method reveals for the first time the presence of power-law cross-correlations between amplitudes of the alpha and beta frequency ranges of the human electroencephalogram.
Robust Statistical Detection of Power-Law Cross-Correlation
NASA Astrophysics Data System (ADS)
Blythe, Duncan A. J.; Nikulin, Vadim V.; Müller, Klaus-Robert
2016-06-01
We show that widely used approaches in statistical physics incorrectly indicate the existence of power-law cross-correlations between financial stock market fluctuations measured over several years and the neuronal activity of the human brain lasting for only a few minutes. While such cross-correlations are nonsensical, no current methodology allows them to be reliably discarded, leaving researchers at greater risk when the spurious nature of cross-correlations is not clear from the unrelated origin of the time series and rather requires careful statistical estimation. Here we propose a theory and method (PLCC-test) which allows us to rigorously and robustly test for power-law cross-correlations, correctly detecting genuine and discarding spurious cross-correlations, thus establishing meaningful relationships between processes in complex physical systems. Our method reveals for the first time the presence of power-law cross-correlations between amplitudes of the alpha and beta frequency ranges of the human electroencephalogram.
Nieminen, Pentti; Abass, Khaled; Vähäkanga, Kirsi; Rautio, Arja
2015-09-01
The number of analyzed outcome variables is important in the statistical analysis and interpretation of research findings. This study investigated published papers in the field of environmental health studies. We aimed to examine whether differences in the number of reported outcome variables exist between papers with non-significant findings compared to those with significant findings. Articles on the maternal exposure to mercury and child development were used as examples. Articles published between 1995 and 2013 focusing on the relationships between maternal exposure to mercury and child development were collected from Medline and Scopus. Of 87 extracted papers, 73 used statistical significance testing and 38 (43.7%) of these reported 'non-significant' (P>0.05) findings. The median number of child development outcome variables in papers reporting 'significant' (n=35) and 'non-significant' (n=38) results was 4 versus 7, respectively (Mann-Whitney test P-value=0.014). An elevated number of outcome variables was especially found in papers reporting non-significant associations between maternal mercury and outcomes when mercury was the only analyzed exposure variable. Authors often report analyzed health outcome variables based on their P-values rather than on stated primary research questions. Such a practice probably skews the research evidence. Copyright © 2015 The Editorial Board of Biomedical and Environmental Sciences. Published by China CDC. All rights reserved.
A new statistical approach to climate change detection and attribution
NASA Astrophysics Data System (ADS)
Ribes, Aurélien; Zwiers, Francis W.; Azaïs, Jean-Marc; Naveau, Philippe
2017-01-01
We propose here a new statistical approach to climate change detection and attribution that is based on additive decomposition and simple hypothesis testing. Most current statistical methods for detection and attribution rely on linear regression models where the observations are regressed onto expected response patterns to different external forcings. These methods do not use physical information provided by climate models regarding the expected response magnitudes to constrain the estimated responses to the forcings. Climate modelling uncertainty is difficult to take into account with regression based methods and is almost never treated explicitly. As an alternative to this approach, our statistical model is only based on the additivity assumption; the proposed method does not regress observations onto expected response patterns. We introduce estimation and testing procedures based on likelihood maximization, and show that climate modelling uncertainty can easily be accounted for. Some discussion is provided on how to practically estimate the climate modelling uncertainty based on an ensemble of opportunity. Our approach is based on the "models are statistically indistinguishable from the truth" paradigm, where the difference between any given model and the truth has the same distribution as the difference between any pair of models, but other choices might also be considered. The properties of this approach are illustrated and discussed based on synthetic data. Lastly, the method is applied to the linear trend in global mean temperature over the period 1951-2010. Consistent with the last IPCC assessment report, we find that most of the observed warming over this period (+0.65 K) is attributable to anthropogenic forcings (+0.67 ± 0.12 K, 90% confidence range), with a very limited contribution from natural forcings (-0.01 ± 0.02 K).
A Wavelet-Statistical Features Approach for Nonconvulsive Seizure Detection.
Sharma, Priyanka; Khan, Yusuf Uzzaman; Farooq, Omar; Tripathi, Manjari; Adeli, Hojjat
2014-10-01
The detection of nonconvulsive seizures (NCSz) is a challenge because of the lack of physical symptoms, which may delay the diagnosis of the disease. Many researchers have reported automatic detection of seizures. However, few investigators have concentrated on detection of NCSz. This article proposes a method for reliable detection of NCSz. The electroencephalography (EEG) signal is usually contaminated by various nonstationary noises. Signal denoising is an important preprocessing step in the analysis of such signals. In this study, a new wavelet-based denoising approach using cubical thresholding has been proposed to reduce noise from the EEG signal prior to analysis. Three statistical features were extracted from wavelet frequency bands, encompassing the frequency range of 0 to 8, 8 to 16, 16 to 32, and 0 to 32 Hz. Extracted features were used to train a linear classifier to discriminate between normal and seizure EEGs. The performance of the method was tested on a database of nine patients with 24 seizures in 80 hours of EEG recording. All the seizures were successfully detected, and the false positive rate was found to be 0.7 per hour. © EEG and Clinical Neuroscience Society (ECNS) 2014.
Evidence for $t\bar{t}$ production at the Tevatron: Statistical significance and cross section
Koningsberg, J.; CDF Collaboration
1994-09-01
We summarize here the results of the "counting experiments" by the CDF Collaboration in the search for $t\bar{t}$ production in $p\bar{p}$ collisions at $\sqrt{s} = 1.8$ TeV at the Tevatron. We analyze their statistical significance by calculating the probability that the observed excess is a fluctuation of the expected backgrounds and, assuming the excess is from top events, extract a measurement of the $t\bar{t}$ production cross-section.
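As a rough illustration of the kind of calculation this abstract describes (a minimal sketch, not the CDF Collaboration's actual procedure), the probability that the expected background alone fluctuates up to at least the observed count can be taken from a Poisson tail. The function names and the example counts are hypothetical, and the sketch assumes the background expectation is known exactly, with no systematic uncertainty.

```python
import math
from statistics import NormalDist


def poisson_tail(n_obs: int, background: float) -> float:
    """Return P(N >= n_obs) for N ~ Poisson(background)."""
    # Sum P(N = k) for k = 0 .. n_obs - 1, then take the complement.
    below = sum(
        math.exp(-background) * background**k / math.factorial(k)
        for k in range(n_obs)
    )
    return 1.0 - below


def one_sided_sigma(p_value: float) -> float:
    """Convert a one-sided p-value into an equivalent Gaussian significance."""
    return NormalDist().inv_cdf(1.0 - p_value)


if __name__ == "__main__":
    # Hypothetical numbers: 12 events observed, 5.0 expected from background.
    p = poisson_tail(12, 5.0)
    print(f"p-value = {p:.4g}, significance = {one_sided_sigma(p):.2f} sigma")
```

Because the tail is computed as one minus a sum, this form loses precision for extremely small p-values; it is adequate only for a quick estimate.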
Testing statistical significance scores of sequence comparison methods with structure similarity
Hulsen, Tim; de Vlieg, Jacob; Leunissen, Jack AM; Groenen, Peter MA
2006-01-01
Background In the past years the Smith-Waterman sequence comparison algorithm has gained popularity due to improved implementations and rapidly increasing computing power. However, the quality and sensitivity of a database search is not only determined by the algorithm but also by the statistical significance testing for an alignment. The e-value is the most commonly used statistical validation method for sequence database searching. The CluSTr database and the Protein World database have been created using an alternative statistical significance test: a Z-score based on Monte-Carlo statistics. Several papers have described the superiority of the Z-score as compared to the e-value, using simulated data. We were interested if this could be validated when applied to existing, evolutionary related protein sequences. Results All experiments are performed on the ASTRAL SCOP database. The Smith-Waterman sequence comparison algorithm with both e-value and Z-score statistics is evaluated, using ROC, CVE and AP measures. The BLAST and FASTA algorithms are used as reference. We find that two out of three Smith-Waterman implementations with e-value are better at predicting structural similarities between proteins than the Smith-Waterman implementation with Z-score. SSEARCH especially has very high scores. Conclusion The compute intensive Z-score does not have a clear advantage over the e-value. The Smith-Waterman implementations give generally better results than their heuristic counterparts. We recommend using the SSEARCH algorithm combined with e-values for pairwise sequence comparisons. PMID:17038163
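The Monte-Carlo Z-score mentioned in this abstract can be sketched as follows: score the real pair, score the first sequence against many shuffled copies of the second, and express the real score in standard deviations above the shuffled mean. This is an assumption-laden toy, not the CluSTr or Protein World implementation: the Smith-Waterman scorer below uses a hypothetical match/mismatch/linear-gap scheme rather than a substitution matrix with affine gaps, and the function names are ours.

```python
import random
import statistics


def sw_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            # Local alignment: a cell never drops below zero.
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
            best = max(best, cur[j])
        prev = cur
    return best


def z_score(a: str, b: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Z-score of the real score against scores for shuffled copies of b."""
    rng = random.Random(seed)
    observed = sw_score(a, b)
    letters = list(b)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)  # keeps composition, destroys order
        null.append(sw_score(a, "".join(letters)))
    spread = statistics.stdev(null)
    if spread == 0:
        raise ValueError("null scores have zero variance; Z-score undefined")
    return (observed - statistics.mean(null)) / spread
```

Every Z-score costs `n_shuffles` extra alignments, which is the sense in which the abstract calls the Z-score compute intensive.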
Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y; Drake, Steven K; Gucek, Marjan; Suffredini, Anthony F; Sacks, David B; Yu, Yi-Kuo
2016-02-01
Correct and rapid identification of microorganisms is the key to the success of many important applications in health and safety, including, but not limited to, infection treatment, food safety, and biodefense. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is challenging correct microbial identification because of the large number of choices present. To properly disentangle candidate microbes, one needs to go beyond apparent morphology or simple 'fingerprinting'; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptidome profiles of microbes to better separate them and by designing an analysis method that yields accurate statistical significance. Here, we present an analysis pipeline that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using MS/MS data of 81 samples, each composed of a single known microorganism, that the proposed pipeline can correctly identify microorganisms at least at the genus and species levels. We have also shown that the proposed pipeline computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microorganisms. The proposed analysis pipeline has been implemented in MiCId, a freely available software for Microorganism Classification and Identification. MiCId is available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
Statistical method for detecting structural change in the growth process.
Ninomiya, Yoshiyuki; Yoshimoto, Atsushi
2008-03-01
Due to competition among individual trees and other exogenous factors that change the growth environment, each tree grows following its own growth trend with some structural changes in growth over time. In the present article, a new method is proposed to detect a structural change in the growth process. We formulate the method as a simple statistical test for signal detection without constructing any specific model for the structural change. To evaluate the p-value of the test, the tube method is developed because the regular distribution theory is insufficient. Using two sets of tree diameter growth data sampled from planted forest stands of Cryptomeria japonica in Japan, we conduct an analysis of identifying the effect of thinning on the growth process as a structural change. Our results demonstrate that the proposed method is useful to identify the structural change caused by thinning. We also provide the properties of the method in terms of the size and power of the test.
Mass spectrometry-based protein identification with accurate statistical significance assignment.
Alves, Gelio; Yu, Yi-Kuo
2015-03-01
Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of scientific conclusions thus drawn. From the perspective of mass spectrometry-based proteomics, even though accurate statistics for peptide identification can now be achieved, accurate protein level statistics remain challenging. We have constructed a protein ID method that combines peptide evidences of a candidate protein based on a rigorous formula derived earlier; in this formula the database P-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein level E-values, eliminating the need for empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Sorić formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with other methods tested. The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.
Rudd, James; Moore, Jason H; Urbanowicz, Ryan J
2013-11-01
Permutation-based statistics for evaluating the significance of class prediction, predictive attributes, and patterns of association have only appeared within the learning classifier system (LCS) literature since 2012. While still not widely utilized by the LCS research community, formal evaluations of test statistic confidence are imperative to large and complex real-world applications such as genetic epidemiology, where it is standard practice to quantify the likelihood that a seemingly meaningful statistic could have been obtained purely by chance. LCS algorithms are relatively computationally expensive on their own. The compounding requirements for generating permutation-based statistics may be a limiting factor for some researchers interested in applying LCS algorithms to real-world problems. Technology has made LCS parallelization strategies more accessible and thus more popular in recent years. In the present study we examine the benefits of externally parallelizing a series of independent LCS runs such that permutation testing with cross validation becomes more feasible to complete on a single multi-core workstation. We test our Python implementation of this strategy in the context of a simulated complex genetic epidemiological data mining problem. Our evaluations indicate that as long as the number of concurrent processes does not exceed the number of CPU cores, the speedup achieved is approximately linear.
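The idea of running independent permutations as separate processes can be sketched as below. This is not the authors' implementation: a simple difference in class means stands in for a full LCS training run, and all names (`mean_diff`, `permutation_p_value`, the `workers` parameter) are hypothetical. Each permutation is an independent task, so the tasks can be handed to a process pool without any communication between them.

```python
import random
from concurrent.futures import ProcessPoolExecutor
from statistics import mean


def mean_diff(values, labels):
    """Stand-in test statistic: mean of class 1 minus mean of class 0."""
    ones = [v for v, lab in zip(values, labels) if lab == 1]
    zeros = [v for v, lab in zip(values, labels) if lab == 0]
    return mean(ones) - mean(zeros)


def _one_permutation(task):
    """Recompute the statistic after shuffling the class labels."""
    values, labels, seed = task
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    return mean_diff(values, shuffled)


def permutation_p_value(values, labels, n_perm=1000, workers=1):
    """One-sided permutation p-value; workers > 1 spreads runs over processes."""
    observed = mean_diff(values, labels)
    tasks = [(values, labels, seed) for seed in range(n_perm)]
    if workers > 1:
        with ProcessPoolExecutor(max_workers=workers) as pool:
            null = list(pool.map(_one_permutation, tasks, chunksize=50))
    else:
        null = [_one_permutation(task) for task in tasks]
    exceed = sum(1 for stat in null if stat >= observed)
    # Add-one correction keeps the estimate strictly positive.
    return (exceed + 1) / (n_perm + 1)
```

When `workers` is greater than 1, the call should be made from under an `if __name__ == "__main__":` guard so that worker processes can import the module cleanly. Consistent with the abstract's observation, setting `workers` above the number of CPU cores is not expected to help.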
Detection of Significant Bacteriuria by Automated Urinalysis Using Flow Cytometry
Okada, Hiroshi; Sakai, Yutaka; Miyazaki, Shigenori; Arakawa, Soichi; Hamaguchi, Yukio; Kamidono, Sadao
2000-01-01
A new flow cytometry-based automated urine analyzer, the UF-50, was evaluated for its ability to screen urine samples for significant bacteriuria. One hundred eighty-six urine specimens from patients attending an outpatient clinic of a university-based hospital were examined. The results obtained with the UF-50 were compared with those obtained by conventional quantitative urine culture. The UF-50 detected significant bacteriuria with a sensitivity of 83.1%, a specificity of 76.4%, a positive predictive value of 62.0%, a negative predictive value of 90.7%, and an accuracy of 78.5%. These results are comparable to those obtained by previously reported screening procedures. Besides detecting significant bacteriuria, the UF-50 can also perform routine urinalysis, including measurement of concentrations of red blood cells, white blood cells, epithelial cells, and casts, within 70 s. This capability renders this new flow cytometry-based urine analyzer superior to previously reported rapid screening methods. PMID:10921941
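The five performance figures quoted above all derive from a single 2x2 table of analyzer result against culture result. The abstract does not give the table itself; the counts used below (49 true positives, 10 false negatives, 30 false positives, 97 true negatives, totalling 186) are our own reconstruction, chosen because they reproduce the reported percentages, and should be treated as an assumption rather than the study's data.

```python
def diagnostic_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Standard screening-test metrics from a 2x2 confusion table."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),  # detected among culture-positive
        "specificity": tn / (tn + fp),  # cleared among culture-negative
        "ppv": tp / (tp + fp),          # culture-positive among flagged
        "npv": tn / (tn + fn),          # culture-negative among cleared
        "accuracy": (tp + tn) / total,
    }


if __name__ == "__main__":
    # Reconstructed counts (assumption, not stated in the abstract).
    for name, value in diagnostic_metrics(tp=49, fn=10, fp=30, tn=97).items():
        print(f"{name}: {100 * value:.1f}%")
```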
Community detection based on significance optimization in complex networks
NASA Astrophysics Data System (ADS)
Xiang, Ju; Wang, Zhi-Zhong; Li, Hui-Jia; Zhang, Yan; Li, Fang; Dong, Li-Ping; Li, Jian-Ming; Guo, Li-Juan
2017-05-01
Community structure is an important topological property that extensively exists in various complex networks. In the past decade, much attention has been paid to the design of community-detection methods, while analyzing the behaviors of the methods is also of interest in theoretical research and real applications. Here, we focus on an important measure for community structure, i.e. significance (2013 Sci. Rep. 3 2930). Specifically, we study the effect of various network parameters on this measure, analyze the critical behaviors in partition transition, and then deduce the formula of the critical points and the phase diagrams theoretically. The results show that the critical number of communities in partition transition increases dramatically with the difference between inter-community and intra-community link densities, and thus significance optimization displays higher resolution in community detection than many other methods, but it also may lead to the excessive splitting of communities. By employing the Louvain algorithm to optimize the significance, we confirm the theoretical results on artificial and real-world networks, and further perform a series of comparisons with some classical methods.
NASA Astrophysics Data System (ADS)
Eggert, Silke; Walter, Thomas R.
2009-06-01
The study of volcanic triggering and interaction with the tectonic surroundings has received special attention in recent years, using both direct field observations and historical descriptions of eruptions and earthquake activity. Repeated reports of clustered eruptions and earthquakes may imply that interaction is important in some subregions. However, the subregions likely to suffer such clusters have not been systematically identified, and the processes responsible for the observed interaction remain unclear. We first review previous works about the clustered occurrence of eruptions and earthquakes, and describe selected events. We further elaborate available databases and confirm a statistically significant relationship between volcanic eruptions and earthquakes on the global scale. Moreover, our study implies that closed volcanic systems in particular tend to be activated in association with a tectonic earthquake trigger. We then perform a statistical study at the subregional level, showing that certain subregions are especially predisposed to concurrent eruption-earthquake sequences, whereas such clustering is statistically less significant in other subregions. Based on this study, we argue that individual and selected observations may bias the perceptible weight of coupling. The activity at volcanoes located in the predisposed subregions (e.g., Japan, Indonesia, Melanesia), however, often unexpectedly changes in association with either an imminent or a past earthquake.
On the statistical significance of surface air temperature trends in the Eurasian Arctic region
NASA Astrophysics Data System (ADS)
Franzke, C.
2012-12-01
This study investigates the statistical significance of the trends of station temperature time series from the European Climate Assessment & Data archive poleward of 60°N. The trends are identified by different methods and their significance is assessed by three different null models of climate noise. All stations show a warming trend but only 17 out of the 109 considered stations have trends which cannot be explained as arising from intrinsic climate fluctuations when tested against any of the three null models. Out of those 17, only one station exhibits a warming trend which is significant against all three null models. The stations with significant warming trends are located mainly in Scandinavia and Iceland.
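Testing a trend against a null model of climate noise can be sketched with a Monte Carlo procedure: fit a noise model to the detrended series, simulate many trend-free surrogate series from it, and ask how often a surrogate shows a trend at least as large as the observed one. The abstract does not name its three null models; the sketch below assumes a first-order autoregressive (AR(1)) process, which is one common choice, and all function names are hypothetical.

```python
import random
from statistics import mean


def ols_slope(y):
    """Least-squares slope of y against the time index 0, 1, 2, ..."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = mean(y)
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def lag1_autocorr(x):
    """Lag-1 autocorrelation estimate of a series."""
    m = mean(x)
    d = [v - m for v in x]
    return sum(a * b for a, b in zip(d, d[1:])) / sum(a * a for a in d)


def ar1_trend_p_value(y, n_sim=1000, seed=0):
    """Two-sided Monte Carlo p-value of the trend under an AR(1) null."""
    rng = random.Random(seed)
    n = len(y)
    slope = ols_slope(y)
    intercept = mean(y) - slope * (n - 1) / 2
    resid = [v - (intercept + slope * t) for t, v in enumerate(y)]
    phi = lag1_autocorr(resid)
    var = sum(r * r for r in resid) / n
    # Innovation spread chosen so the surrogate variance matches the residuals.
    sigma = (var * (1 - phi**2)) ** 0.5
    count = 0
    for _ in range(n_sim):
        x = rng.gauss(0, var**0.5)
        surrogate = []
        for _step in range(n):
            surrogate.append(x)
            x = phi * x + rng.gauss(0, sigma)
        if abs(ols_slope(surrogate)) >= abs(slope):
            count += 1
    return (count + 1) / (n_sim + 1)
```

A series whose noise has long memory would need a different null model, and a trend that is significant under AR(1) may no longer be significant under such a model, which is the distinction the abstract draws between stations.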
Zou, Fei; Fine, Jason P.; Hu, Jianhua; Lin, D. Y.
2004-01-01
Assessing genome-wide statistical significance is an important and difficult problem in multipoint linkage analysis. Due to multiple tests on the same genome, the usual pointwise significance level based on the chi-square approximation is inappropriate. Permutation is widely used to determine genome-wide significance. Theoretical approximations are available for simple experimental crosses. In this article, we propose a resampling procedure to assess the significance of genome-wide QTL mapping for experimental crosses. The proposed method is computationally much less intensive than the permutation procedure (on the order of 10^2 or higher) and is applicable to complex breeding designs and sophisticated genetic models that cannot be handled by the permutation and theoretical methods. The usefulness of the proposed method is demonstrated through simulation studies and an application to a Drosophila backcross. PMID:15611194
Significance probability mapping: the final touch in t-statistic mapping.
Hassainia, F; Petit, D; Montplaisir, J
1994-01-01
Significance Probability Mapping (SPM), based on Student's t-statistic, is widely used for comparing mean brain topography maps of two groups. The map resulting from this process represents the distribution of t-values over the entire scalp. However, t-values by themselves cannot reveal whether or not group differences are significant. Significance levels associated with a few t-values are therefore commonly indicated on map legends to give the reader an idea of the significance levels of t-values. Nevertheless, a precise significance level topography cannot be achieved with these few significance values. We introduce a new kind of map which directly displays significance level topography in order to relieve the reader from converting multiple t-values to their corresponding significance probabilities, and to obtain a good quantification and a better localization of regions with significant differences between groups. As an illustration of this type of map, we present a comparison of EEG activity in Alzheimer's patients and age-matched control subjects for both wakefulness and REM sleep.
How to get statistically significant effects in any ERP experiment (and why you shouldn't).
Luck, Steven J; Gaspelin, Nicholas
2017-01-01
ERP experiments generate massive datasets, often containing thousands of values for each participant, even after averaging. The richness of these datasets can be very useful in testing sophisticated hypotheses, but this richness also creates many opportunities to obtain effects that are statistically significant but do not reflect true differences among groups or conditions (bogus effects). The purpose of this paper is to demonstrate how common and seemingly innocuous methods for quantifying and analyzing ERP effects can lead to very high rates of significant but bogus effects, with the likelihood of obtaining at least one such bogus effect exceeding 50% in many experiments. We focus on two specific problems: using the grand-averaged data to select the time windows and electrode sites for quantifying component amplitudes and latencies, and using one or more multifactor statistical analyses. Reanalyses of prior data and simulations of typical experimental designs are used to show how these problems can greatly increase the likelihood of significant but bogus results. Several strategies are described for avoiding these problems and for increasing the likelihood that significant effects actually reflect true differences among groups or conditions.
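The claim that the chance of at least one bogus effect can exceed 50% follows from simple arithmetic once many tests are run. Under the simplifying assumption (ours, not the paper's) that the tests are independent and each uses a per-test alpha of 0.05, the probability of at least one false positive among k tests is 1 - (1 - alpha)^k. The helper names below are hypothetical.

```python
def familywise_error(alpha: float, k: int) -> float:
    """P(at least one false positive) among k independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


def min_tests_exceeding(alpha: float, target: float) -> int:
    """Smallest number of independent tests whose familywise rate exceeds target."""
    k = 1
    while familywise_error(alpha, k) <= target:
        k += 1
    return k


if __name__ == "__main__":
    print(min_tests_exceeding(0.05, 0.5))  # prints 14
```

Real ERP analyses involve correlated tests, so the exact figure differs, but the sketch shows how a modest number of implicit comparisons pushes the familywise rate past one half.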
Wang, Yuedong; Guo, Sun-Wei
2004-01-01
Array-based comparative genomic hybridization (ABCGH) is an emerging high-resolution and high-throughput molecular genetic technique that allows genome-wide screening for chromosome alterations associated with tumorigenesis. Like the cDNA microarrays, ABCGH uses two differentially labeled test and reference DNAs which are cohybridized to cloned genomic fragments immobilized on glass slides. The hybridized DNAs are then detected in two different fluorochromes, and the significant deviation from unity in the ratios of the digitized intensity values is indicative of copy-number differences between the test and reference genomes. Proper statistical analyses need to account for many sources of variation besides genuine differences between the two genomes. In particular, spatial correlations, the variable nature of the ratio variance and non-Normal distribution call for careful statistical modeling. We propose two new statistics, the standard t-statistic and its modification with variances smoothed along the genome, and two tests for each statistic, the standard t-test and a test based on the hybrid adaptive spline (HAS). Simulations indicate that the smoothed t-statistic always improves the performance over the standard t-statistic. The t-tests are more powerful in detecting isolated alterations while those based on HAS are more powerful in detecting a cluster of alterations. We apply the proposed methods to the identification of genomic alterations in endometrium in women with endometriosis.
A statistical method (cross-validation) for bone loss region detection after spaceflight
Zhao, Qian; Li, Wenjun; Li, Caixia; Chu, Philip W.; Kornak, John; Lang, Thomas F.
2010-01-01
Astronauts experience bone loss after long spaceflight missions. Identifying specific regions that undergo the greatest losses (e.g. the proximal femur) could reveal information about the processes of bone loss in disuse and disease. Methods for detecting such regions, however, remain an open problem. This paper focuses on statistical methods to detect such regions. We perform statistical parametric mapping to get t-maps of changes in images, and propose a new cross-validation method to select an optimum suprathreshold for forming clusters of pixels. Once these candidate clusters are formed, we use permutation testing of longitudinal labels to derive significant changes. PMID:20632144
A statistical method (cross-validation) for bone loss region detection after spaceflight.
Zhao, Qian; Li, Wenjun; Li, Caixia; Chu, Philip W; Kornak, John; Lang, Thomas F; Fang, Jiqian; Lu, Ying
2010-06-01
Astronauts experience bone loss after long spaceflight missions. Identifying specific regions that undergo the greatest losses (e.g. the proximal femur) could reveal information about the processes of bone loss in disuse and disease. Methods for detecting such regions, however, remain an open problem. This paper focuses on statistical methods to detect such regions. We perform statistical parametric mapping to get t-maps of changes in images, and propose a new cross-validation method to select an optimum suprathreshold for forming clusters of pixels. Once these candidate clusters are formed, we use permutation testing of longitudinal labels to derive significant changes.
Krumbholz, Aniko; Anielski, Patricia; Gfrerer, Lena; Graw, Matthias; Geyer, Hans; Schänzer, Wilhelm; Dvorak, Jiri; Thieme, Detlef
2014-01-01
Clenbuterol is a well-established β2-agonist, which is prohibited in sports and strictly regulated for use in the livestock industry. During the last few years clenbuterol-positive results in doping controls and in samples from residents or travellers from a high-risk country were suspected to be related to the illegal use of clenbuterol for fattening. A sensitive liquid chromatography-tandem mass spectrometry (LC-MS/MS) method was developed to detect low clenbuterol residues in hair with a detection limit of 0.02 pg/mg. A sub-therapeutic application study and a field study with volunteers who have a high risk of contamination were performed. For the application study, a total dosage of 30 µg clenbuterol was applied to 20 healthy volunteers on 5 subsequent days. One month after the beginning of the application, clenbuterol was detected in the proximal hair segment (0-1 cm) in concentrations between 0.43 and 4.76 pg/mg. For the second part, samples of 66 Mexican soccer players were analyzed. In 89% of these volunteers, clenbuterol was detectable in their hair at concentrations between 0.02 and 1.90 pg/mg. A comparison of both parts showed no statistical difference between sub-therapeutic application and contamination. In contrast, discrimination from a typical abuse of clenbuterol is apparently possible. Owing to these findings, results of real doping control samples can be evaluated. Copyright © 2014 John Wiley & Sons, Ltd.
Statistical significance estimation of a signal within the GooFit framework on GPUs
NASA Astrophysics Data System (ADS)
Cristella, Leonardo; Di Florio, Adriano; Pompili, Alexis
2017-03-01
In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks to estimate the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B+ → J/ψϕK+. GooFit is an open data analysis tool under development that interfaces ROOT/RooFit to the CUDA platform on nVidia GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performance with respect to the RooFit application parallelised on multiple CPUs by means of the PROOF-Lite tool. The considerable resulting speed-up, evident when comparing concurrent GooFit processes allowed by CUDA Multi Process Service and a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood ratio test statistic in different situations in which Wilks' theorem may or may not apply, depending on whether its regularity conditions are satisfied.
NASA Astrophysics Data System (ADS)
McClure, Mark; Gibson, Riley; Chiu, Kit-Kwan; Ranganath, Rajesh
2017-03-01
We develop a statistical method for identifying induced seismicity from large data sets and apply the method to decades of wastewater disposal and seismicity data in California and Oklahoma. The study regions are divided into grid blocks. We use a longitudinal study design, seeking associations between seismicity and wastewater injection volume along time series within each grid block. In each grid block, we find the maximum likelihood estimate for a model parameter that relates induced seismicity hazard to total volume of wastewater injected each year. To assess significance, we compute likelihood ratio test statistics in each grid block and each state, California and Oklahoma. Resampling with permutation and random temporal offset of injection data is used to estimate p values from the likelihood ratio statistics. We focus on assessing whether observed associations between injection and seismicity occur more often than would be expected by chance; we do not attempt to quantify the overall incidence of induced seismicity. The study is designed so that, under reasonable assumptions, the associations can be formally interpreted as demonstrating causality. Wastewater disposal is associated with other activities that can induce seismicity, such as reservoir depletion. Therefore, our results should be interpreted as finding seismicity induced by wastewater disposal and all other associated activities. In Oklahoma, the analysis finds with extremely high confidence that seismicity associated with wastewater disposal has occurred. In California, the analysis finds moderate evidence that seismicity associated with wastewater disposal has occurred, but the result is not strong enough to be conclusive.
StegoWall: blind statistical detection of hidden data
NASA Astrophysics Data System (ADS)
Voloshynovskiy, Sviatoslav V.; Herrigel, Alexander; Rytsar, Yuri B.; Pun, Thierry
2002-04-01
Novel functional possibilities, provided by recent data hiding technologies, carry the danger of uncontrolled (unauthorized) and unlimited information exchange that might be used by people with unfriendly interests. The multimedia industry as well as the research community recognize the urgent necessity for network security and copyright protection, or rather the lack of adequate law for digital multimedia protection. This paper advocates the need for detecting hidden data in digital and analog media as well as in electronic transmissions, and for attempting to identify the underlying hidden data. Solving this problem calls for the development of an architecture for blind stochastic hidden data detection in order to prevent unauthorized data exchange. The proposed architecture is called StegoWall; its key aspects are the solid investigation, the deep understanding, and the prediction of possible tendencies in the development of advanced data hiding technologies. The basic idea of our complex approach is to exploit all information about hidden data statistics to perform its detection based on a stochastic framework. The StegoWall system will be used for four main applications: robust watermarking, secret communications, integrity control and tamper proofing, and internet/network security.
Bothe, Anne K; Richardson, Jessica D
2011-08-01
To discuss constructs and methods related to assessing the magnitude and the meaning of clinical outcomes, with a focus on applications in speech-language pathology. Professionals in medicine, allied health, psychology, education, and many other fields have long been concerned with issues referred to variously as practical significance, clinical significance, social validity, patient satisfaction, treatment effectiveness, or the meaningfulness or importance of beyond-clinic or real-world treatment outcomes. Existing literature addressing these issues from multiple disciplines was reviewed and synthesized. Practical significance, an adjunct to statistical significance, refers to the magnitude of a change or a difference between groups. The appropriate existing term for the interpretation of treatment outcomes, or the attribution of meaning or value to treatment outcomes, is clinical significance. To further distinguish between important constructs, the authors suggest incorporating as definitive the existing notion that clinical significance may refer to measures selected or interpreted by professionals or with respect to groups of clients. The term personal significance is introduced to refer to goals, variables, measures, and changes that are of demonstrated value to individual clients.
NASA Astrophysics Data System (ADS)
Eggert, S.; Walter, T. R.
2009-04-01
The study of volcanic triggering and coupling to the tectonic surroundings has received special attention in recent years, using both direct field observations and historical descriptions of eruptions and earthquake activity. Repeated reports of volcano-earthquake interactions in, e.g., Europe and Japan, may imply that clustered occurrence is important in some regions. However, the regions likely to suffer clustered eruption-earthquake activity have not been systematically identified, and the processes responsible for the observed interaction are debated. We first review previous works about the correlation of volcanic eruptions and earthquakes, and describe selected local clustered events. Following an overview of previous statistical studies, we further elaborate the databases of correlated eruptions and earthquakes from a global perspective. Since we can confirm a relationship between volcanic eruptions and earthquakes on the global scale, we then perform a statistical study on the regional level, showing that time and distance between events follow a linear relationship. In the time before an earthquake, a period of volcanic silence often occurs, whereas in the time after, an increase in volcanic activity is evident. Our statistical tests imply that certain regions are especially predisposed to concurrent eruption-earthquake pairs, e.g., Japan, whereas such pairing is statistically less significant in other regions, such as Europe. Based on this study, we argue that individual and selected observations may bias the perceptible weight of coupling. Volcanoes located in the predisposed regions (e.g., Japan, Indonesia, Melanesia), however, indeed often have unexpectedly changed in association with either an imminent or a past earthquake.
Iacucci, Ernesto; Zingg, Hans H; Perkins, Theodore J
2012-01-01
High-throughput molecular biology studies, such as microarray assays of gene expression, two-hybrid experiments for detecting protein interactions, or ChIP-Seq experiments for transcription factor binding, often result in an "interesting" set of genes - say, genes that are co-expressed or bound by the same factor. One way of understanding the biological meaning of such a set is to consider what processes or functions, as defined in an ontology, are over-represented (enriched) or under-represented (depleted) among genes in the set. Usually, the significance of enrichment or depletion scores is based on simple statistical models and on the membership of genes in different classifications. We consider the more general problem of computing p-values for arbitrary integer additive statistics, or weighted membership functions. Such membership functions can be used to represent, for example, prior knowledge on the role of certain genes or classifications, differential importance of different classifications or genes to the experimenter, hierarchical relationships between classifications, or different degrees of interestingness or evidence for specific genes. We describe a generic dynamic programming algorithm that can compute exact p-values for arbitrary integer additive statistics. We also describe several optimizations for important special cases, which can provide orders-of-magnitude speed up in the computations. We apply our methods to datasets describing oxidative phosphorylation and parturition and compare p-values based on computations of several different statistics for measuring enrichment. We find major differences between p-values resulting from these statistics, and that some statistics recover "gold standard" annotations of the data better than others. Our work establishes a theoretical and algorithmic basis for far richer notions of enrichment or depletion of gene sets with respect to gene ontologies than has previously been available.
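To make the idea of an exact p-value for an integer additive statistic concrete, here is a minimal sketch. It is not the authors' algorithm and omits their optimizations. It assumes each gene carries a non-negative integer weight, the statistic is the sum of weights over a gene set, and the null model draws a set of the same size uniformly at random without replacement. The function name `exact_p_value` is hypothetical.

```python
from fractions import Fraction
from math import comb


def exact_p_value(weights, k, observed):
    """Exact P(score >= observed) for a uniformly random size-k subset of genes.

    weights: non-negative integer weight per gene (assumption of this sketch).
    The score of a subset is the sum of the weights of its members.
    """
    if any(w < 0 or w != int(w) for w in weights):
        raise ValueError("weights must be non-negative integers")
    if not 0 <= k <= len(weights):
        raise ValueError("k must be between 0 and the number of genes")
    max_sum = sum(sorted(weights, reverse=True)[:k])
    # counts[j][s] = number of size-j subsets of the genes seen so far with score s
    counts = [[0] * (max_sum + 1) for _ in range(k + 1)]
    counts[0][0] = 1
    for w in weights:
        # Visit subset sizes in decreasing order so each gene is used at most once.
        for j in range(k, 0, -1):
            prev, row = counts[j - 1], counts[j]
            for s in range(max_sum, w - 1, -1):
                row[s] += prev[s - w]
    tail = sum(counts[k][max(observed, 0):])
    return Fraction(tail, comb(len(weights), k))
```

With 0/1 weights this reduces to the familiar hypergeometric tail used for simple membership-based enrichment; other integer weights give weighted membership functions of the kind the abstract describes. Exact rational arithmetic is used here for clarity rather than speed.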
Detecting microsatellites within genomes: significant variation among algorithms.
Leclercq, Sébastien; Rivals, Eric; Jarne, Philippe
2007-04-18
Microsatellites are short, tandemly-repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameters settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should therefore be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions.
Detecting microsatellites within genomes: significant variation among algorithms
Leclercq, Sébastien; Rivals, Eric; Jarne, Philippe
2007-01-01
Background Microsatellites are short, tandemly-repeated DNA sequences which are widely distributed among genomes. Their structure, role and evolution can be analyzed based on exhaustive extraction from sequenced genomes. Several dedicated algorithms have been developed for this purpose. Here, we compared the detection efficiency of five of them (TRF, Mreps, Sputnik, STAR, and RepeatMasker). Results Our analysis was first conducted on the human X chromosome, and microsatellite distributions were characterized by microsatellite number, length, and divergence from a pure motif. The algorithms work with user-defined parameters, and we demonstrate that the parameter values chosen can strongly influence microsatellite distributions. The five algorithms were then compared by fixing parameters settings, and the analysis was extended to three other genomes (Saccharomyces cerevisiae, Neurospora crassa and Drosophila melanogaster) spanning a wide range of size and structure. Significant differences for all characteristics of microsatellites were observed among algorithms, but not among genomes, for both perfect and imperfect microsatellites. Striking differences were detected for short microsatellites (below 20 bp), regardless of motif. Conclusion Since the algorithm used strongly influences empirical distributions, studies analyzing microsatellite evolution based on a comparison between empirical and theoretical size distributions should therefore be considered with caution. We also discuss why a typological definition of microsatellites limits our capacity to capture their genomic distributions. PMID:17442102
Detecting the Significant Flux Backbone of Escherichia coli metabolism.
Güell, Oriol; Sagués, Francesc; Serrano, M Ángeles
2017-04-09
The heterogeneity of computationally predicted reaction fluxes in metabolic networks within a single flux state can be exploited to detect their significant flux backbone. Here, we disclose the backbone of Escherichia coli, and compare it with the backbones of other bacteria. We find that, in general, the core of the backbones is mainly composed of reactions in energy metabolism corresponding to ancient pathways. In E. coli, the synthesis of nucleotides and the metabolism of lipids form smaller cores which rely critically on energy metabolism. Moreover, the consideration of different media leads to the identification of pathways sensitive to environmental changes. The metabolic backbone of an organism is thus useful for tracing, simultaneously, both its evolution and adaptation fingerprints.
RT-PSM, a real-time program for peptide-spectrum matching with statistical significance.
Wu, Fang-Xiang; Gagné, Pierre; Droit, Arnaud; Poirier, Guy G
2006-01-01
The analysis of complex biological peptide mixtures by tandem mass spectrometry (MS/MS) produces a huge body of collision-induced dissociation (CID) MS/MS spectra. Several methods have been developed for identifying peptide-spectrum matches (PSMs) by assigning MS/MS spectra to peptides in a database. However, most of these methods either do not give the statistical significance of PSMs (e.g., SEQUEST) or employ time-consuming computational methods to estimate the statistical significance (e.g., PeptideProphet). In this paper, we describe a new algorithm, RT-PSM, which can be used to identify PSMs and estimate their accuracy statistically in real time. RT-PSM first computes PSM scores between an MS/MS spectrum and a set of candidate peptides whose masses are within a preset tolerance of the MS/MS precursor ion mass. Then the computed PSM scores of all candidate peptides are employed to fit the expectation value distribution of the scores into a second-degree polynomial function in PSM score. The statistical significance of the best PSM is estimated by extrapolating the fitting polynomial function to the best PSM score. RT-PSM was tested on two pairs of MS/MS spectrum datasets and protein databases to investigate its performance. The MS/MS spectra were acquired using an ion trap mass spectrometer equipped with a nano-electrospray ionization source. The results show that RT-PSM has good sensitivity and specificity. Using a 55,577-entry protein database and running on a standard Pentium-4, 2.8-GHz CPU personal computer, RT-PSM can process peptide spectra on a sequential, one-by-one basis in 0.047 s on average, compared to more than 7 s per spectrum on average for Sequest and X!Tandem, in their current batch-mode processing implementations. RT-PSM is clearly shown to be fast enough for real-time PSM assignment of MS/MS spectra generated every 3 s or so by a 3D ion trap or by a QqTOF instrument.
Determining the statistical significance of particle precipitation related to EMIC waves.
NASA Astrophysics Data System (ADS)
Shin, D. K.; Lee, D. Y.; Noh, S. J.; Hwang, J.; Lee, J.
2016-12-01
One of the particle loss processes in the magnetosphere is precipitation into the Earth's atmosphere caused by electromagnetic ion cyclotron (EMIC) waves through pitch angle scattering. These particle precipitations can affect the dynamics of ring current protons (~tens of keV) and radiation belt electrons (~MeV) in the inner magnetosphere. Although there have been many reports to support the precipitation by EMIC waves, its effectiveness has not been demonstrated statistically. In this study, we use Van Allen Probes observations to identify a large number of EMIC waves, for which we then determine the association with relativistic electron and energetic (30-80 keV) proton precipitation observed at NOAA low earth orbit satellites. We find that the detection rates of precipitation given EMIC waves in space strongly depend on the number of available low-altitude satellites: the average detection rates by one low-altitude satellite are 8.4 % for electrons and 22.2 % for protons, and they increase by a factor of > 2 if one uses observations from five NOAA satellites. This implies a strong MLT dependence of precipitation given EMIC waves in space. To demonstrate this, we determine the MLT distribution of precipitations as a function of the MLT of the identified EMIC wave location. Finally, we determine the relationship between precipitations of electrons and protons, and the dependence of EMIC waves and precipitations on the solar phase years.
Consequences of statistical sense determination for WIMP directional detection
NASA Astrophysics Data System (ADS)
Green, Anne M.; Morgan, Ben
2008-01-01
We study the consequences of limited recoil sense reconstruction on the number of events required to reject isotropy and detect a WIMP signal using a directional detector. For a constant probability of determining the sense correctly, 3-d readout and zero background, we find that as the probability is decreased from 1.0 to 0.75 the number of events required increases by a factor of a few. As the probability is decreased further the number of events increases sharply, and isotropy can be rejected more easily by discarding the sense information and using axial statistics. This however requires an order of magnitude more events than vectorial data with perfect sense determination. We also consider energy dependent probabilities of correctly measuring the sense. Our main finding is that correctly determining the sense of the abundant, but less anisotropic, low energy recoils is most important.
Cosmology with phase statistics: parameter forecasts and detectability of BAO
NASA Astrophysics Data System (ADS)
Eggemeier, Alexander; Smith, Robert E.
2017-04-01
We consider an alternative to conventional three-point statistics such as the bispectrum, which is purely based on the Fourier phases of the density field: the line correlation function. This statistic directly probes the non-linear clustering regime and contains information highly complementary to that contained in the power spectrum. In this work, we determine, for the first time, its potential to constrain cosmological parameters and detect baryon acoustic oscillations (hereafter BAOs). We show how to compute the line correlation function for a discrete sampled set of tracers that follow a local Lagrangian biasing scheme and demonstrate how it breaks the degeneracy between the amplitude of density fluctuations and the bias parameters of the model. We then derive analytic expressions for its covariance and show that it can be written as a sum of a Gaussian piece plus non-Gaussian corrections. We compare our predictions with a large ensemble of N-body simulations and confirm that BAOs do indeed modulate the signal of the line correlation function for scales 50-100 h^-1 Mpc and that the characteristic S-shape feature would be detectable in upcoming Stage IV surveys at the level of ∼4σ. We then focus on the cosmological information content and compute Fisher forecasts for an idealized Stage III galaxy redshift survey of volume V ∼ 10 h^-3 Gpc^3 and out to z = 1. We show that combining the line correlation function with the galaxy power spectrum and a Planck-like microwave background survey yields improvements up to a factor of 2 for parameters such as σ8, b1 and b2, compared with using the two-point information alone.
Robust Statistical Detection of Power-Law Cross-Correlation
Blythe, Duncan A. J.; Nikulin, Vadim V.; Müller, Klaus-Robert
2016-01-01
We show that widely used approaches in statistical physics incorrectly indicate the existence of power-law cross-correlations between financial stock market fluctuations measured over several years and the neuronal activity of the human brain lasting for only a few minutes. While such cross-correlations are nonsensical, no current methodology allows them to be reliably discarded, leaving researchers at greater risk when the spurious nature of cross-correlations is not clear from the unrelated origin of the time series and rather requires careful statistical estimation. Here we propose a theory and method (PLCC-test) which allows us to rigorously and robustly test for power-law cross-correlations, correctly detecting genuine and discarding spurious cross-correlations, thus establishing meaningful relationships between processes in complex physical systems. Our method reveals for the first time the presence of power-law cross-correlations between amplitudes of the alpha and beta frequency ranges of the human electroencephalogram. PMID:27250630
Detection and integration of genotyping errors in statistical genetics.
Sobel, Eric; Papp, Jeanette C; Lange, Kenneth
2002-02-01
Detection of genotyping errors and integration of such errors in statistical analysis are relatively neglected topics, given their importance in gene mapping. A few inopportunely placed errors, if ignored, can tremendously affect evidence for linkage. The present study takes a fresh look at the calculation of pedigree likelihoods in the presence of genotyping error. To accommodate genotyping error, we present extensions to the Lander-Green-Kruglyak deterministic algorithm for small pedigrees and to the Markov-chain Monte Carlo stochastic algorithm for large pedigrees. These extensions can accommodate a variety of error models and refrain from simplifying assumptions, such as allowing, at most, one error per pedigree. In principle, almost any statistical genetic analysis can be performed taking errors into account, without actually correcting or deleting suspect genotypes. Three examples illustrate the possibilities. These examples make use of the full pedigree data, multiple linked markers, and a prior error model. The first example is the estimation of genotyping error rates from pedigree data. The second, and currently most useful, example is the computation of posterior mistyping probabilities. These probabilities cover both Mendelian-consistent and Mendelian-inconsistent errors. The third example is the selection of the true pedigree structure connecting a group of people from among several competing pedigree structures. Paternity testing and twin zygosity testing are typical applications.
NASA Astrophysics Data System (ADS)
Keylock, Christopher J.
2017-08-01
A method is presented for deriving random velocity gradient tensors given a source tensor. These synthetic tensors are constrained to lie within mathematical bounds of the non-normality of the source tensor, but we do not impose direct constraints upon scalar quantities typically derived from the velocity gradient tensor and studied in fluid mechanics. Hence, it becomes possible to test hypotheses on data at a point regarding the statistical significance of these scalar quantities. Having presented our method and the associated mathematical concepts, we apply it to homogeneous, isotropic turbulence to test the utility of the approach for a case where the behavior of the tensor is understood well. We show that, as well as the concentration of data along the Vieillefosse tail, actual turbulence is also preferentially located in the quadrant where there is both excess enstrophy (Q>0) and excess enstrophy production (R<0). We also examine the topology implied by the strain eigenvalues and find that for the statistically significant results there is a particularly strong relative preference for the formation of disklike structures in the (Q<0, R<0) quadrant. With the method shown to be useful for a turbulence that is already understood well, it should be of even greater utility for studying complex flows seen in industry and the environment.
Jefferson, L; Cooper, E; Hewitt, C; Torgerson, T; Cook, L; Tharmanathan, P; Cockayne, S; Torgerson, D
2016-01-01
Objective Time-lag from study completion to publication is a potential source of publication bias in randomised controlled trials. This study sought to update the evidence base by identifying the effect of the statistical significance of research findings on time to publication of trial results. Design Literature searches were carried out in four general medical journals from June 2013 to June 2014 inclusive (BMJ, JAMA, the Lancet and the New England Journal of Medicine). Setting Methodological review of four general medical journals. Participants Original research articles presenting the primary analyses from phase 2, 3 and 4 parallel-group randomised controlled trials were included. Main outcome measures Time from trial completion to publication. Results The median time from trial completion to publication was 431 days (n = 208, interquartile range 278–618). A multivariable adjusted Cox model found no statistically significant difference in time to publication for trials reporting positive or negative results (hazard ratio: 0.86, 95% CI 0.64 to 1.16, p = 0.32). Conclusion In contrast to previous studies, this review did not demonstrate the presence of time-lag bias in time to publication. This may be a result of these articles being published in four high-impact general medical journals that may be more inclined to publish rapidly, whatever the findings. Further research is needed to explore the presence of time-lag bias in lower quality studies and lower impact journals. PMID:27757242
NASA Astrophysics Data System (ADS)
Hu, Rui; Wang, Bin
2001-02-01
Finding statistically significant words in DNA and protein sequences forms the basis for many genetic studies. By applying the maximal entropy principle, we give a systematic way to study the nonrandom occurrence of words in DNA or protein sequences. Through comparison with experimental results, it was shown that patterns of regulatory binding sites in Saccharomyces cerevisiae (yeast) genomes tend to occur significantly in the promoter regions. We studied two correlated gene families of yeast. The method successfully extracts the binding sites verified by experiments in each family. Many putative regulatory sites in the upstream regions are proposed. The study also suggested that some regulatory sites are active in both directions, while others show directional preference.
Algorithms for Detecting Significantly Mutated Pathways in Cancer
NASA Astrophysics Data System (ADS)
Vandin, Fabio; Upfal, Eli; Raphael, Benjamin J.
Recent genome sequencing studies have shown that the somatic mutations that drive cancer development are distributed across a large number of genes. This mutational heterogeneity complicates efforts to distinguish functional mutations from sporadic, passenger mutations. Since cancer mutations are hypothesized to target a relatively small number of cellular signaling and regulatory pathways, a common approach is to assess whether known pathways are enriched for mutated genes. However, restricting attention to known pathways will not reveal novel cancer genes or pathways. An alternative strategy is to examine mutated genes in the context of genome-scale interaction networks that include both well characterized pathways and additional gene interactions measured through various approaches. We introduce a computational framework for de novo identification of subnetworks in a large gene interaction network that are mutated in a significant number of patients. This framework includes two major features. First, we introduce a diffusion process on the interaction network to define a local neighborhood of "influence" for each mutated gene in the network. Second, we derive a two-stage multiple hypothesis test to bound the false discovery rate (FDR) associated with the identified subnetworks. We test these algorithms on a large human protein-protein interaction network using mutation data from two recent studies: glioblastoma samples from The Cancer Genome Atlas and lung adenocarcinoma samples from the Tumor Sequencing Project. We successfully recover pathways that are known to be important in these cancers, such as the p53 pathway. We also identify additional pathways, such as the Notch signaling pathway, that have been implicated in other cancers but not previously reported as mutated in these samples. Our approach is the first, to our knowledge, to demonstrate a computationally efficient strategy for de novo identification of statistically significant mutated subnetworks. We
Statistical inference for community detection in signed networks
NASA Astrophysics Data System (ADS)
Zhao, Xuehua; Yang, Bo; Liu, Xueyan; Chen, Huiling
2017-04-01
The problem of community detection in networks has received wide attention and proves to be computationally challenging. In recent years, with the surge of signed networks with positive links and negative links, finding community structure in such signed networks has become a research focus in the area of network science. Although many methods have been proposed to address the problem, their performance depends heavily on predefined optimization objectives or heuristics, which usually cannot accurately describe the intrinsic structure of a community. In this study, we present a statistical inference method for community detection in signed networks, in which a probabilistic model is proposed to model signed networks and an expectation-maximization-based parameter estimation method is derived to find communities in signed networks. In addition, to efficiently analyze signed networks without any a priori information, a model selection criterion is also proposed to automatically determine the number of communities. In our experiments, the proposed method is tested on synthetic and real-world signed networks and compared with current methods. The experimental results show the proposed method can find the communities in signed networks more efficiently and accurately than current methods. Notably, the proposed method is a mathematically principled method.
Agrawal, Ankit; Huang, Xiaoqiu
2011-01-01
Pairwise sequence alignment is a central problem in bioinformatics, which forms the basis of various other applications. Two related sequences are expected to have a high alignment score, but relatedness is usually judged by statistical significance rather than by alignment score. Recently, it was shown that pairwise statistical significance gives promising results as an alternative to database statistical significance for getting individual significance estimates of pairwise alignment scores. The improvement was mainly attributed to making the statistical significance estimation process more sequence-specific and database-independent. In this paper, we use sequence-specific and position-specific substitution matrices to derive the estimates of pairwise statistical significance, which is expected to use more sequence-specific information in estimating pairwise statistical significance. Experiments on a benchmark database with sequence-specific substitution matrices at different levels of sequence-specific contribution were conducted, and results confirm that using sequence-specific substitution matrices for estimating pairwise statistical significance is significantly better than using a standard matrix like BLOSUM62, and than database statistical significance estimates reported by popular database search programs like BLAST, PSI-BLAST (without pretrained PSSMs), and SSEARCH on a benchmark database, but with pretrained PSSMs, PSI-BLAST results are significantly better. Further, using position-specific substitution matrices for estimating pairwise statistical significance gives significantly better results even than PSI-BLAST using pretrained PSSMs.
Fisher, Aaron; Anderson, G. Brooke; Peng, Roger
2014-01-01
Scatterplots are the most common way for statisticians, scientists, and the public to visually detect relationships between measured variables. At the same time, and despite widely publicized controversy, P-values remain the most commonly used measure to statistically justify relationships identified between variables. Here we measure the ability to detect statistically significant relationships from scatterplots in a randomized trial of 2,039 students in a statistics massive open online course (MOOC). Each subject was shown a random set of scatterplots and asked to visually determine if the underlying relationships were statistically significant at the P < 0.05 level. Subjects correctly classified only 47.4% (95% CI [45.1%–49.7%]) of statistically significant relationships, and 74.6% (95% CI [72.5%–76.6%]) of non-significant relationships. Adding visual aids such as a best fit line or scatterplot smooth increased the probability a relationship was called significant, regardless of whether the relationship was actually significant. Classification of statistically significant relationships improved on repeat attempts of the survey, although classification of non-significant relationships did not. Our results suggest: (1) that evidence-based data analysis can be used to identify weaknesses in theoretical procedures in the hands of average users, (2) data analysts can be trained to improve detection of statistically significant results with practice, but (3) data analysts have incorrect intuition about what statistically significant relationships look like, particularly for small effects. We have built a web tool for people to compare scatterplots with their corresponding p-values which is available here: http://glimmer.rstudio.com/afisher/EDA/. PMID:25337457
Fisher, Aaron; Anderson, G Brooke; Peng, Roger; Leek, Jeff
2014-01-01
Scatterplots are the most common way for statisticians, scientists, and the public to visually detect relationships between measured variables. At the same time, and despite widely publicized controversy, P-values remain the most commonly used measure to statistically justify relationships identified between variables. Here we measure the ability to detect statistically significant relationships from scatterplots in a randomized trial of 2,039 students in a statistics massive open online course (MOOC). Each subject was shown a random set of scatterplots and asked to visually determine if the underlying relationships were statistically significant at the P < 0.05 level. Subjects correctly classified only 47.4% (95% CI [45.1%-49.7%]) of statistically significant relationships, and 74.6% (95% CI [72.5%-76.6%]) of non-significant relationships. Adding visual aids such as a best fit line or scatterplot smooth increased the probability a relationship was called significant, regardless of whether the relationship was actually significant. Classification of statistically significant relationships improved on repeat attempts of the survey, although classification of non-significant relationships did not. Our results suggest: (1) that evidence-based data analysis can be used to identify weaknesses in theoretical procedures in the hands of average users, (2) data analysts can be trained to improve detection of statistically significant results with practice, but (3) data analysts have incorrect intuition about what statistically significant relationships look like, particularly for small effects. We have built a web tool for people to compare scatterplots with their corresponding p-values which is available here: http://glimmer.rstudio.com/afisher/EDA/.
Performance optimization for pedestrian detection on degraded video using natural scene statistics
NASA Astrophysics Data System (ADS)
Winterlich, Anthony; Denny, Patrick; Kilmartin, Liam; Glavin, Martin; Jones, Edward
2014-11-01
We evaluate the effects of transmission artifacts such as JPEG compression and additive white Gaussian noise on the performance of a state-of-the-art pedestrian detection algorithm, which is based on integral channel features. Integral channel features combine the diversity of information obtained from multiple image channels with the computational efficiency of the Viola and Jones detection framework. We utilize "quality aware" spatial image statistics to blindly categorize distorted video frames by distortion type and level without the use of an explicit reference. We combine quality statistics with a multiclassifier detection framework for optimal pedestrian detection performance across varying image quality. Our detection method provides statistically significant improvements over current approaches based on single classifiers, on two large pedestrian databases containing a wide variety of artificially added distortion. The improvement in detection performance is further demonstrated on real video data captured from multiple cameras containing varying levels of sensor noise and compression. The results of our research have the potential to be used in real-time in-vehicle networks to improve pedestrian detection performance across a wide range of image and video quality.
2014-01-01
Background Most work on the topic of activity landscapes has focused on their quantitative description and visual representation, with the aim of aiding navigation of SAR. Recent developments have addressed applications such as quantifying the proportion of activity cliffs, investigating the predictive abilities of activity landscape methods and so on. However, all these publications have worked under the assumption that the activity landscape models are “real” (i.e., statistically significant). Results The current study addresses for the first time, in a quantitative manner, the significance of a landscape or individual cliffs in the landscape. In particular, we question whether the activity landscape derived from observed (experimental) activity data is different from a randomly generated landscape. To address this we used the SALI measure with six different data sets tested against one or more molecular targets. We also assessed the significance of the landscapes for single and multiple representations. Conclusions We find that non-random landscapes are data set and molecular representation dependent. For the data sets and representations used in this work, our results suggest that not all representations lead to non-random landscapes. This indicates that not all molecular representations should be used to a) interpret the SAR and b) combined to generate consensus models. Our results suggest that significance testing of activity landscape models and in particular, activity cliffs, is key, prior to the use of such models. PMID:24694189
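As an illustration of the comparison against a randomly generated landscape, here is a minimal sketch. It assumes the commonly cited SALI definition, |A_i - A_j| / (1 - sim(i, j)), a Tanimoto similarity on feature sets, the maximum SALI value as the test statistic, and shuffling of activities as the randomization scheme. The abstract confirms none of these details, so all function names and choices below are hypothetical.

```python
import random


def tanimoto(a, b):
    """Tanimoto similarity between two feature sets (fingerprint bits)."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def sali(act_i, act_j, sim):
    """SALI value for one pair; identical structures give inf (or 0 if equal activity)."""
    if sim >= 1.0:
        return float("inf") if act_i != act_j else 0.0
    return abs(act_i - act_j) / (1.0 - sim)


def max_sali(fps, acts):
    """Largest pairwise SALI value in the data set (assumed test statistic)."""
    n = len(acts)
    return max(
        sali(acts[i], acts[j], tanimoto(fps[i], fps[j]))
        for i in range(n)
        for j in range(i + 1, n)
    )


def randomization_p_value(fps, acts, n_perm=1000, seed=0):
    """Share of activity shufflings whose max SALI reaches the observed one."""
    rng = random.Random(seed)
    observed = max_sali(fps, acts)
    shuffled = list(acts)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if max_sali(fps, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

Because only the activities are shuffled, the structural similarities stay fixed, so the test asks whether the observed pairing of structures and activities produces steeper cliffs than chance pairing would.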
Abar, Orhan; Charnigo, Richard J.; Rayapati, Abner
2017-01-01
Association rule mining has received significant attention from both the data mining and machine learning communities. While data mining researchers focus more on designing efficient algorithms to mine rules from large datasets, the learning community has explored applications of rule mining to classification. A major problem with rule mining algorithms is the explosion of rules even for moderate sized datasets making it very difficult for end users to identify both statistically significant and potentially novel rules that could lead to interesting new insights and hypotheses. Researchers have proposed many domain independent interestingness measures using which, one can rank the rules and potentially glean useful rules from the top ranked ones. However, these measures have not been fully explored for rule mining in clinical datasets owing to the relatively large sizes of the datasets often encountered in healthcare and also due to limited access to domain experts for review/analysis. In this paper, using an electronic medical record (EMR) dataset of diagnoses and medications from over three million patient visits to the University of Kentucky medical center and affiliated clinics, we conduct a thorough evaluation of dozens of interestingness measures proposed in data mining literature, including some new composite measures. Using cumulative relevance metrics from information retrieval, we compare these interestingness measures against human judgments obtained from a practicing psychiatrist for association rules involving the depressive disorders class as the consequent. Our results not only surface new interesting associations for depressive disorders but also indicate classes of interestingness measures that weight rule novelty and statistical strength in contrasting ways, offering new insights for end users in identifying interesting rules. PMID:28736771
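To make the notion of an interestingness measure concrete, the sketch below computes three standard ones (support, confidence and lift) for a single rule. The abstract does not list which measures were evaluated, so these three are an assumption, and the toy transactions are invented for illustration.

```python
def rule_measures(transactions, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent.

    transactions: iterable of item collections (e.g. codes recorded per visit).
    """
    a, c = set(antecedent), set(consequent)
    rows = [set(t) for t in transactions]
    n = len(rows)
    n_a = sum(1 for t in rows if a <= t)          # visits containing the antecedent
    n_c = sum(1 for t in rows if c <= t)          # visits containing the consequent
    n_ac = sum(1 for t in rows if (a | c) <= t)   # visits containing both
    support = n_ac / n
    confidence = n_ac / n_a if n_a else 0.0
    lift = confidence / (n_c / n) if n_c else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


if __name__ == "__main__":
    # Invented toy data, not taken from the EMR dataset described above.
    visits = [
        {"ssri", "depression"},
        {"ssri", "depression"},
        {"ssri"},
        {"statin"},
    ]
    print(rule_measures(visits, {"ssri"}, {"depression"}))
```

A lift above 1 indicates that the consequent occurs more often with the antecedent than it does overall. That captures statistical strength, but says nothing about whether the rule is novel to a clinician.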
Statistics, Probability, Significance, Likelihood: Words Mean What We Define Them to Mean
ERIC Educational Resources Information Center
Drummond, Gordon B.; Tom, Brian D. M.
2011-01-01
Statisticians use words deliberately and specifically, but not necessarily in the way they are used colloquially. For example, in general parlance "statistics" can mean numerical information, usually data. In contrast, one large statistics textbook defines the term "statistic" to denote "a characteristic of a…
Statistical Analysis of Data with Non-Detectable Values
Frome, E.L.
2004-08-26
Environmental exposure measurements are, in general, positive and may be subject to left censoring, i.e. the measured value is less than a "limit of detection". In occupational monitoring, strategies for assessing workplace exposures typically focus on the mean exposure level or the probability that any measurement exceeds a limit. A basic problem of interest in environmental risk assessment is to determine if the mean concentration of an analyte is less than a prescribed action level. Parametric methods, used to determine acceptable levels of exposure, are often based on a two parameter lognormal distribution. The mean exposure level and/or an upper percentile (e.g. the 95th percentile) are used to characterize exposure levels, and upper confidence limits are needed to describe the uncertainty in these estimates. In certain situations it is of interest to estimate the probability of observing a future (or "missed") value of a lognormal variable. Statistical methods for random samples (without non-detects) from the lognormal distribution are well known for each of these situations. In this report, methods for estimating these quantities based on the maximum likelihood method for randomly left censored lognormal data are described and graphical methods are used to evaluate the lognormal assumption. If the lognormal model is in doubt and an alternative distribution for the exposure profile of a similar exposure group is not available, then nonparametric methods for left censored data are used. The mean exposure level, along with the upper confidence limit, is obtained using the product limit estimate, and the upper confidence limit on the 95th percentile (i.e. the upper tolerance limit) is obtained using a nonparametric approach. All of these methods are well known but computational complexity has limited their use in routine data analysis with left censored data. The recent development of the R environment for statistical data analysis and graphics has greatly
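The likelihood behind maximum likelihood estimation for left-censored lognormal data can be sketched as follows. This is an illustrative Python version, not the report's R implementation. Each detected value contributes the lognormal density, and each non-detect contributes the probability of falling below its limit of detection. The coarse grid search is a stand-in for a proper numerical optimizer, and all names are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_lognormal_loglik(mu, sigma, detects, lods):
    """Log-likelihood of (mu, sigma) on the log scale.

    detects: measured values above their detection limit.
    lods:    detection limits of the left-censored (non-detect) samples.
    """
    ll = 0.0
    for x in detects:
        z = (math.log(x) - mu) / sigma
        ll += -math.log(x) - math.log(sigma) - 0.5 * math.log(2 * math.pi) - 0.5 * z * z
    for lod in lods:
        z = (math.log(lod) - mu) / sigma
        ll += math.log(max(norm_cdf(z), 1e-300))  # guard against log(0)
    return ll


def grid_mle(detects, lods, mus, sigmas):
    """Crude maximizer: best (mu, sigma) pair over a user-supplied grid."""
    return max(
        ((m, s) for m in mus for s in sigmas),
        key=lambda p: censored_lognormal_loglik(p[0], p[1], detects, lods),
    )


def lognormal_mean(mu, sigma):
    """Mean exposure level implied by the fitted parameters."""
    return math.exp(mu + 0.5 * sigma ** 2)
```

With no censored samples the likelihood reduces to the ordinary lognormal case, which gives a convenient check on the implementation.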
NASA Astrophysics Data System (ADS)
Kellerer-Pirklbauer, Andreas
2016-04-01
Longer data series (e.g. >10 a) of ground temperatures in alpine regions help to improve the understanding of the effects of present climate change on the distribution and thermal characteristics of seasonal frost- and permafrost-affected areas. Beginning in 2004 - and more intensively since 2006 - a permafrost and seasonal frost monitoring network was established in Central and Eastern Austria by the University of Graz. This network consists of c.60 ground temperature (surface and near-surface) monitoring sites which are located at 1922-3002 m a.s.l., at latitude 46°55'-47°22'N and at longitude 12°44'-14°41'E. These data allow conclusions about general ground thermal conditions, potential permafrost occurrence, the trend during the observation period, and the regional pattern of changes. Several different temperature-related parameters were calculated and analysed. At an annual scale a region-wide statistically significant warming during the observation period was revealed by, e.g., an increase in mean annual temperature values (mean, maximum) or the significant lowering of the surface frost number (F+). At a seasonal scale, in most cases no significant trend of any temperature-related parameter was revealed for spring (MAM) and autumn (SON). Winter (DJF) shows only a weak warming. In contrast, the summer (JJA) season reveals in general a significant warming as confirmed by several different temperature-related parameters such as mean seasonal temperature, number of thawing degree days, number of freezing degree days, or days without night frost. On a monthly basis August shows the statistically most robust and strongest warming of all months, although regional differences occur. Despite the fact that the general ground temperature warming during the last decade is confirmed by the field data in the study region, complications in trend analyses arise from temperature anomalies (e.g. warm winter 2006/07) or substantial variations in the winter
The Detection and Statistics of Giant Arcs behind CLASH Clusters
NASA Astrophysics Data System (ADS)
Xu, Bingxiao; Postman, Marc; Meneghetti, Massimo; Seitz, Stella; Zitrin, Adi; Merten, Julian; Maoz, Dani; Frye, Brenda; Umetsu, Keiichi; Zheng, Wei; Bradley, Larry; Vega, Jesus; Koekemoer, Anton
2016-02-01
We developed an algorithm to find and characterize gravitationally lensed galaxies (arcs) to perform a comparison of the observed and simulated arc abundance. Observations are from the Cluster Lensing And Supernova survey with Hubble (CLASH). Simulated CLASH images are created using the MOKA package and also clusters selected from the high-resolution, hydrodynamical simulations, MUSIC, over the same mass and redshift range as the CLASH sample. The algorithm's arc elongation accuracy, completeness, and false positive rate are determined and used to compute an estimate of the true arc abundance. We derive a lensing efficiency of 4 ± 1 arcs (with length ≥6″ and length-to-width ratio ≥7) per cluster for the X-ray-selected CLASH sample, 4 ± 1 arcs per cluster for the MOKA-simulated sample, and 3 ± 1 arcs per cluster for the MUSIC-simulated sample. The observed and simulated arc statistics are in full agreement. We measure the photometric redshifts of all detected arcs and find a median redshift zs = 1.9 with 33% of the detected arcs having zs > 3. We find that the arc abundance does not depend strongly on the source redshift distribution but is sensitive to the mass distribution of the dark matter halos (e.g., the c-M relation). Our results show that consistency between the observed and simulated distributions of lensed arc sizes and axial ratios can be achieved by using cluster-lensing simulations that are carefully matched to the selection criteria used in the observations.
THE DETECTION AND STATISTICS OF GIANT ARCS BEHIND CLASH CLUSTERS
Xu, Bingxiao; Zheng, Wei; Postman, Marc; Bradley, Larry; Meneghetti, Massimo; Koekemoer, Anton; Seitz, Stella; Zitrin, Adi; Merten, Julian; Maoz, Dani; Frye, Brenda; Vega, Jesus
2016-02-01
We developed an algorithm to find and characterize gravitationally lensed galaxies (arcs) to perform a comparison of the observed and simulated arc abundance. Observations are from the Cluster Lensing And Supernova survey with Hubble (CLASH). Simulated CLASH images are created using the MOKA package and also clusters selected from the high-resolution, hydrodynamical simulations, MUSIC, over the same mass and redshift range as the CLASH sample. The algorithm's arc elongation accuracy, completeness, and false positive rate are determined and used to compute an estimate of the true arc abundance. We derive a lensing efficiency of 4 ± 1 arcs (with length ≥6″ and length-to-width ratio ≥7) per cluster for the X-ray-selected CLASH sample, 4 ± 1 arcs per cluster for the MOKA-simulated sample, and 3 ± 1 arcs per cluster for the MUSIC-simulated sample. The observed and simulated arc statistics are in full agreement. We measure the photometric redshifts of all detected arcs and find a median redshift z_s = 1.9 with 33% of the detected arcs having z_s > 3. We find that the arc abundance does not depend strongly on the source redshift distribution but is sensitive to the mass distribution of the dark matter halos (e.g., the c–M relation). Our results show that consistency between the observed and simulated distributions of lensed arc sizes and axial ratios can be achieved by using cluster-lensing simulations that are carefully matched to the selection criteria used in the observations.
Statistical language analysis for automatic exfiltration event detection.
Robinson, David Gerald
2010-04-01
This paper discusses the recent development of a statistical approach for the automatic identification of anomalous network activity that is characteristic of exfiltration events. This approach is based on the language processing method referred to as latent Dirichlet allocation (LDA). Cyber security experts currently depend heavily on a rule-based framework for initial detection of suspect network events. The application of the rule set typically results in an extensive list of suspect network events that are then further explored manually for suspicious activity. The ability to identify anomalous network events is heavily dependent on the experience of the security personnel wading through the network log. Limitations of this approach are clear: rule-based systems only apply to exfiltration behavior that has previously been observed, and experienced cyber security personnel are rare commodities. Since the new methodology is not a discrete rule-based approach, it is more difficult for an insider to disguise the exfiltration events. A further benefit is that the methodology provides a risk-based approach that can be implemented in a continuous, dynamic or evolutionary fashion. This permits suspect network activity to be identified early with a quantifiable risk associated with decision making when responding to suspicious activity.
Sassenhagen, Jona; Alday, Phillip M
2016-11-01
Experimental research on behavior and cognition frequently rests on stimulus or subject selection where not all characteristics can be fully controlled, even when attempting strict matching. For example, when contrasting patients to controls, variables such as intelligence or socioeconomic status are often correlated with patient status. Similarly, when presenting word stimuli, variables such as word frequency are often correlated with primary variables of interest. One procedure very commonly employed to control for such nuisance effects is conducting inferential tests on confounding stimulus or subject characteristics. For example, if word length is not significantly different for two stimulus sets, they are considered as matched for word length. Such a test has high error rates and is conceptually misguided. It reflects a common misunderstanding of statistical tests: interpreting significance not to refer to inference about a particular population parameter, but about 1. the sample in question, 2. the practical relevance of a sample difference (so that a nonsignificant test is taken to indicate evidence for the absence of relevant differences). We show inferential testing for assessing nuisance effects to be inappropriate both pragmatically and philosophically, present a survey showing its high prevalence, and briefly discuss an alternative in the form of regression including nuisance variables.
Hojat, Mohammadreza; Xu, Gang
2004-01-01
Effect sizes (ES) are an increasingly important index used to quantify the degree of practical significance of study results. This paper gives an introduction to the computation and interpretation of effect sizes from the perspective of the consumer of the research literature. The key points made are: 1. ES is a useful indicator of the practical (clinical) importance of research results that can be operationally defined on a scale from "negligible" through "moderate" to "important". 2. The ES has two advantages over statistical significance testing: (a) it is independent of the size of the sample; (b) it is a scale-free index. Therefore, ES can be uniformly interpreted in different studies regardless of the sample size and the original scales of the variables. 3. Calculations of the ES are illustrated by using examples of comparisons between two means, correlation coefficients, chi-square tests and two proportions, along with appropriate formulas. 4. Operational definitions for the ESs are given, along with numerical examples for the purpose of illustration.
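For the two-means case, one widely used effect size is Cohen's d, the mean difference divided by the pooled standard deviation. The sketch below assumes that definition; the abstract does not say which formula the paper uses, and the sample numbers are invented.

```python
import math


def cohens_d(group_a, group_b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    na, nb = len(group_a), len(group_b)
    mean_a, mean_b = sum(group_a) / na, sum(group_b) / nb
    var_a = sum((x - mean_a) ** 2 for x in group_a) / (na - 1)
    var_b = sum((x - mean_b) ** 2 for x in group_b) / (nb - 1)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b) / (na + nb - 2))
    return (mean_a - mean_b) / pooled_sd


if __name__ == "__main__":
    a, b = [2, 4, 6], [1, 3, 5]
    print(cohens_d(a, b))
    # Rescaling both groups (e.g. changing units) leaves d unchanged.
    print(cohens_d([10 * x for x in a], [10 * x for x in b]))
```

The second call illustrates the scale-free property named in point 2(b), since multiplying every observation by a constant rescales the mean difference and the pooled standard deviation by the same factor.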
Detecting Significant Change in Wavefront Error: How long does it take?
Koenig, Darren E.; Applegate, Raymond A.; Marsack, Jason D.; Sarver, Edwin J.; Nguyen, Lan Chi
2010-01-01
Purpose Measurement noise in ocular wavefront sensing limits detection of statistically significant change in high-order wavefront error (HO WFE). Consequently, measurement noise is problematic when trying to detect progressive change in HO WFE. Our aim is to 1) determine the necessary amount of time to detect age-related change in HO WFE given measurement variability and HO WFE composition and magnitude and 2) minimize the length of time necessary to detect change. Methods Five subjects with 0.26 to 1.57 micrometers root mean square HO WFE (HO RMS) over a 6 mm pupil were measured 12 times in 10–15 minutes using a custom Shack-Hartmann wavefront sensor. Each individual’s standard deviation of measures was used to calculate the 95% confidence interval around their mean HO RMS. Data previously reported on the rate of change in the HO RMS due to normal aging and pupil diameter was used to calculate time to detect change exceeding this interval given measurement variability. Results Single measurements limit statistical detection to a range of 8 to 30 years. Increasing the number of WFE measurements per visit decreases time to detection (e.g., 7 measurements reduce the range to 3 to 14 years). The number of years to detect a change requires consideration of the subject’s measurement variability, level and distribution of aberrations and age. Uncertainty in locating pupil centre accounts for 39 ± 8% of the total variability. Conclusions The ability to detect change in HO WFE over a short period of time due to normal aging is difficult but possible with current WFE measurement technology. Single measurements of HO WFE become less predictive of true HO WFE with increasing measurement variability. Multiple measurements reduce the variability. Even with proper fixation and instrument alignment, pupil centre location uncertainty in HO WFE measurements is a nontrivial contributor to measurement variability. PMID:19469015
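The Methods logic can be sketched as a two-step calculation: the confidence interval around the mean narrows with the number of measurements, and the time to detect change is how long a steady rate of change takes to exceed that interval. The sketch below assumes a normal-approximation interval (z = 1.96) and a constant linear rate of change. The input values are hypothetical and are not the study's data.

```python
import math


def ci_half_width(sd, n_measurements, z=1.96):
    """Half-width of the ~95% confidence interval around a mean of n measurements."""
    return z * sd / math.sqrt(n_measurements)


def years_to_detect(sd, n_measurements, rate_per_year, z=1.96):
    """Years until a steady change of rate_per_year exceeds the interval half-width."""
    return ci_half_width(sd, n_measurements, z) / rate_per_year


if __name__ == "__main__":
    # Hypothetical values: SD of 0.05 micrometers, change of 0.005 micrometers per year.
    for n in (1, 4, 7):
        print(n, round(years_to_detect(0.05, n, 0.005), 1))
```

Under these assumptions the detection time falls with the square root of the number of measurements per visit, which agrees in direction with the reported gain from taking multiple measurements.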
Mass detection on real and synthetic mammograms: human observer templates and local statistics
NASA Astrophysics Data System (ADS)
Castella, Cyril; Kinkel, Karen; Verdun, Francis R.; Eckstein, Miguel P.; Abbey, Craig K.; Bochud, François O.
2007-03-01
In this study we estimated human observer templates associated with the detection of a realistic mass signal superimposed on real and simulated but realistic synthetic mammographic backgrounds. Five trained naïve observers participated in two-alternative forced-choice (2-AFC) experiments in which they were asked to detect a spherical mass signal extracted from a mammographic phantom. This signal was superimposed on statistically stationary clustered lumpy backgrounds (CLB) in one instance, and on nonstationary real mammographic backgrounds in another. Human observer linear templates were estimated using a genetic algorithm. An additional 2-AFC experiment was conducted with twin noise in order to determine which local statistical properties of the real backgrounds influenced the ability of the human observers to detect the signal. Results show that the estimated linear templates are not significantly different for stationary and nonstationary backgrounds. The estimated performance of the linear template compared with the human observer is within 5% in terms of percent correct (Pc) for the 2-AFC task. Detection efficiency is significantly higher on nonstationary real backgrounds than on globally stationary synthetic CLB. Using the twin-noise experiment and a new method to relate image features to observers' trial-to-trial decisions, we found that the local statistical properties that hindered or eased the detection task were the standard deviation and three features derived from the neighborhood gray-tone difference matrix: coarseness, contrast and strength. These statistical features showed a dependency on human performance only when they were estimated within a sufficiently small area around the searched location. These findings emphasize that nonstationary backgrounds need to be described by their local statistics and not by global ones like the noise Wiener spectrum.
NASA Astrophysics Data System (ADS)
Zhang, Yu; Li, Fei; Zhang, Shengkai; Zhu, Tingting
2017-04-01
Synthetic Aperture Radar (SAR) is important for polar remote sensing because it provides continuous observations on all days and in all weather. SAR can be used to extract surface roughness information characterized by the variance of dielectric properties and by different polarization channels, which makes it possible to observe different ice types and surface structures for deformation analysis. In November 2016, the 33rd Chinese National Antarctic Research Expedition (CHINARE) cruise set sail into the Antarctic sea ice zone. An accurate spatial distribution of leads in the sea ice zone is essential for routine planning of ship navigation. In this study, the semantic relationship between leads and sea ice categories is described by a Conditional Random Fields (CRF) model, and lead characteristics are modeled by statistical distributions in SAR imagery. In the proposed algorithm, a mixture-statistical-distribution-based CRF is developed by considering the contextual information and the statistical characteristics of sea ice to improve lead detection in Sentinel-1A dual-polarization SAR imagery. The unary and pairwise potentials in the CRF model are constructed by integrating the posterior probability estimated from statistical distributions. For mixture statistical distribution parameter estimation, the Method of Logarithmic Cumulants (MoLC) is used to estimate the parameters of each single statistical distribution. The iterative Expectation-Maximization (EM) algorithm is used to calculate the parameters of the mixture-statistical-distribution-based CRF model. For posterior probability inference, a graph-cut energy minimization method is adopted for the initial lead detection. Post-processing procedures, including an aspect ratio constraint and spatial smoothing, are used to improve the visual result. The proposed method is validated on Sentinel-1A SAR C-band Extra Wide Swath (EW) Ground Range Detected (GRD) imagery with a
Papageorgiou, Spyridon N; Kloukos, Dimitrios; Petridis, Haralampos; Pandis, Nikolaos
2015-10-01
To assess the hypothesis that there is excessive reporting of statistically significant studies published in prosthodontic and implantology journals, which could indicate selective publication. The last 30 issues of 9 journals in prosthodontics and implant dentistry were hand-searched for articles with statistical analyses. The percentages of significant and non-significant results were tabulated by parameter of interest. Univariable/multivariable logistic regression analyses were applied to identify possible predictors of reporting statistically significant findings. The results of this study were compared with similar studies in dentistry with random-effects meta-analyses. Of the 2323 included studies, 71% reported statistically significant results, with the significant results ranging from 47% to 86%. Multivariable modeling identified that geographical area and involvement of a statistician were predictors of statistically significant results. Compared to interventional studies, the odds that in vitro and observational studies would report statistically significant results were increased by 1.20 times (OR: 2.20, 95% CI: 1.66-2.92) and 0.35 times (OR: 1.35, 95% CI: 1.05-1.73), respectively. The probability of statistically significant results from randomized controlled trials was significantly lower compared to various study designs (difference: 30%, 95% CI: 11-49%). Likewise the probability of statistically significant results in prosthodontics and implant dentistry was lower compared to other dental specialties, but this result did not reach statistical significance (P>0.05). The majority of studies identified in the fields of prosthodontics and implant dentistry presented statistically significant results. The same trend existed in publications of other specialties in dentistry. Copyright © 2015 Elsevier Ltd. All rights reserved.
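To show how an odds ratio and its 95% confidence interval are read, here is a minimal sketch that computes an unadjusted odds ratio from a 2x2 table with a Wald interval on the log scale. The study used logistic regression, so its reported ORs may be adjusted estimates. The counts below are invented and the function name is illustrative.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: group 1, significant      b: group 1, non-significant
    c: group 2, significant      d: group 2, non-significant
    All four counts must be positive.
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Invented counts, for illustration only.
    print(odds_ratio_ci(20, 10, 10, 20))
```

An interval that excludes 1, as with the ORs of 2.20 and 1.35 quoted above, corresponds to a statistically significant difference in odds at the 5% level.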
Williams, Katrina L; Low Choy, Nancy L; Brauer, Sandra G
2016-09-01
To explore differences in gait endurance, speed, and standing balance in people with multiple sclerosis (MS) across the Disease Step Rating Scale, and to determine if differences are statistically significant and clinically meaningful. Observational study. Community rehabilitation - primary health care center. Community-dwelling people with MS (N=222; mean age, 48±12y; 32% men). Not applicable. Participants were categorized using the Disease Step Rating Scale. Demographics and clinical measures of gait endurance (6-minute walk test [6MWT]), gait speed (10-m walk test [10MWT] and 25-foot walk test [25FWT]), and balance (Berg Balance Scale [BBS]) were recorded in 1 session. Differences in these parameters across categories of the Disease Step Rating Scale were explored, and clinically meaningful differences were identified. The 6MWT showed a greater number of significant differences across adjacent disease steps in those with less disability (P<.001), whereas the 10MWT and 25FWT demonstrated more significant changes in those with greater disability (P<.001). The BBS demonstrated significant differences across the span of the Disease Step Rating Scale categories (P<.001). Differences in gait and balance between adjacent Disease Step Rating Scale categories met most previously established levels of minimally detectable change and all minimally important change scores. Our findings support that the Disease Step Rating Scale is an observational tool that can be used by health professionals to categorize people with MS, with the categories reflecting statistically significant and clinically meaningful differences in gait and balance performance. Copyright © 2016 American Congress of Rehabilitation Medicine. Published by Elsevier Inc. All rights reserved.
Automatic brain tumor detection in MRI: methodology and statistical validation
NASA Astrophysics Data System (ADS)
Iftekharuddin, Khan M.; Islam, Mohammad A.; Shaik, Jahangheer; Parra, Carlos; Ogg, Robert
2005-04-01
Automated brain tumor segmentation and detection are immensely important in medical diagnostics because they provide information associated with anatomical structures as well as potential abnormal tissue necessary to delineate appropriate surgical planning. In this work, we propose a novel automated brain tumor segmentation technique based on multiresolution texture information that combines fractal Brownian motion (fBm) and wavelet multiresolution analysis. Our wavelet-fractal technique combines the excellent multiresolution localization property of wavelets with fractal texture extraction. We demonstrate the efficacy of our technique by successfully segmenting pediatric brain MR images (MRIs) from St. Jude Children's Research Hospital. We use a self-organizing map (SOM) as our clustering tool, wherein we exploit both pixel intensity and multiresolution texture features to obtain the segmented tumor. Our test results show that our technique successfully segments abnormal brain tissues in a set of T1 images. In the next step, we design a classifier using a Feed-Forward (FF) neural network to statistically validate the presence of tumor in MRI using both the multiresolution texture and the pixel intensity features. We estimate the corresponding receiver operating characteristic (ROC) curve based on the true positive fractions and false positive fractions estimated from our classifier at different threshold values. An ROC curve, which can be considered a gold standard for demonstrating the competence of a classifier, is obtained to ascertain the sensitivity and specificity of our classifier. We observe that at threshold 0.4 we achieve a true positive value of 1.0 (100%) while sacrificing only 0.16 (16%) false positive value for the set of 50 T1 MRI analyzed in this experiment.
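The ROC construction described above (true and false positive fractions at different thresholds) can be sketched as follows. The scores and labels are invented, and the convention that a score at or above the threshold counts as a positive call is an assumption.

```python
def roc_points(scores, labels, thresholds):
    """Return (threshold, false positive fraction, true positive fraction) triples.

    scores: classifier outputs in [0, 1]; labels: 1 = tumor present, 0 = absent.
    A case is called positive when its score is >= the threshold.
    """
    n_pos = sum(1 for lab in labels if lab == 1)
    n_neg = len(labels) - n_pos
    points = []
    for t in thresholds:
        tp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 1)
        fp = sum(1 for s, lab in zip(scores, labels) if s >= t and lab == 0)
        points.append((t, fp / n_neg, tp / n_pos))
    return points


if __name__ == "__main__":
    # Invented classifier outputs, for illustration only.
    scores = [0.9, 0.8, 0.45, 0.3, 0.1]
    labels = [1, 1, 0, 0, 0]
    for point in roc_points(scores, labels, [0.2, 0.4, 0.5, 0.85]):
        print(point)
```

Lowering the threshold can only add positive calls, so the true positive and false positive fractions both rise or stay the same. This is the trade-off reported above at threshold 0.4.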
Significance of antibody detection in the diagnosis of cryptococcal meningitis.
Patil, Shripad A; Katyayani, S; Arvind, N
2012-01-01
Cryptococcus neoformans is the causative agent of Cryptococcosis, a chronic and life-threatening infection common in AIDS patients. Sonicated proteins of cryptococci were reported to contain antigenic properties. In the present study antigens are prepared from cryptococcal culture filtrate and by sonication. Secretory antigens are prepared by precipitation of culture filtrate using saturated ammonium sulfate followed by dialysis. Prepared antigens are tested for the presence of antibodies in the CSF samples of cryptococcal meningitis cases by ELISA. Comparison is made between India ink staining, latex antigen test, and the antibodies to the sonicated and secretory antigens. The results indicate that although antigen could be detected in the majority of samples, antibody could also be detected to the extent of 80-85%. It is interesting to note that some samples that were negative for India ink staining also showed high antibody responses. Hence, antibody detection could be a valuable marker in association with India ink staining for the early diagnosis of the cryptococcal infection. This test may also counter false positivity encountered in latex antigen test. Antibody detection assay would be a viable alternative, which has 83% sensitivity and 100% specificity. Thus the presently described test aids in immunodiagnosis of cryptococcal infection.
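The sensitivity and specificity figures quoted above follow from a confusion matrix in the usual way. The sketch below uses hypothetical counts chosen only to reproduce 83% and 100%; the abstract does not give the underlying sample sizes.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


if __name__ == "__main__":
    # Hypothetical counts: 100 confirmed cases, 40 controls.
    sens, spec = sensitivity_specificity(tp=83, fn=17, tn=40, fp=0)
    print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

A specificity of 100% means no control sample tested positive, which is why the abstract suggests the assay could counter false positives from the latex antigen test.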
Petykhov, A B; Maev, I V; Deriabin, V E
2012-01-01
Anthropometry is a technique that yields the features needed to characterize changes in the human body in health and disease. Statistical analysis of anthropometric parameters (body mass, body length, waist, hip, shoulder and wrist circumferences, and skinfold thickness over the triceps, below the scapula, on the chest, on the abdomen and over the biceps), with calculation of indices and an assessment of possible age effects, was carried out for the first time in domestic medicine. Sets of interrelated anthropometric characteristics were identified. Correlation coefficients (r) were calculated, and factor analysis (principal components followed by varimax rotation), covariance analysis and discriminant analysis (using the Kaiser and Wilks criteria and the F-test) were applied. Intergroup variability of body composition was studied for individual characteristics in a group of healthy individuals (135 subjects aged 45.6 ± 1.2 years, 56.3% men and 43.7% women) and in groups with internal pathology: 121 patients after gastrectomy (57.7 ± 1.2 years, 52% men and 48% women); 214 after Billroth operation (56.1 ± 1.0 years, 53% men and 47% women); 103 after enterectomy (44.5 ± 1.8 years, 53% men and 47% women); and 206 with protein-energy wasting of mixed genesis (29.04 ± 1.6 years, 79% men and 21% women). The analysis identified a group of interrelated characteristics comprising anthropometric parameters of subcutaneous fat deposition (skinfold thickness over the triceps, over the biceps, below the scapula and on the abdomen) and fat body mass. These characteristics are related to age and height and show a stronger dependence in women, which reflects the development of the fat component of the body in the assessment of body mass index in women (unlike men). The waist-hip circumference index varies independently of body composition indicators, which does not allow it to be characterized in terms of truncal or
Yu, Yi-Kuo; Gertz, E Michael; Agarwala, Richa; Schäffer, Alejandro A; Altschul, Stephen F
2006-01-01
Protein sequence database search programs may be evaluated both for their retrieval accuracy--the ability to separate meaningful from chance similarities--and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retrieval accuracy by discarding compositional evidence of sequence relatedness. This evidence may be preserved by combining essentially independent measures of alignment and compositional similarity into a unified measure of sequence similarity. A version of the BLAST protein database search program, modified to employ this new measure, outperforms the baseline program in both retrieval and statistical accuracy on ASTRAL, a SCOP-based test set.
Spatial scan statistics for detection of multiple clusters with arbitrary shapes.
Lin, Pei-Sheng; Kung, Yi-Hung; Clayton, Murray
2016-12-01
In applying scan statistics for public health research, it would be valuable to develop a detection method for multiple clusters that accommodates spatial correlation and covariate effects in an integrated model. In this article, we connect the concepts of the likelihood ratio (LR) scan statistic and the quasi-likelihood (QL) scan statistic to provide a series of detection procedures sufficiently flexible to apply to clusters of arbitrary shape. First, we use an independent scan model for detection of clusters and then a variogram tool to examine the existence of spatial correlation and regional variation based on residuals of the independent scan model. When the estimate of regional variation is significantly different from zero, a mixed QL estimating equation is developed to estimate coefficients of geographic clusters and covariates. We use the Benjamini-Hochberg procedure (1995) to find a threshold for p-values to address the multiple testing problem. A quasi-deviance criterion is used to regroup the estimated clusters to find geographic clusters with arbitrary shapes. We conduct simulations to compare the performance of the proposed method with other scan statistics. For illustration, the method is applied to enterovirus data from Taiwan. © 2016, The International Biometric Society.
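The Benjamini-Hochberg step mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' code: the function name `benjamini_hochberg` and the default level `alpha=0.05` are assumptions chosen for illustration, and the sketch covers only the generic step-up rule, not the quasi-likelihood cluster model.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null is rejected.

    Generic Benjamini-Hochberg step-up rule at false discovery rate alpha.
    (Hypothetical helper, not taken from the paper.)
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank

    # Reject every hypothesis whose rank is at most k_max.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            rejected[idx] = True
    return rejected
```

Because the rule is step-up, a p-value that misses its own threshold is still rejected when a larger p-value meets its threshold, so the effective cut-off is the largest qualifying ordered p-value.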
Lee, L.; Helsel, D.
2005-01-01
Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
Sheikh, Mashhood Ahmed
2017-01-01
association between childhood adversity and ADS in adulthood. However, when education was excluded as a mediator-response confounding variable, the indirect effect of childhood adversity on ADS in adulthood was statistically significant (p < 0.05). This study shows that a careful inclusion of potential confounding variables is important when assessing mediation. PMID:28824498
Advanced Statistical Signal Processing Techniques for Landmine Detection Using GPR
2014-07-12
based ground penetrating radars for the detection of subsurface objects that are low in metal content and hard to detect. The derived techniques include the exploitation... T. Glenn, J. Wilson, D. Ho. A MULTIMODAL MATCHING PURSUITS DISSIMILARITY MEASURE APPLIED TO LANDMINE/CLUTTER DISCRIMINATION
Wan, Lin; Sun, Fengzhu
2012-01-01
RNA-Seq is widely used in transcriptome studies, and the detection of differentially expressed genes (DEGs) between two classes of individuals, e.g. cases vs controls, using RNA-Seq is of fundamental importance. Many statistical methods for DEG detection based on RNA-Seq data have been developed and most of them are based on the read counts mapped to individual genes. On the other hand, genes are composed of exons and the distribution of reads for the different exons can be heterogeneous. We hypothesize that the detection accuracy of differentially expressed genes can be increased by analyzing individual exons within a gene and then combining the results of the exons. We therefore developed a novel program, termed CEDER, to accurately detect DEGs by combining the significance of the exons. CEDER first tests for differentially expressed exons yielding a p-value for each, and then gives a score indicating the potential for a gene to be differentially expressed by integrating the p-values of the exons in the gene. We showed that CEDER can significantly increase the accuracy of existing methods for detecting DEGs on two benchmark RNA-Seq datasets and simulated datasets. PMID:22641709
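The idea of integrating exon-level p-values into one gene-level score can be illustrated with Fisher's combined probability test. The abstract does not state which combination rule CEDER uses, so Fisher's method is a stand-in chosen here for illustration, and it assumes the exon tests are independent. The function names `chi2_sf_even_df` and `fisher_combine` are hypothetical.

```python
import math


def chi2_sf_even_df(x, df):
    """Survival function P(X > x) of a chi-square variable with even df.

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0   # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def fisher_combine(p_values):
    """Combine independent p-values (each in (0, 1]) into one p-value.

    Under the joint null, -2 * sum(ln p_i) is chi-square distributed
    with 2 * len(p_values) degrees of freedom.
    """
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    return chi2_sf_even_df(statistic, 2 * len(p_values))
```

With a single exon the combined value equals the input p-value, and for two exons it reduces to p1*p2*(1 - ln(p1*p2)), which gives a quick sanity check on the implementation.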
NASA Astrophysics Data System (ADS)
Casati, Michele
2014-05-01
The assertion that solar activity may play a significant role in triggering large volcanic eruptions has been, and still is, discussed by many geophysicists. Numerous scientific papers have suggested a possible correlation between these events and the electromagnetic coupling between the Earth and the Sun, but none of them has been able to demonstrate a statistically significant relationship between large volcanic eruptions and any of the series, such as geomagnetic activity, solar wind or sunspot number. In our research, we compare the 148 volcanic eruptions with index VEI4 and the 37 major historical volcanic eruptions with index VEI5 or greater, recorded from 1610 to 2012, with the corresponding sunspot number (SSN). Taking as the threshold value a monthly sunspot number of 46 (recorded during the great eruption of Krakatoa, historical index VEI6, August 1883), we note some possible relationships and conduct a statistical test. • Of the 31 historical large volcanic eruptions with index VEI5+ recorded between 1610 and 1955, 29 were recorded when SSN<46. The remaining 2 eruptions were recorded not when SSN<46 but during solar maxima, in the solar cycle of the year 1739 and in solar cycle No. 14 (Shikotsu eruption of 1739 and Ksudach 1907). • Of the 8 historical large volcanic eruptions with index VEI6+ recorded from 1610 to the present, 7 were recorded with SSN<46 and, more specifically, within the three known large solar minima: Maunder (1645-1710), Dalton (1790-1830) and the solar minima that occurred between 1880 and 1920. The only exception is the eruption of Pinatubo of June 1991, recorded in the solar maximum of cycle 22. • Of the 6 historical major volcanic eruptions with index VEI5+ recorded after 1955, 5 were recorded not during periods of low solar activity but during solar maxima of cycles 19, 21 and 22. The significance tests, conducted with the chi-square χ² = 7.782, detect a
Avalanche Photodiode Statistics in Triggered-avalanche Detection Mode
NASA Technical Reports Server (NTRS)
Tan, H. H.
1984-01-01
The output of a triggered avalanche mode avalanche photodiode is modeled as Poisson distributed primary avalanche events plus conditionally Poisson distributed trapped carrier induced secondary events. The moment generating function as well as the mean and variance of the diode output statistics are derived. The dispersion of the output statistics is shown to always exceed that of the Poisson distribution. Several examples are considered in detail.
Statistical Anomaly Detection for Monitoring of Human Dynamics
NASA Astrophysics Data System (ADS)
Kamiya, K.; Fuse, T.
2015-05-01
Understanding of human dynamics has drawn attention in various areas. Owing to the widespread use of positioning technologies based on GPS or public Wi-Fi, location information can be obtained with high spatio-temporal resolution and at low cost. By collecting sets of individual location information in real time, monitoring of human dynamics has recently become feasible and is expected to lead to dynamic traffic control in the future. Although this monitoring focuses on detecting anomalous states of human dynamics, anomaly detection methods have been developed ad hoc and are not fully systematized. This research aims to define the anomaly detection problem for human dynamics monitoring with gridded population data and to develop an anomaly detection method based on that definition. Based on a comprehensive review, we discuss the characteristics of anomaly detection in human dynamics monitoring and categorize our problem as a semi-supervised anomaly detection problem that detects contextual anomalies in time-series data. We developed an anomaly detection method based on a sticky HDP-HMM, which is able to estimate the number of hidden states from the input data. Results of an experiment with synthetic data showed that the proposed method has good fundamental performance with respect to the detection rate. In an experiment with real gridded population data, an anomaly was detected when and where an actual social event had occurred.
Statistical Traffic Anomaly Detection in Time-Varying Communication Networks
2015-02-01
based anomaly detection methods are considered to be more economic and promising since they can identify novel attacks. In this work we focus on change...82. [4] W. Lu and A. A. Ghorbani, “Network anomaly detection based on wavelet analysis,” EURASIP Journal on Advances in Signal Processing, vol. 2009...dynamically. We formulate the anomaly detection problem as a binary composite hypothesis testing problem and develop a model-free and a model-based
Some statistical tools for change-points detection
NASA Astrophysics Data System (ADS)
Lebarbier, E.
2012-04-01
The homogenization of climatological series sometimes amounts to finding change-points in the distribution of the observations over time. This problem is referred to as 'segmentation' in the statistical literature. Segmentation raises interesting issues in terms of statistical modeling, model selection and algorithmics. We will give a brief overview of these issues and present several solutions that have been recently proposed. We will also consider the joint segmentation of several series. Finally, we will introduce the R package 'CGHseg' (cran.r-project.org/web/packages/cghseg/index.html), which was originally developed for biological applications and contains several useful tools for the analysis of climatological series.
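As a rough illustration of the segmentation problem described above, the sketch below finds a single change-point in the mean of a series by least squares. It is a plain-Python toy under stated assumptions (one change-point, a shift in the mean only); it does not use or reproduce the CGHseg package, and the names `segment_sse` and `best_single_changepoint` are hypothetical.

```python
def segment_sse(segment):
    """Sum of squared deviations of a segment from its own mean."""
    mean = sum(segment) / len(segment)
    return sum((x - mean) ** 2 for x in segment)


def best_single_changepoint(series):
    """Return (k, cost): the split index k minimizing the total
    within-segment sum of squares for series[:k] and series[k:]."""
    n = len(series)
    if n < 2:
        raise ValueError("need at least two observations")
    best_k, best_cost = None, float("inf")
    # Try every split that leaves both segments non-empty.
    for k in range(1, n):
        cost = segment_sse(series[:k]) + segment_sse(series[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost
```

Extending this to an unknown number of change-points is where the model-selection and algorithmic issues mentioned in the abstract arise, since the number of segments then has to be chosen and an exhaustive search over all segmentations becomes expensive.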
Surface Electromyographic Onset Detection Based On Statistics and Information Content
NASA Astrophysics Data System (ADS)
López, Natalia M.; Orosco, Eugenio; di Sciascio, Fernando
2011-12-01
The correct detection of the onset of muscular contraction is a diagnostic tool for neuromuscular diseases and an action trigger for controlling myoelectric devices. In this work, entropy and information content concepts were applied in algorithmic methods for automatic onset detection in surface electromyographic signals.
Prostate atypia: does repeat biopsy detect clinically significant prostate cancer?
Dorin, Ryan P; Wiener, Scott; Harris, Cory D; Wagner, Joseph R
2015-05-01
While the treatment pathway in response to benign or malignant prostate biopsies is well established, there is uncertainty regarding the risk of subsequently diagnosing prostate cancer when an initial diagnosis of prostate atypia is made. As such, we investigated the likelihood of a repeat biopsy diagnosing prostate cancer (PCa) in patients in whom an initial biopsy diagnosed prostate atypia. We reviewed our prospectively maintained prostate biopsy database to identify patients who underwent a repeat prostate biopsy within one year of atypia (atypical small acinar proliferation; ASAP) diagnosis between November 1987 and March 2011. Patients with a history of PCa were excluded. Chart review identified patients who underwent radical prostatectomy (RP), radiotherapy (RT), or active surveillance (AS). For some analyses, patients were divided into two subgroups based on their date of service. Ten thousand seven hundred and twenty patients underwent 13,595 biopsies during November 1987-March 2011. Five hundred and sixty seven patients (5.3%) had ASAP on initial biopsy, and 287 (50.1%) of these patients underwent a repeat biopsy within one year. Of these, 122 (42.5%) were negative, 44 (15.3%) had atypia, 19 (6.6%) had prostatic intraepithelial neoplasia, and 102 (35.6%) contained PCa. Using modified Epstein's criteria, 27/53 (51%) patients with PCa on repeat biopsy were determined to have clinically significant tumors. 37 (36.3%) proceeded to RP, 25 (24.5%) underwent RT, and 40 (39.2%) received no immediate treatment. In patients who underwent surgery, Gleason grade on final pathology was upgraded in 11 (35.5%) patients and downgraded in 1 (3.2%) patient. ASAP on initial biopsy was associated with a significant risk of PCa on repeat biopsy in patients who subsequently underwent definitive local therapy. Patients with ASAP should be counseled on the probability of harboring both clinically significant and insignificant prostate cancer. © 2015 Wiley Periodicals, Inc.
On the statistical significance of possible variations in the solar neutrino flux
NASA Astrophysics Data System (ADS)
Subramanian, A.; Lal, Siddheshwar
A statistical study has been made on the flux of the solar neutrinos as recorded in the experiment of Davis et al. (1968 and 1983) to see if there is any evidence for its variation with time. It is found that there are certain correlations and fluctuations in the data, which when grouped indicate a pattern of temporal variation. The probability that this pattern of variation would have been caused purely by chance is estimated to be about 0.0001.
Joint multipartite photon statistics by on/off detection.
Brida, G; Genovese, M; Piacentini, F; Paris, Matteo G A
2006-12-01
We demonstrate a method to reconstruct the joint photon statistics of two or more modes of radiation by using on/off photodetection performed at different quantum efficiencies. The two-mode case is discussed in detail, and experimental results are presented for the bipartite states obtained after a beam splitter fed by a single photon state or a thermal state.
Statistical Methods for Detecting Anomalous Voting Patterns: A Case Study
2011-09-23
voting data. As a case study, we apply methods developed by Beber and Scacco to analyze polling station counts in Helmand province for the four... carry out the necessary analysis. Beber and Scacco [4] have developed one such model for analyzing voting tallies. Their methods exploit the apparent
Outliers in Statistical Analysis: Basic Methods of Detection and Accommodation.
ERIC Educational Resources Information Center
Jacobs, Robert
Researchers are often faced with the prospect of dealing with observations within a given data set that are unexpected in terms of their great distance from the concentration of observations. For their potential to influence the mean disproportionately, thus affecting many statistical analyses, outlying observations require special care on the…
The statistics of single molecule detection: An overview
Enderlein, J.; Robbins, D.L.; Ambrose, W.P.
1995-12-31
An overview of our recent results in modeling single molecule detection in fluid flow is presented. Our mathematical approach is based on a path integral representation. The model accounts for all experimental details, such as light collection, laser excitation, hydrodynamics and diffusion, and molecular photophysics. Special attention is paid to multiple molecule crossings through the detection volume. Numerical realization of the theory is discussed. Measurements of burst size distributions in single B-phycoerythrin molecule detection experiments are presented and compared with theoretical predictions.
Extrasolar planets detections and statistics through gravitational microlensing
NASA Astrophysics Data System (ADS)
Cassan, A.
2014-10-01
Gravitational microlensing was proposed thirty years ago as a promising method to probe the existence and properties of compact objects in the Galaxy and its surroundings. The particularity and strength of the technique is based on the fact that the detection does not rely on the detection of the photon emission of the object itself, but on the way its mass affects the path of light of a background, almost aligned source. Detections thus include not only bright, but also dark objects. Today, the many successes of gravitational microlensing have largely exceeded the original promises. Microlensing contributed important results and breakthroughs in several astrophysical fields as it was used as a powerful tool to probe the Galactic structure (proper motions, extinction maps), to search for dark and compact massive objects in the halo and disk of the Milky Way, to probe the atmospheres of bulge red giant stars, to search for low-mass stars and brown dwarfs and to hunt for extrasolar planets. As an extrasolar planet detection method, microlensing nowadays stands in the top five of the successful observational techniques. Compared to other (complementary) detection methods, microlensing provides unique information on the population of exoplanets, because it allows the detection of very low-mass planets (down to the mass of the Earth) at large orbital distances from their star (0.5 to 10 AU). It is also the only technique that allows the discovery of planets at distances from Earth greater than a few kiloparsecs, up to the bulge of the Galaxy. Microlensing discoveries include the first ever detection of a cool super-Earth around an M-dwarf star, the detection of several cool Neptunes, Jupiters and super-Jupiters, as well as multi-planetary systems and brown dwarfs. So far, the least massive planet detected by microlensing has only three times the mass of the Earth and orbits a very low mass star at the edge of the brown dwarf regime. Several free-floating planetary
Significant pathways detection in osteoporosis based on the bibliometric network.
Sun, G J; Guo, T; Chen, Y; Xu, B; Guo, J H; Zhao, J N
2013-01-01
Osteoporosis is a significant public health issue worldwide. The underlying mechanism of osteoporosis is an imbalance between bone resorption and bone formation. However, the exact pathology is still unclear, and more related genes need to be identified. Here, we aim to identify the genes differentially expressed between osteoporosis patients and controls. Biblio-MetReS, a tool to reconstruct gene and protein networks from automated literature analysis, was used for identifying potential interactions among target genes. Relevant signaling pathways were also identified through pathway enrichment analysis. A total of 56 differentially expressed genes (DEGs) were identified. Of them, STAT1, CXCL10, SOCS3, ADM, THBS1, SOD2, and ERG2 have been shown to be involved in osteoporosis. Further, a bibliometric network was constructed between the DEGs and other genes through Biblio-MetReS. The results showed that STAT1 could interact with CXCL10 through the Toll-like receptor signaling pathway and the chemokine signaling pathway. STAT1 interacted with SOCS3 through the JAK/STAT pathway.
Detection of significant pathways in osteoporosis based on graph clustering.
Xiao, Haijun; Shan, Liancheng; Zhu, Haiming; Xue, Feng
2012-12-01
Osteoporosis is the most common and serious skeletal disorder among the elderly, characterized by a low bone mineral density (BMD). Low bone mass in the elderly is highly dependent on their peak bone mass (PBM) as young adults. Circulating monocytes serve as early progenitors of osteoclasts and produce significant molecules for bone metabolism. An improved understanding of the biology and genetics of osteoclast differentiation at the pathway level is likely to be beneficial for the development of novel targeted approaches for osteoporosis. The objective of this study was to explore gene expression profiles comprehensively by grouping individual differentially expressed genes (DEGs) into gene sets and pathways using the graph clustering approach and Gene Ontology (GO) term enrichment analysis. The results indicated that the DEGs between high and low PBM samples were grouped into nine gene sets. The genes in clusters 1 and 8 (including GBP1, STAT1, CXCL10 and EIF2AK2) may be associated with osteoclast differentiation by the immune system response. The genes in clusters 2, 7 and 9 (including SOCS3, SOD2, ATF3, ADM, EGR2 and BCL2A1) may be associated with osteoclast differentiation by responses to various stimuli. This study provides a number of candidate genes that warrant further investigation, including DDX60, HERC5, RSAD2, SIGLEC1, CMPK2, MX1, SEPING1, EPSTI1, C9orf72, PHLDA2, PFKFB3, PLEKHG2, ANKRD28, IL1RN and RNF19B.
Le Goualher, G; Argenti, A M; Duyme, M; Baaré, W F; Hulshoff Pol, H E; Boomsma, D I; Zouaoui, A; Barillot, C; Evans, A C
2000-05-01
Principal Component Analysis allows a quantitative description of shape variability with a restricted number of parameters (or modes) which can be used to quantify the difference between two shapes through the computation of a modal distance. A statistical test can then be applied to this set of measurements in order to detect a statistically significant difference between two groups. We have applied this methodology to highlight evidence of genetic encoding of the shape of neuroanatomical structures. To investigate genetic constraint, we studied if shapes were more similar within 10 pairs of monozygotic twins than within interpairs and compared the results with those obtained from 10 pairs of dizygotic twins. The statistical analysis was performed using a Mantel permutation test. We show, using simulations, that this statistical test applied on modal distances can detect a possible genetic encoding. When applied to real data, this study highlighted genetic constraints on the shape of the central sulcus. We found from 10 pairs of monozygotic twins that the intrapair modal distance of the central sulcus was significantly smaller than the interpair modal distance, for both the left central sulcus (Z = -2.66; P < 0.005) and the right central sulcus (Z = -2.26; P < 0.05). Genetic constraints on the definition of the central sulcus shape were confirmed by applying the same experiment to 10 pairs of normal young individuals (Z = -1.39; Z = -0.63, i.e., values not significant at the P < 0.05 level) and 10 pairs of dizygotic twins (Z = 0.47; Z = 0.03, i.e., values not significant at the P < 0.05 level).
ERIC Educational Resources Information Center
Smith, A. Delany; Henson, Robin K.
This paper addresses the state of the art regarding the use of statistical significance tests (SSTs). How social science research will be conducted in the future is impacted directly by current debates regarding hypothesis testing. This paper: (1) briefly explicates the current debate on hypothesis testing; (2) reviews the newly published report…
Optimizing automated gas turbine fault detection using statistical pattern recognition
NASA Astrophysics Data System (ADS)
Loukis, E.; Mathioudakis, K.; Papailiou, K.
1992-06-01
A method enabling the automated diagnosis of Gas Turbine Compressor blade faults, based on the principles of statistical pattern recognition is initially presented. The decision making is based on the derivation of spectral patterns from dynamic measurements data and then the calculation of discriminants with respect to reference spectral patterns of the faults while it takes into account their statistical properties. A method of optimizing the selection of discriminants using dynamic measurements data is also presented. A few scalar discriminants are derived, in such a way that the maximum available discrimination potential is exploited. In this way the success rate of automated decision making is further improved, while the need for intuitive discriminant selection is eliminated. The effectiveness of the proposed methods is demonstrated by application to data coming from an Industrial Gas Turbine while extension to other aspects of Fault Diagnosis is discussed.
Statistical algorithms for target detection in coherent active polarimetric images.
Goudail, F; Réfrégier, P
2001-12-01
We address the problem of small-target detection with a polarimetric imager that provides orthogonal state contrast images. Such active systems allow one to measure the degree of polarization of the light backscattered by purely depolarizing isotropic materials. To be independent of the spatial nonuniformities of the illumination beam, small-target detection on the orthogonal state contrast image must be performed without using the image of backscattered intensity. We thus propose and develop a simple and efficient target detection algorithm based on a nonlinear pointwise transformation of the orthogonal state contrast image followed by a maximum-likelihood algorithm optimal for additive Gaussian perturbations. We demonstrate the efficiency of this suboptimal technique in comparison with the optimal one, which, however, assumes a priori knowledge about the scene that is not available in practice. We illustrate the performance of this approach on both simulated and real polarimetric images.
Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.
Application of hotspot detection using spatial scan statistic: Study of criminality in Indonesia
NASA Astrophysics Data System (ADS)
Runadi, Taruga; Widyaningsih, Yekti
2017-03-01
According to police-registered data, the number of criminal cases fluctuated from 2011 to 2013, which means there was no significant reduction in the number of criminal acts during that period. Local governments need to determine whether their area is at high risk of criminal cases. The objective of this study is to detect hotspot areas for particular types of criminal cases using the spatial scan statistic. This study analyzed province-level data on 22 types of criminal cases that occurred in Indonesia during 2013. The data were obtained from Badan Pusat Statistik (BPS) and were released in 2014. Hotspot detection was performed according to the likelihood ratio of the Poisson model using SaTScan™ software, and the results were then mapped using R. The spatial scan statistic method successfully detected provinces categorized as hotspots for the 22 crime types analyzed, with p-values less than 0.05. The governments of provinces detected as hotspot areas for certain crime types should devote more attention to improving security.
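The Poisson likelihood ratio underlying this kind of hotspot detection can be sketched as follows. This is an illustrative sketch, not the SaTScan implementation and not the authors' code. It assumes that expected counts are scaled so that they sum to the total number of cases, and it treats each province as its own candidate zone, whereas a real scan also evaluates windows that aggregate neighbouring areas. The names `poisson_scan_llr` and `most_likely_cluster` are hypothetical.

```python
import math


def poisson_scan_llr(cases_in, expected_in, total_cases):
    """Log likelihood ratio of one candidate zone under a Poisson model.

    cases_in    -- observed cases inside the zone
    expected_in -- expected cases inside the zone under the null (> 0)
    total_cases -- observed cases over the whole study area
    Only zones with an excess of cases (cases_in > expected_in) score > 0.
    """
    c, e, n = cases_in, expected_in, total_cases
    if c <= e:
        return 0.0
    llr = c * math.log(c / e)
    if n > c:
        # Contribution of the area outside the zone.
        llr += (n - c) * math.log((n - c) / (n - e))
    return llr


def most_likely_cluster(cases, expected):
    """Return (index, llr) of the zone with the largest log likelihood ratio."""
    total = sum(cases)
    scores = [poisson_scan_llr(c, e, total) for c, e in zip(cases, expected)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]
```

In practice the significance of the highest-scoring zone is usually assessed by Monte Carlo replication: cases are redistributed under the null hypothesis many times, and the observed maximum is ranked among the simulated maxima to obtain a p-value.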
Elçi, Alper; Polat, Rahime
2011-01-01
The main objective of this study was to statistically evaluate the significance of seasonal groundwater quality change and to provide an assessment on the spatial distribution of specific groundwater quality parameters. The studied area was the Mount Nif karstic aquifer system located in the southeast of the city of Izmir. Groundwater samples were collected at 57 sampling points in the rainy winter and dry summer seasons. Groundwater quality indicators of interest were electrical conductivity (EC), nitrate, chloride, sulfate, sodium, some heavy metals, and arsenic. Maps showing the spatial distributions and temporal changes of these parameters were created to further interpret spatial patterns and seasonal changes in groundwater quality. Furthermore, statistical tests were conducted to confirm whether the seasonal changes for each quality parameter were statistically significant. It was evident from the statistical tests that the seasonal changes in most groundwater quality parameters were statistically not significant. However, the increase in EC values and aluminum concentrations from winter to summer was found to be significant. Furthermore, a negative correlation between sampling elevation and groundwater quality was found. It was shown that with simple statistical testing, important conclusions can be drawn from limited monitoring data. It was concluded that less groundwater recharge in the dry period of the year does not always imply higher concentrations for all groundwater quality parameters because water circulation times, lithology, quality and extent of recharge, and land use patterns also play an important role on the alteration of groundwater quality.
Öfner, D; Maier, H; Riedmann, B; Holzberger, P; Nogler, M; Tötsch, M; Bankfalvi, A; Winde, G; Böcker, W; Schmid, K W
1995-01-01
Aims: To investigate the correlation between the expression of the p53 and mdm-2 oncoproteins and to assess their prognostic value in colorectal cancer. Methods: Using a polyclonal (CM1) and a monoclonal antibody directed against p53 and mdm-2, respectively, these oncoproteins were stained immunohistochemically in 109 colorectal adenocarcinomas. Results: p53 was detected in less than 10% of tumour cells in 11 of 109 adenocarcinomas, in 10-50% of tumour cells in 17 of 109 adenocarcinomas, and in more than 50% of tumour cells in 32 of 109 adenocarcinomas. Expression of mdm-2 was detected in 22 of 109 (20%) cases investigated, of which 19 showed concomitant p53 expression. In most cases mdm-2 immunoreactivity was strongly associated with a small proportion of p53 positive tumour cells. Both p53 and mdm-2 expression lacked statistical significance when correlated with common staging and grading parameters. Conclusions: Detection of p53 and mdm-2 oncoprotein expression using immunohistochemistry is of no prognostic value in colorectal cancer. However, the close correlation between mdm-2 immunoreactivity and the proportion of p53 positive cells provides further evidence that the mdm-2 gene product interacts with p53 protein. PMID:16695968
Determination of significant variables in compound wear using a statistical model
Pumwa, J.; Griffin, R.B.; Smith, C.M.
1997-07-01
This paper will report on a study of dry compound wear of normalized 1018 steel on A2 tool steel. Compound wear is a combination of sliding and impact wear. The compound wear machine consisted of an A2 tool steel wear plate that could be rotated, and an indentor head that held the 1018 carbon steel wear pins. The variables in the system were the rpm of the wear plate, the force with which the indentor strikes the wear plate, and the frequency with which the indentor strikes the wear plate. A statistically designed experiment was used to analyze the effects of the different variables on the compound wear process. The model developed showed that wear could be reasonably well predicted using a defined variable that was called the workrate. The paper will discuss the results of the modeling and the metallurgical changes that occurred at the indentor interface, with the wear plate, during the wear process.
A Non-Parametric Surrogate-based Test of Significance for T-Wave Alternans Detection
Nemati, Shamim; Abdala, Omar; Bazán, Violeta; Yim-Yeh, Susie; Malhotra, Atul; Clifford, Gari
2010-01-01
We present a non-parametric adaptive surrogate test that allows for the differentiation of statistically significant T-Wave Alternans (TWA) from alternating patterns that can be solely explained by the statistics of noise. The proposed test is based on estimating the distribution of noise-induced alternating patterns in a beat sequence from a set of surrogate data derived from repeated reshuffling of the original beat sequence. Thus, in assessing the significance of the observed alternating patterns in the data, no assumptions are made about the underlying noise distribution. In addition, since the distribution of noise-induced alternans magnitudes is calculated separately for each sequence of beats within the analysis window, the method is robust to data non-stationarities in both noise and TWA. The proposed surrogate method for rejecting noise was compared to the standard noise rejection methods used with the Spectral Method (SM) and the Modified Moving Average (MMA) techniques. Using a previously described realistic multi-lead model of TWA, and real physiological noise, we demonstrate that the proposed approach reduces false TWA detections, while maintaining a lower rate of missed TWA detections than all the other methods tested. A simple averaging-based TWA estimation algorithm was coupled with the surrogate significance testing and was evaluated on three public databases: the Normal Sinus Rhythm Database (NSRDB), the Chronic Heart Failure Database (CHFDB) and the Sudden Cardiac Death Database (SCDDB). Differences in TWA amplitudes between each database were evaluated at matched heart rate (HR) intervals from 40 to 120 beats per minute (BPM). Using the two-sample Kolmogorov-Smirnov test, we found that significant differences in TWA levels exist between each patient group at all decades of heart rates. The most marked difference was generally found at higher heart rates, and the new technique resulted in a larger margin of separability between patient populations than
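The reshuffling idea in this abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the alternans statistic (absolute difference between the means of even- and odd-indexed beats), the function names, and the synthetic beat values are all hypothetical choices made for the example.

```python
import random

def alternans_magnitude(beats):
    """Absolute difference between the means of even- and odd-indexed beats."""
    even = beats[0::2]
    odd = beats[1::2]
    return abs(sum(even) / len(even) - sum(odd) / len(odd))

def surrogate_p_value(beats, n_surrogates=2000, seed=0):
    """Estimate how often shuffled beat sequences show alternans at least as
    large as the observed one. Shuffling destroys the beat-to-beat ordering
    while keeping the empirical amplitude distribution, so no noise model is
    assumed."""
    rng = random.Random(seed)
    observed = alternans_magnitude(beats)
    shuffled = list(beats)
    exceed = 0
    for _ in range(n_surrogates):
        rng.shuffle(shuffled)
        if alternans_magnitude(shuffled) >= observed:
            exceed += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (exceed + 1) / (n_surrogates + 1)

# Synthetic data (hypothetical): pure noise, and the same noise plus an
# ABAB pattern of +/-1.5 units.
rng = random.Random(1)
noise_only = [rng.gauss(0.0, 1.0) for _ in range(64)]
with_alternans = [x + (1.5 if i % 2 == 0 else -1.5)
                  for i, x in enumerate(noise_only)]

p_noise = surrogate_p_value(noise_only)
p_twa = surrogate_p_value(with_alternans)
print(f"noise only: p = {p_noise:.3f}; with alternans: p = {p_twa:.4f}")
```

Because the null distribution is rebuilt from each analysis window's own beats, the threshold adapts to the local noise level, which is the property the abstract credits for robustness to non-stationarity.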
The Development of Statistical Indices for Detecting Cheaters.
ERIC Educational Resources Information Center
Angoff, William H.
Comparison data on SAT verbal and mathematical were collected on pairs of examinees in three samples for later use in detecting instances of willful copying. Two of the samples were constructed with the knowledge that no examinee could possibly have copied from the answer sheet of any other examinee in the sample. The third sample was taken…
Incipient Fault Detection Using Higher-Order Statistics
1991-08-01
5.2 Simulated Wear Experiment ... 5.2.1 Experimental Design ... Design ... 5.3.2 Collected Data ... 5.3.3 Results ... detecting crankshaft drill wear (Liu and Wiu, 1990) using thrust force and axial acceleration amplitude signals. Acoustic emission spectrum features and
LFM-Pro: a tool for detecting significant local structural sites in proteins.
Sacan, Ahmet; Ozturk, Ozgur; Ferhatosmanoglu, Hakan; Wang, Yusu
2007-03-15
The rapidly growing protein structure repositories have opened up new opportunities for discovery and analysis of functional and evolutionary relationships among proteins. Detecting conserved structural sites that are unique to a protein family is of great value in identification of functionally important atoms and residues. Currently available methods are computationally expensive and fail to detect biologically significant local features. We propose Local Feature Mining in Proteins (LFM-Pro) as a framework for automatically discovering family-specific local sites and the features associated with these sites. Our method uses the distance field to backbone atoms to detect geometrically significant structural centers of the protein. A feature vector is generated from the geometrical and biochemical environment around these centers. These features are then scored using a statistical measure for their ability to distinguish a family of proteins from a background set of unrelated proteins, and successful features are combined into a representative set for the protein family. The utility and success of LFM-Pro are demonstrated on the trypsin-like serine protease family of proteins and on a challenging classification dataset via comparison with DALI. The results verify that our method is successful both in identifying the distinctive sites of a given family of proteins, and in classifying proteins using the extracted features. The software and the datasets are freely available for academic research use at http://bioinfo.ceng.metu.edu.tr/Pub/LFMPro.
ERIC Educational Resources Information Center
Rogosa, David
1981-01-01
The form of the Johnson-Neyman region of significance is shown to be determined by the statistic for testing the null hypothesis that the population within-group regressions are parallel. Results are obtained for both simultaneous and nonsimultaneous regions of significance. (Author)
Detecting modules in biological networks by edge weight clustering and entropy significance
Lecca, Paola; Re, Angela
2015-01-01
Detection of the modular structure of biological networks is of interest to researchers adopting a systems perspective for the analysis of omics data. Computational systems biology has provided a rich array of methods for network clustering. To date, the majority of approaches address this task through a network node classification based on topological or external quantifiable properties of network nodes. Conversely, numerical properties of network edges are underused, even though the information content which can be associated with network edges has augmented due to steady advances in molecular biology technology over the last decade. Properly accounting for network edges in the development of clustering approaches can become crucial to improve quantitative interpretation of omics data, finally resulting in more biologically plausible models. In this study, we present a novel technique for network module detection, named WG-Cluster (Weighted Graph CLUSTERing). WG-Cluster's notable features, compared to current approaches, lie in: (1) the simultaneous exploitation of network node and edge weights to improve the biological interpretability of the connected components detected, (2) the assessment of their statistical significance, and (3) the identification of emerging topological properties in the detected connected components. WG-Cluster utilizes three major steps: (i) an unsupervised version of k-means edge-based algorithm detects sub-graphs with similar edge weights, (ii) a fast-greedy algorithm detects connected components which are then scored and selected according to the statistical significance of their scores, and (iii) an analysis of the convolution between sub-graph mean edge weight and connected component score provides a summarizing view of the connected components. WG-Cluster can be applied to directed and undirected networks of different types of interacting entities and scales up to large omics data sets. Here, we show that WG-Cluster can be
Ultrabroadband direct detection of nonclassical photon statistics at telecom wavelength
Wakui, Kentaro; Eto, Yujiro; Benichi, Hugo; Izumi, Shuro; Yanagida, Tetsufumi; Ema, Kazuhiro; Numata, Takayuki; Fukuda, Daiji; Takeoka, Masahiro; Sasaki, Masahide
2014-01-01
Broadband light sources play essential roles in diverse fields, such as high-capacity optical communications, optical coherence tomography, optical spectroscopy, and spectrograph calibration. Although a nonclassical state from spontaneous parametric down-conversion may serve as a quantum counterpart, its detection and characterization have been a challenging task. Here we demonstrate the direct detection of photon numbers of an ultrabroadband (110 nm FWHM) squeezed state in the telecom band centred at 1535 nm wavelength, using a superconducting transition-edge sensor. The observed photon-number distributions violate Klyshko's criterion for the nonclassicality. From the observed photon-number distribution, we evaluate the second- and third-order correlation functions, and characterize a multimode structure, which implies that several tens of orthonormal modes of squeezing exist in the single optical pulse. Our results and techniques open up a new possibility to generate and characterize frequency-multiplexed nonclassical light sources for quantum info-communications technology. PMID:24694515
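As a rough illustration of how a measured photon-number distribution can be checked for nonclassicality, the sketch below applies the commonly quoted form of Klyshko's inequality, (n+2) p(n) p(n+2) >= (n+1) p(n+1)^2 for classical light, and computes the second-order correlation g2 = <n(n-1)>/<n>^2. The function names and the example distributions are assumptions for the illustration and are not data from the paper.

```python
import math

def klyshko_violations(p, tol=1e-12):
    """Return the photon numbers n at which
    (n + 2) * p[n] * p[n + 2] < (n + 1) * p[n + 1] ** 2,
    i.e. where the classical inequality is violated."""
    return [n for n in range(len(p) - 2)
            if (n + 2) * p[n] * p[n + 2] < (n + 1) * p[n + 1] ** 2 - tol]

def g2(p):
    """Second-order correlation <n(n-1)> / <n>^2 from a photon-number distribution."""
    mean_n = sum(n * pn for n, pn in enumerate(p))
    return sum(n * (n - 1) * pn for n, pn in enumerate(p)) / mean_n ** 2

# Coherent (Poissonian) light sits exactly on the classical boundary.
mu = 1.0
poisson = [math.exp(-mu) * mu ** n / math.factorial(n) for n in range(20)]

# Hypothetical single-photon-like distribution (illustrative values only).
single_photon_like = [0.2, 0.8, 0.0]

print(klyshko_violations(poisson))             # no violation expected
print(klyshko_violations(single_photon_like))  # violated at n = 0
print(round(g2(poisson), 6))
```

Writing the criterion as a product inequality rather than a ratio avoids division by zero when some photon-number probabilities vanish.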
Ultrabroadband direct detection of nonclassical photon statistics at telecom wavelength.
Wakui, Kentaro; Eto, Yujiro; Benichi, Hugo; Izumi, Shuro; Yanagida, Tetsufumi; Ema, Kazuhiro; Numata, Takayuki; Fukuda, Daiji; Takeoka, Masahiro; Sasaki, Masahide
2014-04-03
Broadband light sources play essential roles in diverse fields, such as high-capacity optical communications, optical coherence tomography, optical spectroscopy, and spectrograph calibration. Although a nonclassical state from spontaneous parametric down-conversion may serve as a quantum counterpart, its detection and characterization have been a challenging task. Here we demonstrate the direct detection of photon numbers of an ultrabroadband (110 nm FWHM) squeezed state in the telecom band centred at 1535 nm wavelength, using a superconducting transition-edge sensor. The observed photon-number distributions violate Klyshko's criterion for the nonclassicality. From the observed photon-number distribution, we evaluate the second- and third-order correlation functions, and characterize a multimode structure, which implies that several tens of orthonormal modes of squeezing exist in the single optical pulse. Our results and techniques open up a new possibility to generate and characterize frequency-multiplexed nonclassical light sources for quantum info-communications technology.
Reliable detection of directional couplings using rank statistics.
Chicharro, Daniel; Andrzejak, Ralph G
2009-08-01
To detect directional couplings from time series various measures based on distances in reconstructed state spaces were introduced. These measures can, however, be biased by asymmetries in the dynamics' structure, noise color, or noise level, which are ubiquitous in experimental signals. Using theoretical reasoning and results from model systems we identify the various sources of bias and show that most of them can be eliminated by an appropriate normalization. We furthermore diminish the remaining biases by introducing a measure based on ranks of distances. This rank-based measure outperforms existing distance-based measures concerning both sensitivity and specificity for directional couplings. Therefore, our findings are relevant for a reliable detection of directional couplings from experimental signals.
How Do Statistical Detection Methods Compare to Entropy Measures
2012-08-28
project (GNU/GPL) that makes possible the detection of hidden information in different digital media. StegSecret is a Java-based multiplatform ... variety of UNIX platforms, Windows and MacOS. VSL studio: VSL is free image steganography and steganalysis software in form of graphical block ... verification: it’s the average value of the LSBs on the current block of 128 bytes. So, if there is a random message embedded, this green curve will
Two New Statistics To Detect Answer Copying. Research Report.
ERIC Educational Resources Information Center
Sotaridona, Leonardo S.; Meijer, Rob R.
Two new indices to detect answer copying on a multiple-choice test, S(1) and S(2) (subscripts), are proposed. The S(1) index is similar to the K-index (P. Holland, 1996) and the K-overscore(2), (K2) index (L. Sotaridona and R. Meijer, in press), but the distribution of the number of matching incorrect answers of the source (examinee s) and the…
Statistical anomaly detection for individuals with cognitive impairments.
Chang, Yao-Jen; Lin, Kang-Ping; Chou, Li-Der; Chen, Shu-Fang; Ma, Tian-Shyan
2014-01-01
We study anomaly detection in a context that considers user trajectories as input and tries to identify anomalies for users following normal routes such as taking public transportation from the workplace to home or vice versa. Trajectories are modeled as a discrete-time series of axis-parallel constraints ("boxes") in the 2-D space. The anomaly can be estimated by considering two trajectories, where one trajectory is the current movement pattern and the other is a weighted trajectory collected from N norms. The proposed system was implemented and evaluated with eight individuals with cognitive impairments. The experimental results showed that recall was 95.0% and precision was 90.9% on average without false alarm suppression. False alarms and false negatives dropped when axis rotation was applied. The precision with axis rotation was 97.6% and the recall was 98.8%. The average time used for sending locations, running anomaly detection, and issuing warnings was in the range of 15.1-22.7 s. Our findings suggest that the ability to adapt anomaly detection devices for appropriate timing of self-alerts will be particularly important.
Statistical detection of the mid-Pleistocene transition
Maasch, K.A.
1988-01-01
Statistical methods have been used to show quantitatively that the transition in mean and variance observed in delta O-18 records during the middle of the Pleistocene was abrupt. By applying these methods to all of the available records spanning the entire Pleistocene, it appears that this jump was global and primarily represents an increase in ice mass. At roughly the same time an abrupt decrease in sea surface temperature also occurred, indicative of sudden global cooling. This kind of evidence suggests a possible bifurcation of the climate system that must be accounted for in a complete explanation of the ice ages. Theoretical models including internal dynamics are capable of exhibiting this kind of rapid transition. 50 refs.
Statistical detection of the mid-Pleistocene transition
NASA Technical Reports Server (NTRS)
Maasch, K. A.
1988-01-01
Statistical methods have been used to show quantitatively that the transition in mean and variance observed in delta O-18 records during the middle of the Pleistocene was abrupt. By applying these methods to all of the available records spanning the entire Pleistocene, it appears that this jump was global and primarily represents an increase in ice mass. At roughly the same time an abrupt decrease in sea surface temperature also occurred, indicative of sudden global cooling. This kind of evidence suggests a possible bifurcation of the climate system that must be accounted for in a complete explanation of the ice ages. Theoretical models including internal dynamics are capable of exhibiting this kind of rapid transition.
van Assen, Marcel A L M; van Aert, Robbie C M; Nuijten, Michèle B; Wicherts, Jelte M
2014-01-01
De Winter and Happee examined whether science based on selective publishing of significant results may be effective in accurate estimation of population effects, and whether this is even more effective than a science in which all results are published (i.e., a science without publication bias). Based on their simulation study they concluded that "selective publishing yields a more accurate meta-analytic estimation of the true effect than publishing everything, (and that) publishing nonreplicable results while placing null results in the file drawer can be beneficial for the scientific collective" (p.4). Using their scenario with a small to medium population effect size, we show that publishing everything is more effective for the scientific collective than selective publishing of significant results. Additionally, we examined a scenario with a null effect, which provides a more dramatic illustration of the superiority of publishing everything over selective publishing. Publishing everything is more effective than only reporting significant outcomes.
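The effect of selective publishing on a naive average can be seen in a toy simulation. This is only a sketch under simplifying assumptions and does not reproduce either group's simulation design: each study's observed standardized effect is drawn from a normal distribution around the true effect with standard error sqrt(2/n), a study counts as "significant" when |d/SE| > 1.96, and all parameter values and names are hypothetical.

```python
import random
import statistics

def simulate(true_d=0.3, n_per_group=20, n_studies=2000, seed=0):
    """Compare the mean observed effect over all studies with the mean over
    only the 'significant' ones (a crude stand-in for selective publishing)."""
    rng = random.Random(seed)
    # Normal approximation to the sampling distribution of d (assumption).
    se = (2 / n_per_group) ** 0.5
    all_d, significant_d = [], []
    for _ in range(n_studies):
        d = rng.gauss(true_d, se)
        all_d.append(d)
        if abs(d / se) > 1.96:
            significant_d.append(d)
    return statistics.mean(all_d), statistics.mean(significant_d)

mean_all, mean_significant = simulate()
print(f"publish everything: {mean_all:.3f}; significant only: {mean_significant:.3f}")
```

Under these assumptions the average of all studies stays close to the true effect, while the average of the significant subset is inflated, because only estimates far from zero pass the filter.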
ERIC Educational Resources Information Center
Hojat, Mohammadreza; Xu, Gang
2004-01-01
Effect Sizes (ES) are an increasingly important index used to quantify the degree of practical significance of study results. This paper gives an introduction to the computation and interpretation of effect sizes from the perspective of the consumer of the research literature. The key points made are: (1) "ES" is a useful indicator of the…
ERIC Educational Resources Information Center
Thompson, Bruce; Snyder, Patricia A.
1998-01-01
Investigates two aspects of research analyses in quantitative research studies reported in the 1996 issues of "Journal of Counseling & Development" (JCD). Acceptable methodological practice regarding significance testing and evaluation of score reliability has evolved considerably. Contemporary thinking on these issues is described; practice as…
Zou, X H; Zhu, Y P; Ren, G Q; Li, G C; Zhang, J; Zou, L J; Feng, Z B; Li, B H
2017-02-20
Objective: To evaluate the significance of bacteria detection with filter paper method on diagnosis of diabetic foot wound infection. Methods: Eighteen patients with diabetic foot ulcer conforming to the study criteria were hospitalized in Liyuan Hospital Affiliated to Tongji Medical College of Huazhong University of Science and Technology from July 2014 to July 2015. Diabetic foot ulcer wounds were classified according to the University of Texas diabetic foot classification (hereinafter referred to as Texas grade) system, and general condition of patients with wounds in different Texas grade was compared. Exudate and tissue of wounds were obtained, and filter paper method and biopsy method were adopted to detect the bacteria of wounds of patients respectively. Filter paper method was regarded as the evaluation method, and biopsy method was regarded as the control method. The relevance, difference, and consistency of the detection results of two methods were tested. Sensitivity, specificity, positive predictive value, negative predictive value, and accuracy of filter paper method in bacteria detection were calculated. Receiver operating characteristic (ROC) curve was drawn based on the specificity and sensitivity of filter paper method in bacteria detection of 18 patients to predict the detection effect of the method. Data were processed with one-way analysis of variance and Fisher's exact test. In patients tested positive for bacteria by biopsy method, the correlation between bacteria number detected by biopsy method and that by filter paper method was analyzed with Pearson correlation analysis. Results: (1) There were no statistically significant differences among patients with wounds in Texas grade 1, 2, and 3 in age, duration of diabetes, duration of wound, wound area, ankle brachial index, glycosylated hemoglobin, fasting blood sugar, blood platelet count, erythrocyte sedimentation rate, C-reactive protein, aspartate aminotransferase, serum creatinine, and
Key statistics related to CO2 emissions: Significant contributing countries
Kellogg, M.A.; Edmonds, J.A.; Scott, M.J.; Pomykala, J.S.
1987-07-01
This country selection task report describes and applies a methodology for identifying a set of countries responsible for significant present and anticipated future emissions of CO2 and other radiatively important gases (RIGs). The identification of countries responsible for CO2 and other RIG emissions will help determine to what extent a select number of countries might be capable of influencing future emissions. Once identified, those countries could potentially exercise cooperative collective control of global emissions and thus mitigate the associated adverse effects of those emissions. The methodology developed consists of two approaches: the resource approach and the emissions approach. While conceptually very different, both approaches yield the same fundamental conclusion. The core of any international initiative to control global emissions must include three key countries: the US, USSR, and the People's Republic of China. It was also determined that broader control can be achieved through the inclusion of sixteen additional countries with significant contributions to worldwide emissions.
Holtzman, Jessica N; Miller, Shefali; Hooshmand, Farnaz; Wang, Po W; Chang, Kiki D; Hill, Shelley J; Rasgon, Natalie L; Ketter, Terence A
2015-07-01
The strengths and limitations of considering childhood- and adolescent-onset bipolar disorder (BD) separately versus together remain to be established. We assessed this issue. BD patients referred to the Stanford Bipolar Disorder Clinic during 2000-2011 were assessed with the Systematic Treatment Enhancement Program for BD Affective Disorders Evaluation. Patients with childhood- and adolescent-onset were compared to those with adult-onset for 7 unfavorable bipolar illness characteristics with replicated associations with early-onset patients. Among 502 BD outpatients, those with childhood- (<13 years, N=110) and adolescent- (13-18 years, N=218) onset had significantly higher rates for 4/7 unfavorable illness characteristics, including lifetime comorbid anxiety disorder, at least ten lifetime mood episodes, lifetime alcohol use disorder, and prior suicide attempt, than those with adult-onset (>18 years, N=174). Childhood- but not adolescent-onset BD patients also had significantly higher rates of first-degree relative with mood disorder, lifetime substance use disorder, and rapid cycling in the prior year. Patients with pooled childhood/adolescent- compared to adult-onset had significantly higher rates for 5/7 of these unfavorable illness characteristics, while patients with childhood- compared to adolescent-onset had significantly higher rates for 4/7 of these unfavorable illness characteristics. The Caucasian, insured, suburban, low substance abuse, American specialty clinic-referred sample limits generalizability. Onset age is based on retrospective recall. Childhood- compared to adolescent-onset BD was more robustly related to unfavorable bipolar illness characteristics, so pooling these groups attenuated such relationships. Further study is warranted to determine the extent to which adolescent-onset BD represents an intermediate phenotype between childhood- and adult-onset BD.
van Assen, Marcel A. L. M.; van Aert, Robbie C. M.; Nuijten, Michèle B.; Wicherts, Jelte M.
2014-01-01
Background De Winter and Happee [1] examined whether science based on selective publishing of significant results may be effective in accurate estimation of population effects, and whether this is even more effective than a science in which all results are published (i.e., a science without publication bias). Based on their simulation study they concluded that “selective publishing yields a more accurate meta-analytic estimation of the true effect than publishing everything, (and that) publishing nonreplicable results while placing null results in the file drawer can be beneficial for the scientific collective” (p.4). Methods and Findings Using their scenario with a small to medium population effect size, we show that publishing everything is more effective for the scientific collective than selective publishing of significant results. Additionally, we examined a scenario with a null effect, which provides a more dramatic illustration of the superiority of publishing everything over selective publishing. Conclusion Publishing everything is more effective than only reporting significant outcomes. PMID:24465448
Statistically qualified neuro-analytic failure detection method and system
Vilim, Richard B.; Garcia, Humberto E.; Chen, Frederick W.
2002-03-02
An apparatus and method for monitoring a process involve development and application of a statistically qualified neuro-analytic (SQNA) model to accurately and reliably identify process change. The development of the SQNA model is accomplished in two stages: deterministic model adaption and stochastic model modification of the deterministic model adaptation. Deterministic model adaption involves formulating an analytic model of the process representing known process characteristics, augmenting the analytic model with a neural network that captures unknown process characteristics, and training the resulting neuro-analytic model by adjusting the neural network weights according to a unique scaled equation error minimization technique. Stochastic model modification involves qualifying any remaining uncertainty in the trained neuro-analytic model by formulating a likelihood function, given an error propagation equation, for computing the probability that the neuro-analytic model generates measured process output. Preferably, the developed SQNA model is validated using known sequential probability ratio tests and applied to the process as an on-line monitoring system. Illustrative of the method and apparatus, the method is applied to a peristaltic pump system.
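The sequential probability ratio test mentioned for validation can be illustrated with Wald's classic version for a shift in the mean of Gaussian residuals. This is a generic textbook sketch, not the patented SQNA procedure: the residual model, the parameter names, and the sample values are assumptions made for the example.

```python
import math

def sprt_gaussian(residuals, mu0, mu1, sigma, alpha=0.01, beta=0.01):
    """Wald's SPRT for H0: mean = mu0 versus H1: mean = mu1, known sigma.

    Returns (decision, number_of_samples_used)."""
    upper = math.log((1 - beta) / alpha)  # crossing it accepts H1 (change)
    lower = math.log(beta / (1 - alpha))  # crossing it accepts H0 (nominal)
    llr = 0.0
    for i, x in enumerate(residuals, start=1):
        # Log-likelihood ratio increment for one Gaussian observation.
        llr += (mu1 - mu0) * (x - (mu0 + mu1) / 2) / sigma ** 2
        if llr >= upper:
            return "change", i
        if llr <= lower:
            return "nominal", i
    return "undecided", len(residuals)

# Hypothetical residuals between measured and model-predicted output.
print(sprt_gaussian([1.0] * 20, mu0=0.0, mu1=1.0, sigma=1.0))
print(sprt_gaussian([0.0] * 20, mu0=0.0, mu1=1.0, sigma=1.0))
```

The appeal for on-line monitoring is that the test stops as soon as the accumulated evidence crosses either threshold, so the number of samples needed adapts to how clear the change is.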
Quantitative linkage: a statistical procedure for its detection and estimation.
Hill, A P
1975-05-01
A new approach for detecting and estimating quantitative linkage which uses sibship data is presented. Using a nested analysis of variance design (with marker genotype nested within sibship), it is shown that under the null hypothesis of no linkage, the expected between marker genotype within sibship mean square (EMSbeta) is equal to the expected within marker genotype within sibship mean square (EMSe), while under the alternative hypothesis of linkage, the first is greater than the second. Thus the regular F-ratio, MSbeta/MSe, can be used to test for quantitative linkage. This is true for both backcross and intercross matings and whether or not there is dominance at the marker locus. A second test involving the comparison of the within marker genotype within sibship variances is available for intercross matings. A maximum likelihood procedure for the estimation of the recombination frequency is also presented.
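The F-ratio described here can be computed directly from sibship data. The sketch below is a minimal illustration assuming a dictionary layout (sibship, then marker genotype, then trait values); the data layout, the function name, and the numbers are hypothetical, and the p-value lookup against the F distribution is left out.

```python
def linkage_f_ratio(sibships):
    """Nested ANOVA with marker genotype nested within sibship.

    Returns (F, df_beta, df_e) where F = MS_beta / MS_e."""
    ss_beta = ss_e = 0.0
    df_beta = df_e = 0
    for genotypes in sibships.values():
        all_values = [y for values in genotypes.values() for y in values]
        sib_mean = sum(all_values) / len(all_values)
        for values in genotypes.values():
            cell_mean = sum(values) / len(values)
            # Between marker genotypes, within sibship.
            ss_beta += len(values) * (cell_mean - sib_mean) ** 2
            # Within marker genotype, within sibship.
            ss_e += sum((y - cell_mean) ** 2 for y in values)
            df_e += len(values) - 1
        df_beta += len(genotypes) - 1
    return (ss_beta / df_beta) / (ss_e / df_e), df_beta, df_e

# Hypothetical trait values for two sibships and two marker genotypes.
data = {
    "sibship_A": {"Mm": [10.0, 12.0], "mm": [6.0, 8.0]},
    "sibship_B": {"Mm": [20.0, 22.0], "mm": [16.0, 18.0]},
}
f_ratio, df_beta, df_e = linkage_f_ratio(data)
print(f_ratio, df_beta, df_e)
```

Nesting genotype within sibship removes between-family differences in the trait mean (here, sibship B is shifted by 10 units) so that only within-family genotype contrasts contribute to the numerator.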
Statistical Fault Detection for Parallel Applications with AutomaDeD
Bronevetsky, G; Laguna, I; Bagchi, S; de Supinski, B R; Ahn, D; Schulz, M
2010-03-23
Today's largest systems have over 100,000 cores, with million-core systems expected over the next few years. The large component count means that these systems fail frequently and often in very complex ways, making them difficult to use and maintain. While prior work on fault detection and diagnosis has focused on faults that significantly reduce system functionality, the wide variety of failure modes in modern systems makes them likely to fail in complex ways that impair system performance but are difficult to detect and diagnose. This paper presents AutomaDeD, a statistical tool that models the timing behavior of each application task and tracks its behavior to identify any abnormalities. If any are observed, AutomaDeD can immediately detect them and report to the system administrator the task where the problem began. This identification of the fault's initial manifestation can provide administrators with valuable insight into the fault's root causes, making it significantly easier and cheaper for them to understand and repair it. Our experimental evaluation shows that AutomaDeD detects a wide range of faults immediately after they occur 80% of the time, with a low false-positive rate. Further, it identifies weaknesses of the current approach that motivate future research.
A powerful weighted statistic for detecting group differences of directed biological networks.
Yuan, Zhongshang; Ji, Jiadong; Zhang, Xiaoshuai; Xu, Jing; Ma, Daoxin; Xue, Fuzhong
2016-09-30
Complex disease is largely determined by a number of biomolecules interwoven into networks, rather than a single biomolecule. Different physiological conditions such as cases and controls may manifest as different networks. Statistical comparison between biological networks can provide not only new insight into the disease mechanism but statistical guidance for drug development. However, the methods developed in previous studies are inadequate to capture the changes in both the nodes and edges, and often ignore the network structure. In this study, we present a powerful weighted statistical test for group differences of directed biological networks, which is independent of the network attributes and can capture the changes in both the nodes and edges, while simultaneously accounting for the network structure by putting more weight on differences at nodes located in relatively more important positions. Simulation studies illustrate that this method had better performance than previous ones under various sample sizes and network structures. One application to GWAS of leprosy successfully identifies the specific gene interaction network contributing to leprosy. Another real data analysis significantly identifies a new biological network, which is related to acute myeloid leukemia. One potential network responsible for lung cancer has also been significantly detected. The source R code is available on our website.
A powerful weighted statistic for detecting group differences of directed biological networks
Yuan, Zhongshang; Ji, Jiadong; Zhang, Xiaoshuai; Xu, Jing; Ma, Daoxin; Xue, Fuzhong
2016-01-01
Complex disease is largely determined by a number of biomolecules interwoven into networks, rather than a single biomolecule. Different physiological conditions such as cases and controls may manifest as different networks. Statistical comparison between biological networks can provide not only new insight into the disease mechanism but statistical guidance for drug development. However, the methods developed in previous studies are inadequate to capture the changes in both the nodes and edges, and often ignore the network structure. In this study, we present a powerful weighted statistical test for group differences of directed biological networks, which is independent of the network attributes and can capture the changes in both the nodes and edges, while simultaneously accounting for the network structure by putting more weight on differences at nodes located in relatively more important positions. Simulation studies illustrate that this method had better performance than previous ones under various sample sizes and network structures. One application to GWAS of leprosy successfully identifies the specific gene interaction network contributing to leprosy. Another real data analysis significantly identifies a new biological network, which is related to acute myeloid leukemia. One potential network responsible for lung cancer has also been significantly detected. The source R code is available on our website. PMID:27686331
Ramírez, J; Górriz, J M; Segura, J C
2007-05-01
Currently, there are technology barriers inhibiting speech processing systems that work in extremely noisy conditions from meeting the demands of modern applications. These systems often require a noise reduction system working in combination with a precise voice activity detector (VAD). This paper shows statistical likelihood ratio tests formulated in terms of the integrated bispectrum of the noisy signal. The integrated bispectrum is defined as a cross spectrum between the signal and its square, and therefore a function of a single frequency variable. It inherits the ability of higher order statistics to detect signals in noise with many other additional advantages: (i) Its computation as a cross spectrum leads to significant computational savings, and (ii) the variance of the estimator is of the same order as that of the power spectrum estimator. The proposed approach incorporates contextual information to the decision rule, a strategy that has reported significant benefits for robust speech recognition applications. The proposed VAD is compared to the G.729, adaptive multirate, and advanced front-end standards as well as recently reported algorithms showing a sustained advantage in speech/nonspeech detection accuracy and speech recognition performance.
A High-Order Statistical Tensor Based Algorithm for Anomaly Detection in Hyperspectral Imagery
Geng, Xiurui; Sun, Kang; Ji, Luyan; Zhao, Yongchao
2014-01-01
Recently, high-order statistics have received increasing interest in the field of hyperspectral anomaly detection. However, most of the existing high-order statistics based anomaly detection methods require stepwise iterations since they are direct applications of blind source separation. Moreover, these methods usually produce multiple detection maps rather than a single anomaly distribution image. In this study, we exploit the concept of the coskewness tensor and propose a new anomaly detection method, which is called COSD (coskewness detector). COSD does not need iteration and can produce a single detection map. The experiments based on both simulated and real hyperspectral data sets verify the effectiveness of our algorithm. PMID:25366706
A high-order statistical tensor based algorithm for anomaly detection in hyperspectral imagery.
Geng, Xiurui; Sun, Kang; Ji, Luyan; Zhao, Yongchao
2014-11-04
Recently, high-order statistics have received increasing interest in the field of hyperspectral anomaly detection. However, most of the existing high-order statistics based anomaly detection methods require stepwise iterations since they are direct applications of blind source separation. Moreover, these methods usually produce multiple detection maps rather than a single anomaly distribution image. In this study, we exploit the concept of the coskewness tensor and propose a new anomaly detection method, which is called COSD (coskewness detector). COSD does not need iteration and can produce a single detection map. The experiments based on both simulated and real hyperspectral data sets verify the effectiveness of our algorithm.
Mourão, Márcio; Satin, Leslie; Schnell, Santiago
2014-01-01
We investigated commonly used methods (Autocorrelation, Enright, and Discrete Fourier Transform) to estimate the periodicity of oscillatory data and determine which method most accurately estimated periods while being least vulnerable to the presence of noise. Both simulated and experimental data were used in the analysis performed. We determined the significance of calculated periods by applying these methods to several random permutations of the data and then calculating the probability of obtaining the period's peak in the corresponding periodograms. Our analysis suggests that the Enright method is the most accurate for estimating the period of oscillatory data. We further show that to accurately estimate the period of oscillatory data, it is necessary that at least five cycles of data are sampled, using at least four data points per cycle. These results suggest that the Enright method should be more widely applied in order to improve the analysis of oscillatory data. PMID:24699692
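The permutation-based significance assessment described above can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a plain DFT periodogram (not the Enright method), and the function names, the number of permutations, and the synthetic test signal are hypothetical choices made here, not taken from the paper.

```python
import cmath
import math
import random

def periodogram(x):
    """DFT power at frequency indices 1..N//2 after removing the mean."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(centered))
        powers.append(abs(coeff) ** 2 / n)
    return powers

def peak_permutation_p_value(x, n_perm=200, seed=0):
    """Estimate the period and the permutation p-value of its periodogram peak."""
    rng = random.Random(seed)
    powers = periodogram(x)
    observed = max(powers)
    period = len(x) / (powers.index(observed) + 1)
    shuffled = list(x)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # destroys temporal order, keeps the value distribution
        if max(periodogram(shuffled)) >= observed:
            exceed += 1
    return period, (exceed + 1) / (n_perm + 1)

# Five cycles sampled at eight points per cycle, with mild Gaussian noise.
noise = random.Random(1)
signal = [math.sin(2 * math.pi * t / 8) + 0.2 * noise.gauss(0, 1) for t in range(40)]
period, p_value = peak_permutation_p_value(signal)
```

Adding one to both the numerator and the denominator keeps the estimated p-value strictly positive, a common convention for Monte Carlo tests.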
ERIC Educational Resources Information Center
Oshima, T. C.; Raju, Nambury S.; Nanda, Alice O.
2006-01-01
A new item parameter replication method is proposed for assessing the statistical significance of the noncompensatory differential item functioning (NCDIF) index associated with the differential functioning of items and tests framework. In this new method, a cutoff score for each item is determined by obtaining a (1-alpha ) percentile rank score…
Meta-analysis using effect size distributions of only statistically significant studies.
van Assen, Marcel A L M; van Aert, Robbie C M; Wicherts, Jelte M
2015-09-01
Publication bias threatens the validity of meta-analytic results and leads to overestimation of the effect size in traditional meta-analysis. This particularly applies to meta-analyses that feature small studies, which are ubiquitous in psychology. Here we develop a new method for meta-analysis that deals with publication bias. This method, p-uniform, enables (a) testing of publication bias, (b) effect size estimation, and (c) testing of the null-hypothesis of no effect. No current method for meta-analysis possesses all 3 qualities. Application of p-uniform is straightforward because no additional data on missing studies are needed and no sophisticated assumptions or choices need to be made before applying it. Simulations show that p-uniform generally outperforms the trim-and-fill method and the test of excess significance (TES; Ioannidis & Trikalinos, 2007b) if publication bias exists and population effect size is homogeneous or heterogeneity is slight. For illustration, p-uniform and other publication bias analyses are applied to the meta-analysis of McCall and Carriger (1993) examining the association between infants' habituation to a stimulus and their later cognitive ability (IQ). We conclude that p-uniform is a valuable technique for examining publication bias and estimating population effects in fixed-effect meta-analyses, and as sensitivity analysis to draw inferences about publication bias.
Saha, Ranajit; Pan, Sudip; Chattaraj, Pratim K
2016-11-05
The validity of the maximum hardness principle (MHP) is tested in the cases of 50 chemical reactions, most of which are organic in nature and exhibit anomeric effect. To explore the effect of the level of theory on the validity of MHP in an exothermic reaction, B3LYP/6-311++G(2df,3pd) and LC-BLYP/6-311++G(2df,3pd) (def2-QZVP for iodine and mercury) levels are employed. Different approximations like the geometric mean of hardness and combined hardness are considered in case there are multiple reactants and/or products. It is observed that, based on the geometric mean of hardness, while 82% of the studied reactions obey the MHP at the B3LYP level, 84% of the reactions follow this rule at the LC-BLYP level. Most of the reactions possess the hardest species on the product side. A 50% null hypothesis is rejected at a 1% level of significance.
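The final claim can be checked with a short worked example. The abstract does not say which test was used; the sketch below assumes an exact one-sided binomial test of the null hypothesis that a reaction obeys the MHP with probability 0.5, and converts the reported 82% and 84% of 50 reactions to counts of 41 and 42.

```python
from math import comb

def binomial_upper_tail(successes, trials, p0=0.5):
    """Exact P(X >= successes) for X ~ Binomial(trials, p0)."""
    return sum(comb(trials, k) * p0 ** k * (1 - p0) ** (trials - k)
               for k in range(successes, trials + 1))

n_reactions = 50
obeying = {"B3LYP": 41, "LC-BLYP": 42}  # 82% and 84% of 50 reactions

p_values = {level: binomial_upper_tail(k, n_reactions)
            for level, k in obeying.items()}
rejected = {level: p < 0.01 for level, p in p_values.items()}  # 1% significance level
```

Under this assumed test both tail probabilities fall far below 0.01, which is consistent with the rejection reported in the abstract.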
Potts, T.T.; Hylko, J.M.; Almond, D.
2007-07-01
A company's overall safety program becomes an important consideration to continue performing work and for procuring future contract awards. When injuries or accidents occur, the employer ultimately loses on two counts - increased medical costs and employee absences. This paper summarizes the human and organizational components that contributed to successful safety programs implemented by WESKEM, LLC's Environmental, Safety, and Health Departments located in Paducah, Kentucky, and Oak Ridge, Tennessee. The philosophy of 'safety, compliance, and then production' and programmatic components implemented at the start of the contracts were qualitatively identified as contributing factors resulting in a significant accumulation of safe work hours and an Experience Modification Rate (EMR) of <1.0. Furthermore, a study by the Associated General Contractors of America quantitatively validated components, already found in the WESKEM, LLC programs, as contributing factors to prevent employee accidents and injuries. Therefore, an investment in the human and organizational components now can pay dividends later by reducing the EMR, which is the key to reducing Workers' Compensation premiums. Also, knowing your employees' demographics and taking an active approach to evaluate and prevent fatigue may help employees balance work and non-work responsibilities. In turn, this approach can assist employers in maintaining a healthy and productive workforce. For these reasons, it is essential that safety needs be considered as the starting point when performing work. (authors)
Tables of square-law signal detection statistics for Hann spectra with 50 percent overlap
NASA Technical Reports Server (NTRS)
Deans, Stanley R.; Cullers, D. Kent
1991-01-01
The Search for Extraterrestrial Intelligence, currently being planned by NASA, will require that an enormous amount of data be analyzed in real time by special purpose hardware. It is expected that overlapped Hann data windows will play an important role in this analysis. In order to understand the statistical implication of this approach, it has been necessary to compute detection statistics for overlapped Hann spectra. Tables of signal detection statistics are given for false alarm rates from 10^-14 to 10^-1 and signal detection probabilities from 0.50 to 0.99; the number of computed spectra ranges from 4 to 2000.
Adams, James; Howsmon, Daniel P; Kruger, Uwe; Geis, Elizabeth; Gehn, Eva; Fimbres, Valeria; Pollard, Elena; Mitchell, Jessica; Ingram, Julie; Hellmers, Robert; Quig, David; Hahn, Juergen
2017-01-01
A number of previous studies examined a possible association of toxic metals and autism, and over half of those studies suggest that toxic metal levels are different in individuals with Autism Spectrum Disorders (ASD). Additionally, several studies found that those levels correlate with the severity of ASD. In order to further investigate these points, this paper performs the most detailed statistical analysis to date of a data set in this field. First morning urine samples were collected from 67 children and adults with ASD and 50 neurotypical controls of similar age and gender. The samples were analyzed to determine the levels of 10 urinary toxic metals (UTM). Autism-related symptoms were assessed with eleven behavioral measures. Statistical analysis was used to distinguish participants on the ASD spectrum and neurotypical participants based upon the UTM data alone. The analysis also included examining the association of autism severity with toxic metal excretion data using linear and nonlinear analysis. "Leave-one-out" cross-validation was used to ensure statistical independence of results. Average excretion levels of several toxic metals (lead, tin, thallium, antimony) were significantly higher in the ASD group. However, ASD classification using univariate statistics proved difficult due to large variability, but nonlinear multivariate statistical analysis significantly improved ASD classification with Type I/II errors of 15% and 18%, respectively. These results clearly indicate that the urinary toxic metal excretion profiles of participants in the ASD group were significantly different from those of the neurotypical participants. Similarly, nonlinear methods determined a significantly stronger association between the behavioral measures and toxic metal excretion. The association was strongest for the Aberrant Behavior Checklist (including subscales on Irritability, Stereotypy, Hyperactivity, and Inappropriate Speech), but significant associations were found
Adams, James; Kruger, Uwe; Geis, Elizabeth; Gehn, Eva; Fimbres, Valeria; Pollard, Elena; Mitchell, Jessica; Ingram, Julie; Hellmers, Robert; Quig, David; Hahn, Juergen
2017-01-01
Introduction A number of previous studies examined a possible association of toxic metals and autism, and over half of those studies suggest that toxic metal levels are different in individuals with Autism Spectrum Disorders (ASD). Additionally, several studies found that those levels correlate with the severity of ASD. Methods In order to further investigate these points, this paper performs the most detailed statistical analysis to date of a data set in this field. First morning urine samples were collected from 67 children and adults with ASD and 50 neurotypical controls of similar age and gender. The samples were analyzed to determine the levels of 10 urinary toxic metals (UTM). Autism-related symptoms were assessed with eleven behavioral measures. Statistical analysis was used to distinguish participants on the ASD spectrum and neurotypical participants based upon the UTM data alone. The analysis also included examining the association of autism severity with toxic metal excretion data using linear and nonlinear analysis. “Leave-one-out” cross-validation was used to ensure statistical independence of results. Results and Discussion Average excretion levels of several toxic metals (lead, tin, thallium, antimony) were significantly higher in the ASD group. However, ASD classification using univariate statistics proved difficult due to large variability, but nonlinear multivariate statistical analysis significantly improved ASD classification with Type I/II errors of 15% and 18%, respectively. These results clearly indicate that the urinary toxic metal excretion profiles of participants in the ASD group were significantly different from those of the neurotypical participants. Similarly, nonlinear methods determined a significantly stronger association between the behavioral measures and toxic metal excretion. The association was strongest for the Aberrant Behavior Checklist (including subscales on Irritability, Stereotypy, Hyperactivity, and Inappropriate
Wang, Q.; Denton, D.L.; Shukla, R.
2000-01-01
As a follow-up to the recommendations of the September 1995 SETAC Pellston Workshop on Whole Effluent Toxicity (WET) on test methods and appropriate endpoints, this paper discusses the applications and statistical properties of using a statistical criterion of minimum significant difference (MSD). The authors examined the upper limits of acceptable MSDs as an acceptance criterion in the case of normally distributed data. The implications of this approach are examined in terms of the false negative rate as well as the false positive rate. Results indicated that the proposed approach has reasonable statistical properties. Reproductive data from short-term chronic WET tests with Ceriodaphnia dubia were used to demonstrate the applications of the proposed approach. The data were collected by the North Carolina Department of Environment, Health, and Natural Resources (Raleigh, NC, USA) as part of their National Pollutant Discharge Elimination System program.
Allan, G Michael; Finley, Caitlin R; McCormack, James; Kumar, Vivek; Kwong, Simon; Braschi, Emelie; Korownyk, Christina; Kolber, Michael R; Lindblad, Adrienne J; Babenko, Oksana; Garrison, Scott
2017-03-20
While journals and reporting guidelines recommend the presentation of confidence intervals, many authors adhere strictly to statistical significance testing. Our objective was to determine what proportion of not statistically significant (NSS) cardiovascular trials include potentially clinically meaningful effects in primary outcomes and whether these are associated with authors' conclusions. Cardiovascular studies published in six high-impact journals between 1 January 2010 and 31 December 2014 were identified via PubMed. Two independent reviewers selected trials with major adverse cardiovascular events (stroke, myocardial infarction, or cardiovascular death) as primary outcomes and extracted data on trial characteristics, quality, and primary outcome. Potentially clinically meaningful effects were defined broadly as a relative risk point estimate ≤0.94 (based on the effects of ezetimibe) and/or a lower confidence interval ≤0.75 (based on the effects of statins). We identified 127 randomized trial comparisons from 3200 articles. The primary outcomes were statistically significant (SS) favoring treatment in 21% (27/127), NSS in 72% (92/127), and SS favoring control in 6% (8/127). In 61% of NSS trials (56/92), the point estimate and/or lower confidence interval included potentially meaningful effects. Both point estimate and confidence interval included potentially meaningful effects in 67% of trials (12/18) in which authors concluded that treatment was superior, in 28% (16/58) with a neutral conclusion, and in 6% (1/16) in which authors concluded that control was superior. In a sensitivity analysis, 26% of NSS trials would include potentially meaningful effects with relative risk thresholds of point estimate ≤0.85 and/or a lower confidence interval ≤0.65. Point estimates and/or confidence intervals included potentially clinically meaningful effects in up to 61% of NSS cardiovascular trials. Authors' conclusions often reflect potentially meaningful results of
NASA Astrophysics Data System (ADS)
Wang, Ping; Dai, Xin-Gang
2016-09-01
The term "APEC Blue" has been created to describe the clear sky days since the Asia-Pacific Economic Cooperation (APEC) summit held in Beijing during November 5-11, 2014. The duration of APEC Blue, November 1 to November 14 (hereafter the Blue Window), is detected with a moving t-test. Observations show that APEC Blue corresponds to low air pollution with respect to PM2.5, PM10, SO2, and NO2 under strict emission-control measures (ECMs) implemented in Beijing and surrounding areas. Quantitative assessment shows that ECMs are more effective in reducing aerosols than the chemical constituents. Statistical investigation has revealed that the window also resulted from intensified wind variability, as well as weakened static stability of the atmosphere (SSA). The wind and ECMs played key roles in reducing air pollution during November 1-7 and 11-13, and strict ECMs and weak SSA became dominant during November 7-10 under a weak wind environment. Moving correlation shows that emission reduction for aerosols can increase the apparent wind cleanup effect, leading to significant negative correlations between them, and the period-wise changes in emission rate can be well identified by multi-scale correlations based on wavelet decomposition. In short, this case study shows statistically how human interference modified air quality in the mega city through controlling local and surrounding emissions in association with meteorological conditions.
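The abstract does not spell out its moving t-test, so the following is a sketch of one common form, assumed here: at each candidate split point, the means of two adjacent equal-length windows are compared with a pooled-variance t statistic, and the split with the largest absolute value marks the most abrupt change. The window length and the synthetic pollutant series are hypothetical.

```python
import math
import statistics

def moving_t(series, window):
    """t statistic comparing the `window` values before and after each split point."""
    scores = {}
    for i in range(window, len(series) - window + 1):
        before = series[i - window:i]
        after = series[i:i + window]
        # Equal window sizes: the pooled variance is the average of the two variances.
        pooled = (statistics.variance(before) + statistics.variance(after)) / 2.0
        standard_error = math.sqrt(2.0 * pooled / window)
        scores[i] = (statistics.mean(after) - statistics.mean(before)) / standard_error
    return scores

# Synthetic daily concentrations: about 100 for ten days, then about 40 for ten days.
series = [(100 if t < 10 else 40) + (1 if t % 2 == 0 else -1) for t in range(20)]
scores = moving_t(series, window=5)
change_point = max(scores, key=lambda i: abs(scores[i]))
```

A window in which both halves are exactly constant would give a zero standard error, so real data with flat stretches needs a guard that this sketch omits.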
NASA Technical Reports Server (NTRS)
Gofford, Jason; Reeves, James N.; Tombesi, Francesco; Braito, Valentina; Turner, T. Jane; Miller, Lance; Cappi, Massimo
2013-01-01
We present the results of a new spectroscopic study of Fe K-band absorption in active galactic nuclei (AGN). Using data obtained from the Suzaku public archive we have performed a statistically driven blind search for Fe XXV Heα and/or Fe XXVI Lyα absorption lines in a large sample of 51 Type 1.0-1.9 AGN. Through extensive Monte Carlo simulations we find that statistically significant absorption is detected at E ≳ 6.7 keV in 20/51 sources at the P_MC ≥ 95 per cent level, which corresponds to approximately 40 per cent of the total sample. In all cases, individual absorption lines are detected independently and simultaneously amongst the two (or three) available X-ray imaging spectrometer detectors, which confirms the robustness of the line detections. The most frequently observed outflow phenomenology consists of two discrete absorption troughs corresponding to Fe XXV Heα and Fe XXVI Lyα at a common velocity shift. From xstar fitting the mean column density and ionization parameter for the Fe K absorption components are log(N_H/cm^-2) ≈ 23 and log(ξ/erg cm s^-1) ≈ 4.5, respectively. Measured outflow velocities span a continuous range from <1500 km s^-1 up to ~100 000 km s^-1, with mean and median values of ~0.1 c and ~0.056 c, respectively. The results of this work are consistent with those recently obtained using XMM-Newton and independently provide strong evidence for the existence of very highly ionized circumnuclear material in a significant fraction of both radio-quiet and radio-loud AGN in the local universe.
Spatial-Temporal Change Detection in NDVI Data Through Statistical Parametric Mapping
NASA Astrophysics Data System (ADS)
McKenna, S. A.; Yadav, V.; Gutierrez, K.
2011-12-01
Detection of significant changes in vegetation patterns provides a quantitative means of defining phenological response to changing climate. These changes may be indicative of long-term trends or shorter-duration conditions. In either case, quantifying the significance of the change patterns is critical in order to better understand the underlying processes. Spatial and temporal correlation within imaged data sets complicates change detection and must be taken into account. We apply a novel approach, Statistical Parametric Mapping (SPM), to change detection in Normalized Difference Vegetation Index (NDVI) data. SPM has been developed for identification of regions of anomalous activation in human brain imaging given functional magnetic resonance imaging (fMRI) data. Here, we adapt SPM to work on identifying anomalous regions of vegetation density within 30 years of weekly NDVI imagery. Significant change in any given image pixel is defined as a deviation from the expected value. Expected values are calculated using sinusoidal regression models fit to previous data at that location. The amount of deviation of an observation from the expected value is calculated using a modified t-test that accounts for temporal correlation in the regression data. The t-tests are applied independently to each pixel to create a t-statistic map for every time step. For a given time step, the probability that the maximum t-value exceeds a given threshold can be calculated to determine times with significant deviations, but standard techniques are not applicable due to the large number of pixels searched to find the maximum. SPM takes into account the spatial correlation of the t-statistic map to determine the significance of the maximum observed t-value. Theory developed for truncated Gaussian fields as part of SPM provides the expected number and size of regions within the t-statistic map that exceed a given threshold. The significance of the excursion regions can be assessed and then
A flexible spatial scan statistic with a restricted likelihood ratio for detecting disease clusters.
Tango, Toshiro; Takahashi, Kunihiko
2012-12-30
Spatial scan statistics are widely used tools for detection of disease clusters. In particular, the circular spatial scan statistic proposed by Kulldorff (1997) has been utilized in a wide variety of epidemiological studies and disease surveillance. However, as it cannot detect noncircular, irregularly shaped clusters, many authors have proposed different spatial scan statistics, including the elliptic version of Kulldorff's scan statistic. The flexible spatial scan statistic proposed by Tango and Takahashi (2005) has also been used for detecting irregularly shaped clusters. However, this method limits the search for candidate clusters to a maximum of 30 nearest neighbors because of the heavy computational load. In this paper, we show a flexible spatial scan statistic implemented with a restricted likelihood ratio proposed by Tango (2008) to (1) eliminate the limitation of 30 nearest neighbors and (2) require far less computation time than the original flexible spatial scan statistic. As a side effect, Monte Carlo simulation shows that it can detect clusters of any shape reasonably well as the relative risk of the cluster becomes large. We illustrate the proposed spatial scan statistic with data on mortality from cerebrovascular disease in the Tokyo Metropolitan area, Japan.
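For orientation, the circular scan statistic mentioned above can be sketched with the standard Poisson log likelihood ratio. This is not the restricted likelihood ratio of the paper, whose details the abstract does not give; the region layout, case counts, and `max_size` cap below are hypothetical, and the Monte Carlo significance step is left out.

```python
import math

def poisson_llr(c, e, total):
    """Log likelihood ratio for a zone with c observed and e expected cases."""
    if c <= e:
        return 0.0  # only zones with excess risk are of interest
    inside = c * math.log(c / e)
    outside = 0.0 if c == total else (total - c) * math.log((total - c) / (total - e))
    return inside + outside

def circular_scan(coords, cases, expected, max_size):
    """Scan zones made of each region plus its nearest neighbours; return the best."""
    total = sum(cases)
    best_score, best_zone = 0.0, []
    for center in coords:
        order = sorted(range(len(coords)), key=lambda j: math.dist(center, coords[j]))
        for k in range(1, max_size + 1):
            zone = order[:k]
            c = sum(cases[j] for j in zone)
            e = sum(expected[j] for j in zone)
            score = poisson_llr(c, e, total)
            if score > best_score:
                best_score, best_zone = score, sorted(zone)
    return best_score, best_zone

# Six equally populated regions on a line; regions 2 and 3 carry excess cases.
coords = [(float(i), 0.0) for i in range(6)]
cases = [2, 3, 12, 11, 2, 2]
expected = [sum(cases) / 6.0] * 6
score, zone = circular_scan(coords, cases, expected, max_size=3)
```

In practice the maximum score is compared with its distribution over data sets simulated under the null hypothesis, which is where the computational load discussed in the abstract arises.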
NASA Astrophysics Data System (ADS)
Morgan, Ben; Green, Anne M.
2005-12-01
The direction dependence of the WIMP direct detection rate provides a powerful tool for distinguishing a WIMP signal from possible backgrounds. We study the number of events required to discriminate a WIMP signal from an isotropic background for a detector with 2-d readout using nonparametric circular statistics. We also examine the number of events needed to (i) detect a deviation from rotational symmetry, due to flattening of the Milky Way halo, and (ii) detect a deviation in the mean direction due to a tidal stream. If the senses of the recoils are measured then of order 20-70 events (depending on the plane of the 2-d readout and the detector location) will be sufficient to reject isotropy of the raw recoil angles at 90% confidence. If the senses cannot be measured these numbers increase by roughly 2 orders of magnitude (compared with an increase of 1 order of magnitude for the case of full 3-d readout). The distributions of the reduced angles, with the (time-dependent) direction of solar motion subtracted, are far more anisotropic, however, and if the isotropy tests are applied to these angles then the numbers of events required are similar to the case of 3-d readout. A deviation from rotational symmetry will only be detectable if the Milky Way halo is significantly flattened. The deviation in the mean direction due to a tidal stream is potentially detectable, however, depending on the density and direction of the stream. The meridian plane (which contains the Earth's spin axis) is, for all detector locations, the optimum readout plane for rejecting isotropy. However, readout in this plane cannot be used for detecting flattening of the Milky Way halo or a stream with direction perpendicular to the galactic plane. In these cases the optimum readout plane depends on the detector location.
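As an illustration of testing 2-d recoil angles for isotropy, the sketch below uses the Rayleigh test, one standard circular test; the abstract does not name the tests it applies, so this choice is an assumption. The `axial` option doubles the angles, the usual device when only the axis of a recoil and not its sense is known. The p-value uses the large-sample approximation exp(-n R^2), and the sample sizes and von Mises concentration are hypothetical.

```python
import math
import random

def rayleigh_test(angles, axial=False):
    """Return (mean resultant length, approximate p-value) for circular uniformity."""
    if axial:
        # Angles known only modulo pi: doubling maps them onto the full circle.
        angles = [2.0 * a for a in angles]
    n = len(angles)
    c = sum(math.cos(a) for a in angles) / n
    s = sum(math.sin(a) for a in angles) / n
    r_bar = math.hypot(c, s)
    p_value = math.exp(-n * r_bar ** 2)  # large-sample approximation
    return r_bar, p_value

# Isotropic background: angles spread evenly around the circle.
isotropic = [2.0 * math.pi * k / 40 for k in range(40)]
_, p_isotropic = rayleigh_test(isotropic)

# Directional signal: 40 angles concentrated around 0 (von Mises, kappa = 1.5).
rng = random.Random(0)
directed = [rng.vonmisesvariate(0.0, 1.5) for _ in range(40)]
_, p_directed = rayleigh_test(directed)
```

With the sense information discarded, the doubled-angle resultant is typically much shorter for the same sample, which gives some intuition for why far more events are needed in the sense-blind case.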
A Statistical Analysis of Automated and Manually Detected Fires Using Environmental Satellites
NASA Astrophysics Data System (ADS)
Ruminski, M. G.; McNamara, D.
2003-12-01
The National Environmental Satellite and Data Information Service (NESDIS) of the National Oceanic and Atmospheric Administration (NOAA) has been producing an analysis of fires and smoke over the US since 1998. This product underwent significant enhancement in June 2002 with the introduction of the Hazard Mapping System (HMS), an interactive workstation based system that displays environmental satellite imagery (NOAA Geostationary Operational Environmental Satellite (GOES), NOAA Polar Operational Environmental Satellite (POES) and National Aeronautics and Space Administration (NASA) MODIS data) and fire detects from the automated algorithms for each of the satellite sensors. The focus of this presentation is to present statistics compiled on the fire detects since November 2002. The Automated Biomass Burning Algorithm (ABBA) detects fires using GOES East and GOES West imagery. The Fire Identification, Mapping and Monitoring Algorithm (FIMMA) utilizes NOAA POES 15/16/17 imagery and the MODIS algorithm uses imagery from the MODIS instrument on the Terra and Aqua spacecraft. The HMS allows satellite analysts to inspect and interrogate the automated fire detects and the input satellite imagery. The analyst can then delete those detects that are felt to be false alarms and/or add fire points that the automated algorithms have not selected. Statistics are compiled for the number of automated detects from each of the algorithms, the number of automated detects that are deleted and the number of fire points added by the analyst for the contiguous US and immediately adjacent areas of Mexico and Canada. There is no attempt to distinguish between wildfires and control or agricultural fires. A detailed explanation of the automated algorithms is beyond the scope of this presentation. However, interested readers can find a more thorough description by going to www.ssd.noaa.gov/PS/FIRE/hms.html and scrolling down to Individual Fire Layers. For the period November 2002 thru August
Escoto Ponce de León, M C; Mancilla Díaz, J M; Camacho Ruiz, E J
2008-09-01
The current study used clinical and statistical significance tests to investigate the effects of two forms (didactic or interactive) of a universal prevention program on attitudes about shape and weight, eating behaviors, the influence of body aesthetic models, and self-esteem. Three schools were randomly assigned to one of three conditions: interactive, didactic, or control. Children (61 girls and 59 boys, age 9-11 years) were evaluated at pre-intervention, post-intervention, and at 6-month follow-up. Programs comprised eight 90-min sessions. Statistical and clinical significance tests showed more changes in boys and girls with the interactive program than in the didactic intervention and control groups. The findings support the use of interactive programs that highlight identified risk factors and construction of identity based on positive traits distinct from physical appearance.
Kim, Sung-Min; Choi, Yosoon
2017-01-01
To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1–4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required. PMID:28629168
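The Getis-Ord Gi* z-score and the four-way grouping described above can be sketched as follows. The binary fixed-distance weights, the distance band, the z cutoff of 1.96, the use of the mean as the content cutoff, and the sample values are all assumptions made for illustration; the abstract does not state the thresholds the authors used.

```python
import math

def getis_ord_gi_star(coords, values, band):
    """Gi* z-score for every sample, using binary weights within `band`."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z_scores = []
    for ci in coords:
        # Weight 1 for each sample within the band, the sample itself included.
        weights = [1.0 if math.dist(ci, cj) <= band else 0.0 for cj in coords]
        w_sum = sum(weights)
        w_sq = sum(w * w for w in weights)
        numerator = sum(w * v for w, v in zip(weights, values)) - mean * w_sum
        denominator = std * math.sqrt((n * w_sq - w_sum ** 2) / (n - 1))
        z_scores.append(numerator / denominator)
    return z_scores

def classify(value, z, content_cut, z_cut=1.96):
    """Two-letter group: content high/low, then z-score high/low."""
    return ("H" if value >= content_cut else "L") + ("H" if z >= z_cut else "L")

# Ten samples on a line: an isolated high value at index 2, a cluster at 6-8.
coords = [(float(i), 0.0) for i in range(10)]
values = [1, 1, 9, 1, 1, 1, 10, 12, 11, 1]
z_scores = getis_ord_gi_star(coords, values, band=1.0)
cut = sum(values) / len(values)
labels = [classify(v, z, cut) for v, z in zip(values, z_scores)]
```

The isolated high value at index 2 comes out as HL, matching the abstract's point that a high reading without high neighbours is a candidate measurement error rather than a hot spot. The formula breaks down if the band covers every sample, since the denominator is then zero.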
Kim, Sung-Min; Choi, Yosoon
2017-06-18
To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1-4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required.
An integrated model for detecting significant chromatin interactions from high-resolution Hi-C data
Carty, Mark; Zamparo, Lee; Sahin, Merve; González, Alvaro; Pelossof, Raphael; Elemento, Olivier; Leslie, Christina S.
2017-01-01
Here we present HiC-DC, a principled method to estimate the statistical significance (P values) of chromatin interactions from Hi-C experiments. HiC-DC uses hurdle negative binomial regression to account for systematic sources of variation in Hi-C read counts (for example, distance-dependent random polymer ligation, GC content, and mappability bias) and to model zero inflation and overdispersion. Applied to high-resolution Hi-C data in a lymphoblastoid cell line, HiC-DC detects significant interactions at the sub-topologically associating domain level, identifying potential structural and regulatory interactions supported by CTCF binding sites, DNase accessibility, and/or active histone marks. CTCF-associated interactions are most strongly enriched in the middle genomic distance range (∼700 kb–1.5 Mb), while interactions involving actively marked DNase accessible elements are enriched both at short (<500 kb) and longer (>1.5 Mb) genomic distances. There is a striking enrichment of longer-range interactions connecting replication-dependent histone genes on chromosome 6, potentially representing the chromatin architecture at the histone locus body. PMID:28513628
High significance detection of the tSZ effect relativistic corrections
NASA Astrophysics Data System (ADS)
Hurier, G.
2016-12-01
The thermal Sunyaev-Zel'dovich (tSZ) effect is produced by the interaction of cosmic microwave background (CMB) photons with the hot (a few keV) and diffuse gas of electrons inside galaxy clusters integrated along the line of sight. This effect produces a distortion of the CMB blackbody emission law. This distortion law depends on the electronic temperature of the intra-cluster hot gas, Te, through the so-called tSZ relativistic corrections. In this work, we have performed a statistical analysis of the tSZ spectral distortion on large galaxy cluster samples. We performed a stacking analysis for several electronic temperature bins, using both spectroscopic measurements of X-ray temperatures and a scaling relation between X-ray luminosities and electronic temperatures. We report the first high significance detection of the relativistic tSZ at a significance of 5.3σ. We also demonstrate that the observed tSZ relativistic corrections are consistent with X-ray deduced temperatures. This measurement of the tSZ spectral law demonstrates that the tSZ effect spectral distortion can be used as a probe to measure galaxy cluster temperatures.
A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic
Qi, Jin-Peng; Qi, Jie; Zhang, Qing
2016-01-01
Change-Point (CP) detection has attracted considerable attention in the fields of data mining and statistics; it is very meaningful to discuss how to quickly and efficiently detect abrupt change from large-scale bioelectric signals. Currently, most of the existing methods, like Kolmogorov-Smirnov (KS) statistic and so forth, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed as BSTcA and BSTcD, are constructed by multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to leaf nodes of two BSTs. The studies on both the synthetic time series samples and the real electroencephalograph (EEG) recordings indicate that the proposed BSTKS can detect abrupt change more quickly and efficiently than KS, t-statistic (t), and Singular-Spectrum Analyses (SSA) methods, with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy out of four methods. This study suggests that the proposed BSTKS is very helpful for useful information inspection on all kinds of bioelectric time series signals. PMID:27413364
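For orientation, the plain KS scan that the abstract treats as the slow baseline can be sketched as below. This is not BSTKS: there is no Haar wavelet transform and no binary search tree, only a brute-force two-sample KS statistic evaluated at every candidate split. The function names, the `min_size` parameter, and the synthetic series are assumptions made for illustration.

```python
import math
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    return max(
        abs(bisect_right(a, v) / len(a) - bisect_right(b, v) / len(b))
        for v in a + b
    )

def ks_change_point(series, min_size=10):
    """Try every split with at least `min_size` points per side; keep the largest KS gap."""
    best_k, best_d = None, -1.0
    for k in range(min_size, len(series) - min_size + 1):
        d = ks_statistic(series[:k], series[k:])
        if d > best_d:
            best_k, best_d = k, d
    return best_k, best_d

# Hypothetical signal: an oscillation whose level jumps by 3 at sample 100.
series = [math.sin(0.7 * t) + (3.0 if t >= 100 else 0.0) for t in range(200)]
k, d = ks_change_point(series)
```

Every split re-sorts both segments, so the scan costs far more than linear time in the series length. That cost is the motivation the abstract gives for replacing the exhaustive scan with a tree-guided search.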
Damage detection of engine bladed-disks using multivariate statistical analysis
NASA Astrophysics Data System (ADS)
Fang, X.; Tang, J.
2006-03-01
The timely detection of damage in aero-engine bladed-disks is an extremely important and challenging research topic. Bladed-disks have high modal density and, in particular, their vibration responses are subject to significant uncertainties due to manufacturing tolerance (blade-to-blade difference or mistuning), operating condition change and sensor noise. In this study, we present a new methodology for the on-line damage detection of engine bladed-disks using their vibratory responses during spin-up or spin-down operations, which can be measured by the blade-tip-timing sensing technique. We apply a principal component analysis (PCA)-based approach for data compression, feature extraction, and denoising. The non-model based damage detection is achieved by analyzing the change between response features of the healthy structure and of the damaged one. We facilitate this comparison with Hotelling's T2 statistic, which yields a damage declaration at a given confidence level. The effectiveness of the method is demonstrated by case studies.
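As a reading aid, one common textbook form of the Hotelling T2 statistic on PCA scores, and of its control limit, is given below. The notation is assumed rather than taken from the paper: t_i is the score of a new response on the i-th retained principal component, \lambda_i is the variance of that component in the healthy baseline, k is the number of retained components, n is the number of baseline samples, and F_\alpha is the upper-\alpha quantile of the F distribution.

```latex
% Assumed notation: k retained components, n healthy baseline samples.
T^2 = \sum_{i=1}^{k} \frac{t_i^2}{\lambda_i},
\qquad
T^2_{\alpha} = \frac{k\,(n-1)(n+1)}{n\,(n-k)}\; F_{\alpha}(k,\; n-k)
```

Under this form, damage would be declared at confidence level 1 - \alpha when T^2 exceeds T^2_{\alpha}; the paper may use a different limit.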
Detection and implication of significant temporal b-value variation during earthquake sequences
NASA Astrophysics Data System (ADS)
Gulia, Laura; Tormann, Thessa; Schorlemmer, Danijel; Wiemer, Stefan
2016-04-01
Earthquakes tend to cluster in space and time and periods of increased seismic activity are also periods of increased seismic hazard. Forecasting models currently used in statistical seismology and in Operational Earthquake Forecasting (e.g. ETAS) consider the spatial and temporal changes in the activity rates whilst the spatio-temporal changes in the earthquake size distribution, the b-value, are not included. Laboratory experiments on rock samples show an increasing relative proportion of larger events as the system approaches failure, and a sudden reversal of this trend after the main event. The increasing fraction of larger events during the stress increase period can be mathematically represented by a systematic b-value decrease, while the b-value increases immediately following the stress release. We investigate whether these lab-scale observations also apply to natural earthquake sequences and can help to improve our understanding of the physical processes generating damaging earthquakes. A number of large events nucleated in low b-value regions and spatial b-value variations have been extensively documented in the past. Detecting temporal b-value evolution with confidence is more difficult, one reason being the very different scales that have been suggested for a precursory drop in b-value, from a few days to decadal scale gradients. We demonstrate with the results of detailed case studies of the 2009 M6.3 L'Aquila and 2011 M9 Tohoku earthquakes that significant and meaningful temporal b-value variability can be detected throughout the sequences, which e.g. suggests that foreshock probabilities are not generic but subject to significant spatio-temporal variability. Such potential conclusions require and motivate the systematic study of many sequences to investigate whether general patterns exist that might eventually be useful for time-dependent or even real-time seismic hazard assessment.
Inferential statistics for transient signal detection in radio astronomy phased arrays
NASA Astrophysics Data System (ADS)
Schmid, Natalia A.; Prestage, Richard M.; Alkhweldi, Marwan
2015-05-01
In this paper we develop two statistical rules for the purpose of detecting pulsars and transients using signals from phased array feeds installed on a radio telescope in place of a traditional horn receiver. We assume a known response of the antenna arrays and known coupling among array elements. We briefly summarize a set of pre-processing steps applied to raw array data prior to signal detection and then derive two detection statistics assuming two models for the unknown radio source astronomical signal: (1) the signal is deterministic and (2) the signal is a random process. The performance of both detectors is analyzed using both real and simulated data.
A New Approach to the Detection and Statistical Classification of Ca2+ Sparks
Bányász, Tamás; Chen-Izu, Ye; Balke, C. W.; Izu, Leighton T.
2007-01-01
The availability of high-speed, two-dimensional (2-D) confocal microscopes and the expanding armamentarium of fluorescent probes presents unprecedented opportunities and new challenges for studying the spatial and temporal dynamics of cellular processes. The need to remove subjectivity from the detection process, the difficulty of the human eye to detect subtle changes in fluorescence in these 2-D images, and the large volume of data produced by these confocal microscopes call for the need to develop algorithms to automatically mark the changes in fluorescence. These fluorescence signal changes are often subtle, so the statistical estimate of the likelihood that the detected signal is not noise is an integral part of the detection algorithm. This statistical estimation is fundamental to our new approach to detection; in earlier Ca2+ spark detectors, this statistical assessment was incidental to detection. Importantly, the use of the statistical properties of the signal local to the spark, instead of over the whole image, reduces the false positive and false negative rates. We developed an automatic spark detection algorithm based on these principles and used it to detect sparks on an inhomogeneous background of transverse tubule-labeled rat ventricular cells. Because of the large region of the cell surveyed by the confocal microscope, we can detect a large enough number of sparks to measure the dynamic changes in spark frequency in individual cells. We also found, in contrast to earlier results, that cardiac sparks are spatially symmetric. This new approach puts the detection of fluorescent signals on a firm statistical foundation. PMID:17400702
A new approach to the detection and statistical classification of Ca2+ sparks.
Bányász, Tamás; Chen-Izu, Ye; Balke, C W; Izu, Leighton T
2007-06-15
The availability of high-speed, two-dimensional (2-D) confocal microscopes and the expanding armamentarium of fluorescent probes presents unprecedented opportunities and new challenges for studying the spatial and temporal dynamics of cellular processes. The need to remove subjectivity from the detection process, the difficulty of the human eye to detect subtle changes in fluorescence in these 2-D images, and the large volume of data produced by these confocal microscopes call for the need to develop algorithms to automatically mark the changes in fluorescence. These fluorescence signal changes are often subtle, so the statistical estimate of the likelihood that the detected signal is not noise is an integral part of the detection algorithm. This statistical estimation is fundamental to our new approach to detection; in earlier Ca(2+) spark detectors, this statistical assessment was incidental to detection. Importantly, the use of the statistical properties of the signal local to the spark, instead of over the whole image, reduces the false positive and false negative rates. We developed an automatic spark detection algorithm based on these principles and used it to detect sparks on an inhomogeneous background of transverse tubule-labeled rat ventricular cells. Because of the large region of the cell surveyed by the confocal microscope, we can detect a large enough number of sparks to measure the dynamic changes in spark frequency in individual cells. We also found, in contrast to earlier results, that cardiac sparks are spatially symmetric. This new approach puts the detection of fluorescent signals on a firm statistical foundation.
Metoyer, Candace N.; Walsh, Stephen J.; Tardiff, Mark F.; Chilton, Lawrence
2008-10-30
The detection and identification of weak gaseous plumes using thermal imaging data is complicated by many factors. These include variability due to atmosphere, ground and plume temperature, and background clutter. This paper presents an analysis of one formulation of the physics-based model that describes the at-sensor observed radiance. The motivating question for the analyses performed in this paper is as follows. Given a set of backgrounds, is there a way to predict the background over which the probability of detecting a given chemical will be the highest? Two statistics were developed to address this question. These statistics incorporate data from the long-wave infrared band to predict the background over which chemical detectability will be the highest. These statistics can be computed prior to data collection. As a preliminary exploration into the predictive ability of these statistics, analyses were performed on synthetic hyperspectral images. Each image contained one chemical (either carbon tetrachloride or ammonia) spread across six distinct background types. The statistics were used to generate predictions for the background ranks. Then, the predicted ranks were compared to the empirical ranks obtained from the analyses of the synthetic images. For the simplified images under consideration, the predicted and empirical ranks showed a promising amount of agreement. One statistic accurately predicted the best and worst background for detection in all of the images. Future work may include explorations of more complicated plume ingredients, background types, and noise structures.
NASA Astrophysics Data System (ADS)
Fujimoto, K.; Yanagisawa, T.; Uetsuhara, M.
Automated detection and tracking of faint objects in optical, or bearing-only, sensor imagery is a topic of immense interest in space surveillance. Robust methods in this realm will lead to better space situational awareness (SSA) while reducing the cost of sensors and optics. They are especially relevant in the search for high area-to-mass ratio (HAMR) objects, as their apparent brightness can change significantly over time. A track-before-detect (TBD) approach has been shown to be suitable for faint, low signal-to-noise ratio (SNR) images of resident space objects (RSOs). TBD does not rely upon the extraction of feature points within the image based on some thresholding criteria, but rather directly takes as input the intensity information from the image file. Not only is all of the available information from the image used, but TBD also avoids the computational intractability of the conventional feature-based line detection (i.e., "string of pearls") approach to track detection for low SNR data. Implementation of TBD rooted in finite set statistics (FISST) theory has been proposed recently by Vo, et al. Compared to other TBD methods applied so far to SSA, such as the stacking method or multi-pass multi-period denoising, the FISST approach is statistically rigorous and has been shown to be more computationally efficient, thus paving the path toward on-line processing. In this paper, we intend to apply a multi-Bernoulli filter to actual CCD imagery of RSOs. The multi-Bernoulli filter can explicitly account for the birth and death of multiple targets in a measurement arc. TBD is achieved via a sequential Monte Carlo implementation. Preliminary results with simulated single-target data indicate that a Bernoulli filter can successfully track and detect objects with measurement SNR as low as 2.4. Although the advent of fast-cadence scientific CMOS sensors has made the automation of faint object detection a realistic goal, it is nonetheless a difficult goal, as measurements
Run-Length and Edge Statistics Based Approach for Image Splicing Detection
NASA Astrophysics Data System (ADS)
Dong, Jing; Wang, Wei; Tan, Tieniu; Shi, Yun Q.
In this paper, a simple but efficient approach for blind image splicing detection is proposed. Image splicing is a common and fundamental operation used for image forgery. The detection of image splicing is a preliminary but desirable study for image forensics. Passive detection approaches of image splicing are usually regarded as pattern recognition problems based on features which are sensitive to splicing. In the proposed approach, we analyze the discontinuity of image pixel correlation and coherency caused by splicing in terms of image run-length representation and sharp image characteristics. The statistical features extracted from image run-length representation and image edge statistics are used for splicing detection. The support vector machine (SVM) is used as the classifier. Our experimental results demonstrate that the two proposed features outperform existing ones both in detection accuracy and computational complexity.
Use of power analysis to develop detectable significance criteria for sea urchin toxicity tests
Carr, R.S.; Biedenbach, J.M.
1999-01-01
When sufficient data are available, the statistical power of a test can be determined using power analysis procedures. The term “detectable significance” has been coined to refer to this criterion based on power analysis and past performance of a test. This power analysis procedure has been performed with sea urchin (Arbacia punctulata) fertilization and embryological development data from sediment porewater toxicity tests. Data from 3100 and 2295 tests for the fertilization and embryological development tests, respectively, were used to calculate the criteria and regression equations describing the power curves. Using Dunnett's test, a minimum significant difference (MSD) (β = 0.05) of 15.5% and 19% for the fertilization test, and 16.4% and 20.6% for the embryological development test, for α ≤ 0.05 and α ≤ 0.01, respectively, were determined. The use of this second criterion reduces type I (false positive) errors and helps to establish a critical level of difference based on the past performance of the test.
Yuan, Zhongshang; Ji, Jiadong; Zhang, Tao; Liu, Yi; Zhang, Xiaoshuai; Chen, Wei; Xue, Fuzhong
2016-12-20
Traditional epidemiology often pays more attention to the identification of a single factor rather than to the pathway that is related to a disease, and therefore, it is difficult to explore the disease mechanism. Systems epidemiology aims to integrate putative lifestyle exposures and biomarkers extracted from multiple omics platforms to offer new insights into the pathway mechanisms that underlie disease at the human population level. One key but inadequately addressed question is how to develop powerful statistics to identify whether one candidate pathway is associated with a disease. Bearing in mind that a pathway difference can result from not only changes in the nodes but also changes in the edges, we propose a novel statistic for detecting group differences between pathways, which, in principle, captures the node changes and edge changes while simultaneously accounting for the pathway structure. The proposed test has been proven to follow the chi-square distribution, and various simulations have shown it has better performance than other existing methods. Integrating genome-wide DNA methylation data, we analyzed one real data set from the Bogalusa cohort study and identified a potential pathway, Smoking → SOCS3 → PIK3R1, which was strongly and significantly associated with abdominal obesity. The proposed test was powerful and efficient at identifying pathway differences between two groups, and it can be extended to other disciplines that involve statistical comparisons between pathways. The source code in R is available on our website. Copyright © 2016 John Wiley & Sons, Ltd.
NASA Technical Reports Server (NTRS)
Lo, C. F.; Wu, K.; Whitehead, B. A.
1993-01-01
Statistical and neural network methods have been applied to investigate the feasibility of detecting anomalies in turbopump vibration of the SSME. The anomalies are detected based on the amplitude of peaks of fundamental and harmonic frequencies in the power spectral density. These data are reduced to the proper format from sensor data measured by strain gauges and accelerometers. Both methods are feasible for detecting the vibration anomalies. The statistical method requires sufficient data points to establish a reasonable statistical distribution data bank. This method is applicable for on-line operation. The neural network method also needs a sufficient database to train the networks. The testing procedure can be utilized at any time so long as the characteristics of components remain unchanged.
A Statistical Detection of an Anomaly from a Few Noisy Tomographic Projections
NASA Astrophysics Data System (ADS)
Fillatre, Lionel; Nikiforov, Igor
2005-12-01
The problem of detecting an anomaly/target from a very limited number of noisy tomographic projections is addressed from the statistical point of view. The imaged object is composed of an environment, considered as a nuisance parameter, with a possibly hidden anomaly/target. The GLR test is used to solve the problem. When the projection linearly depends on the nuisance parameters, the GLR test coincides with an optimal statistical invariant test.
Detection of Incipient Tooth Defect in Helical Gears Using Multivariate Statistics
NASA Astrophysics Data System (ADS)
Baydar, N.; Chen, Q.; Ball, A.; Kruger, U.
2001-03-01
Multivariate statistical techniques have been successfully used for monitoring process plants and their associated instrumentation. These techniques effectively detect disturbances related to individual measurement sources and consequently provide diagnostic information about the process input. This paper investigates and explores the use of multivariate statistical techniques in a two-stage industrial helical gearbox, to detect localised faults by using vibration signals. The vibration signals, obtained from a number of sensors, are synchronously averaged and then multivariate statistics based on principal components analysis are employed to form a normal (reference) condition model. Fault conditions, which are deviations from a reference model, are detected by monitoring Q- and T2-statistics. Normal operating regions, or confidence bounds, based on kernel density estimation (KDE) are introduced to capture the faulty conditions in the gearbox. It is found that Q- and T2-statistics based on PCA can detect incipient local faults at an early stage. The confidence regions based on KDE can also reveal the growing faults in the gearbox.
Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.
2009-01-01
In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high–throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409
Irshad, Humayun
2013-01-01
Context: According to Nottingham grading system, mitosis count plays a critical role in cancer diagnosis and grading. Manual counting of mitosis is tedious and subject to considerable inter- and intra-reader variations. Aims: The aim is to improve the accuracy of mitosis detection by selecting the color channels that better capture the statistical and morphological features, which classify mitosis from other objects. Materials and Methods: We propose a framework that includes comprehensive analysis of statistics and morphological features in selected channels of various color spaces that assist pathologists in mitosis detection. In candidate detection phase, we perform Laplacian of Gaussian, thresholding, morphology and active contour model on blue-ratio image to detect and segment candidates. In candidate classification phase, we extract a total of 143 features including morphological, first order and second order (texture) statistics features for each candidate in selected channels and finally classify using decision tree classifier. Results and Discussion: The proposed method has been evaluated on Mitosis Detection in Breast Cancer Histological Images (MITOS) dataset provided for an International Conference on Pattern Recognition 2012 contest and achieved 74% and 71% detection rate, 70% and 56% precision and 72% and 63% F-Measure on Aperio and Hamamatsu images, respectively. Conclusions and Future Work: The proposed multi-channel features computation scheme uses fixed image scale and extracts nuclei features in selected channels of various color spaces. This simple but robust model has proven to be highly efficient in capturing multi-channels statistical features for mitosis detection, during the MITOS international benchmark. Indeed, the mitosis detection of critical importance in cancer diagnosis is a very challenging visual task. In future work, we plan to use color deconvolution as preprocessing and Hough transform or local extrema based candidate detection
Irshad, Humayun
2013-01-01
According to Nottingham grading system, mitosis count plays a critical role in cancer diagnosis and grading. Manual counting of mitosis is tedious and subject to considerable inter- and intra-reader variations. The aim is to improve the accuracy of mitosis detection by selecting the color channels that better capture the statistical and morphological features, which classify mitosis from other objects. We propose a framework that includes comprehensive analysis of statistics and morphological features in selected channels of various color spaces that assist pathologists in mitosis detection. In candidate detection phase, we perform Laplacian of Gaussian, thresholding, morphology and active contour model on blue-ratio image to detect and segment candidates. In candidate classification phase, we extract a total of 143 features including morphological, first order and second order (texture) statistics features for each candidate in selected channels and finally classify using decision tree classifier. The proposed method has been evaluated on Mitosis Detection in Breast Cancer Histological Images (MITOS) dataset provided for an International Conference on Pattern Recognition 2012 contest and achieved 74% and 71% detection rate, 70% and 56% precision and 72% and 63% F-Measure on Aperio and Hamamatsu images, respectively. The proposed multi-channel features computation scheme uses fixed image scale and extracts nuclei features in selected channels of various color spaces. This simple but robust model has proven to be highly efficient in capturing multi-channels statistical features for mitosis detection, during the MITOS international benchmark. Indeed, the mitosis detection of critical importance in cancer diagnosis is a very challenging visual task. In future work, we plan to use color deconvolution as preprocessing and Hough transform or local extrema based candidate detection in order to reduce the number of candidates in mitosis and non-mitosis classes.
Statistical detection and modeling of the over-dispersion of winter storm occurrence
NASA Astrophysics Data System (ADS)
Raschke, M.
2015-08-01
In this communication, I improve the detection and modeling of the over-dispersion of winter storm occurrence. For this purpose, the generalized Poisson distribution and the Bayesian information criterion are introduced; the latter is used for statistical model selection. Moreover, I replace the frequently used dispersion statistics by an over-dispersion parameter which does not depend on the considered return period of storm events. These models and methods are applied in order to properly detect the over-dispersion in winter storm data for Germany, carrying out a joint estimation of the distribution models for different samples.
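The model-selection step described above can be illustrated with a small sketch. It makes several assumptions that are not taken from the paper: the generalized Poisson distribution is written in Consul's form, P(X = x) = theta (theta + x lam)^(x-1) exp(-theta - x lam) / x!, its parameters come from simple moment estimates rather than the joint estimation used in the communication (so the BIC values are only approximate), and the yearly storm counts are invented.

```python
import math

def poisson_loglik(counts, rate):
    """Log-likelihood of i.i.d. Poisson counts."""
    return sum(x * math.log(rate) - rate - math.lgamma(x + 1) for x in counts)

def gen_poisson_loglik(counts, theta, lam):
    """Log-likelihood under Consul's generalized Poisson (assumed parameterization)."""
    return sum(
        math.log(theta) + (x - 1) * math.log(theta + x * lam)
        - theta - x * lam - math.lgamma(x + 1)
        for x in counts
    )

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion; lower values are preferred."""
    return n_params * math.log(n_obs) - 2.0 * loglik

# Invented yearly storm counts, for illustration only.
counts = [0, 0, 0, 0, 1, 1, 2, 3, 5, 8, 0, 0, 1, 7, 2, 0, 0, 4, 0, 6]
n = len(counts)
m = sum(counts) / n                               # sample mean
v = sum((x - m) ** 2 for x in counts) / (n - 1)   # sample variance
dispersion = v / m                                # equals 1 for a Poisson process

# Moment estimates: mean = theta / (1 - lam), variance = theta / (1 - lam)^3.
lam = 1.0 - math.sqrt(m / v)
theta = m * (1.0 - lam)

bic_poisson = bic(poisson_loglik(counts, m), 1, n)
bic_gen = bic(gen_poisson_loglik(counts, theta, lam), 2, n)
```

For these invented counts the variance is well above the mean, and the two-parameter model attains the lower BIC despite the penalty for its extra parameter, which is the pattern one would read as over-dispersion.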
NASA Astrophysics Data System (ADS)
Ahmed, Sheehan H.; Brooks, Alyson M.; Christensen, Charlotte R.
2017-04-01
We investigate whether the inclusion of baryonic physics influences the formation of thin, coherently rotating planes of satellites such as those seen around the Milky Way and Andromeda. For four Milky Way-mass simulations, each run both as dark matter-only and with baryons included, we are able to identify a planar configuration that significantly maximizes the number of plane satellite members. The maximum plane member satellites are consistently different between the dark matter-only and baryonic versions of the same run due to the fact that satellites are both more likely to be destroyed and to infall later in the baryonic runs. Hence, studying satellite planes in dark matter-only simulations is misleading, because they will be composed of different satellite members than those that would exist if baryons were included. Additionally, the destruction of satellites in the baryonic runs leads to less radially concentrated satellite distributions, a result that is critical to making planes that are statistically significant compared to a random distribution. Since all planes pass through the centre of the galaxy, it is much harder to create a plane of a given height from a random distribution if the satellites have a low radial concentration. We identify Andromeda's low radial satellite concentration as a key reason why the plane in Andromeda is highly significant. Despite this, when corotation is considered, none of the satellite planes identified for the simulated galaxies are as statistically significant as the observed planes around the Milky Way and Andromeda, even in the baryonic runs.
2012-03-01
NEAR EARTH OBJECT DETECTION USING A POISSON STATISTICAL MODEL FOR DETECTION ON IMAGES MODELED FROM THE PANORAMIC SURVEY...the United States. AFIT/GE/ENG/12-33
Turk-Browne, Nicholas B.; Scholl, Brian J.; Chun, Marvin M.; Johnson, Marcia K.
2009-01-01
Our environment contains regularities distributed in space and time that can be detected by way of statistical learning. This unsupervised learning occurs without intent or awareness, but little is known about how it relates to other types of learning, how it affects perceptual processing, and how quickly it can occur. Here we use fMRI during statistical learning to explore these questions. Participants viewed statistically structured versus unstructured sequences of shapes while performing a task unrelated to the structure. Robust neural responses to statistical structure were observed, and these responses were notable in four ways: First, responses to structure were observed in the striatum and medial temporal lobe, suggesting that statistical learning may be related to other forms of associative learning and relational memory. Second, statistical regularities yielded greater activation in category-specific visual regions (object-selective lateral occipital cortex and word-selective ventral occipito-temporal cortex), demonstrating that these regions are sensitive to information distributed in time. Third, evidence of learning emerged early during familiarization, showing that statistical learning can operate very quickly and with little exposure. Finally, neural signatures of learning were dissociable from subsequent explicit familiarity, suggesting that learning can occur in the absence of awareness. Overall, our findings help elucidate the underlying nature of statistical learning. PMID:18823241
Turk-Browne, Nicholas B; Scholl, Brian J; Chun, Marvin M; Johnson, Marcia K
2009-10-01
Our environment contains regularities distributed in space and time that can be detected by way of statistical learning. This unsupervised learning occurs without intent or awareness, but little is known about how it relates to other types of learning, how it affects perceptual processing, and how quickly it can occur. Here we use fMRI during statistical learning to explore these questions. Participants viewed statistically structured versus unstructured sequences of shapes while performing a task unrelated to the structure. Robust neural responses to statistical structure were observed, and these responses were notable in four ways: First, responses to structure were observed in the striatum and medial temporal lobe, suggesting that statistical learning may be related to other forms of associative learning and relational memory. Second, statistical regularities yielded greater activation in category-specific visual regions (object-selective lateral occipital cortex and word-selective ventral occipito-temporal cortex), demonstrating that these regions are sensitive to information distributed in time. Third, evidence of learning emerged early during familiarization, showing that statistical learning can operate very quickly and with little exposure. Finally, neural signatures of learning were dissociable from subsequent explicit familiarity, suggesting that learning can occur in the absence of awareness. Overall, our findings help elucidate the underlying nature of statistical learning.
An Algorithm to Improve Test Answer Copying Detection Using the Omega Statistic
ERIC Educational Resources Information Center
Maeda, Hotaka; Zhang, Bo
2017-01-01
The omega (ω) statistic is reputed to be one of the best indices for detecting answer copying on multiple choice tests, but its performance relies on the accurate estimation of copier ability, which is challenging because responses from the copiers may have been contaminated. We propose an algorithm that aims to identify and delete the suspected…
ERIC Educational Resources Information Center
Turk-Browne, Nicholas B.; Scholl, Brian J.; Chun, Marvin M.; Johnson, Marcia K.
2009-01-01
Our environment contains regularities distributed in space and time that can be detected by way of statistical learning. This unsupervised learning occurs without intent or awareness, but little is known about how it relates to other types of learning, how it affects perceptual processing, and how quickly it can occur. Here we use fMRI during…
Dual-band, infrared buried mine detection using a statistical pattern recognition approach
Buhl, M.R.; Hernandez, J.E.; Clark, G.A.; Sengupta, S.K.
1993-08-01
The main objective of this work was to detect surrogate land mines, which were buried in clay and sand, using dual-band, infrared images. A statistical pattern recognition approach was used to achieve this objective. This approach is discussed and results of applying it to real images are given.
Comparing the Aberrant Response Detection Performance of Thirty-Six Person-Fit Statistics
ERIC Educational Resources Information Center
Karabatsos, George
2003-01-01
The accurate measurement of examinee test performance is critical to educational decision-making, and inaccurate measurement can lead to negative consequences for examinees. Person-fit statistics are important in a psychometric analysis for detecting examinees with aberrant response patterns that lead to inaccurate measurement. Unfortunately,…
A Statistical Test for Detecting Answer Copying on Multiple-Choice Tests
ERIC Educational Resources Information Center
van der Linden, Wim J.; Sotaridona, Leonardo
2004-01-01
A statistical test for the detection of answer copying on multiple-choice tests is presented. The test is based on the idea that the answers of examinees to test items may be the result of three possible processes: (1) knowing, (2) guessing, and (3) copying, but that examinees who do not have access to the answers of other examinees can arrive at…
Linden, Ariel
2008-04-01
Prior to implementing a disease management (DM) strategy, a needs assessment should be conducted to determine whether sufficient opportunity exists for an intervention to be successful in the given population. A central component of this assessment is a sample size analysis to determine whether the population is of sufficient size to allow the expected program effect to achieve statistical significance. This paper discusses the parameters that comprise the generic sample size formula for independent samples and their interrelationships, followed by modifications for the DM setting. In addition, a table is provided with sample size estimates for various effect sizes. Examples are described in detail along with strategies for overcoming common barriers. Ultimately, conducting these calculations up front will help set appropriate expectations about the ability to demonstrate the success of the intervention.
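The generic sample size formula for two independent samples mentioned above can be sketched as follows. This is a minimal illustration, assuming a two-sided test on a difference in means with a common standard deviation and a normal approximation; the function name and default values are hypothetical, and the disease-management modifications discussed in the paper are not reproduced here.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect, sd, alpha=0.05, power=0.80):
    """Per-group n for a two-sided comparison of two independent means.

    Uses n = 2 * ((z_{1-alpha/2} + z_{power}) * sd / effect)^2,
    the usual normal-approximation formula (an assumption of this sketch).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = NormalDist().inv_cdf(power)           # quantile matching the desired power
    n = 2 * ((z_alpha + z_beta) * sd / effect) ** 2
    return ceil(n)


# Smaller standardized effects require sharply larger samples.
for d in (0.2, 0.5, 0.8):
    print(d, sample_size_per_group(effect=d, sd=1.0))
```

Because the required n scales with the inverse square of the effect size, halving the expected program effect roughly quadruples the population needed.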
Perneger, Thomas V; Combescure, Christophe
2017-07-01
Published P-values provide a window into the global enterprise of medical research. The aim of this study was to use the distribution of published P-values to estimate the relative frequencies of null and alternative hypotheses and to seek irregularities suggestive of publication bias. This cross-sectional study included P-values published in 120 medical research articles in 2016 (30 each from the BMJ, JAMA, Lancet, and New England Journal of Medicine). The observed distribution of P-values was compared with expected distributions under the null hypothesis (i.e., uniform between 0 and 1) and the alternative hypothesis (strictly decreasing from 0 to 1). P-values were categorized according to conventional levels of statistical significance and in one-percent intervals. Among 4,158 recorded P-values, 26.1% were highly significant (P < 0.001), 9.1% were moderately significant (P ≥ 0.001 to < 0.01), 11.7% were weakly significant (P ≥ 0.01 to < 0.05), and 53.2% were nonsignificant (P ≥ 0.05). We noted three irregularities: (1) high proportion of P-values <0.001, especially in observational studies, (2) excess of P-values equal to 1, and (3) about twice as many P-values less than 0.05 compared with those more than 0.05. The latter finding was seen in both randomized trials and observational studies, and in most types of analyses, excepting heterogeneity tests and interaction tests. Under plausible assumptions, we estimate that about half of the tested hypotheses were null and the other half were alternative. This analysis suggests that statistical tests published in medical journals are not a random sample of null and alternative hypotheses but that selective reporting is prevalent. In particular, significant results are about twice as likely to be reported as nonsignificant results.
Munson, P. J.; Singh, R. K.
1997-01-01
Statistical potentials based on pairwise interactions between Cα atoms are commonly used in protein threading/fold-recognition attempts. Inclusion of higher order interaction is a possible means of improving the specificity of these potentials. Delaunay tessellation of the Cα-atom representation of protein structure has been suggested as a means of defining multi-body interactions. A large number of parameters are required to define all four-body interactions of 20 amino acid types (20^4 = 160,000). Assuming that residue order within a four-body contact is irrelevant reduces this to a manageable 8,855 parameters, using a nonredundant dataset of 608 protein structures. Three lines of evidence support the significance and utility of the four-body potential for sequence-structure matching. First, compared to the four-body model, all lower-order interaction models (three-body, two-body, one-body) are found statistically inadequate to explain the frequency distribution of residue contacts. Second, coherent patterns of interaction are seen in a graphic presentation of the four-body potential. Many patterns have plausible biophysical explanations and are consistent across sets of residues sharing certain properties (e.g., size, hydrophobicity, or charge). Third, the utility of the multi-body potential is tested on a test set of 12 same-length pairs of proteins of known structure for two protocols: Sequence-recognizes-structure, where a query sequence is threaded (without gap) through the native and a non-native structure; and structure-recognizes-sequence, where a query structure is threaded by its native and another non-native sequence. Using cross-validated training, protein sequences correctly recognized their native structure in all 24 cases. Conversely, structures recognized the native sequence in 23 of 24 cases. Further, the score differences between correct and decoy structures increased significantly using the three- or four-body potential compared to
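The two parameter counts quoted above follow from elementary combinatorics: ordered quadruplets of 20 residue types give 20^4 terms, while ignoring residue order leaves the number of multisets of size 4 drawn from 20 types. A short check, with variable names chosen only for illustration:

```python
from itertools import combinations_with_replacement
from math import comb

AMINO_ACID_TYPES = 20
BODY_ORDER = 4  # four residues per Delaunay simplex

# Order matters: every position can hold any of the 20 types.
ordered = AMINO_ACID_TYPES ** BODY_ORDER

# Order ignored: multisets of size 4, counted by C(n + k - 1, k).
unordered = comb(AMINO_ACID_TYPES + BODY_ORDER - 1, BODY_ORDER)

# Brute-force enumeration as a cross-check of the closed form.
enumerated = sum(
    1 for _ in combinations_with_replacement(range(AMINO_ACID_TYPES), BODY_ORDER)
)

print(ordered, unordered, enumerated)
```

The closed form C(23, 4) reproduces the 8,855 figure, an 18-fold reduction relative to the ordered count.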
Space Object Detection and Tracking Within a Finite Set Statistics Framework
2017-04-13
AFRL-AFOSR-CL-TR-2017-0005. Space Object Detection & Tracking Within a Finite Set Statistics Framework. Martin Adams, Department of Electrical... Final report dated 21-04-2017, covering 01 Feb 2015 to 31 Jan 2017. ...Grant No. FA9550-15-1-0069, devoted to the investigation and improvement of the detection and tracking methods of inactive Resident Space Objects (RSOs
Statistical methods for detecting differentially abundant features in clinical metagenomic samples.
White, James Robert; Nagarajan, Niranjan; Pop, Mihai
2009-04-01
Numerous studies are currently underway to characterize the microbial communities inhabiting our world. These studies aim to dramatically expand our understanding of the microbial biosphere and, more importantly, hope to reveal the secrets of the complex symbiotic relationship between us and our commensal bacterial microflora. An important prerequisite for such discoveries is the availability of computational tools that are able to rapidly and accurately compare large datasets generated from complex bacterial communities to identify features that distinguish them. We present a statistical method for comparing clinical metagenomic samples from two treatment populations on the basis of count data (e.g. as obtained through sequencing) to detect differentially abundant features. Our method, Metastats, employs the false discovery rate to improve specificity in high-complexity environments, and separately handles sparsely-sampled features using Fisher's exact test. Under a variety of simulations, we show that Metastats performs well compared to previously used methods, and significantly outperforms other methods for features with sparse counts. We demonstrate the utility of our method on several datasets including a 16S rRNA survey of obese and lean human gut microbiomes, COG functional profiles of infant and mature gut microbiomes, and bacterial and viral metabolic subsystem data inferred from random sequencing of 85 metagenomes. The application of our method to the obesity dataset reveals differences between obese and lean subjects not reported in the original study. For the COG and subsystem datasets, we provide the first statistically rigorous assessment of the differences between these populations. The methods described in this paper are the first to address clinical metagenomic datasets comprising samples from multiple subjects. Our methods are robust across datasets of varied complexity and sampling level. While designed for metagenomic applications, our software can also be applied
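For sparsely sampled features the abstract names Fisher's exact test. The sketch below is a generic two-sided Fisher's exact test on a 2x2 table, not the Metastats code itself. The table layout (rows as the two populations, columns as counts of the feature versus counts of everything else) and the function name are assumptions made for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell,
        # given fixed row and column totals.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum the probabilities of all tables no more likely than the observed one
    # (small tolerance guards against floating-point ties).
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-9)
    )


# Hypothetical sparse feature: 5 reads in population 1, none in population 2.
print(fisher_exact_two_sided(5, 0, 0, 5))
```

An exact test of this kind avoids the large-sample approximations that break down when most counts are zero, which is the situation the abstract singles out.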
Parkhomenko, Elena; Tritchler, David; Lemire, Mathieu; Hu, Pingzhao; Beyene, Joseph
2009-12-15
In high-dimensional studies such as genome-wide association studies, the correction for multiple testing in order to control total type I error results in decreased power to detect modest effects. We present a new analytical approach based on the higher criticism statistic that allows identification of the presence of modest effects. We apply our method to the genome-wide study of rheumatoid arthritis provided in the Genetic Analysis Workshop 16 Problem 1 data set. There is evidence for unknown bias in this study that could be explained by the presence of undetected modest effects. We compared the asymptotic and empirical thresholds for the higher criticism statistic. Using the asymptotic threshold we detected the presence of modest effects genome-wide. We also detected modest effects using the 90th percentile of the empirical null distribution as a threshold; however, there is no such evidence when the 95th and 99th percentiles were used. While the higher criticism method suggests that there is some evidence for modest effects, interpreting individual single-nucleotide polymorphisms with significant higher criticism statistics is of undetermined value. The goal of higher criticism is to alert the researcher that genetic effects remain to be discovered and to promote the use of more targeted and powerful studies to detect the remaining effects.
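A minimal sketch of a higher criticism statistic is given below, assuming the common Donoho-Jin form computed from sorted P-values; the abstract does not say which variant the authors used, so the formula, the `alpha0` cutoff and the function name are assumptions.

```python
from math import sqrt


def higher_criticism(p_values, alpha0=0.5):
    """Higher criticism statistic over the smallest alpha0 fraction of P-values.

    HC = max_i sqrt(n) * (i/n - p_(i)) / sqrt(p_(i) * (1 - p_(i))),
    where p_(i) is the i-th smallest P-value (assumed Donoho-Jin form).
    """
    p_sorted = sorted(p_values)
    n = len(p_sorted)
    best = float("-inf")
    for i in range(1, int(alpha0 * n) + 1):
        p = p_sorted[i - 1]
        if 0.0 < p < 1.0:  # skip degenerate values that zero the denominator
            hc_i = sqrt(n) * (i / n - p) / sqrt(p * (1 - p))
            best = max(best, hc_i)
    return best


# Toy input: one small P-value among otherwise unremarkable ones.
print(higher_criticism([0.01, 0.5, 0.9, 0.95]))
```

The statistic measures how far the empirical fraction of small P-values exceeds what a uniform null distribution would give, so a large value signals that some effects are present without saying which tests carry them. This matches the abstract's caution about interpreting individual SNPs.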
Gadbury, Gary L; Allison, David B
2012-01-01
Much has been written regarding p-values below certain thresholds (most notably 0.05) denoting statistical significance and the tendency of such p-values to be more readily publishable in peer-reviewed journals. Intuition suggests that there may be a tendency to manipulate statistical analyses to push a "near significant p-value" to a level that is considered significant. This article presents a method for detecting the presence of such manipulation (herein called "fiddling") in a distribution of p-values from independent studies. Simulations are used to illustrate the properties of the method. The results suggest that the method has low type I error and that power approaches acceptable levels as the number of p-values being studied approaches 1000.
Liu, Wei; Feng, Huanqing; Li, Chuanfu; Huang, Yufeng; Wu, Dehuang; Tong, Tong
2009-01-01
In this paper, we present a method that detects intracranial space-occupying lesions in two-dimensional (2D) high-resolution CT (HRCT) images of the brain. A statistical texture atlas technique localizes anatomical variation in the gray level distribution of brain images and, in turn, identifies the regions with lesions. The statistical texture atlas involves 147 HRCT slices of normal individuals and its construction is extremely time-consuming. To improve the performance of atlas construction, we have implemented the pixel-wise texture extraction procedure on an Nvidia 8800GTX GPU with the Compute Unified Device Architecture (CUDA) platform. Experimental results indicate that the extracted texture feature is distinctive and robust enough, and is suitable for detecting uniform and mixed density space-occupying lesions. In addition, a significant speedup over the straightforward CPU version was achieved with CUDA.
Langley, Sarah R; Mayr, Manuel
2015-11-03
Label-free LC-MS/MS proteomics has proven itself to be a powerful method for evaluating protein identification and quantification from complex samples. For comparative proteomics, several methods have been used to detect the differential expression of proteins from such data. We have assessed seven methods used across the literature for detecting differential expression from spectral count quantification: Student's t-test, significance analysis of microarrays (SAM), normalised spectral abundance factor (NSAF), normalised spectral abundance factor-power law global error model (NSAF-PLGEM), spectral index (SpI), DESeq and QSpec. We used 2000 simulated datasets as well as publicly available data from a proteomic standards study to assess the ability of these methods to detect differential expression in varying effect sizes and proportions of differentially expressed proteins. At two false discovery rate (FDR) levels, we find that several of the methods detect differential expression within the data with reasonable precision, others detect differential expression at the expense of low precision, and still others fail to identify any differentially expressed proteins. The inability of these seven methods to fully capture the differential landscape, even at the largest effect size, illustrates some of the limitations of the existing technologies and the statistical methodologies. In label-free mass spectrometry experiments, protein identification and quantification have always been important, but there is now a growing focus on comparative proteomics. Detecting differential expression in protein levels can inform on important biological mechanisms and provide direction for further study. Given the high cost and labour intensive nature of validation experiments, statistical methods are important for prioritising proteins of interest. Here, we have performed a comparative analysis to investigate the statistical methodologies for detecting differential expression
Kossobokov, V.G.; Romashkova, L.L.; Keilis-Borok, V. I.; Healy, J.H.
1999-01-01
Algorithms M8 and MSc (i.e., the Mendocino Scenario) were used in a real-time intermediate-term research prediction of the strongest earthquakes in the Circum-Pacific seismic belt. Predictions are made by M8 first. Then, the areas of alarm are reduced by MSc at the cost that some earthquakes are missed in the second approximation of prediction. In 1992-1997, five earthquakes of magnitude 8 and above occurred in the test area: all of them were predicted by M8 and MSc identified correctly the locations of four of them. The space-time volume of the alarms is 36% and 18%, correspondingly, when estimated with a normalized product measure of empirical distribution of epicenters and uniform time. The statistical significance of the achieved results is beyond 99% both for M8 and MSc. For magnitude 7.5+, 10 out of 19 earthquakes were predicted by M8 in 40% and five were predicted by M8-MSc in 13% of the total volume considered. This implies a significance level of 81% for M8 and 92% for M8-MSc. The lower significance levels might result from a global change in seismic regime in 1993-1996, when the rate of the largest events has doubled and all of them become exclusively normal or reversed faults. The predictions are fully reproducible; the algorithms M8 and MSc in complete formal definitions were published before we started our experiment [Keilis-Borok, V.I., Kossobokov, V.G., 1990. Premonitory activation of seismic flow: Algorithm M8, Phys. Earth and Planet. Inter. 61, 73-83; Kossobokov, V.G., Keilis-Borok, V.I., Smith, S.W., 1990. Localization of intermediate-term earthquake prediction, J. Geophys. Res., 95, 19763-19772; Healy, J.H., Kossobokov, V.G., Dewey, J.W., 1992. A test to evaluate the earthquake prediction algorithm, M8. U.S. Geol. Surv. OFR 92-401]. M8 is available from the IASPEI Software Library [Healy, J.H., Keilis-Borok, V.I., Lee, W.H.K. (Eds.), 1997. Algorithms for Earthquake Statistics and Prediction, Vol. 6. IASPEI Software Library].
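The quoted significance levels can be approximated with a simple binomial null model: each target earthquake is assumed to fall inside an alarm independently, with probability equal to the fraction of space-time volume covered by alarms. The abstract does not state that this is the authors' exact computation, so the sketch below (function name and case labels are hypothetical) is only a plausibility check.

```python
from math import comb


def chance_tail(n_events, n_hits, alarm_fraction):
    """P(at least n_hits of n_events fall inside alarms purely by chance)."""
    return sum(
        comb(n_events, k)
        * alarm_fraction ** k
        * (1 - alarm_fraction) ** (n_events - k)
        for k in range(n_hits, n_events + 1)
    )


# (events, predicted events, alarm volume fraction) as given in the abstract.
cases = {
    "M8, magnitude 8.0+": (5, 5, 0.36),
    "M8-MSc, magnitude 8.0+": (5, 4, 0.18),
    "M8, magnitude 7.5+": (19, 10, 0.40),
    "M8-MSc, magnitude 7.5+": (19, 5, 0.13),
}

significance = {name: 1 - chance_tail(*args) for name, args in cases.items()}
for name, value in significance.items():
    print(f"{name}: {value:.1%}")
```

Under this model both magnitude 8.0+ results exceed 99%, and the magnitude 7.5+ results land close to the reported 81% and 92%. The small remaining gap is plausibly due to rounding of the alarm fractions in the abstract.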
Kim, Jiyu; Jung, Inkyung
2017-01-01
Spatial scan statistics with circular or elliptic scanning windows are commonly used for cluster detection in various applications, such as the identification of geographical disease clusters from epidemiological data. It has been pointed out that the method may have difficulty in correctly identifying non-compact, arbitrarily shaped clusters. In this paper, we evaluated the Gini coefficient for detecting irregularly shaped clusters through a simulation study. The Gini coefficient, the use of which in spatial scan statistics was recently proposed, is a criterion measure for optimizing the maximum reported cluster size. Our simulation study results showed that using the Gini coefficient works better than the original spatial scan statistic for identifying irregularly shaped clusters, by reporting an optimized and refined collection of clusters rather than a single larger cluster. We have provided a real data example that seems to support the simulation results. We think that using the Gini coefficient in spatial scan statistics can be helpful for the detection of irregularly shaped clusters. PMID:28129368
NASA Astrophysics Data System (ADS)
Mamin, H. J.; Budakian, R.; Chui, B. W.; Rugar, D.
2005-07-01
We have detected and manipulated the naturally occurring √N statistical polarization in nuclear spin ensembles using magnetic resonance force microscopy. Using protocols previously developed for detecting single electron spins, we have measured signals from ensembles of nuclear spins in a volume of roughly (150 nm)³ with a sensitivity of roughly 2000 net spins in a 2.5 h averaging window. Three systems have been studied: ¹⁹F nuclei in CaF₂, and ¹H nuclei (protons) in both polymethylmethacrylate and collagen, a naturally occurring protein. By detecting the statistical polarization, we not only can work with relatively small ensembles, but we eliminate any need to wait a longitudinal relaxation time T₁ to polarize the spins. We have also made use of the fact that the statistical polarization, which can be considered a form of spin noise, has a finite correlation time. A method similar to one previously proposed by Carlson [Bull. Am. Phys. Soc. 44, 541 (1999)] has been used to suppress the effect of the statistical uncertainty and extract meaningful information from time-averaged measurements. By implementing this method, we have successfully made nutation and transverse spin relaxation time measurements in CaF₂ at low temperatures.
Anomaly detection in hyperspectral imagery: statistics vs. graph-based algorithms
NASA Astrophysics Data System (ADS)
Berkson, Emily E.; Messinger, David W.
2016-05-01
Anomaly detection (AD) algorithms are frequently applied to hyperspectral imagery, but different algorithms produce different outlier results depending on the image scene content and the assumed background model. This work provides the first comparison of anomaly score distributions between common statistics-based anomaly detection algorithms (RX and subspace-RX) and the graph-based Topological Anomaly Detector (TAD). Anomaly scores in statistical AD algorithms should theoretically approximate a chi-squared distribution; however, this is rarely the case with real hyperspectral imagery. The expected distribution of scores found with graph-based methods remains unclear. We also look for general trends in algorithm performance with varied scene content. Three separate scenes were extracted from the hyperspectral MegaScene image taken over downtown Rochester, NY with the VIS-NIR-SWIR ProSpecTIR instrument. In order of most to least cluttered, we study an urban, suburban, and rural scene. The three AD algorithms were applied to each scene, and the distributions of the most anomalous 5% of pixels were compared. We find that subspace-RX performs better than RX, because the data becomes more normal when the highest variance principal components are removed. We also see that compared to statistical detectors, anomalies detected by TAD are easier to separate from the background. Due to their different underlying assumptions, the statistical and graph-based algorithms highlighted different anomalies within the urban scene. These results will lead to a deeper understanding of these algorithms and their applicability across different types of imagery.
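The statistical detectors compared above score each pixel by its squared Mahalanobis distance from the background. Under a Gaussian background model that score follows a chi-squared distribution with as many degrees of freedom as there are bands. A minimal sketch of a global RX-style score is shown below, restricted to two bands so that the covariance inverse can be written in closed form; real hyperspectral data have many more bands, and the toy pixel values are hypothetical.

```python
def rx_scores(pixels):
    """Global RX-style anomaly scores (squared Mahalanobis distances), 2 bands."""
    n = len(pixels)
    mean_x = sum(p[0] for p in pixels) / n
    mean_y = sum(p[1] for p in pixels) / n
    dx = [p[0] - mean_x for p in pixels]
    dy = [p[1] - mean_y for p in pixels]
    # Population covariance of the whole scene, used as the background model.
    sxx = sum(u * u for u in dx) / n
    syy = sum(v * v for v in dy) / n
    sxy = sum(u * v for u, v in zip(dx, dy)) / n
    det = sxx * syy - sxy * sxy
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    # d^T S^{-1} d using the closed-form inverse of a 2x2 matrix.
    return [
        (syy * u * u - 2 * sxy * u * v + sxx * v * v) / det
        for u, v in zip(dx, dy)
    ]


# Toy scene: four background pixels and one spectrally distinct pixel.
scene = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10)]
scores = rx_scores(scene)
most_anomalous = max(range(len(scene)), key=scores.__getitem__)
print(most_anomalous, [round(s, 3) for s in scores])
```

Because the anomaly itself contributes to the scene covariance here, its score is pulled down. That contamination is one reason measured score distributions can depart from the chi-squared ideal.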
Nam, Se Jin; Kim, Eun-Kyung; Kim, Min Jung; Moon, Hee Jung; Yoon, Jung Hyun
2015-03-01
OBJECTIVE. The purpose of this article is to evaluate the clinical significance of subcentimeter enhancing lesions incidentally detected on preoperative breast MRI in patients with breast cancer and the role of second-look ultrasound in lesion detection and characterization. MATERIALS AND METHODS. From January 2010 through December 2010, 180 lesions measuring less than 10 mm incidentally detected on MRI in 108 women with second-look ultrasound examinations were included (mean patient age, 47.9 years; mean [± SD] lesion size, 5.56 ± 1.64 mm). Seventy-two (40.0%) lesions were smaller than 5 mm, and 108 (60.0%) were 5 mm or larger. Of the 180 lesions, 103 (57.2%) had been biopsied or excised by localization, and 77 (42.8%) with benign ultrasound features had been followed with ultrasound for at least 2 years. Clinical and imaging features were recorded for analysis. RESULTS. Of the 180 enhancing lesions detected on MRI, 14 (7.8%) were malignant and 166 (92.2%) were benign. The malignancy rate of lesions 5 mm or larger was higher than that for lesions smaller than 5 mm (10.2% vs 4.2%), without statistical significance (p = 0.344). The washout enhancement pattern was statistically significantly associated with malignancy (p = 0.032). Although malignant ultrasound features such as nonparallel orientation were more common in malignant lesions, most malignancies had benign features, including oval shape, parallel orientation, and circumscribed margins, with BI-RADS category 4a (n = 12; 85.8%) as the final assessment. CONCLUSION. Second-look ultrasound is a feasible method for evaluating MRI-detected subcentimeter sized lesions in preoperative assessment of patients with breast cancer. A lower threshold should be applied with consideration of MRI features in deciding whether to biopsy or excise these lesions.
Molecular and statistical approaches to the detection and correction of errors in genotype databases
Brzustowicz, L.M.; Xie, X.; Merette, C.; Townsend, L.; Gilliam, T.C.; Ott, J.
1993-11-01
Errors in genotyping data have been shown to have a significant effect on the estimation of recombination fractions in high-resolution genetic maps. Previous estimates of errors in existing databases have been limited to the analysis of relatively few markers and have suggested rates in the range 0.5%-1.5%. The present study capitalizes on the fact that within the Centre d'Etude du Polymorphisme Humain (CEPH) collection of reference families, 21 individuals are members of more than one family, with separate DNA samples provided by CEPH for each appearance of these individuals. By comparing the genotypes of these individuals in each of the families in which they occur, an estimated error rate of 1.4% was calculated for all loci in the version 4.0 CEPH database. Removing those individuals who were clearly identified by CEPH as appearing in more than one family resulted in a 3.0% error rate for the remaining samples, suggesting that some error checking of the identified repeated individuals may occur prior to data submission. An error rate of 3.0% for version 4.0 data was also obtained for four chromosome 5 markers that were retyped through the entire CEPH collection. The effects of these errors on a multipoint map were significant, with a total sex-averaged length of 36.09 cM with the errors, and 19.47 cM with the errors corrected. Several statistical approaches to detect and allow for errors during linkage analysis are presented. One method, which identified families containing possible errors on the basis of the impact on the maximum lod score, showed particular promise, especially when combined with the limited retyping of the identified families. The impact of the demonstrated error rate in an established genotype database on high-resolution mapping is significant, raising the question of the overall value of incorporating such existing data into new genetic maps. 15 refs., 8 tabs.
Willems, Sander; Fraiture, Marie-Alice; Deforce, Dieter; De Keersmaecker, Sigrid C J; De Loose, Marc; Ruttink, Tom; Herman, Philippe; Van Nieuwerburgh, Filip; Roosens, Nancy
2016-02-01
Because the number and diversity of genetically modified (GM) crops has significantly increased, their analysis based on real-time PCR (qPCR) methods is becoming increasingly complex and laborious. While several pioneers already investigated Next Generation Sequencing (NGS) as an alternative to qPCR, its practical use has not been assessed for routine analysis. In this study a statistical framework was developed to predict the number of NGS reads needed to detect transgene sequences, to prove their integration into the host genome and to identify the specific transgene event in a sample with known composition. This framework was validated by applying it to experimental data from food matrices composed of pure GM rice, processed GM rice (noodles) or a 10% GM/non-GM rice mixture, revealing some influential factors. Finally, feasibility of NGS for routine analysis of GM crops was investigated by applying the framework to samples commonly encountered in routine analysis of GM crops.
Efficient detection of wound-bed and peripheral skin with statistical colour models.
Veredas, Francisco J; Mesa, Héctor; Morente, Laura
2015-04-01
A pressure ulcer is a clinical pathology of localised damage to the skin and underlying tissue caused by pressure, shear or friction. Reliable diagnosis supported by precise wound evaluation is crucial in order to succeed in treatment decisions. This paper presents a computer-vision approach to wound-area detection based on statistical colour models. Starting with a training set consisting of 113 real wound images, colour histogram models are created for four different tissue types. Back-projections of colour pixels on those histogram models are used, from a Bayesian perspective, to estimate the posterior probability that a pixel belongs to any of those tissue classes. Performance measures obtained from contingency tables based on a gold standard of segmented images supplied by experts have been used for model selection. The resulting fitted model has been validated on a training set consisting of 322 wound images manually segmented and labelled by expert clinicians. The final fitted segmentation model shows robustness and gives high mean performance rates [AUC: .9426 (SD .0563); accuracy: .8777 (SD .0799); F-score: 0.7389 (SD .1550); Cohen's kappa: .6585 (SD .1787)] when segmenting significant wound areas that include healing tissues.
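The back-projection step described above can be sketched as a small histogram-based Bayes classifier. This is only an illustration under stated assumptions: the tissue labels (`wound_bed`, `skin`), the bin count and the training pixels are hypothetical, and the abstract does not say which colour space, binning or priors the authors actually used.

```python
from collections import Counter

def fit_histograms(labelled_pixels, bins=4):
    """Build one colour histogram per tissue label.

    labelled_pixels: iterable of ((r, g, b), label) with 0-255 channels.
    The coarse binning (bins per channel) is an assumption for illustration.
    """
    hist = {}
    for (r, g, b), label in labelled_pixels:
        key = (r * bins // 256, g * bins // 256, b * bins // 256)
        hist.setdefault(label, Counter())[key] += 1
    return hist

def posterior(pixel, hist, bins=4):
    """Posterior P(label | colour bin) via Bayes' rule with empirical priors."""
    key = tuple(c * bins // 256 for c in pixel)
    total = sum(sum(h.values()) for h in hist.values())
    scores = {}
    for label, h in hist.items():
        n = sum(h.values())
        likelihood = h[key] / n   # P(bin | label), the back-projected value
        prior = n / total         # P(label), taken from class frequencies
        scores[label] = likelihood * prior
    norm = sum(scores.values())
    if norm == 0:                 # colour bin never seen in training
        return {label: 0.0 for label in hist}
    return {label: s / norm for label, s in scores.items()}

# Hypothetical training pixels, not data from the paper.
training = [
    ((200, 30, 30), "wound_bed"), ((210, 40, 50), "wound_bed"),
    ((205, 35, 40), "wound_bed"), ((198, 25, 45), "wound_bed"),
    ((220, 180, 160), "skin"), ((225, 185, 165), "skin"),
]
model = fit_histograms(training)
print(posterior((205, 35, 35), model))
```

A real segmentation model would use far finer histograms, smoothing for unseen bins, and a decision rule over the per-pixel posteriors.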
Statistical detection of slow-mode waves in solar polar regions with SDO/AIA
Su, J. T.
2014-10-01
Observations from the Atmospheric Imaging Assembly (AIA) on board the Solar Dynamics Observatory are utilized to statistically investigate the propagating quasi-periodic oscillations in the solar polar plume and inter-plume regions. On average, the periods are found to be nearly equal in the three coronal channels of AIA 171 Å, 193 Å, and 211 Å, and the wavelengths increase with temperature from 171 Å, 193 Å, and 211 Å. The phase speeds may be inferred from the above parameters. Furthermore, the speed ratios of v_193/v_171 and v_211/v_171 are derived, e.g., 1.4 ± 0.8 and 2.0 ± 1.9 in the plume regions, respectively, which are equivalent to the theoretical ones for acoustic waves. We find that there are no significant differences for the detected parameters between the plume and inter-plume regions. To our knowledge, this is the first time that we have simultaneously obtained the phase speeds of slow-mode waves in the three channels in the open coronal magnetic structures due to the method adopted in the present work, which is able to minimize the influence of the jets or eruptions on wave signals.
NASA Astrophysics Data System (ADS)
Wilson, Mark; Mitra, Sunanda; Roberson, Glenn H.; Shieh, Yao-Yang
1997-10-01
Currently, early detection of breast cancer is primarily accomplished by mammography, and suspicious findings may lead to a decision to perform a biopsy. Digital enhancement and pattern recognition techniques may aid in early detection of some patterns such as microcalcification clusters indicating onset of DCIS (ductal carcinoma in situ), which accounts for 20% of all mammographically detected breast cancers and could be treated when detected early. These individual calcifications are hard to detect due to size and shape variability and inhomogeneous background texture. Our study addresses only early detection of microcalcifications, which allows the radiologist to interpret the x-ray findings in computer-aided enhanced form more easily than by evaluating the x-ray film directly. We present an algorithm which locates microcalcifications based on local grayscale variability of tissue structures and image statistics. Threshold filters with lower and upper bounds computed from the image statistics of the entire image and selected subimages were designed to enhance the entire image. This enhanced image was used as the initial image for identifying the microcalcifications based on the variable box threshold filters at different resolutions. The test images came from the Texas Tech University Health Sciences Center and the MIAS mammographic database, which are classified into various categories including microcalcifications. Classification of other types of abnormalities in mammograms based on their characteristic features is addressed in later studies.
Merkies, I S J; van Nes, S I; Hanna, K; Hughes, R A C; Deng, C
2010-11-01
The ICE trial demonstrated the efficacy of immune globulin intravenous (IGIV-C) over placebo in chronic inflammatory demyelinating polyradiculoneuropathy (CIDP). However, improving the interpretability of the results by analysing the minimum clinically important difference (MCID) had not been considered. To identify MCID thresholds of various outcome measures using different methods and to test treatment differences (IGIV-C vs placebo) using these thresholds. One anchor-based (Short Form-36 question 2) and three distribution-based (½ SD, 1 SE of measurement, and effect size) techniques were employed to identify MCID cut-offs for various impairments (electromyographic parameters, Medical Research Council (MRC) sum score, grip strength, inflammatory neuropathy cause and treatment (INCAT) sensory sum score), disability (INCAT scale score, Rotterdam handicap scale (RHS) score) and quality of life (SF-36). IGIV-C or placebo was administered every 3 weeks for up to 24 weeks to 117 CIDP patients. Patients who did not improve by ≥1 point on the INCAT scale received alternate treatment. The proportion of patients with results exceeding identified MCID thresholds was compared. Results MCID cut-offs for outcomes were determined using each method. For the INCAT disability scale (primary ICE-trial outcome), all MCID methods identified significantly more responders with IGIV-C than placebo. Significant differences favouring IGIV-C were also demonstrated for various nerve conduction parameters, MRC sum score, grip strength, RHS score and SF-36 physical component summary score. In addition to being statistically significant, all MCID analyses showed that CIDP improvements with IGIV-C are clinically meaningful. Consideration of MCID is recommended in future therapeutic trials. Trial Registration Number NCT00220740 (http://ClinicalTrials.gov).
Avalappampatty Sivasamy, Aneetha; Sundan, Bose
2015-01-01
The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better. PMID:26357668
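The T-square distance used in this abstract can be sketched for a two-feature traffic profile as T² = (x − μ)ᵀ S⁻¹ (x − μ), where μ and S are the mean and covariance of baseline traffic. The feature choice, the baseline values and the test observations below are hypothetical, and the threshold the authors derive from the central limit theorem is not reproduced here.

```python
from statistics import mean

def covariance_2d(rows):
    """Return the sample mean vector and 2x2 covariance matrix of (x, y) rows."""
    n = len(rows)
    mx = mean(r[0] for r in rows)
    my = mean(r[1] for r in rows)
    sxx = sum((r[0] - mx) ** 2 for r in rows) / (n - 1)
    syy = sum((r[1] - my) ** 2 for r in rows) / (n - 1)
    sxy = sum((r[0] - mx) * (r[1] - my) for r in rows) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))

def t_squared(obs, mu, cov):
    """T^2 = (x - mu)^T S^{-1} (x - mu) for a two-feature observation."""
    (a, b), (_, d) = cov
    det = a * d - b * b
    dx, dy = obs[0] - mu[0], obs[1] - mu[1]
    # The inverse of [[a, b], [b, d]] is (1 / det) * [[d, -b], [-b, a]].
    return (d * dx * dx - 2 * b * dx * dy + a * dy * dy) / det

# Hypothetical baseline: (connections per second, mean packet size in bytes).
baseline = [(10, 500), (12, 490), (11, 510), (9, 505), (13, 495), (11, 500)]
mu, cov = covariance_2d(baseline)

typical = t_squared((12, 490), mu, cov)
unusual = t_squared((40, 1400), mu, cov)
print(typical, unusual)
```

An observation would be flagged when its T² exceeds a chosen threshold; how that threshold is set determines the trade-off between detection rate and false alarms.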
Nicol, Samuel; Roach, Jennifer K.; Griffith, Brad
2013-01-01
Over the past 50 years, the number and size of high-latitude lakes have decreased throughout many regions; however, individual lake trends have been variable in direction and magnitude. This spatial heterogeneity in lake change makes statistical detection of temporal trends challenging, particularly in small analysis areas where weak trends are difficult to separate from inter- and intra-annual variability. Factors affecting trend detection include inherent variability, trend magnitude, and sample size. In this paper, we investigated how the statistical power to detect average linear trends in lake size of 0.5, 1.0 and 2.0 %/year was affected by the size of the analysis area and the number of years of monitoring in National Wildlife Refuges in Alaska. We estimated power for large (930–4,560 sq km) study areas within refuges and for 2.6, 12.9, and 25.9 sq km cells nested within study areas over temporal extents of 4–50 years. We found that: (1) trends in study areas could be detected within 5–15 years, (2) trends smaller than 2.0 %/year would take >50 years to detect in cells within study areas, and (3) there was substantial spatial variation in the time required to detect change among cells. Power was particularly low in the smallest cells which typically had the fewest lakes. Because small but ecologically meaningful trends may take decades to detect, early establishment of long-term monitoring will enhance power to detect change. Our results have broad applicability and our method is useful for any study involving change detection among variable spatial and temporal extents.
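The dependence of power on monitoring length can be illustrated with a small Monte Carlo sketch. It is not the authors' model: it assumes a single series starting at 100 units, a linear trend expressed in percent of that starting value per year, independent Gaussian noise with a hypothetical standard deviation, and a critical value calibrated by simulating trend-free series.

```python
import random

def slope_t(y):
    """t-statistic of the ordinary least-squares slope of y against 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    b = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y)) / sxx
    a = y_mean - b * t_mean
    rss = sum((v - (a + b * t)) ** 2 for t, v in enumerate(y))
    se = (rss / (n - 2) / sxx) ** 0.5
    return b / se

def simulate(n_years, trend_pct, noise_sd, rng):
    """One simulated series: linear trend plus independent Gaussian noise."""
    return [100.0 * (1 + trend_pct / 100 * t) + rng.gauss(0, noise_sd)
            for t in range(n_years)]

def power(n_years, trend_pct, noise_sd=10.0, n_sim=2000, alpha=0.05, seed=1):
    """Fraction of simulated trending series whose |t| exceeds the null cut-off."""
    rng = random.Random(seed)
    # Calibrate the two-sided critical value from trend-free simulations.
    null = sorted(abs(slope_t(simulate(n_years, 0.0, noise_sd, rng)))
                  for _ in range(n_sim))
    crit = null[int((1 - alpha) * n_sim)]
    hits = sum(abs(slope_t(simulate(n_years, trend_pct, noise_sd, rng))) > crit
               for _ in range(n_sim))
    return hits / n_sim

for years in (5, 15, 30):
    print(years, power(years, trend_pct=1.0))
```

Under these assumptions power rises steeply with the number of monitoring years, which is the qualitative pattern the abstract reports; the actual figures depend entirely on the assumed noise level.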
NASA Astrophysics Data System (ADS)
Hoell, Simon; Omenzetter, Piotr
2017-07-01
Considering jointly damage sensitive features (DSFs) of signals recorded by multiple sensors, applying advanced transformations to these DSFs and assessing systematically their contribution to damage detectability and localisation can significantly enhance the performance of structural health monitoring systems. This philosophy is explored here for partial autocorrelation coefficients (PACCs) of acceleration responses. They are interrogated with the help of the linear discriminant analysis based on the Fukunaga-Koontz transformation using datasets of the healthy and selected reference damage states. Then, a simple but efficient fast forward selection procedure is applied to rank the DSF components with respect to statistical distance measures specialised for either damage detection or localisation. For the damage detection task, the optimal feature subsets are identified based on the statistical hypothesis testing. For damage localisation, a hierarchical neuro-fuzzy tool is developed that uses the DSF ranking to establish its own optimal architecture. The proposed approaches are evaluated experimentally on data from non-destructively simulated damage in a laboratory scale wind turbine blade. The results support our claim of being able to enhance damage detectability and localisation performance by transforming and optimally selecting DSFs. It is demonstrated that the optimally selected PACCs from multiple sensors or their Fukunaga-Koontz transformed versions can not only improve the detectability of damage via statistical hypothesis testing but also increase the accuracy of damage localisation when used as inputs into a hierarchical neuro-fuzzy network. Furthermore, the computational effort of employing these advanced soft computing models for damage localisation can be significantly reduced by using transformed DSFs.
NASA Astrophysics Data System (ADS)
Govindan, R. B.; Al-Shargabi, Tareq; Andescavage, Nickie N.; Metzler, Marina; Lenin, R. B.; Plessis, Adré du
2017-01-01
Phase differences of two signals in perfect synchrony exhibit a narrow band distribution, whereas the phase differences of two asynchronous signals exhibit uniform distribution. We assess the statistical significance of the phase synchronization between two signals by using a signed rank test to compare the distribution of their phase differences to the theoretically expected uniform distribution for two asynchronous signals. Using numerical simulation of a second order autoregressive (AR2) process, we show that the proposed approach correctly identifies the coupling between the AR2 process and the driving white noise. We also identify the optimal p-value that distinguishes coupled scenarios from uncoupled ones. To identify the limiting cases, we study the phase synchronization between two independent white noises as a function of bandwidth of the filter in a different second simulation. We identify the frequency bandwidth below which the proposed approach fails and suggest using a data-driven approach for those scenarios. Finally, we demonstrate the application of this approach to study the coupling between beat-to-beat cardiac intervals and continuous blood pressure obtained from critically-ill infants to characterize the baroreflex function.
Bornmann, Lutz; Leydesdorff, Loet
2013-01-01
Using the InCites tool of Thomson Reuters, this study compares normalized citation impact values calculated for China, Japan, France, Germany, United States, and the UK throughout the time period from 1981 to 2010. InCites offers a unique opportunity to study the normalized citation impacts of countries using (i) a long publication window (1981 to 2010), (ii) a differentiation in (broad or more narrow) subject areas, and (iii) allowing for the use of statistical procedures in order to obtain an insightful investigation of national citation trends across the years. Using four broad categories, our results show significantly increasing trends in citation impact values for France, the UK, and especially Germany across the last thirty years in all areas. The citation impact of papers from China is still at a relatively low level (mostly below the world average), but the country follows an increasing trend line. The USA exhibits a stable pattern of high citation impact values across the years. With small impact differences between the publication years, the US trend is increasing in engineering and technology but decreasing in medical and health sciences as well as in agricultural sciences. Similar to the USA, Japan follows increasing as well as decreasing trends in different subject areas, but the variability across the years is small. In most of the years, papers from Japan perform below or approximately at the world average in each subject area. PMID:23418600
NASA Astrophysics Data System (ADS)
He, Yu-Hao; Chao-Lin, Lü; Zhang, Wei-Jun; Zhang, Lu; Wu, Jun-Jie; Chen, Si-Jing; You, Li-Xing; Wang, Zhen
2015-06-01
A new method to study the transient detection efficiency (DE) and pulse amplitude of superconducting nanowire single photon detectors (SNSPD) during the current recovery process is proposed — statistically analyzing the single photon response under photon illumination with a high repetition rate. The transient DE results match well with the DEs deduced from the static current dependence of DE combined with the waveform of a single-photon detection event. This proves that static measurement results can be used to analyze the transient current recovery process after a detection event. The results are relevant for understanding the current recovery process of SNSPDs after a detection event and for determining the counting rate of SNSPDs. Project supported by the Strategic Priority Research Program (B) of the Chinese Academy of Sciences (Grant No. XDB04010200), the National Basic Research Program of China (Grant No. 2011CBA00202), and the National Natural Science Foundation of China (Grant No. 61401441).
Statistics provide guidance for indigenous organic carbon detection on Mars missions.
Sephton, Mark A; Carter, Jonathan N
2014-08-01
Data from the Viking and Mars Science Laboratory missions indicate the presence of organic compounds that are not definitively martian in origin. Both contamination and confounding mineralogies have been suggested as alternatives to indigenous organic carbon. Intuitive thought suggests that we are repeatedly obtaining data that confirms the same level of uncertainty. Bayesian statistics may suggest otherwise. If an organic detection method has a true positive to false positive ratio greater than one, then repeated organic matter detection progressively increases the probability of indigeneity. Bayesian statistics also reveal that methods with higher ratios of true positives to false positives give higher overall probabilities and that detection of organic matter in a sample with a higher prior probability of indigenous organic carbon produces greater confidence. Bayesian statistics, therefore, provide guidance for the planning and operation of organic carbon detection activities on Mars. Suggestions for future organic carbon detection missions and instruments are as follows: (i) On Earth, instruments should be tested with analog samples of known organic content to determine their true positive to false positive ratios. (ii) On the mission, for an instrument with a true positive to false positive ratio above one, it should be recognized that each positive detection of organic carbon will result in a progressive increase in the probability of indigenous organic carbon being present; repeated measurements, therefore, can overcome some of the deficiencies of a less-than-definitive test. (iii) For a fixed number of analyses, the highest true positive to false positive ratio method or instrument will provide the greatest probability that indigenous organic carbon is present. (iv) On Mars, analyses should concentrate on samples with highest prior probability of indigenous organic carbon; intuitive desires to contrast samples of high prior probability and low prior
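The argument about repeated detections can be illustrated with the odds form of Bayes' theorem: each positive detection multiplies the prior odds by the ratio of the true positive rate to the false positive rate. The sketch below assumes that detections are conditionally independent, and the prior and the two rates are hypothetical values chosen only for illustration.

```python
def posterior_after_detections(prior, tp_rate, fp_rate, n_detections):
    """Probability of indigenous organic carbon after n positive detections.

    Assumes conditionally independent detections, so the likelihood ratio
    tp_rate / fp_rate is applied once per detection.
    """
    odds = prior / (1 - prior)
    odds *= (tp_rate / fp_rate) ** n_detections
    return odds / (1 + odds)

# Hypothetical instrument with a true positive to false positive ratio of 2.
for n in range(6):
    print(n, round(posterior_after_detections(0.1, 0.6, 0.3, n), 4))
```

With a ratio above one the probability climbs with every detection, and with a ratio below one it falls, which is why the ratio has to be established on analog samples before the mission.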
ERIC Educational Resources Information Center
Harrison, Judith; Thompson, Bruce; Vannest, Kimberly J.
2009-01-01
This article reviews the literature on interventions targeting the academic performance of students with attention-deficit/hyperactivity disorder (ADHD) and does so within the context of the statistical significance testing controversy. Both the arguments for and against null hypothesis statistical significance tests are reviewed. Recent standards…
Recommended methods for statistical analysis of data containing less-than-detectable measurements
Atwood, C.L.; Blackwood, L.G.; Harris, G.A.; Loehr, C.A.
1990-09-01
This report is a manual for statistical workers dealing with environmental measurements, when some of the measurements are not given exactly but are only reported as less than detectable. For some statistical settings with such data, many methods have been proposed in the literature, while for others few or none have been proposed. This report gives a recommended method in each of the settings considered. The body of the report gives a brief description of each recommended method. Appendix A gives example programs using the statistical package SAS, for those methods that involve nonstandard methods. Appendix B presents the methods that were compared and the reasons for selecting each recommended method, and explains any fine points that might be of interest. This is an interim version. Future revisions will complete the recommendations. 34 refs., 2 figs., 11 tabs.
Recommended methods for statistical analysis of data containing less-than-detectable measurements
Atwood, C.L.; Blackwood, L.G.; Harris, G.A.; Loehr, C.A.
1991-09-01
This report is a manual for statistical workers dealing with environmental measurements, when some of the measurements are not given exactly but are only reported as less than detectable. For some statistical settings with such data, many methods have been proposed in the literature, while for others few or none have been proposed. This report gives a recommended method in each of the settings considered. The body of the report gives a brief description of each recommended method. Appendix A gives example programs using the statistical package SAS, for those methods that involve nonstandard methods. Appendix B presents the methods that were compared and the reasons for selecting each recommended method, and explains any fine points that might be of interest. 7 refs., 4 figs.
A statistical approach of fatigue crack detection for a structural hotspot
NASA Astrophysics Data System (ADS)
Jin, Pei; Zhou, Li
2012-04-01
This work focuses on an unsupervised, data-driven statistical approach to detect and monitor fatigue crack growth in lug joint samples using surface-mounted piezoelectric sensors. Early and faithful detection of fatigue cracks in a lug joint can guide preventive measures, thus avoiding possible fatal structural failure. The on-line damage state at any given fatigue cycle is estimated using a damage index approach, as the dynamical properties of a structure change with the initiation of a new crack or the growth of an existing crack. Using the measurements performed on an intact lug joint as baseline, damage indices are evaluated from the frequency response of the lug joint with an unknown damage state. As the damage indices are evaluated, a Bayesian analysis is performed and a statistical metric is evaluated to identify the damage state (e.g., crack length).
Signal waveform detection with statistical automaton for internet and web service streaming.
Tseng, Kuo-Kun; Ji, Yuzhu; Liu, Yiming; Huang, Nai-Lun; Zeng, Fufu; Lin, Fang-Ying
2014-01-01
In recent years, many approaches have been suggested for Internet and web streaming detection. In this paper, we propose an approach to signal waveform detection for Internet and web streaming, with novel statistical automata. The system records network connections over a period of time to form a signal waveform and computes suspicious characteristics of the waveform. Network streams can then be classified according to these selected waveform features by our newly designed Aho-Corasick (AC) automata. We developed two versions, that is, basic AC and advanced AC-histogram waveform automata, and conducted comprehensive experimentation. The results confirm that our approach is feasible and suitable for deployment. PMID:25032231
Early statistical detection of anthrax outbreaks by tracking over-the-counter medication sales.
Goldenberg, Anna; Shmueli, Galit; Caruana, Richard A; Fienberg, Stephen E
2002-04-16
The recent series of anthrax attacks has reinforced the importance of biosurveillance systems for the timely detection of epidemics. This paper describes a statistical framework for monitoring grocery data to detect a large-scale but localized bioterrorism attack. Our system illustrates the potential of data sources that may be more timely than traditional medical and public health data. The system includes several layers, each customized to grocery data and tuned to finding footprints of an epidemic. We also propose an evaluation methodology that is suitable in the absence of data on large-scale bioterrorist attacks and disease outbreaks.
Hu, Juju; Hu, Haijiang; Ji, Yinghua
2010-03-15
Periodic nonlinearity, which ranges from a few nanometers to tens of nanometers in heterodyne interferometers, limits their use in high-accuracy measurement. A novel method is studied to detect the nonlinearity errors based on electrical subdivision and statistical signal analysis in a heterodyne Michelson interferometer. With the micropositioning platform moving at uniform velocity, the method can detect the nonlinearity errors by using regression analysis and jackknife estimation. Based on the analysis of the simulations, the method can estimate the influence of nonlinearity errors and other noise on dimensional measurement in a heterodyne Michelson interferometer.
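As a rough illustration of combining regression with jackknife estimation, the sketch below fits a straight line to displacement readings taken at uniform velocity and uses leave-one-out refits to obtain a bias-corrected slope and its standard error. How the authors apply the jackknife is not stated in the abstract; the function names and the simulated readings, including the 5 nm periodic error, are hypothetical.

```python
import math

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

def jackknife_slope(xs, ys):
    """Bias-corrected slope and jackknife standard error (leave-one-out)."""
    n = len(xs)
    full = ols_slope(xs, ys)
    loo = [ols_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) for i in range(n)]
    mean_loo = sum(loo) / n
    corrected = n * full - (n - 1) * mean_loo
    se = ((n - 1) / n * sum((b - mean_loo) ** 2 for b in loo)) ** 0.5
    return corrected, se

# Hypothetical readings: 100 nm per time step plus a 5 nm periodic error.
times = list(range(40))
readings = [100.0 * t + 5.0 * math.sin(2 * math.pi * t / 8) for t in times]
velocity, velocity_se = jackknife_slope(times, readings)
print(velocity, velocity_se)
```

The residuals of the readings about the fitted line then carry the periodic component, which is the quantity of interest when estimating the nonlinearity error.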
Banks-Leite, Cristina; Pardini, Renata; Boscolo, Danilo; Cassano, Camila Righetto; Püttker, Thomas; Barros, Camila Santos; Barlow, Jos
2014-01-01
1. In recent years, there has been a fast development of models that adjust for imperfect detection. These models have revolutionized the analysis of field data, and their use has repeatedly demonstrated the importance of sampling design and data quality. There are, however, several practical limitations associated with the use of detectability models which restrict their relevance to tropical conservation science. 2. We outline the main advantages of detectability models, before examining their limitations associated with their applicability to the analysis of tropical communities, rare species and large-scale data sets. Finally, we discuss whether detection probability needs to be controlled before and/or after data collection. 3. Models that adjust for imperfect detection allow ecologists to assess data quality by estimating uncertainty and to obtain adjusted ecological estimates of populations and communities. Importantly, these models have allowed informed decisions to be made about the conservation and management of target species. 4. Data requirements for obtaining unadjusted estimates are substantially lower than for detectability-adjusted estimates, which require relatively high detection/recapture probabilities and a number of repeated surveys at each location. These requirements can be difficult to meet in large-scale environmental studies where high levels of spatial replication are needed, or in the tropics where communities are composed of many naturally rare species. However, while imperfect detection can only be adjusted statistically, covariates of detection probability can also be controlled through study design. Using three study cases where we controlled for covariates of detection probability through sampling design, we show that the variation in unadjusted ecological estimates from nearly 100 species was qualitatively the same as that obtained from adjusted estimates. Finally, we discuss that the decision as to whether one should control for
Malm, Christer B; Khoo, Nelson S; Granlund, Irene; Lindstedt, Emilia; Hult, Andreas
2016-01-01
The discovery of erythropoietin (EPO) simplified blood doping in sports, but improved detection methods for EPO have forced cheating athletes to return to blood transfusion. Autologous blood transfusion with cryopreserved red blood cells (RBCs) is the method of choice, because no valid method exists to accurately detect such an event. In endurance sports, it can be estimated that elite athletes improve performance by up to 3% with blood doping, regardless of method. Valid detection methods for autologous blood doping are important to maintain credibility of athletic performances. Recreational male (N = 27) and female (N = 11) athletes served as Transfusion (N = 28) and Control (N = 10) subjects in two different transfusion settings. Hematological variables and physical performance were measured before donation of 450 or 900 mL whole blood, and until four weeks after re-infusion of the cryopreserved RBC fraction. Blood was analyzed for transferrin, iron, Hb, EVF, MCV, MCHC, reticulocytes, leucocytes and EPO. Repeated measures multivariate analysis of variance (MANOVA) and pattern recognition using Principal Component Analysis (PCA) and Orthogonal Projections to Latent Structures (OPLS) discriminant analysis (DA) investigated differences between Control and Transfusion groups over time. A significant increase in performance (15 ± 8%) and VO2max (17 ± 10%) (mean ± SD) could be measured 48 h after RBC re-infusion, and remained increased for up to four weeks in some subjects. In total, 533 blood samples were included in the study (Clean = 220, Transfused = 313). In response to blood transfusion, the largest change in hematological variables occurred 48 h after blood donation, when Control and Transfused groups could be separated with OPLS-DA (R2 = 0.76/Q2 = 0.59). RBC re-infusion resulted in the best model (R2 = 0.40/Q2 = 0.10) at the first sampling point (48 h), predicting one false positive and one false negative. Over all, a 25% and 86% false positives ratio was
[RQ-PCR detection of GST-π and LRP genes in adult acute leukemia and its clinical significance].
Wang, Jing; Xiao, Zhen
2012-02-01
This study aimed to detect the glutathione S-transferase-π (GST-π) and lung resistance-related protein (LRP) genes and to investigate their relationship with multidrug resistance (MDR) in patients with acute leukemia (AL). Real-time fluorescent quantitative reverse transcription polymerase chain reaction (RQ-PCR) was used to detect the expression of GST-π and LRP genes in peripheral blood mononuclear cells from 44 AL patients and 27 normal subjects. The results showed a significant difference in GST-π expression level between newly diagnosed patients and complete remission patients and between refractory patients and complete remission patients (P < 0.01), while the expression level of the LRP gene differed significantly (P ≤ 0.01) between newly diagnosed patients and refractory patients and between complete remission patients and refractory patients. Statistical analysis indicated that there was no correlation between the GST-π gene and the LRP gene. The expression of GST-π and LRP genes was not significantly different among white blood cell (WBC) count groups or clinical typing groups (ALL and ANLL). It is concluded that the mechanisms of MDR resulting from GST-π and LRP genes are different, so combined detection of GST-π and LRP genes plays a larger role in evaluating the prognosis of AL patients than detection of the GST-π or LRP gene alone. The WBC count and leukemia typing have no relationship with expression of GST-π and LRP genes.
A statistical model of the photomultiplier gain process with applications to optical pulse detection
NASA Technical Reports Server (NTRS)
Tan, H. H.
1982-01-01
A Markov diffusion model was used to determine an approximate probability density for the random gain. This approximate density preserves the correct second-order statistics and appears to be in reasonably good agreement with experimental data. The receiver operating curve for a pulse counter detector of PMT cathode emission events was analyzed using this density. The error performance of a simple binary direct detection optical communication system was also derived. Previously announced in STAR as N82-25100
Fernández-Llamazares, Alvaro; Belmonte, Jordina; Delgado, Rosario; De Linares, Concepción
2014-04-01
Airborne pollen records are a suitable indicator for the study of climate change. The present work focuses on the role of annual pollen indices for the detection of bioclimatic trends through the analysis of the aerobiological spectra of 11 taxa of great biogeographical relevance in Catalonia over an 18-year period (1994-2011), by means of different parametric and non-parametric statistical methods. Among others, two non-parametric rank-based statistical tests were performed for detecting monotonic trends in time series data of the selected airborne pollen types, and we observed that they have similar power in detecting trends. Except for those cases in which the pollen data can be well modeled by a normal distribution, it is better to apply non-parametric statistical methods to aerobiological studies. Our results provide a reliable representation of the pollen trends in the region and suggest that greater pollen quantities have been released into the atmosphere in recent years, especially by Mediterranean taxa such as Pinus, Total Quercus and Evergreen Quercus, although the trends may differ geographically. Longer aerobiological monitoring periods are required to corroborate these results and survey the increasing levels of certain pollen types that could exert an impact in terms of public health.
Effects of measurement statistics on the detection of damage in the Alamosa Canyon Bridge
Doebling, S.W.; Farrar, C.R.; Goodman, R.S.
1996-12-31
This paper presents a comparison of the statistics on the measured modal parameters of a bridge structure to the expected changes in those parameters caused by damage. It is then determined whether the changes resulting from damage are statistically significant. This paper considers the most commonly used modal parameters for indication of damage: modal frequency, mode shape, and mode shape curvature. The approach is divided into two steps. First, the relative uncertainties (arising from random error sources) of the measured modal frequencies, mode shapes, and mode shape curvatures are determined by Monte Carlo analysis of the measured data. Based on these uncertainties, 95% statistical confidence bounds are computed for these parameters. The second step is the determination of the measured change in these parameters resulting from structural damage. Changes that fall outside the 95% bounds are considered to be statistically significant. It is proposed that this statistical significance can be used to selectively filter which modes are used for damage identification. The primary conclusion of the paper is that the selection of the appropriate parameters to use in the damage identification algorithm must take into account not only the sensitivity of the damage indicator to the structural deterioration, but also the uncertainty inherent in the measurement of the parameters used to compute the indicator.
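The two-step logic in this abstract (Monte Carlo bounds on a measured parameter, then a check of whether an observed change falls outside them) can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not describe the authors' resampling scheme, so the additive Gaussian noise model and the names `monte_carlo_bounds`, `is_significant`, and `noise_sd` are hypothetical.

```python
import random
from statistics import mean


def monte_carlo_bounds(estimator, measurements, noise_sd,
                       n_trials=2000, level=0.95, seed=0):
    """Percentile bounds on estimator(measurements) under simulated noise.

    Assumption (not from the abstract): random error is modelled as
    independent zero-mean Gaussian noise with standard deviation noise_sd.
    """
    rng = random.Random(seed)
    # Re-evaluate the estimator on many noise-perturbed copies of the data.
    draws = sorted(
        estimator([m + rng.gauss(0.0, noise_sd) for m in measurements])
        for _ in range(n_trials)
    )
    tail = (1.0 - level) / 2.0
    lower = draws[int(tail * (n_trials - 1))]
    upper = draws[int((1.0 - tail) * (n_trials - 1))]
    return lower, upper


def is_significant(value, bounds):
    """A value outside the confidence bounds is flagged as significant."""
    lower, upper = bounds
    return value < lower or value > upper


# Hypothetical example: twenty repeated readings of one modal frequency (Hz).
baseline = [10.0] * 20
bounds = monte_carlo_bounds(mean, baseline, noise_sd=0.1)
```

With such bounds in hand, a post-damage estimate can be passed to `is_significant` to decide whether that mode is worth feeding into a damage identification step.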
Denton, Debra L; Diamond, Jerry; Zheng, Lei
2011-05-01
The U.S. Environmental Protection Agency (U.S. EPA) and state agencies implement the Clean Water Act, in part, by evaluating the toxicity of effluent and surface water samples. A common goal for both regulatory authorities and permittees is confidence in an individual test result (e.g., no-observed-effect concentration [NOEC], pass/fail, 25% effective concentration [EC25]), which is used to make regulatory decisions, such as reasonable potential determinations, permit compliance, and watershed assessments. This paper discusses an additional statistical approach (test of significant toxicity [TST]), based on bioequivalence hypothesis testing, or, more appropriately, test of noninferiority, which examines whether there is a nontoxic effect at a single concentration of concern compared with a control. Unlike the traditional hypothesis testing approach in whole effluent toxicity (WET) testing, TST is designed to incorporate explicitly both α and β error rates at levels of toxicity that are unacceptable and acceptable, given routine laboratory test performance for a given test method. Regulatory management decisions are used to identify unacceptable toxicity levels for acute and chronic tests, and the null hypothesis is constructed such that test power is associated with the ability to declare correctly a truly nontoxic sample as acceptable. This approach provides a positive incentive to generate high-quality WET data to make informed decisions regarding regulatory decisions. This paper illustrates how α and β error rates were established for specific test method designs and tests the TST approach using both simulation analyses and actual WET data. In general, those WET test endpoints having higher routine (e.g., 50th percentile) within-test control variation, on average, have higher method-specific α values (type I error rate), to maintain a desired type II error rate. This paper delineates the technical underpinnings of this approach and demonstrates the benefits
NASA Technical Reports Server (NTRS)
Friedlander, Alan L.; Harry, David P., III
1960-01-01
An exploratory analysis of vehicle guidance during the approach to a target planet is presented. The objective of the guidance maneuver is to guide the vehicle to a specific perigee distance with a high degree of accuracy and minimum corrective velocity expenditure. The guidance maneuver is simulated by considering the random sampling of real measurements with significant error and reducing this information to prescribe appropriate corrective action. The instrumentation system assumed includes optical and/or infrared devices to indicate range and a reference angle in the trajectory plane. Statistical results are obtained by Monte-Carlo techniques and are shown as the expectation of guidance accuracy and velocity-increment requirements. Results are nondimensional and applicable to any planet within limits of two-body assumptions. The problem of determining how many corrections to make and when to make them is a consequence of the conflicting requirements of accurate trajectory determination and propulsion. Optimum values were found for a vehicle approaching a planet along a parabolic trajectory with an initial perigee distance of 5 radii and a target perigee of 1.02 radii. In this example measurement errors were less than 1 minute of arc. Results indicate that four corrections applied in the vicinity of 50, 16, 15, and 1.5 radii, respectively, yield minimum velocity-increment requirements. Thrust devices capable of producing a large variation of velocity-increment size are required. For a vehicle approaching the earth, miss distances within 32 miles are obtained with 90-percent probability. Total velocity increments used in guidance are less than 3300 feet per second with 90-percent probability. It is noted that the above representative results are valid only for the particular guidance scheme hypothesized in this analysis. A parametric study is presented which indicates the effects of measurement error size, initial perigee, and initial energy on the guidance
Statistical issues and challenges associated with rapid detection of bio-terrorist attacks.
Fienberg, Stephen E; Shmueli, Galit
2005-02-28
The traditional focus for detecting outbreaks of an epidemic or bio-terrorist attack has been on the collection and analysis of medical and public health data. Although such data are the most direct indicators of symptoms, they tend to be collected, delivered, and analysed days, weeks, and even months after the outbreak. By the time this information reaches decision makers it is often too late to treat the infected population or to react in some other way. In this paper, we explore different sources of data, traditional and non-traditional, that can be used for detecting a bio-terrorist attack in a timely manner. We set our discussion in the context of state-of-the-art syndromic surveillance systems and we focus on statistical issues and challenges associated with non-traditional data sources and the timely integration of multiple data sources for detection purposes. Copyright 2005 John Wiley & Sons, Ltd.
NASA Astrophysics Data System (ADS)
Kim, Hyeonsu; Seo, Jongpil; Ahn, Jongmin; Chung, Jaehak
2017-07-01
We propose a mitigation scheme for snapping shrimp noise when it corrupts an orthogonal frequency division multiplexing (OFDM) signal in underwater acoustic communication systems. The OFDM signal distorted by the snapping shrimp noise is filtered by a band-stop filter. The snapping shrimp noises in the filtered signal are detected by a detector with a constant false alarm rate whose threshold is derived theoretically from the statistics of the background noise. The detected signals are reconstructed by a simple reconstruction method. The proposed scheme has a higher detection capability and a lower mean square error of the channel estimation for simulated data and a lower bit error rate for practical ocean OFDM data collected in northern East China Sea than the conventional noise-mitigating methods.
NASA Astrophysics Data System (ADS)
Zeng, Bobo; Wang, Guijin; Ruan, Zhiwei; Lin, Xinggang; Meng, Long
2012-07-01
High-performance pedestrian detection with good accuracy and fast speed is an important yet challenging task in computer vision. We design a novel feature named pair normalized channel feature (PNCF), which simultaneously combines and normalizes two channel features in image channels, achieving a highly discriminative power and computational efficiency. PNCF applies to both gradient channels and color channels so that shape and appearance information are described and integrated in the same feature. To efficiently explore the formidably large PNCF feature space, we propose a statistics-based feature learning method to select a small number of potentially discriminative candidate features, which are fed into the boosting algorithm. In addition, channel compression and a hybrid pyramid are employed to speed up the multiscale detection. Experiments illustrate the effectiveness of PNCF and its learning method. Our proposed detector outperforms the state-of-the-art on several benchmark datasets in both detection accuracy and efficiency.
Dipnall, Joanna F; Pasco, Julie A; Berk, Michael; Williams, Lana J; Dodd, Seetal; Jacka, Felice N; Meyer, Denny
2016-01-01
Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009-2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and complex survey sampling methodology and
Statistical method for detecting phase shifts in alpha rhythm from human electroencephalogram data
NASA Astrophysics Data System (ADS)
Naruse, Yasushi; Takiyama, Ken; Okada, Masato; Umehara, Hiroaki
2013-04-01
We developed a statistical method for detecting discontinuous phase changes (phase shifts) in fluctuating alpha rhythms in the human brain from electroencephalogram (EEG) data obtained in a single trial. This method uses the state space models and the line process technique, which is a Bayesian method for detecting discontinuity in an image. By applying this method to simulated data, we were able to detect the phase and amplitude shifts in a single simulated trial. Further, we demonstrated that this method can detect phase shifts caused by a visual stimulus in the alpha rhythm from experimental EEG data even in a single trial. The results for the experimental data showed that the timings of the phase shifts in the early latency period were similar between many of the trials, and that those in the late latency period were different between the trials. The conventional averaging method can only detect phase shifts that occur at similar timings between many of the trials, and therefore, the phase shifts that occur at differing timings cannot be detected using the conventional method. Consequently, our obtained results indicate the practicality of our method. Thus, we believe that our method will contribute to studies examining the phase dynamics of nonlinear alpha rhythm oscillators.
Chiang, Michael F.; Melia, Michele; Buffenn, Angela N.; Lambert, Scott R.; Recchia, Franco M.; Simpson, Jennifer L.; Yang, Michael B.
2013-01-01
Objective To evaluate the accuracy of detecting clinically significant retinopathy of prematurity (ROP) using wide-angle digital retinal photography. Methods Literature searches of PubMed and the Cochrane Library databases were conducted last on December 7, 2010, and yielded 414 unique citations. The authors assessed these 414 citations and marked 82 that potentially met the inclusion criteria. These 82 studies were reviewed in full text; 28 studies met inclusion criteria. The authors extracted from these studies information about study design, interventions, outcomes, and study quality. After data abstraction, 18 were excluded for study deficiencies or because they were superseded by a more recent publication. The methodologist reviewed the remaining 10 studies and assigned ratings of evidence quality; 7 studies were rated level I evidence and 3 studies were rated level III evidence. Results There is level I evidence from ≥5 studies demonstrating that digital retinal photography has high accuracy for detection of clinically significant ROP. Level III studies have reported high accuracy, without any detectable complications, from real-world operational programs intended to detect clinically significant ROP through remote site interpretation of wide-angle retinal photographs. Conclusions Wide-angle digital retinal photography has the potential to complement standard ROP care. It may provide advantages through objective documentation of clinical examination findings, improved recognition of disease progression by comparing previous photographs, and the creation of image libraries for education and research. Financial Disclosure(s) Proprietary or commercial disclosure may be found after the references. PMID:22541632
FDM: a graph-based statistical method to detect differential transcription using RNA-seq data.
Singh, Darshan; Orellana, Christian F; Hu, Yin; Jones, Corbin D; Liu, Yufeng; Chiang, Derek Y; Liu, Jinze; Prins, Jan F
2011-10-01
In eukaryotic cells, alternative splicing expands the diversity of RNA transcripts and plays an important role in tissue-specific differentiation, and can be misregulated in disease. To understand these processes, there is a great need for methods to detect differential transcription between samples. Our focus is on samples observed using short-read RNA sequencing (RNA-seq). We characterize differential transcription between two samples as the difference in the relative abundance of the transcript isoforms present in the samples. The magnitude of differential transcription of a gene between two samples can be measured by the square root of the Jensen Shannon Divergence (JSD*) between the gene's transcript abundance vectors in each sample. We define a weighted splice-graph representation of RNA-seq data, summarizing in compact form the alignment of RNA-seq reads to a reference genome. The flow difference metric (FDM) identifies regions of differential RNA transcript expression between pairs of splice graphs, without need for an underlying gene model or catalog of transcripts. We present a novel non-parametric statistical test between splice graphs to assess the significance of differential transcription, and extend it to group-wise comparison incorporating sample replicates. Using simulated RNA-seq data consisting of four technical replicates of two samples with varying transcription between genes, we show that (i) the FDM is highly correlated with JSD* (r=0.82) when average RNA-seq coverage of the transcripts is sufficiently deep; and (ii) the FDM is able to identify 90% of genes with differential transcription when JSD* >0.28 and coverage >7. This represents higher sensitivity than Cufflinks (without annotations) and rDiff (MMD), which respectively identified 69 and 49% of the genes in this region as differential transcribed. Using annotations identifying the transcripts, Cufflinks was able to identify 86% of the genes in this region as differentially transcribed
FDM: a graph-based statistical method to detect differential transcription using RNA-seq data
Singh, Darshan; Orellana, Christian F.; Hu, Yin; Jones, Corbin D.; Liu, Yufeng; Chiang, Derek Y.; Liu, Jinze; Prins, Jan F.
2011-01-01
Motivation: In eukaryotic cells, alternative splicing expands the diversity of RNA transcripts and plays an important role in tissue-specific differentiation, and can be misregulated in disease. To understand these processes, there is a great need for methods to detect differential transcription between samples. Our focus is on samples observed using short-read RNA sequencing (RNA-seq). Methods: We characterize differential transcription between two samples as the difference in the relative abundance of the transcript isoforms present in the samples. The magnitude of differential transcription of a gene between two samples can be measured by the square root of the Jensen Shannon Divergence (JSD*) between the gene's transcript abundance vectors in each sample. We define a weighted splice-graph representation of RNA-seq data, summarizing in compact form the alignment of RNA-seq reads to a reference genome. The flow difference metric (FDM) identifies regions of differential RNA transcript expression between pairs of splice graphs, without need for an underlying gene model or catalog of transcripts. We present a novel non-parametric statistical test between splice graphs to assess the significance of differential transcription, and extend it to group-wise comparison incorporating sample replicates. Results: Using simulated RNA-seq data consisting of four technical replicates of two samples with varying transcription between genes, we show that (i) the FDM is highly correlated with JSD* (r=0.82) when average RNA-seq coverage of the transcripts is sufficiently deep; and (ii) the FDM is able to identify 90% of genes with differential transcription when JSD* >0.28 and coverage >7. This represents higher sensitivity than Cufflinks (without annotations) and rDiff (MMD), which respectively identified 69 and 49% of the genes in this region as differential transcribed. Using annotations identifying the transcripts, Cufflinks was able to identify 86% of the genes in this region
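The quantity JSD* used in this abstract, the square root of the Jensen-Shannon divergence between two transcript abundance vectors, can be sketched in a few lines. The function name `jsd_sqrt` and the choice of base-2 logarithms (which bound the result in [0, 1]) are assumptions made for illustration; the abstract does not state which logarithm base the authors use.

```python
import math


def jsd_sqrt(p, q):
    """Square root of the Jensen-Shannon divergence between two vectors.

    p and q are non-negative abundance vectors of equal length; each is
    normalised to sum to 1 before comparison. Base-2 logs are an assumption.
    """
    if len(p) != len(q):
        raise ValueError("abundance vectors must have the same length")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each vector needs a positive total abundance")
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    # Mixture distribution halfway between p and q.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; zero-probability terms contribute 0.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))
```

Identical isoform proportions give 0, and two samples that express entirely different isoforms give the maximum value of 1 under this convention.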
NASA Astrophysics Data System (ADS)
Millard, Steven P.; Deverel, Steven J.
1988-12-01
As concern over the effects of trace amounts of pollutants has increased, so has the need for statistical methods that deal appropriately with data that include values reported as "less than" the detection limit. It has become increasingly common for water quality data to include censored values that reflect more than one detection limit for a single analyte. For such multiply censored data sets, standard statistical methods (for example, to compare analyte concentration in two areas) are not valid. In such cases, methods from the biostatistical field of survival analysis are applicable. Several common two-sample censored data rank tests are explained, and their behaviors are studied via a Monte Carlo simulation in which sample sizes and censoring mechanisms are varied under an assumed lognormal distribution. These tests are applied to shallow groundwater chemistry data from two sites in the San Joaquin Valley, California. The best overall test, in terms of maintained α level, is the normal scores test based on a permutation variance. In cases where the α level is maintained, however, the Peto-Prentice statistic based on an asymptotic variance performs as well or better.
NASA Astrophysics Data System (ADS)
Ortega-Martinez, Antonio; Padilla-Martinez, Juan Pablo; Franco, Walfre
2016-04-01
The skin contains several fluorescent molecules or fluorophores that serve as markers of structure, function and composition. UV fluorescence excitation photography is a simple and effective way to image specific intrinsic fluorophores, such as the one ascribed to tryptophan which emits at a wavelength of 345 nm upon excitation at 295 nm, and is a marker of cellular proliferation. Earlier, we built a clinical UV photography system to image cellular proliferation. In some samples, the naturally low intensity of the fluorescence can make it difficult to separate the fluorescence of cells in higher proliferation states from background fluorescence and other imaging artifacts -- like electronic noise. In this work, we describe a statistical image segmentation method to separate the fluorescence of interest. Statistical image segmentation is based on image averaging, background subtraction and pixel statistics. This method allows better quantification of the intensity and surface distributions of fluorescence, which in turn simplifies the detection of borders. Using this method we delineated the borders of highly proliferative skin conditions and diseases, in particular, allergic contact dermatitis, psoriatic lesions and basal cell carcinoma. Segmented images clearly define lesion borders. UV fluorescence excitation photography along with statistical image segmentation may serve as a quick and simple diagnostic tool for clinicians.
Is systematic sextant biopsy suitable for the detection of clinically significant prostate cancer?
Manseck, A; Froehner, M; Oehlschlaeger, S; Hakenberg, O; Friedrich, K; Theissig, F; Wirth, M P
2000-01-01
The optimal extent of the prostate biopsy remains controversial. There is a need to avoid detection of insignificant cancer but not to miss significant and curable tumors. In alternative treatments of prostate cancer, repeated sextant biopsies are used to estimate the response. The aim of this study was to investigate the reliability of a repeated systematic sextant biopsy as the standard biopsy technique in patients with significant tumors which are being considered for curative treatment. Systematic sextant biopsy was performed in vitro in 92 radical prostatectomy specimens. Of these patients, 81 (88.0%) had palpable lesions. Of the 92 investigated patients, 70 (76.1%) had potentially curable pT2-3pN0 prostate cancers. In these patients, the cancer was detected only in 72.9% of cases by a repeated in vitro biopsy. In the pT2 tumors, there was a detection rate of only 66.7%. This study underlines the fact that a considerable number of significant and potentially curable tumors remain undetected by the conventional sextant biopsy. A negative sextant biopsy does not rule out significant prostate cancer. Copyright 2000 S. Karger AG, Basel
NASA Technical Reports Server (NTRS)
Moore, G. K.
1976-01-01
An investigation was carried out to determine the feasibility of mapping lineaments on SKYLAB photographs of central Tennessee and to determine the hydrologic significance of these lineaments, particularly as concerns the occurrence and productivity of ground water. Sixty-nine percent more lineaments were found on SKYLAB photographs by stereo viewing than by projection viewing, but longer lineaments were detected by projection viewing. Most SKYLAB lineaments consisted of topographic depressions and they followed or paralleled the streams. The remainder were found by vegetation alinements and the straight sides of ridges. Test drilling showed that the median yield of wells located on SKYLAB lineaments was about six times the median yield of wells located by random drilling. The best single detection method, in terms of potential savings, was stereo viewing. Larger savings might be achieved by locating wells on lineaments detected by both stereo viewing and projection.
Oberer, Richard B.
2002-10-01
The current practice of nondestructive assay (NDA) of fissile materials using neutrons is dominated by the ^{3}He detector. This has been the case since the mid 1980s when Fission Multiplicity Detection (FMD) was replaced with thermal well counters and neutron multiplicity counting (NMC). The thermal well counters detect neutrons by neutron capture in the ^{3}He detector subsequent to moderation. The process of detection requires from 30 to 60 μs. As will be explained in Section 3.3, the rate of detecting correlated neutrons (signal) from the same fission is independent of this time, but the rate of accidental correlations (noise) is proportional to this time. The well counters are at a distinct disadvantage when there is a large source of uncorrelated neutrons present, for example from (α, n) reactions. Plastic scintillating detectors, as were used in FMD, require only about 20 ns to detect neutrons from fission. One thousandth as many accidental coincidences are therefore accumulated. The major problem with the use of fast-plastic scintillation detectors, however, is that both neutrons and gamma rays are detected. The pulses from the two are indistinguishable in these detectors. For this thesis, a new technique was developed to use higher-order time correlation statistics to distinguish combinations of neutron and gamma ray detections in fast-plastic scintillation detectors. A system of analysis to describe these correlations was developed based on simple physical principles. Other sources of correlations from non-fission events are identified and integrated into the analysis developed for fission events. A number of ratios and metrics are identified to determine physical properties of the source from the correlations. It is possible to determine both the quantity being measured and detection efficiency from these ratios from a single measurement without a separate calibration. To account for detector dead-time, an alternative analytical technique
Irshad, Humayun; Roux, Ludovic; Racoceanu, Daniel
2013-01-01
Accurate counting of mitosis in breast cancer histopathology plays a critical role in the grading process. Manual counting of mitosis is tedious and subject to considerable inter- and intra-reader variations. This work aims at improving the accuracy of mitosis detection by selecting the color channels that better capture the statistical and morphological features that discriminate mitosis from other objects. The proposed framework includes comprehensive analysis of first and second order statistical features together with morphological features in selected color channels and a study on balancing the skewed dataset using the SMOTE method for increasing the predictive accuracy of mitosis classification. The proposed framework was evaluated on the MITOS data set during an ICPR 2012 contest and ranked second among 17 finalists. The proposed framework achieved 74% detection rate, 70% precision and 72% F-Measure. In future work, we plan to apply our mitosis detection tool to images produced by different types of slide scanners, including multi-spectral and multi-focal microscopy.
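The three figures reported in this abstract are mutually consistent if the detection rate is read as recall and the F-Measure as the harmonic mean of precision and recall. That reading is an assumption, since the abstract does not define its metrics, and `f_measure` is an illustrative name.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (the balanced F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values quoted in the abstract, with detection rate treated as recall.
reported = f_measure(precision=0.70, recall=0.74)
```

Here `reported` evaluates to about 0.719, which rounds to the 72% stated above.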
A Space–Time Permutation Scan Statistic for Disease Outbreak Detection
2005-01-01
Background The ability to detect disease outbreaks early is important in order to minimize morbidity and mortality through timely implementation of disease prevention and control measures. Many national, state, and local health departments are launching disease surveillance systems with daily analyses of hospital emergency department visits, ambulance dispatch calls, or pharmacy sales for which population-at-risk information is unavailable or irrelevant. Methods and Findings We propose a prospective space–time permutation scan statistic for the early detection of disease outbreaks that uses only case numbers, with no need for population-at-risk data. It makes minimal assumptions about the time, geographical location, or size of the outbreak, and it adjusts for natural purely spatial and purely temporal variation. The new method was evaluated using daily analyses of hospital emergency department visits in New York City. Four of the five strongest signals were likely local precursors to citywide outbreaks due to rotavirus, norovirus, and influenza. The number of false signals was at most modest. Conclusion If such results hold up over longer study times and in other locations, the space–time permutation scan statistic will be an important tool for local and national health departments that are setting up early disease detection surveillance systems. PMID:15719066
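A scan statistic that uses only case counts needs a baseline that does not depend on population-at-risk data. One way to obtain it, assumed here as an illustration of the idea rather than taken from the abstract, is to compute the count expected in each zone and day if cases were distributed independently across space and time, given the observed zone totals and day totals. The function name and the small example table are hypothetical.

```python
def expected_counts(cases):
    """Expected cases per (zone, day) cell under space-time independence.

    cases[z][d] is the observed count in zone z on day d. Each expected
    value is (zone total * day total) / grand total, so purely spatial and
    purely temporal variation are absorbed by the margins.
    """
    grand_total = sum(sum(row) for row in cases)
    if grand_total == 0:
        raise ValueError("at least one case is required")
    zone_totals = [sum(row) for row in cases]
    day_totals = [sum(col) for col in zip(*cases)]
    return [[zt * dt / grand_total for dt in day_totals]
            for zt in zone_totals]


# Two zones observed over two days (made-up counts).
observed = [[2, 0],
            [1, 1]]
mu = expected_counts(observed)
```

A candidate cluster is then a set of zones and consecutive days whose observed total clearly exceeds the sum of the corresponding expected values; its significance would be assessed by randomly permuting case dates among case locations, which this sketch does not implement.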
Osche, G R
2000-08-20
Single- and multiple-pulse detection statistics are presented for aperture-averaged direct detection optical receivers operating against partially developed speckle fields. A partially developed speckle field arises when the probability density function of the received intensity does not follow negative exponential statistics. The case of interest here is the target surface that exhibits diffuse as well as specular components in the scattered radiation. An approximate expression is derived for the integrated intensity at the aperture, which leads to single- and multiple-pulse discrete probability density functions for the case of a Poisson signal in Poisson noise with an additive coherent component. In the absence of noise, the single-pulse discrete density function is shown to reduce to a generalized negative binomial distribution. The radar concept of integration loss is discussed in the context of direct detection optical systems where it is shown that, given an appropriate set of system parameters, multiple-pulse processing can be more efficient than single-pulse processing over a finite range of the integration parameter n.
Tavares, Gilberto; Zsigraiová, Zdena; Semiao, Viriato; Carvalho, Maria da Graca
2011-07-01
This work proposes the application of two multivariate statistical methods, principal component analysis (PCA) and partial least squares (PLS), to a continuous process of a municipal solid waste (MSW) moving grate-type incinerator for process control (monitoring, fault detection and diagnosis) through the extraction of information from historical data. A PCA model capable of detecting abnormal situations is built for process monitoring, and the original 16-variable process dimension is reduced to eight, the first four together capturing 86% of the total process variation. A PLS model is constructed to predict the generated superheated steam flow rate, allowing for control of its set points. The model retained six of the original 13 variables, together explaining 90% of the input variation and almost 98% of the output variation. The proposed methodology is demonstrated by applying those multivariate statistical methods to process data continuously measured in an actual incinerator. Both models exhibited very good performance in fault detection and isolation. In predicting the generated superheated steam flow rate for its set point control, the PLS model performed very well, with low prediction errors (RMSE of 3.1 and 4.1).
Performance analysis of Wald-statistic based network detection methods for radiation sources
Sen, Satyabrata; Rao, Nageswara S; Wu, Qishi; Barry, M. L.; Grieme, M.; Brooks, Richard R; Cordone, G.
2016-01-01
There have been increasingly large deployments of radiation detection networks that require computationally fast algorithms to produce prompt results over ad-hoc sub-networks of mobile devices, such as smart-phones. These algorithms are in sharp contrast to complex network algorithms that require all measurements to be sent to powerful central servers. In this work, at individual sensors, we employ Wald-statistic based detection algorithms which are computationally very fast, and are implemented as one of three Z-tests and four chi-square tests. At the fusion center, we apply the K-out-of-N fusion to combine the sensors' hard decisions. We characterize the performance of detection methods by deriving analytical expressions for the distributions of underlying test statistics, and by analyzing the fusion performances in terms of K, N, and the false-alarm rates of individual detectors. We experimentally validate our methods using measurements from indoor and outdoor characterization tests of the Intelligence Radiation Sensors Systems (IRSS) program. In particular, utilizing the outdoor measurements, we construct two important real-life scenarios, boundary surveillance and portal monitoring, and present the results of our algorithms.
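How K, N, and the per-sensor false-alarm rate interact under K-out-of-N fusion can be illustrated with a binomial tail. The sketch below assumes, for simplicity, that the N sensors decide independently and share the same per-sensor probability; neither assumption is stated in the abstract, and the function name is hypothetical.

```python
from math import comb

def k_out_of_n_probability(k, n, p):
    """Probability that at least k of n independent sensors raise an alarm,
    when each sensor alarms with probability p (assumed identical sensors)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# System-level false-alarm rate for 3-out-of-5 fusion with 10% per-sensor rate.
print(k_out_of_n_probability(3, 5, 0.10))
```

Passing the per-sensor detection probability instead of the false-alarm rate gives the fused detection probability under the same assumptions.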
Chen, Zhongxue; Liu, Qingzhong; Nadarajah, Saralees
2012-04-15
As an epigenetic alteration, DNA methylation plays an important role in epigenetic controls of gene transcription. Recent advances in genome-wide scan of DNA methylation provide great opportunities in studying the impact of DNA methylation on many human diseases including various types of cancer. Due to the unique feature of this type of data, applicable statistical methods are limited and new sophisticated approaches are desirable. In this article, we propose a new statistical test to detect differentially methylated loci for case control methylation data generated by Illumina arrays. This new method utilizes the important finding that DNA methylation is highly correlated with age. The proposed method estimates the overall P-value by combining the P-values from independent individual tests each for one age group. Through real data application and simulation study, we show that the proposed test is robust and usually more powerful than other methods.
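Combining P-values from independent per-age-group tests can be illustrated with Fisher's method. The abstract does not say which combination rule the authors use, so Fisher's method is an assumption made here purely for illustration. The sketch uses the closed-form chi-square tail for even degrees of freedom, so it needs only the standard library.

```python
import math

def fisher_combined_p(p_values):
    """Fisher's method: -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom when the k tests are independent."""
    k = len(p_values)
    if k == 0 or any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("need at least one P-value, each in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # statistic divided by 2
    # Chi-square survival function for 2k d.o.f.:
    # exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

# Hypothetical P-values from three age groups.
print(fisher_combined_p([0.04, 0.20, 0.11]))
```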
Gatt, Philip; Johnson, Steven; Nichols, Terry
2009-06-10
The performance of single and multielement Geiger-mode avalanche photodiode (GM-APD) devices is investigated as a function of the detector's reset or dead time. The theoretical results, developed herein, capture the effects of both quantum fluctuations and speckle noise and are shown to agree with Monte Carlo simulation measurements. First, a theory for the mean response or count rate to an arbitrary input flux is developed. The probability that the GM-APD is armed is shown to be the ratio of this mean response to the input flux. This arm probability, P(A), is then utilized to derive the signal photon detection efficiency (SPDE), which is the fraction of signal photons that are detected. The SPDE is a function of the input flux, the arm probability, and the dead time. When the dead time is zero, GM-APDs behave linearly, P(A) is unity, and the SPDE theory is simplified to the detector's effective quantum efficiency. When the dead time is long compared to the acquisition gate time, the theory converges to previously published "infinite" dead-time theories. The SPDE theory is then applied to develop other key ladar performance metrics, e.g., signal-to-noise ratio and detection statistics. The GM-APD detection statistics are shown to converge to those of a linear photon counting device when the combined signal and noise flux is much less than the reset rate. For higher flux levels, the SPDE degrades, due to a decreased arm probability, and the detection probability degrades relative to that of a linear device.
Greenhalgh, T.
1997-01-01
It is possible to be seriously misled by taking the statistical competence (and/or the intellectual honesty) of authors for granted. Some common errors committed (deliberately or inadvertently) by the authors of papers are given in the final box. PMID:9277611
Maximum linkage space-time permutation scan statistics for disease outbreak detection.
Costa, Marcelo A; Kulldorff, Martin
2014-06-10
In disease surveillance, the prospective space-time permutation scan statistic is commonly used for the early detection of disease outbreaks. The scanning window that defines potential clusters of diseases is cylindrical in shape, which does not allow incorporating into the cluster shape potential factors that can contribute to the spread of the disease, such as information about roads, landscape, among others. Furthermore, the cylinder scanning window assumes that the spatial extent of the cluster does not change in time. Alternatively, a dynamic space-time cluster may indicate the potential spread of the disease through time. For instance, the cluster may decrease over time indicating that the spread of the disease is vanishing. This paper proposes two irregularly shaped space-time permutation scan statistics. The cluster geometry is dynamically created using a graph structure. The graph can be created to include nearest-neighbor structures, geographical adjacency information or any relevant prior information regarding the contagious behavior of the event under surveillance. The new methods are illustrated using influenza cases in three New England states, and compared with the cylindrical version. A simulation study is provided to investigate some properties of the proposed arbitrary cluster detection techniques. We have successfully developed two new space-time permutation scan statistics methods with irregular shapes and improved computational performance. The results demonstrate the potential of these methods to quickly detect disease outbreaks with irregular geometries. Future work aims at performing intensive simulation studies to evaluate the proposed methods using different scenarios, number of cases, and graph structures.
Maximum linkage space-time permutation scan statistics for disease outbreak detection
2014-01-01
Background In disease surveillance, the prospective space-time permutation scan statistic is commonly used for the early detection of disease outbreaks. The scanning window that defines potential clusters of diseases is cylindrical in shape, which does not allow incorporating into the cluster shape potential factors that can contribute to the spread of the disease, such as information about roads, landscape, among others. Furthermore, the cylinder scanning window assumes that the spatial extent of the cluster does not change in time. Alternatively, a dynamic space-time cluster may indicate the potential spread of the disease through time. For instance, the cluster may decrease over time indicating that the spread of the disease is vanishing. Methods This paper proposes two irregularly shaped space-time permutation scan statistics. The cluster geometry is dynamically created using a graph structure. The graph can be created to include nearest-neighbor structures, geographical adjacency information or any relevant prior information regarding the contagious behavior of the event under surveillance. Results The new methods are illustrated using influenza cases in three New England states, and compared with the cylindrical version. A simulation study is provided to investigate some properties of the proposed arbitrary cluster detection techniques. Conclusion We have successfully developed two new space-time permutation scan statistics methods with irregular shapes and improved computational performance. The results demonstrate the potential of these methods to quickly detect disease outbreaks with irregular geometries. Future work aims at performing intensive simulation studies to evaluate the proposed methods using different scenarios, number of cases, and graph structures. PMID:24916839
NASA Astrophysics Data System (ADS)
Ciuonzo, D.; Orlando, D.; Pallotta, L.
2016-12-01
This letter deals with the problem of adaptive signal detection in partially-homogeneous and persymmetric Gaussian disturbance within the framework of invariance theory. First, a suitable group of transformations leaving the problem invariant is introduced and the Maximal Invariant Statistic (MIS) is derived. Then, it is shown that the (Two-step) Generalized-Likelihood Ratio test, Rao and Wald tests can be all expressed in terms of the MIS, thus proving that they all ensure a Constant False-Alarm Rate (CFAR).
Detection of coronal mass ejections using AdaBoost on grayscale statistic features
NASA Astrophysics Data System (ADS)
Zhang, Ling; Yin, Jian-qin; Lin, Jia-ben; Wang, Xiao-fan; Guo, Juan
2016-10-01
We present an automatic algorithm to detect coronal mass ejections (CMEs) in Large Angle Spectrometric Coronagraph (LASCO) C2 running difference images. The algorithm includes 3 steps: (1) split the running difference images into blocks according to slice size and analyze the grayscale statistics of the blocks from a set of images with and without CMEs; (2) select the optimal parameters for slice size, gray threshold and fraction of the bright points and (3) use AdaBoost to combine the weak classifiers designed according to the optimal parameters. Experimental results show that our method is effective and has a high accuracy rate.
Whittington, S L
1991-06-01
Heterogeneity and small sample size are problems that affect many paleodemographic studies. The former can cause the overall distribution of age at death to be an amalgam that does not accurately reflect the distributions of any of the groups composing the heterogeneous population. The latter can make it difficult to separate significant from nonsignificant demographic differences between groups. Survival analysis, a methodology that involves the survival distribution function and various regression models, can be applied to distributions of age at death in order to reveal statistically significant demographic differences and to control for heterogeneity. Survival analysis was used on demographic data from a heterogeneous sample of skeletons of low status Maya who lived in and around Copan, Honduras, between A.D. 400 and 1200. Results contribute to understanding the collapse of Classic Maya civilization.
Shia, Jinru
2016-01-01
The last two decades have seen significant advancement in our understanding of colorectal tumors with DNA mismatch repair (MMR) deficiency. The ever-emerging revelations of new molecular and genetic alterations in various clinical conditions have necessitated constant refinement of disease terminology and classification. Thus, a case with the clinical condition of hereditary non-polyposis colorectal cancer as defined by the Amsterdam criteria may be one of Lynch syndrome characterized by a germline defect in one of the several MMR genes, one of the yet-to-be-defined “Lynch-like syndrome” if there is evidence of MMR deficiency in the tumor but no detectable germline MMR defect or tumor MLH1 promoter methylation, or “familial colorectal cancer type X” if there is no evidence of MMR deficiency. The detection of these conditions carries significant clinical implications. The detection tools and strategies are constantly evolving. The Bethesda guidelines symbolize a selective approach that uses clinical information and tumor histology as the basis to select high-risk individuals. Such a selective approach has subsequently been found to have limited sensitivity, and is thus gradually giving way to the alternative universal approach that tests all newly diagnosed colorectal cancers. Notably, the universal approach also has its own limitations; its cost-effectiveness in real practice, in particular, remains to be determined. Meanwhile, technological advances such as the next-generation sequencing are offering the promise of direct genetic testing for MMR deficiency at an affordable cost probably in the near future. This article reviews the up-to-date molecular definitions of the various conditions related to MMR deficiency, and discusses the tools and strategies that have been used in detecting these conditions. Special emphasis will be placed on the evolving nature and the clinical importance of the disease definitions and the detection strategies. PMID:25716099
Detection of object-based manipulation by the statistical features of object contour.
Richao, Chen; Gaobo, Yang; Ningbo, Zhu
2014-03-01
Object-based manipulations, such as adding or removing objects in digital video, are usually malicious forgery operations. Compared with conventional double MPEG compression or frame-based tampering, it makes more sense to detect these object-based manipulations because they might directly affect our understanding of the video content. In this paper, a passive video forensics scheme is proposed for object-based forgery operations. After extracting the adjustable-width areas around the object boundary, several statistical features, such as the moment features of detailed wavelet coefficients and the average gradient of each colour channel, are obtained and input into a support vector machine (SVM) as feature vectors for the classification of natural objects and forged ones. Experimental results on several video sequences with static background show that the proposed approach can achieve a correct detection accuracy of 70% to 95%.
Du, Fei; Li, Yibo; Jin, Shijiu
2015-08-18
An accurate performance analysis on the MDL criterion for source enumeration in array processing is presented in this paper. The enumeration results of MDL can be predicted precisely by the proposed procedure via the statistical analysis of the sample eigenvalues, whose distributive properties are investigated with the consideration of their interactions. A novel approach is also developed for the performance evaluation when the source number is underestimated by a number greater than one, which is denoted as "multiple-missed detection", and the probability of a specific underestimated source number can be estimated by ratio distribution analysis. Simulation results are included to demonstrate the superiority of the presented method over available results and confirm the ability of the proposed approach to perform multiple-missed detection analysis.
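For orientation, the MDL criterion whose behavior is being analyzed can be sketched from the sample eigenvalues. The form below is the widely used eigenvalue-based MDL for source enumeration (geometric-to-arithmetic mean ratio of the smallest eigenvalues plus a penalty term); it is an assumption that this is the variant studied in the paper, and the paper's own performance-prediction procedure is not reproduced. Function names are hypothetical.

```python
import math

def mdl_scores(eigenvalues, num_snapshots):
    """MDL score for each candidate source number k = 0 .. p-1."""
    lam = sorted(eigenvalues, reverse=True)
    p = len(lam)
    scores = []
    for k in range(p):
        tail = lam[k:]                      # the p - k smallest eigenvalues
        m = p - k
        log_geo = sum(math.log(v) for v in tail) / m
        log_arith = math.log(sum(tail) / m)
        fit = -num_snapshots * m * (log_geo - log_arith)
        penalty = 0.5 * k * (2 * p - k) * math.log(num_snapshots)
        scores.append(fit + penalty)
    return scores

def estimate_num_sources(eigenvalues, num_snapshots):
    """Return the k that minimizes the MDL score."""
    scores = mdl_scores(eigenvalues, num_snapshots)
    return min(range(len(scores)), key=scores.__getitem__)

# Two dominant eigenvalues over a flat noise floor, 100 snapshots.
print(estimate_num_sources([10.0, 8.0, 1.0, 1.0, 1.0], 100))
```

Underestimation ("missed detection") corresponds to the minimizing k falling below the true source number, which happens when signal eigenvalues sit close to the noise eigenvalues.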
Ai, Xiaofei; Fu, Qianqian; Wang, Jun; Zheng, Yingchun; Han, Cong; Li, Qinghua; Sun, Qi; Ru, Kun
2014-06-01
To explore the feasibility of detecting lymphoma with the application of BIOMED-2 standardized immunoglobulin/T cell receptor (IG/TCR) gene rearrangement system in formalin fixed paraffin-embedded (FFPE) tissue samples, and to discuss the relationship between the longest amplification fragment of extracted DNA and positive detection rate of different IGH V-J primers. DNA was extracted from 50 cases of FFPE tissue samples. Multiplex-PCR amplifications were performed and then the IG/TCR gene rearrangements were analyzed using BIOMED-2 standardized clonality analysis system. (1)When the DNA concentration was diluted to 50-100 ng/μl from 100-500 ng/μl, the proportion of the longest amplification fragment (300-400 bp) of DNA was improved from 10.0% to 90.0% in 30 cases of diffuse large B cell lymphoma (DLBCL) wax roll samples (P<0.01). The positive rate of IGH+IGK was increased from 46.7% to 83.3%, the difference was statistically significant (P=0.006). The lengths of the longest amplification fragments of DNA were all longer than 300 bp in the paraffin section samples of DLBCL. The positive rate of IGH+IGK of these samples was 96.7%. The difference of the positive rate of IGH+IGK between the wax roll samples and the paraffin section samples had no statistical significance (P=0.195). (2)When the concentration of DNA was high, most of the longest amplification fragments of extracted DNA were 100 bp or 200 bp, and the detection rate of short fragment IGH FR3 was more stable than that of long fragment IGH FR1. (3)The clonality analysis of TCRG+TCRB in all 13 cases of peripheral T cell lymphoma samples showed positive results, while no positive IG/TCR clones were found in 7 cases of reactive lymphoid tissue hyperplasia in control group. Dilution of DNA is the only method to improve not only the proportion of longest fragment amplification but also the detection rate of clonality. The detection rate of IGH FR3 would not be affected by the concentration of DNA. The
NASA Astrophysics Data System (ADS)
Taboada, Fernando L.
2002-09-01
Low probability of intercept (LPI) is that property of an emitter that, because of its low power, wide bandwidth, frequency variability, or other design attributes, makes it difficult to detect or identify by means of passive intercept devices such as radar warning, electronic support and electronic intelligence receivers. In order to detect LPI radar waveforms, new signal processing techniques are required. This thesis first develops a MATLAB toolbox to generate important types of LPI waveforms based on frequency and phase modulation. The power spectral density and the periodic ambiguity function are examined for each waveform. These signals are then used to test a novel signal processing technique that detects the waveform parameters and classifies the intercepted signal in various degrees of noise. The technique is based on the use of parallel filter (sub-band) arrays and higher order statistics (third-order cumulant estimator). Each sub-band signal is treated individually and is followed by the third-order estimator in order to suppress any symmetrical noise that might be present. The significance of this technique is that it separates the LPI waveforms into small frequency bands, providing a detailed time-frequency description of the unknown signal. Finally, the resulting output matrix is processed by a feature extraction routine to detect the waveform parameters. Identification of the signal is based on the modulation parameters detected.
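Why a third-order cumulant estimator suppresses symmetrical noise can be seen from a small sketch: for a zero-mean sequence, the third-order cumulant is the average triple product, which vanishes in expectation for symmetrically distributed samples but not for skewed ones. This is a generic sample estimator written in Python for illustration; it is not the thesis's MATLAB toolbox, and the function name is hypothetical.

```python
def third_order_cumulant(x, lag1, lag2):
    """Sample estimate of c3(lag1, lag2) = E[z(n) z(n+lag1) z(n+lag2)],
    where z is x with its sample mean removed."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    start = max(0, -lag1, -lag2)
    stop = min(n, n - lag1, n - lag2)
    if stop <= start:
        raise ValueError("lags too large for the sequence length")
    total = sum(z[i] * z[i + lag1] * z[i + lag2] for i in range(start, stop))
    return total / (stop - start)

symmetric = [-1.0, 1.0] * 50       # symmetric amplitude distribution
skewed = [2.0, -1.0, -1.0] * 50    # zero mean but asymmetric
print(third_order_cumulant(symmetric, 0, 0))
print(third_order_cumulant(skewed, 0, 0))
```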
NASA Astrophysics Data System (ADS)
Chung, Moo K.; Kim, Seung-Goo; Schaefer, Stacey M.; van Reekum, Carien M.; Peschke-Schmitz, Lara; Sutterer, Matthew J.; Davidson, Richard J.
2014-03-01
The sparse regression framework has been widely used in medical image processing and analysis. However, it has been rarely used in anatomical studies. We present a sparse shape modeling framework using the Laplace-Beltrami (LB) eigenfunctions of the underlying shape and show its improvement of statistical power. Traditionally, the LB-eigenfunctions are used as a basis for intrinsically representing surface shapes as a form of Fourier descriptors. To reduce high frequency noise, only the first few terms are used in the expansion and higher frequency terms are simply thrown away. However, some lower frequency terms may not necessarily contribute significantly in reconstructing the surfaces. Motivated by this idea, we present an LB-based method to filter out only the significant eigenfunctions by imposing a sparse penalty. For dense anatomical data such as deformation fields on a surface mesh, the sparse regression behaves like a smoothing process, which will reduce the error of incorrectly detecting false negatives. Hence the statistical power improves. The sparse shape model is then applied in investigating the influence of age on amygdala and hippocampus shapes in the normal population. The advantage of the LB sparse framework is demonstrated by showing the increased statistical power.
NASA Astrophysics Data System (ADS)
Huang, Yizhen
2005-07-01
Digital image forgery detection is becoming increasingly important. In the past 2 years, there has been an upsurge of work on direct detection methods, which utilize the hardware features of digital cameras. Such features may be weakened or lost once an image is tampered with, or they may be inconsistent when several images are synthesized into a single one. This manuscript first clarifies the concept of trueness of digital images and summarizes these methods, together with ways of cracking them, using a general model. The recently proposed EM algorithm plus Fourier transform that checks the Color Filter Array (CFA) interpolation statistical feature (ISF) is taken as a case study. We propose 3 methods to recover the CFA-ISF of a fake image: (1) artificial CFA interpolation; (2) a linear CFA-ISF recovery model with optimal uniform measure; and (3) a quadratic CFA-ISF recovery model with least square measure. A software prototype, CFA-ISF Indicator & Adjustor, integrating the detection and anti-detection algorithms is developed and shown. Experiments with this prototype validate the effectiveness of our methods.
Statistical foundations of audit trail analysis for the detection of computer misuse
Helman, P. (Computer Science Dept.); Liepins, G. (Univ. of Tennessee, Knoxville, TN. Computer Science Dept.)
1993-09-01
The authors model computer transactions as generated by two stationary stochastic processes, the legitimate (normal) process N and the misuse process M. They define misuse (anomaly) detection to be the identification of transactions most likely to have been generated by M. They formally demonstrate that the accuracy of misuse detectors is bounded by a function of the difference of the densities of the processes N and M over the space of transactions. In practice, detection accuracy can be far below this bound, and generally improves with increasing sample size of historical (training) data. Careful selection of transaction attributes also can improve detection accuracy; they suggest several criteria for attribute selection, including adequate sampling rate and separation between models. They demonstrate that exactly optimizing even the simplest of these criteria is NP-hard, thus motivating a heuristic approach. They further differentiate between modeling (density estimation) and nonmodeling approaches. They introduce a frequentist method as a special case of the former, and Wisdom and Sense, developed at Los Alamos National Laboratory, as a special case of the latter. For nonmodeling approaches such as Wisdom and Sense that generate statistical rules, they show that the rules must be maximally specific to ensure consistency with Bayesian analysis. Finally, they provide suggestions for testing detection systems and present limited test results using Wisdom and Sense and the frequentist approach.
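The definition of misuse detection as identifying the transactions most likely to have been generated by M can be sketched with Bayes' rule. The densities and the prior below are hypothetical inputs chosen for illustration; the report's own estimators (the frequentist method, Wisdom and Sense) are not reproduced here.

```python
def misuse_posterior(density_m, density_n, prior_m):
    """P(M | x) given the densities of transaction x under the misuse process M
    and the normal process N, and the prior probability of misuse."""
    numerator = prior_m * density_m
    denominator = numerator + (1.0 - prior_m) * density_n
    return numerator / denominator if denominator > 0 else 0.0

def rank_transactions(transactions, prior_m):
    """Sort (name, density_m, density_n) triples, most suspicious first."""
    return sorted(
        transactions,
        key=lambda t: misuse_posterior(t[1], t[2], prior_m),
        reverse=True,
    )

# Hypothetical transactions with assumed densities under M and N.
log = [("login", 0.05, 0.40), ("bulk_copy", 0.60, 0.02), ("read", 0.10, 0.30)]
print([name for name, _, _ in rank_transactions(log, prior_m=0.01)])
```

When the two densities are equal for a transaction, the posterior equals the prior, which illustrates why detection accuracy depends on how far apart the densities of N and M are.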
The statistical power to detect cross-scale interactions at macroscales
Wagner, Tyler; Fergus, C. Emi; Stow, Craig A.; Cheruvelil, Kendra S.; Soranno, Patricia A.
2016-01-01
Macroscale studies of ecological phenomena are increasingly common because stressors such as climate and land-use change operate at large spatial and temporal scales. Cross-scale interactions (CSIs), where ecological processes operating at one spatial or temporal scale interact with processes operating at another scale, have been documented in a variety of ecosystems and contribute to complex system dynamics. However, studies investigating CSIs are often dependent on compiling multiple data sets from different sources to create multithematic, multiscaled data sets, which results in structurally complex, and sometimes incomplete data sets. The statistical power to detect CSIs needs to be evaluated because of their importance and the challenge of quantifying CSIs using data sets with complex structures and missing observations. We studied this problem using a spatially hierarchical model that measures CSIs between regional agriculture and its effects on the relationship between lake nutrients and lake productivity. We used an existing large multithematic, multiscaled database, LAke multiscaled GeOSpatial, and temporal database (LAGOS), to parameterize the power analysis simulations. We found that the power to detect CSIs was more strongly related to the number of regions in the study rather than the number of lakes nested within each region. CSI power analyses will not only help ecologists design large-scale studies aimed at detecting CSIs, but will also focus attention on CSI effect sizes and the degree to which they are ecologically relevant and detectable with large data sets.
Detection of significant variation in acoustic output of an electromagnetic lithotriptor.
Pishchalnikov, Yuri A; McAteer, James A; Vonderhaar, R Jason; Pishchalnikova, Irina V; Williams, James C; Evan, Andrew P
2006-11-01
We describe the observation of significant instability in the output of an electromagnetic lithotriptor. This instability had a form that was not detected by routine assessment, but rather was observed only by collecting many consecutive shock waves in nonstop regimen. A Dornier DoLi-50 lithotriptor used exclusively for basic research was tested and approved by the regional technician. This assessment included hydrophone measures at select power levels with the collection of about 25 shock waves per setting. Subsequent laboratory characterization used a fiberoptic hydrophone and storage oscilloscope for data acquisition. Waveforms were collected nonstop for hundreds of pulses. Output was typically stable for greater than 1,000 shock waves but substantial fluctuations in acoustic pressures were also observed. For example, output at power level 3 (mean peak positive acoustic pressure +/- SD normally 44 +/- 2 MPa) increased dramatically to greater than 50 MPa or decreased significantly to approximately 30 MPa for hundreds of shock waves. The cause of instability was eventually traced to a faulty lithotriptor power supply. Instability in lithotriptor acoustic output can occur and it may not be detected by routine assessment. Collecting waveforms in a nonstop regimen dramatically increases sampling size, improving the detection of instability. Had the instability that we observed occurred during patient treatment, the energy delivered may well have exceeded the planned dose. Since the potential for adverse effects in lithotripsy increases as the dose is increased, it would be valuable to develop ways to better monitor the acoustic output of lithotriptors.
[Significance of high sensitive CRP assay for early detection of newborn babies infection diseases].
Otsuki, Takaaki; Okabe, Hidetoshi
2002-01-01
We have evaluated the accuracy of a highly sensitive CRP assay method using an evanescent wave immunoassay system (Evanet 20) and the significance of this assay for the early detection of infectious diseases in newborn babies. In this assay system, the prozone phenomenon was not detected up to 40 mg/dl. The reproducibility of this assay was quite good, and the intra-run CV value of the same sample was less than 5% for the assay of serum, plasma and whole blood. There was a high correlation between the CRP values in the serum and plasma (r = 0.98, regression formula y = 0.89x + 4.07). Similarly, the values in whole blood and serum samples were quite well correlated (r = 0.98, regression formula y = 0.91x - 6.75). Various humoral elements such as bilirubin, hemoglobin and chyle did not significantly influence this assay method. A slight increase in blood CRP was clearly demonstrated in the early phase of infectious diseases of newborn babies, and monitoring of CRP by this assay system seemed to be quite useful for detecting the early phase of infectious diseases in newborn babies. This assay system requires only a small quantity of whole blood to perform quantitative analysis of very small amounts of other substances. Accordingly, this assay system seems to be quite effective for monitoring minute increases in various proteinaceous blood components in emergent laboratory examination or POCT.
Significance of MPEG-7 textural features for improved mass detection in mammography.
Eltonsy, Nevine H; Tourassi, Georgia D; Fadeev, Aleksey; Elmaghraby, Adel S
2006-01-01
The purpose of the study is to investigate the significance of MPEG-7 textural features for improving the detection of masses in screening mammograms. The detection scheme was originally based on morphological directional neighborhood features extracted from mammographic regions of interest (ROIs). Receiver Operating Characteristics (ROC) analysis was performed to evaluate the performance of each set of features independently and merged into a back-propagation artificial neural network (BPANN) using the leave-one-out sampling scheme (LOOSS). The study was based on a database of 668 mammographic ROIs (340 depicting cancer regions and 328 depicting normal parenchyma). Overall, the ROC area index of the BPANN using the directional morphological features was Az=0.85+/-0.01. The MPEG-7 edge histogram descriptor-based BPANN showed an ROC area index of Az=0.71+/-0.01, while homogeneous textural descriptors using 30 and 120 channels helped the BPANN achieve similar ROC area indexes of Az=0.882+/-0.02 and Az=0.877+/-0.01, respectively. After merging the MPEG-7 homogeneous textural features with the directional neighborhood features, the performance of the BPANN increased, providing an ROC area index of Az=0.91+/-0.01. The MPEG-7 homogeneous textural descriptor significantly improved the morphology-based detection scheme.
NASA Astrophysics Data System (ADS)
Hoell, Simon; Omenzetter, Piotr
2017-04-01
The increasing demand for carbon neutral energy in a challenging economic environment is a driving factor for erecting ever larger wind turbines in harsh environments using novel wind turbine blade (WTB) designs characterized by high flexibilities and lower buckling capacities. To counteract the resulting increase in operation and maintenance costs, efficient structural health monitoring systems can be employed to prevent dramatic failures and to schedule maintenance actions according to the true structural state. This paper presents a novel methodology for classifying structural damages using vibrational responses from a single sensor. The method is based on statistical classification using Bayes' theorem and an advanced statistic, which allows controlling the performance by varying the number of samples which represent the current state. This is done for multivariate damage sensitive features (DSFs) defined as partial autocorrelation coefficients (PACCs) estimated from vibrational responses and principal component analysis scores from PACCs. Additionally, optimal DSFs are composed not only for damage classification but also for damage detection based on binary statistical hypothesis testing, where feature selections are found with a fast forward procedure. The method is applied to laboratory experiments with a small scale WTB with wind-like excitation and non-destructive damage scenarios. The obtained results demonstrate the advantages of the proposed procedure and are promising for future applications of vibration-based structural health monitoring in WTBs.
Bastolla, Ugo
2014-01-01
The properties of biomolecules depend both on physics and on the evolutionary process that formed them. These two points of view produce a powerful synergism. Physics sets the stage and the constraints that molecular evolution has to obey, and evolutionary theory helps in rationalizing the physical properties of biomolecules, including protein folding thermodynamics. To complete the parallelism, protein thermodynamics is founded on the statistical mechanics in the space of protein structures, and molecular evolution can be viewed as statistical mechanics in the space of protein sequences. In this review, we will integrate both points of view, applying them to detecting selection on the stability of the folded state of proteins. We will start discussing positive design, which strengthens the stability of the folded against the unfolded state of proteins. Positive design justifies why statistical potentials for protein folding can be obtained from the frequencies of structural motifs. Stability against unfolding is easier to achieve for longer proteins. On the contrary, negative design, which consists in destabilizing frequently formed misfolded conformations, is more difficult to achieve for longer proteins. The folding rate can be enhanced by strengthening short-range native interactions, but this requirement contrasts with negative design, and evolution has to trade-off between them. Finally, selection can accelerate functional movements by favoring low frequency normal modes of the dynamics of the native state that strongly correlate with the functional conformation change. PMID:24970217
Using statistical distances to detect changes in the normal behavior of ECG-Holter signals
NASA Astrophysics Data System (ADS)
Bastos de Figueiredo, Julio C.; Furuie, Sergio S.
2001-05-01
One of the main problems in the study of complex systems is to define a good metric that can distinguish between different dynamical behaviors in a nonlinear system. In this work we describe a method to detect different types of behaviors in a long term ECG-Holter using short portions of the Holter signal. This method is based on the calculation of the statistical distance between two distributions in a phase-space of a dynamical system. A short portion of an ECG-Holter signal with normal behavior is used to reconstruct the trajectory of an attractor in low dimensional phase-space. The points in this trajectory are interpreted as statistical distributions in the phase-space and assumed to represent the normal dynamical behavior of the ECG recording in this space. A fast algorithm is then used to compute the statistical distance between this attractor and all other attractors that are built using a sliding temporal window over the signal. For normal cases the distance stayed almost constant and below a threshold. For cases with abnormal transients, on the abnormal portion of ECG, the distance increased consistently with morphological changes.
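The procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not say which statistical distance or embedding is used, so the sketch assumes a two-dimensional delay embedding, a coarse occupancy histogram as the "distribution", and the total variation distance. All function names, the lag, the grid size and the synthetic signal are hypothetical.

```python
import math
from collections import Counter


def embed(signal, lag):
    """Two-dimensional delay embedding: points (x[t], x[t + lag])."""
    return [(signal[t], signal[t + lag]) for t in range(len(signal) - lag)]


def occupancy(points, lo, hi, bins):
    """Normalized histogram of the points on a bins x bins grid over [lo, hi]^2."""
    width = (hi - lo) / bins

    def cell(v):
        return min(bins - 1, max(0, int((v - lo) / width)))

    counts = Counter((cell(x), cell(y)) for x, y in points)
    return {k: c / len(points) for k, c in counts.items()}


def total_variation(p, q):
    """Total variation distance between two discrete distributions (0 to 1)."""
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def sliding_distances(signal, ref_len, win_len, step, lag=5, bins=8):
    """Distance between the reference attractor (first ref_len samples)
    and the attractor rebuilt in each sliding window."""
    lo, hi = min(signal), max(signal)
    ref = occupancy(embed(signal[:ref_len], lag), lo, hi, bins)
    out = []
    for start in range(0, len(signal) - win_len + 1, step):
        window = signal[start:start + win_len]
        out.append(total_variation(ref, occupancy(embed(window, lag), lo, hi, bins)))
    return out


# Synthetic stand-in for a Holter trace: a regular rhythm followed by a
# lower-amplitude, faster rhythm (illustration only, not ECG data).
regular = [math.sin(2 * math.pi * t / 50) for t in range(1000)]
altered = [0.3 * math.sin(2 * math.pi * t / 20) for t in range(500)]
d = sliding_distances(regular + altered, ref_len=500, win_len=500, step=500)
print([round(v, 3) for v in d])
```

In this toy signal the distance stays near zero while the window covers the regular rhythm and rises sharply once the window covers the altered segment, which mirrors the thresholding behaviour the abstract describes.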
NASA Astrophysics Data System (ADS)
Løvsletten, Ola; Rypdal, Martin; Rypdal, Kristoffer; Fredriksen, Hege-Beate
2015-04-01
We explore the statistics of instrumental surface temperature records on 5°× 5°, 2°× 2°, and equal-area grids. In particular, we compute the significance of deterministic trends against two parsimonious null models: auto-regressive processes of order 1, AR(1), and fractional Gaussian noises (fGn's). Both null models contain a memory parameter which quantifies the temporal climate variability, with white noise nested in both classes of models. Estimates of the persistence parameters show significant positive serial correlation for most grid cells, with higher persistence over oceans compared to land areas. This shows that, in a trend detection framework, we need to take into account larger spurious trends than what follows from the frequently used white noise assumption. Tested against the fGn null hypothesis, we find that ~ 68% (~ 47%) of the time series have significant trends at the 5% (1%) significance level. If we assume an AR(1) null hypothesis instead, then the result is that ~ 94% (~ 88%) of the time series have significant trends at the 5% (1%) significance level. For both null models, the locations where we do not find significant trends are mostly the ENSO regions and the North-Atlantic. We try to discriminate between the two null models by means of likelihood-ratios. If we at each grid point choose the null model preferred by the model selection test, we find that ~ 82% (~ 73%) of the time series have significant trends at the 5% (1%). We conclude that there is emerging evidence of significant warming trends also at regional scales, although with a much lower signal-to-noise ratio compared to global mean temperatures. Another finding is that many temperature records are consistent with error models for internal variability that exhibit long-range dependence, whereas the temperature fluctuations of the tropical oceans are strongly influenced by the ENSO, and therefore seemingly more consistent with random processes with short
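To make the idea of testing a trend against a persistent null model concrete, here is a minimal Monte Carlo sketch for the AR(1) case only. It is an assumption-laden illustration rather than the authors' procedure: the AR(1) parameters are estimated from the detrended residuals by the lag-1 autocorrelation, the fGn null is not covered, and the function names, the number of surrogates and the synthetic series are hypothetical.

```python
import random


def ols_slope(y):
    """Least-squares slope of y against the time index 0..n-1."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    num = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    den = sum((t - t_mean) ** 2 for t in range(n))
    return num / den


def lag1_autocorr(x):
    """Sample lag-1 autocorrelation."""
    m = sum(x) / len(x)
    num = sum((a - m) * (b - m) for a, b in zip(x[:-1], x[1:]))
    return num / sum((a - m) ** 2 for a in x)


def ar1_series(n, phi, sigma, rng):
    """Stationary AR(1) series with innovation standard deviation sigma."""
    x = [rng.gauss(0.0, sigma / (1.0 - phi ** 2) ** 0.5)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def trend_p_value(y, n_sim=500, seed=0):
    """Two-sided Monte Carlo p-value of the fitted slope under an AR(1) null."""
    rng = random.Random(seed)
    n = len(y)
    slope = ols_slope(y)
    y_mean, t_mean = sum(y) / n, (n - 1) / 2
    resid = [v - y_mean - slope * (t - t_mean) for t, v in enumerate(y)]
    phi = max(-0.99, min(0.99, lag1_autocorr(resid)))
    sigma = (sum(r * r for r in resid) / n * (1.0 - phi ** 2)) ** 0.5
    exceed = sum(
        abs(ols_slope(ar1_series(n, phi, sigma, rng))) >= abs(slope)
        for _ in range(n_sim)
    )
    return (exceed + 1) / (n_sim + 1)


# Synthetic series: linear trend plus AR(1) noise (illustration only).
rng = random.Random(1)
noise = ar1_series(100, 0.5, 1.0, rng)
series = [0.05 * t + e for t, e in enumerate(noise)]
p = trend_p_value(series)
print(f"Monte Carlo p-value under AR(1) null: {p:.4f}")
```

Because positive serial correlation widens the spread of spurious slopes, the same fitted trend yields a larger p-value here than it would under a white-noise null, which is the effect the abstract emphasizes.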
Perles, Stephanie J.; Wagner, Tyler; Irwin, Brian J.; Manning, Douglas R.; Callahan, Kristina K.; Marshall, Matthew R.
2014-01-01
Forests are socioeconomically and ecologically important ecosystems that are exposed to a variety of natural and anthropogenic stressors. As such, monitoring forest condition and detecting temporal changes therein remain critical to sound public and private forestland management. The National Parks Service’s Vital Signs monitoring program collects information on many forest health indicators, including species richness, cover by exotics, browse pressure, and forest regeneration. We applied a mixed-model approach to partition variability in data for 30 forest health indicators collected from several national parks in the eastern United States. We then used the estimated variance components in a simulation model to evaluate trend detection capabilities for each indicator. We investigated the extent to which the following factors affected ability to detect trends: (a) sample design: using simple panel versus connected panel design, (b) effect size: increasing trend magnitude, (c) sample size: varying the number of plots sampled each year, and (d) stratified sampling: post-stratifying plots into vegetation domains. Statistical power varied among indicators; however, indicators that measured the proportion of a total yielded higher power when compared to indicators that measured absolute or average values. In addition, the total variability for an indicator appeared to influence power to detect temporal trends more than how total variance was partitioned among spatial and temporal sources. Based on these analyses and the monitoring objectives of the Vital Signs program, the current sampling design is likely overly intensive for detecting a 5 % trend·year⁻¹ for all indicators and is appropriate for detecting a 1 % trend·year⁻¹ in most indicators.
Thompson, J E; van Leeuwen, P J; Moses, D; Shnier, R; Brenner, P; Delprado, W; Pulbrook, M; Böhm, M; Haynes, A M; Hayen, A; Stricker, P D
2016-05-01
We assess the accuracy of multiparametric magnetic resonance imaging for significant prostate cancer detection before diagnostic biopsy in men with an abnormal prostate specific antigen/digital rectal examination. A total of 388 men underwent multiparametric magnetic resonance imaging, including T2-weighted, diffusion weighted and dynamic contrast enhanced imaging before biopsy. Two radiologists used PI-RADS to allocate a score of 1 to 5 for suspicion of significant prostate cancer (Gleason 7 with more than 5% grade 4). PI-RADS 3 to 5 was considered positive. Transperineal template guided mapping biopsy of 18 regions (median 30 cores) was performed with additional manually directed cores from magnetic resonance imaging positive regions. The anatomical location, size and grade of individual cancer areas in the biopsy regions (18) as the primary outcome and in prostatectomy specimens (117) as the secondary outcome were correlated to the magnetic resonance imaging positive regions. Of the 388 men who were enrolled in the study 344 were analyzed. Multiparametric magnetic resonance imaging was positive in 77.0% of patients, 62.5% had prostate cancer and 41.6% had significant prostate cancer. The detection of significant prostate cancer by multiparametric magnetic resonance imaging had a sensitivity of 96%, specificity of 36%, negative predictive value of 92% and positive predictive value of 52%. Adding PI-RADS to the multivariate model, including prostate specific antigen, digital rectal examination, prostate volume and age, improved the AUC from 0.776 to 0.879 (p <0.001). Anatomical concordance analysis showed a low mismatch between the magnetic resonance imaging positive regions and biopsy positive regions (4 [2.9%]), and the significant prostate cancer area in the radical prostatectomy specimen (3 [3.3%]). In men with an abnormal prostate specific antigen/digital rectal examination, multiparametric magnetic resonance imaging detected significant prostate cancer
Rönnegård, Lars; Valdar, William
2012-07-24
A number of recent works have introduced statistical methods for detecting genetic loci that affect phenotypic variability, which we refer to as variability-controlling quantitative trait loci (vQTL). These are genetic variants whose allelic state predicts how much phenotype values will vary about their expected means. Such loci are of great potential interest in both human and non-human genetic studies, one reason being that a detected vQTL could represent a previously undetected interaction with other genes or environmental factors. The simultaneous publication of these new methods in different journals has in many cases precluded opportunity for comparison. We survey some of these methods, the respective trade-offs they imply, and the connections between them. The methods fall into three main groups: classical non-parametric, fully parametric, and semi-parametric two-stage approximations. Choosing between alternatives involves balancing the need for robustness, flexibility, and speed. For each method, we identify important assumptions and limitations, including those of practical importance, such as their scope for including covariates and random effects. We show in simulations that both parametric methods and their semi-parametric approximations can give elevated false positive rates when they ignore mean-variance relationships intrinsic to the data generation process. We conclude that choice of method depends on the trait distribution, the need to include non-genetic covariates, and the population size and structure, coupled with a critical evaluation of how these fit with the assumptions of the statistical model.
Dipnall, Joanna F.
2016-01-01
Background Atheoretical large-scale data mining techniques using machine learning algorithms have promise in the analysis of large epidemiological datasets. This study illustrates the use of a hybrid methodology for variable selection that took account of missing data and complex survey design to identify key biomarkers associated with depression from a large epidemiological study. Methods The study used a three-step methodology amalgamating multiple imputation, a machine learning boosted regression algorithm and logistic regression, to identify key biomarkers associated with depression in the National Health and Nutrition Examination Study (2009–2010). Depression was measured using the Patient Health Questionnaire-9 and 67 biomarkers were analysed. Covariates in this study included gender, age, race, smoking, food security, Poverty Income Ratio, Body Mass Index, physical activity, alcohol use, medical conditions and medications. The final imputed weighted multiple logistic regression model included possible confounders and moderators. Results After the creation of 20 imputation data sets from multiple chained regression sequences, machine learning boosted regression initially identified 21 biomarkers associated with depression. Using traditional logistic regression methods, including controlling for possible confounders and moderators, a final set of three biomarkers were selected. The final three biomarkers from the novel hybrid variable selection methodology were red cell distribution width (OR 1.15; 95% CI 1.01, 1.30), serum glucose (OR 1.01; 95% CI 1.00, 1.01) and total bilirubin (OR 0.12; 95% CI 0.05, 0.28). Significant interactions were found between total bilirubin with Mexican American/Hispanic group (p = 0.016), and current smokers (p<0.001). Conclusion The systematic use of a hybrid methodology for variable selection, fusing data mining techniques using a machine learning algorithm with traditional statistical modelling, accounted for missing data and
Alexander, D N; Ederer, G M; Matsen, J M
1976-01-01
The bioluminescent reaction of adenosine 5'-triphosphate (ATP) with luciferin and luciferase has been used in conjunction with a sensitive photometer (Lab-Line's ATP photometer) to detect significant bacteriuria in urine. This rapid method of screening urine specimens for bacteriuria was evaluated by using 348 urine specimens submitted to the clinical microbiology laboratory at the University of Minnesota Hospitals for routine culture using the calibrated loop-streak plate method. There was 89.4% agreement between the culture method and the ATP assay, with 7.0% false positive and 27.0% false negative results from the ATP assay using 10^5 organisms/ml of urine or greater as positive for significant bacteriuria and less than 10^5 organisms/ml as negative for significant bacteriuria. PMID:767357
Clinical significance of pulmonary nodules detected on abdominal CT in pediatric patients.
Breen, Micheál; Zurakowski, David; Lee, Edward Y
2015-11-01
The clinical significance of a pulmonary nodule that is detected incidentally on CT studies in children is unknown. In addition, there is limited information regarding the management of incidentally detected pulmonary nodules discovered on abdominal CT studies in children. The purpose of this study was to investigate the clinical significance of incidental pulmonary nodules detected on abdominal CT studies in children. This was a retrospective study performed following institutional review board approval. Abdominal CT reports in patients younger than 18 years of age from July 2004 to June 2011 were reviewed for the terms "nodule," "nodular" or "mass" in reference to the lung bases. The study population included those pediatric patients in whom pulmonary nodules were initially detected on abdominal CT studies. The largest pulmonary nodules detected on CT studies were evaluated for their features (size, shape, margin, attenuation, location, and presence of calcification and cavitation). Follow-up CT studies and clinical records were reviewed for demographic information, history of underlying malignancies and the clinical outcome of the incidental pulmonary nodules. Comparison of malignant versus benign pulmonary nodules was performed with respect to the size of the nodule, imaging features on CT, and patient history of malignancy using the Student's t-test and Fisher exact test. Youden J-index in receiver operating characteristic (ROC) analysis was used to determine the optimal cut-off size for suggesting a high risk of malignancy of incidentally detected pulmonary nodules. Pulmonary nodules meeting inclusion criteria were detected in 62 (1.2%) of 5,234 patients. The mean age of patients with nodules was 11.2 years (range: 5 months-18 years). Thirty-one patients (50%) had follow-up CT studies and two of these patients (6%) were subsequently found to have malignant pulmonary nodules. Both of these patients had a history of malignancy. Of the remaining 31 patients
Kim, Jihoon; Grillo, Janice M; Ohno-Machado, Lucila
2011-01-01
Objective To determine whether statistical and machine-learning methods, when applied to electronic health record (EHR) access data, could help identify suspicious (ie, potentially inappropriate) access to EHRs. Methods From EHR access logs and other organizational data collected over a 2-month period, the authors extracted 26 features likely to be useful in detecting suspicious accesses. Selected events were marked as either suspicious or appropriate by privacy officers, and served as the gold standard set for model evaluation. The authors trained logistic regression (LR) and support vector machine (SVM) models on 10-fold cross-validation sets of 1291 labeled events. The authors evaluated the sensitivity of final models on an external set of 58 events that were identified as truly inappropriate and investigated independently from this study using standard operating procedures. Results The area under the receiver operating characteristic curve of the models on the whole data set of 1291 events was 0.91 for LR, and 0.95 for SVM. The sensitivity of the baseline model on this set was 0.8. When the final models were evaluated on the set of 58 investigated events, all of which were determined as truly inappropriate, the sensitivity was 0 for the baseline method, 0.76 for LR, and 0.79 for SVM. Limitations The LR and SVM models may not generalize because of interinstitutional differences in organizational structures, applications, and workflows. Nevertheless, our approach for constructing the models using statistical and machine-learning techniques can be generalized. An important limitation is the relatively small sample used for the training set due to the effort required for its construction. Conclusion The results suggest that statistical and machine-learning methods can play an important role in helping privacy officers detect suspicious accesses to EHRs. PMID:21672912
NASA Astrophysics Data System (ADS)
Wang, Jiahui; Li, Feng; Doi, Kunio; Li, Qiang
2009-11-01
Accurate detection of diffuse lung disease is an important step for computerized diagnosis and quantification of this disease. It is also a difficult clinical task for radiologists. We developed a computerized scheme to assist radiologists in the detection of diffuse lung disease in multi-detector computed tomography (CT). Two radiologists selected 31 normal and 37 abnormal CT scans with ground glass opacity, reticular, honeycombing and nodular disease patterns based on clinical reports. The abnormal cases in our database must contain at least an abnormal area with a severity of moderate or severe level that was subjectively rated by the radiologists. Because statistical texture features may lack the power to distinguish a nodular pattern from a normal pattern, the abnormal cases that contain only a nodular pattern were excluded. The areas that included specific abnormal patterns in the selected CT images were then delineated as reference standards by an expert chest radiologist. The lungs were first segmented in each slice by use of a thresholding technique, and then divided into contiguous volumes of interest (VOIs) with a 64 × 64 × 64 matrix size. For each VOI, we determined and employed statistical texture features, such as run-length and co-occurrence matrix features, to distinguish abnormal from normal lung parenchyma. In particular, we developed new run-length texture features with clear physical meanings to considerably improve the accuracy of our detection scheme. A quadratic classifier was employed for distinguishing between normal and abnormal VOIs by the use of a leave-one-case-out validation scheme. A rule-based criterion was employed to further determine whether a case was normal or abnormal. We investigated the impact of new and conventional texture features, VOI size and the dimensionality for regions of interest on detecting diffuse lung disease. When we employed new texture features for 3D VOIs of 64 × 64 × 64 voxels, our system achieved the
Automatic detection of significant and subtle arterial lesions from coronary CT angiography
NASA Astrophysics Data System (ADS)
Kang, Dongwoo; Slomka, Piotr; Nakazato, Ryo; Cheng, Victor Y.; Min, James K.; Li, Debiao; Berman, Daniel S.; Kuo, C.-C. Jay; Dey, Damini
2012-02-01
Visual analysis of three-dimensional (3D) Coronary Computed Tomography Angiography (CCTA) remains challenging due to the large number of image slices and the tortuous character of the vessels. We aimed to develop an accurate, automated algorithm for detection of significant and subtle coronary artery lesions compared to expert interpretation. Our knowledge-based automated algorithm consists of centerline extraction which also classifies 3 main coronary arteries and small branches in each main coronary artery, vessel linearization, lumen segmentation with scan-specific lumen attenuation ranges, and lesion location detection. Presence and location of lesions are identified using a multi-pass algorithm which considers expected or "normal" vessel tapering and luminal stenosis from the segmented vessel. Expected luminal diameter is derived from the scan by automated piecewise least squares line fitting over proximal and mid segments (67%) of the coronary artery, considering small branch locations. We applied this algorithm to 21 CCTA patient datasets, acquired with dual-source CT, where 7 datasets had 17 lesions with stenosis greater than or equal to 25%. The reference standard was provided by visual and quantitative identification of lesions with any ≥25% stenosis by an experienced expert reader. Our algorithm identified 16 out of the 17 lesions confirmed by the expert. There were 16 additional lesions detected (average 0.13/segment); 6 out of 16 of these were actual lesions with <25% stenosis. On a per-segment basis, sensitivity was 94%, specificity was 86% and accuracy was 87%. Our algorithm shows promising results in the high sensitivity detection and localization of significant and subtle CCTA arterial lesions.
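The step of comparing the measured lumen with an expected, tapering diameter can be sketched roughly as below. This is a deliberately simplified illustration, not the authors' multi-pass algorithm: it fits a single least-squares line (rather than a piecewise fit that accounts for branch locations) over the first 67% of a linearized vessel and flags positions whose diameter falls at least 25% below the fitted value. The function names and the synthetic diameter profile are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def flag_stenosis(diameters, fit_fraction=0.67, threshold=0.25):
    """Indices where the lumen is narrowed by at least `threshold`
    relative to the diameter expected from the fitted taper."""
    n_fit = int(len(diameters) * fit_fraction)
    slope, intercept = fit_line(list(range(n_fit)), diameters[:n_fit])
    flagged = []
    for i, observed in enumerate(diameters):
        expected = intercept + slope * i
        if 1.0 - observed / expected >= threshold:
            flagged.append(i)
    return flagged


# Synthetic linearized vessel: smooth taper from 4.0 mm with a focal
# 40% diameter reduction at positions 40-44 (illustration only).
vessel = [4.0 - 0.02 * i for i in range(100)]
for i in range(40, 45):
    vessel[i] *= 0.6
lesions = flag_stenosis(vessel)
print(lesions)
```

Because the narrowed samples are included in the fit, the expected diameter is pulled slightly downward; a more careful version would refit after excluding flagged positions, which is one plausible reason for using several passes.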
Look what else we found - clinically significant abnormalities detected during routine ROP screening
Jayadev, Chaitra; Vinekar, Anand; Bauer, Noel; Mangalesh, Shwetha; Mahendradas, Padmamalini; Kemmanu, Vasudha; Mallipatna, Ashwin; Shetty, Bhujang
2015-01-01
Purpose: The purpose of this study was to report the spectrum of anterior and posterior segment diagnoses in Asian Indian premature infants detected serendipitously during routine retinopathy of prematurity (ROP) screening during a 1 year period. Methods: A retrospective review of all Retcam (Clarity MSI, USA) imaging sessions during the year 2011 performed on infants born either <2001 g at birth and/or <34.1 weeks of gestation recruited for ROP screening was performed. All infants had a minimum of seven images at each session, which included the dilated anterior segment, disc, and macula center and the four quadrants using the 130° lens. Results: Of the 8954 imaging sessions of 1450 new infants recruited in 2011, there were 111 (7.66%) with a diagnosis other than ROP. Anterior segment diagnoses seen in 31 (27.9%) cases included clinically significant cataract, lid abnormalities, anophthalmos, microphthalmos, and corneal diseases. Posterior segment diagnoses in 80 (72.1%) cases included retinal hemorrhages, cherry red spots, and neonatal uveitis of infective etiologies. Of the 111 cases, 15 (13.5%) underwent surgical procedures and 24 (21.6%) underwent medical procedures; importantly, two eyes with retinoblastoma were detected which were managed timely. Conclusions: This study emphasizes the importance of ocular digital imaging in premature infants. Visually significant, potentially life-threatening, and even treatable conditions were detected serendipitously during routine ROP screening that may be missed or detected late otherwise. This pilot data may be used to advocate for a possible universal infant eye screening program using digital imaging. PMID:26139795
Significance of parametric spectral ratio methods in detection and recognition of whispered speech
NASA Astrophysics Data System (ADS)
Mathur, Arpit; Reddy, Shankar M.; Hegde, Rajesh M.
2012-12-01
In this article the significance of a new parametric spectral ratio method that can be used to detect whispered speech segments within normally phonated speech is described. Adaptation methods based on the maximum likelihood linear regression (MLLR) are then used to realize a mismatched train-test style speech recognition system. This proposed parametric spectral ratio method computes a ratio spectrum of the linear prediction (LP) and the minimum variance distortionless response (MVDR) methods. The smoothed ratio spectrum is then used to detect whispered segments of speech within neutral speech segments effectively. The proposed LP-MVDR ratio method exhibits robustness at different SNRs as indicated by the whisper diarization experiments conducted on the CHAINS and the cell phone whispered speech corpus. The proposed method also performs reasonably better than the conventional methods for whisper detection. In order to integrate the proposed whisper detection method into a conventional speech recognition engine with minimal changes, adaptation methods based on the MLLR are used herein. The hidden Markov models corresponding to neutral mode speech are adapted to the whispered mode speech data in the whispered regions as detected by the proposed ratio method. The performance of this method is first evaluated on whispered speech data from the CHAINS corpus. The second set of experiments is conducted on the cell phone corpus of whispered speech. This corpus is collected using a setup that is used commercially for handling public transactions. The proposed whisper speech recognition system exhibits reasonably better performance when compared to several conventional methods. The results shown indicate the possibility of a whispered speech recognition system for cell phone based transactions.
NASA Astrophysics Data System (ADS)
Greenberg, Ariela Caren
Differential item functioning (DIF) and differential distractor functioning (DDF) are methods used to screen for item bias (Camilli & Shepard, 1994; Penfield, 2008). Using an applied empirical example, this mixed-methods study examined the congruency and relationship of DIF and DDF methods in screening multiple-choice items. Data for Study I were drawn from item responses of 271 female and 236 male low-income children on a preschool science assessment. Item analyses employed a common statistical approach of the Mantel-Haenszel log-odds ratio (MH-LOR) to detect DIF in dichotomously scored items (Holland & Thayer, 1988), and extended the approach to identify DDF (Penfield, 2008). Findings demonstrated that using MH-LOR to detect DIF and DDF supported the theoretical relationship that the magnitude and form of DIF are dependent on the DDF effects, and demonstrated the advantages of studying DIF and DDF in multiple-choice items. A total of 4 items with DIF and DDF and 5 items with only DDF were detected. Study II incorporated an item content review, an important but often overlooked and under-published step of DIF and DDF studies (Camilli & Shepard, 1994). Interviews with 25 female and 22 male low-income preschool children and an expert review helped to interpret the DIF and DDF results and their comparison, and determined that a content review process of studied items can reveal reasons for potential item bias that are often congruent with the statistical results. Patterns emerged and are discussed in detail. The quantitative and qualitative analyses were conducted in an applied framework of examining the validity of the preschool science assessment scores for evaluating science programs serving low-income children; however, the techniques can be generalized for use with measures across various disciplines of research.
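For readers unfamiliar with the statistic, the Mantel-Haenszel log-odds ratio for a dichotomously scored item can be sketched as follows. Examinees are stratified by a matching variable (typically total score), each stratum contributes a 2 x 2 table of group by correct/incorrect, and the common odds ratio is pooled across strata. The function name and the counts are hypothetical and are not taken from the study; the DDF extension is not shown.

```python
import math


def mh_log_odds_ratio(strata):
    """Mantel-Haenszel log-odds ratio across score strata.

    Each stratum is a tuple (a, b, c, d):
      a = reference group correct,  b = reference group incorrect,
      c = focal group correct,      d = focal group incorrect.
    A value of 0 indicates no DIF; positive values favor the reference group.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty stratum carries no information
        num += a * d / n
        den += b * c / n
    return math.log(num / den)


# Hypothetical counts for one item in three total-score strata.
strata = [(10, 10, 5, 15), (20, 10, 10, 10), (15, 5, 12, 8)]
lor = mh_log_odds_ratio(strata)
print(f"MH log-odds ratio = {lor:.4f}")
```

Conditioning on the matching score is what separates DIF from a plain difference in group ability: groups are compared only among examinees of similar overall proficiency.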
Jha, Dilip Kumar; Vinithkumar, N V; Sahu, Biraja Kumar; Das, Apurba Kumar; Dheenan, P S; Venkateshwaran, P; Begum, Mehmuna; Ganesh, T; Prashanthi Devi, M; Kirubagaran, R
2014-08-15
Aerial Bay is one of the harbor towns of Andaman and Nicobar Islands, the union territory of India. Nevertheless, its marine environment is among the least studied, particularly with respect to physico-chemical assessment. Therefore, to evaluate the annual spatiotemporal variations of physico-chemical parameters, seawater samples collected from 20 sampling stations covering three seasons were analyzed. Multivariate statistics were applied to the investigated data in an attempt to understand the causes of variation in physico-chemical parameters. Cluster analysis distinguished mangrove and open sea stations from other areas by considering distinctive physico-chemical characteristics. Factor analysis explained 79.5% of the total variance in physico-chemical parameters. Strong loadings included transparency, TSS, DO, BOD, salinity, nitrate, nitrite, inorganic phosphate, total phosphorus and silicate. In addition, box-whisker plots and Geographical Information System based land use data further facilitated and supported multivariate results. Copyright © 2014 Elsevier Ltd. All rights reserved.
A Hybrid Swarm Intelligence Algorithm for Intrusion Detection Using Significant Features
Amudha, P.; Karthik, S.; Sivakumari, S.
2015-01-01
Intrusion detection has become a main part of network security due to the huge number of attacks that affect computers. This is due to the extensive growth of internet connectivity and accessibility to information systems worldwide. To deal with this problem, this paper proposes a hybrid algorithm that integrates Modified Artificial Bee Colony (MABC) with Enhanced Particle Swarm Optimization (EPSO) to address the intrusion detection problem. The algorithms are combined to obtain better optimization results, and the classification accuracies are obtained by the 10-fold cross-validation method. The purpose of this paper is to select the most relevant features that can represent the pattern of the network traffic and test their effect on the success of the proposed hybrid classification algorithm. To investigate the performance of the proposed method, the KDDCup'99 intrusion detection benchmark dataset from the UCI Machine Learning repository is used. The performance of the proposed method is compared with other machine learning algorithms and found to be significantly different. PMID:26221625
Towards spatial localisation of harmful algal blooms; statistics-based spatial anomaly detection
NASA Astrophysics Data System (ADS)
Shutler, J. D.; Grant, M. G.; Miller, P. I.
2005-10-01
Harmful algal blooms are believed to be increasing in occurrence and their toxins can be concentrated by filter-feeding shellfish and cause amnesia or paralysis when ingested. As a result fisheries and beaches in the vicinity of blooms may need to be closed and the local population informed. For this avoidance planning timely information on the existence of a bloom, its species and an accurate map of its extent would be prudent. Current research to detect these blooms from space has mainly concentrated on spectral approaches towards determining species. We present a novel statistics-based background-subtraction technique that produces improved descriptions of an anomaly's extent from remotely-sensed ocean colour data. This is achieved by extracting bulk information from a background model; this is complemented by a computer vision ramp filtering technique to specifically detect the perimeter of the anomaly. The complete extraction technique uses temporal-variance estimates which control the subtraction of the scene of interest from the time-weighted background estimate, producing confidence maps of anomaly extent. Through the variance estimates the method learns the associated noise present in the data sequence, providing robustness, and allowing generic application. Further, the use of the median for the background model reduces the effects of anomalies that appear within the time sequence used to generate it, allowing seasonal variations in the background levels to be closely followed. To illustrate the detection algorithm's application, it has been applied to two spectrally different oceanic regions.
A Parallel Finite Set Statistical Simulator for Multi-Target Detection and Tracking
NASA Astrophysics Data System (ADS)
Hussein, I.; MacMillan, R.
2014-09-01
Finite Set Statistics (FISST) is a powerful Bayesian inference tool for the joint detection, classification and tracking of multi-target environments. FISST is capable of handling phenomena such as clutter, misdetections, and target birth and decay. Implicit within the approach are solutions to the data association and target label-tracking problems. Finally, FISST provides generalized information measures that can be used for sensor allocation across different types of tasks such as: searching for new targets, and classification and tracking of known targets. These FISST capabilities have been demonstrated on several small-scale illustrative examples. However, for implementation in a large-scale system as in the Space Situational Awareness problem, these capabilities require a lot of computational power. In this paper, we implement FISST in a parallel environment for the joint detection and tracking of multi-target systems. In this implementation, false alarms and misdetections will be modeled. Target birth and decay will not be modeled in the present paper. We will demonstrate the success of the method for as many targets as we possibly can in a desktop parallel environment. Performance measures will include: number of targets in the simulation, certainty of detected target tracks, computational time as a function of clutter returns and number of targets, among other factors.
Nielsen, Mette J.; Kazankov, Konstantin; Leeming, Diana J.; Karsdal, Morten A.; Krag, Aleksander; Barrera, Francisco; McLeod, Duncan; George, Jacob; Grønbæk, Henning
2015-01-01
Background and Aim Detection of advanced fibrosis (Metavir F≥3) is important to identify patients with a high urgency of antiviral treatments vs. those whose treatment could be deferred (F≤2). The aim was to assess the diagnostic value of novel serological extracellular matrix protein fragments as potential biomarkers for clinically significant and advanced fibrosis. Methods Specific protein fragments of matrix metalloprotease degraded type I, III, IV and VI collagen (C1M, C3M, C4M, C6M) and type III and IV collagen formation (Pro-C3 and P4NP7S) were assessed in plasma from 403 chronic hepatitis C patients by specific ELISAs. Patients were stratified according to Metavir Fibrosis stage; F0 (n = 46), F1 (n = 161), F2 (n = 95), F3 (n = 44) and F4 (n = 33) based on liver biopsy. Results Pro-C3 was significantly elevated in patients with significant fibrosis (≥F2) compared to F0-F1 (p<0.05), while the markers C3M, C4M, C6M and P4NP7S were significantly elevated in patients with advanced fibrosis (≥F3) compared to F0-F2 (p<0.05). C1M showed no difference between fibrosis stages. Using Receiver Operating Characteristics analysis, the best marker for detecting ≥F2 and ≥F3 was Pro-C3 with AUC = 0.75 and AUC = 0.86. Combination of Pro-C3 and C4M with age, BMI and gender in a multiple ordered logistic regression model improved the diagnostic value for detecting ≥F2 and ≥F3 with AUC = 0.80 and AUC = 0.88. Conclusion The Pro-C3 protein fragment provided clinically relevant diagnostic accuracy as a single marker of liver fibrosis. A model combining Pro-C3 and C4M along with patient’s age, body mass index and gender increased the diagnostic power for identifying clinically significant fibrosis. PMID:26406331
Lim, Hyun Kyung; Park, Sung Tae
2016-01-01
Background and Purpose Incidental thyroid lesions are frequently found on contrast-enhanced magnetic resonance (CE-MR) angiography. The purpose of this study is to determine the prevalence of thyroid incidentalomas detected by CE-MR angiography and to evaluate their clinical significance by correlation with ultrasound (US) and cytopathological results. Materials and Methods We retrospectively reviewed 3,299 consecutive CE-MR angiography examinations performed at our institution between January 2010 and March 2013. Two radiologists evaluated the CE-MR angiography imaging in consensus regarding the presence, location, and vascularity of thyroid incidentaloma. We correlated these findings with follow-up US and cytopathologic results. Results The prevalence of thyroid incidentalomas detected by CE-MR angiography was 4.6% (152/3,299 patients). CE-MR angiography showed hypervascularity in 86.8% (145/167), isovascularity in 8.4% (14/167), and hypovascularity in 4.8% (8/167) of thyroid nodules compared to vascularity of thyroid parenchyma. Among the patients with thyroid incidentaloma, 34 patients (22.4%) were followed by US examination, and all 36 nodules on CE-MR angiography were detected on follow-up US. Of these nodules, 9 (25%) were classified as probably benign, 26 (72.2%) as indeterminate, and 1 (2.8%) as a suspicious malignant nodule. Among the 16 nodules with available cytopathologic results, 12 nodules were benign, 2 nodules were follicular neoplasm, and 2 nodules showed non-diagnostic results. Conclusion Incidental thyroid nodules were found in 4.6% of CE-MR angiography examinations. Because of the high incidence of indeterminate US features among thyroid incidentalomas, further evaluation with US should be performed when a thyroid incidentaloma is detected on CE-MR angiography. PMID:26919607
Lim, Hyun Kyung; Park, Sung Tae; Ha, Hongil; Choi, Seo-youn
2016-01-01
Incidental thyroid lesions are frequently found on contrast-enhanced magnetic resonance (CE-MR) angiography. The purpose of this study is to determine the prevalence of thyroid incidentalomas detected by CE-MR angiography and to evaluate their clinical significance by correlation with ultrasound (US) and cytopathological results. We retrospectively reviewed 3,299 consecutive CE-MR angiography examinations performed at our institution between January 2010 and March 2013. Two radiologists evaluated the CE-MR angiography imaging in consensus regarding the presence, location, and vascularity of thyroid incidentaloma. We correlated these findings with follow-up US and cytopathologic results. The prevalence of thyroid incidentalomas detected by CE-MR angiography was 4.6% (152/3,299 patients). CE-MR angiography showed hypervascularity in 86.8% (145/167), isovascularity in 8.4% (14/167), and hypovascularity in 4.8% (8/167) of thyroid nodules compared to vascularity of thyroid parenchyma. Among the patients with thyroid incidentaloma, 34 patients (22.4%) were followed by US examination, and all 36 nodules on CE-MR angiography were detected on follow-up US. Of these nodules, 9 (25%) were classified as probably benign, 26 (72.2%) as indeterminate, and 1 (2.8%) as a suspicious malignant nodule. Among the 16 nodules with available cytopathologic results, 12 nodules were benign, 2 nodules were follicular neoplasm, and 2 nodules showed non-diagnostic results. Incidental thyroid nodules were found in 4.6% of CE-MR angiography examinations. Because of the high incidence of indeterminate US features among thyroid incidentalomas, further evaluation with US should be performed when a thyroid incidentaloma is detected on CE-MR angiography.
Early snowmelt events: detection, distribution, and significance in a major sub-arctic watershed
NASA Astrophysics Data System (ADS)
Alese Semmens, Kathryn; Ramage, Joan; Bartsch, Annett; Liston, Glen E.
2013-03-01
High latitude drainage basins are experiencing higher average temperatures, earlier snowmelt onset in spring, and an increase in rain on snow (ROS) events in winter, trends that climate models project into the future. Snowmelt-dominated basins are most sensitive to winter temperature increases that influence the frequency of ROS events and the timing and duration of snowmelt, resulting in changes to spring runoff. Of specific interest in this study are early melt events that occur in late winter preceding melt onset in the spring. The study focuses on satellite determination and characterization of these early melt events using the Yukon River Basin (Canada/USA) as a test domain. The timing of these events was estimated using data from passive (Advanced Microwave Scanning Radiometer—EOS (AMSR-E)) and active (SeaWinds on Quick Scatterometer (QuikSCAT)) microwave remote sensors, employing detection algorithms for brightness temperature (AMSR-E) and radar backscatter (QuikSCAT). The satellite detected events were validated with ground station meteorological and hydrological data, and the spatial and temporal variability of the events across the entire river basin was characterized. Possible causative factors for the detected events, including ROS, fog, and positive air temperatures, were determined by comparing the timing of the events to parameters from SnowModel and National Centers for Environmental Prediction North American Regional Reanalysis (NARR) outputs, and weather station data. All melt events coincided with above freezing temperatures, while a limited number corresponded to ROS (determined from SnowModel and ground data) and a majority to fog occurrence (determined from NARR). The results underscore the significant influence that warm air intrusions have on melt in some areas and demonstrate the large temporal and spatial variability over years and regions. The study provides a method for melt detection and a baseline from which to assess future change.
Wagner, Tyler; Irwin, Brian J.; Bence, James R.; Hayes, Daniel B.
2016-01-01
Monitoring to detect temporal trends in biological and habitat indices is a critical component of fisheries management. Thus, it is important that management objectives are linked to monitoring objectives. This linkage requires a definition of what constitutes a management-relevant “temporal trend.” It is also important to develop expectations for the amount of time required to detect a trend (i.e., statistical power) and for choosing an appropriate statistical model for analysis. We provide an overview of temporal trends commonly encountered in fisheries management, review published studies that evaluated statistical power of long-term trend detection, and illustrate dynamic linear models in a Bayesian context, as an additional analytical approach focused on shorter term change. We show that monitoring programs generally have low statistical power for detecting linear temporal trends and argue that often management should be focused on different definitions of trends, some of which can be better addressed by alternative analytical approaches.
Falagas, Matthew E; Kouranos, Vasilios D; Michalopoulos, Argyris; Rodopoulou, Sophia P; Athanasoulia, Anastasia P; Karageorgopoulos, Drosos E
2010-02-15
Comparative cohort studies are often conducted to identify novel therapeutic strategies or prognostic factors for ventilator-associated pneumonia (VAP). We aimed to evaluate the power of such studies to provide clinically and statistically significant conclusions with regard to mortality differences. We searched in PubMed and Scopus for comparative cohort studies that evaluated mortality in patients with VAP. We calculated the central estimates and corresponding 95% confidence intervals (CIs) for mortality differences between compared patient groups. We also calculated the statistical power of the included studies to detect a difference in mortality that corresponds to a risk ratio of 0.80. We identified 39 (20 prospective) comparative cohort studies on VAP as eligible for inclusion in this analysis. The median absolute risk difference in mortality between compared groups was 10% (interquartile range [IQR], 5%-18%), and the median width of the 95% CI of the absolute risk difference in mortality was 34% (IQR, 28%-42.5%). The median power of the included studies to detect a risk ratio for mortality of 0.80 was 14.7% (IQR, 10.6%-21.8%). There is considerable uncertainty around the central estimate of comparative cohort studies on VAP with regard to mortality differences. For a wiser use of resources allocated to research, we emphasize the need to conduct cohort studies with larger sample size so that potential differences between the compared groups are more likely to be shown.
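The power calculation described in the abstract above can be approximated with a short sketch. It uses a two-sided normal approximation for the difference of two proportions with equal group sizes; the abstract does not say which formula the authors used, so the method, the function name, the baseline mortality, and the group sizes below are all assumptions.

```python
from statistics import NormalDist

def power_two_proportions(p_control, risk_ratio, n_per_group, alpha=0.05):
    """Approximate power to detect a given risk ratio between two equal groups.

    Normal approximation with an unpooled standard error; the negligible
    contribution of the opposite rejection tail is ignored.
    """
    p_treated = p_control * risk_ratio
    variance = (p_control * (1 - p_control) + p_treated * (1 - p_treated)) / n_per_group
    z_effect = abs(p_control - p_treated) / variance ** 0.5
    z_critical = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(z_effect - z_critical)

# Hypothetical scenario: 30% baseline mortality, risk ratio 0.80.
power_small = power_two_proportions(0.30, 0.80, n_per_group=100)
power_large = power_two_proportions(0.30, 0.80, n_per_group=1000)
```

Under these assumed inputs, the small cohort has low power while the large one does not, which matches the abstract's argument for larger sample sizes.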
The functional significance of calcification of coronary arteries as detected on CT.
Timins, M E; Pinsk, R; Sider, L; Bear, G
1991-12-01
We evaluated the coronary arteries on computed tomography (CT) scans of the chest and on coronary angiograms of 27 patients who underwent both studies. We related the presence or absence of coronary artery calcification on CT to percentage stenosis on angiogram. For the left anterior descending artery (LAD), the likelihood of calcification rose proportionately with degree of stenosis; this was less true for the circumflex, and not true for the right coronary artery (RCA). The sensitivity of CT in detecting coronary artery calcification in patients with angiographic criteria of significant coronary artery disease (CAD) was 78% for the LAD, 63% for the circumflex, and 16% for the RCA. Specificities were 78%, 80%, and 100%, and positive predictive values were 88%, 83%, and 100%. The high positive predictive values suggest that coronary artery calcification diagnosed by chest CT has a high correlation with clinically significant CAD. Therefore, when we detect such calcification in a patient without documented heart disease, we suggest that a cardiac workup is indicated.
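The sensitivity, specificity, and positive predictive value quoted in the abstract above follow from a 2x2 table of CT findings against angiography. The abstract does not report the underlying counts, so the counts in this sketch are hypothetical, chosen only to be roughly in line with the percentages reported for the LAD; the function name is also an assumption.

```python
def diagnostic_metrics(true_pos, false_neg, true_neg, false_pos):
    """Return (sensitivity, specificity, positive predictive value)."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    ppv = true_pos / (true_pos + false_pos)
    return sensitivity, specificity, ppv

# Hypothetical counts, not taken from the paper.
sens, spec, ppv = diagnostic_metrics(true_pos=14, false_neg=4, true_neg=7, false_pos=2)
```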
Hardingham, Jennifer E; Grover, Phulwinder; Winter, Marnie; Hewett, Peter J; Price, Timothy J; Thierry, Benjamin
2015-01-01
Circulating tumor cells (CTC) may be defined as tumor- or metastasis-derived cells that are present in the bloodstream. The CTC pool in colorectal cancer (CRC) patients may include not only epithelial tumor cells, but also tumor cells undergoing epithelial–mesenchymal transition (EMT) and tumor stem cells. A significant number of patients diagnosed with early stage CRC subsequently relapse with recurrent or metastatic disease despite undergoing “curative” resection of their primary tumor. This suggests that an occult metastatic disease process was already underway, with viable tumor cells being shed from the primary tumor site, at least some of which have proliferative and metastatic potential and the ability to survive in the bloodstream. Such tumor cells are considered to be responsible for disease relapse in these patients. Their detection in peripheral blood at the time of diagnosis or after resection of the primary tumor may identify those early-stage patients who are at risk of developing recurrent or metastatic disease and who would benefit from adjuvant therapy. CTC may also be a useful adjunct to radiological assessment of tumor response to therapy. Over the last 20 years many approaches have been developed for the isolation and characterization of CTC. However, none of these methods can be considered the gold standard for detection of the entire pool of CTC. Recently our group has developed novel unbiased inertial microfluidics to enrich for CTC, followed by identification of CTC by imaging flow cytometry. Here, we provide a review of progress on CTC detection and clinical significance over the last 20 years. PMID:26605644
DETECTION OF SIGNIFICANT VARIATION IN ACOUSTIC OUTPUT OF AN ELECTROMAGNETIC LITHOTRIPTER
Pishchalnikov, Yuri A.; McAteer, James A.; VonDerHaar, R. Jason; Pishchalnikova, Irina V.; Williams, James C.; Evan, Andrew P.
2008-01-01
Purpose Here we describe observation of significant instability in the output of an electromagnetic lithotripter, instability of a form that was not detected by routine methods of assessment, but was observed only by collecting many consecutive shock waves in non-stop regime. Materials and Methods A Dornier DoLi-50 lithotripter used exclusively for basic research was tested and approved by the regional technician. This assessment included hydrophone measures at selected power levels, with collection of about 25 shock waves per setting. Subsequent laboratory characterization employed a fiber optic hydrophone (FOPH-500) and a storage oscilloscope for data acquisition. Waveforms were collected non-stop for hundreds of pulses. Results Output was typically stable for >1000 shock waves, but substantial fluctuations in acoustic pressures were also observed. For example, output at power level 3 (P+ normally 44 ±2 MPa) increased dramatically (P+ >50 MPa) or dropped significantly (P+ ~30 MPa) for hundreds of shock waves. The cause of the instability was eventually traced to a faulty power supply of the lithotripter. Conclusions Instability in acoustic output of a lithotripter can occur and not be detected by routine methods of assessment. Collection of waveforms in non-stop regime dramatically increases the sampling size, improving the detection of instability. Had the instability we observed occurred during patient treatment, the energy delivered may well have exceeded the planned dose. Since the potential for adverse effects in lithotripsy increases as dose is increased, it would be valuable to develop ways to better monitor the acoustic output of lithotripters. PMID:17070315
Yokoyama, Shozo; Takenaka, Naomi
2005-04-01
Red-green color vision is strongly suspected to enhance the survival of its possessors. Despite being red-green color blind, however, many species have successfully competed in nature, which brings into question the evolutionary advantage of achieving red-green color vision. Here, we propose a new method of identifying positive selection at individual amino acid sites with the premise that if positive Darwinian selection has driven the evolution of the protein under consideration, then it should be found mostly at the branches in the phylogenetic tree where its function had changed. The statistical and molecular methods have been applied to 29 visual pigments with the wavelengths of maximal absorption at approximately 510-540 nm (green- or middle wavelength-sensitive [MWS] pigments) and at approximately 560 nm (red- or long wavelength-sensitive [LWS] pigments), which are sampled from a diverse range of vertebrate species. The results show that the MWS pigments are positively selected through amino acid replacements S180A, Y277F, and T285A and that the LWS pigments have been subjected to strong evolutionary conservation. The fact that these positively selected M/LWS pigments are found not only in animals with red-green color vision but also in those with red-green color blindness strongly suggests that both red-green color vision and color blindness have undergone adaptive evolution independently in different species.
Du, Fei; Li, Yibo; Jin, Shijiu
2015-01-01
An accurate performance analysis on the MDL criterion for source enumeration in array processing is presented in this paper. The enumeration results of MDL can be predicted precisely by the proposed procedure via the statistical analysis of the sample eigenvalues, whose distributive properties are investigated with the consideration of their interactions. A novel approach is also developed for the performance evaluation when the source number is underestimated by a number greater than one, which is denoted as “multiple-missed detection”, and the probability of a specific underestimated source number can be estimated by ratio distribution analysis. Simulation results are included to demonstrate the superiority of the presented method over available results and confirm the ability of the proposed approach to perform multiple-missed detection analysis. PMID:26295232
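For orientation, the MDL criterion analyzed in the abstract above is commonly written in the Wax-Kailath form, which scores each candidate source number from the sample eigenvalues of the array covariance matrix. The sketch below assumes that standard form; the function name and the eigenvalues are illustrative and are not taken from the paper.

```python
import math

def mdl_source_count(eigenvalues, n_snapshots):
    """Estimate the number of sources by minimizing the MDL score.

    Assumes the Wax-Kailath form: for each candidate k, compare the
    geometric and arithmetic means of the p - k smallest eigenvalues
    and add a complexity penalty of 0.5 * k * (2p - k) * log(N).
    """
    lam = sorted(eigenvalues, reverse=True)
    p = len(lam)
    best_k, best_score = 0, math.inf
    for k in range(p):
        tail = lam[k:]
        m = p - k
        log_geometric = sum(math.log(x) for x in tail) / m
        log_arithmetic = math.log(sum(tail) / m)
        fit = -n_snapshots * m * (log_geometric - log_arithmetic)
        penalty = 0.5 * k * (2 * p - k) * math.log(n_snapshots)
        score = fit + penalty
        if score < best_score:
            best_k, best_score = k, score
    return best_k

# Hypothetical sample eigenvalues: two strong sources over a noise floor near 1.
estimated_sources = mdl_source_count([10.0, 8.0, 1.05, 1.0, 0.95, 1.02], n_snapshots=200)
```

Underestimation, the "missed detection" case studied in the abstract, corresponds to this minimum landing at a k below the true source number.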
SU-E-T-207: Flatness and Symmetry Threshold Detection Using Statistical Process Control.
Able, C; Hampton, C; Baydush, A
2012-06-01
AAPM TG-142 guidelines state that beam uniformity (flatness and symmetry) should maintain a constancy of 1% relative to baseline. The focus of this study is to determine if statistical process control (SPC) methodology using process control charts (PCC) of steering coil currents (SCC) can detect changes in beam uniformity prior to exceeding the 1% constancy criteria. SCCs for the transverse and radial planes are adjusted such that a reproducibly useful photon or electron beam is available. Transverse and radial position and angle SCCs are routinely documented in the Morning Check file during daily warm-up. The 6 MV beam values for our linac were analyzed using average and range (Xbar/R) PCC. Using these data as a baseline, an experiment was performed in which each SCC was changed from its mean value (steps of 0.01 or 0.02 Ampere) while holding the other SCCs constant. The effect on beam uniformity was measured using a beam scanning system. These experimental SCC values were plotted in the PCC to determine if they would exceed the predetermined limits. The change in SCC required to exceed the 1% constancy criteria was detected by the PCC for 3 out of the 4 steering coils. The reliability of the result in the one coil not detected (transverse position coil) is questionable because the SCC slowly drifted during the experiment (0.05 A) regardless of the servo control setting. Xbar/R charts of SCC can detect exceptional variation prior to exceeding the beam uniformity criteria set forth in AAPM TG-142. The high level of PCC sensitivity to change may result in an alarm when in fact minimal change in beam uniformity has occurred. Further study is needed to determine if a combination of individual SCC alarms would reduce the false positive rate for beam uniformity intervention. This project was supported by a grant from Varian Medical Systems, Inc. © 2012 American Association of Physicists in Medicine.
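The Xbar/R charting mentioned in the abstract above can be sketched as follows. The control-limit constants are the standard Shewhart values for the listed subgroup sizes; the subgroup size, the current readings, and the function names are hypothetical and do not come from the study.

```python
# Standard Shewhart constants (A2, D3, D4) indexed by subgroup size.
SHEWHART_CONSTANTS = {2: (1.880, 0.0, 3.267), 3: (1.023, 0.0, 2.574),
                      4: (0.729, 0.0, 2.282), 5: (0.577, 0.0, 2.114)}

def xbar_r_limits(baseline_subgroups):
    """Compute Xbar and R chart limits from equally sized baseline subgroups."""
    size = len(baseline_subgroups[0])
    a2, d3, d4 = SHEWHART_CONSTANTS[size]
    means = [sum(g) / size for g in baseline_subgroups]
    ranges = [max(g) - min(g) for g in baseline_subgroups]
    grand_mean = sum(means) / len(means)
    mean_range = sum(ranges) / len(ranges)
    return {"xbar": (grand_mean - a2 * mean_range, grand_mean + a2 * mean_range),
            "range": (d3 * mean_range, d4 * mean_range)}

def out_of_control(subgroup, limits):
    """Flag a subgroup whose mean or range falls outside the chart limits."""
    mean = sum(subgroup) / len(subgroup)
    spread = max(subgroup) - min(subgroup)
    x_lo, x_hi = limits["xbar"]
    r_lo, r_hi = limits["range"]
    return not (x_lo <= mean <= x_hi) or not (r_lo <= spread <= r_hi)

# Hypothetical steering coil currents in amperes, five readings per subgroup.
baseline = [[1.00, 1.01, 0.99, 1.00, 1.00], [1.01, 1.00, 1.00, 0.99, 1.00],
            [0.99, 1.00, 1.01, 1.00, 1.00], [1.00, 1.00, 0.99, 1.01, 1.00]]
limits = xbar_r_limits(baseline)
shifted_flag = out_of_control([1.02, 1.03, 1.02, 1.02, 1.03], limits)
stable_flag = out_of_control([1.00, 1.00, 1.01, 0.99, 1.00], limits)
```

Because the limits scale with the baseline range, a very tight baseline makes the chart sensitive to small shifts, which is the false-alarm concern the abstract raises.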
Structural damage detection using extended Kalman filter combined with statistical process control
NASA Astrophysics Data System (ADS)
Jin, Chenhao; Jang, Shinae; Sun, Xiaorong
2015-04-01
Traditional modal-based methods, which identify damage based upon changes in vibration characteristics of the structure on a global basis, have received considerable attention in the past decades. However, the effectiveness of the modal-based methods is dependent on the type of damage and the accuracy of the structural model, and these methods may also have difficulties when applied to complex structures. The extended Kalman filter (EKF) algorithm, which has the capability to estimate parameters and catch abrupt changes, is currently used in continuous and automatic structural damage detection to overcome disadvantages of traditional methods. Structural parameters are typically slow-changing variables under effects of operational and environmental conditions, so it would be difficult to observe the structural damage and quantify the damage in real time with EKF only. In this paper, Statistical Process Control (SPC) is combined with the EKF method in order to overcome this difficulty. Based on historical measurements of damage-sensitive features involved in the state-space dynamic models, the EKF algorithm is used to produce real-time estimations of these features as well as standard deviations, which can then be used to form control ranges for SPC to detect any abnormality of the selected features. Moreover, confidence levels of the detection can be adjusted by choosing different multiples of sigma and numbers of adjacent out-of-range points. The proposed method is tested using simulated data of a three-story linear building in different damage scenarios, and numerical results demonstrate high damage detection accuracy and light computation of the presented method.
Wang, Shijun; Yao, Jianhua; Petrick, Nicholas; Summers, Ronald M.
2010-01-01
Colon cancer is the second leading cause of cancer-related deaths in the United States. Computed tomographic colonography (CTC) combined with a computer aided detection system provides a feasible approach for improving colonic polyps detection and increasing the use of CTC for colon cancer screening. To distinguish true polyps from false positives, various features extracted from polyp candidates have been proposed. Most of these traditional features try to capture the shape information of polyp candidates or neighborhood knowledge about the surrounding structures (fold, colon wall, etc.). In this paper, we propose a new set of shape descriptors for polyp candidates based on statistical curvature information. These features called histograms of curvature features are rotation, translation and scale invariant and can be treated as complementing existing feature set. Then in order to make full use of the traditional geometric features (defined as group A) and the new statistical features (group B) which are highly heterogeneous, we employed a multiple kernel learning method based on semi-definite programming to learn an optimized classification kernel from the two groups of features. We conducted leave-one-patient-out test on a CTC dataset which contained scans from 66 patients. Experimental results show that a support vector machine (SVM) based on the combined feature set and the semi-definite optimization kernel achieved higher FROC performance compared to SVMs using the two groups of features separately. At a false positive per scan rate of 5, the sensitivity of the SVM using the combined features improved from 0.77 (Group A) and 0.73 (Group B) to 0.83 (p ≤ 0.01). PMID:20953299
NASA Astrophysics Data System (ADS)
Hurtado, Miguel A.
In this work, we consider the application of classical statistical inference to the fusion of data from different sensing technologies for object detection applications in order to increase the overall performance for a given active safety automotive system. Research evolved mainly around a centralized sensor fusion architecture assuming that three non-identical sensors, modeled by corresponding probability density functions (pdfs), provide discrete information of target being present or absent with associated probabilities of detection and false alarm for the sensor fusion engine. The underlying sensing technologies are the following standard automotive sensors: 24.5 GHz radar, high dynamic range infrared camera and a laser-radar. A complete mathematical framework was developed to select the optimal decision rule based on a generalized multinomial distribution resulting from a sum of weighted Bernoulli random variables from the Neyman-Pearson lemma and the likelihood ratio test. Moreover, to better understand the model and to obtain upper bounds on the performance of the fusion rules, we assumed exponential pdfs for each sensor and a parallel mathematical expression was obtained based on a generalized gamma distribution resulting from a sum of weighted exponential random variables for the situation when the continuous random vector of information is available. Mathematical expressions and results were obtained for modeling the following case scenarios: (i) non-identical sensors, (ii) identical sensors, (iii) combination of nonidentical and identical sensors, (iv) faulty sensor operation, (v) dominant sensor operation, (vi) negative sensor operation, and (vii) distributed sensor fusion. The second and final part of this research focused on: (a) simulation of statistical models for each sensing technology, (b) comparisons with distributed fusion, (c) overview of dynamic sensor fusion and adaptive decision rules.
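To make the likelihood ratio test for binary sensor decisions concrete, the sketch below fuses present/absent reports from three sensors. It assumes the sensors are conditionally independent, which the abstract does not state, and the detection and false alarm probabilities, the threshold, and the function names are hypothetical. Under that assumption the log-likelihood ratio is a weighted sum of the Bernoulli decisions plus a constant.

```python
import math

def fusion_log_likelihood_ratio(decisions, p_detect, p_false_alarm):
    """Log-likelihood ratio of 'target present' vs 'target absent'.

    decisions[i] is 1 if sensor i reports a target, else 0. Sensors are
    assumed conditionally independent (an assumption of this sketch).
    """
    total = 0.0
    for u, pd, pf in zip(decisions, p_detect, p_false_alarm):
        if u:
            total += math.log(pd / pf)
        else:
            total += math.log((1 - pd) / (1 - pf))
    return total

def fused_decision(decisions, p_detect, p_false_alarm, threshold=0.0):
    """Declare a target when the log-likelihood ratio reaches the threshold."""
    return fusion_log_likelihood_ratio(decisions, p_detect, p_false_alarm) >= threshold

# Hypothetical operating points for radar, infrared camera, and laser-radar.
P_DETECT = [0.90, 0.80, 0.70]
P_FALSE_ALARM = [0.05, 0.10, 0.20]
all_report = fused_decision([1, 1, 1], P_DETECT, P_FALSE_ALARM)
none_report = fused_decision([0, 0, 0], P_DETECT, P_FALSE_ALARM)
```

In a Neyman-Pearson setting the threshold would be chosen to meet a target false alarm probability rather than fixed at zero as in this sketch.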
Frome, EL
2005-09-20
Environmental exposure measurements are, in general, positive and may be subject to left censoring; i.e., the measured value is less than a "detection limit". In occupational monitoring, strategies for assessing workplace exposures typically focus on the mean exposure level or the probability that any measurement exceeds a limit. Parametric methods used to determine acceptable levels of exposure are often based on a two-parameter lognormal distribution. The mean exposure level, an upper percentile, and the exceedance fraction are used to characterize exposure levels, and confidence limits are used to describe the uncertainty in these estimates. Statistical methods for random samples (without non-detects) from the lognormal distribution are well known for each of these situations. In this report, methods for estimating these quantities based on the maximum likelihood method for randomly left censored lognormal data are described and graphical methods are used to evaluate the lognormal assumption. If the lognormal model is in doubt and an alternative distribution for the exposure profile of a similar exposure group is not available, then nonparametric methods for left censored data are used. The mean exposure level, along with the upper confidence limit, is obtained using the product limit estimate, and the upper confidence limit on an upper percentile (i.e., the upper tolerance limit) is obtained using a nonparametric approach. All of these methods are well known, but computational complexity has limited their use in routine data analysis with left censored data. The recent development of the R environment for statistical data analysis and graphics has greatly enhanced the availability of high-quality nonproprietary (open source) software that serves as the basis for implementing the methods in this paper.
NASA Astrophysics Data System (ADS)
Hébert-Dufresne, Laurent; Grochow, Joshua A.; Allard, Antoine
2016-08-01
We introduce a network statistic that measures structural properties at the micro-, meso-, and macroscopic scales, while still being easy to compute and interpretable at a glance. Our statistic, the onion spectrum, is based on the onion decomposition, which refines the k-core decomposition, a standard network fingerprinting method. The onion spectrum is exactly as easy to compute as the k-cores: It is based on the stages at which each vertex gets removed from a graph in the standard algorithm for computing the k-cores. Yet, the onion spectrum reveals much more information about a network, and at multiple scales; for example, it can be used to quantify node heterogeneity, degree correlations, centrality, and tree- or lattice-likeness. Furthermore, unlike the k-core decomposition, the combined degree-onion spectrum immediately gives a clear local picture of the network around each node which allows the detection of interesting subgraphs whose topological structure differs from the global network organization. This local description can also be leveraged to easily generate samples from the ensemble of networks with a given joint degree-onion distribution. We demonstrate the utility of the onion spectrum for understanding both static and dynamic properties on several standard graph models and on many real-world networks.
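The peeling procedure that the abstract alludes to can be sketched in a few lines. This is a minimal illustration that assumes an undirected simple graph given as an adjacency dictionary; the function names are illustrative and the example graph is made up. Each pass removes every vertex whose remaining degree does not exceed the current core value, and the pass index is recorded as that vertex's onion layer.

```python
from collections import Counter

def onion_decomposition(adjacency):
    """Return (coreness, layer) dictionaries for an undirected simple graph.
    adjacency maps each vertex to the set of its neighbours."""
    degree = {v: len(nbrs) for v, nbrs in adjacency.items()}
    remaining = set(adjacency)
    coreness, layer = {}, {}
    core, current_layer = 0, 1
    while remaining:
        # Move to a higher core only when no low-degree vertex is left.
        core = max(core, min(degree[v] for v in remaining))
        peel = [v for v in remaining if degree[v] <= core]
        for v in peel:
            coreness[v] = core
            layer[v] = current_layer
            remaining.discard(v)
        # Update the degrees of surviving neighbours after the whole pass.
        for v in peel:
            for w in adjacency[v]:
                if w in remaining:
                    degree[w] -= 1
        current_layer += 1
    return coreness, layer

def onion_spectrum(layer):
    """Fraction of vertices found in each layer."""
    counts = Counter(layer.values())
    n = len(layer)
    return {k: counts[k] / n for k in sorted(counts)}

# Made-up example: a triangle (0, 1, 2) with a two-vertex tail (3, 4).
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4}, 4: {3}}
coreness, layer = onion_decomposition(graph)
```

In this example the tail is peeled one vertex per pass before the triangle is removed as a single layer, which is the extra within-core information that the k-core decomposition alone does not retain.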
The score statistic of the LD-lod analysis: detecting linkage adaptive to linkage disequilibrium.
Huang, J; Jiang, Y
2001-01-01
We study the properties of a modified lod score method for testing linkage that incorporates linkage disequilibrium (LD-lod). By examination of its score statistic, we show that the LD-lod score method adaptively combines two sources of information: (a) the IBD sharing score which is informative for linkage regardless of the existence of LD and (b) the contrast between allele-specific IBD sharing scores which is informative for linkage only in the presence of LD. We also consider the connection between the LD-lod score method and the transmission-disequilibrium test (TDT) for triad data and the mean test for affected sib pair (ASP) data. We show that, for triad data, the recessive LD-lod test is asymptotically equivalent to the TDT; and for ASP data, it is an adaptive combination of the TDT and the ASP mean test. We demonstrate that the LD-lod score method has relatively good statistical efficiency in comparison with the ASP mean test and the TDT for a broad range of LD and the genetic models considered in this report. Therefore, the LD-lod score method is an interesting approach for detecting linkage when the extent of LD is unknown, such as in a genome-wide screen with a dense set of genetic markers. Copyright 2001 S. Karger AG, Basel
Hébert-Dufresne, Laurent; Grochow, Joshua A.; Allard, Antoine
2016-01-01
We introduce a network statistic that measures structural properties at the micro-, meso-, and macroscopic scales, while still being easy to compute and interpretable at a glance. Our statistic, the onion spectrum, is based on the onion decomposition, which refines the k-core decomposition, a standard network fingerprinting method. The onion spectrum is exactly as easy to compute as the k-cores: It is based on the stages at which each vertex gets removed from a graph in the standard algorithm for computing the k-cores. Yet, the onion spectrum reveals much more information about a network, and at multiple scales; for example, it can be used to quantify node heterogeneity, degree correlations, centrality, and tree- or lattice-likeness. Furthermore, unlike the k-core decomposition, the combined degree-onion spectrum immediately gives a clear local picture of the network around each node which allows the detection of interesting subgraphs whose topological structure differs from the global network organization. This local description can also be leveraged to easily generate samples from the ensemble of networks with a given joint degree-onion distribution. We demonstrate the utility of the onion spectrum for understanding both static and dynamic properties on several standard graph models and on many real-world networks. PMID:27535466
Application of statistical pattern classification methods for damage detection to field data
NASA Astrophysics Data System (ADS)
Cabrera, Carlos; Cheung, Allen; Sarabandi, Pooya; Nair, K. Krishnan; Kiremidjian, Anne
2007-04-01
The field of Structural Health Monitoring (SHM) has received considerable attention for its potential applications to monitoring civil infrastructure. However, the damage detection algorithms that form the backbone of these systems have primarily been tested on simulated data instead of full-scale structures because of the scarcity of real structural acceleration data. In response to this deficiency in testing, we present the performance of two damage detection algorithms used with ambient acceleration data collected during the staged demolition of the full-scale Z24 Bridge in Switzerland. The algorithms use autoregressive coefficients as features of the acceleration data and hypothesis testing and Gaussian Mixture Modeling to detect and quantify damage. While experimental or numerically simulated data have provided consistently positive results, field data from a real structure, the Z24 Bridge, show that there can be significant false positives in the predictions. Difficulties with data collection in the field are also revealed, pointing to the need for careful signal conditioning prior to algorithm application.
Statistical modeling for sensitive detection of low-frequency single nucleotide variants.
Hao, Yangyang; Zhang, Pengyue; Xuei, Xiaoling; Nakshatri, Harikrishna; Edenberg, Howard J; Li, Lang; Liu, Yunlong
2016-08-22
Sensitive detection of low-frequency single nucleotide variants carries great significance in many applications. In cancer genetics research, tumor biopsies are a mixture of normal and tumor cells from various subpopulations due to tumor heterogeneity. Thus the frequencies of somatic variants from a subpopulation tend to be low. Liquid biopsies, which monitor circulating tumor DNA in blood to detect metastatic potential, also face the challenge of detecting low-frequency variants due to the small percentage of the circulating tumor DNA in blood. Moreover, in population genetics research, although pooled sequencing of a large number of individuals is cost-effective, pooling dilutes the signals of variants from any individual. Detection of low-frequency variants is difficult and can be confounded by sequencing artifacts. Existing methods are limited in sensitivity and mainly focus on frequencies around 2 % to 5 %; most fail to consider differential sequencing artifacts. We aimed to push down the frequency detection limit close to the position-specific sequencing error rates by modeling the observed erroneous read counts with respect to genomic sequence contexts. Four distributions suitable for count data modeling (using generalized linear models) were extensively characterized in terms of their goodness-of-fit as well as their performance on real sequencing data benchmarks, which were specifically designed for testing detection of low-frequency variants; two sequencing technologies with significantly different chemistry mechanisms were used to explore systematic errors. We found that the zero-inflated negative binomial generalized linear model is superior to the other models tested, and the advantage is most evident in the 0.5 % to 1 % range. This method is also generalizable to different sequencing technologies. Under standard sequencing protocols and depth given in the testing benchmarks, 95.3 % recall and 79.9 % precision for Ion Proton data, 95.6 % recall
NASA Astrophysics Data System (ADS)
Flach, Milan; Mahecha, Miguel; Gans, Fabian; Rodner, Erik; Bodesheim, Paul; Guanche-Garcia, Yanira; Brenning, Alexander; Denzler, Joachim; Reichstein, Markus
2016-04-01
The number of available Earth observations (EOs) is currently substantially increasing. Detecting anomalous patterns in these multivariate time series is an important step in identifying changes in the underlying dynamical system. Likewise, data quality issues might result in anomalous multivariate data constellations and have to be identified before corrupting subsequent analyses. In industrial applications, a common strategy is to monitor production chains with several sensors coupled to some statistical process control (SPC) algorithm. The basic idea is to raise an alarm when these sensor data depict some anomalous pattern according to the SPC, i.e. the production chain is considered 'out of control'. In fact, the industrial applications are conceptually similar to the on-line monitoring of EOs. However, algorithms used in the context of SPC or process monitoring are rarely considered for supervising multivariate spatio-temporal Earth observations. The objective of this study is to exploit the potential and transferability of SPC concepts to Earth system applications. We compare a range of different algorithms typically applied by SPC systems and evaluate their capability to detect, e.g., known extreme events in land surface processes. Specifically, two main issues are addressed: (1) identifying the most suitable combination of data pre-processing and detection algorithm for a specific type of event and (2) analyzing the limits of the individual approaches with respect to the magnitude and spatio-temporal size of the event as well as the data's signal-to-noise ratio. Extensive artificial data sets that represent the typical properties of Earth observations are used in this study. Our results show that the majority of the algorithms used can be considered for the detection of multivariate spatiotemporal events and directly transferred to real Earth observation data as currently assembled in different projects at the European scale, e.g. http://baci-h2020.eu
Statistics, Handle with Care: Detecting Multiple Model Components with the Likelihood Ratio Test
NASA Astrophysics Data System (ADS)
Protassov, Rostislav; van Dyk, David A.; Connors, Alanna; Kashyap, Vinay L.; Siemiginowska, Aneta
2002-05-01
The likelihood ratio test (LRT) and the related F-test, popularized in astrophysics by Eadie and coworkers in 1971, Bevington in 1969, Lampton, Margon, & Bowyer, in 1976, Cash in 1979, and Avni in 1978, do not (even asymptotically) adhere to their nominal χ2 and F-distributions in many statistical tests common in astrophysics, thereby casting many marginal line or source detections and nondetections into doubt. Although the above authors illustrate the many legitimate uses of these statistics, in some important cases it can be impossible to compute the correct false positive rate. For example, it has become common practice to use the LRT or the F-test to detect a line in a spectral model or a source above background despite the lack of certain required regularity conditions. (These applications were not originally suggested by Cash or by Bevington.) In these and other settings that involve testing a hypothesis that is on the boundary of the parameter space, contrary to common practice, the nominal χ2 distribution for the LRT or the F-distribution for the F-test should not be used. In this paper, we characterize an important class of problems in which the LRT and the F-test fail and illustrate this nonstandard behavior. We briefly sketch several possible acceptable alternatives, focusing on Bayesian posterior predictive probability values. We present this method in some detail since it is a simple, robust, and intuitive approach. This alternative method is illustrated using the gamma-ray burst of 1997 May 8 (GRB 970508) to investigate the presence of an Fe K emission line during the initial phase of the observation. There are many legitimate uses of the LRT and the F-test in astrophysics, and even when these tests are inappropriate, there remain several statistical alternatives (e.g., judicious use of error bars and Bayes factors). Nevertheless, there are numerous cases of the inappropriate use of the LRT and similar tests in the literature, bringing substantive
Post, Gloria B; Louis, Judith B; Cooper, Keith R; Boros-Russo, Betty Jane; Lippincott, R Lee
2009-06-15
After detection of perfluorooctanoic acid (PFOA) in two New Jersey (NJ) public water systems (PWS) at concentrations up to 0.19 microg/L, a study of PFOA in 23 other NJ PWS was conducted in 2006. PFOA was detected in 15 (65%) of the systems at concentrations ranging from 0.005 to 0.039 microg/L. To assess the significance of these data, the contribution of drinking water to human exposure to PFOA was evaluated, and a health-based drinking water concentration protective for lifetime exposure of 0.04 microg/L was developed through a risk assessment approach. Both the exposure assessment and the health-based drinking water concentrations are based on the previously reported 100:1 ratio between the concentration of PFOA in serum and drinking water in a community with highly contaminated drinking water. The applicability of this ratio to lower drinking water concentrations was confirmed using data on serum levels and water concentrations from other communities. The health-based concentration is based on toxicological end points identified by the U.S. Environmental Protection Agency (USEPA) in its 2005 draft risk assessment. Recent information on PFOA's toxicity not considered in the USEPA risk assessment further supports the health-based concentration of 0.04 microg/L. In additional sampling of 18 PWS in 2007-2008, PFOA in most systems was below the health-based concentration. However, PFOA was detected above the health-based concentration in five systems, including one not previously sampled.
Yuan, Ling; Wang, Xinyun; Zheng, Haiyan
2007-06-20
Fragile histidine triad (FHIT) is a candidate tumor suppressor gene. Aberrant expression of FHIT has been observed in multiple carcinomas induced by environmental carcinogens, especially in lung cancer. In this study, the expression of FHIT protein in a lung cancer progression tissue microarray was detected and its role in oncogenesis and progression of lung cancer was discussed. The expression of FHIT protein in a tissue microarray with 270 cores was detected by the SP immunohistochemistry method, in which there were 89 cases of primary lung cancer, 12 cases of lymph node metastasis of lung cancer, 12 cases of precancerous lesion and 10 cases of normal lung tissue, and the clinicopathological features of lung cancer were analyzed. The expression of FHIT was localized in the cytoplasm. Loss of FHIT expression in primary cancers, precancerous lesion and lymph node metastasis of lung cancer was 46.1%, 41.7% and 50.0% respectively, while 0 in 10 cases of normal tissues. A significant difference of FHIT expression was observed among four groups (P < 0.05). Loss of FHIT expression in precancerous lesion, primary lung cancer and lymph node metastasis of lung cancer was significantly higher than that in normal lung tissue (P < 0.05). The difference among precancerous lesion, primary lung cancer and lymph node metastasis of lung cancer groups was not statistically significant (P > 0.05). Loss of FHIT expression was related to tumor histological types, degree of cell differentiation and the smoking history of patients (P < 0.05), but not to sex, age, gross appearance types, TNM stages, or lymph node metastasis (P > 0.05). The protein expression level of FHIT is reduced in primary cancers and precancerous tissues, especially in most squamous cell carcinomas, poorly differentiated group and the patients with a smoking history. These results indicate that loss of FHIT expression might correlate with carcinogenesis, development of lung cancer and the carcinogenesis induced by
Seo, Hyo Jung; Pagsisihan, Jefferson R.; Choi, Seung Hong; Cheon, Gi Jeong; Chung, June-Key; Lee, Dong Soo; Kang, Keon Wook
2015-01-01
Purpose We evaluated hemodynamic significance of stenosis on magnetic resonance angiography (MRA) using acetazolamide perfusion single photon emission computed tomography (SPECT). Materials and Methods Of 171 patients, stenosis in internal carotid artery (ICA) and middle cerebral artery (MCA) (ICA-MCA) on MRA and cerebrovascular reserve (CVR) of MCA territory on SPECT was measured using quantification and a 3-grade system. Stenosis and CVR grades were compared with each other, and their prognostic value for subsequent stroke was evaluated. Results Of 342 ICA-MCA, 151 (44%) presented stenosis on MRA; grade 1 in 69 (20%) and grade 2 in 82 (24%) cases. Decreased CVR was observed in 9% of grade 0 stenosis, 25% of grade 1, and 35% of grade 2. The average CVR of grade 0 was significantly different from grade 1 (p<0.001) and grade 2 stenosis (p=0.007). In quantitative analysis, average CVR index was -0.56±7.91 in grade 0, -1.81±6.66 in grade 1 and -1.18±5.88 in grade 2 stenosis. Agreement between stenosis and CVR grades was fair in patients with lateralizing and non-lateralizing symptoms (κ=0.230 and 0.346). Of the factors tested, both MRA and CVR were not significant prognostic factors (p=0.104 and 0.988, respectively), whereas hypertension and renal disease were significant factors (p<0.05, respectively). Conclusion A considerable proportion of ICA-MCA stenosis detected on MRA does not cause CVR impairment despite a fair correlation between them. Thus, hemodynamic state needs to be assessed for evaluating significance of stenosis, particularly in asymptomatic patients. PMID:26446655
Myocardial mechanical and QTc dispersion for the detection of significant coronary artery disease.
Stankovic, Ivan; Putnikovic, Biljana; Janicijevic, Aleksandra; Jankovic, Milica; Cvjetan, Radosava; Pavlovic, Sinisa; Kalezic-Radmili, Tijana; Panic, Milos; Milicevic, Predrag; Ilic, Ivan; Cvorovic, Vojkan; Neskovic, Aleksandar N
2015-09-01
Ischaemic but viable myocardium may exhibit prolongation of contraction and QT interval duration, but it is largely unknown whether non-invasive assessment of regional heterogeneities of myocardial deformation and QT interval duration could identify patients with significant coronary artery disease (CAD). We retrospectively studied 205 patients with suspected CAD who underwent coronary angiography. QTc dispersion was assessed from a 12-lead electrocardiogram (ECG) as the difference between the longest and shortest QTc intervals. Contraction duration was assessed as time from the ECG R-(Q-)wave to peak longitudinal strain in each of 18 left ventricular segments. Mechanical dispersion was defined as either the standard deviation of 18 time intervals (dispersionSD18) or as the difference between the longest and shortest time intervals (dispersiondelta). Longitudinal strain was measured by speckle tracking echocardiography. Mean contraction duration was longer in patients with significant CAD compared with control subjects (428 ± 51 vs. 410 ± 40 ms; P = 0.032), and it was correlated to QTc interval duration (r = 0.47; P < 0.001). In contrast to QTc interval duration and dispersion, both parameters of mechanical dispersion were independently associated with CAD (P < 0.001) and had incremental value over traditional risk factors, wall motion abnormalities, and global longitudinal strain (GLS) for the detection of significant CAD. The QTc interval and myocardial contraction duration are related to the presence of significant CAD in patients without a history of previous myocardial infarction. Myocardial mechanical dispersion has an incremental value to GLS for identifying patients with significant CAD. Published on behalf of the European Society of Cardiology.
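The two dispersion definitions given above reduce to simple summary statistics. The sketch below is illustrative only: the abstract does not say whether the sample or population standard deviation was used (the sample form is assumed here), and the segment timings are made-up values rather than study data.

```python
import statistics

def dispersion_delta(intervals_ms):
    """Difference between the longest and shortest interval; applies to both
    QTc dispersion across ECG leads and mechanical dispersion across segments."""
    return max(intervals_ms) - min(intervals_ms)

def dispersion_sd18(contraction_durations_ms):
    """Standard deviation of the 18 segmental contraction durations.
    Assumption: sample standard deviation (n - 1 denominator)."""
    if len(contraction_durations_ms) != 18:
        raise ValueError("expected one contraction duration per segment (18)")
    return statistics.stdev(contraction_durations_ms)

# Made-up timings (ms) from R-(Q-)wave onset to peak longitudinal strain.
segments_ms = [400 + 10 * i for i in range(18)]
sd18 = dispersion_sd18(segments_ms)
delta = dispersion_delta(segments_ms)
```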
Kyin May, Kyin; Htet Zaw, Min; Capistrano Canlas, Carolina; Hannah Seah, Mary; Menil Serrano, Catherine; Hartman, Mikael; Ho, Pei
2013-01-01
Objective: This study aims to evaluate the accuracy of AVF and AVG duplex ultrasound (US) compared to angiographic findings in patients with suspected failing dialysis access. Materials and Methods: From July 2008 to December 2010, US was performed on 35 hemodialysis patients with 51 vascular accesses that had clinical features or dialysis parameters suggestive of an access problem. A peak systolic velocity ratio of ≥2 was the criterion for diagnosing stenosis ≥50%. Fistulogram was performed in all these patients. Results of US and fistulogram were compared using Kappa and Receiver Operator Characteristic (ROC) analyses. Results: In 51 accesses (35 AVF, 16 AVG), US diagnosed significant stenosis in 45 accesses according to the criterion and angiogram confirmed 44 significant stenoses. In AVF lesions, Kappa was 0.533 with 93.3% sensitivity and 60% specificity for US whereas in AVG lesions, Kappa was 0.636 with 100% sensitivity and 50% specificity. The overall Kappa value of 0.56 indicated fair to good agreement. ROC analysis showed an area under the curve of 0.79 for all cases, which was significant (p = 0.016). Using the ≥50% criterion for stenosis diagnosed by US yielded the best sensitivity (95.5%) and specificity (57.1%). Conclusion: Duplex ultrasound study, using the ≥50% criterion, is a sensitive tool for stenosis detection in patients with suspected failing AVF and AVG. PMID:23641285
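The agreement statistics quoted above can be reproduced from a 2×2 table. The abstract does not report the table itself, so the counts used below are an inference: they are one set of counts consistent with the reported totals (51 accesses, 45 US-positive, 44 angiographically confirmed) and with the reported overall sensitivity and specificity. The function name is illustrative.

```python
def diagnostic_summary(tp, fp, fn, tn):
    """Sensitivity, specificity and Cohen's kappa from a 2x2 table in which
    the index test (US) is compared with the reference test (angiogram)."""
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n
    # Chance agreement from the marginal totals of both tests.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return sensitivity, specificity, kappa

# Inferred counts (an assumption, not reported in the abstract):
# 42 true positives, 3 false positives, 2 false negatives, 4 true negatives.
sensitivity, specificity, kappa = diagnostic_summary(tp=42, fp=3, fn=2, tn=4)
```

With these counts the sensitivity is 42/44, the specificity is 4/7, and kappa rounds to 0.56, matching the overall figures in the abstract.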
Zhang, Junai; Liu, Ganbin; Zeng, Jincheng; Wang, Wandang; Xiang, Wenyu; Kong, Bin; Yi, Lailong; Xu, Junfa
2015-04-01
To investigate the level of plasma interleukin 37 (IL-37) and explore the clinical significance of IL-37 in patients with active pulmonary tuberculosis (ATB). ELISA was used to detect the level of plasma IL-37 from 30 patients with ATB, 15 patients who had been treated for ATB, and 21 healthy volunteers as controls. The level of plasma IL-37 in patients with ATB was significantly higher than that in healthy controls. Monitoring of the 15 treated patients showed that plasma IL-37 was reduced after treatment for ATB. The level of plasma IL-37 in patients who were anti-Mycobacterium tuberculosis antibody positive or sputum smear positive was higher than that in patients who were anti-Mycobacterium tuberculosis antibody negative or sputum smear negative for Mycobacterium tuberculosis, and the level was negatively correlated with the number of white blood cells in peripheral blood. Patients with ATB present with a significantly increased level of plasma IL-37, which might be an indicator of curative effect in ATB.
Privitera, Gregory J; Agnello, Jaela E; Walters, Shelby A; Bender, Stacy L
2015-05-01
An experiment was conducted to test the hypothesis that feedback about an ADHD diagnosis influences how a nonclinical sample scores on the Adult ADHD Self-Report Scale (ASRS) screener. A total of 54 participants who scored below clinical significance on the ASRS in a pretest, that is, marked fewer than 4 of 6 items found to be most predictive of symptoms consistent with clinical diagnosis of adult ADHD, completed the assessment again 1 week later in a posttest with "negative," "positive," or no feedback written on the posttest to indicate how participants scored on the pretest. In all, 8 of 10 participants who scored in the clinical significance range for ADHD in the posttest were those who received positive feedback. Scores for the positive feedback group increased most from pretest to posttest for inattentive domain items (R(2) = .19). Patient beliefs prior to a diagnostic screening can influence ASRS self-report ratings. © 2012 SAGE Publications.
Sibio, Simone; Fiorani, Cristina; Stolfi, Carmine; Divizia, Andrea; Pezzuto, Roberto; Montagnese, Fabrizio; Bagaglini, Giulia; Sammartino, Paolo; Sica, Giuseppe Sigismondo
2015-01-01
Peritoneal washing is now part of the standard clinical practice in several abdominal and pelvic neoplasias. However, in colorectal cancer surgery, intra-peritoneal free cancer cells (IFCC) presence is not routinely investigated and their prognostic meaning is still unclear. When peritoneal washing results are positive for the presence of IFCC a worse outcome is usually expected in these colorectal cancer operated patients, but what is not clear is whether it is associated with an increased risk of local recurrence. It is the authors' belief that one of the main reasons why IFCC are not researched as an integral part of the routine staging system for colon cancer is that there still isn't a diagnostic or detection method with enough sensitivity and specificity. However, the potential clinical implications of a routine search for the presence of IFCC in colon neoplasias are enormous: not only to obtain a more accurate clinical staging but also to offer different therapy protocols, based on the presence of IFCC. Based on this, adjuvant chemotherapy could be offered to those patients found to be positive for IFCC; also, protocols of proactive intraperitoneal chemotherapy could be applied. Although presence of IFCC appears to have a valid prognostic significance, further studies are needed to standardize detection and examination procedures and to determine whether there are stages, and which ones, that are more likely to benefit from a routine search for IFCC. PMID:26425265
Liu, Ran; Jin, Cuiyun; Song, Fengjuan; Liu, Jing
2013-01-01
The conductivity and permittivity of tumors are known to differ significantly from those of normal tissues. Electrical impedance tomography (EIT) is a relatively new imaging method for exploiting these differences. However, the accuracy of data capture is one of the difficult problems urgently to be solved in the clinical application of EIT technology. A new concept of EIT sensitizers is put forward in this paper with the goal of expanding the contrast ratio of tumor and healthy tissue to enhance EIT imaging quality. The use of nanoparticles for changing tumor characteristics and determining the infiltration vector for easier detection has been widely accepted in the biomedical field. Ultra-pure water, normal saline, and gold nanoparticles, three kinds of material with large differences in electrical characteristics, are considered as sensitizers and undergo mathematical model analysis and animal experimentation. Our preliminary results suggest that nanoparticles are promising for sensitization work. Furthermore, in experimental and simulation results, we found that we should select different sensitizers for the detection of different types and stages of tumor.
Significance of Viable but Nonculturable Escherichia coli: Induction, Detection, and Control.
Ding, Tian; Suo, Yuanjie; Xiang, Qisen; Zhao, Xihong; Chen, Shiguo; Ye, Xingqian; Liu, Donghong
2017-03-28
Diseases caused by foodborne or waterborne pathogens are emerging. Many pathogens can enter into the viable but nonculturable (VBNC) state, which is a survival strategy when exposed to harsh environmental stresses. Pathogens in the VBNC state have the ability to evade conventional microbiological detection methods, posing a significant and potential health risk. Therefore, controlling VBNC bacteria in food processing and the environment is of great importance. As the typical one of the gram-negatives, Escherichia coli (E. coli) is a widespread foodborne and waterborne pathogenic bacterium and is able to enter into a VBNC state in extreme conditions (similar to the other gram-negative bacteria), including inducing factors and resuscitation stimulus. VBNC E. coli has the ability to recover both culturability and pathogenicity, which may bring potential health risk. This review describes the concrete factors (nonthermal treatment, chemical agents, and environmental factors) that induce E. coli into the VBNC state, the condition or stimulus required for resuscitation of VBNC E. coli, and the methods for detecting VBNC E. coli. Furthermore, the mechanism of genes and proteins involved in the VBNC E. coli is also discussed in this review.
NASA Astrophysics Data System (ADS)
Suarjaya, I. Made Agus Dwi; Kasahara, Yoshiya; Goto, Yoshitaka
2017-07-01
This paper shows a statistical analysis of 10.2 kHz Omega broadcasts of an artificial signal broadcast from ground stations, propagated in the plasmasphere, and detected using an automatic detection method we developed. We study the propagation patterns of the Omega signals to understand the propagation characteristics that are strongly affected by plasmaspheric electron density and the ambient magnetic field. We show the unique propagation patterns of the Omega 10.2 kHz signal when it was broadcast from two high-middle-latitude stations. We use about eight years of data captured by the Poynting flux analyzer subsystem on board the Akebono satellite from October 1989 to September 1997. We demonstrate that the signals broadcast from almost the same latitude (in geomagnetic coordinates) propagated differently depending on the geographic latitude. We also study propagation characteristics as a function of local time, season, and solar activity. The Omega signal tended to propagate farther on the nightside than on the dayside and was more widely distributed during winter than during summer. When solar activity was at maximum, the Omega signal propagated at a lower intensity level. In contrast, when solar activity was at minimum, the Omega signal propagated at a higher intensity and farther from the transmitter station.
NASA Astrophysics Data System (ADS)
Zakaria, Chahnez; Curé, Olivier; Salzano, Gabriella; Smaïli, Kamel
In Computer Supported Cooperative Work (CSCW), it is crucial for project leaders to detect conflicting situations as early as possible. Generally, this task is performed manually by studying a set of documents exchanged between team members. In this paper, we propose a full-fledged automatic solution that identifies documents, subjects and actors involved in relational conflicts. Our approach detects conflicts in emails, probably the most popular type of document in CSCW, but the methods used can handle other text-based documents. These methods rely on a combination of statistical and ontological operations. The proposed solution is decomposed into several steps: (i) we enrich a simple negative emotion ontology with terms occurring in the corpus of emails, (ii) we categorize each conflicting email according to the concepts of this ontology and (iii) we identify emails, subjects and team members involved in conflicting emails using possibilistic description logic and a set of proposed measures. Each of these steps is evaluated and validated on concrete examples. Moreover, the framework of this approach is generic and can be easily adapted to domains other than conflicts, e.g. security issues, and extended with operations making use of our proposed set of measures.
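The three steps in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The ontology contents, the email record fields (`sender`, `recipients`, `subject`, `body`) and the function names `enrich`, `categorize` and `conflict_report` are all hypothetical, and plain term counting stands in for the possibilistic description logic and the measures the paper proposes.

```python
import re

# Hypothetical mini-ontology: concept name -> set of trigger terms.
# The real ontology and its concepts are not given in the abstract.
ONTOLOGY = {
    "anger": {"unacceptable", "furious"},
    "blame": {"fault", "blame"},
}


def enrich(ontology, concept, corpus_terms):
    """Step (i): return a copy of the ontology with corpus terms added to a concept."""
    enriched = {name: set(terms) for name, terms in ontology.items()}
    enriched.setdefault(concept, set()).update(t.lower() for t in corpus_terms)
    return enriched


def categorize(body, ontology):
    """Step (ii): count, per concept, the tokens of an email body that match its terms."""
    tokens = re.findall(r"[a-z']+", body.lower())
    counts = {}
    for concept, terms in ontology.items():
        hits = sum(1 for tok in tokens if tok in terms)
        if hits:
            counts[concept] = hits
    return counts


def conflict_report(emails, ontology):
    """Step (iii), simplified: collect subjects and actors of emails matching any concept."""
    subjects, actors = set(), set()
    for mail in emails:
        if categorize(mail["body"], ontology):
            subjects.add(mail["subject"])
            actors.add(mail["sender"])
            actors.update(mail["recipients"])
    return subjects, actors
```

Given a list of email dictionaries, `conflict_report(emails, ONTOLOGY)` returns the subjects and team members attached to any email that matches at least one negative-emotion concept. A faithful version would replace the binary match with graded possibility degrees.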
Statistical modeling, detection, and segmentation of stains in digitized fabric images
NASA Astrophysics Data System (ADS)
Gururajan, Arunkumar; Sari-Sarraf, Hamed; Hequet, Eric F.
2007-02-01
This paper will describe a novel and automated system based on a computer vision approach, for objective evaluation of stain release on cotton fabrics. Digitized color images of the stained fabrics are obtained, and the pixel values in the color and intensity planes of these images are probabilistically modeled as a Gaussian Mixture Model (GMM). Stain detection is posed as a decision theoretic problem, where the null hypothesis corresponds to absence of a stain. The null hypothesis and the alternate hypothesis mathematically translate into a first order GMM and a second order GMM respectively. The parameters of the GMM are estimated using a modified Expectation-Maximization (EM) algorithm. Mi