Science.gov

Sample records for detecting statistically significant

  1. Strategies for Testing Statistical and Practical Significance in Detecting DIF with Logistic Regression Models

    ERIC Educational Resources Information Center

    Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza

    2014-01-01

    This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…

  2. Lack of Statistical Significance

    ERIC Educational Resources Information Center

    Kehle, Thomas J.; Bray, Melissa A.; Chafouleas, Sandra M.; Kawano, Takuji

    2007-01-01

    Criticism has been leveled against the use of statistical significance testing (SST) in many disciplines. However, the field of school psychology has been largely devoid of critiques of SST. Inspection of the primary journals in school psychology indicated numerous examples of SST with nonrandom samples and/or samples of convenience. In this…

  3. Statistical Significance Testing.

    ERIC Educational Resources Information Center

    McLean, James E., Ed.; Kaufman, Alan S., Ed.

    1998-01-01

    The controversy about the use or misuse of statistical significance testing has become the major methodological issue in educational research. This special issue contains three articles that explore the controversy, three commentaries on these articles, an overall response, and three rejoinders by the first three authors. They are: (1)…

  4. Statistical or biological significance?

    PubMed

    Saxon, Emma

    2015-01-01

Oat plants grown at an agricultural research facility produce higher yields in Field 1 than in Field 2, under well fertilised conditions and with similar weather exposure; all oat plants in both fields are healthy and show no sign of disease. In this study, the authors hypothesised that the soil microbial community might be different in each field, and these differences might explain the difference in oat plant growth. They carried out a metagenomic analysis of the 16S ribosomal 'signature' sequences from bacteria in 50 randomly located soil samples in each field to determine the composition of the bacterial community. The study identified >1000 species, most of which were present in both fields. The authors identified two plant growth-promoting species that were significantly reduced in soil from Field 2 (Student's t-test P < 0.05), and concluded that these species might have contributed to reduced yield. PMID:26541972

  5. Statistically significant relational data mining :

    SciTech Connect

    Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann; Pinar, Ali; Robinson, David Gerald; Berger-Wolf, Tanya; Bhowmick, Sanjukta; Casleton, Emily; Kaiser, Mark; Nordman, Daniel J.; Wilson, Alyson G.

    2014-02-01

This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model. Statisticians favor these models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.

  6. Detecting multiple periodicities in observational data with the multifrequency periodogram - I. Analytic assessment of the statistical significance

    NASA Astrophysics Data System (ADS)

    Baluev, Roman V.

    2013-11-01

We consider the `multifrequency' periodogram, in which the putative signal is modelled as a sum of two or more sinusoidal harmonics with independent frequencies. It is useful in cases when the data may contain several periodic components, especially when their interaction with each other and with the data sampling patterns might produce misleading results. Although the multifrequency statistic itself was constructed earlier, for example by G. Foster in his CLEANest algorithm, its probabilistic properties (the detection significance levels) are still poorly known and much of what is deemed known is not rigorous. These detection levels are nonetheless important for data analysis. We argue that to prove the simultaneous existence of all n components revealed in a multiperiodic variation, it is mandatory to apply at least 2n - 1 significance tests, among which most involve various multifrequency statistics, and only n tests are single-frequency ones. The main result of this paper is an analytic estimation of the statistical significance of the frequency tuples that the multifrequency periodogram can reveal. Using the theory of extreme values of random fields (the generalized Rice method), we find a useful approximation to the relevant false alarm probability. For the double-frequency periodogram, this approximation is given by the elementary formula (π/16) W^2 e^{-z} z^2, where W denotes the normalized width of the settled frequency range, and z is the observed periodogram maximum. We carried out intensive Monte Carlo simulations to show that the practical quality of this approximation is satisfactory. A similar analytic expression for the general multifrequency periodogram is also given, although with less numerical verification.
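
    A minimal numeric sketch of the quoted approximation, assuming only the two quantities named in the abstract (the normalized frequency-range width W and the observed periodogram maximum z); the example values are illustrative, not taken from the paper.

        # Double-frequency false alarm probability, FAP ~ (pi/16) * W**2 * exp(-z) * z**2
        import math

        def double_frequency_fap(z: float, W: float) -> float:
            """Analytic approximation to the false alarm probability quoted in the abstract."""
            return (math.pi / 16.0) * W**2 * math.exp(-z) * z**2

        # Illustrative values: a periodogram maximum z = 25 over a normalized width W = 500.
        print(double_frequency_fap(25.0, 500.0))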

  7. Statistical significance of the gallium anomaly

    SciTech Connect

    Giunti, Carlo; Laveder, Marco

    2011-06-15

We calculate the statistical significance of the anomalous deficit of electron neutrinos measured in the radioactive source experiments of the GALLEX and SAGE solar neutrino detectors, taking into account the uncertainty of the detection cross section. We found that the statistical significance of the anomaly is ≈3.0σ. A fit of the data in terms of neutrino oscillations favors at ≈2.7σ short-baseline electron neutrino disappearance with respect to the null hypothesis of no oscillations.

  8. Significant results: statistical or clinical?

    PubMed Central

    2016-01-01

The null hypothesis significance test method is popular in biological and medical research. Many researchers have used this method for their research without exact knowledge of it, though it has both merits and shortcomings. Readers will learn its shortcomings, as well as several complementary or alternative methods, such as the estimated effect size and the confidence interval. PMID:27066201

  9. Social significance of community structure: Statistical view

    NASA Astrophysics Data System (ADS)

    Li, Hui-Jia; Daniels, Jasmine J.

    2015-01-01

Community structure analysis is a powerful tool for social networks that can simplify their topological and functional analysis considerably. However, since community detection methods have random factors and real social networks obtained from complex systems always contain error edges, evaluating the significance of a partitioned community structure is an urgent and important question. In this paper, integrating the specific characteristics of real society, we present a framework to analyze the significance of a social community. The dynamics of social interactions are modeled by identifying social leaders and corresponding hierarchical structures. Instead of a direct comparison with the average outcome of a random model, we compute the similarity of a given node with the leader by the number of common neighbors. To determine the membership vector, an efficient community detection algorithm is proposed based on the position of the nodes and their corresponding leaders. Then, using a log-likelihood score, the tightness of the community can be derived. Based on the distribution of community tightness, we establish a connection between p-value theory and network analysis, and then we obtain a significance measure in statistical form. Finally, the framework is applied to both benchmark networks and real social networks. Experimental results show that our work can be used in many fields, such as determining the optimal number of communities, analyzing the social significance of a given community, comparing the performance among various algorithms, etc.

  10. Social significance of community structure: statistical view.

    PubMed

    Li, Hui-Jia; Daniels, Jasmine J

    2015-01-01

Community structure analysis is a powerful tool for social networks that can simplify their topological and functional analysis considerably. However, since community detection methods have random factors and real social networks obtained from complex systems always contain error edges, evaluating the significance of a partitioned community structure is an urgent and important question. In this paper, integrating the specific characteristics of real society, we present a framework to analyze the significance of a social community. The dynamics of social interactions are modeled by identifying social leaders and corresponding hierarchical structures. Instead of a direct comparison with the average outcome of a random model, we compute the similarity of a given node with the leader by the number of common neighbors. To determine the membership vector, an efficient community detection algorithm is proposed based on the position of the nodes and their corresponding leaders. Then, using a log-likelihood score, the tightness of the community can be derived. Based on the distribution of community tightness, we establish a connection between p-value theory and network analysis, and then we obtain a significance measure in statistical form. Finally, the framework is applied to both benchmark networks and real social networks. Experimental results show that our work can be used in many fields, such as determining the optimal number of communities, analyzing the social significance of a given community, comparing the performance among various algorithms, etc. PMID:25679651

  11. Multi-spectral detection of statistically significant components in pre-seismic electromagnetic emissions related with Athens 1999, M = 5.9 earthquake

    NASA Astrophysics Data System (ADS)

    Kalimeris, A.; Potirakis, S. M.; Eftaxias, K.; Antonopoulos, G.; Kopanas, J.; Nomikos, C.

    2016-05-01

A multi-spectral analysis of the kHz electromagnetic time series associated with Athens' earthquake (M = 5.9, 7 September 1999) is presented here, which results in the reliable discrimination of the fracto-electromagnetic emissions from the natural geo-electromagnetic field background. Five spectral analysis methods are utilized in order to resolve the statistically significant variability modes of the studied dynamical system out of a red noise background (the revised Multi-Taper Method, the Singular Spectrum Analysis, and the Wavelet Analysis among them). The performed analysis reveals the existence of three distinct epochs in the time series for the period before the earthquake, a "quiet", a "transitional" and an "active" epoch. Towards the end of the active epoch, during a sub-period starting approximately two days before the earthquake, the dynamical system passes into a high activity state, where electromagnetic signal emissions become powerful and statistically significant almost in all time-scales. The temporal behavior of the studied system in each one of these epochs is further investigated through mathematical reconstruction in the time domain of those spectral features that were found to be statistically significant. The transition of the system from the quiet to the active state proved to be detectable first in the long time-scales and afterwards in the short scales. Finally, a Hurst exponent analysis revealed persistent characteristics embedded in the two strong EM bursts observed during the "active" epoch.

  12. Statistical Significance vs. Practical Significance: An Exploration through Health Education

    ERIC Educational Resources Information Center

    Rosen, Brittany L.; DeMaria, Andrea L.

    2012-01-01

    The purpose of this paper is to examine the differences between statistical and practical significance, including strengths and criticisms of both methods, as well as provide information surrounding the application of various effect sizes and confidence intervals within health education research. Provided are recommendations, explanations and…

  13. Comments on the Statistical Significance Testing Articles.

    ERIC Educational Resources Information Center

    Knapp, Thomas R.

    1998-01-01

    Expresses a "middle-of-the-road" position on statistical significance testing, suggesting that it has its place but that confidence intervals are generally more useful. Identifies 10 errors of omission or commission in the papers reviewed that weaken the positions taken in their discussions. (SLD)

  14. Statistical significance of normalized global alignment.

    PubMed

    Peris, Guillermo; Marzal, Andrés

    2014-03-01

    The comparison of homologous proteins from different species is a first step toward a function assignment and a reconstruction of the species evolution. Though local alignment is mostly used for this purpose, global alignment is important for constructing multiple alignments or phylogenetic trees. However, statistical significance of global alignments is not completely clear, lacking a specific statistical model to describe alignments or depending on computationally expensive methods like Z-score. Recently we presented a normalized global alignment, defined as the best compromise between global alignment cost and length, and showed that this new technique led to better classification results than Z-score at a much lower computational cost. However, it is necessary to analyze the statistical significance of the normalized global alignment in order to be considered a completely functional algorithm for protein alignment. Experiments with unrelated proteins extracted from the SCOP ASTRAL database showed that normalized global alignment scores can be fitted to a log-normal distribution. This fact, obtained without any theoretical support, can be used to derive statistical significance of normalized global alignments. Results are summarized in a table with fitted parameters for different scoring schemes. PMID:24400820
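
    The abstract's recipe (fit a log-normal distribution to scores of unrelated protein pairs, then read significance from the fitted tail) can be sketched as below; the scores are synthetic, and the choice of the upper tail assumes larger scores mean better alignments, which may differ from the actual scoring convention.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(0)
        null_scores = rng.lognormal(mean=0.5, sigma=0.3, size=10_000)  # stand-in for unrelated-pair scores

        shape, loc, scale = stats.lognorm.fit(null_scores, floc=0.0)   # fit the null (unrelated) distribution
        fitted = stats.lognorm(shape, loc=loc, scale=scale)

        observed = 4.2                                                 # score of the protein pair under test
        p_value = fitted.sf(observed)                                  # upper-tail probability under the fit
        print(f"p = {p_value:.3g}")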

  15. Assessing the statistical significance of periodogram peaks

    NASA Astrophysics Data System (ADS)

    Baluev, R. V.

    2008-04-01

    The least-squares (or Lomb-Scargle) periodogram is a powerful tool that is routinely used in many branches of astronomy to search for periodicities in observational data. The problem of assessing the statistical significance of candidate periodicities for a number of periodograms is considered. Based on results in extreme value theory, improved analytic estimations of false alarm probabilities are given. These include an upper limit to the false alarm probability (or a lower limit to the significance). The estimations are tested numerically in order to establish regions of their practical applicability.
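
    For orientation, astropy's Lomb-Scargle implementation exposes an analytic false-alarm-probability estimate attributed to this line of work (method='baluev'); the snippet below uses a synthetic, irregularly sampled series and is only a usage sketch, not a reproduction of the paper's tests.

        import numpy as np
        from astropy.timeseries import LombScargle

        rng = np.random.default_rng(1)
        t = np.sort(rng.uniform(0, 100, 200))                               # irregular sampling times
        y = 0.5 * np.sin(2 * np.pi * t / 7.3) + rng.normal(0, 1.0, t.size)  # weak periodic signal plus noise

        ls = LombScargle(t, y)
        frequency, power = ls.autopower()
        fap = ls.false_alarm_probability(power.max(), method='baluev')      # analytic upper limit on the FAP
        print(f"peak power {power.max():.3f}, Baluev FAP {fap:.3g}")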

  16. Statistical Significance of Trends in Exoplanetary Atmospheres

    NASA Astrophysics Data System (ADS)

    Harrington, Joseph; Bowman, M.; Blumenthal, S. D.; Loredo, T. J.; UCF Exoplanets Group

    2013-10-01

    Cowan and Agol (2011) and we (Harrington et al. 2007, 2010, 2011, 2012, 2013) have noted that at higher equilibrium temperatures, observed exoplanet fluxes are substantially higher than even the elevated equilibrium temperature predicts. With a substantial increase in the number of atmospheric flux measurements, we can now test the statistical significance of this trend. We can also cast the data on a variety of axes to search further for the physics behind both the jump in flux above about 2000 K and the wide scatter in fluxes at all temperatures. This work was supported by NASA Planetary Atmospheres grant NNX12AI69G and NASA Astrophysics Data Analysis Program grant NNX13AF38G.

  17. On the statistical significance of climate trends

    NASA Astrophysics Data System (ADS)

    Franzke, Christian

    2010-05-01

One of the major problems in climate science is the prediction of future climate change due to anthropogenic greenhouse gas emissions. The earth's climate is not changing in a uniform way because it is a complex nonlinear system of many interacting components. The overall warming trend can be interrupted by cooling periods due to natural variability. Thus, in order to statistically distinguish between internal climate variability and genuine trends, one has to assume a certain null model of the climate variability. Traditionally a short-range, and not a long-range, dependent null model is chosen. Here I show evidence for the first time that temperature data at 8 stations across Antarctica are long-range dependent and that the choice of a long-range, rather than a short-range, dependent null model negates the statistical significance of temperature trends at 2 out of 3 stations. These results show the shortcomings of traditional trend analysis and imply that more attention should be given to the correlation structure of climate data, in particular if they are long-range dependent. In this study I use the Empirical Mode Decomposition (EMD) to decompose the univariate temperature time series into a finite number of Intrinsic Mode Functions (IMF) and an instantaneous mean. While there is no unambiguous definition of a trend, in this study we interpret the instantaneous mean as a trend which is possibly nonlinear. The EMD method has been shown to be a powerful technique for extracting trends from noisy and nonlinear time series. I will show that this way of identifying trends is superior to the traditional linear least-square fits.

  18. Statistical mechanics of community detection

    NASA Astrophysics Data System (ADS)

    Reichardt, Jörg; Bornholdt, Stefan

    2006-07-01

    Starting from a general ansatz, we show how community detection can be interpreted as finding the ground state of an infinite range spin glass. Our approach applies to weighted and directed networks alike. It contains the ad hoc introduced quality function from [J. Reichardt and S. Bornholdt, Phys. Rev. Lett. 93, 218701 (2004)] and the modularity Q as defined by Newman and Girvan [Phys. Rev. E 69, 026113 (2004)] as special cases. The community structure of the network is interpreted as the spin configuration that minimizes the energy of the spin glass with the spin states being the community indices. We elucidate the properties of the ground state configuration to give a concise definition of communities as cohesive subgroups in networks that is adaptive to the specific class of network under study. Further, we show how hierarchies and overlap in the community structure can be detected. Computationally efficient local update rules for optimization procedures to find the ground state are given. We show how the ansatz may be used to discover the community around a given node without detecting all communities in the full network and we give benchmarks for the performance of this extension. Finally, we give expectation values for the modularity of random graphs, which can be used in the assessment of statistical significance of community structure.

  19. Statistical mechanics of community detection.

    PubMed

    Reichardt, Jörg; Bornholdt, Stefan

    2006-07-01

    Starting from a general ansatz, we show how community detection can be interpreted as finding the ground state of an infinite range spin glass. Our approach applies to weighted and directed networks alike. It contains the ad hoc introduced quality function from [J. Reichardt and S. Bornholdt, Phys. Rev. Lett. 93, 218701 (2004)] and the modularity Q as defined by Newman and Girvan [Phys. Rev. E 69, 026113 (2004)] as special cases. The community structure of the network is interpreted as the spin configuration that minimizes the energy of the spin glass with the spin states being the community indices. We elucidate the properties of the ground state configuration to give a concise definition of communities as cohesive subgroups in networks that is adaptive to the specific class of network under study. Further, we show how hierarchies and overlap in the community structure can be detected. Computationally efficient local update rules for optimization procedures to find the ground state are given. We show how the ansatz may be used to discover the community around a given node without detecting all communities in the full network and we give benchmarks for the performance of this extension. Finally, we give expectation values for the modularity of random graphs, which can be used in the assessment of statistical significance of community structure. PMID:16907154
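
    The closing point (comparing observed modularity against its expectation in randomized graphs) can be illustrated with a degree-preserving randomization in networkx; the graph, the detection algorithm (greedy modularity) and the number of randomizations are all arbitrary choices for this sketch, not those of the paper.

        import numpy as np
        import networkx as nx
        from networkx.algorithms import community

        G = nx.karate_club_graph()
        parts = community.greedy_modularity_communities(G)
        q_obs = community.modularity(G, parts)

        q_null = []
        for seed in range(200):                                   # degree-preserving rewirings of the graph
            R = G.copy()
            nx.double_edge_swap(R, nswap=5 * R.number_of_edges(), max_tries=10**5, seed=seed)
            q_null.append(community.modularity(R, community.greedy_modularity_communities(R)))

        z = (q_obs - np.mean(q_null)) / np.std(q_null)
        print(f"observed Q = {q_obs:.3f}, randomized mean Q = {np.mean(q_null):.3f}, z = {z:.1f}")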

  20. Statistical methodology for pathogen detection.

    PubMed

    Ogliari, Paulo José; de Andrade, Dalton Francisco; Pacheco, Juliano Anderson; Franchin, Paulo Rogério; Batista, Cleide Rosana Vieira

    2007-08-01

    The main goal of the present study was to discuss the application of the McNemar test to the comparison of proportions in dependent samples. Data were analyzed from studies conducted to verify the suitability of replacing a conventional method with a new one for identifying the presence of Salmonella. It is shown that, in most situations, the McNemar test does not provide all the elements required by the microbiologist to make a final decision and that appropriate functions of the proportions need to be considered. Sample sizes suitable to guarantee a test with a high power in the detection of significant differences regarding the problem studied are obtained by simulation. Examples of functions that are of great value to the microbiologist are presented. PMID:17803152
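
    A minimal sketch of the paired-proportion comparison the abstract starts from, using the McNemar test as implemented in statsmodels; the counts of agreements and disagreements between the conventional and the new Salmonella detection method are invented for illustration.

        import numpy as np
        from statsmodels.stats.contingency_tables import mcnemar

        # Rows: conventional method (+, -); columns: new method (+, -)
        table = np.array([[45,  3],
                          [10, 42]])

        result = mcnemar(table, exact=True)   # exact binomial test on the discordant pairs
        print(f"statistic = {result.statistic}, p = {result.pvalue:.4f}")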

  1. Reviewer Bias for Statistically Significant Results: A Reexamination.

    ERIC Educational Resources Information Center

    Fagley, N. S.; McKinney, I. Jean

    1983-01-01

    Reexamines the article by Atkinson, Furlong, and Wampold (1982) and questions their conclusion that reviewers were biased toward statistically significant results. A statistical power analysis shows the power of their bogus study was low. Low power in a study reporting nonsignificant findings is a valid reason for recommending not to publish.…
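
    The kind of power calculation invoked in the abstract can be sketched with statsmodels; the effect size and group size below are placeholders, not the values from the reexamined study.

        from statsmodels.stats.power import TTestIndPower

        analysis = TTestIndPower()
        power = analysis.power(effect_size=0.3, nobs1=30, alpha=0.05, alternative='two-sided')
        print(f"power to detect d = 0.3 with n = 30 per group: {power:.2f}")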

  2. Advances in Testing the Statistical Significance of Mediation Effects

    ERIC Educational Resources Information Center

    Mallinckrodt, Brent; Abraham, W. Todd; Wei, Meifen; Russell, Daniel W.

    2006-01-01

    P. A. Frazier, A. P. Tix, and K. E. Barron (2004) highlighted a normal theory method popularized by R. M. Baron and D. A. Kenny (1986) for testing the statistical significance of indirect effects (i.e., mediator variables) in multiple regression contexts. However, simulation studies suggest that this method lacks statistical power relative to some…
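
    One widely used higher-power alternative in this literature is a bootstrap test of the indirect effect a*b; the sketch below simulates data and bootstraps a percentile confidence interval, and illustrates that general idea only, not the specific procedures compared in the article.

        import numpy as np

        rng = np.random.default_rng(2)
        n = 200
        x = rng.normal(size=n)
        m = 0.5 * x + rng.normal(size=n)               # mediator
        y = 0.4 * m + 0.1 * x + rng.normal(size=n)     # outcome

        def indirect(x, m, y):
            a = np.polyfit(x, m, 1)[0]                         # slope of M on X
            X = np.column_stack([np.ones_like(x), m, x])
            b = np.linalg.lstsq(X, y, rcond=None)[0][1]        # slope of Y on M, controlling for X
            return a * b

        boot = [indirect(x[idx], m[idx], y[idx])
                for idx in (rng.integers(0, n, n) for _ in range(2000))]
        lo, hi = np.percentile(boot, [2.5, 97.5])
        print(f"indirect effect = {indirect(x, m, y):.3f}, 95% bootstrap CI = [{lo:.3f}, {hi:.3f}]")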

  3. Testing the Difference of Correlated Agreement Coefficients for Statistical Significance

    ERIC Educational Resources Information Center

    Gwet, Kilem L.

    2016-01-01

    This article addresses the problem of testing the difference between two correlated agreement coefficients for statistical significance. A number of authors have proposed methods for testing the difference between two correlated kappa coefficients, which require either the use of resampling methods or the use of advanced statistical modeling…

  4. Statistical significance test for transition matrices of atmospheric Markov chains

    NASA Technical Reports Server (NTRS)

    Vautard, Robert; Mo, Kingtse C.; Ghil, Michael

    1990-01-01

    Low-frequency variability of large-scale atmospheric dynamics can be represented schematically by a Markov chain of multiple flow regimes. This Markov chain contains useful information for the long-range forecaster, provided that the statistical significance of the associated transition matrix can be reliably tested. Monte Carlo simulation yields a very reliable significance test for the elements of this matrix. The results of this test agree with previously used empirical formulae when each cluster of maps identified as a distinct flow regime is sufficiently large and when they all contain a comparable number of maps. Monte Carlo simulation provides a more reliable way to test the statistical significance of transitions to and from small clusters. It can determine the most likely transitions, as well as the most unlikely ones, with a prescribed level of statistical significance.
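
    A toy version of the Monte Carlo idea, assuming a classified regime sequence is available: shuffle the sequence to destroy temporal order and compare an observed transition count to its shuffled distribution. Everything here is synthetic and far simpler than the cluster-based analysis in the paper.

        import numpy as np

        rng = np.random.default_rng(3)
        regimes = rng.integers(0, 3, size=1000)        # stand-in for a sequence of flow-regime labels

        def count_transition(seq, i, j):
            return int(np.sum((seq[:-1] == i) & (seq[1:] == j)))

        obs = count_transition(regimes, 0, 1)
        null = np.array([count_transition(rng.permutation(regimes), 0, 1) for _ in range(5000)])
        p = (1 + np.sum(null >= obs)) / (1 + null.size)
        print(f"observed 0->1 transitions: {obs}, one-sided Monte Carlo p = {p:.3f}")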

  5. Decadal power in land air temperatures: Is it statistically significant?

    NASA Astrophysics Data System (ADS)

    Thejll, Peter A.

    2001-12-01

The geographical distribution and properties of the well-known 10-11 year signal in terrestrial temperature records are investigated. By analyzing the Global Historical Climate Network data for surface air temperatures we verify that the signal is strongest in North America and is similar in nature to that reported earlier by R. G. Currie. The decadal signal is statistically significant for individual stations, but it is not possible to show that the signal is statistically significant globally, using strict tests. In North America, during the twentieth century, the decadal variability in the solar activity cycle is associated with the decadal part of the North Atlantic Oscillation index series in such a way that both of these signals correspond to the same spatial pattern of cooling and warming. A method for testing statistical results with Monte Carlo trials on data fields with specified temporal structure and specific spatial correlation retained is presented.

  6. A Comparison of Statistical Significance Tests for Selecting Equating Functions

    ERIC Educational Resources Information Center

    Moses, Tim

    2009-01-01

    This study compared the accuracies of nine previously proposed statistical significance tests for selecting identity, linear, and equipercentile equating functions in an equivalent groups equating design. The strategies included likelihood ratio tests for the loglinear models of tests' frequency distributions, regression tests, Kolmogorov-Smirnov…

  7. Your Chi-Square Test Is Statistically Significant: Now What?

    ERIC Educational Resources Information Center

    Sharpe, Donald

    2015-01-01

    Applied researchers have employed chi-square tests for more than one hundred years. This paper addresses the question of how one should follow a statistically significant chi-square test result in order to determine the source of that result. Four approaches were evaluated: calculating residuals, comparing cells, ransacking, and partitioning. Data…
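
    The first of the four follow-up approaches named in the abstract, calculating residuals, can be sketched as follows; the contingency table is invented, and cells with adjusted residuals beyond roughly ±2 are the usual candidates for driving a significant chi-square.

        import numpy as np
        from scipy.stats import chi2_contingency

        table = np.array([[30, 10, 20],
                          [15, 25, 20]])
        chi2, p, dof, expected = chi2_contingency(table)

        n = table.sum()
        row = table.sum(axis=1, keepdims=True) / n
        col = table.sum(axis=0, keepdims=True) / n
        adj_residuals = (table - expected) / np.sqrt(expected * (1 - row) * (1 - col))
        print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
        print(np.round(adj_residuals, 2))   # large |values| flag the cells driving the result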

  8. Assigning statistical significance to proteotypic peptides via database searches

    PubMed Central

    Alves, Gelio; Ogurtsov, Aleksey Y.; Yu, Yi-Kuo

    2011-01-01

Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId’s knowledge database to include proteotypic information, utilized RAId’s statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId’s programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem since all annotated modifications, even those that occurred within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless of whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches. PMID:21055489

  9. Assigning statistical significance to proteotypic peptides via database searches.

    PubMed

    Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo

    2011-02-01

Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId's knowledge database to include proteotypic information, utilized RAId's statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId's programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem since all annotated modifications, even those that occurred within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless of whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches. PMID:21055489

  10. Advances in Significance Testing for Cluster Detection

    NASA Astrophysics Data System (ADS)

    Coleman, Deidra Andrea

Over the past two decades, much attention has been given to data-driven project goals such as the Human Genome Project and the development of syndromic surveillance systems. A major component of these types of projects is analyzing the abundance of data. Detecting clusters within the data can be beneficial as it can lead to the identification of specified sequences of DNA nucleotides that are related to important biological functions or the locations of epidemics such as disease outbreaks or bioterrorism attacks. Cluster detection techniques require efficient and accurate hypothesis testing procedures. In this dissertation, we improve upon the hypothesis testing procedures for cluster detection by enhancing distributional theory and providing an alternative method for spatial cluster detection using syndromic surveillance data. In Chapter 2, we provide an efficient method to compute the exact distribution of the number and coverage of h-clumps of a collection of words. This method involves defining a Markov chain using a minimal deterministic automaton to reduce the number of states needed for computation. We allow words of the collection to contain other words of the collection, making the method more general. We use our method to compute the distributions of the number and coverage of h-clumps in the Chi motif of H. influenzae. In Chapter 3, we provide an efficient algorithm to compute the exact distribution of multiple window discrete scan statistics for higher-order, multi-state Markovian sequences. This algorithm involves defining a Markov chain to efficiently keep track of probabilities needed to compute p-values of the statistic. We use our algorithm to identify cases where the available approximation does not perform well. We also use our algorithm to detect unusual clusters of made free throw shots by National Basketball Association players during the 2009-2010 regular season. In Chapter 4, we give a procedure to detect outbreaks using syndromic

  11. A Statistical Approach to Autocorrelation Detection of Low Frequency Earthquakes

    NASA Astrophysics Data System (ADS)

    Aguiar, A. C.; Beroza, G. C.

    2012-12-01

    We have analyzed tremor data during the April, 2006 tremor episode in the Nankai Trough in SW Japan using the auto-correlation approach of Brown et al. (2008), which detects low frequency earthquakes (LFEs) based on pair-wise matching. We have found that the statistical behavior of the autocorrelations of each station is different and for this reason we have based our LFE detection method on the autocorrelation of each station individually. Analyzing one station at a time assures that the detection threshold will only depend on the station being analyzed. Once detections are found on each station individually, using a low detection threshold based on a Gaussian distribution of the correlation coefficients, the results are compared within stations and declared a detection if they are found in a statistically significant number of the stations, following multinomial statistics. We have compared our detections using the single station method to the detections found by Shelly et al. (2007) for the 2006 April 16 events and find a significant number of similar detections as well as many new detections that were not found using templates from known LFEs. We are working towards developing a sound statistical basis for event detection. This approach should improve our ability to detect LFEs within weak tremor signals where they are not already identified, and should be applicable to earthquake swarms and sequences in general.

  12. Statistical Fault Detection & Diagnosis Expert System

    SciTech Connect

    Wegerich, Stephan

    1996-12-18

    STATMON is an expert system that performs real-time fault detection and diagnosis of redundant sensors in any industrial process requiring high reliability. After a training period performed during normal operation, the expert system monitors the statistical properties of the incoming signals using a pattern recognition test. If the test determines that statistical properties of the signals have changed, the expert system performs a sequence of logical steps to determine which sensor or machine component has degraded.

  13. Statistical significance of climate sensitivity predictors obtained by data mining

    NASA Astrophysics Data System (ADS)

    Caldwell, Peter M.; Bretherton, Christopher S.; Zelinka, Mark D.; Klein, Stephen A.; Santer, Benjamin D.; Sanderson, Benjamin M.

    2014-03-01

    Several recent efforts to estimate Earth's equilibrium climate sensitivity (ECS) focus on identifying quantities in the current climate which are skillful predictors of ECS yet can be constrained by observations. This study automates the search for observable predictors using data from phase 5 of the Coupled Model Intercomparison Project. The primary focus of this paper is assessing statistical significance of the resulting predictive relationships. Failure to account for dependence between models, variables, locations, and seasons is shown to yield misleading results. A new technique for testing the field significance of data-mined correlations which avoids these problems is presented. Using this new approach, all 41,741 relationships we tested were found to be explainable by chance. This leads us to conclude that data mining is best used to identify potential relationships which are then validated or discarded using physically based hypothesis testing.
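
    The generic problem described here (the largest data-mined correlation must be judged against the distribution of maxima that arise by chance) can be illustrated with a simple permutation test; the data are random placeholders, and this scheme ignores the inter-model and spatial dependence structure that the paper's technique is specifically designed to handle.

        import numpy as np

        rng = np.random.default_rng(10)
        n_models, n_candidates = 30, 500
        ecs = rng.normal(3.0, 0.8, n_models)                       # stand-in for climate sensitivity values
        candidates = rng.normal(size=(n_candidates, n_models))     # stand-in for data-mined predictors

        def max_abs_corr(y, X):
            return np.max(np.abs([np.corrcoef(y, x)[0, 1] for x in X]))

        observed = max_abs_corr(ecs, candidates)
        null = np.array([max_abs_corr(rng.permutation(ecs), candidates) for _ in range(500)])
        p = np.mean(null >= observed)
        print(f"max |r| = {observed:.2f}, field-significance p = {p:.2f}")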

  14. Sibling Competition & Growth Tradeoffs. Biological vs. Statistical Significance

    PubMed Central

    Kramer, Karen L.; Veile, Amanda; Otárola-Castillo, Erik

    2016-01-01

    Early childhood growth has many downstream effects on future health and reproduction and is an important measure of offspring quality. While a tradeoff between family size and child growth outcomes is theoretically predicted in high-fertility societies, empirical evidence is mixed. This is often attributed to phenotypic variation in parental condition. However, inconsistent study results may also arise because family size confounds the potentially differential effects that older and younger siblings can have on young children’s growth. Additionally, inconsistent results might reflect that the biological significance associated with different growth trajectories is poorly understood. This paper addresses these concerns by tracking children’s monthly gains in height and weight from weaning to age five in a high fertility Maya community. We predict that: 1) as an aggregate measure family size will not have a major impact on child growth during the post weaning period; 2) competition from young siblings will negatively impact child growth during the post weaning period; 3) however because of their economic value, older siblings will have a negligible effect on young children’s growth. Accounting for parental condition, we use linear mixed models to evaluate the effects that family size, younger and older siblings have on children’s growth. Congruent with our expectations, it is younger siblings who have the most detrimental effect on children’s growth. While we find statistical evidence of a quantity/quality tradeoff effect, the biological significance of these results is negligible in early childhood. Our findings help to resolve why quantity/quality studies have had inconsistent results by showing that sibling competition varies with sibling age composition, not just family size, and that biological significance is distinct from statistical significance. PMID:26938742

  15. Statistical significance across multiple optimization models for community partition

    NASA Astrophysics Data System (ADS)

    Li, Ju; Li, Hui-Jia; Mao, He-Jin; Chen, Junhua

    2016-05-01

The study of community structure is an important problem in a wide range of applications, which can help us understand the real network system deeply. However, due to the existence of random factors and error edges in real networks, how to measure the significance of community structure efficiently is a crucial question. In this paper, we present a novel statistical framework computing the significance of community structure across multiple optimization methods. Different from the universal approaches, we calculate the similarity between a given node and its leader and employ the distribution of link tightness to derive the significance score, instead of a direct comparison to a randomized model. Based on the distribution of community tightness, a new “p-value”-style significance measure is proposed for community structure analysis. Specifically, the well-known approaches and their corresponding quality functions are unified into a novel general formulation, which facilitates a detailed comparison across them. To determine the position of leaders and their corresponding followers, an efficient algorithm is proposed based on the spectral theory. Finally, we apply the significance analysis to some famous benchmark networks, and the good performance verifies the effectiveness and efficiency of our framework.

  16. American Vocational Education Research Association Members' Perceptions of Statistical Significance Tests and Other Statistical Controversies.

    ERIC Educational Resources Information Center

    Gordon, Howard R. D.

    A random sample of 113 members of the American Vocational Education Research Association (AVERA) was surveyed to obtain baseline information regarding AVERA members' perceptions of statistical significance tests. The Psychometrics Group Instrument was used to collect data from participants. Of those surveyed, 67% were male, 93% had earned a…

  17. Statistical Fault Detection & Diagnosis Expert System

    Energy Science and Technology Software Center (ESTSC)

    1996-12-18

STATMON is an expert system that performs real-time fault detection and diagnosis of redundant sensors in any industrial process requiring high reliability. After a training period performed during normal operation, the expert system monitors the statistical properties of the incoming signals using a pattern recognition test. If the test determines that statistical properties of the signals have changed, the expert system performs a sequence of logical steps to determine which sensor or machine component has degraded.

  18. Statistical keyword detection in literary corpora

    NASA Astrophysics Data System (ADS)

    Herrera, J. P.; Pury, P. A.

    2008-05-01

Understanding the complexity of human language requires an appropriate analysis of the statistical distribution of words in texts. We consider the information retrieval problem of detecting and ranking the relevant words of a text by means of statistical information referring to the spatial use of the words. Shannon's entropy of information is used as a tool for automatic keyword extraction. By using The Origin of Species by Charles Darwin as a representative text sample, we show the performance of our detector and compare it with other proposals in the literature. The randomly shuffled text receives special attention as a tool for calibrating the ranking indices.
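
    A toy rendering of the spatial-entropy idea, assuming only that a word whose occurrences cluster in a few parts of a text has lower normalized Shannon entropy than a word spread evenly; the sample "text" and the 20-part split are placeholders, not the paper's corpus or indices.

        import math
        from collections import Counter

        def spatial_entropy(text: str, word: str, parts: int = 20) -> float:
            tokens = text.lower().split()
            chunk = max(1, len(tokens) // parts)
            counts = [Counter(tokens[i:i + chunk])[word] for i in range(0, len(tokens), chunk)]
            total = sum(counts)
            if total == 0 or len(counts) < 2:
                return float("nan")
            probs = [c / total for c in counts if c > 0]
            return -sum(p * math.log2(p) for p in probs) / math.log2(len(counts))

        sample = "species " * 50 + "the " * 200 + "origin selection " * 30 + "the species " * 40
        print(spatial_entropy(sample, "species"), spatial_entropy(sample, "the"))  # lower = more clustered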

  19. Statistical significance of task related deep brain EEG dynamic changes in the time-frequency domain.

    PubMed

    Chládek, J; Brázdil, M; Halámek, J; Plešinger, F; Jurák, P

    2013-01-01

    We present an off-line analysis procedure for exploring brain activity recorded from intra-cerebral electroencephalographic data (SEEG). The objective is to determine the statistical differences between different types of stimulations in the time-frequency domain. The procedure is based on computing relative signal power change and subsequent statistical analysis. An example of characteristic statistically significant event-related de/synchronization (ERD/ERS) detected across different frequency bands following different oddball stimuli is presented. The method is used for off-line functional classification of different brain areas. PMID:24109865

  20. Statistical controversies in clinical research: statistical significance-too much of a good thing ….

    PubMed

    Buyse, M; Hurvitz, S A; Andre, F; Jiang, Z; Burris, H A; Toi, M; Eiermann, W; Lindsay, M-A; Slamon, D

    2016-05-01

    The use and interpretation of P values is a matter of debate in applied research. We argue that P values are useful as a pragmatic guide to interpret the results of a clinical trial, not as a strict binary boundary that separates real treatment effects from lack thereof. We illustrate our point using the result of BOLERO-1, a randomized, double-blind trial evaluating the efficacy and safety of adding everolimus to trastuzumab and paclitaxel as first-line therapy for HER2+ advanced breast cancer. In this trial, the benefit of everolimus was seen only in the predefined subset of patients with hormone receptor-negative breast cancer at baseline (progression-free survival hazard ratio = 0.66, P = 0.0049). A strict interpretation of this finding, based on complex 'alpha splitting' rules to assess statistical significance, led to the conclusion that the benefit of everolimus was not statistically significant either overall or in the subset. We contend that this interpretation does not do justice to the data, and we argue that the benefit of everolimus in hormone receptor-negative breast cancer is both statistically compelling and clinically relevant. PMID:26861602

  1. Statistical fingerprinting for malware detection and classification

    SciTech Connect

    Prowell, Stacy J.; Rathgeb, Christopher T.

    2015-09-15

    A system detects malware in a computing architecture with an unknown pedigree. The system includes a first computing device having a known pedigree and operating free of malware. The first computing device executes a series of instrumented functions that, when executed, provide a statistical baseline that is representative of the time it takes the software application to run on a computing device having a known pedigree. A second computing device executes a second series of instrumented functions that, when executed, provides an actual time that is representative of the time the known software application runs on the second computing device. The system detects malware when there is a difference in execution times between the first and the second computing devices.

  2. Infants with Williams syndrome detect statistical regularities in continuous speech.

    PubMed

    Cashon, Cara H; Ha, Oh-Ryeong; Graf Estes, Katharine; Saffran, Jenny R; Mervis, Carolyn B

    2016-09-01

Williams syndrome (WS) is a rare genetic disorder associated with delays in language and cognitive development. The reasons for the language delay are unknown. Statistical learning is a domain-general mechanism recruited for early language acquisition. In the present study, we investigated whether infants with WS were able to detect the statistical structure in continuous speech. Eighteen 8- to 20-month-olds with WS were familiarized with 2 min of a continuous stream of synthesized nonsense words; the statistical structure of the speech was the only cue to word boundaries. They were tested on their ability to discriminate statistically-defined "words" and "part-words" (which crossed word boundaries) in the artificial language. Despite significant cognitive and language delays, infants with WS were able to detect the statistical regularities in the speech stream. These findings suggest that an inability to track the statistical properties of speech is unlikely to be the primary basis for the delays in the onset of language observed in infants with WS. These results provide the first evidence of statistical learning by infants with developmental delays. PMID:27299804

  3. Statistical detection of systematic election irregularities

    PubMed Central

    Klimek, Peter; Yegorov, Yuri; Hanel, Rudolf; Thurner, Stefan

    2012-01-01

    Democratic societies are built around the principle of free and fair elections, and that each citizen’s vote should count equally. National elections can be regarded as large-scale social experiments, where people are grouped into usually large numbers of electoral districts and vote according to their preferences. The large number of samples implies statistical consequences for the polling results, which can be used to identify election irregularities. Using a suitable data representation, we find that vote distributions of elections with alleged fraud show a kurtosis substantially exceeding the kurtosis of normal elections, depending on the level of data aggregation. As an example, we show that reported irregularities in recent Russian elections are, indeed, well-explained by systematic ballot stuffing. We develop a parametric model quantifying the extent to which fraudulent mechanisms are present. We formulate a parametric test detecting these statistical properties in election results. Remarkably, this technique produces robust outcomes with respect to the resolution of the data and therefore, allows for cross-country comparisons. PMID:23010929
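
    A caricature of the kurtosis comparison, with synthetic per-district vote shares; real analyses of this kind use official district-level turnout and vote counts, and the parametric fraud model described in the abstract is far more detailed.

        import numpy as np
        from scipy.stats import kurtosis

        rng = np.random.default_rng(4)
        clean = rng.normal(0.55, 0.08, size=5000)                     # roughly Gaussian district vote shares
        stuffed = np.concatenate([clean[:4500], np.full(500, 0.99)])  # crude ballot-stuffing caricature

        print(f"excess kurtosis, clean:   {kurtosis(clean):.2f}")
        print(f"excess kurtosis, stuffed: {kurtosis(stuffed):.2f}")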

  4. Statistical downscaling rainfall using artificial neural network: significantly wetter Bangkok?

    NASA Astrophysics Data System (ADS)

    Vu, Minh Tue; Aribarg, Thannob; Supratid, Siriporn; Raghavan, Srivatsan V.; Liong, Shie-Yui

    2015-08-01

    Artificial neural network (ANN) is an established technique with a flexible mathematical structure that is capable of identifying complex nonlinear relationships between input and output data. The present study utilizes ANN as a method of statistically downscaling global climate models (GCMs) during the rainy season at meteorological site locations in Bangkok, Thailand. The study illustrates the applications of the feed forward back propagation using large-scale predictor variables derived from both the ERA-Interim reanalyses data and present day/future GCM data. The predictors are first selected over different grid boxes surrounding Bangkok region and then screened by using principal component analysis (PCA) to filter the best correlated predictors for ANN training. The reanalyses downscaled results of the present day climate show good agreement against station precipitation with a correlation coefficient of 0.8 and a Nash-Sutcliffe efficiency of 0.65. The final downscaled results for four GCMs show an increasing trend of precipitation for rainy season over Bangkok by the end of the twenty-first century. The extreme values of precipitation determined using statistical indices show strong increases of wetness. These findings will be useful for policy makers in pondering adaptation measures due to flooding such as whether the current drainage network system is sufficient to meet the changing climate and to plan for a range of related adaptation/mitigation measures.
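
    A compressed sketch of the pipeline described above (screen gridded predictors with PCA, then train a feed-forward network on the leading components); the data are random placeholders and scikit-learn's MLPRegressor stands in for whichever network implementation the authors used.

        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.neural_network import MLPRegressor

        rng = np.random.default_rng(9)
        predictors = rng.normal(size=(1200, 40))                   # days x large-scale predictor variables
        rainfall = predictors[:, :3] @ np.array([2.0, -1.0, 0.5]) + rng.gamma(2.0, 1.0, 1200)

        pcs = PCA(n_components=5).fit_transform(predictors)        # predictor screening step
        model = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0).fit(pcs, rainfall)
        print(f"in-sample R^2: {model.score(pcs, rainfall):.2f}")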

  5. Timescales for detecting a significant acceleration in sea level rise

    NASA Astrophysics Data System (ADS)

    Haigh, Ivan D.; Wahl, Thomas; Rohling, Eelco J.; Price, René M.; Pattiaratchi, Charitha B.; Calafat, Francisco M.; Dangendorf, Sönke

    2014-04-01

    There is observational evidence that global sea level is rising and there is concern that the rate of rise will increase, significantly threatening coastal communities. However, considerable debate remains as to whether the rate of sea level rise is currently increasing and, if so, by how much. Here we provide new insights into sea level accelerations by applying the main methods that have been used previously to search for accelerations in historical data, to identify the timings (with uncertainties) at which accelerations might first be recognized in a statistically significant manner (if not apparent already) in sea level records that we have artificially extended to 2100. We find that the most important approach to earliest possible detection of a significant sea level acceleration lies in improved understanding (and subsequent removal) of interannual to multidecadal variability in sea level records.
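
    One of the standard methods alluded to here is fitting a quadratic to a sea level record and testing whether the acceleration term differs from zero; the sketch below does this on a synthetic annual series and ignores the serial correlation that real records require careful treatment of.

        import numpy as np
        import statsmodels.api as sm

        rng = np.random.default_rng(5)
        years = np.arange(1900, 2014)
        sea_level = 1.7 * (years - years[0]) + 0.005 * (years - years[0])**2 + rng.normal(0, 25, years.size)

        t = years - years.mean()
        X = sm.add_constant(np.column_stack([t, t**2]))
        fit = sm.OLS(sea_level, X).fit()
        accel = 2 * fit.params[2]                                  # acceleration = 2 x quadratic coefficient
        print(f"acceleration = {accel:.4f} mm/yr^2, p = {fit.pvalues[2]:.3g}")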

  6. Timescales for detecting a significant acceleration in sea level rise.

    PubMed

    Haigh, Ivan D; Wahl, Thomas; Rohling, Eelco J; Price, René M; Pattiaratchi, Charitha B; Calafat, Francisco M; Dangendorf, Sönke

    2014-01-01

    There is observational evidence that global sea level is rising and there is concern that the rate of rise will increase, significantly threatening coastal communities. However, considerable debate remains as to whether the rate of sea level rise is currently increasing and, if so, by how much. Here we provide new insights into sea level accelerations by applying the main methods that have been used previously to search for accelerations in historical data, to identify the timings (with uncertainties) at which accelerations might first be recognized in a statistically significant manner (if not apparent already) in sea level records that we have artificially extended to 2100. We find that the most important approach to earliest possible detection of a significant sea level acceleration lies in improved understanding (and subsequent removal) of interannual to multidecadal variability in sea level records. PMID:24728012

  7. Timescales for detecting a significant acceleration in sea level rise

    PubMed Central

    Haigh, Ivan D.; Wahl, Thomas; Rohling, Eelco J.; Price, René M.; Pattiaratchi, Charitha B.; Calafat, Francisco M.; Dangendorf, Sönke

    2014-01-01

    There is observational evidence that global sea level is rising and there is concern that the rate of rise will increase, significantly threatening coastal communities. However, considerable debate remains as to whether the rate of sea level rise is currently increasing and, if so, by how much. Here we provide new insights into sea level accelerations by applying the main methods that have been used previously to search for accelerations in historical data, to identify the timings (with uncertainties) at which accelerations might first be recognized in a statistically significant manner (if not apparent already) in sea level records that we have artificially extended to 2100. We find that the most important approach to earliest possible detection of a significant sea level acceleration lies in improved understanding (and subsequent removal) of interannual to multidecadal variability in sea level records. PMID:24728012

  8. Assessing statistical significance in multivariable genome wide association analysis

    PubMed Central

    Buzdugan, Laura; Kalisch, Markus; Navarro, Arcadi; Schunk, Daniel; Fehr, Ernst; Bühlmann, Peter

    2016-01-01

    Motivation: Although Genome Wide Association Studies (GWAS) genotype a very large number of single nucleotide polymorphisms (SNPs), the data are often analyzed one SNP at a time. The low predictive power of single SNPs, coupled with the high significance threshold needed to correct for multiple testing, greatly decreases the power of GWAS. Results: We propose a procedure in which all the SNPs are analyzed in a multiple generalized linear model, and we show its use for extremely high-dimensional datasets. Our method yields P-values for assessing significance of single SNPs or groups of SNPs while controlling for all other SNPs and the family wise error rate (FWER). Thus, our method tests whether or not a SNP carries any additional information about the phenotype beyond that available by all the other SNPs. This rules out spurious correlations between phenotypes and SNPs that can arise from marginal methods because the ‘spuriously correlated’ SNP merely happens to be correlated with the ‘truly causal’ SNP. In addition, the method offers a data driven approach to identifying and refining groups of SNPs that jointly contain informative signals about the phenotype. We demonstrate the value of our method by applying it to the seven diseases analyzed by the Wellcome Trust Case Control Consortium (WTCCC). We show, in particular, that our method is also capable of finding significant SNPs that were not identified in the original WTCCC study, but were replicated in other independent studies. Availability and implementation: Reproducibility of our research is supported by the open-source Bioconductor package hierGWAS. Contact: peter.buehlmann@stat.math.ethz.ch Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27153677

  9. Damage detection in mechanical structures using extreme value statistics.

    SciTech Connect

    Worden, K.; Allen, D. W.; Sohn, H.; Farrar, C. R.

    2002-01-01

The first and most important objective of any damage identification algorithm is to ascertain with confidence if damage is present or not. Many methods have been proposed for damage detection based on ideas of novelty detection founded in pattern recognition and multivariate statistics. The philosophy of novelty detection is simple. Features are first extracted from a baseline system to be monitored, and subsequent data are then compared to see if the new features are outliers, which significantly depart from the rest of the population. In damage diagnosis problems, the assumption is that outliers are generated from a damaged condition of the monitored system. This damage classification necessitates the establishment of a decision boundary. Choosing this threshold value is often based on the assumption that the parent distribution of data is Gaussian in nature. While the problem of novelty detection focuses attention on the outlier or extreme values of the data, i.e. those points in the tails of the distribution, the threshold selection using the normality assumption weighs the central population of data. Therefore, this normality assumption might impose potentially misleading behavior on damage classification, and is likely to lead the damage diagnosis astray. In this paper, extreme value statistics is integrated with the novelty detection to specifically model the tails of the distribution of interest. Finally, the proposed technique is demonstrated on simulated numerical data and time series data measured from an eight degree-of-freedom spring-mass system.
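
    The central idea, modelling the tail of the baseline feature distribution with extreme value statistics rather than assuming Gaussianity, can be sketched with a block-maxima GEV fit; the baseline features, block size and 99% criterion are all illustrative choices, not those of the paper.

        import numpy as np
        from scipy.stats import genextreme

        rng = np.random.default_rng(6)
        baseline = rng.standard_t(df=5, size=20_000)            # heavy-tailed baseline (undamaged) features

        block_maxima = baseline.reshape(200, 100).max(axis=1)   # block maxima for a GEV fit
        c, loc, scale = genextreme.fit(block_maxima)
        threshold = genextreme.ppf(0.99, c, loc=loc, scale=scale)

        new_feature = 6.5
        print(f"threshold = {threshold:.2f}, flagged as outlier: {new_feature > threshold}")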

  10. Statistical Analysis of Examination to Detect Cheating.

    ERIC Educational Resources Information Center

    Code, Ronald P.

    1985-01-01

    A number of statistical procedures that were developed in 1983 at the University of Medicine and Dentistry of New Jersey-Rutgers Medical School to verify the suspicion that a student cheated during an examination are described. (MLW)

  11. Steganography forensics method for detecting least significant bit replacement attack

    NASA Astrophysics Data System (ADS)

    Wang, Xiaofeng; Wei, Chengcheng; Han, Xiao

    2015-01-01

    We present an image forensics method to detect least significant bit replacement steganography attack. The proposed method provides fine-grained forensics features by using the hierarchical structure that combines pixels correlation and bit-planes correlation. This is achieved via bit-plane decomposition and difference matrices between the least significant bit-plane and each one of the others. Generated forensics features provide the susceptibility (changeability) that will be drastically altered when the cover image is embedded with data to form a stego image. We developed a statistical model based on the forensics features and used least square support vector machine as a classifier to distinguish stego images from cover images. Experimental results show that the proposed method provides the following advantages. (1) The detection rate is noticeably higher than that of some existing methods. (2) It has the expected stability. (3) It is robust for content-preserving manipulations, such as JPEG compression, adding noise, filtering, etc. (4) The proposed method provides satisfactory generalization capability.
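
    The bit-plane decomposition and difference features can be sketched as follows (Python, synthetic images); the paper's statistical model and least square support vector machine classifier are not reproduced, and the feature definition here is a simplified stand-in.

    import numpy as np

    def bitplane_features(img):
        """Mean absolute difference between the LSB plane and each higher bit plane."""
        planes = [(img >> k) & 1 for k in range(8)]   # plane 0 is the LSB
        lsb = planes[0].astype(np.int8)
        return np.array([np.mean(np.abs(lsb - p.astype(np.int8))) for p in planes[1:]])

    # a smooth synthetic "cover" image and a version whose LSB plane is replaced by
    # random bits (an idealized LSB-replacement stego image)
    rng = np.random.default_rng(2)
    xx, yy = np.meshgrid(np.arange(128), np.arange(128))
    cover = ((np.sin(xx / 40.0) + np.cos(yy / 60.0) + 2.0) * 60).astype(np.uint8)
    stego = ((cover & 0xFE) | rng.integers(0, 2, size=cover.shape, dtype=np.uint8)).astype(np.uint8)

    # with a random LSB plane each feature sits near 0.5; cover features reflect image structure
    print("cover features:", bitplane_features(cover))
    print("stego features:", bitplane_features(stego))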

  12. Statistically significant performance results of a mine detector and fusion algorithm from an x-band high-resolution SAR

    NASA Astrophysics Data System (ADS)

    Williams, Arnold C.; Pachowicz, Peter W.

    2004-09-01

    Current mine detection research indicates that no single sensor or single look from a sensor will detect mines/minefields in a real-time manner at a performance level suitable for a forward maneuver unit. Hence, the integrated development of detectors and fusion algorithms is of primary importance. A problem in this development process has been the evaluation of these algorithms with relatively small data sets, leading to anecdotal and frequently overtrained results. These anecdotal results are often unreliable and conflicting among various sensors and algorithms. Consequently, the physical phenomena that ought to be exploited and the performance benefits of this exploitation are often ambiguous. The Army RDECOM CERDEC Night Vision Laboratory and Electronic Sensors Directorate has collected large amounts of multisensor data such that statistically significant evaluations of detection and fusion algorithms can be obtained. Even with these large data sets, care must be taken in algorithm design and data processing to achieve statistically significant performance results for combined detectors and fusion algorithms. This paper discusses statistically significant detection and combined multilook fusion results for the Ellipse Detector (ED) and the Piecewise Level Fusion Algorithm (PLFA). These statistically significant performance results are characterized by ROC curves that have been obtained by processing multilook data from the high-resolution Veridian X-band SAR. We discuss the implications of these results for mine detection and the importance of statistical significance, sample size, ground truth, and algorithm design in performance evaluation.

  13. Understanding the Sampling Distribution and Its Use in Testing Statistical Significance.

    ERIC Educational Resources Information Center

    Breunig, Nancy A.

    Despite the increasing criticism of statistical significance testing by researchers, particularly in the publication of the 1994 American Psychological Association's style manual, statistical significance test results are still popular in journal articles. For this reason, it remains important to understand the logic of inferential statistics. A…

  14. Modeling single-molecule detection statistics

    NASA Astrophysics Data System (ADS)

    Enderlein, Joerg; Robbins, David L.; Ambrose, W. P.; Goodwin, Peter M.; Keller, Richard A.

    1997-05-01

    We present experimental results of single B-phycoerythrin molecule detection in a fluid flow at different sample introduction rates. A new mathematical approach is used for calculating the resulting burst size distributions. The calculations are based upon a complete physical model including absorption, fluorescence and photobleaching characteristics of the fluorophore; its diffusion; the sample stream hydrodynamics; the spatially dependent optical detection efficiency; and the excitation laser beam characteristics. Special attention is paid to the phenomenon of `molecular noise'--fluctuations in the number of overlapping crossings of molecules through the detection volume. The importance of this study and its connections to experimental applications are discussed.

  15. Outcomes of pharmacological management of nocturia with non-antidiuretic agents: does statistically significant equal clinically significant?

    PubMed

    Smith, Ariana L; Wein, Alan J

    2011-05-01

    To evaluate the statistical and clinical efficacy of the pharmacological treatments of nocturia using non-antidiuretic agents. A literature review of treatments of nocturia specifically addressing the impact of alpha blockers, 5-alpha reductase inhibitors (5ARI) and antimuscarinics on reduction in nocturnal voids. Despite commonly reported statistically significant results, nocturia has shown a poor clinical response to traditional therapies for benign prostatic hyperplasia including alpha blockers and 5ARI. Similarly, nocturia has shown a poor clinical response to traditional therapies for overactive bladder including antimuscarinics. Statistical success has been achieved in some groups with a variety of alpha blockers and antimuscarinic agents, but the clinical significance of these changes is doubtful. It is likely that other types of therapy will need to be employed in order to achieve a clinically significant reduction in nocturia. PMID:21518417

  16. Dark census: Statistically detecting the satellite populations of distant galaxies

    NASA Astrophysics Data System (ADS)

    Cyr-Racine, Francis-Yan; Moustakas, Leonidas A.; Keeton, Charles R.; Sigurdson, Kris; Gilman, Daniel A.

    2016-08-01

    In the standard structure formation scenario based on the cold dark matter paradigm, galactic halos are predicted to contain a large population of dark matter subhalos. While the most massive members of the subhalo population can appear as luminous satellites and be detected in optical surveys, establishing the existence of the low mass and mostly dark subhalos has proven to be a daunting task. Galaxy-scale strong gravitational lenses have been successfully used to study mass substructures lying close to lensed images of bright background sources. However, in typical galaxy-scale lenses, the strong lensing region only covers a small projected area of the lens's dark matter halo, implying that the vast majority of subhalos cannot be directly detected in lensing observations. In this paper, we point out that this large population of dark satellites can collectively affect gravitational lensing observables, hence possibly allowing their statistical detection. Focusing on the region of the galactic halo outside the strong lensing area, we compute from first principles the statistical properties of perturbations to the gravitational time delay and position of lensed images in the presence of a mass substructure population. We find that in the standard cosmological scenario, the statistics of these lensing observables are well approximated by Gaussian distributions. The formalism developed as part of this calculation is very general and can be applied to any halo geometry and choice of subhalo mass function. Our results significantly reduce the computational cost of including a large substructure population in lens models and enable the use of Bayesian inference techniques to detect and characterize the distributed satellite population of distant lens galaxies.

  17. Statistics and Machine Learning based Outlier Detection Techniques for Exoplanets

    NASA Astrophysics Data System (ADS)

    Goel, Amit; Montgomery, Michele

    2015-08-01

    Architectures of planetary systems are observable snapshots in time that can indicate formation and dynamic evolution of planets. The observable key parameters that we consider are planetary mass and orbital period. If planet masses are significantly less than their host star masses, Kepler's third law gives P^2 = a^3, where P is the orbital period in units of years and a is the semi-major axis in units of Astronomical Units (AU). Keplerian motion holds on small scales such as the size of the Solar System but not on large scales such as the size of the Milky Way Galaxy. In this work, for confirmed exoplanets of known stellar mass, planetary mass, orbital period, and stellar age, we analyze the Keplerian motion of systems as a function of stellar age, to test whether Keplerian motion has an age dependency and to identify outliers. For detecting outliers, we apply several techniques based on statistical and machine learning methods such as probabilistic, linear, and proximity-based models. In probabilistic and statistical models of outliers, the parameters of a closed-form probability distribution are learned in order to detect the outliers. Linear models use regression-analysis-based techniques for detecting outliers. Proximity-based models use distance-based algorithms such as k-nearest neighbour, clustering algorithms such as k-means, or density-based algorithms such as kernel density estimation. In this work, we use unsupervised learning algorithms with only the proximity-based models. In addition, we explore the relative strengths and weaknesses of the various techniques by validating the outliers. The validation criterion for the outliers is whether the ratio of planetary mass to stellar mass is less than 0.001. In this work, we present our statistical analysis of the outliers thus detected.
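
    As an illustration of the proximity-based models mentioned above, the following Python sketch scores points by their mean distance to the k nearest neighbours and flags the highest scores as outliers; the toy features stand in for the exoplanet catalogue quantities.

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    rng = np.random.default_rng(3)
    # toy features: log orbital period [days] and log planet/star mass ratio
    bulk = np.column_stack([rng.normal(2.0, 0.5, 200), rng.normal(-3.5, 0.3, 200)])
    odd = np.array([[4.5, -1.5], [0.1, -0.5]])        # two unusual systems
    X = np.vstack([bulk, odd])

    k = 10
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)   # +1: each point is its own neighbour
    dist, _ = nn.kneighbors(X)
    score = dist[:, 1:].mean(axis=1)                  # mean distance to the k nearest neighbours

    threshold = np.percentile(score, 99)              # illustrative cut-off
    print("flagged outlier indices:", np.where(score > threshold)[0])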

  18. Uses and Abuses of Statistical Significance Tests and Other Statistical Resources: A Comparative Study

    ERIC Educational Resources Information Center

    Monterde-i-Bort, Hector; Frias-Navarro, Dolores; Pascual-Llobell, Juan

    2010-01-01

    The empirical study we present here deals with a pedagogical issue that has not been thoroughly explored up until now in our field. Previous empirical studies in other sectors have identified the opinions of researchers about this topic, showing that completely unacceptable interpretations have been made of significance tests and other statistical…

  19. A decision surface-based taxonomy of detection statistics

    NASA Astrophysics Data System (ADS)

    Bouffard, François

    2012-09-01

    Current and past literature on the topic of detection statistics - in particular those used in hyperspectral target detection - can be intimidating for newcomers, especially given the huge number of detection tests described in the literature. Detection tests for hyperspectral measurements, such as those generated by dispersive or Fourier transform spectrometers used in remote sensing of atmospheric contaminants, are of paramount importance if any level of analysis automation is to be achieved. The detection statistics used in hyperspectral target detection are generally borrowed and adapted from other fields such as radar signal processing or acoustics. Consequently, although remarkable efforts have been made to clarify and categorize the vast number of available detection tests, understanding their differences, similarities, limits and other intricacies is still an exacting journey. Reasons for this state of affairs include heterogeneous nomenclature and mathematical notation, probably due to the multiple origins of hyperspectral target detection formalisms. Attempts at sorting out detection statistics using ambiguously defined properties may also cause more harm than good. Ultimately, a detection statistic is entirely characterized by its decision boundary. Thus, we propose to catalogue detection statistics according to the shape of their decision surfaces, which greatly simplifies this taxonomy exercise. We make a distinction between the topology resulting from the mathematical formulation of the statistic and mere parameters that adjust the boundary's precise shape, position and orientation. Using this simple approach, similarities between various common detection statistics are found, limit cases are reduced to simpler statistics, and a general understanding of the available detection tests and their properties becomes much easier to achieve.
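
    The distinction can be made concrete with two familiar statistics, shown in the Python sketch below with synthetic data: a background-whitened matched filter, which is linear in the measurement and hence has a hyperplane decision boundary, and the RX/Mahalanobis anomaly statistic, which is quadratic and hence has an ellipsoidal boundary. Data and target signature are illustrative.

    import numpy as np

    rng = np.random.default_rng(4)
    d = 5                                             # number of spectral bands (illustrative)
    background = rng.normal(size=(1000, d))
    mu = background.mean(axis=0)
    cov_inv = np.linalg.inv(np.cov(background, rowvar=False))
    target = rng.normal(size=d)                       # assumed known target signature

    def matched_filter(x):
        # linear statistic: hyperplane decision boundary
        return (target - mu) @ cov_inv @ (x - mu)

    def rx_anomaly(x):
        # quadratic statistic: ellipsoidal decision boundary
        diff = x - mu
        return diff @ cov_inv @ diff

    x = rng.normal(size=d) + 0.5 * target
    print("matched filter score:", matched_filter(x))
    print("RX anomaly score:    ", rx_anomaly(x))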

  20. Detection of bearing damage by statistic vibration analysis

    NASA Astrophysics Data System (ADS)

    Sikora, E. A.

    2016-04-01

    The condition of bearings, which are essential components in mechanisms, is crucial to safety. Analysis of the bearing vibration signal, which is always contaminated by certain types of noise, is an important basis for diagnosing the mechanical condition of the bearing and identifying failure phenomena. In this paper, a method of rolling-bearing fault detection based on statistical analysis of vibration is proposed to filter out the Gaussian noise contained in a raw vibration signal. The results of experiments show that the vibration signal can be significantly enhanced by application of the proposed method. In addition, the proposed method is used to analyse real acoustic signals of bearings with inner-race and outer-race faults, respectively. The values of the attributes are determined according to the degree of the fault. The results confirm that the periods between the transients, which represent bearing fault characteristics, can be successfully detected.
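
    A generic Python sketch of statistical vibration analysis for a bearing fault, not the paper's exact procedure: impulsive fault transients raise the kurtosis of the signal, and the envelope spectrum exposes the repetition rate of the transients. All parameter values are illustrative.

    import numpy as np
    from scipy.signal import hilbert
    from scipy.stats import kurtosis

    fs = 10_000                                       # sampling rate [Hz], illustrative
    t = np.arange(0, 1.0, 1 / fs)
    fault_rate = 87.0                                 # assumed fault repetition frequency [Hz]
    impulses = (np.sin(2 * np.pi * fault_rate * t) > 0.999).astype(float)
    ringing = np.exp(-t[:200] * 800) * np.sin(2 * np.pi * 3000 * t[:200])   # decaying resonance
    signal = np.convolve(impulses, ringing, mode="same") \
             + 0.2 * np.random.default_rng(5).normal(size=t.size)

    print("kurtosis:", kurtosis(signal))              # well above 0 for an impulsive signal

    envelope = np.abs(hilbert(signal))
    spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
    freqs = np.fft.rfftfreq(envelope.size, 1 / fs)
    print("dominant envelope frequency [Hz]:", freqs[spectrum.argmax()])    # typically near fault_rate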

  1. On the statistical significance of the bulk flow measured by the Planck satellite

    NASA Astrophysics Data System (ADS)

    Atrio-Barandela, F.

    2013-09-01

    A recent analysis of data collected by the Planck satellite detected a net dipole at the location of X-ray selected galaxy clusters, corresponding to a large-scale bulk flow extending at least to z ~ 0.18, the median redshift of the cluster sample. The amplitude of this flow, as measured with Planck, is consistent with earlier findings based on data from the Wilkinson Microwave Anisotropy Probe (WMAP). However, the uncertainty assigned to the dipole by the Planck team is much larger than that found in the WMAP studies, leading the authors of the Planck study to conclude that the observed bulk flow is not statistically significant. Here, we show that two of the three implementations of random sampling used in the error analysis of the Planck study lead to systematic overestimates in the uncertainty of the measured dipole. Random simulations of the sky do not take into account that the actual realization of the sky leads to filtered data that have a 12% lower root-mean-square dispersion than the average simulation. Using rotations around the Galactic pole (the Z axis) increases the uncertainty of the X and Y components of the dipole and artificially reduces the significance of the dipole detection from 98-99% to less than 90% confidence. When either effect is taken into account, the corrected errors agree with those obtained using random distributions of clusters on Planck data, and the resulting statistical significance of the dipole measured by Planck is consistent with that of the WMAP results.

  2. Statistical Detection of Atypical Aircraft Flights

    NASA Technical Reports Server (NTRS)

    Statler, Irving; Chidester, Thomas; Shafto, Michael; Ferryman, Thomas; Amidan, Brett; Whitney, Paul; White, Amanda; Willse, Alan; Cooley, Scott; Jay, Joseph; Rosenthal, Loren; Swickard, Andrea; Bates, Derrick; Scherrer, Chad; Webb, Bobbie-Jo; Lawrence, Robert; Mosbrucker, Chris; Prothero, Gary; Andrei, Adi; Romanowski, Tim; Robin, Daniel; Prothero, Jason; Lynch, Robert; Lowe, Michael

    2006-01-01

    A computational method and software to implement the method have been developed to sift through vast quantities of digital flight data to alert human analysts to aircraft flights that are statistically atypical in ways that signify that safety may be adversely affected. On a typical day, there are tens of thousands of flights in the United States and several times that number throughout the world. Depending on the specific aircraft design, the volume of data collected by sensors and flight recorders can range from a few dozen to several thousand parameters per second during a flight. Whereas these data have long been utilized in investigating crashes, the present method is oriented toward helping to prevent crashes by enabling routine monitoring of flight operations to identify portions of flights that may be of interest with respect to safety issues.

  3. Using the bootstrap to establish statistical significance for relative validity comparisons among patient-reported outcome measures

    PubMed Central

    2013-01-01

    Background: Relative validity (RV), a ratio of ANOVA F-statistics, is often used to compare the validity of patient-reported outcome (PRO) measures. We used the bootstrap to establish the statistical significance of the RV and to identify key factors affecting its significance. Methods: Based on responses from 453 chronic kidney disease (CKD) patients to 16 CKD-specific and generic PRO measures, RVs were computed to determine how well each measure discriminated across clinically-defined groups of patients compared to the most discriminating (reference) measure. Statistical significance of RV was quantified by the 95% bootstrap confidence interval. Simulations examined the effects of sample size, denominator F-statistic, correlation between comparator and reference measures, and number of bootstrap replicates. Results: The statistical significance of the RV increased as the magnitude of denominator F-statistic increased or as the correlation between comparator and reference measures increased. A denominator F-statistic of 57 conveyed sufficient power (80%) to detect an RV of 0.6 for two measures correlated at r = 0.7. Larger denominator F-statistics or higher correlations provided greater power. Larger sample size with a fixed denominator F-statistic or more bootstrap replicates (beyond 500) had minimal impact. Conclusions: The bootstrap is valuable for establishing the statistical significance of RV estimates. A reasonably large denominator F-statistic (F > 57) is required for adequate power when using the RV to compare the validity of measures with small or moderate correlations (r < 0.7). Substantially greater power can be achieved when comparing measures of a very high correlation (r > 0.9). PMID:23721463
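
    A compact Python sketch of the RV bootstrap idea, using simulated patients and severity groups; the clinical data and exact resampling details of the study are not reproduced.

    import numpy as np
    from scipy.stats import f_oneway

    rng = np.random.default_rng(6)
    n_per_group = 150
    groups = np.repeat([0, 1, 2], n_per_group)                  # three severity groups
    severity = groups.astype(float)
    reference = severity + rng.normal(size=groups.size)         # discriminates well
    comparator = 0.6 * severity + rng.normal(size=groups.size)  # discriminates less well

    def rv(ref, comp, g):
        f_ref = f_oneway(*[ref[g == k] for k in np.unique(g)]).statistic
        f_comp = f_oneway(*[comp[g == k] for k in np.unique(g)]).statistic
        return f_comp / f_ref

    point = rv(reference, comparator, groups)
    boot = []
    for _ in range(1000):
        idx = rng.integers(0, groups.size, groups.size)         # resample patients
        boot.append(rv(reference[idx], comparator[idx], groups[idx]))
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"RV = {point:.2f}, 95% bootstrap CI = ({lo:.2f}, {hi:.2f})")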

  4. Identification of Statistically Significant Differences between Standard Scores on the Woodcock Reading Mastery Tests.

    ERIC Educational Resources Information Center

    Simpson, Robert G.

    1981-01-01

    Occasionally, differences in test scores seem to indicate that a student performs much better in one reading area than in another when, in reality, the differences may not be statistically significant. The author presents a table in which statistically significant differences between Woodcock test standard scores are identified. (Author)

  5. "What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"

    ERIC Educational Resources Information Center

    Ozturk, Elif

    2012-01-01

    The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…

  6. Statistical Significance Testing in Second Language Research: Basic Problems and Suggestions for Reform

    ERIC Educational Resources Information Center

    Norris, John M.

    2015-01-01

    Traditions of statistical significance testing in second language (L2) quantitative research are strongly entrenched in how researchers design studies, select analyses, and interpret results. However, statistical significance tests using "p" values are commonly misinterpreted by researchers, reviewers, readers, and others, leading to…

  7. A Review of Post-1994 Literature on Whether Statistical Significance Tests Should Be Banned.

    ERIC Educational Resources Information Center

    Sullivan, Jeremy R.

    This paper summarizes the literature regarding statistical significance testing with an emphasis on: (1) the post-1994 literature in various disciplines; (2) alternatives to statistical significance testing; and (3) literature exploring why researchers have demonstrably failed to be influenced by the 1994 American Psychological Association…

  8. The Importance of Invariance Procedures as against Tests of Statistical Significance.

    ERIC Educational Resources Information Center

    Fish, Larry

    A growing controversy surrounds the strict interpretation of statistical significance tests in social research. Statistical significance tests fail in particular to provide estimates for the stability of research results. Methods that do provide such estimates are known as invariance or cross-validation procedures. Invariance analysis is largely…

  9. Performance evaluation of hydrological models: Statistical significance for reducing subjectivity in goodness-of-fit assessments

    NASA Astrophysics Data System (ADS)

    Ritter, Axel; Muñoz-Carpena, Rafael

    2013-02-01

    Success in the use of computer models for simulating environmental variables and processes requires objective model calibration and verification procedures. Several methods for quantifying the goodness-of-fit of observations against model-calculated values have been proposed, but none of them is free of limitations and all can be ambiguous. When a single indicator is used it may lead to incorrect verification of the model. Instead, a combination of graphical results, absolute value error statistics (i.e. root mean square error), and normalized goodness-of-fit statistics (i.e. Nash-Sutcliffe Efficiency coefficient, NSE) is currently recommended. Interpretation of NSE values is often subjective, and may be biased by the magnitude and number of data points, data outliers and repeated data. The statistical significance of the performance statistics is an aspect generally ignored that helps in reducing subjectivity in the proper interpretation of the model performance. In this work, approximated probability distributions for two common indicators (NSE and root mean square error) are derived with bootstrapping (block bootstrapping when dealing with time series), followed by bias-corrected and accelerated (BCa) calculation of confidence intervals. Hypothesis testing of the indicators exceeding threshold values is proposed in a unified framework for statistically accepting or rejecting the model performance. It is illustrated how model performance is not linearly related to NSE, which is critical for its proper interpretation. Additionally, the sensitivity of the indicators to model bias, outliers and repeated data is evaluated. The potential of the difference between root mean square error and mean absolute error for detecting outliers is explored, showing that this may be considered a necessary but not a sufficient condition of outlier presence. The usefulness of the approach for the evaluation of model performance is illustrated with case studies including those with
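
    A minimal Python sketch of the approach, assuming a synthetic daily series: the Nash-Sutcliffe efficiency (NSE) is block-bootstrapped to obtain a confidence interval and then compared with a chosen acceptability threshold. The BCa correction and the RMSE-based indicator of the paper are omitted.

    import numpy as np

    def nse(obs, sim):
        return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

    rng = np.random.default_rng(7)
    n, block = 365, 30
    obs = 10 + 3 * np.sin(np.arange(n) * 2 * np.pi / 365) + rng.normal(0, 1, n)
    sim = obs + rng.normal(0, 1.2, n)                 # an imperfect model

    point = nse(obs, sim)
    boot = []
    for _ in range(2000):
        starts = rng.integers(0, n - block, size=n // block)
        idx = np.concatenate([np.arange(s, s + block) for s in starts])   # resampled blocks
        boot.append(nse(obs[idx], sim[idx]))
    lo, hi = np.percentile(boot, [2.5, 97.5])

    threshold = 0.65                                  # assumed acceptability threshold
    print(f"NSE = {point:.2f}, 95% block-bootstrap CI = ({lo:.2f}, {hi:.2f})")
    print("model performance acceptable at this threshold:", lo > threshold)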

  10. Why Are People Bad at Detecting Randomness? A Statistical Argument

    ERIC Educational Resources Information Center

    Williams, Joseph J.; Griffiths, Thomas L.

    2013-01-01

    Errors in detecting randomness are often explained in terms of biases and misconceptions. We propose and provide evidence for an account that characterizes the contribution of the inherent statistical difficulty of the task. Our account is based on a Bayesian statistical analysis, focusing on the fact that a random process is a special case of…

  11. Statistically significant faunal differences among Middle Ordovician age, Chickamauga Group bryozoan bioherms, central Alabama

    SciTech Connect

    Crow, C.J.

    1985-01-01

    Middle Ordovician age Chickamauga Group carbonates crop out along the Birmingham and Murphrees Valley anticlines in central Alabama. The macrofossil contents on exposed surfaces of seven bioherms have been counted to determine their various paleontologic characteristics. Twelve groups of organisms are present in these bioherms. Dominant organisms include bryozoans, algae, brachiopods, sponges, pelmatozoans, stromatoporoids and corals. Minor accessory fauna include predators, scavengers and grazers such as gastropods, ostracods, trilobites, cephalopods and pelecypods. Vertical and horizontal niche zonation has been detected for some of the bioherm dwelling fauna. No one bioherm of those studied exhibits all 12 groups of organisms; rather, individual bioherms display various subsets of the total diversity. Statistical treatment (G-test) of the diversity data indicates a lack of statistical homogeneity of the bioherms, both within and between localities. Between-locality population heterogeneity can be ascribed to differences in biologic responses to such gross environmental factors as water depth and clarity, and energy levels. At any one locality, gross aspects of the paleoenvironments are assumed to have been more uniform. Significant differences among bioherms at any one locality may have resulted from patchy distribution of species populations, differential preservation and other factors.
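
    The G-test of homogeneity used above can be run as follows (Python with SciPy); the counts are invented for illustration and are not the paper's data.

    import numpy as np
    from scipy.stats import chi2_contingency

    counts = np.array([
        [120, 40, 22,  5],     # bioherm A: bryozoans, algae, brachiopods, other
        [ 95, 60, 10, 12],     # bioherm B
        [ 40, 15, 30,  8],     # bioherm C
    ])

    # lambda_="log-likelihood" turns the chi-square test into the G-test
    g, p, dof, expected = chi2_contingency(counts, lambda_="log-likelihood")
    print(f"G = {g:.1f}, dof = {dof}, p = {p:.3g}")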

  12. Proteny: discovering and visualizing statistically significant syntenic clusters at the proteome level

    PubMed Central

    Gehrmann, Thies; Reinders, Marcel J.T.

    2015-01-01

    Background: With more and more genomes being sequenced, detecting synteny between genomes becomes more and more important. However, for microorganisms the genomic divergence quickly becomes large, resulting in different codon usage and shuffling of gene order and gene elements such as exons. Results: We present Proteny, a methodology to detect synteny between diverged genomes. It operates on the amino acid sequence level to be insensitive to codon usage adaptations and clusters groups of exons disregarding order to handle diversity in genomic ordering between genomes. Furthermore, Proteny assigns significance levels to the syntenic clusters such that they can be selected on statistical grounds. Finally, Proteny provides novel ways to visualize results at different scales, facilitating the exploration and interpretation of syntenic regions. We test the performance of Proteny on a standard ground truth dataset, and we illustrate the use of Proteny on two closely related genomes (two different strains of Aspergillus niger) and on two distant genomes (two species of Basidiomycota). In comparison to other tools, we find that Proteny finds clusters with more true homologies in fewer clusters that contain more genes, i.e. Proteny is able to identify a more consistent synteny. Further, we show how genome rearrangements, assembly errors, gene duplications and the conservation of specific genes can be easily studied with Proteny. Availability and implementation: Proteny is freely available at the Delft Bioinformatics Lab website http://bioinformatics.tudelft.nl/dbl/software. Contact: t.gehrmann@tudelft.nl Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26116928

  13. Using the Bootstrap Method for a Statistical Significance Test of Differences between Summary Histograms

    NASA Technical Reports Server (NTRS)

    Xu, Kuan-Man

    2006-01-01

    A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
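
    A condensed Python sketch of the proposed test using the Euclidean distance: the distance between two summary histograms is compared with its distribution under resampling of the individual histograms from the pooled set. Synthetic histograms stand in for the cloud-object data, and the resampling scheme is simplified relative to the paper.

    import numpy as np

    rng = np.random.default_rng(8)
    bins = 20

    def make_histograms(n, shift):
        """One normalized histogram per cloud object (synthetic)."""
        h = rng.poisson(5.0, size=(n, bins)).astype(float)
        h[:, bins // 2 + shift] += 15.0               # a mode that differs between groups
        return h / h.sum(axis=1, keepdims=True)

    group_a, group_b = make_histograms(300, 0), make_histograms(250, 2)

    def summary(h):
        s = h.sum(axis=0)
        return s / s.sum()                            # summary histogram, normalized

    def euclid(a, b):
        return np.sqrt(np.sum((summary(a) - summary(b)) ** 2))

    observed = euclid(group_a, group_b)

    # resample both groups (with replacement) from the pooled set of individual
    # histograms to approximate the null distribution of the distance
    pooled = np.vstack([group_a, group_b])
    null = [euclid(pooled[rng.integers(0, len(pooled), 300)],
                   pooled[rng.integers(0, len(pooled), 250)]) for _ in range(2000)]
    p_value = np.mean(np.array(null) >= observed)
    print(f"Euclidean distance = {observed:.4f}, bootstrap p = {p_value:.3f}")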

  14. A Network-Based Method to Assess the Statistical Significance of Mild Co-Regulation Effects

    PubMed Central

    Horvát, Emőke-Ágnes; Zhang, Jitao David; Uhlmann, Stefan; Sahin, Özgür; Zweig, Katharina Anna

    2013-01-01

    Recent development of high-throughput, multiplexing technology has initiated projects that systematically investigate interactions between two types of components in biological networks, for instance transcription factors and promoter sequences, or microRNAs (miRNAs) and mRNAs. In terms of network biology, such screening approaches primarily attempt to elucidate relations between biological components of two distinct types, which can be represented as edges between nodes in a bipartite graph. However, it is often desirable not only to determine regulatory relationships between nodes of different types, but also to understand the connection patterns of nodes of the same type. Especially interesting is the co-occurrence of two nodes of the same type, i.e., the number of their common neighbours, which current high-throughput screening analysis fails to address. The co-occurrence gives the number of circumstances under which both of the biological components are influenced in the same way. Here we present SICORE, a novel network-based method to detect pairs of nodes with a statistically significant co-occurrence. We first show the stability of the proposed method on artificial data sets: when randomly adding and deleting observations we obtain reliable results even with noise exceeding the expected level in large-scale experiments. Subsequently, we illustrate the viability of the method based on the analysis of a proteomic screening data set to reveal regulatory patterns of human microRNAs targeting proteins in the EGFR-driven cell cycle signalling system. Since statistically significant co-occurrence may indicate functional synergy and the mechanisms underlying canalization, and thus hold promise in drug target identification and therapeutic development, we provide a platform-independent implementation of SICORE with a graphical user interface as a novel tool in the arsenal of high-throughput screening analysis. PMID:24039936

  15. Brazilian Amazonia Deforestation Detection Using Spatio-Temporal Scan Statistics

    NASA Astrophysics Data System (ADS)

    Vieira, C. A. O.; Santos, N. T.; Carneiro, A. P. S.; Balieiro, A. A. S.

    2012-07-01

    The spatio-temporal models, developed for analyses of diseases, can also be used for others fields of study, including concerns about forest and deforestation. The aim of this paper is to quantitatively check priority areas in order to combat deforestation on the Amazon forest, using the space-time scan statistic. The study area location is at the south of the Amazonas State and cover around 297.183 kilometre squares, including the municipality of Boca do Acre, Labrea, Canutama, Humaita, Manicore, Novo Aripuana e Apui County on the north region of Brazil. This area has showed a significant change for land cover, which has increased the number of deforestation's alerts. Therefore this situation becomes a concern and gets more investigation, trying to stop factors that increase the number of cases in the area. The methodology includes the location and year that deforestation's alert occurred. These deforestation's alerts are mapped by the DETER (Detection System of Deforestation in Real Time in Amazonia), which is carry out by the Brazilian Space Agency (INPE). The software SatScanTM v7.0 was used in order to define space-time permutation scan statistic for detection of deforestation cases. The outcome of this experiment shows an efficient model to detect space-time clusters of deforestation's alerts. The model was efficient to detect the location, the size, the order and characteristics about activities at the end of the experiments. Two clusters were considered actives and kept actives up to the end of the study. These clusters are located in Canutama and Lábrea County. This quantitative spatial modelling of deforestation warnings allowed: firstly, identifying actives clustering of deforestation, in which the environment government official are able to concentrate their actions; secondly, identifying historic clustering of deforestation, in which the environment government official are able to monitoring in order to avoid them to became actives again; and finally

  16. The Utility of Statistical Significance Testing in Psychological and Educational Research: A Review of Recent Literature and Proposed Alternatives.

    ERIC Educational Resources Information Center

    Sullivan, Jeremy R.

    2001-01-01

    Summarizes the post-1994 literature in psychology and education regarding statistical significance testing, emphasizing limitations and defenses of statistical testing and alternatives or supplements to statistical significance testing. (SLD)

  17. Clinical relevance vs. statistical significance: Using neck outcomes in patients with temporomandibular disorders as an example.

    PubMed

    Armijo-Olivo, Susan; Warren, Sharon; Fuentes, Jorge; Magee, David J

    2011-12-01

    Statistical significance has been used extensively to evaluate the results of research studies. Nevertheless, it offers only limited information to clinicians. The assessment of clinical relevance can facilitate the interpretation of the research results into clinical practice. The objective of this study was to explore different methods to evaluate the clinical relevance of the results using a cross-sectional study as an example comparing different neck outcomes between subjects with temporomandibular disorders and healthy controls. Subjects were compared for head and cervical posture, maximal cervical muscle strength, endurance of the cervical flexor and extensor muscles, and electromyographic activity of the cervical flexor muscles during the CranioCervical Flexion Test (CCFT). The evaluation of clinical relevance of the results was performed based on the effect size (ES), minimal important difference (MID), and clinical judgement. The results of this study show that it is possible to have statistical significance without having clinical relevance, to have both statistical significance and clinical relevance, to have clinical relevance without having statistical significance, or to have neither statistical significance nor clinical relevance. The evaluation of clinical relevance in clinical research is crucial to simplify the transfer of knowledge from research into practice. Clinical researchers should present the clinical relevance of their results. PMID:21658987

  18. Statistical detection of EEG synchrony using empirical bayesian inference.

    PubMed

    Singh, Archana K; Asoh, Hideki; Takeda, Yuji; Phillips, Steven

    2015-01-01

    There is growing interest in understanding how the brain utilizes synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV) between EEG signals is a popular method for quantifying such synchronizations and elucidating their role in cognitive tasks. However, high-dimensionality in PLV data incurs a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR) suffer severe loss of power, because they fail to exploit complex dependence structure between hypotheses that vary in spectral, temporal and spatial dimension. Previously, we showed that a hierarchical FDR and optimal discovery procedures could be effectively applied for PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new Empirical Bayes perspective and propose the application of the local FDR method (locFDR; Efron, 2001) for PLV synchrony analysis to compute FDR as a posterior probability that an observed statistic belongs to a null hypothesis. We demonstrate the application of Efron's Empirical Bayes approach for PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both simulation and experimental analyses. Our simulation results showed that the locFDR can effectively control false positives without compromising on the power of PLV synchrony inference. Our results from applying locFDR to the experimental data detected more significant discoveries than our previously proposed methods, whereas the standard FDR method failed to detect any significant discoveries. PMID:25822617

  19. Statistical Detection of EEG Synchrony Using Empirical Bayesian Inference

    PubMed Central

    Singh, Archana K.; Asoh, Hideki; Takeda, Yuji; Phillips, Steven

    2015-01-01

    There is growing interest in understanding how the brain utilizes synchronized oscillatory activity to integrate information across functionally connected regions. Computing phase-locking values (PLV) between EEG signals is a popular method for quantifying such synchronizations and elucidating their role in cognitive tasks. However, high-dimensionality in PLV data incurs a serious multiple testing problem. Standard multiple testing methods in neuroimaging research (e.g., false discovery rate, FDR) suffer severe loss of power, because they fail to exploit complex dependence structure between hypotheses that vary in spectral, temporal and spatial dimension. Previously, we showed that a hierarchical FDR and optimal discovery procedures could be effectively applied for PLV analysis to provide better power than FDR. In this article, we revisit the multiple comparison problem from a new Empirical Bayes perspective and propose the application of the local FDR method (locFDR; Efron, 2001) for PLV synchrony analysis to compute FDR as a posterior probability that an observed statistic belongs to a null hypothesis. We demonstrate the application of Efron's Empirical Bayes approach for PLV synchrony analysis for the first time. We use simulations to validate the specificity and sensitivity of locFDR and a real EEG dataset from a visual search study for experimental validation. We also compare locFDR with hierarchical FDR and optimal discovery procedures in both simulation and experimental analyses. Our simulation results showed that the locFDR can effectively control false positives without compromising on the power of PLV synchrony inference. Our results from applying locFDR to the experimental data detected more significant discoveries than our previously proposed methods, whereas the standard FDR method failed to detect any significant discoveries. PMID:25822617
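
    The phase-locking value that underlies these analyses can be computed as in the following Python sketch, where synthetic signals stand in for a pair of EEG channels; the locFDR inference step itself is not shown.

    import numpy as np
    from scipy.signal import hilbert

    rng = np.random.default_rng(9)
    fs, n_trials, n_samples = 250, 60, 500
    t = np.arange(n_samples) / fs

    phase_diffs = []
    for _ in range(n_trials):
        jitter = rng.normal(0, 0.5)                   # trial-to-trial phase jitter
        x = np.sin(2 * np.pi * 10 * t) + 0.5 * rng.normal(size=n_samples)
        y = np.sin(2 * np.pi * 10 * t + 0.8 + jitter) + 0.5 * rng.normal(size=n_samples)
        phase_diffs.append(np.angle(hilbert(x)) - np.angle(hilbert(y)))

    phase_diffs = np.array(phase_diffs)               # trials x samples
    plv = np.abs(np.mean(np.exp(1j * phase_diffs), axis=0))   # PLV across trials, per sample
    print("mean PLV over time:", plv.mean())          # near 1 = strong locking, near 0 = none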

  20. Algorithm for Detecting Significant Locations from Raw GPS Data

    NASA Astrophysics Data System (ADS)

    Kami, Nobuharu; Enomoto, Nobuyuki; Baba, Teruyuki; Yoshikawa, Takashi

    We present a fast algorithm for probabilistically extracting significant locations from raw GPS data based on data point density. Extracting significant locations from raw GPS data is the first essential step of algorithms designed for location-aware applications. Assuming that a location is significant if users spend a certain time around that area, most current algorithms compare spatial/temporal variables, such as stay duration and a roaming diameter, with given fixed thresholds to extract significant locations. However, the appropriate threshold values are not clearly known a priori, and algorithms with fixed thresholds are inherently error-prone, especially under high noise levels. Moreover, for N data points, they are generally O(N^2) algorithms since distance computation is required. We developed a fast algorithm for selective data point sampling around significant locations based on density information by constructing random histograms using locality-sensitive hashing. Evaluations show competitive performance in detecting significant locations even under high noise levels.
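
    A simplified Python sketch of density-based significant-location extraction: GPS fixes are hashed into coarse grid cells and unusually dense cells are reported. This stands in for, and is much cruder than, the locality-sensitive-hashing random histograms of the paper; the coordinates, cell size and count threshold are all illustrative.

    import numpy as np
    from collections import Counter

    rng = np.random.default_rng(10)
    home = np.array([35.6581, 139.7414])              # illustrative coordinates
    office = np.array([35.6620, 139.7100])
    fixes = np.vstack([
        home + rng.normal(0, 0.0005, size=(400, 2)),  # many fixes near two places
        office + rng.normal(0, 0.0005, size=(300, 2)),
        rng.uniform([35.60, 139.65], [35.72, 139.80], size=(200, 2)),   # travel / noise
    ])

    cell_size = 0.002                                 # roughly 200 m cells (illustrative)
    cells = Counter(map(tuple, np.floor(fixes / cell_size).astype(int)))

    min_count = 50                                    # density threshold (assumed)
    for cell, count in cells.items():
        if count >= min_count:
            print("significant location near:", np.array(cell) * cell_size)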

  1. Inferential Conditions in the Statistical Detection of Measurement Bias.

    ERIC Educational Resources Information Center

    Millsap, Roger E.; Meredith, William

    1992-01-01

    Inferential conditions in the statistical detection of measurement bias are discussed in the contexts of differential item functioning and predictive bias in educational and employment settings. It is concluded that bias measures that rely strictly on observed measures are not generally diagnostic of measurement bias or lack of bias. (SLD)

  2. A Computer Program for Detection of Statistical Outliers

    ERIC Educational Resources Information Center

    Pascale, Pietro J.; Lovas, Charles M.

    1976-01-01

    Presents a Fortran program which computes the rejection criteria of ten procedures for detecting outlying observations. These criteria are defined on comment cards. Journal sources for the statistical equations are listed. After applying rejection rules, the program calculates the mean and standard deviation of the censored sample. (Author/RC)

  3. Impact of criticism of null-hypothesis significance testing on statistical reporting practices in conservation biology.

    PubMed

    Fidler, Fiona; Burgman, Mark A; Cumming, Geoff; Buttrose, Robert; Thomason, Neil

    2006-10-01

    Over the last decade, criticisms of null-hypothesis significance testing have grown dramatically, and several alternative practices, such as confidence intervals, information theoretic, and Bayesian methods, have been advocated. Have these calls for change had an impact on the statistical reporting practices in conservation biology? In 2000 and 2001, 92% of sampled articles in Conservation Biology and Biological Conservation reported results of null-hypothesis tests. In 2005 this figure dropped to 78%. There were corresponding increases in the use of confidence intervals, information theoretic, and Bayesian techniques. Of those articles reporting null-hypothesis testing--which still easily constitute the majority--very few report statistical power (8%) and many misinterpret statistical nonsignificance as evidence for no effect (63%). Overall, results of our survey show some improvements in statistical practice, but further efforts are clearly required to move the discipline toward improved practices. PMID:17002771

  4. Statistical methods for detecting periodic fragments in DNA sequence data

    PubMed Central

    2011-01-01

    Background: Period 10 dinucleotides are structurally and functionally validated factors that influence the ability of DNA to form nucleosomes, histone core octamers. Robust identification of periodic signals in DNA sequences is therefore required to understand nucleosome organisation in genomes. While various techniques for identifying periodic components in genomic sequences have been proposed or adopted, the requirements for such techniques have not been considered in detail and confirmatory testing for a priori specified periods has not been developed. Results: We compared the estimation accuracy and suitability for confirmatory testing of autocorrelation, discrete Fourier transform (DFT), integer period discrete Fourier transform (IPDFT) and a previously proposed Hybrid measure. A number of different statistical significance procedures were evaluated but a blockwise bootstrap proved superior. When applied to synthetic data whose period-10 signal had been eroded, or for which the signal was approximately period-10, the Hybrid technique exhibited superior properties during exploratory period estimation. In contrast, confirmatory testing using the blockwise bootstrap procedure identified IPDFT as having the greatest statistical power. These properties were validated on yeast sequences defined from a ChIP-chip study where the Hybrid metric confirmed the expected dominance of period-10 in nucleosome associated DNA but IPDFT identified more significant occurrences of period-10. Application to the whole genomes of yeast and mouse identified ~ 21% and ~ 19% respectively of these genomes as spanned by period-10 nucleosome positioning sequences (NPS). Conclusions: For estimating the dominant period, we find the Hybrid period estimation method empirically to be the most effective for both eroded and approximate periodicity. The blockwise bootstrap was found to be effective as a significance measure, performing particularly well in the problem of period detection in the
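
    A compact Python sketch of spectral period-10 detection: score each position for an AA/TT/TA dinucleotide, Fourier-transform the indicator sequence, and read off the power at frequency 0.1 cycles per base. The IPDFT, the Hybrid measure and the blockwise bootstrap of the paper are omitted, and the planted signal is synthetic.

    import numpy as np

    rng = np.random.default_rng(11)
    bases = np.array(list("ACGT"))
    seq = "".join(rng.choice(bases, size=1000))
    # plant a weak period-10 AA signal for illustration
    seq = "".join("AA" + seq[i:i + 8] for i in range(0, 800, 8))[:1000]

    dinucs = {"AA", "TT", "TA"}
    indicator = np.array([1.0 if seq[i:i + 2] in dinucs else 0.0 for i in range(len(seq) - 1)])
    indicator -= indicator.mean()

    power = np.abs(np.fft.rfft(indicator)) ** 2
    freqs = np.fft.rfftfreq(indicator.size)           # cycles per base
    period10_power = power[np.argmin(np.abs(freqs - 0.1))]
    print("power at period 10:", period10_power)
    print("median power:      ", np.median(power[1:]))   # background level for comparison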

  5. Statistical analysis of spectral data for vegetation detection

    NASA Astrophysics Data System (ADS)

    Love, Rafael; Cathcart, J. Michael

    2006-05-01

    Identification and reduction of false alarms provide a critical component in the detection of landmines. Research at Georgia Tech over the past several years has focused on this problem through an examination of the signature characteristics of various background materials. These efforts seek to understand the physical basis and features of these signatures as an aid to the development of false target identification techniques. The investigation presented in this paper concentrated on the detection of foliage in long-wave infrared imagery. Data collected by a hyperspectral long-wave infrared sensor provided the background signatures used in this study. These studies focused on an analysis of the statistical characteristics of both the intensity signature and derived emissivity data. Results from these studies indicate foliage signatures possess unique characteristics that can be exploited to enable detection of vegetation in LWIR images. This paper presents a review of the approach and the results of the statistical analysis.

  6. Effect size, confidence interval and statistical significance: a practical guide for biologists.

    PubMed

    Nakagawa, Shinichi; Cuthill, Innes C

    2007-11-01

    Null hypothesis significance testing (NHST) is the dominant statistical approach in biology, although it has many, frequently unappreciated, problems. Most importantly, NHST does not provide us with two crucial pieces of information: (1) the magnitude of an effect of interest, and (2) the precision of the estimate of the magnitude of that effect. All biologists should be ultimately interested in biological importance, which may be assessed using the magnitude of an effect, but not its statistical significance. Therefore, we advocate presentation of measures of the magnitude of effects (i.e. effect size statistics) and their confidence intervals (CIs) in all biological journals. Combined use of an effect size and its CIs enables one to assess the relationships within data more effectively than the use of p values, regardless of statistical significance. In addition, routine presentation of effect sizes will encourage researchers to view their results in the context of previous research and facilitate the incorporation of results into future meta-analysis, which has been increasingly used as the standard method of quantitative review in biology. In this article, we extensively discuss two dimensionless (and thus standardised) classes of effect size statistics: d statistics (standardised mean difference) and r statistics (correlation coefficient), because these can be calculated from almost all study designs and also because their calculations are essential for meta-analysis. However, our focus on these standardised effect size statistics does not mean unstandardised effect size statistics (e.g. mean difference and regression coefficient) are less important. We provide potential solutions for four main technical problems researchers may encounter when calculating effect size and CIs: (1) when covariates exist, (2) when bias in estimating effect size is possible, (3) when data have non-normal error structure and/or variances, and (4) when data are non
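
    As a short illustration of reporting an effect size with its confidence interval (Python, simulated data), the sketch below computes Cohen's d with a common large-sample approximation for its 95% CI; the bias correction and the non-normal cases discussed in the article are not handled.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(12)
    a = rng.normal(10.0, 2.0, 40)
    b = rng.normal(11.2, 2.0, 45)

    n1, n2 = len(a), len(b)
    sp = np.sqrt(((n1 - 1) * a.var(ddof=1) + (n2 - 1) * b.var(ddof=1)) / (n1 + n2 - 2))
    d = (b.mean() - a.mean()) / sp                    # standardised mean difference

    # large-sample approximation to the standard error of d
    se = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    lo, hi = d - norm.ppf(0.975) * se, d + norm.ppf(0.975) * se
    print(f"d = {d:.2f}, 95% CI = ({lo:.2f}, {hi:.2f})")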

  7. Statistical Significance of Long-Range `Optimal Climate Normal' Temperature and Precipitation Forecasts.

    NASA Astrophysics Data System (ADS)

    Wilks, Daniel S.

    1996-04-01

    A simple approach to long-range forecasting of monthly or seasonal quantities is to use the average of observations over some number of the most recent years. Finding this 'optimal climate normal' (OCN) involves examining the relationships between the observed variable and averages of its values over the previous one to 30 years and selecting the averaging period yielding the best results. This procedure involves a multiplicity of comparisons, which will lead to misleadingly positive results for developmental data. The statistical significance of these OCNs is assessed here using a resampling procedure, in which time series of U.S. Climate Division data are repeatedly shuffled to produce statistical distributions of forecast performance measures, under the null hypothesis that the OCNs exhibit no predictive skill. Substantial areas in the United States are found for which forecast performance appears to be significantly better than would occur by chance. Another complication in the assessment of the statistical significance of the OCNs derives from the spatial correlation exhibited by the data. Because of this correlation, instances of Type I errors (false rejections of local null hypotheses) will tend to occur with spatial coherency and accordingly have the potential to be confused with regions for which there may be real predictability. The 'field significance' of the collections of local tests is also assessed here by simultaneously and coherently shuffling the time series for the Climate Divisions. Areas exhibiting significant local tests are large enough to conclude that seasonal OCN temperature forecasts exhibit significant skill over parts of the United States for all seasons except SON, OND, and NDJ, and that seasonal OCN precipitation forecasts are significantly skillful only in the fall. Statistical significance is weaker for monthly than for seasonal OCN temperature forecasts, and the monthly OCN precipitation forecasts do not exhibit significant predictive

  8. Statistical Significance of the Trends in Monthly Heavy Precipitation Over the US

    SciTech Connect

    Mahajan, Salil; North, Dr. Gerald R.; Saravanan, Dr. R.; Genton, Dr. Marc G.

    2012-01-01

    Trends in monthly heavy precipitation, defined by a return period of one year, are assessed for statistical significance in observations and Global Climate Model (GCM) simulations over the contiguous United States using Monte Carlo non-parametric and parametric bootstrapping techniques. The results from the two Monte Carlo approaches are found to be similar to each other, and also to the traditional non-parametric Kendall's tau test, implying the robustness of the approach. Two different observational data-sets are employed to test for trends in monthly heavy precipitation and are found to exhibit consistent results. Both data-sets demonstrate upward trends, one of which is found to be statistically significant at the 95% confidence level. Upward trends similar to observations are observed in some climate model simulations of the twentieth century, but their statistical significance is marginal. For projections of the twenty-first century, a statistically significant upwards trend is observed in most of the climate models analyzed. The change in the simulated precipitation variance appears to be more important in the twenty-first century projections than changes in the mean precipitation. Stochastic fluctuations of the climate system are found to dominate monthly heavy precipitation, as some GCM simulations show a downwards trend even in the twenty-first century projections when the greenhouse gas forcings are strong.
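
    A minimal Python sketch in the same spirit: Kendall's tau for a monotonic trend in a synthetic annual series of heavy-precipitation amounts, with a Monte Carlo permutation null standing in for the bootstrapping techniques of the study.

    import numpy as np
    from scipy.stats import kendalltau

    rng = np.random.default_rng(13)
    years = np.arange(1950, 2010)
    series = rng.gamma(2.0, 10.0, years.size) + 0.08 * (years - years[0])   # weak upward trend

    tau_obs, _ = kendalltau(years, series)

    null = []
    for _ in range(5000):
        tau_perm, _ = kendalltau(years, rng.permutation(series))   # shuffle away any trend
        null.append(tau_perm)
    p_value = np.mean(np.abs(null) >= abs(tau_obs))
    print(f"tau = {tau_obs:.3f}, Monte Carlo p = {p_value:.4f}")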

  9. Evaluating Statistical Significance Using Corrected and Uncorrected Magnitude of Effect Size Estimates.

    ERIC Educational Resources Information Center

    Snyder, Patricia; Lawson, Stephen

    Magnitude of effect measures (MEMs), when adequately understood and correctly used, are important aids for researchers who do not want to rely solely on tests of statistical significance in substantive result interpretation. The MEM tells how much of the dependent variable can be controlled, predicted, or explained by the independent variables.…

  10. Using the Descriptive Bootstrap to Evaluate Result Replicability (Because Statistical Significance Doesn't)

    ERIC Educational Resources Information Center

    Spinella, Sarah

    2011-01-01

    As result replicability is essential to science and difficult to achieve through external replicability, the present paper notes the insufficiency of null hypothesis statistical significance testing (NHSST) and explains the bootstrap as a plausible alternative, with a heuristic example to illustrate the bootstrap method. The bootstrap relies on…

  11. Alphas and Asterisks: The Development of Statistical Significance Testing Standards in Sociology

    ERIC Educational Resources Information Center

    Leahey, Erin

    2005-01-01

    In this paper, I trace the development of statistical significance testing standards in sociology by analyzing data from articles published in two prestigious sociology journals between 1935 and 2000. I focus on the role of two key elements in the diffusion literature, contagion and rationality, as well as the role of institutional factors. I…

  12. Weighing the costs of different errors when determining statistical significance during monitoring

    Technology Transfer Automated Retrieval System (TEKTRAN)

    Selecting appropriate significance levels when constructing confidence intervals and performing statistical analyses with rangeland monitoring data is not a straightforward process. This process is burdened by the conventional selection of “95% confidence” (i.e., Type I error rate, α = 0.05) as the d...

  13. Statistical Significance, Effect Size, and Replication: What Do the Journals Say?

    ERIC Educational Resources Information Center

    DeVaney, Thomas A.

    2001-01-01

    Studied the attitudes of representatives of journals in education, sociology, and psychology through an electronic survey completed by 194 journal representatives. Results suggest that the majority of journals do not have written policies concerning the reporting of results from statistical significance testing, and most indicated that statistical…

  14. Statistical Significance of the Contribution of Variables to the PCA Solution: An Alternative Permutation Strategy

    ERIC Educational Resources Information Center

    Linting, Marielle; van Os, Bart Jan; Meulman, Jacqueline J.

    2011-01-01

    In this paper, the statistical significance of the contribution of variables to the principal components in principal components analysis (PCA) is assessed nonparametrically by the use of permutation tests. We compare a new strategy to a strategy used in previous research consisting of permuting the columns (variables) of a data matrix…
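
    The general strategy can be sketched in Python as follows: permute one variable's column, refit the PCA, and compare the variable's squared loading on the first component with its permutation distribution. This illustrates the idea only; the specific permutation strategies compared in the paper differ in detail, and the data are simulated.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(14)
    n = 300
    signal = rng.normal(size=n)
    X = np.column_stack([signal + rng.normal(0, 0.5, n),
                         signal + rng.normal(0, 0.5, n),
                         rng.normal(size=n)])         # third variable is pure noise

    def contribution(data, var, comp=0):
        pca = PCA(n_components=2).fit(data)
        return pca.components_[comp, var] ** 2        # squared loading of the variable

    var = 2                                           # test the noise variable
    observed = contribution(X, var)
    null = []
    for _ in range(500):
        Xp = X.copy()
        Xp[:, var] = rng.permutation(Xp[:, var])      # break the variable's association
        null.append(contribution(Xp, var))
    p_value = (1 + np.sum(np.array(null) >= observed)) / (1 + len(null))
    print(f"squared loading = {observed:.3f}, permutation p = {p_value:.3f}")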

  15. Recent Literature on Whether Statistical Significance Tests Should or Should Not Be Banned.

    ERIC Educational Resources Information Center

    Deegear, James

    This paper summarizes the literature regarding statistical significance testing with an emphasis on recent literature in various disciplines and literature exploring why researchers have demonstrably failed to be influenced by the American Psychological Association publication manual's encouragement to report effect sizes. Also considered are…

  16. Interpreting Statistical Significance Test Results: A Proposed New "What If" Method.

    ERIC Educational Resources Information Center

    Kieffer, Kevin M.; Thompson, Bruce

    As the 1994 publication manual of the American Psychological Association emphasized, "p" values are affected by sample size. As a result, it can be helpful to interpret the results of statistical significance tests in a sample size context by conducting so-called "what if" analyses. However, these methods can be inaccurate unless "corrected" effect…

  17. Statistical tests for detecting movements in repeatedly measured geodetic networks

    NASA Astrophysics Data System (ADS)

    Niemeier, W.

    1981-01-01

    Geodetic networks with two or more measuring epochs can be found rather frequently, for example in connection with the investigation of recent crustal movements, in the field of monitoring problems in engineering surveying or in ordinary control networks. For these repeatedly measured networks the so-called congruency problem has to be solved, i.e. possible changes in the geometry of the net have to be found. In practice, distortions of bench marks and an extension or densification of the net (differences in the 1st-order design) and/or changes in the measuring elements or techniques (differences in the 2nd-order design) can frequently be found between different epochs. In this paper a rigorous mathematical procedure is presented for this congruency analysis of repeatedly measured networks, taking into account the above-mentioned differences in the network design. As a first step, statistical tests are carried out to detect the epochs with departures from congruency. As a second step, the individual points with significant movements within these critical epochs can be identified. A numerical example for the analysis of a monitoring network with 9 epochs is given.

  18. Statistically significant deviations from additivity: What do they mean in assessing toxicity of mixtures?

    PubMed

    Liu, Yang; Vijver, Martina G; Qiu, Hao; Baas, Jan; Peijnenburg, Willie J G M

    2015-12-01

    There is increasing attention from scientists and policy makers to the joint effects of multiple metals on organisms when present in a mixture. Using root elongation of lettuce (Lactuca sativa L.) as a toxicity endpoint, the combined effects of binary mixtures of Cu, Cd, and Ni were studied. The statistical MixTox model was used to search for deviations from the reference models, i.e. concentration addition (CA) and independent action (IA). The deviations were subsequently interpreted as 'interactions'. A comprehensive experiment was designed to test the reproducibility of the 'interactions'. The results showed that the toxicity of binary metal mixtures was equally well predicted by both reference models. We found statistically significant 'interactions' in four of the five total datasets. However, the patterns of 'interactions' were found to be inconsistent or even contradictory across the different independent experiments. It is recommended that a statistically significant 'interaction' must be treated with care and is not necessarily biologically relevant. Searching for a statistically significant interaction can be the starting point for further measurements and modeling to advance the understanding of underlying mechanisms and non-additive interactions occurring inside the organisms. PMID:26188643

  19. The Effects of Electrode Impedance on Data Quality and Statistical Significance in ERP Recordings

    PubMed Central

    Kappenman, Emily S.; Luck, Steven J.

    2010-01-01

    To determine whether data quality is meaningfully reduced by high electrode impedance, EEG was recorded simultaneously from low- and high-impedance electrode sites during an oddball task. Low-frequency noise was found to be increased at high-impedance sites relative to low-impedance sites, especially when the recording environment was warm and humid. The increased noise at the high-impedance sites caused an increase in the number of trials needed to obtain statistical significance in analyses of P3 amplitude, but this could be partially mitigated by high-pass filtering and artifact rejection. High electrode impedance did not reduce statistical power for the N1 wave unless the recording environment was warm and humid. Thus, high electrode impedance may increase noise and decrease statistical power under some conditions, but these effects can be reduced by using a cool and dry recording environment and appropriate signal processing methods. PMID:20374541

  20. Simulated performance of an order statistic threshold strategy for detection of narrowband signals

    NASA Technical Reports Server (NTRS)

    Satorius, E.; Brady, R.; Deich, W.; Gulkis, S.; Olsen, E.

    1988-01-01

    The application of order statistics to signal detection is becoming an increasingly active area of research. This is due to the inherent robustness of rank estimators in the presence of large outliers that would significantly degrade more conventional mean-level-based detection systems. A detection strategy is presented in which the threshold estimate is obtained using order statistics. The performance of this algorithm in the presence of simulated interference and broadband noise is evaluated. In this way, the robustness of the proposed strategy in the presence of the interference can be fully assessed as a function of the interference, noise, and detector parameters.
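
    A toy Python/NumPy sketch of the general order-statistic idea: a noise level is estimated from a chosen rank of the sorted spectrum, so that strong outliers (interference) barely affect the threshold. The rank fraction and scaling factor are illustrative parameters, not values from the paper.

```python
# Toy order-statistic threshold detector for narrowband signals in a spectrum.
import numpy as np

rng = np.random.default_rng(2)

def os_detect(power_spectrum, rank_frac=0.5, factor=4.0):
    """Flag bins exceeding `factor` times a rank-based noise level estimate."""
    noise_est = np.sort(power_spectrum)[int(rank_frac * len(power_spectrum))]
    return power_spectrum > factor * noise_est

# broadband noise plus a strong interferer and a weaker narrowband signal
spectrum = rng.exponential(1.0, size=1024)
spectrum[100] += 50.0    # interference-like outlier; hardly moves the rank estimate
spectrum[600] += 8.0     # narrowband signal of interest
print(np.flatnonzero(os_detect(spectrum)))
```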

  1. A Generative Statistical Algorithm for Automatic Detection of Complex Postures

    PubMed Central

    Amit, Yali; Biron, David

    2015-01-01

    This paper presents a method for automated detection of complex (non-self-avoiding) postures of the nematode Caenorhabditis elegans and its application to analyses of locomotion defects. Our approach is based on progressively detailed statistical models that enable detection of the head and the body even in cases of severe coilers, where data from traditional trackers is limited. We restrict the input available to the algorithm to a single digitized frame, such that manual initialization is not required and the detection problem becomes embarrassingly parallel. Consequently, the proposed algorithm does not propagate detection errors and naturally integrates in a “big data” workflow used for large-scale analyses. Using this framework, we analyzed the dynamics of postures and locomotion of wild-type animals and mutants that exhibit severe coiling phenotypes. Our approach can readily be extended to additional automated tracking tasks such as tracking pairs of animals (e.g., for mating assays) or different species. PMID:26439258

  2. A statistical modeling approach for detecting generalized synchronization

    PubMed Central

    Schumacher, Johannes; Haslinger, Robert; Pipa, Gordon

    2012-01-01

    Detecting nonlinear correlations between time series presents a hard problem for data analysis. We present a generative statistical modeling method for detecting nonlinear generalized synchronization. Truncated Volterra series are used to approximate functional interactions. The Volterra kernels are modeled as linear combinations of basis splines, whose coefficients are estimated via l1 and l2 regularized maximum likelihood regression. The regularization manages the high number of kernel coefficients and allows feature selection strategies yielding sparse models. The method's performance is evaluated on different coupled chaotic systems in various synchronization regimes and analytical results for detecting m:n phase synchrony are presented. Experimental applicability is demonstrated by detecting nonlinear interactions between neuronal local field potentials recorded in different parts of macaque visual cortex. PMID:23004851

  3. Statistical feature selection for enhanced detection of brain tumor

    NASA Astrophysics Data System (ADS)

    Chaddad, Ahmad; Colen, Rivka R.

    2014-09-01

    Feature-based methods are widely used in brain tumor recognition systems. Robust early cancer detection is one of the most powerful applications of image processing tools. Specifically, statistical features, such as geometric mean, harmonic mean, mean excluding outliers, median, percentiles, skewness and kurtosis, have been extracted from brain tumor glioma to aid in discriminating two levels, namely Level I and Level II, using the fluid attenuated inversion recovery (FLAIR) sequence in the diagnosis of brain tumor. Statistical features describe the major characteristics of each glioma level, which is an important step in evaluating the heterogeneity of cancer-area pixels. In this paper, we address the task of feature selection to identify the relevant subset of features in the statistical domain, while discarding those that are either redundant or confusing, thereby improving the performance of the feature-based scheme to distinguish between Level I and Level II. We apply a Decision Structure algorithm to find the optimal combination of nonhomogeneity-based statistical features for the problem at hand. We employ a Naïve Bayes classifier to evaluate the performance of the optimal statistical-feature-based scheme in terms of its glioma Level I and Level II discrimination capability, using real data collected from 17 patients with glioblastoma multiforme (GBM). The dataset was provided by the MD Anderson Cancer Center from a 3 Tesla MR imaging system. For the specific data analyzed, it is shown that the identified dominant features yield higher classification accuracy, with a lower number of false alarms and missed detections, compared to the full statistical feature set. This work analyzed specific GBM grades, namely Level I and Level II, and the dominant features were considered as aids to prognostic indicators. These features were selected automatically to better determine prognosis from classical imaging studies.
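
    For illustration only, the Python sketch below (NumPy, SciPy and scikit-learn assumed) computes a few of the statistical features named above from pixel intensities and trains a Gaussian naive Bayes classifier on two synthetic "levels". It is not the Decision Structure feature-selection algorithm used in the paper, and the data are simulated.

```python
# Toy example: statistical features from region intensities + naive Bayes classifier.
import numpy as np
from scipy import stats
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(3)

def statistical_features(pixels):
    return [
        stats.gmean(pixels),        # geometric mean (pixels must be > 0)
        stats.hmean(pixels),        # harmonic mean
        np.median(pixels),
        np.percentile(pixels, 90),
        stats.skew(pixels),
        stats.kurtosis(pixels),
    ]

# synthetic "Level I" and "Level II" regions with slightly different intensity statistics
X = np.array([statistical_features(rng.gamma(2.0 + level, 10.0, size=500))
              for level in (0, 1) for _ in range(20)])
y = np.repeat([0, 1], 20)

clf = GaussianNB().fit(X, y)
print(clf.score(X, y))   # training accuracy of the toy example
```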

  4. Detection of a diffusive cloak via second-order statistics

    NASA Astrophysics Data System (ADS)

    Koirala, Milan; Yamilov, Alexey

    2016-08-01

    We propose a scheme to detect the diffusive cloak proposed by Schittny et al [Science 345, 427 (2014)]. We exploit the fact that diffusion of light is an approximation that disregards wave interference. The long-range contribution to intensity correlation is sensitive to locations of paths crossings and the interference inside the medium, allowing one to detect the size and position, including the depth, of the diffusive cloak. Our results also suggest that it is possible to separately manipulate the first- and the second-order statistics of wave propagation in turbid media.

  5. Detection of a diffusive cloak via second-order statistics.

    PubMed

    Koirala, Milan; Yamilov, Alexey

    2016-08-15

    We propose a scheme to detect the diffusive cloak proposed by Schittny et al. [Science 345, 427 (2014); DOI: 10.1126/science.1254524]. We exploit the fact that diffusion of light is an approximation that disregards wave interference. The long-range contribution to intensity correlation is sensitive to the locations of path crossings and the interference inside the medium, allowing one to detect the size and position, including the depth, of the diffusive cloak. Our results also suggest that it is possible to separately manipulate the first- and the second-order statistics of wave propagation in turbid media. PMID:27519108

  6. Statistically normalized coherent change detection for synthetic aperture sonar imagery

    NASA Astrophysics Data System (ADS)

    G-Michael, Tesfaye; Tucker, J. D.; Roberts, Rodney G.

    2016-05-01

    Coherent Change Detection (CCD) is a process of highlighting an area of activity in scenes (seafloor) under survey, generated from pairs of synthetic aperture sonar (SAS) images of approximately the same location observed at two different time instances. The problem of CCD and subsequent anomaly feature extraction/detection is complicated by several factors such as the presence of random speckle pattern in the images, changing environmental conditions, and platform instabilities. These complications make the detection of weak target activities even more difficult. Typically, the degree of similarity between two images measured at each pixel location is the coherence between the complex pixel values in the two images. Higher coherence indicates little change in the scene represented by the pixel and lower coherence indicates change activity in the scene. Such a coherence estimation scheme based on pixel intensity correlation is an ad hoc procedure where the effectiveness of the change detection is determined by the choice of threshold, which can lead to high false alarm rates. In this paper, we propose a novel approach for anomalous change pattern detection using statistical normalized coherence and multi-pass coherent processing. This method may be used to mitigate shadows by reducing the false alarms in the coherence map caused by speckle and shadows. Test results of the proposed methods on a data set of SAS images will be presented, illustrating the effectiveness of the normalized coherence in terms of statistics from a multi-pass survey of the same scene.
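
    The baseline pixel-coherence estimate that the paper builds on can be sketched as a windowed normalized cross-correlation of two co-registered complex images (Python with NumPy/SciPy). The window size and test data below are illustrative, and this is not the proposed normalized-coherence method itself.

```python
# Windowed coherence estimate between two co-registered complex images.
import numpy as np
from scipy.ndimage import uniform_filter

def coherence_map(img1, img2, window=9):
    """|E[img1 * conj(img2)]| / sqrt(E[|img1|^2] E[|img2|^2]) over a local window."""
    cross = img1 * np.conj(img2)
    num = uniform_filter(cross.real, window) + 1j * uniform_filter(cross.imag, window)
    den = np.sqrt(uniform_filter(np.abs(img1) ** 2, window) *
                  uniform_filter(np.abs(img2) ** 2, window))
    return np.abs(num) / np.maximum(den, 1e-12)

rng = np.random.default_rng(4)
pass1 = rng.standard_normal((128, 128)) + 1j * rng.standard_normal((128, 128))
pass2 = pass1.copy()
pass2[40:60, 40:60] = rng.standard_normal((20, 20)) + 1j * rng.standard_normal((20, 20))  # changed patch
gamma = coherence_map(pass1, pass2)
print(gamma[50, 50], gamma[10, 10])   # low coherence in the changed patch, high elsewhere
```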

  7. Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance

    NASA Astrophysics Data System (ADS)

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Suffredini, Anthony F.; Sacks, David B.; Yu, Yi-Kuo

    2016-02-01

    Correct and rapid identification of microorganisms is the key to the success of many important applications in health and safety, including, but not limited to, infection treatment, food safety, and biodefense. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is challenging correct microbial identification because of the large number of choices present. To properly disentangle candidate microbes, one needs to go beyond apparent morphology or simple `fingerprinting'; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptidome profiles of microbes to better separate them and by designing an analysis method that yields accurate statistical significance. Here, we present an analysis pipeline that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using MS/MS data of 81 samples, each composed of a single known microorganism, that the proposed pipeline can correctly identify microorganisms at least at the genus and species levels. We have also shown that the proposed pipeline computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microorganisms. The proposed analysis pipeline has been implemented in MiCId, a freely available software for Microorganism Classification and Identification. MiCId is available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  8. Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance.

    PubMed

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y; Drake, Steven K; Gucek, Marjan; Suffredini, Anthony F; Sacks, David B; Yu, Yi-Kuo

    2016-02-01

    Correct and rapid identification of microorganisms is the key to the success of many important applications in health and safety, including, but not limited to, infection treatment, food safety, and biodefense. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is challenging correct microbial identification because of the large number of choices present. To properly disentangle candidate microbes, one needs to go beyond apparent morphology or simple 'fingerprinting'; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptidome profiles of microbes to better separate them and by designing an analysis method that yields accurate statistical significance. Here, we present an analysis pipeline that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using MS/MS data of 81 samples, each composed of a single known microorganism, that the proposed pipeline can correctly identify microorganisms at least at the genus and species levels. We have also shown that the proposed pipeline computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microorganisms. The proposed analysis pipeline has been implemented in MiCId, a freely available software for Microorganism Classification and Identification. MiCId is available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html . PMID:26510657

  9. Pregnancy-associated breast cancer: significance of early detection.

    PubMed

    Ulery, Maryann; Carter, Linnette; McFarlin, Barbara L; Giurgescu, Carmen

    2009-01-01

    Pregnancy-associated breast cancer (PABC) is defined as cancer of the breast diagnosed during pregnancy and up to 1 year postpartum. Delays in diagnosis are frequently associated with increased morbidity and mortality. The aim of this article is to determine the significance of early detection of PABC and to alert health care providers to include PABC in the differential diagnosis when evaluating a breast mass in the perinatal period. This integrative literature review evaluated 15 research studies by using the hypothetical deductive model of clinical reasoning to determine factors related to diagnosis of PABC. As women delay childbearing, the incidence of PABC increases with age. In the reviewed studies, breast cancer was diagnosed with greater frequency in the postpartum period than during any trimester in pregnancy. Delay in diagnosis is complicated by axillary lymph node metastasis, high-grade tumors at diagnosis, and poor outcomes. Early detection is a significant predictor of improved outcomes. Diagnostic modalities such as ultrasound, mammography, and biopsy can be safely used for diagnostic purposes in the evaluation of potential cases of PABC during pregnancy. PMID:19720336

  10. Mass spectrometry-based protein identification with accurate statistical significance assignment

    PubMed Central

    Alves, Gelio; Yu, Yi-Kuo

    2015-01-01

    Motivation: Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of scientific conclusions thus drawn. From the perspective of mass spectrometry-based proteomics, even though accurate statistics for peptide identification can now be achieved, accurate protein level statistics remain challenging. Results: We have constructed a protein ID method that combines peptide evidences of a candidate protein based on a rigorous formula derived earlier; in this formula the database P-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein level E-value, eliminating the need of using empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Sorić formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with other methods tested. Availability and implementation: The source code, implemented in C++ on a linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit. Contact: yyu@ncbi.nlm.nih.gov Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25362092

  11. A Multi-Core Parallelization Strategy for Statistical Significance Testing in Learning Classifier Systems

    PubMed Central

    Rudd, James; Moore, Jason H.; Urbanowicz, Ryan J.

    2013-01-01

    Permutation-based statistics for evaluating the significance of class prediction, predictive attributes, and patterns of association have only appeared within the learning classifier system (LCS) literature since 2012. While still not widely utilized by the LCS research community, formal evaluations of test statistic confidence are imperative to large and complex real world applications such as genetic epidemiology where it is standard practice to quantify the likelihood that a seemingly meaningful statistic could have been obtained purely by chance. LCS algorithms are relatively computationally expensive on their own. The compounding requirements for generating permutation-based statistics may be a limiting factor for some researchers interested in applying LCS algorithms to real world problems. Technology has made LCS parallelization strategies more accessible and thus more popular in recent years. In the present study we examine the benefits of externally parallelizing a series of independent LCS runs such that permutation testing with cross validation becomes more feasible to complete on a single multi-core workstation. We test our python implementation of this strategy in the context of a simulated complex genetic epidemiological data mining problem. Our evaluations indicate that as long as the number of concurrent processes does not exceed the number of CPU cores, the speedup achieved is approximately linear. PMID:24358057
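
    The external parallelization strategy can be sketched in Python with the standard multiprocessing module: independent permutation replicates are farmed out to worker processes and the resulting null scores are pooled into a p-value. The fit_and_score function below is a trivial stand-in for a full LCS run and is purely illustrative.

```python
# Sketch of externally parallelized permutation testing across CPU cores.
import numpy as np
from multiprocessing import Pool

def fit_and_score(args):
    X, y, seed = args
    rng = np.random.default_rng(seed)
    y_perm = rng.permutation(y)                 # permute labels for the null distribution
    # placeholder "model": score = |correlation| between the first feature and labels
    return abs(np.corrcoef(X[:, 0], y_perm)[0, 1])

if __name__ == "__main__":
    rng = np.random.default_rng(5)
    X = rng.standard_normal((200, 10))
    y = (X[:, 0] + 0.5 * rng.standard_normal(200) > 0).astype(float)
    observed = abs(np.corrcoef(X[:, 0], y)[0, 1])

    with Pool(processes=4) as pool:             # keep processes <= number of CPU cores
        null_scores = pool.map(fit_and_score, [(X, y, s) for s in range(999)])

    p_value = (1 + sum(s >= observed for s in null_scores)) / (1 + len(null_scores))
    print(p_value)
```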

  12. Statistical significance of hair analysis of clenbuterol to discriminate therapeutic use from contamination.

    PubMed

    Krumbholz, Aniko; Anielski, Patricia; Gfrerer, Lena; Graw, Matthias; Geyer, Hans; Schänzer, Wilhelm; Dvorak, Jiri; Thieme, Detlef

    2014-01-01

    Clenbuterol is a well-established β2-agonist, which is prohibited in sports and strictly regulated for use in the livestock industry. During the last few years clenbuterol-positive results in doping controls and in samples from residents or travellers from a high-risk country were suspected to be related to the illegal use of clenbuterol for fattening. A sensitive liquid chromatography-tandem mass spectrometry (LC-MS/MS) method was developed to detect low clenbuterol residues in hair with a detection limit of 0.02 pg/mg. A sub-therapeutic application study and a field study with volunteers, who have a high risk of contamination, were performed. For the application study, a total dosage of 30 µg clenbuterol was applied to 20 healthy volunteers on 5 subsequent days. One month after the beginning of the application, clenbuterol was detected in the proximal hair segment (0-1 cm) in concentrations between 0.43 and 4.76 pg/mg. For the second part, samples of 66 Mexican soccer players were analyzed. In 89% of these volunteers, clenbuterol was detectable in their hair at concentrations between 0.02 and 1.90 pg/mg. A comparison of both parts showed no statistical difference between sub-therapeutic application and contamination. In contrast, discrimination from typical abuse of clenbuterol is apparently possible. Due to these findings, results of real doping control samples can be evaluated. PMID:25388545

  13. Statistical method for determining and comparing limits of detection of bioassays.

    PubMed

    Holstein, Carly A; Griffin, Maryclare; Hong, Jing; Sampson, Paul D

    2015-10-01

    The current bioassay development literature lacks the use of statistically robust methods for calculating the limit of detection of a given assay. Instead, researchers often employ simple methods that provide a rough estimate of the limit of detection, often without a measure of the confidence in the estimate. This scarcity of robust methods is likely due to a realistic preference for simple and accessible methods and to a lack of such methods that have reduced the concepts of limit of detection theory to practice for the specific application of bioassays. Here, we have developed a method for determining limits of detection for bioassays that is statistically robust and reduced to practice in a clear and accessible manner geared at researchers, not statisticians. This method utilizes a four-parameter logistic curve fit to translate signal intensity to analyte concentration, which is a curve that is commonly employed in quantitative bioassays. This method generates a 95% confidence interval of the limit of detection estimate to provide a measure of uncertainty and a means by which to compare the analytical sensitivities of different assays statistically. We have demonstrated this method using real data from the development of a paper-based influenza assay in our laboratory to illustrate the steps and features of the method. Using this method, assay developers can calculate statistically valid limits of detection and compare these values for different assays to determine when a change to the assay design results in a statistically significant improvement in analytical sensitivity. PMID:26376354
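
    A rough Python sketch of the general approach (NumPy/SciPy assumed): fit a four-parameter logistic calibration curve, define the detection threshold in signal units from blank replicates, invert the curve to obtain the limit of detection, and use a residual bootstrap for an approximate 95% interval. The exact statistical treatment in the paper differs; all data and parameter choices here are illustrative.

```python
# Limit of detection from a four-parameter logistic (4PL) calibration curve.
import numpy as np
from scipy.optimize import curve_fit

def four_pl(x, a, d, c, b):
    # a: response at zero analyte, d: response at saturation, c: midpoint, b: slope
    return d + (a - d) / (1.0 + (x / c) ** b)

def inverse_four_pl(y, a, d, c, b):
    return c * ((a - d) / (y - d) - 1.0) ** (1.0 / b)

rng = np.random.default_rng(6)
conc = np.repeat([0.1, 0.3, 1.0, 3.0, 10.0, 30.0], 3)            # calibration concentrations
signal = four_pl(conc, 1.0, 20.0, 3.0, 1.5) + rng.normal(0, 0.3, conc.size)
blank = rng.normal(1.0, 0.3, 10)                                 # blank replicates

p0 = [blank.mean(), signal.max(), np.median(conc), 1.0]
popt, _ = curve_fit(four_pl, conc, signal, p0=p0, maxfev=10000)
y_lod = blank.mean() + 3.0 * blank.std(ddof=1)                   # threshold in signal units
lod = inverse_four_pl(y_lod, *popt)

# residual bootstrap of the calibration curve for a rough 95% interval on the LOD
fitted = four_pl(conc, *popt)
resid = signal - fitted
boot = []
for _ in range(500):
    boot_signal = fitted + rng.choice(resid, resid.size)
    popt_b, _ = curve_fit(four_pl, conc, boot_signal, p0=p0, maxfev=10000)
    boot.append(inverse_four_pl(y_lod, *popt_b))
print(lod, np.percentile(boot, [2.5, 97.5]))
```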

  14. Robust Statistical Detection of Power-Law Cross-Correlation

    NASA Astrophysics Data System (ADS)

    Blythe, Duncan A. J.; Nikulin, Vadim V.; Müller, Klaus-Robert

    2016-06-01

    We show that widely used approaches in statistical physics incorrectly indicate the existence of power-law cross-correlations between financial stock market fluctuations measured over several years and the neuronal activity of the human brain lasting for only a few minutes. While such cross-correlations are nonsensical, no current methodology allows them to be reliably discarded, leaving researchers at greater risk when the spurious nature of cross-correlations is not clear from the unrelated origin of the time series and rather requires careful statistical estimation. Here we propose a theory and method (PLCC-test) which allows us to rigorously and robustly test for power-law cross-correlations, correctly detecting genuine and discarding spurious cross-correlations, thus establishing meaningful relationships between processes in complex physical systems. Our method reveals for the first time the presence of power-law cross-correlations between amplitudes of the alpha and beta frequency ranges of the human electroencephalogram.

  15. Statistical detection of nanoparticles in cells by darkfield microscopy.

    PubMed

    Gnerucci, Alessio; Romano, Giovanni; Ratto, Fulvio; Centi, Sonia; Baccini, Michela; Santosuosso, Ugo; Pini, Roberto; Fusi, Franco

    2016-07-01

    In the fields of nanomedicine, biophotonics and radiation therapy, nanoparticle (NP) detection in cell models often represents a fundamental step for many in vivo studies. One common question is whether NPs have or have not interacted with cells. In this context, we propose an imaging based technique to detect the presence of NPs in eukaryotic cells. Darkfield images of cell cultures at low magnification (10×) are acquired in different spectral ranges and recombined so as to enhance the contrast due to the presence of NPs. Image analysis is applied to extract cell-based parameters (i.e. mean intensity), which are further analyzed by statistical tests (Student's t-test, permutation test) in order to obtain a robust detection method. By means of a statistical sample size analysis, the sensitivity of the whole methodology is quantified in terms of the minimum cell number that is needed to identify the presence of NPs. The method is presented in the case of HeLa cells incubated with gold nanorods labeled with anti-CA125 antibodies, which exploits the overexpression of CA125 in ovarian cancers. Control cases are considered as well, including PEG-coated NPs and HeLa cells without NPs. PMID:27381231
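
    The statistical step described above can be illustrated with simulated cell-wise mean intensities: Welch's t-test and a permutation test compare an NP-incubated group with a control group (Python, NumPy/SciPy). Group sizes and intensity values are invented for the example.

```python
# Compare cell-wise mean intensities between NP-incubated and control groups.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
treated = rng.normal(110, 15, size=60)   # mean darkfield intensity per cell, NP-incubated
control = rng.normal(100, 15, size=60)   # mean darkfield intensity per cell, no NPs

t_stat, p_t = stats.ttest_ind(treated, control, equal_var=False)   # Welch's t-test

pooled = np.concatenate([treated, control])
obs_diff = treated.mean() - control.mean()
n_perm = 9999
perm_diffs = np.empty(n_perm)
for i in range(n_perm):
    s = rng.permutation(pooled)
    perm_diffs[i] = s[:treated.size].mean() - s[treated.size:].mean()
p_perm = (1 + np.sum(np.abs(perm_diffs) >= abs(obs_diff))) / (1 + n_perm)
print(p_t, p_perm)
```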

  16. A new statistical approach to climate change detection and attribution

    NASA Astrophysics Data System (ADS)

    Ribes, Aurélien; Zwiers, Francis W.; Azaïs, Jean-Marc; Naveau, Philippe

    2016-04-01

    We propose here a new statistical approach to climate change detection and attribution that is based on additive decomposition and simple hypothesis testing. Most current statistical methods for detection and attribution rely on linear regression models where the observations are regressed onto expected response patterns to different external forcings. These methods do not use physical information provided by climate models regarding the expected response magnitudes to constrain the estimated responses to the forcings. Climate modelling uncertainty is difficult to take into account with regression based methods and is almost never treated explicitly. As an alternative to this approach, our statistical model is only based on the additivity assumption; the proposed method does not regress observations onto expected response patterns. We introduce estimation and testing procedures based on likelihood maximization, and show that climate modelling uncertainty can easily be accounted for. Some discussion is provided on how to practically estimate the climate modelling uncertainty based on an ensemble of opportunity. Our approach is based on the "models are statistically indistinguishable from the truth" paradigm, where the difference between any given model and the truth has the same distribution as the difference between any pair of models, but other choices might also be considered. The properties of this approach are illustrated and discussed based on synthetic data. Lastly, the method is applied to the linear trend in global mean temperature over the period 1951-2010. Consistent with the last IPCC assessment report, we find that most of the observed warming over this period (+0.65 K) is attributable to anthropogenic forcings (+0.67 ± 0.12 K, 90 % confidence range), with a very limited contribution from natural forcings (-0.01± 0.02 K).

  17. Algorithms for Detecting Significantly Mutated Pathways in Cancer

    NASA Astrophysics Data System (ADS)

    Vandin, Fabio; Upfal, Eli; Raphael, Benjamin J.

    Recent genome sequencing studies have shown that the somatic mutations that drive cancer development are distributed across a large number of genes. This mutational heterogeneity complicates efforts to distinguish functional mutations from sporadic, passenger mutations. Since cancer mutations are hypothesized to target a relatively small number of cellular signaling and regulatory pathways, a common approach is to assess whether known pathways are enriched for mutated genes. However, restricting attention to known pathways will not reveal novel cancer genes or pathways. An alternative strategy is to examine mutated genes in the context of genome-scale interaction networks that include both well characterized pathways and additional gene interactions measured through various approaches. We introduce a computational framework for de novo identification of subnetworks in a large gene interaction network that are mutated in a significant number of patients. This framework includes two major features. First, we introduce a diffusion process on the interaction network to define a local neighborhood of "influence" for each mutated gene in the network. Second, we derive a two-stage multiple hypothesis test to bound the false discovery rate (FDR) associated with the identified subnetworks. We test these algorithms on a large human protein-protein interaction network using mutation data from two recent studies: glioblastoma samples from The Cancer Genome Atlas and lung adenocarcinoma samples from the Tumor Sequencing Project. We successfully recover pathways that are known to be important in these cancers, such as the p53 pathway. We also identify additional pathways, such as the Notch signaling pathway, that have been implicated in other cancers but not previously reported as mutated in these samples. Our approach is the first, to our knowledge, to demonstrate a computationally efficient strategy for de novo identification of statistically significant mutated subnetworks. We

  18. A randomized trial in a massive online open course shows people don't know what a statistically significant relationship looks like, but they can learn.

    PubMed

    Fisher, Aaron; Anderson, G Brooke; Peng, Roger; Leek, Jeff

    2014-01-01

    Scatterplots are the most common way for statisticians, scientists, and the public to visually detect relationships between measured variables. At the same time, and despite widely publicized controversy, P-values remain the most commonly used measure to statistically justify relationships identified between variables. Here we measure the ability to detect statistically significant relationships from scatterplots in a randomized trial of 2,039 students in a statistics massive open online course (MOOC). Each subject was shown a random set of scatterplots and asked to visually determine if the underlying relationships were statistically significant at the P < 0.05 level. Subjects correctly classified only 47.4% (95% CI [45.1%-49.7%]) of statistically significant relationships, and 74.6% (95% CI [72.5%-76.6%]) of non-significant relationships. Adding visual aids such as a best fit line or scatterplot smooth increased the probability a relationship was called significant, regardless of whether the relationship was actually significant. Classification of statistically significant relationships improved on repeat attempts of the survey, although classification of non-significant relationships did not. Our results suggest: (1) that evidence-based data analysis can be used to identify weaknesses in theoretical procedures in the hands of average users, (2) data analysts can be trained to improve detection of statistically significant results with practice, but (3) data analysts have incorrect intuition about what statistically significant relationships look like, particularly for small effects. We have built a web tool for people to compare scatterplots with their corresponding p-values which is available here: http://glimmer.rstudio.com/afisher/EDA/. PMID:25337457

  19. A randomized trial in a massive online open course shows people don’t know what a statistically significant relationship looks like, but they can learn

    PubMed Central

    Fisher, Aaron; Anderson, G. Brooke; Peng, Roger

    2014-01-01

    Scatterplots are the most common way for statisticians, scientists, and the public to visually detect relationships between measured variables. At the same time, and despite widely publicized controversy, P-values remain the most commonly used measure to statistically justify relationships identified between variables. Here we measure the ability to detect statistically significant relationships from scatterplots in a randomized trial of 2,039 students in a statistics massive open online course (MOOC). Each subject was shown a random set of scatterplots and asked to visually determine if the underlying relationships were statistically significant at the P < 0.05 level. Subjects correctly classified only 47.4% (95% CI [45.1%–49.7%]) of statistically significant relationships, and 74.6% (95% CI [72.5%–76.6%]) of non-significant relationships. Adding visual aids such as a best fit line or scatterplot smooth increased the probability a relationship was called significant, regardless of whether the relationship was actually significant. Classification of statistically significant relationships improved on repeat attempts of the survey, although classification of non-significant relationships did not. Our results suggest: (1) that evidence-based data analysis can be used to identify weaknesses in theoretical procedures in the hands of average users, (2) data analysts can be trained to improve detection of statistically significant results with practice, but (3) data analysts have incorrect intuition about what statistically significant relationships look like, particularly for small effects. We have built a web tool for people to compare scatterplots with their corresponding p-values which is available here: http://glimmer.rstudio.com/afisher/EDA/. PMID:25337457

  20. Statistical damage detection method for frame structures using a confidence interval

    NASA Astrophysics Data System (ADS)

    Li, Weiming; Zhu, Hongping; Luo, Hanbin; Xia, Yong

    2010-03-01

    A novel damage detection method is applied to a 3-story frame structure to obtain a statistical quantification control criterion for the existence, location and identification of damage. The mean, standard deviation, and exponentially weighted moving average (EWMA) are applied to detect damage information according to statistical process control (SPC) theory. It is concluded that the detection is insignificant with the mean and EWMA because the structural response is not independent and is not normally distributed. On the other hand, the damage information is detected well with the standard deviation because the influence of the data distribution is not pronounced with this parameter. A suitable moderate confidence level is explored for more significant damage location and quantification detection, and the impact of noise is investigated to illustrate the robustness of the method.
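
    A hedged Python/NumPy sketch of SPC-style monitoring on a response feature: an EWMA chart and a moving standard deviation are compared against limits built from a baseline (undamaged) segment. In this toy example the "damage" only changes the variance, so, consistent with the abstract, the EWMA chart reacts weakly while the standard deviation flags the change; all parameters are illustrative.

```python
# SPC-style damage indicators: EWMA chart and moving standard deviation.
import numpy as np

rng = np.random.default_rng(8)
baseline = rng.normal(0.0, 1.0, 500)                   # healthy-condition feature samples
monitor = np.concatenate([rng.normal(0.0, 1.0, 300),   # still healthy
                          rng.normal(0.0, 1.8, 200)])  # variance change after "damage"

mu, sigma = baseline.mean(), baseline.std(ddof=1)

# EWMA chart on the monitored sequence
lam = 0.1
ewma = np.empty_like(monitor)
z = mu
for i, x in enumerate(monitor):
    z = lam * x + (1 - lam) * z
    ewma[i] = z
ewma_limit = 3 * sigma * np.sqrt(lam / (2 - lam))      # asymptotic 3-sigma EWMA limit
ewma_alarms = np.flatnonzero(np.abs(ewma - mu) > ewma_limit)

# moving standard deviation against a crude upper limit
win = 50
msd = np.array([monitor[i:i + win].std(ddof=1) for i in range(len(monitor) - win + 1)])
sd_alarms = np.flatnonzero(msd > 1.5 * sigma)
print(ewma_alarms[:5], sd_alarms[:5] + win - 1)        # std-based alarms cluster after the change
```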

  1. FloatBoost learning and statistical face detection.

    PubMed

    Li, Stan Z; Zhang, ZhenQiu

    2004-09-01

    A novel learning procedure, called FloatBoost, is proposed for learning a boosted classifier for achieving the minimum error rate. FloatBoost learning uses a backtrack mechanism after each iteration of AdaBoost learning to minimize the error rate directly, rather than minimizing an exponential function of the margin as in the traditional AdaBoost algorithms. A second contribution of the paper is a novel statistical model for learning best weak classifiers using a stagewise approximation of the posterior probability. These novel techniques lead to a classifier which requires fewer weak classifiers than AdaBoost yet achieves lower error rates in both training and testing, as demonstrated by extensive experiments. Applied to face detection, the FloatBoost learning method, together with a proposed detector pyramid architecture, leads to the first real-time multiview face detection system reported. PMID:15742888

  2. Statistics over features for internal carotid arterial disorders detection.

    PubMed

    Ubeyli, Elif Derya

    2008-03-01

    The objective of the present study is to extract the representative features of the internal carotid arterial (ICA) Doppler ultrasound signals and to present the accurate classification model. This paper presented the usage of statistics over the set of the extracted features (Lyapunov exponents and the power levels of the power spectral density estimates obtained by the eigenvector methods) in order to reduce the dimensionality of the extracted feature vectors. Since classification is more accurate when the pattern is simplified through representation by important features, feature extraction and selection play an important role in classifying systems such as neural networks. Mixture of experts (ME) and modified mixture of experts (MME) architectures were formulated and used as basis for detection of arterial disorders. Three types of ICA Doppler signals (Doppler signals recorded from healthy subjects, subjects having stenosis, and subjects having occlusion) were classified. The classification results confirmed that the proposed ME and MME has potential in detecting the arterial disorders. PMID:18179791

  3. Statistically robust detection of spontaneous, non-stereotypical neural signals.

    PubMed

    Liu, Fan; Merwine, David K; Grzywacz, Norberto M

    2006-06-15

    Neural signals of interest are often temporally spontaneous and non-stereotypical in waveform. Detecting such signals is difficult, since one cannot use time-locking or simple template-matching techniques. We have sought a statistical method for automatically estimating the baseline in these conditions, and subsequently detecting the occurrence of neural signals. One could consider the signals as outliers in the distribution of neural activity and thus separate them from the baseline with median-based techniques. However, we found that baseline estimators that rely on the median are problematic. They introduce progressively greater estimation errors as the neural signal's duration, amplitude or frequency increases. Therefore, we tested several mode-based algorithms, taking advantage of the most probable state of the neural activity being the baseline. We found that certain mode-based algorithms perform baseline estimation well, with low susceptibility to changes in event duration, amplitude or frequency. Once the baseline is properly established, its median absolute deviation (MAD) can be determined. One can then use it to detect spontaneous signals robustly as outliers from the noise distribution. We also demonstrate how the choice of detection threshold in terms of MADs can be used to bias against false positives, without creating too many false negatives or vice versa. PMID:16430965
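
    A minimal Python/NumPy sketch of the thresholding step: estimate a baseline (here with the median for brevity; the paper argues a mode-based estimate is more robust when events are long, large or frequent), compute the median absolute deviation (MAD), and flag samples that exceed the baseline by a chosen number of MADs.

```python
# Baseline + MAD-based detection of spontaneous events in a noisy trace.
import numpy as np

rng = np.random.default_rng(9)
trace = rng.normal(0.0, 1.0, 5000)
trace[1000:1050] += 8.0     # a spontaneous, non-stereotypical event
trace[3000:3020] += 6.0     # another event

baseline = np.median(trace)                   # simple baseline estimate
mad = np.median(np.abs(trace - baseline))     # robust spread of the noise
k = 5.0                                       # threshold in MADs; trades false positives vs negatives
events = np.flatnonzero(trace - baseline > k * mad)
print(round(baseline, 3), round(mad, 3), events[:3], events[-3:])
```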

  4. STATISTICAL METHOD FOR DETECTION OF A TREND IN ATMOSPHERIC SULFATE

    EPA Science Inventory

    Daily atmospheric concentrations of sulfate collected in northeastern Pennsylvania are regressed against meteorological factors, ozone, and time in order to determine if a significant trend in sulfate can be detected. The data used in this analysis were collected during the Sulfat...

  5. Statistical significance of trends and trend differences in layer-average atmospheric temperature time series

    NASA Astrophysics Data System (ADS)

    Santer, B. D.; Wigley, T. M. L.; Boyle, J. S.; Gaffen, D. J.; Hnilo, J. J.; Nychka, D.; Parker, D. E.; Taylor, K. E.

    2000-03-01

    This paper examines trend uncertainties in layer-average free atmosphere temperatures arising from the use of different trend estimation methods. It also considers statistical issues that arise in assessing the significance of individual trends and of trend differences between data sets. Possible causes of these trends are not addressed. We use data from satellite and radiosonde measurements and from two reanalysis projects. To facilitate intercomparison, we compute from reanalyses and radiosonde data temperatures equivalent to those from the satellite-based Microwave Sounding Unit (MSU). We compare linear trends based on minimization of absolute deviations (LA) and minimization of squared deviations (LS). Differences are generally less than 0.05°C/decade over 1959-1996. Over 1979-1993, they exceed 0.10°C/decade for lower tropospheric time series and 0.15°C/decade for the lower stratosphere. Trend fitting by the LA method can degrade the lower-tropospheric trend agreement of 0.03°C/decade (over 1979-1996) previously reported for the MSU and radiosonde data. In assessing trend significance we employ two methods to account for temporal autocorrelation effects. With our preferred method, virtually none of the individual 1979-1993 trends in deep-layer temperatures are significantly different from zero. To examine trend differences between data sets we compute 95% confidence intervals for individual trends and show that these overlap for almost all data sets considered. Confidence intervals for lower-tropospheric trends encompass both zero and the model-projected trends due to anthropogenic effects. We also test the significance of a trend in d(t), the time series of differences between a pair of data sets. Use of d(t) removes variability common to both time series and facilitates identification of small trend differences. This more discerning test reveals that roughly 30% of the data set comparisons have significant differences in lower-tropospheric trends
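
    One common way to account for temporal autocorrelation when testing a layer-average temperature trend is to shrink the effective sample size using the lag-1 autocorrelation of the regression residuals, as in the hedged Python/NumPy/SciPy sketch below. The paper's preferred method may differ in detail, and the synthetic series here is purely illustrative.

```python
# Trend significance with an autocorrelation-adjusted effective sample size.
import numpy as np
from scipy import stats

def trend_significance(y):
    t = np.arange(len(y), dtype=float)
    slope, intercept = np.polyfit(t, y, 1)
    resid = y - (slope * t + intercept)
    r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]          # lag-1 autocorrelation
    n_eff = len(y) * (1 - r1) / (1 + r1)                   # effective sample size
    se = np.sqrt(np.sum(resid ** 2) / (n_eff - 2) / np.sum((t - t.mean()) ** 2))
    t_stat = slope / se
    p = 2 * stats.t.sf(abs(t_stat), df=max(n_eff - 2, 1))
    return slope, p

rng = np.random.default_rng(10)
red_noise = np.zeros(180)                                   # 15 years of monthly anomalies
for i in range(1, 180):
    red_noise[i] = 0.6 * red_noise[i - 1] + rng.normal(0, 0.1)
series = 0.001 * np.arange(180) + red_noise                 # small imposed trend + red noise
print(trend_significance(series))
```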

  6. StegoWall: blind statistical detection of hidden data

    NASA Astrophysics Data System (ADS)

    Voloshynovskiy, Sviatoslav V.; Herrigel, Alexander; Rytsar, Yuri B.; Pun, Thierry

    2002-04-01

    Novel functional possibilities, provided by recent data hiding technologies, carry the danger of uncontrolled (unauthorized) and unlimited information exchange that might be used by people with unfriendly interests. The multimedia industry as well as the research community recognize the urgent necessity for network security and copyright protection, or rather the lack of adequate law for digital multimedia protection. This paper advocates the need for detecting hidden data in digital and analog media as well as in electronic transmissions, and for attempting to identify the underlying hidden data. Solving this problem calls for the development of an architecture for blind stochastic hidden data detection in order to prevent unauthorized data exchange. The proposed architecture is called StegoWall; its key aspects are the solid investigation, the deep understanding, and the prediction of possible tendencies in the development of advanced data hiding technologies. The basic idea of our complex approach is to exploit all information about hidden data statistics to perform its detection based on a stochastic framework. The StegoWall system will be used for four main applications: robust watermarking, secret communications, integrity control and tamper proofing, and internet/network security.

  7. Pectoral muscle detection in mammograms using local statistical features.

    PubMed

    Liu, Li; Liu, Qian; Lu, Wei

    2014-10-01

    Mammography is a primary imaging method for breast cancer diagnosis. It is an important issue to accurately identify and separate pectoral muscles (PM) from breast tissues. Hough-transform-based methods are commonly adopted for PM detection, but their performance degrades when PM edges cannot be depicted by straight lines. In this study, we present a new pectoral muscle identification algorithm which utilizes statistical features of pixel responses. First, the Anderson-Darling goodness-of-fit test is used to extract a feature image by assuming non-Gaussianity for PM boundaries. Second, a global weighting scheme based on the location of the PM was applied to the feature image to suppress non-PM regions. From the weighted image, a preliminary set of pectoral muscle boundary components is detected via row-wise peak detection. An iterative procedure based on edge continuity and orientation is used to determine the final PM boundary. Our results on a public mammogram database were assessed using four performance metrics: the false positive rate, the false negative rate, the Hausdorff distance, and the average distance. Compared to previous studies, our method demonstrates state-of-the-art performance in terms of all four measures. PMID:24482043

  8. Statistical modeling for particle impact noise detection testing

    SciTech Connect

    Prairie, R.R. ); Zimmer, W.J. )

    1990-01-01

    Particle Impact Noise Detection (PIND) testing is widely used to test electronic devices for the presence of conductive particles which can cause catastrophic failure. This paper develops a statistical model based on the rate of particles contaminating the part, the rate of particles induced by the test vibration, the escape rate, and the false alarm rate. Based on data from a large number of PIND tests for a canned transistor, the model is shown to fit the observed results closely. Knowledge of the parameters for which this fit is made is important in evaluating the effectiveness of the PIND test procedure and for developing background judgment about the performance of the PIND test. Furthermore, by varying the input parameters to the model, the resulting yield, failure rate and percent fallout can be examined and used to plan and implement PIND test programs.

  9. Robust Statistical Detection of Power-Law Cross-Correlation

    PubMed Central

    Blythe, Duncan A. J.; Nikulin, Vadim V.; Müller, Klaus-Robert

    2016-01-01

    We show that widely used approaches in statistical physics incorrectly indicate the existence of power-law cross-correlations between financial stock market fluctuations measured over several years and the neuronal activity of the human brain lasting for only a few minutes. While such cross-correlations are nonsensical, no current methodology allows them to be reliably discarded, leaving researchers at greater risk when the spurious nature of cross-correlations is not clear from the unrelated origin of the time series and rather requires careful statistical estimation. Here we propose a theory and method (PLCC-test) which allows us to rigorously and robustly test for power-law cross-correlations, correctly detecting genuine and discarding spurious cross-correlations, thus establishing meaningful relationships between processes in complex physical systems. Our method reveals for the first time the presence of power-law cross-correlations between amplitudes of the alpha and beta frequency ranges of the human electroencephalogram. PMID:27250630

  10. Robust Statistical Detection of Power-Law Cross-Correlation.

    PubMed

    Blythe, Duncan A J; Nikulin, Vadim V; Müller, Klaus-Robert

    2016-01-01

    We show that widely used approaches in statistical physics incorrectly indicate the existence of power-law cross-correlations between financial stock market fluctuations measured over several years and the neuronal activity of the human brain lasting for only a few minutes. While such cross-correlations are nonsensical, no current methodology allows them to be reliably discarded, leaving researchers at greater risk when the spurious nature of cross-correlations is not clear from the unrelated origin of the time series and rather requires careful statistical estimation. Here we propose a theory and method (PLCC-test) which allows us to rigorously and robustly test for power-law cross-correlations, correctly detecting genuine and discarding spurious cross-correlations, thus establishing meaningful relationships between processes in complex physical systems. Our method reveals for the first time the presence of power-law cross-correlations between amplitudes of the alpha and beta frequency ranges of the human electroencephalogram. PMID:27250630

  11. Statistically significant changes in ground thermal conditions of alpine Austria during the last decade

    NASA Astrophysics Data System (ADS)

    Kellerer-Pirklbauer, Andreas

    2016-04-01

    Longer data series (e.g. >10 a) of ground temperatures in alpine regions are helpful to improve the understanding regarding the effects of present climate change on the distribution and thermal characteristics of seasonal frost- and permafrost-affected areas. Beginning in 2004 - and more intensively since 2006 - a permafrost and seasonal frost monitoring network was established in Central and Eastern Austria by the University of Graz. This network consists of c.60 ground temperature (surface and near-surface) monitoring sites which are located at 1922-3002 m a.s.l., at latitude 46°55'-47°22'N and at longitude 12°44'-14°41'E. These data allow conclusions about general ground thermal conditions, potential permafrost occurrence, trends during the observation period, and regional patterns of change. Calculations and analyses of several different temperature-related parameters were accomplished. At an annual scale, a region-wide statistically significant warming during the observation period was revealed by e.g. an increase in mean annual temperature values (mean, maximum) or the significant lowering of the surface frost number (F+). At a seasonal scale, no significant trend of any temperature-related parameter was revealed in most cases for spring (MAM) and autumn (SON). Winter (DJF) shows only a weak warming. In contrast, the summer (JJA) season reveals in general a significant warming, as confirmed by several different temperature-related parameters such as e.g. mean seasonal temperature, number of thawing degree days, number of freezing degree days, or days without night frost. On a monthly basis, August shows the statistically most robust and strongest warming of all months, although regional differences occur. Despite the fact that the general ground temperature warming during the last decade is confirmed by the field data in the study region, complications in trend analyses arise from temperature anomalies (e.g. warm winter 2006/07) or substantial variations in the winter

  12. Statistics, Probability, Significance, Likelihood: Words Mean What We Define Them to Mean

    ERIC Educational Resources Information Center

    Drummond, Gordon B.; Tom, Brian D. M.

    2011-01-01

    Statisticians use words deliberately and specifically, but not necessarily in the way they are used colloquially. For example, in general parlance "statistics" can mean numerical information, usually data. In contrast, one large statistics textbook defines the term "statistic" to denote "a characteristic of a 'sample', such as the average score",…

  13. Statistical method for detecting structural change in the growth process.

    PubMed

    Ninomiya, Yoshiyuki; Yoshimoto, Atsushi

    2008-03-01

    Due to competition among individual trees and other exogenous factors that change the growth environment, each tree grows following its own growth trend with some structural changes in growth over time. In the present article, a new method is proposed to detect a structural change in the growth process. We formulate the method as a simple statistical test for signal detection without constructing any specific model for the structural change. To evaluate the p-value of the test, the tube method is developed because the regular distribution theory is insufficient. Using two sets of tree diameter growth data sampled from planted forest stands of Cryptomeria japonica in Japan, we conduct an analysis of identifying the effect of thinning on the growth process as a structural change. Our results demonstrate that the proposed method is useful to identify the structural change caused by thinning. We also provide the properties of the method in terms of the size and power of the test. PMID:17608782

  14. There's more than one way to conduct a replication study: Beyond statistical significance.

    PubMed

    Anderson, Samantha F; Maxwell, Scott E

    2016-03-01

    As the field of psychology struggles to trust published findings, replication research has begun to become more of a priority to both scientists and journals. With this increasing emphasis placed on reproducibility, it is essential that replication studies be capable of advancing the field. However, we argue that many researchers have been only narrowly interpreting the meaning of replication, with studies being designed with a simple statistically significant or nonsignificant results framework in mind. Although this interpretation may be desirable in some cases, we develop a variety of additional "replication goals" that researchers could consider when planning studies. Even if researchers are aware of these goals, we show that they are rarely used in practice-as results are typically analyzed in a manner only appropriate to a simple significance test. We discuss each goal conceptually, explain appropriate analysis procedures, and provide 1 or more examples to illustrate these analyses in practice. We hope that these various goals will allow researchers to develop a more nuanced understanding of replication that can be flexible enough to answer the various questions that researchers might seek to understand. PMID:26214497

  15. Application of universal kriging for estimation of earthquake ground motion: Statistical significance of results

    SciTech Connect

    Carr, J.R.; Roberts, K.P.

    1989-02-01

    Universal kriging is compared with ordinary kriging for estimation of earthquake ground motion. Ordinary kriging is based on a stationary random function model; universal kriging is based on a nonstationary random function model representing first-order drift. Accuracy of universal kriging is compared with that for ordinary kriging; cross-validation is used as the basis for comparison. Hypothesis testing on these results shows that accuracy obtained using universal kriging is not significantly different from accuracy obtained using ordinary kriging. Tests based on normal distribution assumptions are applied to errors measured in the cross-validation procedure; t and F tests reveal no evidence to suggest universal and ordinary kriging are different for estimation of earthquake ground motion. Nonparametric hypothesis tests applied to these errors and jackknife statistics yield the same conclusion: universal and ordinary kriging are not significantly different for this application as determined by a cross-validation procedure. These results are based on application to four independent data sets (four different seismic events).

  16. Mass detection on real and synthetic mammograms: human observer templates and local statistics

    NASA Astrophysics Data System (ADS)

    Castella, Cyril; Kinkel, Karen; Verdun, Francis R.; Eckstein, Miguel P.; Abbey, Craig K.; Bochud, François O.

    2007-03-01

    In this study we estimated human observer templates associated with the detection of a realistic mass signal superimposed on real and simulated but realistic synthetic mammographic backgrounds. Five trained naïve observers participated in two-alternative forced-choice (2-AFC) experiments in which they were asked to detect a spherical mass signal extracted from a mammographic phantom. This signal was superimposed on statistically stationary clustered lumpy backgrounds (CLB) in one instance, and on nonstationary real mammographic backgrounds in another. Human observer linear templates were estimated using a genetic algorithm. An additional 2-AFC experiment was conducted with twin noise in order to determine which local statistical properties of the real backgrounds influenced the ability of the human observers to detect the signal. Results show that the estimated linear templates are not significantly different for stationary and nonstationary backgrounds. The estimated performance of the linear template compared with the human observer is within 5% in terms of percent correct (Pc) for the 2-AFC task. Detection efficiency is significantly higher on nonstationary real backgrounds than on globally stationary synthetic CLB. Using the twin-noise experiment and a new method to relate image features to observers' trial-to-trial decisions, we found that the local statistical properties that hinder or facilitate the detection task were the standard deviation and three features derived from the neighborhood gray-tone difference matrix: coarseness, contrast and strength. These statistical features showed a dependency on human performance only when estimated within a sufficiently small area around the searched location. These findings emphasize that nonstationary backgrounds need to be described by their local statistics and not by global ones like the noise Wiener spectrum.

  17. Statistical Analysis of Data with Non-Detectable Values

    SciTech Connect

    Frome, E.L.

    2004-08-26

    Environmental exposure measurements are, in general, positive and may be subject to left censoring, i.e. the measured value is less than a "limit of detection". In occupational monitoring, strategies for assessing workplace exposures typically focus on the mean exposure level or the probability that any measurement exceeds a limit. A basic problem of interest in environmental risk assessment is to determine if the mean concentration of an analyte is less than a prescribed action level. Parametric methods, used to determine acceptable levels of exposure, are often based on a two parameter lognormal distribution. The mean exposure level and/or an upper percentile (e.g. the 95th percentile) are used to characterize exposure levels, and upper confidence limits are needed to describe the uncertainty in these estimates. In certain situations it is of interest to estimate the probability of observing a future (or "missed") value of a lognormal variable. Statistical methods for random samples (without non-detects) from the lognormal distribution are well known for each of these situations. In this report, methods for estimating these quantities based on the maximum likelihood method for randomly left censored lognormal data are described and graphical methods are used to evaluate the lognormal assumption. If the lognormal model is in doubt and an alternative distribution for the exposure profile of a similar exposure group is not available, then nonparametric methods for left censored data are used. The mean exposure level, along with the upper confidence limit, is obtained using the product limit estimate, and the upper confidence limit on the 95th percentile (i.e. the upper tolerance limit) is obtained using a nonparametric approach. All of these methods are well known but computational complexity has limited their use in routine data analysis with left censored data. The recent development of the R environment for statistical data analysis and graphics has greatly
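
    A minimal sketch of the maximum likelihood idea for randomly left-censored lognormal data: detected values contribute the normal density of their logarithms, while non-detects contribute the normal cumulative probability at their detection limit. The data, starting values, and percentile choice below are hypothetical, and the sketch omits the confidence limits and diagnostics described in the report.

    import numpy as np
    from scipy import stats, optimize

    # Hypothetical exposure data: measured values and a censoring flag
    # (True = non-detect; the value then holds that sample's detection limit)
    x = np.array([0.4, 0.7, 0.2, 1.1, 0.2, 0.5, 2.3, 0.9, 0.2, 1.6])
    censored = np.array([False, False, True, False, True, False, False, False, True, False])

    def neg_loglik(params):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)                # keep sigma positive
        z = (np.log(x) - mu) / sigma
        ll_detect = stats.norm.logpdf(z[~censored]) - np.log(sigma)  # density of log-values
        ll_censor = stats.norm.logcdf(z[censored])                   # P(value < detection limit)
        return -(ll_detect.sum() + ll_censor.sum())

    res = optimize.minimize(neg_loglik, x0=[0.0, 0.0], method="Nelder-Mead")
    mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
    mean_exposure = np.exp(mu_hat + sigma_hat**2 / 2)   # lognormal mean
    p95 = np.exp(mu_hat + 1.645 * sigma_hat)            # 95th percentile
    print(mean_exposure, p95)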

  18. Biological meaning, statistical significance, and classification of local spatial similarities in nonhomologous proteins.

    PubMed Central

    Alexandrov, N. N.; Go, N.

    1994-01-01

    We have completed an exhaustive search for the common spatial arrangements of backbone fragments (SARFs) in nonhomologous proteins. This type of local structural similarity, incorporating short fragments of backbone atoms, arranged not necessarily in the same order along the polypeptide chain, appears to be important for protein function and stability. To estimate the statistical significance of the similarities, we have introduced a similarity score. We present several locally similar structures, with a large similarity score, which have not yet been reported. On the basis of the results of pairwise comparison, we have performed hierarchical cluster analysis of protein structures. Our analysis is not limited to comparison of single chains but also includes complex molecules consisting of several subunits. The SARFs with backbone fragments from different polypeptide chains provide a stable interaction between subunits in protein molecules. In many cases the active site of the enzyme is located at the same position relative to the common SARFs, implying that certain SARFs function as a universal interface for protein-substrate interactions. PMID:8069217

  19. The Detection and Statistics of Giant Arcs behind CLASH Clusters

    NASA Astrophysics Data System (ADS)

    Xu, Bingxiao; Postman, Marc; Meneghetti, Massimo; Seitz, Stella; Zitrin, Adi; Merten, Julian; Maoz, Dani; Frye, Brenda; Umetsu, Keiichi; Zheng, Wei; Bradley, Larry; Vega, Jesus; Koekemoer, Anton

    2016-02-01

    We developed an algorithm to find and characterize gravitationally lensed galaxies (arcs) to perform a comparison of the observed and simulated arc abundance. Observations are from the Cluster Lensing And Supernova survey with Hubble (CLASH). Simulated CLASH images are created using the MOKA package and also clusters selected from the high-resolution, hydrodynamical simulations, MUSIC, over the same mass and redshift range as the CLASH sample. The algorithm's arc elongation accuracy, completeness, and false positive rate are determined and used to compute an estimate of the true arc abundance. We derive a lensing efficiency of 4 ± 1 arcs (with length ≥6″ and length-to-width ratio ≥7) per cluster for the X-ray-selected CLASH sample, 4 ± 1 arcs per cluster for the MOKA-simulated sample, and 3 ± 1 arcs per cluster for the MUSIC-simulated sample. The observed and simulated arc statistics are in full agreement. We measure the photometric redshifts of all detected arcs and find a median redshift zs = 1.9 with 33% of the detected arcs having zs > 3. We find that the arc abundance does not depend strongly on the source redshift distribution but is sensitive to the mass distribution of the dark matter halos (e.g., the c-M relation). Our results show that consistency between the observed and simulated distributions of lensed arc sizes and axial ratios can be achieved by using cluster-lensing simulations that are carefully matched to the selection criteria used in the observations.

  20. Statistical language analysis for automatic exfiltration event detection.

    SciTech Connect

    Robinson, David Gerald

    2010-04-01

    This paper discusses the recent development of a statistical approach for the automatic identification of anomalous network activity that is characteristic of exfiltration events. This approach is based on the language processing method referred to as latent Dirichlet allocation (LDA). Cyber security experts currently depend heavily on a rule-based framework for initial detection of suspect network events. The application of the rule set typically results in an extensive list of suspect network events that are then further explored manually for suspicious activity. The ability to identify anomalous network events is heavily dependent on the experience of the security personnel wading through the network log. Limitations of this approach are clear: rule-based systems only apply to exfiltration behavior that has previously been observed, and experienced cyber security personnel are rare commodities. Since the new methodology is not a discrete rule-based approach, it is more difficult for an insider to disguise the exfiltration events. A further benefit is that the methodology provides a risk-based approach that can be implemented in a continuous, dynamic or evolutionary fashion. This permits suspect network activity to be identified early, with a quantifiable risk associated with decision making when responding to suspicious activity.
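
    The following is a hedged sketch of the general idea, topic modelling of network-log "documents" with latent Dirichlet allocation followed by flagging of low-likelihood sessions, using scikit-learn rather than whatever implementation the paper used; the log lines, tokenization, and number of topics are invented for illustration.

    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # Hypothetical log lines: each "document" is one session's events tokenized as words
    sessions = [
        "http get 200 internal.wiki http get 200 internal.wiki",
        "dns query mail.example smtp send 250",
        "ssh login scp upload external.host scp upload external.host",
        "http get 200 internal.wiki dns query internal.wiki",
    ]

    counts = CountVectorizer(token_pattern=r"[^ ]+").fit_transform(sessions)
    lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
    theta = lda.transform(counts)          # per-session topic mixtures

    # Score each session by its approximate log-likelihood under the fitted model;
    # unusually low scores are candidates for manual review.
    scores = np.array([lda.score(counts[i]) for i in range(counts.shape[0])])
    suspect = np.argsort(scores)[:1]
    print(theta.round(2), suspect)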

  1. Significant statistically relationship between the great volcanic eruptions and the count of sunspots from 1610 to the present

    NASA Astrophysics Data System (ADS)

    Casati, Michele

    2014-05-01

    The assertion that solar activity may play a significant role in triggering large volcanic eruptions has been, and continues to be, discussed by many geophysicists. Numerous scientific papers have established a possible correlation between these events and the electromagnetic coupling between the Earth and the Sun, but none of them has been able to highlight a possible statistically significant relationship between large volcanic eruptions and any of the series, such as geomagnetic activity, solar wind, or sunspot number. In our research, we compare the 148 volcanic eruptions with index VEI4 and the 37 major historical volcanic eruptions with index equal to or greater than VEI5, recorded from 1610 to 2012, with the sunspot number. Taking as the threshold value a monthly sunspot number of 46 (recorded during the great Krakatoa eruption of August 1883, historical index VEI6), we note some possible relationships and conduct a statistical test. • Of the 31 historical large volcanic eruptions with index VEI5+ recorded between 1610 and 1955, 29 were recorded when the SSN<46. The remaining 2 eruptions were recorded not when the SSN<46 but during solar maxima: in the solar cycle of the year 1739 and in solar cycle No. 14 (the Shikotsu eruption of 1739 and Ksudach in 1907). • Of the 8 historical large volcanic eruptions with index VEI6+ recorded from 1610 to the present, 7 were recorded with SSN<46 and, more specifically, within the three large solar minima known: Maunder (1645-1710), Dalton (1790-1830) and the solar minimums that occurred between 1880 and 1920. As the only exception, we note the eruption of Pinatubo in June 1991, recorded at the solar maximum of cycle 22. • Of the 6 historical major volcanic eruptions with index VEI5+ recorded after 1955, 5 were recorded not during periods of low solar activity but during solar maxima of cycles 19, 21 and 22. The significance tests, conducted with the chi-square χ² = 7,782, detect a
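
    For readers unfamiliar with the test, the sketch below runs a chi-square test of association on a 2x2 contingency table of the kind implied by the abstract (eruptions cross-classified by size class and by sunspot number below or above the threshold); the second row of counts is invented for illustration and is not the paper's data.

    import numpy as np
    from scipy.stats import chi2_contingency

    # Illustrative 2x2 table (NOT the paper's data): eruption counts cross-classified
    # by sunspot number below/above the threshold and by eruption size class.
    #                  SSN < 46   SSN >= 46
    table = np.array([[29,        2],     # large eruptions (counts quoted in the abstract)
                      [60,       57]])    # smaller eruptions (hypothetical counts)

    chi2, p, dof, expected = chi2_contingency(table)
    print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p:.4f}")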

  2. Automatic brain tumor detection in MRI: methodology and statistical validation

    NASA Astrophysics Data System (ADS)

    Iftekharuddin, Khan M.; Islam, Mohammad A.; Shaik, Jahangheer; Parra, Carlos; Ogg, Robert

    2005-04-01

    Automated brain tumor segmentation and detection are immensely important in medical diagnostics because they provide information on anatomical structures as well as potentially abnormal tissue, which is necessary for appropriate surgical planning. In this work, we propose a novel automated brain tumor segmentation technique based on multiresolution texture information that combines fractal Brownian motion (fBm) and wavelet multiresolution analysis. Our wavelet-fractal technique combines the excellent multiresolution localization property of wavelets with the texture-extraction capability of fractals. We demonstrate the efficacy of our technique by successfully segmenting pediatric brain MR images (MRIs) from St. Jude Children's Research Hospital. We use a self-organizing map (SOM) as our clustering tool, exploiting both pixel intensity and multiresolution texture features to obtain the segmented tumor. Our test results show that our technique successfully segments abnormal brain tissues in a set of T1 images. In the next step, we design a classifier using a feed-forward (FF) neural network to statistically validate the presence of tumor in MRI using both the multiresolution texture and the pixel intensity features. We estimate the corresponding receiver operating characteristic (ROC) curve based on the true positive fractions and false positive fractions estimated from our classifier at different threshold values. The ROC, which can be considered a gold standard for assessing the competence of a classifier, is used to ascertain the sensitivity and specificity of our classifier. We observe that at a threshold of 0.4 we achieve a true positive fraction of 1.0 (100%) while sacrificing only a 0.16 (16%) false positive fraction for the set of 50 T1 MRIs analyzed in this experiment.

  3. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. © 2005 Elsevier Ltd. All rights reserved.
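
    A simplified sketch of regression on order statistics for a single detection limit (the software described handles multiple limits): detected values are regressed on normal quantiles of their plotting positions, and the fitted line is used to impute values for the censored observations before computing summary statistics. All numbers are hypothetical.

    import numpy as np
    from scipy import stats

    # Hypothetical data with a single detection limit of 1.0 (a simplification;
    # the ROS method described in the record handles multiple detection limits)
    detects = np.array([1.3, 1.8, 2.4, 3.1, 4.7, 6.2, 9.5])
    n_nondetects = 5
    n = len(detects) + n_nondetects

    # Plotting positions: non-detects occupy the lowest ranks below the single limit
    pp = np.arange(1, n + 1) / (n + 1)
    pp_nd, pp_det = pp[:n_nondetects], pp[n_nondetects:]

    # Regression of log-concentration on normal quantiles for the detected values
    slope, intercept, *_ = stats.linregress(stats.norm.ppf(pp_det), np.log(np.sort(detects)))

    # Impute censored observations from the fitted line, then summarize all n values
    imputed = np.exp(intercept + slope * stats.norm.ppf(pp_nd))
    all_values = np.concatenate([imputed, np.sort(detects)])
    print(all_values.mean(), np.percentile(all_values, 95))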

  4. Detection of Significant Groups in Hierarchical Clustering by Resampling.

    PubMed

    Sebastiani, Paola; Perls, Thomas T

    2016-01-01

    Hierarchical clustering is a simple and reproducible technique to rearrange data of multiple variables and sample units and visualize possible groups in the data. Despite the name, hierarchical clustering does not provide clusters automatically, and "tree-cutting" procedures are often used to identify subgroups in the data by cutting the dendrogram that represents the similarities among groups used in the agglomerative procedure. We introduce a resampling-based technique that can be used to identify cut-points of a dendrogram with a significance level based on a reference distribution for the heights of the branch points. The evaluation on synthetic data shows that the technique is robust in a variety of situations. An example with real biomarker data from the Long Life Family Study shows the usefulness of the method. PMID:27551289
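
    One way to approximate such a resampling-based cut, sketched below under the assumption that permuting each variable independently is an acceptable reference model (the authors' exact resampling scheme may differ): the dendrogram is cut at a quantile of the maximum merge heights obtained from permuted data.

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster
    from scipy.spatial.distance import pdist

    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 6))
    X[:20] += 3.0                       # hypothetical data with two groups

    Z = linkage(pdist(X), method="average")

    # Reference distribution for branch heights: permute each column independently
    # to break any grouping, and record the largest merge height of each resample.
    null_max = []
    for _ in range(200):
        Xp = np.column_stack([rng.permutation(col) for col in X.T])
        null_max.append(linkage(pdist(Xp), method="average")[:, 2].max())

    cut = np.percentile(null_max, 95)                  # significance-based cut height
    labels = fcluster(Z, t=cut, criterion="distance")
    print("cut height:", round(cut, 2), "groups:", len(set(labels)))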

  5. Detection of Significant Groups in Hierarchical Clustering by Resampling

    PubMed Central

    Sebastiani, Paola; Perls, Thomas T.

    2016-01-01

    Hierarchical clustering is a simple and reproducible technique to rearrange data of multiple variables and sample units and visualize possible groups in the data. Despite the name, hierarchical clustering does not provide clusters automatically, and “tree-cutting” procedures are often used to identify subgroups in the data by cutting the dendrogram that represents the similarities among groups used in the agglomerative procedure. We introduce a resampling-based technique that can be used to identify cut-points of a dendrogram with a significance level based on a reference distribution for the heights of the branch points. The evaluation on synthetic data shows that the technique is robust in a variety of situations. An example with real biomarker data from the Long Life Family Study shows the usefulness of the method. PMID:27551289

  6. A Non-Parametric Surrogate-based Test of Significance for T-Wave Alternans Detection

    PubMed Central

    Nemati, Shamim; Abdala, Omar; Bazán, Violeta; Yim-Yeh, Susie; Malhotra, Atul; Clifford, Gari

    2010-01-01

    We present a non-parametric adaptive surrogate test that allows for the differentiation of statistically significant T-Wave Alternans (TWA) from alternating patterns that can be solely explained by the statistics of noise. The proposed test is based on estimating the distribution of noise-induced alternating patterns in a beat sequence from a set of surrogate data derived from repeated reshuffling of the original beat sequence. Thus, in assessing the significance of the observed alternating patterns in the data, no assumptions are made about the underlying noise distribution. In addition, since the distribution of noise-induced alternans magnitudes is calculated separately for each sequence of beats within the analysis window, the method is robust to data non-stationarities in both noise and TWA. The proposed surrogate method for rejecting noise was compared to the standard noise rejection methods used with the Spectral Method (SM) and the Modified Moving Average (MMA) techniques. Using a previously described realistic multi-lead model of TWA, and real physiological noise, we demonstrate that the proposed approach reduces false TWA detections while maintaining a lower missed-detection rate than all the other methods tested. A simple averaging-based TWA estimation algorithm was coupled with the surrogate significance testing and was evaluated on three public databases: the Normal Sinus Rhythm Database (NRSDB), the Chronic Heart Failure Database (CHFDB) and the Sudden Cardiac Death Database (SCDDB). Differences in TWA amplitudes between each database were evaluated at matched heart rate (HR) intervals from 40 to 120 beats per minute (BPM). Using the two-sample Kolmogorov-Smirnov test, we found that significant differences in TWA levels exist between each patient group at all decades of heart rates. The most marked difference was generally found at higher heart rates, and the new technique resulted in a larger margin of separability between patient populations than
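
    A toy version of the surrogate idea, assuming a single-lead series of T-wave amplitudes and a simple even/odd alternans estimate (the published method uses a multi-lead model and a different estimator): the observed alternans magnitude is compared with the distribution obtained from reshuffled beat orders.

    import numpy as np

    rng = np.random.default_rng(2)

    # Hypothetical series of T-wave amplitudes for 64 consecutive beats (one lead)
    n_beats = 64
    beats = rng.normal(0.0, 5.0, size=n_beats)            # noise (microvolts)
    beats[::2] += 4.0                                      # add an alternating component

    def alternans_magnitude(x):
        return abs(x[::2].mean() - x[1::2].mean()) / 2.0   # ABAB amplitude estimate

    observed = alternans_magnitude(beats)

    # Surrogates: reshuffle the beat order so any alternation is due to noise alone
    surrogate = np.array([alternans_magnitude(rng.permutation(beats)) for _ in range(2000)])
    p_value = (np.sum(surrogate >= observed) + 1) / (len(surrogate) + 1)
    print(f"TWA estimate = {observed:.2f} uV, surrogate p = {p_value:.4f}")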

  7. Detection of significant pathways in osteoporosis based on graph clustering.

    PubMed

    Xiao, Haijun; Shan, Liancheng; Zhu, Haiming; Xue, Feng

    2012-12-01

    Osteoporosis is the most common and serious skeletal disorder among the elderly, characterized by a low bone mineral density (BMD). Low bone mass in the elderly is highly dependent on their peak bone mass (PBM) as young adults. Circulating monocytes serve as early progenitors of osteoclasts and produce significant molecules for bone metabolism. An improved understanding of the biology and genetics of osteoclast differentiation at the pathway level is likely to be beneficial for the development of novel targeted approaches for osteoporosis. The objective of this study was to explore gene expression profiles comprehensively by grouping individual differentially expressed genes (DEGs) into gene sets and pathways using the graph clustering approach and Gene Ontology (GO) term enrichment analysis. The results indicated that the DEGs between high and low PBM samples were grouped into nine gene sets. The genes in clusters 1 and 8 (including GBP1, STAT1, CXCL10 and EIF2AK2) may be associated with osteoclast differentiation by the immune system response. The genes in clusters 2, 7 and 9 (including SOCS3, SOD2, ATF3, ADM EGR2 and BCL2A1) may be associated with osteoclast differentiation by responses to various stimuli. This study provides a number of candidate genes that warrant further investigation, including DDX60, HERC5, RSAD2, SIGLEC1, CMPK2, MX1, SEPING1, EPSTI1, C9orf72, PHLDA2, PFKFB3, PLEKHG2, ANKRD28, IL1RN and RNF19B. PMID:22992777

  8. Detecting modules in biological networks by edge weight clustering and entropy significance.

    PubMed

    Lecca, Paola; Re, Angela

    2015-01-01

    Detection of the modular structure of biological networks is of interest to researchers adopting a systems perspective for the analysis of omics data. Computational systems biology has provided a rich array of methods for network clustering. To date, the majority of approaches address this task through a network node classification based on topological or external quantifiable properties of network nodes. Conversely, numerical properties of network edges are underused, even though the information content which can be associated with network edges has augmented due to steady advances in molecular biology technology over the last decade. Properly accounting for network edges in the development of clustering approaches can become crucial to improve quantitative interpretation of omics data, finally resulting in more biologically plausible models. In this study, we present a novel technique for network module detection, named WG-Cluster (Weighted Graph CLUSTERing). WG-Cluster's notable features, compared to current approaches, lie in: (1) the simultaneous exploitation of network node and edge weights to improve the biological interpretability of the connected components detected, (2) the assessment of their statistical significance, and (3) the identification of emerging topological properties in the detected connected components. WG-Cluster utilizes three major steps: (i) an unsupervised version of k-means edge-based algorithm detects sub-graphs with similar edge weights, (ii) a fast-greedy algorithm detects connected components which are then scored and selected according to the statistical significance of their scores, and (iii) an analysis of the convolution between sub-graph mean edge weight and connected component score provides a summarizing view of the connected components. WG-Cluster can be applied to directed and undirected networks of different types of interacting entities and scales up to large omics data sets. Here, we show that WG-Cluster can be
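
    A rough sketch of the three steps using networkx and scikit-learn (not the WG-Cluster code itself): k-means on edge weights, connected components within each edge cluster, and a crude permutation-style significance check on component scores. The network, weights, cluster count, and thresholds are all invented for illustration.

    import numpy as np
    import networkx as nx
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(3)

    # Hypothetical weighted interaction network
    G = nx.gnm_random_graph(60, 180, seed=3)
    for u, v in G.edges():
        G[u][v]["weight"] = rng.gamma(2.0, 1.0)

    # (i) k-means on edge weights groups edges with similar weights
    edges = list(G.edges())
    w = np.array([G[u][v]["weight"] for u, v in edges]).reshape(-1, 1)
    edge_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(w)

    # (ii) connected components within each edge-weight cluster, scored by total weight
    for k in range(3):
        H = nx.Graph([edges[i] for i in range(len(edges)) if edge_labels[i] == k])
        for comp in nx.connected_components(H):
            m = H.subgraph(comp).number_of_edges()
            score = sum(G[u][v]["weight"] for u, v in H.subgraph(comp).edges())
            # (iii) crude significance check: compare against random edge-weight sets
            null = [rng.choice(w.ravel(), size=m, replace=False).sum() for _ in range(500)]
            p = (np.sum(np.array(null) >= score) + 1) / 501
            if p < 0.05 and len(comp) >= 5:
                print(f"cluster {k}: {len(comp)} nodes, score {score:.1f}, p {p:.3f}")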

  9. Detecting modules in biological networks by edge weight clustering and entropy significance

    PubMed Central

    Lecca, Paola; Re, Angela

    2015-01-01

    Detection of the modular structure of biological networks is of interest to researchers adopting a systems perspective for the analysis of omics data. Computational systems biology has provided a rich array of methods for network clustering. To date, the majority of approaches address this task through a network node classification based on topological or external quantifiable properties of network nodes. Conversely, numerical properties of network edges are underused, even though the information content which can be associated with network edges has augmented due to steady advances in molecular biology technology over the last decade. Properly accounting for network edges in the development of clustering approaches can become crucial to improve quantitative interpretation of omics data, finally resulting in more biologically plausible models. In this study, we present a novel technique for network module detection, named WG-Cluster (Weighted Graph CLUSTERing). WG-Cluster's notable features, compared to current approaches, lie in: (1) the simultaneous exploitation of network node and edge weights to improve the biological interpretability of the connected components detected, (2) the assessment of their statistical significance, and (3) the identification of emerging topological properties in the detected connected components. WG-Cluster utilizes three major steps: (i) an unsupervised version of k-means edge-based algorithm detects sub-graphs with similar edge weights, (ii) a fast-greedy algorithm detects connected components which are then scored and selected according to the statistical significance of their scores, and (iii) an analysis of the convolution between sub-graph mean edge weight and connected component score provides a summarizing view of the connected components. WG-Cluster can be applied to directed and undirected networks of different types of interacting entities and scales up to large omics data sets. Here, we show that WG-Cluster can be

  10. Avalanche Photodiode Statistics in Triggered-avalanche Detection Mode

    NASA Technical Reports Server (NTRS)

    Tan, H. H.

    1984-01-01

    The output of a triggered avalanche mode avalanche photodiode is modeled as Poisson distributed primary avalanche events plus conditionally Poisson distributed trapped carrier induced secondary events. The moment generating function as well as the mean and variance of the diode output statistics are derived. The dispersion of the output statistics is shown to always exceed that of the Poisson distribution. Several examples are considered in detail.
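
    A small simulation of a simplified version of this model, with a single generation of conditionally Poisson secondary events per primary avalanche (the parameter values are arbitrary); it illustrates the reported result that the output dispersion exceeds that of a Poisson distribution.

    import numpy as np

    rng = np.random.default_rng(4)

    lam = 3.0   # mean number of primary avalanche events per gate (hypothetical)
    nu = 0.8    # mean number of trap-induced secondary events per primary (hypothetical)

    def one_trial():
        primaries = rng.poisson(lam)
        secondaries = rng.poisson(nu, size=primaries).sum() if primaries else 0
        return primaries + secondaries

    counts = np.array([one_trial() for _ in range(200_000)])

    mean, var = counts.mean(), counts.var()
    print(f"mean = {mean:.3f}  (theory {lam * (1 + nu):.3f})")
    print(f"Fano factor var/mean = {var / mean:.3f}  (> 1, broader than Poisson)")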

  11. Statistical Versus Clinical Significance for Infants with Brain Injury: Reanalysis of Outcome Data from a Randomized Controlled Study

    PubMed Central

    Badr, Lina Kurdahi

    2009-01-01

    By adopting more appropriate statistical methods to appraise data from a previously published randomized controlled trial (RCT), we evaluated the statistical and clinical significance of an intervention on the 18 month neurodevelopmental outcome of infants with suspected brain injury. The intervention group (n =32) received extensive, individualized cognitive/sensorimotor stimulation by public health nurses (PHNs) while the control group (n = 30) received standard follow-up care. At 18 months 43 infants remained in the study (22 = intervention, 21 = control). The results indicate that there was a significant statistical change within groups and a clinical significance whereby more infants in the intervention group improved in mental, motor and neurological functioning at 18 months compared to the control group. The benefits of looking at clinical significance from a meaningful aspect for practitioners are emphasized. PMID:19276403

  12. Statistical physics inspired methods to assign statistical significance in bioinformatics and proteomics: From sequence comparison to mass spectrometry based peptide sequencing

    NASA Astrophysics Data System (ADS)

    Alves, Gelio

    After the sequencing of many complete genomes, we are in a post-genomic era in which the most important task has changed from gathering genetic information to organizing the mass of data as well as understanding how components interact with each other. The former is usually undertaken using bioinformatics methods, while the latter task is generally termed proteomics. Success in both parts demands correct statistical significance assignments for the results found. In my dissertation, I study two concrete examples: global sequence alignment statistics and peptide sequencing/identification using mass spectrometry. High-performance liquid chromatography coupled to a mass spectrometer (HPLC/MS/MS), enabling peptide identifications and thus protein identifications, has become the tool of choice in large-scale proteomics experiments. Peptide identification is usually done by database search methods. The lack of robust statistical significance assignment among current methods motivated the development of a novel de novo algorithm, RAId, whose score statistics then provide statistical significance for high-scoring peptides found in our custom, enzyme-digested peptide library. The ease of incorporating post-translational modifications is another important feature of RAId. To organize the massive protein/DNA data accumulated, biologists often cluster proteins according to their similarity via tools such as sequence alignment. Homologous proteins share similar domains. To assess the similarity of two domains usually requires alignment from head to toe, i.e., a global alignment. A good alignment score statistic with an appropriate null model enables us to distinguish biologically meaningful similarity from chance similarity. There has been much progress in local alignment statistics, which characterize score statistics when alignments tend to appear as a short segment of the whole sequence. For global alignment, which is useful in domain alignment, there is still much room for

  13. Statistical Anomaly Detection for Monitoring of Human Dynamics

    NASA Astrophysics Data System (ADS)

    Kamiya, K.; Fuse, T.

    2015-05-01

    Understanding of human dynamics has drawn attention in various areas. Due to the widespread use of positioning technologies based on GPS or public Wi-Fi, location information can be obtained with high spatial-temporal resolution as well as at low cost. By collecting sets of individual location information in real time, monitoring of human dynamics has recently become possible and is expected to lead to dynamic traffic control in the future. Although this monitoring focuses on detecting anomalous states of human dynamics, anomaly detection methods have been developed ad hoc and are not fully systematized. This research aims to define an anomaly detection problem for human dynamics monitoring with gridded population data and to develop an anomaly detection method based on that definition. Based on a comprehensive review, we discuss the characteristics of anomaly detection for human dynamics monitoring and categorize our problem as a semi-supervised anomaly detection problem that detects contextual anomalies in time-series data. We develop an anomaly detection method based on a sticky HDP-HMM, which is able to estimate the number of hidden states according to the input data. Results of an experiment with synthetic data show that the proposed method has good fundamental performance with respect to the detection rate. Through an experiment with real gridded population data, an anomaly was detected when and where an actual social event had occurred.

  14. Determination of significant variables in compound wear using a statistical model

    SciTech Connect

    Pumwa, J.; Griffin, R.B.; Smith, C.M.

    1997-07-01

    This paper will report on a study of dry compound wear of normalized 1018 steel on A2 tool steel. Compound wear is a combination of sliding and impact wear. The compound wear machine consisted of an A2 tool steel wear plate that could be rotated, and an indentor head that held the 1018 carbon steel wear pins. The variables in the system were the rpm of the wear plate, the force with which the indentor strikes the wear plate, and the frequency with which the indentor strikes the wear plate. A statistically designed experiment was used to analyze the effects of the different variables on the compound wear process. The model developed showed that wear could be reasonably well predicted using a defined variable that was called the workrate. The paper will discuss the results of the modeling and the metallurgical changes that occurred at the indentor interface, with the wear plate, during the wear process.

  15. Detection of low contrasted membranes in electron microscope images: statistical contour validation

    NASA Astrophysics Data System (ADS)

    Karathanou, A.; Buessler, J.-L.; Kihl, H.; Urban, J.-P.

    2009-02-01

    Images of biological objects in transmission electron microscopy (TEM) are particularly noisy and low contrasted, making their processing a challenging task to accomplish. During these last years, several software tools were conceived for the automatic or semi-automatic acquisition of TEM images. However, tools for the automatic analysis of these images are still rare. Our study concerns in particular the automatic identification of artificial membranes at medium magnification for the control of an electron microscope. We recently proposed a segmentation strategy in order to detect the regions of interest. In this paper, we introduce a complementary technique to improve contour recognition by a statistical validation algorithm. Our technique explores the profile transition between two objects. A transition is validated if there exists a gradient orthogonal to the contour that is statistically significant.

  16. Statistical behavior of ten million experimental detection limits

    NASA Astrophysics Data System (ADS)

    Voigtman, Edward; Abraham, Kevin T.

    2011-02-01

    Using a lab-constructed laser-excited fluorimeter, together with bootstrapping methodology, the authors have generated many millions of experimental linear calibration curves for the detection of rhodamine 6G tetrafluoroborate in ethanol solutions. The detection limits computed from them are in excellent agreement with both previously published theory and with comprehensive Monte Carlo computer simulations. Currie decision levels and Currie detection limits, each in the theoretical, chemical content domain, were found to be simply scaled reciprocals of the non-centrality parameter of the non-central t distribution that characterizes univariate linear calibration curves that have homoscedastic, additive Gaussian white noise. Accurate and precise estimates of the theoretical, content domain Currie detection limit for the experimental system, with 5% (each) probabilities of false positives and false negatives, are presented.
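
    For orientation, the idealized Currie expressions for known, homoscedastic Gaussian noise are sketched below; this is a simplification of the experimental situation in the study, where the noise and slope are estimated and the non-central t distribution applies. The noise level and slope are hypothetical.

    from scipy.stats import norm

    # Idealized Currie limits for a straight-line calibration with known, homoscedastic
    # Gaussian noise (a simplification of the estimated-parameter case in the record)
    alpha = beta = 0.05          # false positive / false negative probabilities
    sigma0 = 0.012               # blank noise standard deviation (response units, hypothetical)
    slope = 0.85                 # calibration slope (response per unit content, hypothetical)

    z_a, z_b = norm.ppf(1 - alpha), norm.ppf(1 - beta)
    L_C = z_a * sigma0                     # decision level (net response domain)
    L_D = (z_a + z_b) * sigma0             # detection limit (net response domain)

    print(f"content-domain decision level  x_C = {L_C / slope:.4g}")
    print(f"content-domain detection limit x_D = {L_D / slope:.4g}")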

  17. A parsimonious statistical method to detect groupwise differentially expressed functional connectivity networks.

    PubMed

    Chen, Shuo; Kang, Jian; Xing, Yishi; Wang, Guoqing

    2015-12-01

    Group-level functional connectivity analyses often aim to detect the altered connectivity patterns between subgroups with different clinical or psychological experimental conditions, for example, comparing cases and healthy controls. We present a new statistical method to detect differentially expressed connectivity networks with significantly improved power and lower false-positive rates. The goal of our method was to capture most differentially expressed connections within networks of constrained numbers of brain regions (by the rule of parsimony). By virtue of parsimony, the false-positive individual connectivity edges within a network are effectively reduced, whereas the informative (differentially expressed) edges are allowed to borrow strength from each other to increase the overall power of the network. We develop a test statistic for each network in light of combinatorics graph theory, and provide p-values for the networks (in the weak sense) by using permutation test with multiple-testing adjustment. We validate and compare this new approach with existing methods, including false discovery rate and network-based statistic, via simulation studies and a resting-state functional magnetic resonance imaging case-control study. The results indicate that our method can identify differentially expressed connectivity networks, whereas existing methods are limited. PMID:26416398

  18. A Parsimonious Statistical Method to Detect Groupwise Differentially Expressed Functional Connectivity Networks

    PubMed Central

    Chen, Shuo; Kang, Jian; Xing, Yishi; Wang, Guoqing

    2016-01-01

    Group-level functional connectivity analyses often aim to detect the altered connectivity patterns between subgroups with different clinical or psychological experimental conditions, for example, comparing cases and healthy controls. We present a new statistical method to detect differentially expressed connectivity networks with significantly improved power and lower false-positive rates. The goal of our method was to capture most differentially expressed connections within networks of constrained numbers of brain regions (by the rule of parsimony). By virtue of parsimony, the false-positive individual connectivity edges within a network are effectively reduced, whereas the informative (differentially expressed) edges are allowed to borrow strength from each other to increase the overall power of the network. We develop a test statistic for each network in light of combinatorics graph theory, and provide p-values for the networks (in the weak sense) by using permutation test with multiple-testing adjustment. We validate and compare this new approach with existing methods, including false discovery rate and network-based statistic, via simulation studies and a resting-state functional magnetic resonance imaging case–control study. The results indicate that our method can identify differentially expressed connectivity networks, whereas existing methods are limited. PMID:26416398

  19. A Visitor's Guide to Effect Sizes--Statistical Significance versus Practical (Clinical) Importance of Research Findings

    ERIC Educational Resources Information Center

    Hojat, Mohammadreza; Xu, Gang

    2004-01-01

    Effect Sizes (ES) are an increasingly important index used to quantify the degree of practical significance of study results. This paper gives an introduction to the computation and interpretation of effect sizes from the perspective of the consumer of the research literature. The key points made are: (1) "ES" is a useful indicator of the…

  20. Clinical Significance: A Statistical Approach to Defining Meaningful Change in Psychotherapy Research.

    ERIC Educational Resources Information Center

    Jacobson, Neil S.; Truax, Paula

    1991-01-01

    Describes ways of operationalizing clinically significant change, defined as extent to which therapy moves someone outside range of dysfunctional population or within range of functional population. Uses examples to show how clients can be categorized on basis of this definition. Proposes reliable change index (RC) to determine whether magnitude…
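
    A minimal sketch of the reliable change index in its standard Jacobson-Truax form, with hypothetical scores, standard deviation, and reliability; the cutoff of 1.96 corresponds to a two-sided 5% criterion.

    import math

    # Jacobson-Truax reliable change index (RC); the numbers below are hypothetical.
    pretest, posttest = 34.0, 22.0    # a client's scores on a symptom measure
    sd_pre = 7.5                      # pretest standard deviation of the measure
    reliability = 0.88                # test-retest reliability of the measure

    sem = sd_pre * math.sqrt(1 - reliability)      # standard error of measurement
    se_diff = math.sqrt(2) * sem                   # standard error of the difference
    rc = (posttest - pretest) / se_diff

    verdict = "reliable change" if abs(rc) > 1.96 else "within measurement error"
    print(f"RC = {rc:.2f}: {verdict}")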

  1. A Proposed New "What if Reliability" Analysis for Assessing the Statistical Significance of Bivariate Relationships

    ERIC Educational Resources Information Center

    Onwuegbuzie, Anthony J.; Roberts, J. Kyle; Daniel, Larry G.

    2005-01-01

    In this article, the authors (a) illustrate how displaying disattenuated correlation coefficients alongside their unadjusted counterparts will allow researchers to assess the impact of unreliability on bivariate relationships and (b) demonstrate how a proposed new "what if reliability" analysis can complement null hypothesis significance tests of…

  2. On the statistical significance of excess events: Remarks of caution and the need for a standard method of calculation

    NASA Technical Reports Server (NTRS)

    Staubert, R.

    1985-01-01

    Methods for calculating the statistical significance of excess events and the interpretation of the formally derived values are discussed. It is argued that a simple formula for a conservative estimate should generally be used in order to provide a common understanding of quoted values.

  3. Key statistics related to CO2 emissions: Significant contributing countries

    SciTech Connect

    Kellogg, M.A.; Edmonds, J.A.; Scott, M.J.; Pomykala, J.S.

    1987-07-01

    This country selection task report describes and applies a methodology for identifying a set of countries responsible for significant present and anticipated future emissions of CO2 and other radiatively important gases (RIGs). The identification of countries responsible for CO2 and other RIG emissions will help determine to what extent a select number of countries might be capable of influencing future emissions. Once identified, those countries could potentially exercise cooperative collective control of global emissions and thus mitigate the associated adverse effects of those emissions. The methodology developed consists of two approaches: the resource approach and the emissions approach. While conceptually very different, both approaches yield the same fundamental conclusion. The core of any international initiative to control global emissions must include three key countries: the US, USSR, and the People's Republic of China. It was also determined that broader control can be achieved through the inclusion of sixteen additional countries with significant contributions to worldwide emissions.

  4. Detection and analysis of statistical differences in anatomical shape.

    PubMed

    Golland, Polina; Grimson, W Eric L; Shenton, Martha E; Kikinis, Ron

    2005-02-01

    We present a computational framework for image-based analysis and interpretation of statistical differences in anatomical shape between populations. Applications of such analysis include understanding developmental and anatomical aspects of disorders when comparing patients versus normal controls, studying morphological changes caused by aging, or even differences in normal anatomy, for example, differences between genders. Once a quantitative description of organ shape is extracted from input images, the problem of identifying differences between the two groups can be reduced to one of the classical questions in machine learning of constructing a classifier function for assigning new examples to one of the two groups while making as few misclassifications as possible. The resulting classifier must be interpreted in terms of shape differences between the two groups back in the image domain. We demonstrate a novel approach to such interpretation that allows us to argue about the identified shape differences in anatomically meaningful terms of organ deformation. Given a classifier function in the feature space, we derive a deformation that corresponds to the differences between the two classes while ignoring shape variability within each class. Based on this approach, we present a system for statistical shape analysis using distance transforms for shape representation and the support vector machines learning algorithm for the optimal classifier estimation and demonstrate it on artificially generated data sets, as well as real medical studies. PMID:15581813

  5. Statistical Fault Detection for Parallel Applications with AutomaDeD

    SciTech Connect

    Bronevetsky, G; Laguna, I; Bagchi, S; de Supinski, B R; Ahn, D; Schulz, M

    2010-03-23

    Today's largest systems have over 100,000 cores, with million-core systems expected over the next few years. The large component count means that these systems fail frequently and often in very complex ways, making them difficult to use and maintain. While prior work on fault detection and diagnosis has focused on faults that significantly reduce system functionality, the wide variety of failure modes in modern systems makes them likely to fail in complex ways that impair system performance but are difficult to detect and diagnose. This paper presents AutomaDeD, a statistical tool that models the timing behavior of each application task and tracks its behavior to identify any abnormalities. If any are observed, AutomaDeD can immediately detect them and report to the system administrator the task where the problem began. This identification of the fault's initial manifestation can provide administrators with valuable insight into the fault's root causes, making it significantly easier and cheaper for them to understand and repair it. Our experimental evaluation shows that AutomaDeD detects a wide range of faults immediately after they occur 80% of the time, with a low false-positive rate. Further, it identifies weaknesses of the current approach that motivate future research.

  6. The statistical significance of error probability as determined from decoding simulations for long codes

    NASA Technical Reports Server (NTRS)

    Massey, J. L.

    1976-01-01

    The very low error probability obtained with long error-correcting codes results in a very small number of observed errors in simulation studies of practical size and renders the usual confidence interval techniques inapplicable to the observed error probability. A natural extension of the notion of a 'confidence interval' is made and applied to such determinations of error probability by simulation. An example is included to show the surprisingly great significance of as few as two decoding errors in a very large number of decoding trials.
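
    One standard way to quantify this situation (not necessarily the extension proposed in the report) is an exact Clopper-Pearson interval for the error probability, sketched below with invented simulation counts; note how wide the interval remains when only two errors are observed.

    from scipy.stats import beta

    # Exact (Clopper-Pearson) confidence interval for an error probability when only
    # a handful of errors are observed in a long simulation. Illustrative numbers:
    n_trials = 10_000_000      # decoded blocks simulated
    n_errors = 2               # decoding errors observed
    conf = 0.95
    a = 1 - conf

    lower = beta.ppf(a / 2, n_errors, n_trials - n_errors + 1) if n_errors > 0 else 0.0
    upper = beta.ppf(1 - a / 2, n_errors + 1, n_trials - n_errors)

    print(f"point estimate {n_errors / n_trials:.2e}, "
          f"95% CI [{lower:.2e}, {upper:.2e}]")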

  7. Longitudinal change detection in diffusion MRI using multivariate statistical testing on tensors.

    PubMed

    Grigis, Antoine; Noblet, Vincent; Heitz, Fabrice; Blanc, Frédéric; de Sèze, Jérome; Kremer, Stéphane; Rumbach, Lucien; Armspach, Jean-Paul

    2012-05-01

    This paper presents a longitudinal change detection framework for detecting relevant modifications in diffusion MRI, with application to neuromyelitis optica (NMO) and multiple sclerosis (MS). The core problem is to identify image regions that are significantly different between two scans. The proposed method is based on multivariate statistical testing which was initially introduced for tensor population comparison. We use this method in the context of longitudinal change detection by considering several strategies to build sets of tensors characterizing the variability of each voxel. These strategies make use of the variability existing in the diffusion weighted images (thanks to a bootstrap procedure), or in the spatial neighborhood of the considered voxel, or a combination of both. Results on synthetic evolutions and on real data are presented. Interestingly, experiments on NMO patients highlight the ability of the proposed approach to detect changes in the normal-appearing white matter (according to conventional MRI) that are related with physical status outcome. Experiments on MS patients highlight the ability of the proposed approach to detect changes in evolving and non-evolving lesions (according to conventional MRI). These findings might open promising prospects for the follow-up of NMO and MS pathologies. PMID:22387171

  8. Statistics

    Cancer.gov

    Links to sources of cancer-related statistics, including the Surveillance, Epidemiology and End Results (SEER) Program, SEER-Medicare datasets, cancer survivor prevalence data, and the Cancer Trends Progress Report.

  9. The Importance of Integrating Clinical Relevance and Statistical Significance in the Assessment of Quality of Care –Illustrated Using the Swedish Stroke Register

    PubMed Central

    Lindmark, Anita; van Rompaye, Bart; Goetghebeur, Els; Glader, Eva-Lotta; Eriksson, Marie

    2016-01-01

    Background When profiling hospital performance, quality indicators are commonly evaluated through hospital-specific adjusted means with confidence intervals. When identifying deviations from a norm, large hospitals can have statistically significant results even for clinically irrelevant deviations while important deviations in small hospitals can remain undiscovered. We have used data from the Swedish Stroke Register (Riksstroke) to illustrate the properties of a benchmarking method that integrates considerations of both clinical relevance and level of statistical significance. Methods The performance measure used was case-mix adjusted risk of death or dependency in activities of daily living within 3 months after stroke. A hospital was labeled as having outlying performance if its case-mix adjusted risk exceeded a benchmark value with a specified statistical confidence level. The benchmark was expressed relative to the population risk and should reflect the clinically relevant deviation that is to be detected. A simulation study based on Riksstroke patient data from 2008–2009 was performed to investigate the effect of the choice of the statistical confidence level and benchmark value on the diagnostic properties of the method. Results Simulations were based on 18,309 patients in 76 hospitals. The widely used setting, comparing 95% confidence intervals to the national average, resulted in low sensitivity (0.252) and high specificity (0.991). There were large variations in sensitivity and specificity for different requirements of statistical confidence. Lowering statistical confidence improved sensitivity with a relatively smaller loss of specificity. Variations due to different benchmark values were smaller, especially for sensitivity. This allows the choice of a clinically relevant benchmark to be driven by clinical factors without major concerns about sufficiently reliable evidence. Conclusions The study emphasizes the importance of combining clinical relevance

  10. Ultrabroadband direct detection of nonclassical photon statistics at telecom wavelength.

    PubMed

    Wakui, Kentaro; Eto, Yujiro; Benichi, Hugo; Izumi, Shuro; Yanagida, Tetsufumi; Ema, Kazuhiro; Numata, Takayuki; Fukuda, Daiji; Takeoka, Masahiro; Sasaki, Masahide

    2014-01-01

    Broadband light sources play essential roles in diverse fields, such as high-capacity optical communications, optical coherence tomography, optical spectroscopy, and spectrograph calibration. Although a nonclassical state from spontaneous parametric down-conversion may serve as a quantum counterpart, its detection and characterization have been a challenging task. Here we demonstrate the direct detection of photon numbers of an ultrabroadband (110 nm FWHM) squeezed state in the telecom band centred at 1535 nm wavelength, using a superconducting transition-edge sensor. The observed photon-number distributions violate Klyshko's criterion for the nonclassicality. From the observed photon-number distribution, we evaluate the second- and third-order correlation functions, and characterize a multimode structure, which implies that several tens of orthonormal modes of squeezing exist in the single optical pulse. Our results and techniques open up a new possibility to generate and characterize frequency-multiplexed nonclassical light sources for quantum info-communications technology. PMID:24694515
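
    For reference, the zero-delay correlation functions mentioned in the abstract can be computed from a measured photon-number distribution as normalized factorial moments; the distribution below is invented for illustration and is not the measured data.

    import numpy as np

    # Hypothetical photon-number distribution p(n) for n = 0..5, normalized to 1
    p = np.array([0.55, 0.20, 0.13, 0.07, 0.03, 0.02])
    n = np.arange(len(p))

    mean_n = np.sum(n * p)
    g2 = np.sum(n * (n - 1) * p) / mean_n**2             # second-order correlation g(2)(0)
    g3 = np.sum(n * (n - 1) * (n - 2) * p) / mean_n**3   # third-order correlation g(3)(0)

    print(f"<n> = {mean_n:.3f}, g2 = {g2:.3f}, g3 = {g3:.3f}")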

  11. Ultrabroadband direct detection of nonclassical photon statistics at telecom wavelength

    PubMed Central

    Wakui, Kentaro; Eto, Yujiro; Benichi, Hugo; Izumi, Shuro; Yanagida, Tetsufumi; Ema, Kazuhiro; Numata, Takayuki; Fukuda, Daiji; Takeoka, Masahiro; Sasaki, Masahide

    2014-01-01

    Broadband light sources play essential roles in diverse fields, such as high-capacity optical communications, optical coherence tomography, optical spectroscopy, and spectrograph calibration. Although a nonclassical state from spontaneous parametric down-conversion may serve as a quantum counterpart, its detection and characterization have been a challenging task. Here we demonstrate the direct detection of photon numbers of an ultrabroadband (110 nm FWHM) squeezed state in the telecom band centred at 1535 nm wavelength, using a superconducting transition-edge sensor. The observed photon-number distributions violate Klyshko's criterion for the nonclassicality. From the observed photon-number distribution, we evaluate the second- and third-order correlation functions, and characterize a multimode structure, which implies that several tens of orthonormal modes of squeezing exist in the single optical pulse. Our results and techniques open up a new possibility to generate and characterize frequency-multiplexed nonclassical light sources for quantum info-communications technology. PMID:24694515

  12. Tri-mean-based statistical differential gene expression detection.

    PubMed

    Ji, Zhaohua; Wu, Chunguo; Wang, Yao; Guan, Renchu; Tu, Huawei; Wu, Xiaozhou; Liang, Yanchun

    2012-01-01

    Based on the assumption that only a subset of the disease group has differential gene expression, traditional detection of differentially expressed genes operates under the constraint that cancer genes are up- or down-regulated in all disease samples compared with normal samples. However, in 2005, Tomlins discussed the situation in which only a subset of disease samples would be activated; such samples are often referred to as outliers. PMID:23155761

  13. Statistical detection of the mid-Pleistocene transition

    SciTech Connect

    Maasch, K.A. )

    1988-01-01

    Statistical methods have been used to show quantitatively that the transition in mean and variance observed in delta O-18 records during the middle of the Pleistocene was abrupt. Applying these methods to all of the available records spanning the entire Pleistocene indicates that this jump was global and primarily represents an increase in ice mass. At roughly the same time an abrupt decrease in sea surface temperature also occurred, indicative of sudden global cooling. This kind of evidence suggests a possible bifurcation of the climate system that must be accounted for in a complete explanation of the ice ages. Theoretical models including internal dynamics are capable of exhibiting this kind of rapid transition. 50 refs.
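
    A toy illustration of detecting an abrupt transition in the mean of a proxy record (a much simpler scheme than the statistical methods cited): scan candidate split points and keep the one with the largest two-sample t statistic. The synthetic record below is invented.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)

    # Hypothetical proxy record with an abrupt shift in mean and variance partway through
    x = np.concatenate([rng.normal(3.0, 0.3, 60), rng.normal(4.2, 0.5, 60)])

    # Scan candidate change points; at each split compute Welch's t statistic
    candidates = range(10, len(x) - 10)
    t_vals = [abs(stats.ttest_ind(x[:k], x[k:], equal_var=False).statistic) for k in candidates]
    k_hat = list(candidates)[int(np.argmax(t_vals))]

    print(f"estimated change point at index {k_hat}, |t| = {max(t_vals):.1f}")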

  14. A high-order statistical tensor based algorithm for anomaly detection in hyperspectral imagery.

    PubMed

    Geng, Xiurui; Sun, Kang; Ji, Luyan; Zhao, Yongchao

    2014-01-01

    Recently, high-order statistics have received more and more interest in the field of hyperspectral anomaly detection. However, most of the existing high-order statistics based anomaly detection methods require stepwise iterations, since they are direct applications of blind source separation. Moreover, these methods usually produce multiple detection maps rather than a single anomaly distribution image. In this study, we exploit the concept of the coskewness tensor and propose a new anomaly detection method, called COSD (coskewness detector). COSD does not need iteration and can produce a single detection map. The experiments based on both simulated and real hyperspectral data sets verify the effectiveness of our algorithm. PMID:25366706

  15. Myths and Misconceptions Revisited - What are the (Statistically Significant) methods to prevent employee injuries

    SciTech Connect

    Potts, T.T.; Hylko, J.M.; Almond, D.

    2007-07-01

    A company's overall safety program becomes an important consideration to continue performing work and for procuring future contract awards. When injuries or accidents occur, the employer ultimately loses on two counts - increased medical costs and employee absences. This paper summarizes the human and organizational components that contributed to successful safety programs implemented by WESKEM, LLC's Environmental, Safety, and Health Departments located in Paducah, Kentucky, and Oak Ridge, Tennessee. The philosophy of 'safety, compliance, and then production' and programmatic components implemented at the start of the contracts were qualitatively identified as contributing factors resulting in a significant accumulation of safe work hours and an Experience Modification Rate (EMR) of <1.0. Furthermore, a study by the Associated General Contractors of America quantitatively validated components, already found in the WESKEM, LLC programs, as contributing factors to prevent employee accidents and injuries. Therefore, an investment in the human and organizational components now can pay dividends later by reducing the EMR, which is the key to reducing Workers' Compensation premiums. Also, knowing your employees' demographics and taking an active approach to evaluate and prevent fatigue may help employees balance work and non-work responsibilities. In turn, this approach can assist employers in maintaining a healthy and productive workforce. For these reasons, it is essential that safety needs be considered as the starting point when performing work. (authors)

  16. Tables of square-law signal detection statistics for Hann spectra with 50 percent overlap

    NASA Technical Reports Server (NTRS)

    Deans, Stanley R.; Cullers, D. Kent

    1991-01-01

    The Search for Extraterrestrial Intelligence, currently being planned by NASA, will require that an enormous amount of data be analyzed in real time by special purpose hardware. It is expected that overlapped Hann data windows will play an important role in this analysis. In order to understand the statistical implication of this approach, it has been necessary to compute detection statistics for overlapped Hann spectra. Tables of signal detection statistics are given for false alarm rates from 10^-14 to 10^-1 and signal detection probabilities from 0.50 to 0.99; the number of computed spectra ranges from 4 to 2000.

  17. Detection and implication of significant temporal b-value variation during earthquake sequences

    NASA Astrophysics Data System (ADS)

    Gulia, Laura; Tormann, Thessa; Schorlemmer, Danijel; Wiemer, Stefan

    2016-04-01

    Earthquakes tend to cluster in space and time and periods of increased seismic activity are also periods of increased seismic hazard. Forecasting models currently used in statistical seismology and in Operational Earthquake Forecasting (e.g. ETAS) consider the spatial and temporal changes in the activity rates whilst the spatio-temporal changes in the earthquake size distribution, the b-value, are not included. Laboratory experiments on rock samples show an increasing relative proportion of larger events as the system approaches failure, and a sudden reversal of this trend after the main event. The increasing fraction of larger events during the stress increase period can be mathematically represented by a systematic b-value decrease, while the b-value increases immediately following the stress release. We investigate whether these lab-scale observations also apply to natural earthquake sequences and can help to improve our understanding of the physical processes generating damaging earthquakes. A number of large events nucleated in low b-value regions and spatial b-value variations have been extensively documented in the past. Detecting temporal b-value evolution with confidence is more difficult, one reason being the very different scales that have been suggested for a precursory drop in b-value, from a few days to decadal scale gradients. We demonstrate with the results of detailed case studies of the 2009 M6.3 L'Aquila and 2011 M9 Tohoku earthquakes that significant and meaningful temporal b-value variability can be detected throughout the sequences, which e.g. suggests that foreshock probabilities are not generic but subject to significant spatio-temporal variability. Such potential conclusions require and motivate the systematic study of many sequences to investigate whether general patterns exist that might eventually be useful for time-dependent or even real-time seismic hazard assessment.
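
    A minimal sketch of the kind of temporal b-value tracking described here is given below, using Aki's maximum-likelihood estimator in a moving event window. The completeness magnitude Mc, bin width, window length, and minimum event count are assumptions, not values from the study.

```python
# Moving-window b-value estimation with Aki's maximum-likelihood formula.
import numpy as np

def b_value(magnitudes, mc, dm=0.1):
    m = np.asarray(magnitudes, dtype=float)
    m = m[m >= mc]                              # keep only complete part of the catalog
    if m.size < 50:                             # too few events for a stable estimate
        return np.nan
    return np.log10(np.e) / (m.mean() - (mc - dm / 2.0))

def moving_b_value(times, magnitudes, mc, window=250, step=25):
    order = np.argsort(times)
    t, m = np.asarray(times)[order], np.asarray(magnitudes)[order]
    centers, bvals = [], []
    for start in range(0, len(m) - window + 1, step):
        sl = slice(start, start + window)
        centers.append(t[sl].mean())            # window center time
        bvals.append(b_value(m[sl], mc))        # b-value of the window
    return np.array(centers), np.array(bvals)
```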

  18. Detecting Maternal-Effect Loci by Statistical Cross-Fostering

    PubMed Central

    Wolf, Jason; Cheverud, James M.

    2012-01-01

    Great progress has been made in understanding the genetic architecture of phenotypic variation, but it is almost entirely focused on how the genotype of an individual affects the phenotype of that same individual. However, in many species the genotype of the mother is a major determinant of the phenotype of her offspring. Therefore, a complete picture of genetic architecture must include these maternal genetic effects, but they can be difficult to identify because maternal and offspring genotypes are correlated and therefore, partially confounded. We present a conceptual framework that overcomes this challenge to separate direct and maternal effects in intact families through an analysis that we call “statistical cross-fostering.” Our approach combines genotype data from mothers and their offspring to remove the confounding effects of the offspring’s own genotype on measures of maternal genetic effects. We formalize our approach in an orthogonal model and apply this model to an experimental population of mice. We identify a set of six maternal genetic effect loci that explain a substantial portion of variation in body size at all ages. This variation would be missed in an approach focused solely on direct genetic effects, but is clearly a major component of genetic architecture. Our approach can easily be adapted to examine maternal effects in different systems, and because it does not require experimental manipulation, it provides a framework that can be used to understand the contribution of maternal genetic effects in both natural and experimental populations. PMID:22377636

  19. Statistically qualified neuro-analytic failure detection method and system

    DOEpatents

    Vilim, Richard B.; Garcia, Humberto E.; Chen, Frederick W.

    2002-03-02

    An apparatus and method for monitoring a process involve development and application of a statistically qualified neuro-analytic (SQNA) model to accurately and reliably identify process change. The development of the SQNA model is accomplished in two stages: deterministic model adaptation and stochastic model modification of the deterministic model adaptation. Deterministic model adaptation involves formulating an analytic model of the process representing known process characteristics, augmenting the analytic model with a neural network that captures unknown process characteristics, and training the resulting neuro-analytic model by adjusting the neural network weights according to a unique scaled equation error minimization technique. Stochastic model modification involves qualifying any remaining uncertainty in the trained neuro-analytic model by formulating a likelihood function, given an error propagation equation, for computing the probability that the neuro-analytic model generates measured process output. Preferably, the developed SQNA model is validated using known sequential probability ratio tests and applied to the process as an on-line monitoring system. Illustrative of the method and apparatus, the method is applied to a peristaltic pump system.

  20. "APEC Blue" association with emission control and meteorological conditions detected by multi-scale statistics

    NASA Astrophysics Data System (ADS)

    Wang, Ping; Dai, Xin-Gang

    2016-09-01

    The term "APEC Blue" has been created to describe the clear sky days since the Asia-Pacific Economic Cooperation (APEC) summit held in Beijing during November 5-11, 2014. The duration of the APEC Blue is detected from November 1 to November 14 (hereafter Blue Window) by a moving t test in statistics. Observations show that APEC Blue corresponds to low air pollution with respect to PM2.5, PM10, SO2, and NO2 under strict emission-control measures (ECMs) implemented in Beijing and surrounding areas. Quantitative assessment shows that the ECMs were more effective at reducing aerosols than the chemical constituents. Statistical investigation reveals that the window also resulted from intensified wind variability as well as weakened static stability of the atmosphere (SSA). Wind and the ECMs played key roles in reducing air pollution during November 1-7 and 11-13, while strict ECMs and weak SSA became dominant during November 7-10 under a weak-wind environment. Moving correlation shows that emission reduction for aerosols can increase the apparent wind cleanup effect, leading to significant negative correlations between them, and the period-wise changes in emission rate can be identified by multi-scale correlations based on wavelet decomposition. In short, this case study demonstrates statistically how human interference modified air quality in the megacity through control of local and surrounding emissions in association with meteorological conditions.
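
    The moving t-test used to bracket the "Blue Window" can be sketched as below: at each candidate day, the sample before and the sample after are compared with a two-sample t-test. The series, window length, and significance level are placeholders, and the exact test variant used in the study may differ.

```python
# Moving t-test change-point sketch for a daily pollution series.
import numpy as np
from scipy.stats import ttest_ind

def moving_t_test(series, half_window=7, alpha=0.05):
    series = np.asarray(series, dtype=float)
    change_points = []
    for i in range(half_window, len(series) - half_window):
        before = series[i - half_window:i]      # sample before the candidate day
        after = series[i:i + half_window]       # sample after the candidate day
        t, p = ttest_ind(before, after, equal_var=False)
        if p < alpha:
            change_points.append((i, float(t), float(p)))
    return change_points
```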

  1. Spatial-Temporal Change Detection in NDVI Data Through Statistical Parametric Mapping

    NASA Astrophysics Data System (ADS)

    McKenna, S. A.; Yadav, V.; Gutierrez, K.

    2011-12-01

    Detection of significant changes in vegetation patterns provides a quantitative means of defining phenological response to changing climate. These changes may be indicative of long-term trends or shorter-duration conditions. In either case, quantifying the significance of the change patterns is critical in order to better understand the underlying processes. Spatial and temporal correlation within imaged data sets complicates change detection and must be taken into account. We apply a novel approach, Statistical Parametric Mapping (SPM), to change detection in Normalized Difference Vegetation Index (NDVI) data. SPM has been developed for identification of regions of anomalous activation in human brain imaging given functional magnetic resonance imaging (fMRI) data. Here, we adapt SPM to work on identifying anomalous regions of vegetation density within 30 years of weekly NDVI imagery. Significant change in any given image pixel is defined as a deviation from the expected value. Expected values are calculated using sinusoidal regression models fit to previous data at that location. The amount of deviation of an observation from the expected value is calculated using a modified t-test that accounts for temporal correlation in the regression data. The t-tests are applied independently to each pixel to create a t-statistic map for every time step. For a given time step, the probability that the maximum t-value exceeds a given threshold can be calculated to determine times with significant deviations, but standard techniques are not applicable due to the large number of pixels searched to find the maximum. SPM takes into account the spatial correlation of the t-statistic map to determine the significance of the maximum observed t-value. Theory developed for truncated Gaussian fields as part of SPM provides the expected number and size of regions within the t-statistic map that exceed a given threshold. The significance of the excursion regions can be assessed and then
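
    The per-pixel step described above can be sketched as follows: a sinusoidal (annual-harmonic) regression is fit to a pixel's NDVI history, and a new observation is converted to a t-like deviation score. The modified t-test correction for temporal autocorrelation and the random-field (SPM) significance assessment of the maximum statistic are not reproduced in this sketch.

```python
# Per-pixel sinusoidal regression and deviation score for weekly NDVI.
import numpy as np

def sinusoid_design(t, period=52.0):
    # Weekly observations -> an annual period of roughly 52 samples.
    w = 2.0 * np.pi * np.asarray(t, dtype=float) / period
    return np.column_stack([np.ones_like(w), np.cos(w), np.sin(w)])

def deviation_score(t_hist, y_hist, t_new, y_new, period=52.0):
    X = sinusoid_design(t_hist, period)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y_hist, dtype=float), rcond=None)
    resid = y_hist - X @ beta
    s = resid.std(ddof=X.shape[1])               # residual standard deviation
    y_pred = sinusoid_design([t_new], period) @ beta
    return float((y_new - y_pred[0]) / s)        # t-like deviation of the new observation
```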

  2. Fast and accurate border detection in dermoscopy images using statistical region merging

    NASA Astrophysics Data System (ADS)

    Celebi, M. Emre; Kingravi, Hassan A.; Iyatomi, Hitoshi; Lee, JeongKyu; Aslandogan, Y. Alp; Van Stoecker, William; Moss, Randy; Malters, Joseph M.; Marghoob, Ashfaq A.

    2007-03-01

    As a result of advances in skin imaging technology and the development of suitable image processing techniques during the last decade, there has been a significant increase of interest in the computer-aided diagnosis of melanoma. Automated border detection is one of the most important steps in this procedure, since the accuracy of the subsequent steps crucially depends on it. In this paper, a fast and unsupervised approach to border detection in dermoscopy images of pigmented skin lesions based on the Statistical Region Merging algorithm is presented. The method is tested on a set of 90 dermoscopy images. The border detection error is quantified by a metric in which a set of dermatologist-determined borders is used as the ground-truth. The proposed method is compared to six state-of-the-art automated methods (optimized histogram thresholding, orientation-sensitive fuzzy c-means, gradient vector flow snakes, dermatologist-like tumor extraction algorithm, meanshift clustering, and the modified JSEG method) and borders determined by a second dermatologist. The results demonstrate that the presented method achieves both fast and accurate border detection in dermoscopy images.

  3. The Suzaku View of Highly Ionized Outflows in AGN. 1; Statistical Detection and Global Absorber Properties

    NASA Technical Reports Server (NTRS)

    Gofford, Jason; Reeves, James N.; Tombesi, Francesco; Braito, Valentina; Turner, T. Jane; Miller, Lance; Cappi, Massimo

    2013-01-01

    We present the results of a new spectroscopic study of Fe K-band absorption in active galactic nuclei (AGN). Using data obtained from the Suzaku public archive we have performed a statistically driven blind search for Fe XXV He-alpha and/or Fe XXVI Ly-alpha absorption lines in a large sample of 51 Type 1.0-1.9 AGN. Through extensive Monte Carlo simulations we find that statistically significant absorption is detected at E greater than or approximately equal to 6.7 keV in 20/51 sources at the P_MC greater than or equal to 95 per cent level, which corresponds to approximately 40 per cent of the total sample. In all cases, individual absorption lines are detected independently and simultaneously amongst the two (or three) available X-ray imaging spectrometer detectors, which confirms the robustness of the line detections. The most frequently observed outflow phenomenology consists of two discrete absorption troughs corresponding to Fe XXV He-alpha and Fe XXVI Ly-alpha at a common velocity shift. From xstar fitting, the mean column density and ionization parameter for the Fe K absorption components are log (N_H / cm^-2) approximately equal to 23 and log (xi / erg cm s^-1) approximately equal to 4.5, respectively. Measured outflow velocities span a continuous range from less than 1500 kilometers per second up to approximately 100,000 kilometers per second, with mean and median values of approximately 0.1 c and approximately 0.056 c, respectively. The results of this work are consistent with those recently obtained using XMM-Newton and independently provide strong evidence for the existence of very highly ionized circumnuclear material in a significant fraction of both radio-quiet and radio-loud AGN in the local universe.

  4. Parameter-space correlations of the optimal statistic for continuous gravitational-wave detection

    SciTech Connect

    Pletsch, Holger J.

    2008-11-15

    The phase parameters of matched-filtering searches for continuous gravitational-wave signals are sky position, frequency, and frequency time-derivatives. The space of these parameters features strong global correlations in the optimal detection statistic. For observation times smaller than 1 yr, the orbital motion of the Earth leads to a family of global-correlation equations which describes the 'global maximum structure' of the detection statistic. The solution to each of these equations is a different hypersurface in parameter space. The expected detection statistic is maximal at the intersection of these hypersurfaces. The global maximum structure of the detection statistic from stationary instrumental-noise artifacts is also described by the global-correlation equations. This permits the construction of a veto method which excludes false candidate events.

  5. Anomaly detection based on the statistics of hyperspectral imagery

    NASA Astrophysics Data System (ADS)

    Catterall, Stephen P.

    2004-10-01

    The purpose of this paper is to introduce a new anomaly detection algorithm for application to hyperspectral imaging (HSI) data. The algorithm uses characterisations of the joint (among wavebands) probability density function (pdf) of HSI data. Traditionally, the pdf has been assumed to be multivariate Gaussian or a mixture of multivariate Gaussians. Other distributions have been considered by previous authors, in particular Elliptically Contoured Distributions (ECDs). In this paper we focus on another distribution, which has only recently been defined and studied. This distribution has a more flexible and extensive set of parameters than the multivariate Gaussian does, yet the pdf takes on a relatively simple mathematical form. The result of all this is a model for the pdf of a hyperspectral image, consisting of a mixture of these distributions. Once a model for the pdf of a hyperspectral image has been obtained, it can be incorporated into an anomaly detector. The new anomaly detector is implemented and applied to some medium wave infra-red (MWIR) hyperspectral imagery. Comparison is made with a well-known anomaly detector, and it will be seen that the results are promising.
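
    For orientation, the sketch below implements the classical global multivariate-Gaussian (RX-style) anomaly detector that this record departs from; the more flexible heavy-tailed mixture distribution proposed in the paper is not reproduced. The cube shape and variable names are assumptions.

```python
# Global multivariate-Gaussian (RX-style) anomaly scores for a hyperspectral cube.
import numpy as np

def rx_anomaly_scores(cube):
    rows, cols, bands = cube.shape
    X = cube.reshape(-1, bands).astype(float)
    mu = X.mean(axis=0)                          # global background mean spectrum
    cov = np.cov(X, rowvar=False)                # global background covariance
    cov_inv = np.linalg.pinv(cov)                # pseudo-inverse guards against rank deficiency
    diff = X - mu
    # Squared Mahalanobis distance of every pixel spectrum from the background.
    scores = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return scores.reshape(rows, cols)
```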

  6. An Unsigned Mantel-Haenszel Statistic for Detecting Uniform and Nonuniform DIF.

    ERIC Educational Resources Information Center

    Kwak, Nohoon; And Others

    This paper introduces a new method for detecting differential item functioning (DIF), the unsigned Mantel-Haenszel (UMH) statistic, and compares this method with two other chi-square methods, the Mantel-Haenszel (MH) and the absolute mean deviation (AMD) statistics, in terms of power and agreement between expected and actual false positive rates.…

  7. A Statistical Analysis of Automated and Manually Detected Fires Using Environmental Satellites

    NASA Astrophysics Data System (ADS)

    Ruminski, M. G.; McNamara, D.

    2003-12-01

    The National Environmental Satellite and Data Information Service (NESDIS) of the National Oceanic and Atmospheric Administration (NOAA) has been producing an analysis of fires and smoke over the US since 1998. This product underwent significant enhancement in June 2002 with the introduction of the Hazard Mapping System (HMS), an interactive workstation based system that displays environmental satellite imagery (NOAA Geostationary Operational Environmental Satellite (GOES), NOAA Polar Operational Environmental Satellite (POES) and National Aeronautics and Space Administration (NASA) MODIS data) and fire detects from the automated algorithms for each of the satellite sensors. The focus of this presentation is to present statistics compiled on the fire detects since November 2002. The Automated Biomass Burning Algorithm (ABBA) detects fires using GOES East and GOES West imagery. The Fire Identification, Mapping and Monitoring Algorithm (FIMMA) utilizes NOAA POES 15/16/17 imagery and the MODIS algorithm uses imagery from the MODIS instrument on the Terra and Aqua spacecraft. The HMS allows satellite analysts to inspect and interrogate the automated fire detects and the input satellite imagery. The analyst can then delete those detects that are felt to be false alarms and/or add fire points that the automated algorithms have not selected. Statistics are compiled for the number of automated detects from each of the algorithms, the number of automated detects that are deleted and the number of fire points added by the analyst for the contiguous US and immediately adjacent areas of Mexico and Canada. There is no attempt to distinguish between wildfires and control or agricultural fires. A detailed explanation of the automated algorithms is beyond the scope of this presentation. However, interested readers can find a more thorough description by going to www.ssd.noaa.gov/PS/FIRE/hms.html and scrolling down to Individual Fire Layers. For the period November 2002 thru August

  8. Fusion of Local Statistical Parameters for Buried Underwater Mine Detection in Sonar Imaging

    NASA Astrophysics Data System (ADS)

    Maussang, F.; Rombaut, M.; Chanussot, J.; Hétet, A.; Amate, M.

    2008-12-01

    Detection of buried underwater objects, and especially mines, is a crucial strategic task. Images provided by sonar systems able to penetrate the sea floor, such as synthetic aperture sonar (SAS), are of great interest for the detection and classification of such objects. However, the signal-to-noise ratio is fairly low, and advanced information processing is required for correct and reliable detection of the echoes generated by the objects. The detection method proposed in this paper is based on a data-fusion architecture using belief theory. The inputs to this architecture are local statistical characteristics extracted from SAS data, corresponding to the first-, second-, third-, and fourth-order statistical properties of the sonar images. The relevance of these parameters is derived from a statistical model of the sonar data. Numerical criteria are also proposed to estimate the detection performance and to validate the method.
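
    The feature-extraction stage described above can be sketched as computing the first- to fourth-order statistics (mean, variance, skewness, kurtosis) over sliding windows of the sonar image; the belief-theory fusion stage is not reproduced, and the window size is an assumption.

```python
# Local first- to fourth-order statistics over sliding windows of an image.
import numpy as np
from scipy.stats import skew, kurtosis

def local_statistics(image, win=9):
    half = win // 2
    padded = np.pad(image.astype(float), half, mode="reflect")
    rows, cols = image.shape
    feats = np.zeros((rows, cols, 4))
    for i in range(rows):
        for j in range(cols):
            patch = padded[i:i + win, j:j + win].ravel()
            # mean, variance, skewness, (excess) kurtosis of the local window
            feats[i, j] = (patch.mean(), patch.var(), skew(patch), kurtosis(patch))
    return feats
```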

  9. Prognostic significance of prospectively detected bone marrow micrometastases in esophagogastric cancer: 10-year follow-up confirms prognostic significance.

    PubMed

    Ryan, Paul; Furlong, Heidi; Murphy, Conleth G; O'Sullivan, Finbarr; Walsh, Thomas N; Shanahan, Fergus; O'Sullivan, Gerald C

    2015-08-01

    We have previously reported that most patients with esophagogastric cancer (EGC) undergoing potentially curative resections have bone marrow micrometastases (BMM). We present 10-year outcome data of patients with EGC whose rib marrow was examined for micrometastases and correlate the findings with treatment and conventional pathologic tumor staging. A total of 88 patients with localized esophagogastric tumors had radical en-bloc esophagectomy, with 47 patients receiving neoadjuvant (5-fluorouracil/cisplatin based) chemoradiotherapy (CRT) and the remainder being treated with surgery alone. Rib marrow was examined for cytokeratin-18-positive cells. Standard demographic and pathologic features were recorded and patients were followed for a mean 10.04 years. Disease recurrences and all deaths in the follow-up period were recorded. No patients were lost to follow-up. 46 EGC-related and 10 non-EGC-related deaths occurred. Multivariate Cox analysis of the interaction of neoadjuvant chemotherapy, nodal status, and BMM positivity showed that the contribution of BMM to disease-specific and overall survival is significant (P = 0.014). There is significant interaction with neoadjuvant CRT (P < 0.005) and lymph node positivity (P < 0.001), but BMM positivity contributes to an increased risk of cancer-related death in patients treated with either CRT or surgery alone. Bone marrow micrometastases detected at the time of surgery for EGC are a long-term prognostic marker. Detection is a readily available, technically noncomplex test which offers a window on the metastatic process and a refinement of pathologic staging and is worthy of routine consideration. PMID:25914238

  10. Recovery of gastrointestinal tract motility detection using Naive Bayesian and minimum statistics.

    PubMed

    Ulusar, Umit D

    2014-08-01

    Loss of gastrointestinal motility is a significant medical setback for patients who experience abdominal surgery and contributes to the most common reason for prolonged hospital stays. Recent clinical studies suggest that initiating feeding early after abdominal surgery is beneficial. Early feeding is possible when the patients demonstrate bowel motility in the form of bowel sounds (BS). This work provides a data collection, processing and analysis methodology for detection of recovery of gastrointestinal tract motility by observing BSs in auscultation recordings. The approach is suitable for real-time long-term continuous monitoring in clinical environments. The system was developed using a Naive Bayesian algorithm for pattern classification, and Minimum Statistics and spectral subtraction for noise attenuation. The solution was tested on 59 h of recordings, and 94.15% recognition accuracy was observed. PMID:24971526

  11. Statistical framework for detection of genetically modified organisms based on Next Generation Sequencing.

    PubMed

    Willems, Sander; Fraiture, Marie-Alice; Deforce, Dieter; De Keersmaecker, Sigrid C J; De Loose, Marc; Ruttink, Tom; Herman, Philippe; Van Nieuwerburgh, Filip; Roosens, Nancy

    2016-02-01

    Because the number and diversity of genetically modified (GM) crops has significantly increased, their analysis based on real-time PCR (qPCR) methods is becoming increasingly complex and laborious. While several pioneers already investigated Next Generation Sequencing (NGS) as an alternative to qPCR, its practical use has not been assessed for routine analysis. In this study a statistical framework was developed to predict the number of NGS reads needed to detect transgene sequences, to prove their integration into the host genome and to identify the specific transgene event in a sample with known composition. This framework was validated by applying it to experimental data from food matrices composed of pure GM rice, processed GM rice (noodles) or a 10% GM/non-GM rice mixture, revealing some influential factors. Finally, feasibility of NGS for routine analysis of GM crops was investigated by applying the framework to samples commonly encountered in routine analysis of GM crops. PMID:26304412
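
    The kind of calculation such a framework formalizes can be sketched as follows: given an assumed fraction of reads originating from a transgene element, compute the probability of seeing at least k supporting reads among N total reads, and search for the smallest N that reaches a target probability. The binomial model, the fraction, k, and the 0.99 target are illustrative assumptions, not the paper's actual framework.

```python
# Reads-needed sketch under a simple binomial sampling model.
from scipy.stats import binom

def prob_detect(n_reads, target_fraction, min_reads=1):
    # P(at least min_reads of n_reads hit the target element).
    return float(binom.sf(min_reads - 1, n_reads, target_fraction))

def reads_needed(target_fraction, min_reads=1, prob=0.99, n_max=10**9):
    lo, hi = 1, n_max
    while lo < hi:                               # binary search on a monotone function of n
        mid = (lo + hi) // 2
        if prob_detect(mid, target_fraction, min_reads) >= prob:
            hi = mid
        else:
            lo = mid + 1
    return lo

print(reads_needed(target_fraction=1e-6, min_reads=10))
```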

  12. A Fast Framework for Abrupt Change Detection Based on Binary Search Trees and Kolmogorov Statistic

    PubMed Central

    Qi, Jin-Peng; Qi, Jie; Zhang, Qing

    2016-01-01

    Change-Point (CP) detection has attracted considerable attention in the fields of data mining and statistics; it is very meaningful to discuss how to quickly and efficiently detect abrupt change from large-scale bioelectric signals. Currently, most of the existing methods, like Kolmogorov-Smirnov (KS) statistic and so forth, are time-consuming, especially for large-scale datasets. In this paper, we propose a fast framework for abrupt change detection based on binary search trees (BSTs) and a modified KS statistic, named BSTKS (binary search trees and Kolmogorov statistic). In this method, first, two binary search trees, termed as BSTcA and BSTcD, are constructed by multilevel Haar Wavelet Transform (HWT); second, three search criteria are introduced in terms of the statistic and variance fluctuations in the diagnosed time series; last, an optimal search path is detected from the root to leaf nodes of two BSTs. The studies on both the synthetic time series samples and the real electroencephalograph (EEG) recordings indicate that the proposed BSTKS can detect abrupt change more quickly and efficiently than KS, t-statistic (t), and Singular-Spectrum Analyses (SSA) methods, with the shortest computation time, the highest hit rate, the smallest error, and the highest accuracy out of four methods. This study suggests that the proposed BSTKS is very helpful for useful information inspection on all kinds of bioelectric time series signals. PMID:27413364
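
    As a point of reference for the speed-up claimed here, the sketch below shows the brute-force baseline: a plain two-sample Kolmogorov-Smirnov statistic evaluated at every candidate split point of a segment, with the largest KS distance flagged as the candidate change point. This exhaustive scan is exactly the cost that the wavelet/BST acceleration in BSTKS is designed to avoid; the tree machinery itself is not reproduced.

```python
# Brute-force KS change-point baseline on a single signal segment.
import numpy as np
from scipy.stats import ks_2samp

def ks_change_point(segment, min_size=30):
    segment = np.asarray(segment, dtype=float)
    best = (None, 0.0, 1.0)                      # (index, KS statistic, p-value)
    for i in range(min_size, len(segment) - min_size):
        stat, p = ks_2samp(segment[:i], segment[i:])
        if stat > best[1]:
            best = (i, float(stat), float(p))
    return best
```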

  14. Statistical models for LWIR hyperspectral backgrounds and their applications in chemical agent detection

    NASA Astrophysics Data System (ADS)

    Manolakis, D.; Jairam, L. G.; Zhang, D.; Rossacci, M.

    2007-04-01

    Remote detection of chemical vapors in the atmosphere has a wide range of civilian and military applications. In the past few years there has been significant interest in the detection of effluent plumes using hyperspectral imaging spectroscopy in the 8-13 μm atmospheric window. A major obstacle in the full exploitation of this technology is the fact that everything in the infrared is a source of radiation. As a result, the emission from the gases of interest is always mixed with emission by the more abundant atmospheric constituents and by other objects in the sensor field of view. The radiance fluctuations in this background emission constitute an additional source of interference which is much stronger than the detector noise. In this paper we develop and evaluate parametric models for the statistical characterization of LWIR hyperspectral backgrounds. We consider models based on the theory of elliptically contoured distributions. Both models can handle heavy tails, which is a key statistical feature of hyperspectral imaging backgrounds. The paper provides a concise description of the underlying models, the algorithms used to estimate their parameters from the background spectral measurements, and the use of the developed models in the design and evaluation of chemical warfare agent detection algorithms.

  15. Inferential statistics for transient signal detection in radio astronomy phased arrays

    NASA Astrophysics Data System (ADS)

    Schmid, Natalia A.; Prestage, Richard M.; Alkhweldi, Marwan

    2015-05-01

    In this paper we develop two statistical rules for the purpose of detecting pulsars and transients using signals from phased array feeds installed on a radio telescope in place of a traditional horn receiver. We assume a known response of the antenna arrays and known coupling among array elements. We briefly summarize a set of pre-processing steps applied to raw array data prior to signal detection and then derive two detection statistics assuming two models for the unknown radio source astronomical signal: (1) the signal is deterministic and (2) the signal is a random process. The performance of both detectors is analyzed using both real and simulated data.
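
    The two signal models map onto two classical statistics, sketched below: a matched-filter statistic when the astronomical signal is treated as a known deterministic template, and an energy statistic when it is treated as a random process. The array pre-processing and beamforming steps described in the record are not reproduced, and white noise of known variance is assumed.

```python
# Matched-filter versus energy detection statistics for a single data vector.
import numpy as np

def matched_filter_statistic(x, template, noise_var=1.0):
    template = np.asarray(template, dtype=float)
    template = template / np.linalg.norm(template)
    # Correlation with a unit-norm template; ~N(0, 1) under noise only.
    return float(np.dot(x, template) / np.sqrt(noise_var))

def energy_statistic(x, noise_var=1.0):
    x = np.asarray(x, dtype=float)
    # Total energy; ~chi-square with len(x) degrees of freedom under noise only.
    return float(np.sum(x ** 2) / noise_var)
```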

  16. Confounding factors in HGT detection: statistical error, coalescent effects, and multiple solutions.

    PubMed

    Than, Cuong; Ruths, Derek; Innan, Hideki; Nakhleh, Luay

    2007-05-01

    Prokaryotic organisms share genetic material across species boundaries by means of a process known as horizontal gene transfer (HGT). This process has great significance for understanding prokaryotic genome diversification and unraveling their complexities. Phylogeny-based detection of HGT is one of the most commonly used methods for this task, and is based on the fundamental fact that HGT may cause gene trees to disagree with one another, as well as with the species phylogeny. Using these methods, we can compare gene and species trees, and infer a set of HGT events to reconcile the differences among these trees. In this paper, we address three factors that confound the detection of the true HGT events, including the donors and recipients of horizontally transferred genes. First, we study experimentally the effects of error in the estimated gene trees (statistical error) on the accuracy of inferred HGT events. Our results indicate that statistical error leads to overestimation of the number of HGT events, and that HGT detection methods should be designed with unresolved gene trees in mind. Second, we demonstrate, both theoretically and empirically, that based on topological comparison alone, the number of HGT scenarios that reconcile a pair of species/gene trees may be exponential. This number may be reduced when branch lengths in both trees are estimated correctly. This set of results implies that in the absence of additional biological information, and/or a biological model of how HGT occurs, multiple HGT scenarios must be sought, and efficient strategies for how to enumerate such solutions must be developed. Third, we address the issue of lineage sorting, how it confounds HGT detection, and how to incorporate it with HGT into a single stochastic framework that distinguishes between the two events by extending population genetics theories. This result is very important, particularly when analyzing closely related organisms, where coalescent effects may not be

  17. Statistics for the Relative Detectability of Chemicals in Weak Gaseous Plumes in LWIR Hyperspectral Imagery

    SciTech Connect

    Metoyer, Candace N.; Walsh, Stephen J.; Tardiff, Mark F.; Chilton, Lawrence

    2008-10-30

    The detection and identification of weak gaseous plumes using thermal imaging data is complicated by many factors. These include variability due to atmosphere, ground and plume temperature, and background clutter. This paper presents an analysis of one formulation of the physics-based model that describes the at-sensor observed radiance. The motivating question for the analyses performed in this paper is as follows. Given a set of backgrounds, is there a way to predict the background over which the probability of detecting a given chemical will be the highest? Two statistics were developed to address this question. These statistics incorporate data from the long-wave infrared band to predict the background over which chemical detectability will be the highest. These statistics can be computed prior to data collection. As a preliminary exploration into the predictive ability of these statistics, analyses were performed on synthetic hyperspectral images. Each image contained one chemical (either carbon tetrachloride or ammonia) spread across six distinct background types. The statistics were used to generate predictions for the background ranks. Then, the predicted ranks were compared to the empirical ranks obtained from the analyses of the synthetic images. For the simplified images under consideration, the predicted and empirical ranks showed a promising amount of agreement. One statistic accurately predicted the best and worst background for detection in all of the images. Future work may include explorations of more complicated plume ingredients, background types, and noise structures.

  18. Enhancing the mathematical properties of new haplotype homozygosity statistics for the detection of selective sweeps.

    PubMed

    Garud, Nandita R; Rosenberg, Noah A

    2015-06-01

    Soft selective sweeps represent an important form of adaptation in which multiple haplotypes bearing adaptive alleles rise to high frequency. Most statistical methods for detecting selective sweeps from genetic polymorphism data, however, have focused on identifying hard selective sweeps in which a favored allele appears on a single haplotypic background; these methods might be underpowered to detect soft sweeps. Among exceptions is the set of haplotype homozygosity statistics introduced for the detection of soft sweeps by Garud et al. (2015). These statistics, examining frequencies of multiple haplotypes in relation to each other, include H12, a statistic designed to identify both hard and soft selective sweeps, and H2/H1, a statistic that, conditional on high H12 values, seeks to distinguish between hard and soft sweeps. A challenge in the use of H2/H1 is that its range depends on the associated value of H12, so that equal H2/H1 values might provide different levels of support for a soft sweep model at different values of H12. Here, we enhance the H12 and H2/H1 haplotype homozygosity statistics for selective sweep detection by deriving the upper bound on H2/H1 as a function of H12, thereby generating a statistic that normalizes H2/H1 to lie between 0 and 1. Through a reanalysis of resequencing data from inbred lines of Drosophila, we show that the enhanced statistic both strengthens interpretations obtained with the unnormalized statistic and leads to empirical insights that are less readily apparent without the normalization. PMID:25891325
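
    The underlying statistics can be computed directly from haplotype counts in a window, as sketched below (H1 is the haplotype homozygosity, H12 pools the two most frequent haplotypes, and H2 removes the most frequent one). The analytical upper bound used in the paper to normalize H2/H1 as a function of H12 is not reproduced here.

```python
# H1, H2, H12 and H2/H1 from a list of haplotype labels observed in a window.
from collections import Counter

def haplotype_homozygosity(haplotypes):
    counts = Counter(haplotypes)
    n = sum(counts.values())
    p = sorted((c / n for c in counts.values()), reverse=True)   # sorted haplotype frequencies
    h1 = sum(pi ** 2 for pi in p)                                 # haplotype homozygosity
    h2 = h1 - p[0] ** 2                                           # drop the most frequent haplotype
    h12 = h1 if len(p) < 2 else h1 + 2 * p[0] * p[1]              # pool the top two haplotypes
    return {"H1": h1, "H2": h2, "H12": h12,
            "H2/H1": h2 / h1 if h1 > 0 else float("nan")}

print(haplotype_homozygosity(["A", "A", "A", "B", "B", "C", "D"]))
```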

  19. Species identification of airborne molds and its significance for the detection of indoor pollution

    SciTech Connect

    Fradkin, A.; Tobin, R.S.; Tario, S.M.; Tucic-Porretta, M.; Malloch, D.

    1987-01-01

    The present study was undertaken to investigate species composition and prevalence of culturable particles of airborne fungi in 27 homes in Toronto, Canada. Its major objective is to examine the significance of species identification for the detection of indoor pollution.

  20. Weighted Feature Significance: A Simple, Interpretable Model of Compound Toxicity Based on the Statistical Enrichment of Structural Features

    PubMed Central

    Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.

    2009-01-01

    In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high–throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409

  1. Statistical Track-Before-Detect Methods Applied to Faint Optical Observations of Resident Space Objects

    NASA Astrophysics Data System (ADS)

    Fujimoto, K.; Yanagisawa, T.; Uetsuhara, M.

    Automated detection and tracking of faint objects in optical, or bearing-only, sensor imagery is a topic of immense interest in space surveillance. Robust methods in this realm will lead to better space situational awareness (SSA) while reducing the cost of sensors and optics. They are especially relevant in the search for high area-to-mass ratio (HAMR) objects, as their apparent brightness can change significantly over time. A track-before-detect (TBD) approach has been shown to be suitable for faint, low signal-to-noise ratio (SNR) images of resident space objects (RSOs). TBD does not rely upon the extraction of feature points within the image based on some thresholding criteria, but rather directly takes as input the intensity information from the image file. Not only is all of the available information from the image used, but TBD also avoids the computational intractability of the conventional feature-based line detection (i.e., "string of pearls") approach to track detection for low SNR data. Implementation of TBD rooted in finite set statistics (FISST) theory has been proposed recently by Vo et al. Compared to other TBD methods applied so far to SSA, such as the stacking method or multi-pass multi-period denoising, the FISST approach is statistically rigorous and has been shown to be more computationally efficient, thus paving the path toward on-line processing. In this paper, we intend to apply a multi-Bernoulli filter to actual CCD imagery of RSOs. The multi-Bernoulli filter can explicitly account for the birth and death of multiple targets in a measurement arc. TBD is achieved via a sequential Monte Carlo implementation. Preliminary results with simulated single-target data indicate that a Bernoulli filter can successfully track and detect objects with measurement SNR as low as 2.4. Although the advent of fast-cadence scientific CMOS sensors has made the automation of faint object detection a realistic goal, it is nonetheless a difficult goal, as measurements

  2. On the power for linkage detection using a test based on scan statistics.

    PubMed

    Hernández, Sonia; Siegmund, David O; de Gunst, Mathisca

    2005-04-01

    We analyze some aspects of scan statistics, which have been proposed to help for the detection of weak signals in genetic linkage analysis. We derive approximate expressions for the power of a test based on moving averages of the identity by descent allele sharing proportions for pairs of relatives at several contiguous markers. We confirm these approximate formulae by simulation. The results show that when there is a single trait-locus on a chromosome, the test based on the scan statistic is slightly less powerful than that based on the customary allele sharing statistic. On the other hand, if two genes having a moderate effect on a trait lie close to each other on the same chromosome, scan statistics improve power to detect linkage. PMID:15772104

  3. Quantile regression for the statistical analysis of immunological data with many non-detects

    PubMed Central

    2012-01-01

    Background Immunological parameters are hard to measure. A well-known problem is the occurrence of values below the detection limit, the non-detects. Non-detects are a nuisance, because classical statistical analyses, like ANOVA and regression, cannot be applied. The more advanced statistical techniques currently available for the analysis of datasets with non-detects can only be used if a small percentage of the data are non-detects. Methods and results Quantile regression, a generalization of percentiles to regression models, models the median or higher percentiles and tolerates very high numbers of non-detects. We present a non-technical introduction and illustrate it with an implementation to real data from a clinical trial. We show that by using quantile regression, groups can be compared and that meaningful linear trends can be computed, even if more than half of the data consists of non-detects. Conclusion Quantile regression is a valuable addition to the statistical methods that can be used for the analysis of immunological datasets with non-detects. PMID:22769433
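
    A minimal sketch of the approach, assuming the statsmodels package and a hypothetical two-arm trial, is shown below: non-detects are kept at the detection limit, and the 75th percentile is modeled so that the exact values of the censored observations do not influence the fit.

```python
# Quantile regression on data with many non-detects (values below a detection limit).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200
group = rng.integers(0, 2, n)                          # hypothetical two-arm trial indicator
y = np.exp(1.0 + 0.5 * group + rng.normal(0.0, 1.0, n))
detection_limit = 2.0
y = np.where(y < detection_limit, detection_limit, y)  # censor non-detects at the limit

df = pd.DataFrame({"y": y, "group": group})
model = smf.quantreg("y ~ group", df).fit(q=0.75)      # model the 75th percentile
print(model.summary())
```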

  4. Anomaly detection of turbopump vibration in Space Shuttle Main Engine using statistics and neural networks

    NASA Astrophysics Data System (ADS)

    Lo, C. F.; Wu, K.; Whitehead, B. A.

    1993-06-01

    Statistical and neural network methods have been applied to investigate the feasibility of detecting anomalies in turbopump vibration of the SSME. Anomalies are detected based on the amplitudes of peaks at the fundamental and harmonic frequencies in the power spectral density. These data are reduced to the proper format from sensor data measured by strain gauges and accelerometers. Both methods are able to detect the vibration anomalies. The statistical method requires sufficient data points to establish a reasonable statistical distribution data bank, and is applicable to on-line operation. The neural network method likewise needs a sufficient data base to train the networks. The testing procedure can be utilized at any time so long as the characteristics of the components remain unchanged.
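
    The statistical branch can be sketched roughly as below: the amplitude of the spectral peak near a known fundamental frequency is estimated from the power spectral density and compared against a bank of baseline values from nominal runs. The sampling rate, frequency band, and 3-sigma rule are illustrative assumptions rather than the paper's actual procedure.

```python
# Spectral-peak anomaly check against a baseline distribution of peak amplitudes.
import numpy as np
from scipy.signal import welch

def peak_amplitude(signal, fs, f0, bandwidth=5.0):
    f, pxx = welch(signal, fs=fs, nperseg=4096)
    band = (f >= f0 - bandwidth) & (f <= f0 + bandwidth)
    return pxx[band].max()                       # strongest PSD value near the target frequency

def is_anomalous(signal, fs, f0, baseline_amplitudes, n_sigma=3.0):
    amp = peak_amplitude(signal, fs, f0)
    mu, sigma = np.mean(baseline_amplitudes), np.std(baseline_amplitudes)
    return amp > mu + n_sigma * sigma            # flag if far outside the nominal distribution
```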

  6. Statistical properties of radio-frequency and envelope-detected signals with applications to medical ultrasound

    SciTech Connect

    Wagner, R.F.; Insana, M.F.; Brown, D.G.

    1987-05-01

    Both radio-frequency (rf) and envelope-detected signal analyses have led to successful tissue discrimination in medical ultrasound. The extrapolation from tissue discrimination to a description of the tissue structure requires an analysis of the statistics of complex signals. To that end, first- and second-order statistics of complex random signals are reviewed, and an example is taken from rf signal analysis of the backscattered echoes from diffuse scatterers. In this case the scattering form factor of small scatterers can be easily separated from long-range structure and corrected for the transducer characteristics, thereby yielding an instrument-independent tissue signature. The statistics of the more economical envelope- and square-law-detected signals are derived next and found to be almost identical when normalized autocorrelation functions are used. Of the two nonlinear methods of detection, the square-law or intensity scheme gives rise to statistics that are more transparent to physical insight. Moreover, an analysis of the intensity-correlation structure indicates that the contributions to the total echo signal from the diffuse scatter and from the steady and variable components of coherent scatter can still be separated and used for tissue characterization. However, this analysis is not system independent. Finally, the statistical methods of this paper may be applied directly to envelope signals in nuclear-magnetic-resonance imaging because of the approximate equivalence of second-order statistics for magnitude and intensity.

  7. Combined Statistical Analyses of Peptide Intensities and Peptide Occurrences Improves Identification of Significant Peptides from MS-based Proteomics Data

    SciTech Connect

    Webb-Robertson, Bobbie-Jo M.; McCue, Lee Ann; Waters, Katrina M.; Matzke, Melissa M.; Jacobs, Jon M.; Metz, Thomas O.; Varnum, Susan M.; Pounds, Joel G.

    2010-11-01

    Liquid chromatography-mass spectrometry-based (LC-MS) proteomics uses peak intensities of proteolytic peptides to infer the differential abundance of peptides/proteins. However, substantial run-to-run variability in peptide intensities and observations (presence/absence) of peptides makes data analysis quite challenging. The missing abundance values in LC-MS proteomics data are difficult to address with traditional imputation-based approaches because the mechanisms by which data are missing are unknown a priori. Data can be missing due to random mechanisms such as experimental error, or non-random mechanisms such as a true biological effect. We present a statistical approach that uses a test of independence known as a G-test to test the null hypothesis of independence between the number of missing values and the experimental groups. We pair the G-test results evaluating independence of missing data (IMD) with a standard analysis of variance (ANOVA) that uses only means and variances computed from the observed data. Each peptide is therefore represented by two statistical confidence metrics, one for qualitative differential observation and one for quantitative differential intensity. We use two simulated and two real LC-MS datasets to demonstrate the robustness and sensitivity of the ANOVA-IMD approach for assigning confidence to peptides with significant differential abundance among experimental groups.
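
    The pairing of the two tests can be sketched for a single peptide as below: a G-test (log-likelihood-ratio chi-square) on the table of missing versus observed counts per group, alongside a one-way ANOVA on the observed intensities. The example counts and intensities are hypothetical.

```python
# G-test on missingness counts paired with ANOVA on observed intensities for one peptide.
import numpy as np
from scipy.stats import chi2_contingency, f_oneway

def imd_g_test(n_missing, n_observed):
    # Rows: missing / observed; columns: experimental groups.
    table = np.array([n_missing, n_observed])
    g, p, dof, _ = chi2_contingency(table, lambda_="log-likelihood")
    return g, p

def peptide_confidence(n_missing, n_observed, *intensity_groups):
    _, p_qual = imd_g_test(n_missing, n_observed)
    _, p_quant = f_oneway(*intensity_groups)     # ANOVA on observed intensities only
    return {"p_qualitative": p_qual, "p_quantitative": p_quant}

# Example: two groups with 10 runs each; 7 of 10 missing in group 1, 3 of 12 in group 2.
print(peptide_confidence([7, 3], [3, 9],
                         np.array([5.1, 5.3, 4.9]), np.array([6.0, 6.2, 5.8, 6.1])))
```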

  8. Statistical detection and modeling of the over-dispersion of winter storm occurrence

    NASA Astrophysics Data System (ADS)

    Raschke, M.

    2015-08-01

    In this communication, I improve the detection and modeling of the over-dispersion of winter storm occurrence. For this purpose, the generalized Poisson distribution and the Bayesian information criterion are introduced; the latter is used for statistical model selection. Moreover, I replace the frequently used dispersion statistics by an over-dispersion parameter which does not depend on the considered return period of storm events. These models and methods are applied in order to properly detect the over-dispersion in winter storm data for Germany, carrying out a joint estimation of the distribution models for different samples.
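
    The model-selection idea can be sketched as below: annual storm counts are fit with a Poisson model and with an over-dispersed alternative, and BIC decides between them. For brevity, the generalized Poisson distribution used in the record is replaced by a moment-fitted negative binomial, which serves the same illustrative purpose.

```python
# Over-dispersion check for annual storm counts via BIC model selection.
import numpy as np
from scipy.stats import poisson, nbinom

def bic(loglik, n_params, n_obs):
    return n_params * np.log(n_obs) - 2.0 * loglik

def overdispersion_check(counts):
    counts = np.asarray(counts)
    n, mean, var = counts.size, counts.mean(), counts.var(ddof=1)
    bic_pois = bic(poisson.logpmf(counts, mean).sum(), 1, n)
    if var <= mean:                              # no over-dispersion to model
        return {"BIC_poisson": bic_pois, "BIC_negbin": np.inf, "overdispersed": False}
    p = mean / var                               # method-of-moments negative binomial fit
    r = mean * p / (1.0 - p)
    bic_nb = bic(nbinom.logpmf(counts, r, p).sum(), 2, n)
    return {"BIC_poisson": bic_pois, "BIC_negbin": bic_nb,
            "overdispersed": bic_nb < bic_pois}

print(overdispersion_check([3, 7, 2, 9, 1, 8, 2, 10, 4, 0]))
```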

  9. Fast SAR Image Change Detection Using Bayesian Approach Based Difference Image and Modified Statistical Region Merging

    PubMed Central

    Ni, Weiping; Yan, Weidong; Bian, Hui; Wu, Junzheng

    2014-01-01

    A novel fast SAR image change detection method is presented in this paper. Based on a Bayesian approach, the prior information that speckle follows the Nakagami distribution is incorporated into the difference image (DI) generation process. The new DI performs much better than the familiar log-ratio (LR) DI as well as the cumulant-based Kullback-Leibler divergence (CKLD) DI. The statistical region merging (SRM) approach is introduced to the change detection context for the first time. A new clustering procedure with the region variance as the statistical inference variable is presented, tailored to SAR image change detection, with only two classes in the final map: the unchanged and changed classes. The most prominent advantages of the proposed modified SRM (MSRM) method are its ability to cope with noise corruption and its quick implementation. Experimental results show that the proposed method is superior in both change detection accuracy and operating efficiency. PMID:25258740

  10. Analytic estimation of statistical significance maps for support vector machine based multi-variate image analysis and classification

    PubMed Central

    Gaonkar, Bilwaj; Davatzikos, Christos

    2013-01-01

    Multivariate pattern analysis (MVPA) methods such as support vector machines (SVMs) have been increasingly applied to fMRI and sMRI analyses, enabling the detection of distinctive imaging patterns. However, identifying brain regions that significantly contribute to the classification/group separation requires computationally expensive permutation testing. In this paper we show that the results of SVM-permutation testing can be analytically approximated. This approximation leads to more than a thousand fold speed up of the permutation testing procedure, thereby rendering it feasible to perform such tests on standard computers. The speed up achieved makes SVM based group difference analysis competitive with standard univariate group difference analysis methods. PMID:23583748

  11. Asymmetric signal amplification for simultaneous SERS detection of multiple cancer markers with significantly different levels.

    PubMed

    Ye, Sujuan; Wu, Yanying; Zhai, Xiaomo; Tang, Bo

    2015-08-18

    Simultaneous detection of cancer biomarkers holds great promise for the early diagnosis of different cancers. However, in the presence of high-concentration biomarkers, the signals of lower-expression biomarkers are obscured. Existing techniques are not suitable for simultaneously detecting multiple biomarkers whose concentrations differ by several orders of magnitude. Here, we propose an asymmetric signal amplification method for simultaneously detecting multiple biomarkers with significantly different levels. Using the bifunctional probe, a linear amplification mode responds to high-concentration markers, and a quadratic amplification mode responds to low-concentration markers. With the combined biobarcode probe and hybridization chain reaction (HCR) amplification method, the detection limits of microRNA (miRNA) and ATP via surface-enhanced Raman scattering (SERS) detection are 0.15 fM and 20 nM, respectively, spanning a detectable concentration difference of more than 11 orders of magnitude. Furthermore, successful determination of miRNA and ATP in cancer cells supports the practicability of the assay. This methodology promises to open an exciting new avenue for the detection of various types of biomolecules. PMID:26218034

  12. Accelerated detection of intracranial space-occupying lesions with CUDA based on statistical texture atlas in brain HRCT.

    PubMed

    Liu, Wei; Feng, Huanqing; Li, Chuanfu; Huang, Yufeng; Wu, Dehuang; Tong, Tong

    2009-01-01

    In this paper, we present a method that detects intracranial space-occupying lesions in two-dimensional (2D) brain high-resolution CT images. Use of a statistical texture atlas localizes anatomical variation in the gray-level distribution of brain images and, in turn, identifies regions with lesions. The statistical texture atlas involves 147 HRCT slices of normal individuals, and its construction is extremely time-consuming. To improve the performance of atlas construction, we have implemented the pixel-wise texture extraction procedure on an Nvidia 8800GTX GPU with the Compute Unified Device Architecture (CUDA) platform. Experimental results indicate that the extracted texture feature is distinctive and robust enough, and is suitable for detecting uniform and mixed-density space-occupying lesions. In addition, a significant speedup over the straightforward CPU version was achieved with CUDA. PMID:19963990

  13. The statistical significance test of regional climate change caused by land use and land cover variation in West China

    NASA Astrophysics Data System (ADS)

    Wang, H. J.; Shi, W. L.; Chen, X. H.

    2006-05-01

    The West Development Policy being implemented in China is causing significant land use and land cover (LULC) changes in West China. With the up-to-date satellite database of the Global Land Cover Characteristics Database (GLCCD) that characterizes the lower boundary conditions, the regional climate model RIEMS-TEA is used to simulate possible impacts of the significant LULC variation. The model was run for five continuous three-month periods from 1 June to 1 September of 1993, 1994, 1995, 1996, and 1997, and the results of the five groups are examined by means of a Student's t-test to identify the statistical significance of regional climate variation. The main results are: (1) The regional climate is affected by the LULC variation because the equilibrium of water and heat transfer at the air-vegetation interface is changed. (2) The integrated impact of the LULC variation on regional climate is not limited to West China, where the LULC varies, but extends to some areas in the model domain where the LULC does not vary at all. (3) The East Asian monsoon system and its vertical structure are adjusted by the large-scale LULC variation in western China; the consequences are the enhancement of westward water vapor transfer from the east and the related increase of wet-hydrostatic energy in the middle-upper atmospheric layers. (4) The ecological engineering in West China significantly affects the regional climate in Northwest China, North China, and the middle-lower reaches of the Yangtze River; there are obvious effects in South, Northeast, and Southwest China, but minor effects in Tibet.
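
    The significance step can be sketched as a grid-point-wise Student's t-test between two ensembles of seasonal simulations (control versus modified LULC), as below; the five-member ensembles follow the description above, but the data and threshold are placeholders.

```python
# Grid-point-wise t-test between two ensembles of shape (n_runs, lat, lon).
import numpy as np
from scipy.stats import ttest_ind

def significant_change(control, experiment, alpha=0.05):
    t, p = ttest_ind(experiment, control, axis=0)   # element-wise over the grid
    return t, p < alpha                             # t map and mask of significant cells

control = np.random.default_rng(1).normal(size=(5, 60, 90))
experiment = control + 0.8                          # imposed uniform shift for illustration
t_map, sig_mask = significant_change(control, experiment)
print(sig_mask.mean())                              # fraction of grid cells flagged significant
```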

  14. Neural Evidence of Statistical Learning: Efficient Detection of Visual Regularities without Awareness

    ERIC Educational Resources Information Center

    Turk-Browne, Nicholas B.; Scholl, Brian J.; Chun, Marvin M.; Johnson, Marcia K.

    2009-01-01

    Our environment contains regularities distributed in space and time that can be detected by way of statistical learning. This unsupervised learning occurs without intent or awareness, but little is known about how it relates to other types of learning, how it affects perceptual processing, and how quickly it can occur. Here we use fMRI during…

  15. Dual-band, infrared buried mine detection using a statistical pattern recognition approach

    SciTech Connect

    Buhl, M.R.; Hernandez, J.E.; Clark, G.A.; Sengupta, S.K.

    1993-08-01

    The main objective of this work was to detect surrogate land mines, which were buried in clay and sand, using dual-band, infrared images. A statistical pattern recognition approach was used to achieve this objective. This approach is discussed and results of applying it to real images are given.

  16. Testing earthquake prediction algorithms: Statistically significant advance prediction of the largest earthquakes in the Circum-Pacific, 1992-1997

    USGS Publications Warehouse

    Kossobokov, V.G.; Romashkova, L.L.; Keilis-Borok, V. I.; Healy, J.H.

    1999-01-01

    Algorithms M8 and MSc (i.e., the Mendocino Scenario) were used in a real-time intermediate-term research prediction of the strongest earthquakes in the Circum-Pacific seismic belt. Predictions are made by M8 first. Then, the areas of alarm are reduced by MSc at the cost that some earthquakes are missed in the second approximation of prediction. In 1992-1997, five earthquakes of magnitude 8 and above occurred in the test area: all of them were predicted by M8, and MSc correctly identified the locations of four of them. The space-time volume of the alarms is 36% and 18%, correspondingly, when estimated with a normalized product measure of empirical distribution of epicenters and uniform time. The statistical significance of the achieved results is beyond 99% both for M8 and MSc. For magnitude 7.5+, 10 out of 19 earthquakes were predicted by M8 in 40% and five were predicted by M8-MSc in 13% of the total volume considered. This implies a significance level of 81% for M8 and 92% for M8-MSc. The lower significance levels might result from a global change in seismic regime in 1993-1996, when the rate of the largest events doubled and all of them became exclusively normal or reversed faults. The predictions are fully reproducible; the algorithms M8 and MSc in complete formal definitions were published before we started our experiment [Keilis-Borok, V.I., Kossobokov, V.G., 1990. Premonitory activation of seismic flow: Algorithm M8, Phys. Earth and Planet. Inter. 61, 73-83; Kossobokov, V.G., Keilis-Borok, V.I., Smith, S.W., 1990. Localization of intermediate-term earthquake prediction, J. Geophys. Res., 95, 19763-19772; Healy, J.H., Kossobokov, V.G., Dewey, J.W., 1992. A test to evaluate the earthquake prediction algorithm, M8. U.S. Geol. Surv. OFR 92-401]. M8 is available from the IASPEI Software Library [Healy, J.H., Keilis-Borok, V.I., Lee, W.H.K. (Eds.), 1997. Algorithms for Earthquake Statistics and Prediction, Vol. 6. IASPEI Software Library]. © 1999 Elsevier

  17. Detection of Local Anomalies in High Resolution Hyperspectral Imagery Using Geostatistical Filtering and Local Spatial Statistics

    NASA Astrophysics Data System (ADS)

    Goovaerts, P.; Jacquez, G. M.; Marcus, A. W.

    2004-12-01

    Spatial data are periodically collected and processed to monitor, analyze and interpret developments in our changing environment. Remote sensing is a modern way of collecting such data and has seen enormous growth since the launch of modern satellites and the development of airborne sensors. In particular, the recent availability of high spatial resolution hyperspectral imagery (spatial resolution of less than 5 meters, with data collected over 64 or more bands of electromagnetic radiation for each pixel) offers great potential to significantly enhance environmental mapping and our ability to model spatial systems. High spatial resolution imagery contains a remarkable quantity of information that could be used to analyze spatial breaks (boundaries), areas of similarity (clusters), and spatial autocorrelation (associations) across the landscape. This paper addresses the specific issue of soil disturbance detection, which could indicate the presence of land mines or recent movements of troops and heavy equipment. A challenge presented by soil disturbance detection is to retain the measurement of fine-scale features (i.e. mineral soil changes, organic content changes, vegetation disturbance related changes, aspect changes) while still covering proportionally large spatial areas. An additional difficulty is that no ground data might be available for the calibration of spectral signatures, and little might be known about the size of patches of disturbed soils to be detected. This paper describes a new technique for automatic target detection which capitalizes on both spatial correlation and correlation across spectral bands, does not require any a priori information on the target spectral signature but does not allow discrimination between targets. This approach involves successively a multivariate statistical analysis (principal component analysis) of all spectral bands, a geostatistical filtering of noise and regional background in the first principal components using factorial kriging, and

  18. Statistical methods for detecting differentially abundant features in clinical metagenomic samples.

    PubMed

    White, James Robert; Nagarajan, Niranjan; Pop, Mihai

    2009-04-01

    Numerous studies are currently underway to characterize the microbial communities inhabiting our world. These studies aim to dramatically expand our understanding of the microbial biosphere and, more importantly, hope to reveal the secrets of the complex symbiotic relationship between us and our commensal bacterial microflora. An important prerequisite for such discoveries is the availability of computational tools that are able to rapidly and accurately compare large datasets generated from complex bacterial communities to identify features that distinguish them. We present a statistical method for comparing clinical metagenomic samples from two treatment populations on the basis of count data (e.g. as obtained through sequencing) to detect differentially abundant features. Our method, Metastats, employs the false discovery rate to improve specificity in high-complexity environments, and separately handles sparsely-sampled features using Fisher's exact test. Under a variety of simulations, we show that Metastats performs well compared to previously used methods, and significantly outperforms other methods for features with sparse counts. We demonstrate the utility of our method on several datasets including a 16S rRNA survey of obese and lean human gut microbiomes, COG functional profiles of infant and mature gut microbiomes, and bacterial and viral metabolic subsystem data inferred from random sequencing of 85 metagenomes. The application of our method to the obesity dataset reveals differences between obese and lean subjects not reported in the original study. For the COG and subsystem datasets, we provide the first statistically rigorous assessment of the differences between these populations. The methods described in this paper are the first to address clinical metagenomic datasets comprising samples from multiple subjects. Our methods are robust across datasets of varied complexity and sampling level. While designed for metagenomic applications, our software can also be applied
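
    The general idea behind this kind of two-group comparison can be sketched in a few lines (an illustrative outline, not the Metastats implementation): well-sampled features are compared with a t-test on relative abundances, sparse features are pooled and tested with Fisher's exact test, and the resulting p-values are screened with Benjamini-Hochberg false discovery rate control. The sparse-count cutoff and the FDR level are illustrative assumptions.

```python
# Hedged sketch of a Metastats-style comparison of two count matrices
# (features x samples); not the authors' code.
import numpy as np
from scipy import stats

def compare_features(counts_a, counts_b, sparse_cutoff=10, fdr=0.05):
    counts_a, counts_b = np.asarray(counts_a), np.asarray(counts_b)
    freq_a = counts_a / counts_a.sum(axis=0)          # relative abundance per sample
    freq_b = counts_b / counts_b.sum(axis=0)
    m = counts_a.shape[0]
    pvals = np.empty(m)
    for i in range(m):
        if counts_a[i].sum() + counts_b[i].sum() < sparse_cutoff:
            # Sparse feature: pool counts across samples and use Fisher's exact test.
            table = [[counts_a[i].sum(), counts_a.sum() - counts_a[i].sum()],
                     [counts_b[i].sum(), counts_b.sum() - counts_b[i].sum()]]
            pvals[i] = stats.fisher_exact(table)[1]
        else:
            # Abundant feature: two-sample t-test on per-sample relative abundances.
            pvals[i] = stats.ttest_ind(freq_a[i], freq_b[i], equal_var=False).pvalue
    # Benjamini-Hochberg step-up procedure for FDR control.
    order = np.argsort(pvals)
    passed = pvals[order] <= fdr * np.arange(1, m + 1) / m
    k = passed.nonzero()[0].max() + 1 if passed.any() else 0
    significant = np.zeros(m, dtype=bool)
    significant[order[:k]] = True
    return pvals, significant
```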

  19. Archival Legacy Investigations of Circumstellar Environments (ALICE): Statistical assessment of point source detections

    NASA Astrophysics Data System (ADS)

    Choquet, Élodie; Pueyo, Laurent; Soummer, Rémi; Perrin, Marshall D.; Hagan, J. Brendan; Gofas-Salas, Elena; Rajan, Abhijith; Aguilar, Jonathan

    2015-09-01

    The ALICE program, for Archival Legacy Investigation of Circumstellar Environment, is currently conducting a virtual survey of about 400 stars, by re-analyzing the HST-NICMOS coronagraphic archive with advanced post-processing techniques. We present here the strategy that we adopted to identify detections and potential candidates for follow-up observations, and we give a preliminary overview of our detections. We present a statistical analysis conducted to evaluate the confidence level of these detections and the completeness of our candidate search.

  20. Frequency and Clinical Significance of Previously Undetected Incidental Findings Detected on Computed Tomography Simulation Scans for Breast Cancer Patients

    SciTech Connect

    Nakamura, Naoki; Tsunoda, Hiroko; Takahashi, Osamu; Kikuchi, Mari; Honda, Satoshi; Shikama, Naoto; Akahane, Keiko; Sekiguchi, Kenji

    2012-11-01

    Purpose: To determine the frequency and clinical significance of previously undetected incidental findings found on computed tomography (CT) simulation images for breast cancer patients. Methods and Materials: All CT simulation images were first interpreted prospectively by radiation oncologists and then double-checked by diagnostic radiologists. The official reports of CT simulation images for 881 consecutive postoperative breast cancer patients from 2009 to 2010 were retrospectively reviewed. Potentially important incidental findings (PIIFs) were defined as any previously undetected benign or malignancy-related findings requiring further medical follow-up or investigation. For all patients in whom a PIIF was detected, we reviewed the clinical records to determine the clinical significance of the PIIF. If the findings from the additional studies prompted by a PIIF required a change in management, the PIIF was also recorded as a clinically important incidental finding (CIIF). Results: There were a total of 57 (6%) PIIFs. The 57 patients in whom a PIIF was detected were followed for a median of 17 months (range, 3-26). Six cases of CIIFs (0.7% of total) were detected. Of the six CIIFs, three (50%) cases had not been noted by the radiation oncologist until the diagnostic radiologist detected the finding. On multivariate analysis, previous CT examination was an independent predictor for PIIF (p = 0.04). Patients who had not previously received chest CT examinations within 1 year had a statistically significantly higher risk of PIIF than those who had received CT examinations within 6 months (odds ratio, 3.54; 95% confidence interval, 1.32-9.50; p = 0.01). Conclusions: The rate of incidental findings prompting a change in management was low. However, radiation oncologists appear to have some difficulty in detecting incidental findings that require a change in management. Considering cost, it may be reasonable that routine interpretations are given to those who have not

  1. A novel pairwise comparison method for in silico discovery of statistically significant cis-regulatory elements in eukaryotic promoter regions: application to Arabidopsis.

    PubMed

    Shamloo-Dashtpagerdi, Roohollah; Razi, Hooman; Aliakbari, Massumeh; Lindlöf, Angelica; Ebrahimi, Mahdi; Ebrahimie, Esmaeil

    2015-01-01

    Cis-regulatory elements (CREs), located within promoter regions, play a significant role in the blueprint for transcriptional regulation of genes. There is growing interest in studying the combinatorial nature of CREs, including the presence or absence of CREs, the number of occurrences of each CRE, and their order and location relative to their target genes. Comparative promoter analysis has been shown to be a reliable strategy to test the significance of each component of promoter architecture. However, it remains unclear what level of difference in the number of occurrences of each CRE is statistically significant in explaining different expression patterns of two genes. In this study, we present a novel statistical approach for pairwise comparison of promoters of Arabidopsis genes in the context of the number of occurrences of each CRE within the promoters. First, using a sample of 1000 Arabidopsis promoters, the results of the goodness-of-fit test and non-parametric analysis revealed that the number of occurrences of CREs in a promoter sequence is Poisson distributed. As a promoter sequence contains both functional and non-functional CREs, we addressed the issue of the statistical distribution of functional CREs by analyzing ChIP-seq datasets. The results showed that the number of occurrences of functional CREs over the genomic regions was also Poisson distributed. In accordance with the obtained distribution of CRE occurrences, we suggested the Audic and Claverie (AC) test to compare two promoters based on the number of occurrences of the CREs. Superiority of the AC test over the Chi-square (2×2) and Fisher's exact tests was also shown, as the AC test was able to detect a higher number of significant CREs. Two case studies on Arabidopsis genes were performed in order to biologically verify the pairwise test for promoter comparison. Consequently, a number of CREs with significantly different occurrences was identified between
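
    For occurrence counts that are Poisson distributed, the Audic and Claverie test compares a count x in one promoter set of total size N1 with a count y in another of size N2 through a closed-form conditional probability P(y | x). The sketch below is an illustrative implementation of that published formula; the two-sided p-value definition (summing outcomes no more likely than the observed one) and the example counts are assumptions, not the authors' exact procedure.

```python
# Hedged sketch of the Audic-Claverie (AC) test for CRE occurrence counts.
import math

def log_p_y_given_x(y: int, x: int, n1: float, n2: float) -> float:
    """log P(y | x) under the AC null model (Audic & Claverie, 1997)."""
    r = n2 / n1
    return (y * math.log(r)
            + math.lgamma(x + y + 1) - math.lgamma(x + 1) - math.lgamma(y + 1)
            - (x + y + 1) * math.log(1.0 + r))

def ac_pvalue(x: int, y: int, n1: float = 1.0, n2: float = 1.0) -> float:
    """Two-sided p-value: total probability of outcomes no more likely than y."""
    p_obs = math.exp(log_p_y_given_x(y, x, n1, n2))
    total, k = 0.0, 0
    while True:
        p_k = math.exp(log_p_y_given_x(k, x, n1, n2))
        if p_k <= p_obs:
            total += p_k
        if k > x + y and p_k < 1e-12:      # far past the bulk of the distribution
            break
        k += 1
    return min(total, 1.0)

# e.g. 2 occurrences of a CRE in one promoter vs 9 in another of equal length
print(ac_pvalue(2, 9))
```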

  2. Statistical detection of slow-mode waves in solar polar regions with SDO/AIA

    SciTech Connect

    Su, J. T.

    2014-10-01

    Observations from the Atmospheric Imaging Assembly (AIA) on board the Solar Dynamics Observatory are utilized to statistically investigate the propagating quasi-periodic oscillations in the solar polar plume and inter-plume regions. On average, the periods are found to be nearly equal in the three coronal channels of AIA 171 Å, 193 Å, and 211 Å, and the wavelengths increase with temperature from 171 Å through 193 Å to 211 Å. The phase speeds may be inferred from the above parameters. Furthermore, the speed ratios v₁₉₃/v₁₇₁ and v₂₁₁/v₁₇₁ are derived, e.g., 1.4 ± 0.8 and 2.0 ± 1.9 in the plume regions, respectively, which are equivalent to the theoretical ones for acoustic waves. We find that there are no significant differences in the detected parameters between the plume and inter-plume regions. To our knowledge, this is the first time that the phase speeds of slow-mode waves have been obtained simultaneously in the three channels in open coronal magnetic structures, owing to the method adopted in the present work, which is able to minimize the influence of jets or eruptions on wave signals.

  3. Automated microcalcification detection in mammograms using statistical variable-box-threshold filter method

    NASA Astrophysics Data System (ADS)

    Wilson, Mark; Mitra, Sunanda; Roberson, Glenn H.; Shieh, Yao-Yang

    1997-10-01

    Currently, early detection of breast cancer is primarily accomplished by mammography, and suspicious findings may lead to a decision to perform a biopsy. Digital enhancement and pattern recognition techniques may aid in early detection of some patterns such as microcalcification clusters indicating onset of DCIS (ductal carcinoma in situ), which accounts for 20% of all mammographically detected breast cancers and could be treated when detected early. These individual calcifications are hard to detect due to size and shape variability and inhomogeneous background texture. Our study addresses only early detection of microcalcifications, which allows the radiologist to interpret the x-ray findings in a computer-aided, enhanced form more easily than by evaluating the x-ray film directly. We present an algorithm which locates microcalcifications based on the local grayscale variability of tissue structures and on image statistics. Threshold filters with lower and upper bounds computed from the image statistics of the entire image and selected subimages were designed to enhance the entire image. This enhanced image was used as the initial image for identifying the microcalcifications based on the variable box threshold filters at different resolutions. The test images came from the Texas Tech University Health Sciences Center and the MIAS mammographic database, which are classified into various categories including microcalcifications. Classification of other types of abnormalities in mammograms based on their characteristic features is addressed in later studies.

  4. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments.

    PubMed

    Sivasamy, Aneetha Avalappampatty; Sundan, Bose

    2015-01-01

    The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T(2) method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T(2) statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better. PMID:26357668
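
    The core scoring step described above can be sketched compactly: each traffic-feature vector is scored by its Hotelling's T-squared distance from a baseline "normal" profile, and scores above a control limit are flagged. The synthetic features, the four-dimensional profile, and the 99th-percentile threshold are illustrative assumptions, not the paper's full pipeline (which also includes preprocessing and thresholds derived via the central limit theorem).

```python
# Hedged sketch of Hotelling's T-squared scoring for network records.
import numpy as np

rng = np.random.default_rng(1)
baseline = rng.normal(size=(5000, 4))                        # normal-traffic feature vectors
observed = np.vstack([rng.normal(size=(95, 4)),
                      rng.normal(loc=3.0, size=(5, 4))])     # a few attack-like records

mu = baseline.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(baseline, rowvar=False))

def t_squared(x: np.ndarray) -> np.ndarray:
    d = x - mu
    return np.einsum("ij,jk,ik->i", d, cov_inv, d)           # (x - mu)^T S^-1 (x - mu), per row

threshold = np.percentile(t_squared(baseline), 99)           # empirical control limit
alarms = t_squared(observed) > threshold
print(f"{alarms.sum()} of {len(observed)} records flagged as attack-like")
```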

  5. A Dynamic Intrusion Detection System Based on Multivariate Hotelling's T2 Statistics Approach for Network Environments

    PubMed Central

    Avalappampatty Sivasamy, Aneetha; Sundan, Bose

    2015-01-01

    The ever expanding communication requirements in today's world demand extensive and efficient network systems with equally efficient and reliable security features integrated for safe, confident, and secured communication and data transfer. Providing effective security protocols for any network environment, therefore, assumes paramount importance. Attempts are made continuously for designing more efficient and dynamic network intrusion detection models. In this work, an approach based on Hotelling's T2 method, a multivariate statistical analysis technique, has been employed for intrusion detection, especially in network environments. Components such as preprocessing, multivariate statistical analysis, and attack detection have been incorporated in developing the multivariate Hotelling's T2 statistical model and necessary profiles have been generated based on the T-square distance metrics. With a threshold range obtained using the central limit theorem, observed traffic profiles have been classified either as normal or attack types. Performance of the model, as evaluated through validation and testing using KDD Cup'99 dataset, has shown very high detection rates for all classes with low false alarm rates. Accuracy of the model presented in this work, in comparison with the existing models, has been found to be much better. PMID:26357668

  6. Detection of Clinically Significant Retinopathy of Prematurity Using Wide-angle Digital Retinal Photography

    PubMed Central

    Chiang, Michael F.; Melia, Michele; Buffenn, Angela N.; Lambert, Scott R.; Recchia, Franco M.; Simpson, Jennifer L.; Yang, Michael B.

    2013-01-01

    Objective To evaluate the accuracy of detecting clinically significant retinopathy of prematurity (ROP) using wide-angle digital retinal photography. Methods Literature searches of PubMed and the Cochrane Library databases were conducted last on December 7, 2010, and yielded 414 unique citations. The authors assessed these 414 citations and marked 82 that potentially met the inclusion criteria. These 82 studies were reviewed in full text; 28 studies met inclusion criteria. The authors extracted from these studies information about study design, interventions, outcomes, and study quality. After data abstraction, 18 were excluded for study deficiencies or because they were superseded by a more recent publication. The methodologist reviewed the remaining 10 studies and assigned ratings of evidence quality; 7 studies were rated level I evidence and 3 studies were rated level III evidence. Results There is level I evidence from ≥5 studies demonstrating that digital retinal photography has high accuracy for detection of clinically significant ROP. Level III studies have reported high accuracy, without any detectable complications, from real-world operational programs intended to detect clinically significant ROP through remote site interpretation of wide-angle retinal photographs. Conclusions Wide-angle digital retinal photography has the potential to complement standard ROP care. It may provide advantages through objective documentation of clinical examination findings, improved recognition of disease progression by comparing previous photographs, and the creation of image libraries for education and research. Financial Disclosure(s) Proprietary or commercial disclosure may be found after the references. PMID:22541632

  7. Statistical techniques for detecting the intergalactic magnetic field from large samples of extragalactic Faraday rotation data

    SciTech Connect

    Akahori, Takuya; Gaensler, B. M.; Ryu, Dongsu E-mail: bryan.gaensler@sydney.edu.au

    2014-08-01

    Rotation measure (RM) grids of extragalactic radio sources have been widely used for studying cosmic magnetism. However, their potential for exploring the intergalactic magnetic field (IGMF) in filaments of galaxies is unclear, since other Faraday-rotation media such as the radio source itself, intervening galaxies, and the interstellar medium of our Galaxy are all significant contributors. We study statistical techniques for discriminating the Faraday rotation of filaments from other sources of Faraday rotation in future large-scale surveys of radio polarization. We consider a 30° × 30° field of view toward the south Galactic pole, while varying the number of sources detected in both present and future observations. We select sources located at high redshifts and toward which depolarization and optical absorption systems are not observed so as to reduce the RM contributions from the sources and intervening galaxies. It is found that a high-pass filter can satisfactorily reduce the RM contribution from the Galaxy since the angular scale of this component toward high Galactic latitudes would be much larger than that expected for the IGMF. Present observations do not yet provide a sufficient source density to be able to estimate the RM of filaments. However, from the proposed approach with forthcoming surveys, we predict significant residuals of RM that should be ascribable to filaments. The predicted structure of the IGMF down to scales of 0.1° should be observable with data from the Square Kilometre Array, if we achieve selections of sources toward which sightlines do not contain intervening galaxies and RM errors are less than a few rad m⁻².

  8. Recommended methods for statistical analysis of data containing less-than-detectable measurements

    SciTech Connect

    Atwood, C.L.; Blackwood, L.G.; Harris, G.A.; Loehr, C.A.

    1991-09-01

    This report is a manual for statistical workers dealing with environmental measurements, when some of the measurements are not given exactly but are only reported as less than detectable. For some statistical settings with such data, many methods have been proposed in the literature, while for others few or none have been proposed. This report gives a recommended method in each of the settings considered. The body of the report gives a brief description of each recommended method. Appendix A gives example programs using the statistical package SAS, for those methods that involve nonstandard methods. Appendix B presents the methods that were compared and the reasons for selecting each recommended method, and explains any fine points that might be of interest. 7 refs., 4 figs.

  9. Recommended methods for statistical analysis of data containing less-than-detectable measurements

    SciTech Connect

    Atwood, C.L.; Blackwood, L.G.; Harris, G.A.; Loehr, C.A.

    1990-09-01

    This report is a manual for statistical workers dealing with environmental measurements, when some of the measurements are not given exactly but are only reported as less than detectable. For some statistical settings with such data, many methods have been proposed in the literature, while for others few or none have been proposed. This report gives a recommended method in each of the settings considered. The body of the report gives a brief description of each recommended method. Appendix A gives example programs using the statistical package SAS, for those methods that involve nonstandard methods. Appendix B presents the methods that were compared and the reasons for selecting each recommended method, and explains any fine points that might be of interest. This is an interim version. Future revisions will complete the recommendations. 34 refs., 2 figs., 11 tabs.

  10. Clinical significance of microembolus detection by transcranial Doppler sonography in cardiovascular clinical conditions.

    PubMed

    Hudorović, Narcis

    2006-01-01

    Transcranial Doppler can detect microembolic signals, which are characterized by unidirectional high intensity increase, short duration, and random occurrence, producing a "whistling" sound. Microembolic signals have been proven to represent solid or gaseous particles within the blood flow. Microemboli have been detected in a number of clinical cardiovascular settings: carotid artery stenosis, aortic arch plaques, atrial fibrillation, myocardial infarction, prosthetic heart valves, patent foramen ovale, valvular stenosis, during invasive procedures (angiography, percutaneous transluminal angioplasty) and surgery (carotid, cardiopulmonary bypass). Despite numerous studies performed so far, clinical significance of microembolic signals is still unclear. This article provides an overview of the development and current state of technical and clinical aspects of microembolus detection. PMID:17462357

  11. Lineaments on Skylab photographs: Detection, mapping, and hydrologic significance in central Tennessee

    NASA Technical Reports Server (NTRS)

    Moore, G. K.

    1976-01-01

    An investigation was carried out to determine the feasibility of mapping lineaments on SKYLAB photographs of central Tennessee and to determine the hydrologic significance of these lineaments, particularly as concerns the occurrence and productivity of ground water. Sixty-nine percent more lineaments were found on SKYLAB photographs by stereo viewing than by projection viewing, but longer lineaments were detected by projection viewing. Most SKYLAB lineaments consisted of topographic depressions and they followed or paralleled the streams. The remainder were found by vegetation alignments and the straight sides of ridges. Test drilling showed that the median yield of wells located on SKYLAB lineaments was about six times the median yield of wells located by random drilling. The best single detection method, in terms of potential savings, was stereo viewing. Larger savings might be achieved by locating wells on lineaments detected by both stereo viewing and projection viewing.

  12. Statistics provide guidance for indigenous organic carbon detection on Mars missions.

    PubMed

    Sephton, Mark A; Carter, Jonathan N

    2014-08-01

    Data from the Viking and Mars Science Laboratory missions indicate the presence of organic compounds that are not definitively martian in origin. Both contamination and confounding mineralogies have been suggested as alternatives to indigenous organic carbon. Intuitive thought suggests that we are repeatedly obtaining data that confirms the same level of uncertainty. Bayesian statistics may suggest otherwise. If an organic detection method has a true positive to false positive ratio greater than one, then repeated organic matter detection progressively increases the probability of indigeneity. Bayesian statistics also reveal that methods with higher ratios of true positives to false positives give higher overall probabilities and that detection of organic matter in a sample with a higher prior probability of indigenous organic carbon produces greater confidence. Bayesian statistics, therefore, provide guidance for the planning and operation of organic carbon detection activities on Mars. Suggestions for future organic carbon detection missions and instruments are as follows: (i) On Earth, instruments should be tested with analog samples of known organic content to determine their true positive to false positive ratios. (ii) On the mission, for an instrument with a true positive to false positive ratio above one, it should be recognized that each positive detection of organic carbon will result in a progressive increase in the probability of indigenous organic carbon being present; repeated measurements, therefore, can overcome some of the deficiencies of a less-than-definitive test. (iii) For a fixed number of analyses, the highest true positive to false positive ratio method or instrument will provide the greatest probability that indigenous organic carbon is present. (iv) On Mars, analyses should concentrate on samples with highest prior probability of indigenous organic carbon; intuitive desires to contrast samples of high prior probability and low prior
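
    The Bayesian argument above reduces to a one-line odds update: each positive detection with true-positive rate TP and false-positive rate FP multiplies the prior odds of indigenous organic carbon by the likelihood ratio TP/FP, so repeated detections with TP/FP greater than one compound the evidence. The rates and prior in the sketch are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of repeated Bayesian updating for organic-carbon detection.
def posterior_after_detections(prior: float, tp: float, fp: float, n_detections: int) -> float:
    odds = prior / (1.0 - prior)
    odds *= (tp / fp) ** n_detections        # each positive result multiplies the odds by TP/FP
    return odds / (1.0 + odds)

prior = 0.10          # assumed prior probability of indigenous organic carbon in the sample
tp, fp = 0.6, 0.3     # an instrument with a true:false positive ratio of 2
for n in range(1, 5):
    print(n, round(posterior_after_detections(prior, tp, fp, n), 3))
# With a ratio of 2, four consecutive detections raise the probability from 0.10 to about 0.64.
```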

  13. Test of significant toxicity: a statistical application for assessing whether an effluent or site water is truly toxic.

    PubMed

    Denton, Debra L; Diamond, Jerry; Zheng, Lei

    2011-05-01

    The U.S. Environmental Protection Agency (U.S. EPA) and state agencies implement the Clean Water Act, in part, by evaluating the toxicity of effluent and surface water samples. A common goal for both regulatory authorities and permittees is confidence in an individual test result (e.g., no-observed-effect concentration [NOEC], pass/fail, 25% effective concentration [EC25]), which is used to make regulatory decisions, such as reasonable potential determinations, permit compliance, and watershed assessments. This paper discusses an additional statistical approach (test of significant toxicity [TST]), based on bioequivalence hypothesis testing, or, more appropriately, test of noninferiority, which examines whether there is a nontoxic effect at a single concentration of concern compared with a control. Unlike the traditional hypothesis testing approach in whole effluent toxicity (WET) testing, TST is designed to incorporate explicitly both α and β error rates at levels of toxicity that are unacceptable and acceptable, given routine laboratory test performance for a given test method. Regulatory management decisions are used to identify unacceptable toxicity levels for acute and chronic tests, and the null hypothesis is constructed such that test power is associated with the ability to declare correctly a truly nontoxic sample as acceptable. This approach provides a positive incentive to generate high-quality WET data to make informed decisions regarding regulatory decisions. This paper illustrates how α and β error rates were established for specific test method designs and tests the TST approach using both simulation analyses and actual WET data. In general, those WET test endpoints having higher routine (e.g., 50th percentile) within-test control variation, on average, have higher method-specific α values (type I error rate), to maintain a desired type II error rate. This paper delineates the technical underpinnings of this approach and demonstrates the benefits
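
    The noninferiority idea behind the TST can be sketched as a shifted one-sided Welch t-test: the null hypothesis is that the response at the concentration of concern is no better than a regulatory fraction b of the control mean (i.e., the sample is unacceptably toxic), and rejecting it declares the sample acceptable. The fraction b = 0.75, alpha = 0.05, and the example organism responses below are illustrative assumptions, not the regulatory values or data from the paper.

```python
# Hedged sketch of a test-of-significant-toxicity style noninferiority test.
import numpy as np
from scipy import stats

def tst(control, treatment, b=0.75, alpha=0.05):
    control, treatment = np.asarray(control, float), np.asarray(treatment, float)
    nc, nt = len(control), len(treatment)
    vc, vt = control.var(ddof=1) / nc, treatment.var(ddof=1) / nt
    # Welch-type statistic for H0: mean(treatment) <= b * mean(control)  (sample is toxic)
    se = np.sqrt(vt + b**2 * vc)
    t = (treatment.mean() - b * control.mean()) / se
    # Welch-Satterthwaite degrees of freedom
    df = (vt + b**2 * vc) ** 2 / (vt**2 / (nt - 1) + (b**2 * vc) ** 2 / (nc - 1))
    return "acceptable (H0 rejected)" if t > stats.t.ppf(1 - alpha, df) else "toxic (H0 retained)"

control   = [28, 31, 27, 30, 29]     # e.g. control-water reproduction counts
treatment = [26, 29, 25, 28, 27]     # responses at the concentration of concern
print(tst(control, treatment))
```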

  14. Automatic detection of health changes using statistical process control techniques on measured transfer times of elderly.

    PubMed

    Baldewijns, Greet; Luca, Stijn; Nagels, William; Vanrumste, Bart; Croonenborghs, Tom

    2015-01-01

    It has been shown that gait speed and transfer times are good measures of functional ability in elderly. However, data currently acquired by systems that measure either gait speed or transfer times in the homes of elderly people require manual reviewing by healthcare workers. This reviewing process is time-consuming. To alleviate this burden, this paper proposes the use of statistical process control methods to automatically detect both positive and negative changes in transfer times. Three SPC techniques: tabular CUSUM, standardized CUSUM and EWMA, known for their ability to detect small shifts in the data, are evaluated on simulated transfer times. This analysis shows that EWMA is the best-suited method with a detection accuracy of 82% and an average detection time of 9.64 days. PMID:26737425
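
    An EWMA chart of the kind evaluated above can be sketched in a few lines: daily transfer times are exponentially smoothed, and an alarm is raised when the smoothed statistic leaves control limits derived from a baseline period. The smoothing weight, control-limit width, baseline length, and simulated data are illustrative assumptions, not the parameters used in the paper.

```python
# Hedged sketch of an EWMA control chart on daily transfer times.
import numpy as np

def ewma_alarm_days(x, n_baseline=30, lam=0.2, width=3.0):
    x = np.asarray(x, float)
    mu, sigma = x[:n_baseline].mean(), x[:n_baseline].std(ddof=1)
    z, alarms = mu, []
    for day, xt in enumerate(x):
        z = lam * xt + (1 - lam) * z
        # Time-varying control limit for the EWMA statistic.
        limit = width * sigma * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * (day + 1))))
        if abs(z - mu) > limit:
            alarms.append(day)
    return alarms

rng = np.random.default_rng(2)
transfer_times = np.concatenate([rng.normal(12.0, 1.0, 60),    # stable period (seconds)
                                 rng.normal(13.5, 1.0, 30)])   # gradual decline in mobility
print("alarm days:", ewma_alarm_days(transfer_times)[:5])
```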

  15. Autologous Doping with Cryopreserved Red Blood Cells - Effects on Physical Performance and Detection by Multivariate Statistics.

    PubMed

    Malm, Christer B; Khoo, Nelson S; Granlund, Irene; Lindstedt, Emilia; Hult, Andreas

    2016-01-01

    The discovery of erythropoietin (EPO) simplified blood doping in sports, but improved detection methods for EPO have forced cheating athletes to return to blood transfusion. Autologous blood transfusion with cryopreserved red blood cells (RBCs) is the method of choice, because no valid method exists to accurately detect such an event. In endurance sports, it can be estimated that elite athletes improve performance by up to 3% with blood doping, regardless of method. Valid detection methods for autologous blood doping are important to maintain the credibility of athletic performances. Recreational male (N = 27) and female (N = 11) athletes served as Transfusion (N = 28) and Control (N = 10) subjects in two different transfusion settings. Hematological variables and physical performance were measured before donation of 450 or 900 mL whole blood, and until four weeks after re-infusion of the cryopreserved RBC fraction. Blood was analyzed for transferrin, iron, Hb, EVF, MCV, MCHC, reticulocytes, leucocytes and EPO. Repeated measures multivariate analysis of variance (MANOVA) and pattern recognition using Principal Component Analysis (PCA) and Orthogonal Projections of Latent Structures (OPLS) discriminant analysis (DA) were used to investigate differences between Control and Transfusion groups over time. A significant increase in performance (15 ± 8%) and VO2max (17 ± 10%) (mean ± SD) could be measured 48 h after RBC re-infusion, and remained increased for up to four weeks in some subjects. In total, 533 blood samples were included in the study (Clean = 220, Transfused = 313). In response to blood transfusion, the largest change in hematological variables occurred 48 h after blood donation, when Control and Transfused groups could be separated with OPLS-DA (R2 = 0.76/Q2 = 0.59). RBC re-infusion resulted in the best model (R2 = 0.40/Q2 = 0.10) at the first sampling point (48 h), predicting one false positive and one false negative. Overall, a 25% and 86% false positives ratio was

  16. Early statistical detection of anthrax outbreaks by tracking over-the-counter medication sales

    PubMed Central

    Goldenberg, Anna; Shmueli, Galit; Caruana, Richard A.; Fienberg, Stephen E.

    2002-01-01

    The recent series of anthrax attacks has reinforced the importance of biosurveillance systems for the timely detection of epidemics. This paper describes a statistical framework for monitoring grocery data to detect a large-scale but localized bioterrorism attack. Our system illustrates the potential of data sources that may be more timely than traditional medical and public health data. The system includes several layers, each customized to grocery data and tuned to finding footprints of an epidemic. We also propose an evaluation methodology that is suitable in the absence of data on large-scale bioterrorist attacks and disease outbreaks. PMID:11959973

  17. Signal Waveform Detection with Statistical Automaton for Internet and Web Service Streaming

    PubMed Central

    Liu, Yiming; Huang, Nai-Lun; Zeng, Fufu; Lin, Fang-Ying

    2014-01-01

    In recent years, many approaches have been suggested for Internet and web streaming detection. In this paper, we propose an approach to signal waveform detection for Internet and web streaming, with novel statistical automatons. The system records network connections over a period of time to form a signal waveform and computes suspicious characteristics of the waveform. Network streaming can then be classified according to these selected waveform features by our newly designed Aho-Corasick (AC) automatons. We developed two versions, that is, basic AC and advanced AC-histogram waveform automata, and conducted comprehensive experimentation. The results confirm that our approach is feasible and suitable for deployment. PMID:25032231

  18. Identifying minefields and verifying clearance: adapting statistical methods for UXO target detection

    NASA Astrophysics Data System (ADS)

    Gilbert, Richard O.; O'Brien, Robert F.; Wilson, John E.; Pulsipher, Brent A.; McKinstry, Craig A.

    2003-09-01

    It may not be feasible to completely survey large tracts of land suspected of containing minefields. It is desirable to develop a characterization protocol that will confidently identify minefields within these large land tracts if they exist. Naturally, surveying areas of greatest concern and most likely locations would be necessary but will not provide the needed confidence that an unknown minefield had not eluded detection. Once minefields are detected, methods are needed to bound the area that will require detailed mine detection surveys. The US Department of Defense Strategic Environmental Research and Development Program (SERDP) is sponsoring the development of statistical survey methods and tools for detecting potential UXO targets. These methods may be directly applicable to demining efforts. Statistical methods are employed to determine the optimal geophysical survey transect spacing to have confidence of detecting target areas of a critical size, shape, and anomaly density. Other methods under development determine the proportion of a land area that must be surveyed to confidently conclude that there are no UXO present. Adaptive sampling schemes are also being developed as an approach for bounding the target areas. These methods and tools will be presented and the status of relevant research in this area will be discussed.

  19. A comparison of statistical tests for detecting differential expression using Affymetrix oligonucleotide microarrays.

    PubMed

    Vardhanabhuti, Saran; Blakemore, Steven J; Clark, Steven M; Ghosh, Sujoy; Stephens, Richard J; Rajagopalan, Dilip

    2006-01-01

    Signal quantification and detection of differential expression are critical steps in the analysis of Affymetrix microarray data. Many methods have been proposed in the literature for each of these steps. The goal of this paper is to evaluate several signal quantification methods (GCRMA, RSVD, VSN, MAS5, and Resolver) and statistical methods for differential expression (t test, Cyber-T, SAM, LPE, RankProducts, Resolver RatioBuild). Our particular focus is on the ability to detect differential expression via statistical tests. We have used two different datasets for our evaluation. First, we have used the HG-U133 Latin Square spike in dataset developed by Affymetrix. Second, we have used data from an in-house rat liver transcriptomics study following 30 different drug treatments generated using the Affymetrix RAE230A chip. Our overall recommendation based on this study is to use GCRMA for signal quantification. For detection of differential expression, GCRMA coupled with Cyber-T or SAM is the best approach, as measured by area under the receiver operating characteristic (ROC) curve. The integrated pipeline in Resolver RatioBuild combining signal quantification and detection of differential expression is an equally good alternative for detecting differentially expressed genes. For most of the differential expression algorithms we considered, the performance using MAS5 signal quantification was inferior to that of the other methods we evaluated. PMID:17233564

  20. An Exploratory Statistical Analysis of a Planet Approach-Phase Guidance Scheme Using Angular Measurements with Significant Error

    NASA Technical Reports Server (NTRS)

    Friedlander, Alan L.; Harry, David P., III

    1960-01-01

    An exploratory analysis of vehicle guidance during the approach to a target planet is presented. The objective of the guidance maneuver is to guide the vehicle to a specific perigee distance with a high degree of accuracy and minimum corrective velocity expenditure. The guidance maneuver is simulated by considering the random sampling of real measurements with significant error and reducing this information to prescribe appropriate corrective action. The instrumentation system assumed includes optical and/or infrared devices to indicate range and a reference angle in the trajectory plane. Statistical results are obtained by Monte-Carlo techniques and are shown as the expectation of guidance accuracy and velocity-increment requirements. Results are nondimensional and applicable to any planet within limits of two-body assumptions. The problem of determining how many corrections to make and when to make them is a consequence of the conflicting requirements of accurate trajectory determination and propulsion. Optimum values were found for a vehicle approaching a planet along a parabolic trajectory with an initial perigee distance of 5 radii and a target perigee of 1.02 radii. In this example, measurement errors were less than 1 minute of arc. Results indicate that four corrections applied in the vicinity of 50, 16, 15, and 1.5 radii, respectively, yield minimum velocity-increment requirements. Thrust devices capable of producing a large variation of velocity-increment size are required. For a vehicle approaching the earth, miss distances within 32 miles are obtained with 90-percent probability. Total velocity increments used in guidance are less than 3300 feet per second with 90-percent probability. It is noted that the above representative results are valid only for the particular guidance scheme hypothesized in this analysis. A parametric study is presented which indicates the effects of measurement error size, initial perigee, and initial energy on the guidance

  1. A statistical model of the photomultiplier gain process with applications to optical pulse detection

    NASA Technical Reports Server (NTRS)

    Tan, H. H.

    1982-01-01

    A Markov diffusion model was used to determine an approximate probability density for the random gain. This approximate density preserves the correct second-order statistics and appears to be in reasonably good agreement with experimental data. The receiver operating curve for a pulse counter detector of PMT cathode emission events was analyzed using this density. The error performance of a simple binary direct detection optical communication system was also derived. Previously announced in STAR as N82-25100

  2. A statistical model of the photomultiplier gain process with applications to optical pulse detection

    NASA Technical Reports Server (NTRS)

    Tan, H. H.

    1982-01-01

    A Markov diffusion model was used to determine an approximate probability density for the random gain. This approximate density preserves the correct second-order statistics and appears to be in reasonably good agreement with experimental data. The receiver operating curve for a pulse counter detector of PMT cathode emission events was analyzed using this density. The error performance of a simple binary direct detection optical communication system was also derived.

  3. Automatic detection of coronary artery disease in myocardial perfusion SPECT using image registration and voxel to voxel statistical comparisons.

    PubMed

    Peace, R A; Staff, R T; Gemmell, H G; McKiddie, F I; Metcalfe, M J

    2002-08-01

    The purpose of this study was to compare the performance of automatic detection of coronary artery disease (CAD) with that of expert observers. Male and female normal image templates were constructed from normal stress technetium-99m single photon emission computed tomography (SPECT) studies. Mean and standard deviation images for each sex were created by registering normal studies to a standard shape and position. The test group consisted of 104 patients who had been routinely referred for SPECT and angiography. The gold standard for CAD was defined by angiography. The test group studies were registered to the respective templates and the Z-score was calculated for each voxel. Voxels with a Z-score greater than 5 indicated the presence of CAD. The performance of this method and that of three observers were compared by continuous receiver operating characteristic (CROC) analysis. The overall sensitivity and specificity for automatic detection were 73% and 92%, respectively. The area (Az) under the CROC curve (±1 SE) for automatic detection of CAD was 0.88 ± 0.06. There was no statistically significant difference between the performances of the three observers in terms of Az and that of automatic detection (P ≥ 0.25, univariate Z-score test). The use of this automated statistical mapping approach shows a performance comparable with experienced observers, but avoids inter-observer and intra-observer variability. PMID:12124485
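
    The voxel-wise scoring step described above can be sketched briefly: given a set of registered normal studies, each patient voxel is converted to a Z-score against the template mean and standard deviation, and voxels beyond the Z > 5 rule are flagged. Registration is assumed to have been done already; the array shapes, the synthetic data, and the sign convention (template minus patient, so reduced uptake gives a positive score) are illustrative assumptions.

```python
# Hedged sketch of voxel-to-voxel Z-score comparison against a normal template.
import numpy as np

rng = np.random.default_rng(3)
normals = rng.normal(100.0, 10.0, size=(40, 64, 64, 24))   # 40 registered normal SPECT volumes
template_mean = normals.mean(axis=0)
template_std = normals.std(axis=0, ddof=1)

patient = template_mean + rng.normal(0.0, 10.0, size=(64, 64, 24))
patient[20:30, 20:30, 10:14] -= 70.0                       # simulated perfusion defect

z = (template_mean - patient) / template_std               # reduced uptake -> large positive Z
defect_voxels = z > 5                                      # the abstract's Z-score > 5 rule
print("suspicious voxels:", int(defect_voxels.sum()))
```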

  4. Assessing the utility of statistical adjustments for imperfect detection in tropical conservation science

    PubMed Central

    Banks-Leite, Cristina; Pardini, Renata; Boscolo, Danilo; Cassano, Camila Righetto; Püttker, Thomas; Barros, Camila Santos; Barlow, Jos

    2014-01-01

    1. In recent years, there has been a fast development of models that adjust for imperfect detection. These models have revolutionized the analysis of field data, and their use has repeatedly demonstrated the importance of sampling design and data quality. There are, however, several practical limitations associated with the use of detectability models which restrict their relevance to tropical conservation science. 2. We outline the main advantages of detectability models, before examining their limitations associated with their applicability to the analysis of tropical communities, rare species and large-scale data sets. Finally, we discuss whether detection probability needs to be controlled before and/or after data collection. 3. Models that adjust for imperfect detection allow ecologists to assess data quality by estimating uncertainty and to obtain adjusted ecological estimates of populations and communities. Importantly, these models have allowed informed decisions to be made about the conservation and management of target species. 4. Data requirements for obtaining unadjusted estimates are substantially lower than for detectability-adjusted estimates, which require relatively high detection/recapture probabilities and a number of repeated surveys at each location. These requirements can be difficult to meet in large-scale environmental studies where high levels of spatial replication are needed, or in the tropics where communities are composed of many naturally rare species. However, while imperfect detection can only be adjusted statistically, covariates of detection probability can also be controlled through study design. Using three study cases where we controlled for covariates of detection probability through sampling design, we show that the variation in unadjusted ecological estimates from nearly 100 species was qualitatively the same as that obtained from adjusted estimates. Finally, we discuss that the decision as to whether one should control for

  5. Colonoscopy detects significantly more flat adenomas than 3-tesla magnetic resonance colonography: a pilot trial

    PubMed Central

    Hüneburg, Robert; Kukuk, Guido; Nattermann, Jacob; Endler, Christoph; Penner, Arndt-Hendrik; Wolter, Karsten; Schild, Hans; Strassburg, Christian; Sauerbruch, Tilman; Schmitz, Volker; Willinek, Winfried

    2016-01-01

    Background and study aims: Colorectal cancer (CRC) is one of the most common cancers worldwide, and several efforts have been made to reduce its occurrence or severity. Although colonoscopy is considered the gold standard in CRC prevention, it has its disadvantages: missed lesions, bleeding, and perforation. Furthermore, a high number of patients undergo this procedure even though no polyps are detected. Therefore, an initial screening examination may be warranted. Our aim was to compare the adenoma detection rate of magnetic resonance colonography (MRC) with that of optical colonoscopy. Patients and methods: A total of 25 patients with an intermediate risk for CRC (17 men, 8 women; mean age 57.6, standard deviation 11) underwent MRC with a 3.0-tesla magnet, followed by colonoscopy. The endoscopist was initially blinded to the results of MRC and unblinded immediately after examining the distal rectum. Following endoscopic excision, the size, anatomical localization, and appearance of all polyps were described according to the Paris classification. Results: A total of 93 lesions were detected during colonoscopy. These included a malignant infiltration of the transverse colon due to gastric cancer in 1 patient, 28 adenomas in 10 patients, 19 hyperplastic polyps in 9 patients, and 45 non-neoplastic lesions. In 5 patients, no lesion was detected. MRC detected significantly fewer lesions: 1 adenoma (P = 0.001) and 1 hyperplastic polyp (P = 0.004). The malignant infiltration was seen with both modalities. Of the 28 adenomas, 23 (82%) were 5 mm or smaller; only 4 adenomas 10 mm or larger (14%) were detected. Conclusion: MRC does not detect adenomas with sufficient sensitivity, regardless of lesion location. Even advanced lesions were missed. Therefore, colonoscopy should still be considered the current gold standard, even for diagnostic purposes. PMID:26878043

  6. Improved bowel preparation increases polyp detection and unmasks significant polyp miss rate

    PubMed Central

    Papanikolaou, Ioannis S; Sioulas, Athanasios D; Magdalinos, Nektarios; Beintaris, Iosif; Lazaridis, Lazaros-Dimitrios; Polymeros, Dimitrios; Malli, Chrysoula; Dimitriadis, George D; Triantafyllou, Konstantinos

    2015-01-01

    AIM: To retrospectively compare previous-day vs split-dose preparation in terms of bowel cleanliness and polyp detection in patients referred for polypectomy. METHODS: Fifty patients underwent two colonoscopies: one diagnostic in a private clinic and a second for polypectomy in a University Hospital. The latter procedures were performed within 12 wk of the index ones. Examinations were accomplished by two experienced endoscopists, different in each facility. Twenty-seven patients underwent screening/surveillance colonoscopy, while the rest were symptomatic. Previous day bowel preparation was utilized initially and split-dose for polypectomy. Colon cleansing was evaluated using the Aronchick scale. We measured the number of detected polyps, and the polyp miss rates per-polyp. RESULTS: Excellent/good preparation was reported in 38 cases with previous-day preparation (76%) vs 46 with split-dose (92%), respectively (P = 0.03). One hundred and twenty-six polyps were detected initially and 169 subsequently (P < 0.0001); 88 vs 126 polyps were diminutive (P < 0.0001), 25 vs 29 small (P = 0.048) and 13 vs 14 equal or larger than 10 mm. The miss rates for total, diminutive, small and large polyps were 25.4%, 30.1%, 13.7% and 6.6%, respectively. Multivariate analysis revealed that split-dose preparation was significantly associated (OR, P) with increased number of polyps detected overall (0.869, P < 0.001), in the right (0.418, P = 0.008) and in the left colon (0.452, P = 0.02). CONCLUSION: Split-dose preparation improved colon cleansing, enhanced polyp detection and unmasked significant polyp miss rates. PMID:26488024

  7. Evolving approach and clinical significance of detecting DNA mismatch repair deficiency in colorectal carcinoma

    PubMed Central

    Shia, Jinru

    2016-01-01

    The last two decades have seen significant advancement in our understanding of colorectal tumors with DNA mismatch repair (MMR) deficiency. The ever-emerging revelations of new molecular and genetic alterations in various clinical conditions have necessitated constant refinement of disease terminology and classification. Thus, a case with the clinical condition of hereditary non-polyposis colorectal cancer as defined by the Amsterdam criteria may be one of Lynch syndrome characterized by a germline defect in one of the several MMR genes, one of the yet-to-be-defined “Lynch-like syndrome” if there is evidence of MMR deficiency in the tumor but no detectable germline MMR defect or tumor MLH1 promoter methylation, or “familial colorectal cancer type X” if there is no evidence of MMR deficiency. The detection of these conditions carries significant clinical implications. The detection tools and strategies are constantly evolving. The Bethesda guidelines symbolize a selective approach that uses clinical information and tumor histology as the basis to select high-risk individuals. Such a selective approach has subsequently been found to have limited sensitivity, and is thus gradually giving way to the alternative universal approach that tests all newly diagnosed colorectal cancers. Notably, the universal approach also has its own limitations; its cost-effectiveness in real practice, in particular, remains to be determined. Meanwhile, technological advances such as the next-generation sequencing are offering the promise of direct genetic testing for MMR deficiency at an affordable cost probably in the near future. This article reviews the up-to-date molecular definitions of the various conditions related to MMR deficiency, and discusses the tools and strategies that have been used in detecting these conditions. Special emphasis will be placed on the evolving nature and the clinical importance of the disease definitions and the detection strategies. PMID:25716099

  8. Fast microcalcification detection in ultrasound images using image enhancement and threshold adjacency statistics

    NASA Astrophysics Data System (ADS)

    Cho, Baek Hwan; Chang, Chuho; Lee, Jong-Ha; Ko, Eun Young; Seong, Yeong Kyeong; Woo, Kyoung-Gu

    2013-02-01

    The existence of microcalcifications (MCs) is an important marker of malignancy in breast cancer. In spite of its benefits in mass detection for dense breasts, ultrasonography is believed to be unable to detect MCs reliably. For computer-aided diagnosis systems, however, accurate detection of MCs has the possibility of improving the performance in both Breast Imaging-Reporting and Data System (BI-RADS) lexicon description for calcifications and malignancy classification. We propose a new efficient and effective method for MC detection using image enhancement and threshold adjacency statistics (TAS). The main idea of TAS is to threshold an image and to count the number of white pixels with a given number of adjacent white pixels. Our contribution is to adopt TAS features and apply image enhancement to facilitate MC detection in ultrasound images. We employed fuzzy logic, a tophat filter, and a texture filter to enhance images for MCs. Using a total of 591 images, the classification accuracy of the proposed method in MC detection was 82.75%, which is comparable to that of Haralick texture features (81.38%). When combined, the performance was as high as 85.11%. In addition, our method also showed the ability to classify masses when combined with existing features. In conclusion, the proposed method exploiting image enhancement and TAS features has the potential to deal with MC detection in ultrasound images efficiently and to extend to the real-time localization and visualization of MCs.
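
    The TAS idea stated above (threshold the image, then count white pixels by how many of their neighbours are also white) can be sketched directly; the result is a short histogram-style feature vector. The 8-neighbour definition, the threshold rule, and the synthetic patch are illustrative assumptions; the paper pairs TAS with several enhancement filters.

```python
# Hedged sketch of threshold adjacency statistics (TAS) on an image patch.
import numpy as np
from scipy import ndimage

def tas_features(image: np.ndarray, threshold: float) -> np.ndarray:
    binary = (image > threshold).astype(np.uint8)
    kernel = np.ones((3, 3), dtype=np.uint8)
    kernel[1, 1] = 0                                      # count the 8 neighbours only
    neighbour_counts = ndimage.convolve(binary, kernel, mode="constant", cval=0)
    counts = np.bincount(neighbour_counts[binary == 1], minlength=9)[:9]
    total = counts.sum()
    return counts / total if total else counts.astype(float)

rng = np.random.default_rng(4)
patch = rng.normal(0.3, 0.05, (128, 128))                 # synthetic ultrasound-like background
patch[60:63, 60:63] += 0.5                                # a small bright cluster (MC-like speck)
print(tas_features(patch, threshold=patch.mean() + 3 * patch.std()))
```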

  9. Automated detection of radiology reports that document non-routine communication of critical or significant results.

    PubMed

    Lakhani, Paras; Langlotz, Curtis P

    2010-12-01

    The purpose of this investigation is to develop an automated method to accurately detect radiology reports that indicate non-routine communication of critical or significant results. Such a classification system would be valuable for performance monitoring and accreditation. Using a database of 2.3 million free-text radiology reports, a rule-based query algorithm was developed after analyzing hundreds of radiology reports that indicated communication of critical or significant results to a healthcare provider. This algorithm consisted of words and phrases used by radiologists to indicate such communications combined with specific handcrafted rules. This algorithm was iteratively refined and retested on hundreds of reports until the precision and recall did not significantly change between iterations. The algorithm was then validated on the entire database of 2.3 million reports, excluding those reports used during the testing and refinement process. Human review was used as the reference standard. The accuracy of this algorithm was determined using precision, recall, and F measure. Confidence intervals were calculated using the adjusted Wald method. The developed algorithm for detecting critical result communication has a precision of 97.0% (95% CI, 93.5-98.8%), recall 98.2% (95% CI, 93.4-100%), and F measure of 97.6% (β = 1). Our query algorithm is accurate for identifying radiology reports that contain non-routine communication of critical or significant results. This algorithm can be applied to a radiology reports database for quality control purposes and help satisfy accreditation requirements. PMID:19826871
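
    The flavor of such a rule-based query and its evaluation can be sketched briefly: a handful of trigger phrases flag reports that document result communication, and the flags are scored against a human-reviewed reference standard with precision, recall, and the F measure. The trigger phrases and the toy reports below are illustrative assumptions, not the authors' rule set or data.

```python
# Hedged sketch of a rule-based detector for communication phrases, with
# precision / recall / F-measure evaluation against a reference standard.
import re

TRIGGERS = re.compile(
    r"(results? (were|was) (communicated|discussed|called) to"
    r"|findings were (communicated|discussed) with"
    r"|critical (result|finding).{0,40}(communicated|reported))",
    re.IGNORECASE)

def flags_communication(report_text: str) -> bool:
    return bool(TRIGGERS.search(report_text))

def precision_recall_f(predicted, reference, beta=1.0):
    tp = sum(p and r for p, r in zip(predicted, reference))
    fp = sum(p and not r for p, r in zip(predicted, reference))
    fn = sum(r and not p for p, r in zip(predicted, reference))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f = ((1 + beta**2) * precision * recall / (beta**2 * precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

reports = ["Findings were communicated with Dr. Smith at 3 pm.",
           "No acute intracranial abnormality."]
reference = [True, False]
print(precision_recall_f([flags_communication(r) for r in reports], reference))
```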

  10. Pair normalized channel feature and statistics-based learning for high-performance pedestrian detection

    NASA Astrophysics Data System (ADS)

    Zeng, Bobo; Wang, Guijin; Ruan, Zhiwei; Lin, Xinggang; Meng, Long

    2012-07-01

    High-performance pedestrian detection with good accuracy and fast speed is an important yet challenging task in computer vision. We design a novel feature named pair normalized channel feature (PNCF), which simultaneously combines and normalizes two channel features in image channels, achieving a highly discriminative power and computational efficiency. PNCF applies to both gradient channels and color channels so that shape and appearance information are described and integrated in the same feature. To efficiently explore the formidably large PNCF feature space, we propose a statistics-based feature learning method to select a small number of potentially discriminative candidate features, which are fed into the boosting algorithm. In addition, channel compression and a hybrid pyramid are employed to speed up the multiscale detection. Experiments illustrate the effectiveness of PNCF and its learning method. Our proposed detector outperforms the state-of-the-art on several benchmark datasets in both detection accuracy and efficiency.