Sample records for "identifies statistically significant"

  1. Cloud-based solution to identify statistically significant MS peaks differentiating sample categories.

    PubMed

    Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B

    2013-03-23

    Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. To date, multiple challenges in protein MS data analysis remain: management of large-scale and complex data sets; MS peak identification and indexing; and high-dimensional differential analysis of peaks with control of the false discovery rate (FDR) across the concurrent statistical tests. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets and identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution that gives experimental biologists easy access to "cloud" computing capabilities for analyzing MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The web application supports online uploading and analysis of large-scale MS data through a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.
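
    A minimal sketch of the kind of FDR-controlled peak screening described above, applying Benjamini-Hochberg to per-peak t-tests on synthetic intensities (all data, sizes and thresholds here are hypothetical; this is not the portal's actual pipeline):

```python
# Sketch: FDR-controlled differential analysis of MS peaks (synthetic data).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_peaks, n_per_group = 200, 15
# Hypothetical peak intensities for two sample categories; the first
# 20 peaks truly differ between categories.
group_a = rng.normal(0.0, 1.0, size=(n_peaks, n_per_group))
group_b = rng.normal(0.0, 1.0, size=(n_peaks, n_per_group))
group_b[:20] += 1.5

# Per-peak two-sample t-tests.
_, pvals = stats.ttest_ind(group_a, group_b, axis=1)

# Benjamini-Hochberg: reject the k smallest p-values, where k is the
# largest rank with p_(k) <= (k/m) * alpha.
alpha = 0.05
order = np.argsort(pvals)
ranked = pvals[order]
m = len(pvals)
below = ranked <= (np.arange(1, m + 1) / m) * alpha
k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
significant_peaks = order[:k]
print(f"{k} peaks significant at FDR {alpha}")
```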

  2. Determining coding CpG islands by identifying regions significant for pattern statistics on Markov chains.

    PubMed

    Singer, Meromit; Engström, Alexander; Schönhuth, Alexander; Pachter, Lior

    2011-09-23

    Recent experimental and computational work confirms that CpGs can be unmethylated inside coding exons, thereby showing that codons may be subjected to both genomic and epigenomic constraint. It is therefore of interest to identify coding CpG islands (CCGIs), regions inside exons that are enriched for CpGs. The difficulty in identifying such islands is that coding exons exhibit sequence biases, determined by codon usage and constraints, that must be taken into account. We present a method for finding CCGIs that showcases a novel approach we have developed for identifying regions of interest that are significant (with respect to a Markov chain) for the counts of any pattern. Our method begins with the exact computation of tail probabilities for the number of CpGs in all regions contained in coding exons, and then applies a greedy algorithm for selecting islands from among the regions. We show that the greedy algorithm provably optimizes a biologically motivated criterion for selecting islands while controlling the false discovery rate. We applied this approach to the human genome (hg18) and annotated CpG islands in coding exons. The statistical criterion we apply to evaluating islands reduces the number of false positives in existing annotations, while our approach to defining islands reveals significant numbers of undiscovered CCGIs in coding exons. Many of these appear to be examples of functional epigenetic specialization in coding exons.

  3. Common pitfalls in statistical analysis: Clinical versus statistical significance

    PubMed Central

    Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc

    2015-01-01

    In clinical research, study results that are statistically significant are often interpreted as being clinically important. While statistical significance indicates the reliability of the study results, clinical significance reflects their impact on clinical practice. The third article in this series exploring pitfalls in statistical analysis clarifies the importance of differentiating between statistical significance and clinical significance. PMID:26229754

  4. Visualizing statistical significance of disease clusters using cartograms.

    PubMed

    Kronenfeld, Barry J; Wong, David W S

    2017-05-15

    Health officials and epidemiological researchers often use maps of disease rates to identify potential disease clusters. Because these maps exaggerate the prominence of low-density districts and hide potential clusters in urban (high-density) areas, many researchers have used density-equalizing maps (cartograms) as a basis for epidemiological mapping. However, there are no existing guidelines for visual assessment of statistical uncertainty. To address this shortcoming, we develop techniques for visual determination of the statistical significance of clusters spanning one or more districts on a cartogram. We developed the techniques within a geovisual analytics framework that does not rely on automated significance testing, and can therefore facilitate visual analysis to detect clusters that automated techniques might miss. On a cartogram of the at-risk population, the statistical significance of a disease cluster can be determined from the rate, area and shape of the cluster under standard hypothesis testing scenarios. We develop formulae to determine, for a given rate, the area required for statistical significance of a priori and a posteriori designated regions under certain test assumptions. Uniquely, our approach enables dynamic inference for aggregate regions formed by combining individual districts. The method is implemented in interactive tools that provide choropleth mapping, automated legend construction and dynamic search tools to facilitate cluster detection and assessment of the validity of tested assumptions. A case study of leukemia incidence analysis in California demonstrates the ability to visually distinguish between statistically significant and insignificant regions. The proposed geovisual analytics approach enables intuitive visual assessment of the statistical significance of arbitrarily defined regions on a cartogram. Our research prompts a broader discussion of the role of geovisual exploratory analyses in disease mapping and the appropriate…

  5. Estimation of the geochemical threshold and its statistical significance

    USGS Publications Warehouse

    Miesch, A.T.

    1981-01-01

    A statistic is proposed for estimating the geochemical threshold and its statistical significance, or it may be used to identify a group of extreme values that can be tested for significance by other means. The statistic is the maximum gap between adjacent values in an ordered array after each gap has been adjusted for the expected frequency. The values in the ordered array are geochemical values transformed by either ln(x − α) or ln(β − x) and then standardized so that the mean is zero and the variance is unity. The expected frequency is taken from a fitted normal curve with unit area. The midpoint of an adjusted gap that exceeds the corresponding critical value may be taken as an estimate of the geochemical threshold, and the associated probability indicates the likelihood that the threshold separates two geochemical populations. The adjusted gap test may fail to identify threshold values if the variation tends to be continuous from background values to the higher values that reflect mineralized ground. However, the test will serve to identify other anomalies that may be too subtle to have been noted by other means. © 1981.
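
    A rough numerical sketch of the adjusted-gap idea (a plain ln transform and a synthetic sample stand in for the paper's two-parameter transformations; the comparison against a critical value from the paper's tables is omitted):

```python
# Sketch: maximum adjusted gap as a geochemical threshold estimate.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Hypothetical geochemical values: background plus a small anomalous tail.
x = np.concatenate([rng.lognormal(1.0, 0.4, 95), rng.lognormal(2.5, 0.2, 5)])

z = np.sort(np.log(x))             # a simple ln transform stands in here
z = (z - z.mean()) / z.std()       # standardize: mean zero, unit variance

gaps = np.diff(z)
mids = (z[:-1] + z[1:]) / 2
# Adjust each gap by the expected frequency, taken from a fitted normal
# curve with unit area, evaluated at the gap midpoint.
adjusted = gaps * stats.norm.pdf(mids)

i = np.argmax(adjusted)
print(f"max adjusted gap {adjusted[i]:.4f} at threshold z ~ {mids[i]:.2f}")
```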

  6. Statistical Significance Testing from Three Perspectives and Interpreting Statistical Significance and Nonsignificance and the Role of Statistics in Research.

    ERIC Educational Resources Information Center

    Levin, Joel R.; And Others

    1993-01-01

    Journal editors respond to criticisms of reliance on statistical significance in research reporting. Joel R. Levin ("Journal of Educational Psychology") defends its use, whereas William D. Schafer ("Measurement and Evaluation in Counseling and Development") emphasizes the distinction between statistically significant and important. William Asher…

  7. Statistical significance test for transition matrices of atmospheric Markov chains

    NASA Technical Reports Server (NTRS)

    Vautard, Robert; Mo, Kingtse C.; Ghil, Michael

    1990-01-01

    Low-frequency variability of large-scale atmospheric dynamics can be represented schematically by a Markov chain of multiple flow regimes. This Markov chain contains useful information for the long-range forecaster, provided that the statistical significance of the associated transition matrix can be reliably tested. Monte Carlo simulation yields a very reliable significance test for the elements of this matrix. The results of this test agree with previously used empirical formulae when each cluster of maps identified as a distinct flow regime is sufficiently large and when they all contain a comparable number of maps. Monte Carlo simulation provides a more reliable way to test the statistical significance of transitions to and from small clusters. It can determine the most likely transitions, as well as the most unlikely ones, with a prescribed level of statistical significance.
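
    The element-wise Monte Carlo test can be illustrated by shuffling a synthetic regime sequence to build a null distribution for each transition count (the shuffling null here is an assumption; the paper's simulation scheme may differ in detail):

```python
# Sketch: Monte Carlo significance test for Markov-chain transition counts.
import numpy as np

rng = np.random.default_rng(2)
k = 3                                   # number of flow regimes
seq = rng.integers(0, k, size=1000)     # hypothetical regime sequence

def transition_counts(s, k):
    counts = np.zeros((k, k), dtype=int)
    for a, b in zip(s[:-1], s[1:]):
        counts[a, b] += 1
    return counts

observed = transition_counts(seq, k)

# Null: permute the sequence, destroying temporal order while keeping
# the occupation frequencies of each regime.
n_sim = 2000
null = np.array([transition_counts(rng.permutation(seq), k)
                 for _ in range(n_sim)])

# Two-sided Monte Carlo p-value for each matrix element.
p_hi = (null >= observed).mean(axis=0)
p_lo = (null <= observed).mean(axis=0)
pvals = np.minimum(2 * np.minimum(p_hi, p_lo), 1.0)
print(np.round(pvals, 3))
```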

  8. Précis of statistical significance: rationale, validity, and utility.

    PubMed

    Chow, S L

    1998-04-01

    The null-hypothesis significance-test procedure (NHSTP) is defended in the context of the theory-corroboration experiment, as well as the following contrasts: (a) substantive hypotheses versus statistical hypotheses, (b) theory corroboration versus statistical hypothesis testing, (c) theoretical inference versus statistical decision, (d) experiments versus nonexperimental studies, and (e) theory corroboration versus treatment assessment. The null hypothesis can be true because it is the hypothesis that errors are randomly distributed in data. Moreover, the null hypothesis is never used as a categorical proposition. Statistical significance means only that chance influences can be excluded as an explanation of data; it does not identify the nonchance factor responsible. The experimental conclusion is drawn with the inductive principle underlying the experimental design. A chain of deductive arguments gives rise to the theoretical conclusion via the experimental conclusion. The anomalous relationship between statistical significance and the effect size often used to criticize NHSTP is more apparent than real. The absolute size of the effect is not an index of evidential support for the substantive hypothesis. Nor is the effect size, by itself, informative as to the practical importance of the research result. Being a conditional probability, statistical power cannot be the a priori probability of statistical significance. The validity of statistical power is debatable because statistical significance is determined with a single sampling distribution of the test statistic based on H0, whereas it takes two distributions to represent statistical power or effect size. Sample size should not be determined in the mechanical manner envisaged in power analysis. It is inappropriate to criticize NHSTP for nonstatistical reasons. At the same time, neither effect size, nor confidence interval estimate, nor posterior probability can be used to exclude chance as an explanation of

  9. Publication of statistically significant research findings in prosthodontics & implant dentistry in the context of other dental specialties.

    PubMed

    Papageorgiou, Spyridon N; Kloukos, Dimitrios; Petridis, Haralampos; Pandis, Nikolaos

    2015-10-01

    To assess the hypothesis that there is excessive reporting of statistically significant studies published in prosthodontic and implantology journals, which could indicate selective publication. The last 30 issues of 9 journals in prosthodontics and implant dentistry were hand-searched for articles with statistical analyses. The percentages of significant and non-significant results were tabulated by parameter of interest. Univariable/multivariable logistic regression analyses were applied to identify possible predictors of reporting statistically significant findings. The results of this study were compared with similar studies in dentistry using random-effects meta-analyses. Of the 2323 included studies, 71% reported statistically significant results, with the proportion of significant results ranging from 47% to 86%. Multivariable modeling identified geographical area and involvement of a statistician as predictors of statistically significant results. Compared to interventional studies, the odds that in vitro and observational studies would report statistically significant results were increased by 1.20 times (OR: 2.20, 95% CI: 1.66-2.92) and 0.35 times (OR: 1.35, 95% CI: 1.05-1.73), respectively. The probability of statistically significant results from randomized controlled trials was significantly lower compared to other study designs (difference: 30%, 95% CI: 11-49%). Likewise, the probability of statistically significant results in prosthodontics and implant dentistry was lower compared to other dental specialties, but this difference did not reach statistical significance (P>0.05). The majority of studies identified in the fields of prosthodontics and implant dentistry presented statistically significant results. The same trend existed in publications of other specialties in dentistry. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. Statistical Significance for Hierarchical Clustering

    PubMed Central

    Kimes, Patrick K.; Liu, Yufeng; Hayes, D. Neil; Marron, J. S.

    2017-01-01

    Cluster analysis has proved to be an invaluable tool for the exploratory and unsupervised analysis of high dimensional datasets. Among methods for clustering, hierarchical approaches have enjoyed substantial popularity in genomics and other fields for their ability to simultaneously uncover multiple layers of clustering structure. A critical and challenging question in cluster analysis is whether the identified clusters represent important underlying structure or are artifacts of natural sampling variation. Few approaches have been proposed for addressing this problem in the context of hierarchical clustering, for which the problem is further complicated by the natural tree structure of the partition, and the multiplicity of tests required to parse the layers of nested clusters. In this paper, we propose a Monte Carlo based approach for testing statistical significance in hierarchical clustering which addresses these issues. The approach is implemented as a sequential testing procedure guaranteeing control of the family-wise error rate. Theoretical justification is provided for our approach, and its power to detect true clustering structure is illustrated through several simulation studies and applications to two cancer gene expression datasets. PMID:28099990
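
    A toy illustration of testing a single two-way split against a Gaussian null, in the spirit of the approach (this is not the authors' sequential procedure; the cluster index and the null model are simplified stand-ins):

```python
# Sketch: Monte Carlo p-value for the strength of one hierarchical split.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
# Synthetic data: two well-separated Gaussian clusters in 5 dimensions.
X = np.vstack([rng.normal(0.0, 1.0, (30, 5)), rng.normal(2.5, 1.0, (30, 5))])

def split_index(data):
    """Within-cluster share of total sum of squares for the top 2-way
    split of a Ward dendrogram (smaller means a stronger split)."""
    labels = fcluster(linkage(data, method="ward"), t=2, criterion="maxclust")
    within = sum(((data[labels == c] - data[labels == c].mean(0)) ** 2).sum()
                 for c in (1, 2))
    return within / ((data - data.mean(0)) ** 2).sum()

observed = split_index(X)

# Null model: a single Gaussian fitted to the pooled data.
mu, cov = X.mean(0), np.cov(X.T)
sims = np.array([split_index(rng.multivariate_normal(mu, cov, len(X)))
                 for _ in range(500)])
p = (np.sum(sims <= observed) + 1) / (len(sims) + 1)
print(f"cluster index {observed:.3f}, Monte Carlo p ~ {p:.3f}")
```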

  11. Identifying clusters of active transportation using spatial scan statistics.

    PubMed

    Huang, Lan; Stinchcomb, David G; Pickle, Linda W; Dill, Jennifer; Berrigan, David

    2009-08-01

    There is an intense interest in the possibility that neighborhood characteristics influence active transportation such as walking or biking. The purpose of this paper is to illustrate how a spatial cluster identification method can evaluate the geographic variation of active transportation and identify neighborhoods with unusually high/low levels of active transportation. Self-reported walking/biking prevalence, demographic characteristics, street connectivity variables, and neighborhood socioeconomic data were collected from respondents to the 2001 California Health Interview Survey (CHIS; N=10,688) in Los Angeles County (LAC) and San Diego County (SDC). Spatial scan statistics were used to identify clusters of high or low prevalence (with and without age-adjustment) and the quantity of time spent walking and biking. The data, a subset from the 2001 CHIS, were analyzed in 2007-2008. Geographic clusters of significantly high or low prevalence of walking and biking were detected in LAC and SDC. Structural variables such as street connectivity and shorter block lengths are consistently associated with higher levels of active transportation, but associations between active transportation and socioeconomic variables at the individual and neighborhood levels are mixed. Only one cluster with less time spent walking and biking among walkers/bikers was detected in LAC, and this was of borderline significance. Age-adjustment affects the clustering pattern of walking/biking prevalence in LAC, but not in SDC. The use of spatial scan statistics to identify significant clustering of health behaviors such as active transportation adds to the more traditional regression analysis that examines associations between behavior and environmental factors by identifying specific geographic areas with unusual levels of the behavior independent of predefined administrative units.

  12. Identifying Clusters of Active Transportation Using Spatial Scan Statistics

    PubMed Central

    Huang, Lan; Stinchcomb, David G.; Pickle, Linda W.; Dill, Jennifer; Berrigan, David

    2009-01-01

    Background There is an intense interest in the possibility that neighborhood characteristics influence active transportation such as walking or biking. The purpose of this paper is to illustrate how a spatial cluster identification method can evaluate the geographic variation of active transportation and identify neighborhoods with unusually high/low levels of active transportation. Methods Self-reported walking/biking prevalence, demographic characteristics, street connectivity variables, and neighborhood socioeconomic data were collected from respondents to the 2001 California Health Interview Survey (CHIS; N=10,688) in Los Angeles County (LAC) and San Diego County (SDC). Spatial scan statistics were used to identify clusters of high or low prevalence (with and without age-adjustment) and the quantity of time spent walking and biking. The data, a subset from the 2001 CHIS, were analyzed in 2007–2008. Results Geographic clusters of significantly high or low prevalence of walking and biking were detected in LAC and SDC. Structural variables such as street connectivity and shorter block lengths are consistently associated with higher levels of active transportation, but associations between active transportation and socioeconomic variables at the individual and neighborhood levels are mixed. Only one cluster with less time spent walking and biking among walkers/bikers was detected in LAC, and this was of borderline significance. Age-adjustment affects the clustering pattern of walking/biking prevalence in LAC, but not in SDC. Conclusions The use of spatial scan statistics to identify significant clustering of health behaviors such as active transportation adds to the more traditional regression analysis that examines associations between behavior and environmental factors by identifying specific geographic areas with unusual levels of the behavior independent of predefined administrative units. PMID:19589451

  13. Efficiently Identifying Significant Associations in Genome-wide Association Studies

    PubMed Central

    Eskin, Eleazar

    2013-01-01

    Over the past several years, genome-wide association studies (GWAS) have implicated hundreds of genes in common disease. More recently, the GWAS approach has been utilized to identify regions of the genome that harbor variation affecting gene expression or expression quantitative trait loci (eQTLs). Unlike GWAS applied to clinical traits, where only a handful of phenotypes are analyzed per study, in eQTL studies, tens of thousands of gene expression levels are measured, and the GWAS approach is applied to each gene expression level. This leads to computing billions of statistical tests and requires substantial computational resources, particularly when applying novel statistical methods such as mixed models. We introduce a novel two-stage testing procedure that identifies all of the significant associations more efficiently than testing all the single nucleotide polymorphisms (SNPs). In the first stage, a small number of informative SNPs, or proxies, across the genome are tested. Based on their observed associations, our approach locates the regions that may contain significant SNPs and only tests additional SNPs from those regions. We show through simulations and analysis of real GWAS datasets that the proposed two-stage procedure increases the computational speed by a factor of 10. Additionally, efficient implementation of our software increases the computational speed relative to the state-of-the-art testing approaches by a factor of 75. PMID:24033261

  14. Active Learning with Rationales for Identifying Operationally Significant Anomalies in Aviation

    NASA Technical Reports Server (NTRS)

    Sharma, Manali; Das, Kamalika; Bilgic, Mustafa; Matthews, Bryan; Nielsen, David Lynn; Oza, Nikunj C.

    2016-01-01

    A major focus of the commercial aviation community is discovery of unknown safety events in flight operations data. Data-driven unsupervised anomaly detection methods are better at capturing unknown safety events compared to rule-based methods which only look for known violations. However, not all statistical anomalies that are discovered by these unsupervised anomaly detection methods are operationally significant (e.g., represent a safety concern). Subject Matter Experts (SMEs) have to spend significant time reviewing these statistical anomalies individually to identify a few operationally significant ones. In this paper we propose an active learning algorithm that incorporates SME feedback in the form of rationales to build a classifier that can distinguish between uninteresting and operationally significant anomalies. Experimental evaluation on real aviation data shows that our approach improves detection of operationally significant events by as much as 75% compared to the state-of-the-art. The learnt classifier also generalizes well to additional validation data sets.

  15. Significance levels for studies with correlated test statistics.

    PubMed

    Shi, Jianxin; Levinson, Douglas F; Whittemore, Alice S

    2008-07-01

    When testing large numbers of null hypotheses, one needs to assess the evidence against the global null hypothesis that none of the hypotheses is false. Such evidence typically is based on the test statistic of the largest magnitude, whose statistical significance is evaluated by permuting the sample units to simulate its null distribution. Efron (2007) has noted that correlation among the test statistics can induce substantial interstudy variation in the shapes of their histograms, which may cause misleading tail counts. Here, we show that permutation-based estimates of the overall significance level also can be misleading when the test statistics are correlated. We propose that such estimates be conditioned on a simple measure of the spread of the observed histogram, and we provide a method for obtaining conditional significance levels. We justify this conditioning using the conditionality principle described by Cox and Hinkley (1974). Application of the method to gene expression data illustrates the circumstances when conditional significance levels are needed.
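
    For context, a minimal sketch of the standard permutation test based on the maximum statistic, the procedure this paper refines (the proposed conditioning on the spread of the observed histogram is not shown):

```python
# Sketch: permutation test of the global null via the maximum |t|-statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, m = 40, 500
labels = np.repeat([0, 1], n // 2)
X = rng.normal(0.0, 1.0, (n, m))        # hypothetical expression matrix

def max_abs_t(data, lab):
    t, _ = stats.ttest_ind(data[lab == 0], data[lab == 1], axis=0)
    return np.max(np.abs(t))

observed = max_abs_t(X, labels)
perm = np.array([max_abs_t(X, rng.permutation(labels)) for _ in range(1000)])
p_global = (np.sum(perm >= observed) + 1) / (len(perm) + 1)
print(f"max |t| = {observed:.2f}, permutation p = {p_global:.3f}")
```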

  16. Identification of Microorganisms by High Resolution Tandem Mass Spectrometry with Accurate Statistical Significance

    NASA Astrophysics Data System (ADS)

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Suffredini, Anthony F.; Sacks, David B.; Yu, Yi-Kuo

    2016-02-01

    Correct and rapid identification of microorganisms is the key to the success of many important applications in health and safety, including, but not limited to, infection treatment, food safety, and biodefense. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is challenging correct microbial identification because of the large number of choices present. To properly disentangle candidate microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptidome profiles of microbes to better separate them and by designing an analysis method that yields accurate statistical significance. Here, we present an analysis pipeline that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using MS/MS data of 81 samples, each composed of a single known microorganism, that the proposed pipeline can correctly identify microorganisms at least at the genus and species levels. We have also shown that the proposed pipeline computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microorganisms. The proposed analysis pipeline has been implemented in MiCId, a freely available software for Microorganism Classification and Identification. MiCId is available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  17. Quantification and statistical significance analysis of group separation in NMR-based metabonomics studies

    PubMed Central

    Goodpaster, Aaron M.; Kennedy, Michael A.

    2015-01-01

    Currently, no standard metrics are used to quantify cluster separation in PCA or PLS-DA scores plots for metabonomics studies or to determine if cluster separation is statistically significant. Lack of such measures makes it virtually impossible to compare independent or inter-laboratory studies and can lead to confusion in the metabonomics literature when authors putatively identify metabolites distinguishing classes of samples based on visual and qualitative inspection of scores plots that exhibit marginal separation. While previous papers have addressed quantification of cluster separation in PCA scores plots, none have advocated routine use of a quantitative measure of separation that is supported by a standard and rigorous assessment of whether or not the cluster separation is statistically significant. Here quantification and statistical significance of separation of group centroids in PCA and PLS-DA scores plots are considered. The Mahalanobis distance is used to quantify the distance between group centroids, and the two-sample Hotelling's T2 test is computed for the data, related to an F-statistic, and then an F-test is applied to determine if the cluster separation is statistically significant. We demonstrate the value of this approach using four datasets containing various degrees of separation, ranging from groups that had no apparent visual cluster separation to groups that had no visual cluster overlap. Widespread adoption of such concrete metrics to quantify and evaluate the statistical significance of PCA and PLS-DA cluster separation would help standardize reporting of metabonomics data. PMID:26246647
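
    The proposed metrics are straightforward to compute; a self-contained sketch with synthetic two-dimensional scores (group sizes and separations are made up):

```python
# Sketch: Mahalanobis distance between group centroids, two-sample
# Hotelling's T^2, and the corresponding F-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
p = 2                                   # e.g., first two PCA/PLS-DA scores
g1 = rng.normal(0.0, 1.0, (25, p))      # hypothetical scores, group 1
g2 = rng.normal(1.2, 1.0, (30, p))      # hypothetical scores, group 2
n1, n2 = len(g1), len(g2)

# Pooled covariance and squared Mahalanobis distance between centroids.
d = g1.mean(0) - g2.mean(0)
S = ((n1 - 1) * np.cov(g1.T) + (n2 - 1) * np.cov(g2.T)) / (n1 + n2 - 2)
D2 = d @ np.linalg.solve(S, d)

# Hotelling's T^2 and its exact F transformation.
T2 = (n1 * n2) / (n1 + n2) * D2
F = (n1 + n2 - p - 1) / (p * (n1 + n2 - 2)) * T2
p_value = stats.f.sf(F, p, n1 + n2 - p - 1)
print(f"D^2 = {D2:.2f}, T^2 = {T2:.2f}, F = {F:.2f}, p = {p_value:.4f}")
```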

  18. Using spatial statistics to identify emerging hot spots of forest loss

    NASA Astrophysics Data System (ADS)

    Harris, Nancy L.; Goldman, Elizabeth; Gabris, Christopher; Nordling, Jon; Minnemeyer, Susan; Ansari, Stephen; Lippmann, Michael; Bennett, Lauren; Raad, Mansour; Hansen, Matthew; Potapov, Peter

    2017-02-01

    As sources of data for global forest monitoring grow larger, more complex and numerous, data analysis and interpretation become critical bottlenecks for effectively using them to inform land use policy discussions. In this paper, we present a method that combines big data analytical tools with Emerging Hot Spot Analysis (ArcGIS) to identify statistically significant spatiotemporal trends of forest loss in Brazil, Indonesia and the Democratic Republic of Congo (DRC) between 2000 and 2014. Results indicate that while the overall rate of forest loss in Brazil declined over the 14-year time period, spatiotemporal patterns of loss shifted, with forest loss significantly diminishing within the Amazonian states of Mato Grosso and Rondônia and intensifying within the cerrado biome. In Indonesia, forest loss intensified in Riau province in Sumatra and in Sukamara and West Kotawaringin regencies in Central Kalimantan. Substantial portions of West Kalimantan became new and statistically significant hot spots of forest loss in the years 2013 and 2014. Similarly, vast areas of DRC emerged as significant new hot spots of forest loss, with intensified loss radiating out from city centers such as Beni and Kisangani. While our results focus on identifying significant trends at the national scale, we also demonstrate the scalability of our approach to smaller or larger regions depending on the area of interest and specific research question involved. When combined with other contextual information, these statistical data models can help isolate the most significant clusters of loss occurring over dynamic forest landscapes and provide more coherent guidance for the allocation of resources for forest monitoring and enforcement efforts.

  19. Statistical Significance vs. Practical Significance: An Exploration through Health Education

    ERIC Educational Resources Information Center

    Rosen, Brittany L.; DeMaria, Andrea L.

    2012-01-01

    The purpose of this paper is to examine the differences between statistical and practical significance, including strengths and criticisms of both methods, as well as provide information surrounding the application of various effect sizes and confidence intervals within health education research. Provided are recommendations, explanations and…

  20. Two statistics for evaluating parameter identifiability and error reduction

    USGS Publications Warehouse

    Doherty, John; Hunt, Randall J.

    2009-01-01

    Two statistics are presented that can be used to rank input parameters utilized by a model in terms of their relative identifiability based on a given or possible future calibration dataset. Identifiability is defined here as the capability of model calibration to constrain parameters used by a model. Both statistics require that the sensitivity of each model parameter be calculated for each model output for which there are actual or presumed field measurements. Singular value decomposition (SVD) of the weighted sensitivity matrix is then undertaken to quantify the relation between the parameters and observations that, in turn, allows selection of calibration solution and null spaces spanned by unit orthogonal vectors. The first statistic presented, "parameter identifiability", is quantitatively defined as the direction cosine between a parameter and its projection onto the calibration solution space. This varies between zero and one, with zero indicating complete non-identifiability and one indicating complete identifiability. The second statistic, "relative error reduction", indicates the extent to which the calibration process reduces error in estimation of a parameter from its pre-calibration level where its value must be assigned purely on the basis of prior expert knowledge. This is more sophisticated than identifiability, in that it takes greater account of the noise associated with the calibration dataset. Like identifiability, it has a maximum value of one (which can only be achieved if there is no measurement noise). Conceptually it can fall to zero; and even below zero if a calibration problem is poorly posed. An example, based on a coupled groundwater/surface-water model, is included that demonstrates the utility of the statistics. © 2009 Elsevier B.V.
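
    A minimal sketch of the "parameter identifiability" statistic, with a random matrix standing in for the weighted sensitivity matrix and an assumed dimension for the calibration solution space:

```python
# Sketch: parameter identifiability as the direction cosine between a
# parameter axis and its projection onto the calibration solution space.
import numpy as np

rng = np.random.default_rng(6)
n_obs, n_par = 12, 6
J = rng.normal(size=(n_obs, n_par))     # hypothetical weighted sensitivities

# SVD of the weighted sensitivity matrix; the first k right singular
# vectors span the solution space (k is an assumption here).
U, s, Vt = np.linalg.svd(J, full_matrices=False)
k = 4
V1 = Vt[:k].T                           # orthonormal basis, shape (n_par, k)

# Projection length of the unit vector e_i onto span(V1) is the root
# sum of squares of row i of V1; 0 = non-identifiable, 1 = identifiable.
identifiability = np.sqrt((V1 ** 2).sum(axis=1))
for i, v in enumerate(identifiability):
    print(f"parameter {i}: identifiability = {v:.3f}")
```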

  21. Social significance of community structure: Statistical view

    NASA Astrophysics Data System (ADS)

    Li, Hui-Jia; Daniels, Jasmine J.

    2015-01-01

    Community structure analysis is a powerful tool for social networks that can simplify their topological and functional analysis considerably. However, since community detection methods have random factors and real social networks obtained from complex systems always contain error edges, evaluating the significance of a partitioned community structure is an urgent and important question. In this paper, integrating the specific characteristics of real society, we present a framework to analyze the significance of a social community. The dynamics of social interactions are modeled by identifying social leaders and corresponding hierarchical structures. Instead of a direct comparison with the average outcome of a random model, we compute the similarity of a given node with the leader by the number of common neighbors. To determine the membership vector, an efficient community detection algorithm is proposed based on the position of the nodes and their corresponding leaders. Then, using a log-likelihood score, the tightness of the community can be derived. Based on the distribution of community tightness, we establish a connection between p-value theory and network analysis, and thereby obtain a statistical significance measure. Finally, the framework is applied to both benchmark networks and real social networks. Experimental results show that our work can be used in many fields, such as determining the optimal number of communities, analyzing the social significance of a given community, and comparing the performance of various algorithms.

  22. Health significance and statistical uncertainty. The value of P-value.

    PubMed

    Consonni, Dario; Bertazzi, Pier Alberto

    2017-10-27

    The P-value is widely used as a summary statistic of scientific results. Unfortunately, there is a widespread tendency to dichotomize its value into "P<0.05" (defined as "statistically significant") and "P>0.05" ("statistically not significant"), with the former implying a "positive" result and the latter a "negative" one. Our aim is to show the unsuitability of such an approach when evaluating the effects of environmental and occupational risk factors. We provide examples of distorted use of the P-value and of the negative consequences for science and public health of such a black-and-white vision. The rigid interpretation of the P-value as a dichotomy favors confusion between health relevance and statistical significance, discourages thoughtful thinking, and distracts attention from what really matters, the health significance. A much better way to express and communicate scientific results involves reporting effect estimates (e.g., risks, risk ratios or risk differences) and their confidence intervals (CIs), which summarize and convey both health significance and statistical uncertainty. Unfortunately, many researchers do not consider the whole interval of the CI but only examine whether it includes the null value, thereby degrading this procedure to the same P-value dichotomy (statistically significant or not). When reporting statistical results of scientific research, present effect estimates with their confidence intervals, and do not qualify the P-value as "significant" or "not significant".
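
    A small example of the recommended reporting style: an effect estimate with its 95% confidence interval, computed from hypothetical cohort counts rather than reduced to a P-value verdict:

```python
# Sketch: risk ratio with a Wald 95% CI from hypothetical cohort counts.
import numpy as np

a, n1 = 30, 200        # exposed group: 30 cases out of 200
b, n0 = 18, 220        # unexposed group: 18 cases out of 220

rr = (a / n1) / (b / n0)
# Delta-method standard error of log(RR).
se = np.sqrt(1/a - 1/n1 + 1/b - 1/n0)
lo, hi = np.exp(np.log(rr) + np.array([-1.96, 1.96]) * se)
print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```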

  23. The questioned p value: clinical, practical and statistical significance.

    PubMed

    Jiménez-Paneque, Rosa

    2016-09-09

    The use of the p-value and statistical significance has been questioned from the early 1980s to the present day. Much has been discussed about this in the field of statistics and its applications, especially in Epidemiology and Public Health. As a matter of fact, the p-value and its equivalent, statistical significance, are difficult concepts to grasp for the many health professionals who are in some way involved in research applied to their work areas. However, their meaning should be clear in intuitive terms even though they are based on theoretical concepts from the field of Statistics. This paper attempts to present the p-value as a concept that applies to everyday life and is therefore intuitively simple, but whose proper use cannot be separated from theoretical and methodological elements of inherent complexity. The reasons behind the criticism received by the p-value and its isolated use are explained intuitively, mainly the need to demarcate statistical significance from clinical significance, and some of the recommended remedies for these problems are discussed as well. The paper finally refers to the current trend to vindicate the p-value by appealing to the convenience of its use in certain situations, and to the recent statement of the American Statistical Association in this regard.

  24. Tests of Statistical Significance Made Sound

    ERIC Educational Resources Information Center

    Haig, Brian D.

    2017-01-01

    This article considers the nature and place of tests of statistical significance (ToSS) in science, with particular reference to psychology. Despite the enormous amount of attention given to this topic, psychology's understanding of ToSS remains deficient. The major problem stems from a widespread and uncritical acceptance of null hypothesis…

  25. Statistically significant relational data mining

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berry, Jonathan W.; Leung, Vitus Joseph; Phillips, Cynthia Ann

    This report summarizes the work performed under the project "Statistically significant relational data mining." The goal of the project was to add more statistical rigor to the fairly ad hoc area of data mining on graphs. Our goal was to develop better algorithms and better ways to evaluate algorithm quality. We concentrated on algorithms for community detection, approximate pattern matching, and graph similarity measures. Approximate pattern matching involves finding an instance of a relatively small pattern, expressed with tolerance, in a large graph of data observed with uncertainty. This report gathers the abstracts and references for the eight refereed publications that have appeared as part of this work. We then archive three pieces of research that have not yet been published. The first is theoretical and experimental evidence that a popular statistical measure for comparison of community assignments favors over-resolved communities over approximations to a ground truth. The second is a set of statistically motivated methods for measuring the quality of an approximate match of a small pattern in a large graph. The third is a new probabilistic random graph model; statisticians favor such models for graph analysis. The new local structure graph model overcomes some of the issues with popular models such as exponential random graph models and latent variable models.

  26. The Use of Meta-Analytic Statistical Significance Testing

    ERIC Educational Resources Information Center

    Polanin, Joshua R.; Pigott, Terri D.

    2015-01-01

    Meta-analysis multiplicity, the concept of conducting multiple tests of statistical significance within one review, is an underdeveloped literature. We address this issue by considering how Type I errors can impact meta-analytic results, suggest how statistical power may be affected through the use of multiplicity corrections, and propose how…

  27. The fragility of statistically significant findings from randomized trials in head and neck surgery.

    PubMed

    Noel, Christopher W; McMullen, Caitlin; Yao, Christopher; Monteiro, Eric; Goldstein, David P; Eskander, Antoine; de Almeida, John R

    2018-04-23

    The Fragility Index (FI) is a novel tool for evaluating the robustness of statistically significant findings in a randomized control trial (RCT). It measures the number of events upon which statistical significance depends. We sought to calculate the FI scores for RCTs in the head and neck cancer literature where surgery was a primary intervention. Potential articles were identified in PubMed (MEDLINE), Embase, and Cochrane without publication date restrictions. Two reviewers independently screened eligible RCTs reporting at least one dichotomous and statistically significant outcome. The data from each trial were extracted and the FI scores were calculated. Associations between trial characteristics and FI were determined. In total, 27 articles were identified. The median sample size was 67.5 (interquartile range [IQR] = 42-143) and the median number of events per trial was 8 (IQR = 2.25-18.25). The median FI score was 1 (IQR = 0-2.5), meaning that changing one patient from a nonevent to an event in the treatment arm would change the result to a statistically nonsignificant result, or P > .05. The FI score was less than the number of patients lost to follow-up in 71% of cases. The FI score was found to be moderately correlated with P value (ρ = -0.52, P = .007) and with journal impact factor (ρ = 0.49, P = .009) on univariable analysis. On multivariable analysis, only the P value was found to be a predictor of FI score (P = .001). Randomized trials in the head and neck cancer literature where surgery is a primary modality are relatively nonrobust statistically with low FI scores. Laryngoscope, 2018. © 2018 The American Laryngological, Rhinological and Otological Society, Inc.
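
    A minimal sketch of an FI calculation consistent with this definition, flipping non-events to events in the arm with fewer events until Fisher's exact test loses significance (the counts are hypothetical):

```python
# Sketch: Fragility Index for a two-arm trial with a dichotomous outcome.
from scipy import stats

def fragility_index(e1, n1, e2, n2, alpha=0.05):
    """e1/n1 = events/total in the arm with fewer events, e2/n2 in the other."""
    p = stats.fisher_exact([[e1, n1 - e1], [e2, n2 - e2]])[1]
    if p >= alpha:
        return None                    # not significant to begin with
    fi = 0
    while p < alpha and e1 < n1:
        e1 += 1                        # one non-event becomes an event
        fi += 1
        p = stats.fisher_exact([[e1, n1 - e1], [e2, n2 - e2]])[1]
    return fi

# Hypothetical trial: 2/40 vs 10/40 events, significant at baseline.
print(fragility_index(2, 40, 10, 40))
```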

  28. Common pitfalls in statistical analysis: “P” values, statistical significance and confidence intervals

    PubMed Central

    Ranganathan, Priya; Pramesh, C. S.; Buyse, Marc

    2015-01-01

    In the second part of a series on pitfalls in statistical analysis, we look at various ways in which a statistically significant study result can be expressed. We debunk some of the myths regarding the ‘P’ value, explain the importance of ‘confidence intervals’ and clarify the importance of including both values in a paper. PMID:25878958

  29. Using the bootstrap to establish statistical significance for relative validity comparisons among patient-reported outcome measures

    PubMed Central

    2013-01-01

    Background Relative validity (RV), a ratio of ANOVA F-statistics, is often used to compare the validity of patient-reported outcome (PRO) measures. We used the bootstrap to establish the statistical significance of the RV and to identify key factors affecting its significance. Methods Based on responses from 453 chronic kidney disease (CKD) patients to 16 CKD-specific and generic PRO measures, RVs were computed to determine how well each measure discriminated across clinically-defined groups of patients compared to the most discriminating (reference) measure. Statistical significance of RV was quantified by the 95% bootstrap confidence interval. Simulations examined the effects of sample size, denominator F-statistic, correlation between comparator and reference measures, and number of bootstrap replicates. Results The statistical significance of the RV increased as the magnitude of denominator F-statistic increased or as the correlation between comparator and reference measures increased. A denominator F-statistic of 57 conveyed sufficient power (80%) to detect an RV of 0.6 for two measures correlated at r = 0.7. Larger denominator F-statistics or higher correlations provided greater power. Larger sample size with a fixed denominator F-statistic or more bootstrap replicates (beyond 500) had minimal impact. Conclusions The bootstrap is valuable for establishing the statistical significance of RV estimates. A reasonably large denominator F-statistic (F > 57) is required for adequate power when using the RV to compare the validity of measures with small or moderate correlations (r < 0.7). Substantially greater power can be achieved when comparing measures of a very high correlation (r > 0.9). PMID:23721463
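
    A compact sketch of the procedure on synthetic data: resample patients with replacement, recompute the ratio of ANOVA F-statistics, and take a percentile interval (group structure and effect sizes are made up):

```python
# Sketch: bootstrap percentile CI for relative validity (ratio of F-stats).
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n = 150
groups = rng.integers(0, 3, n)          # clinically defined patient groups
# Hypothetical scores on a reference and a comparator PRO measure.
ref = groups * 1.0 + rng.normal(0, 1, n)
comp = groups * 0.7 + rng.normal(0, 1, n)

def f_stat(y, g):
    return stats.f_oneway(*(y[g == k] for k in np.unique(g)))[0]

rv = f_stat(comp, groups) / f_stat(ref, groups)

boot = []
for _ in range(1000):
    idx = rng.integers(0, n, n)         # resample patients with replacement
    boot.append(f_stat(comp[idx], groups[idx]) / f_stat(ref[idx], groups[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"RV = {rv:.2f}, 95% bootstrap CI ({lo:.2f}, {hi:.2f})")
```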

  30. Finding Statistically Significant Communities in Networks

    PubMed Central

    Lancichinetti, Andrea; Radicchi, Filippo; Ramasco, José J.; Fortunato, Santo

    2011-01-01

    Community structure is one of the main structural features of networks, revealing both their internal organization and the similarity of their elementary units. Despite the large variety of methods proposed to detect communities in graphs, there is a great need for multi-purpose techniques able to handle different types of datasets and the subtleties of community structure. In this paper we present OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics. It is based on the local optimization of a fitness function expressing the statistical significance of clusters with respect to random fluctuations, which is estimated with tools of Extreme and Order Statistics. OSLOM can be used alone or as a refinement procedure for partitions/covers delivered by other techniques. We have also implemented sequential algorithms combining OSLOM with other fast techniques, so that the community structure of very large networks can be uncovered. Our method has performance comparable to that of the best existing algorithms on artificial benchmark graphs. Several applications to real networks are shown as well. OSLOM is implemented in freely available software (http://www.oslom.org), and we believe it will be a valuable tool in the analysis of networks. PMID:21559480

  31. Statistical significance of combinatorial regulations

    PubMed Central

    Terada, Aika; Okada-Hatakeyama, Mariko; Tsuda, Koji; Sese, Jun

    2013-01-01

    More than three transcription factors often work together to enable cells to respond to various signals. The detection of combinatorial regulation by multiple transcription factors, however, is not only computationally nontrivial but also extremely unlikely because of multiple testing correction. The exponential growth in the number of tests forces us to set a strict limit on the maximum arity. Here, we propose an efficient branch-and-bound algorithm called the “limitless arity multiple-testing procedure” (LAMP) to count the exact number of testable combinations and calibrate the Bonferroni factor to the smallest possible value. LAMP lists significant combinations without any limit, whereas the family-wise error rate is rigorously controlled under the threshold. In the human breast cancer transcriptome, LAMP discovered statistically significant combinations of as many as eight binding motifs. This method may contribute to uncover pathways regulated in a coordinated fashion and find hidden associations in heterogeneous data. PMID:23882073
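
    A simplified, one-pass sketch of the idea underlying such corrections: combinations whose minimum achievable P-value cannot fall below the threshold need not count toward the Bonferroni factor (LAMP itself uses an exact branch-and-bound search; the counts below are hypothetical):

```python
# Sketch: calibrating a Bonferroni factor to "testable" combinations only.
from scipy import stats

def min_achievable_p(support, n_cases, n_total):
    """Smallest one-sided Fisher exact P attainable for a pattern present
    in `support` samples: the best case puts every occurrence in a case."""
    a = min(support, n_cases)
    table = [[a, n_cases - a],
             [support - a, (n_total - n_cases) - (support - a)]]
    return stats.fisher_exact(table, alternative="greater")[1]

n_total, n_cases, alpha = 100, 40, 0.05
supports = [2, 3, 5, 8, 13, 21, 34]     # hypothetical motif-combination counts
testable = [s for s in supports
            if min_achievable_p(s, n_cases, n_total) <= alpha]
print(f"{len(testable)} of {len(supports)} combinations are testable")
print(f"calibrated threshold = {alpha / max(len(testable), 1):.4f}")
```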

  32. Assigning statistical significance to proteotypic peptides via database searches

    PubMed Central

    Alves, Gelio; Ogurtsov, Aleksey Y.; Yu, Yi-Kuo

    2011-01-01

    Querying MS/MS spectra against a database containing only proteotypic peptides reduces data analysis time due to reduction of database size. Despite the speed advantage, this search strategy is challenged by issues of statistical significance and coverage. The former requires separating systematically significant identifications from less confident identifications, while the latter arises when the underlying peptide is not present, due to single amino acid polymorphisms (SAPs) or post-translational modifications (PTMs), in the proteotypic peptide libraries searched. To address both issues simultaneously, we have extended RAId’s knowledge database to include proteotypic information, utilized RAId’s statistical strategy to assign statistical significance to proteotypic peptides, and modified RAId’s programs to allow for consideration of proteotypic information during database searches. The extended database alleviates the coverage problem since all annotated modifications, even those occurred within proteotypic peptides, may be considered. Taking into account the likelihoods of observation, the statistical strategy of RAId provides accurate E-value assignments regardless whether a candidate peptide is proteotypic or not. The advantage of including proteotypic information is evidenced by its superior retrieval performance when compared to regular database searches. PMID:21055489

  33. Determining the Statistical Significance of Relative Weights

    ERIC Educational Resources Information Center

    Tonidandel, Scott; LeBreton, James M.; Johnson, Jeff W.

    2009-01-01

    Relative weight analysis is a procedure for estimating the relative importance of correlated predictors in a regression equation. Because the sampling distribution of relative weights is unknown, researchers using relative weight analysis are unable to make judgments regarding the statistical significance of the relative weights. J. W. Johnson…

  34. How many spectral lines are statistically significant?

    NASA Astrophysics Data System (ADS)

    Freund, J.

    When experimental line spectra are fitted with least squares techniques one frequently does not know whether n or n + 1 lines may be fitted safely. This paper shows how an F-test can be applied in order to determine the statistical significance of including an extra line into the fitting routine.
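
    A self-contained sketch of such a nested-fit F-test on a synthetic two-line spectrum (line shape, noise level and starting guesses are all made up):

```python
# Sketch: F-test for whether an (n+1)-th line significantly improves a fit.
import numpy as np
from scipy import optimize, stats

rng = np.random.default_rng(8)
x = np.linspace(0, 10, 200)
# Synthetic spectrum: two overlapping Gaussian lines plus noise.
truth = 1.0 * np.exp(-(x - 4) ** 2 / 0.5) + 0.6 * np.exp(-(x - 6) ** 2 / 0.5)
y = truth + rng.normal(0, 0.05, x.size)

def model(x, *p):                       # sum of Gaussians (A, mu, sigma)
    return sum(p[i] * np.exp(-(x - p[i + 1]) ** 2 / (2 * p[i + 2] ** 2))
               for i in range(0, len(p), 3))

def fit_rss(n_lines, p0):
    popt, _ = optimize.curve_fit(model, x, y, p0=p0[:3 * n_lines], maxfev=20000)
    return np.sum((y - model(x, *popt)) ** 2), 3 * n_lines

p0 = [1.0, 4.0, 0.5, 0.5, 6.0, 0.5]
rss1, k1 = fit_rss(1, p0)
rss2, k2 = fit_rss(2, p0)

# Does adding three parameters reduce the residual sum of squares more
# than expected by chance?
df1, df2 = k2 - k1, y.size - k2
F = ((rss1 - rss2) / df1) / (rss2 / df2)
print(f"F = {F:.1f}, p = {stats.f.sf(F, df1, df2):.2e}")
```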

  35. Statistical Methods for Identifying Sequence Motifs Affecting Point Mutations

    PubMed Central

    Zhu, Yicheng; Neeman, Teresa; Yap, Von Bing; Huttley, Gavin A.

    2017-01-01

    Mutation processes differ between types of point mutation, genomic locations, cells, and biological species. For some point mutations, specific neighboring bases are known to be mechanistically influential. Beyond these cases, numerous questions remain unresolved, including: what are the sequence motifs that affect point mutations? How large are the motifs? Are they strand symmetric? And, do they vary between samples? We present new log-linear models that allow explicit examination of these questions, along with sequence logo style visualization to enable identifying specific motifs. We demonstrate the performance of these methods by analyzing mutation processes in human germline and malignant melanoma. We recapitulate the known CpG effect, and identify novel motifs, including a highly significant motif associated with A→G mutations. We show that major effects of neighbors on germline mutation lie within ±2 of the mutating base. Models are also presented for contrasting the entire mutation spectra (the distribution of the different point mutations). We show the spectra vary significantly between autosomes and X-chromosome, with a difference in T→C transition dominating. Analyses of malignant melanoma confirmed reported characteristic features of this cancer, including statistically significant strand asymmetry, and markedly different neighboring influences. The methods we present are made freely available as a Python library https://bitbucket.org/pycogent3/mutationmotif. PMID:27974498

  36. Statistical Significance of Optical Map Alignments

    PubMed Central

    Sarkar, Deepayan; Goldstein, Steve; Schwartz, David C.

    2012-01-01

    The Optical Mapping System constructs ordered restriction maps spanning entire genomes through the assembly and analysis of large datasets comprising individually analyzed genomic DNA molecules. Such restriction maps uniquely reveal mammalian genome structure and variation, but also raise computational and statistical questions beyond those that have been solved in the analysis of smaller, microbial genomes. We address the problem of how to filter maps that align poorly to a reference genome. We obtain map-specific thresholds that control errors and improve iterative assembly. We also show how an optimal self-alignment score provides an accurate approximation to the probability of alignment, which is useful in applications seeking to identify structural genomic abnormalities. PMID:22506568

  37. TAGCNA: A Method to Identify Significant Consensus Events of Copy Number Alterations in Cancer

    PubMed Central

    Yuan, Xiguo; Zhang, Junying; Yang, Liying; Zhang, Shengli; Chen, Baodi; Geng, Yaojun; Wang, Yue

    2012-01-01

    Somatic copy number alteration (CNA) is a common phenomenon in cancer genome. Distinguishing significant consensus events (SCEs) from random background CNAs in a set of subjects has been proven to be a valuable tool to study cancer. In order to identify SCEs with an acceptable type I error rate, better computational approaches should be developed based on reasonable statistics and null distributions. In this article, we propose a new approach named TAGCNA for identifying SCEs in somatic CNAs that may encompass cancer driver genes. TAGCNA employs a peel-off permutation scheme to generate a reasonable null distribution based on a prior step of selecting tag CNA markers from the genome being considered. We demonstrate the statistical power of TAGCNA on simulated ground truth data, and validate its applicability using two publicly available cancer datasets: lung and prostate adenocarcinoma. TAGCNA identifies SCEs that are known to be involved with proto-oncogenes (e.g. EGFR, CDK4) and tumor suppressor genes (e.g. CDKN2A, CDKN2B), and provides many additional SCEs with potential biological relevance in these data. TAGCNA can be used to analyze the significance of CNAs in various cancers. It is implemented in R and is freely available at http://tagcna.sourceforge.net/. PMID:22815924

  38. Advances in Testing the Statistical Significance of Mediation Effects

    ERIC Educational Resources Information Center

    Mallinckrodt, Brent; Abraham, W. Todd; Wei, Meifen; Russell, Daniel W.

    2006-01-01

    P. A. Frazier, A. P. Tix, and K. E. Barron (2004) highlighted a normal theory method popularized by R. M. Baron and D. A. Kenny (1986) for testing the statistical significance of indirect effects (i.e., mediator variables) in multiple regression contexts. However, simulation studies suggest that this method lacks statistical power relative to some…

  39. Assessment of statistical significance and clinical relevance.

    PubMed

    Kieser, Meinhard; Friede, Tim; Gondan, Matthias

    2013-05-10

    In drug development, it is well accepted that a successful study will demonstrate not only a statistically significant result but also a clinically relevant effect size. Whereas standard hypothesis tests are used to demonstrate the former, it is less clear how the latter should be established. In the first part of this paper, we consider the responder analysis approach and study the performance of locally optimal rank tests when the outcome distribution is a mixture of responder and non-responder distributions. We find that these tests are quite sensitive to their planning assumptions and have therefore not really any advantage over standard tests such as the t-test and the Wilcoxon-Mann-Whitney test, which perform overall well and can be recommended for applications. In the second part, we present a new approach to the assessment of clinical relevance based on the so-called relative effect (or probabilistic index) and derive appropriate sample size formulae for the design of studies aiming at demonstrating both a statistically significant and clinically relevant effect. Referring to recent studies in multiple sclerosis, we discuss potential issues in the application of this approach. Copyright © 2012 John Wiley & Sons, Ltd.

  40. Increasing the statistical significance of entanglement detection in experiments.

    PubMed

    Jungnitsch, Bastian; Niekamp, Sönke; Kleinmann, Matthias; Gühne, Otfried; Lu, He; Gao, Wei-Bo; Chen, Yu-Ao; Chen, Zeng-Bing; Pan, Jian-Wei

    2010-05-28

    Entanglement is often verified by a violation of an inequality like a Bell inequality or an entanglement witness. Considerable effort has been devoted to the optimization of such inequalities in order to obtain a high violation. We demonstrate theoretically and experimentally that such an optimization does not necessarily lead to a better entanglement test, if the statistical error is taken into account. Theoretically, we show for different error models that reducing the violation of an inequality can improve the significance. Experimentally, we observe this phenomenon in a four-photon experiment, testing the Mermin and Ardehali inequality for different levels of noise. Furthermore, we provide a way to develop entanglement tests with high statistical significance.
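
    Toy numbers illustrating the point: once the statistical error bar is taken into account, the larger violation is not automatically the more significant one:

```python
# Toy comparison: significance = violation / error bar, in standard deviations.
tests = {
    "optimized inequality": (0.40, 0.10),   # large violation, large noise
    "modest inequality":    (0.25, 0.04),   # smaller violation, small noise
}
for name, (violation, error) in tests.items():
    print(f"{name}: {violation:.2f} +/- {error:.2f} "
          f"-> {violation / error:.1f} standard deviations")
```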

  41. Testing the Difference of Correlated Agreement Coefficients for Statistical Significance

    ERIC Educational Resources Information Center

    Gwet, Kilem L.

    2016-01-01

    This article addresses the problem of testing the difference between two correlated agreement coefficients for statistical significance. A number of authors have proposed methods for testing the difference between two correlated kappa coefficients, which require either the use of resampling methods or the use of advanced statistical modeling…

  42. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry

    NASA Astrophysics Data System (ADS)

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Sacks, David B.; Yu, Yi-Kuo

    2018-06-01

    Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  3. Rapid Classification and Identification of Multiple Microorganisms with Accurate Statistical Significance via High-Resolution Tandem Mass Spectrometry.

    PubMed

    Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y; Drake, Steven K; Gucek, Marjan; Sacks, David B; Yu, Yi-Kuo

    2018-06-05

Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.

  4. Comparing identified and statistically significant lipids and polar metabolites in 15-year old serum and dried blood spot samples for longitudinal studies: Comparing lipids and metabolites in serum and DBS samples

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kyle, Jennifer E.; Casey, Cameron P.; Stratton, Kelly G.

The use of dried blood spots (DBS) has many advantages over traditional plasma and serum samples such as smaller blood volume required, storage at room temperature, and ability for sampling in remote locations. However, understanding the robustness of different analytes in DBS samples is essential, especially in older samples collected for longitudinal studies. Here we analyzed DBS samples collected in 2000-2001 and stored at room temperature and compared them to matched serum samples stored at -80°C to determine if they could be effectively used as specific time points in a longitudinal study following metabolic disease. Four hundred small molecules were identified in both the serum and DBS samples using gas chromatography-mass spectrometry (GC-MS), liquid chromatography-MS (LC-MS) and LC-ion mobility spectrometry-MS (LC-IMS-MS). The identified polar metabolites overlapped well between the sample types, though only one statistically significant polar metabolite in a case-control study was conserved, indicating degradation occurs in the DBS samples affecting quantitation. Differences in the lipid identifications indicated that some oxidation occurs in the DBS samples. However, thirty-six statistically significant lipids correlated in both sample types indicating that lipid quantitation was more stable across the sample types.

  5. Statistical significance of trace evidence matches using independent physicochemical measurements

    NASA Astrophysics Data System (ADS)

    Almirall, Jose R.; Cole, Michael; Furton, Kenneth G.; Gettinby, George

    1997-02-01

A statistical approach to the significance of glass evidence is proposed using independent physicochemical measurements and chemometrics. Traditional interpretation of the significance of trace evidence matches or exclusions relies on qualitative descriptors such as 'indistinguishable from,' 'consistent with,' 'similar to,' etc. By performing physical and chemical measurements that are independent of one another, the significance of object exclusions or matches can be evaluated statistically. One of the problems with this approach is that the human brain is excellent at recognizing and classifying patterns and shapes but performs less well when that object is represented by a numerical list of attributes. Chemometrics can be employed to group similar objects using clustering algorithms and provide statistical significance in a quantitative manner. This approach is enhanced when population databases exist or can be created and the data in question can be evaluated given these databases. Since the selection of the variables used and their pre-processing can greatly influence the outcome, several different methods could be employed in order to obtain a more complete picture of the information contained in the data. Presently, we report on the analysis of glass samples using refractive index measurements and the quantitative analysis of the concentrations of the metals: Mg, Al, Ca, Fe, Mn, Ba, Sr, Ti and Zr. The extension of this general approach to fiber and paint comparisons also is discussed. This statistical approach should not replace the current interpretative approaches to trace evidence matches or exclusions but rather yields an additional quantitative measure. The lack of sufficient general population databases containing the needed physicochemical measurements and the potential for confusion arising from statistical analysis currently hamper this approach and ways of overcoming these obstacles are presented.
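    For readers unfamiliar with the chemometric steps named above, the following minimal Python sketch illustrates the general workflow (standardize the independent measurements, reduce dimensionality with PCA, then cluster hierarchically). The fragment data, component count, and cluster count are hypothetical placeholders, not values from the study.

    ```python
    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA
    from scipy.cluster.hierarchy import linkage, fcluster

    # Hypothetical measurements per glass fragment: refractive index plus
    # nine elemental concentrations (Mg, Al, Ca, Fe, Mn, Ba, Sr, Ti, Zr).
    rng = np.random.default_rng(0)
    fragments = rng.normal(size=(12, 10))           # placeholder data
    X = StandardScaler().fit_transform(fragments)   # unit variance per variable

    scores = PCA(n_components=2).fit_transform(X)   # reduce before clustering
    Z = linkage(scores, method="ward")              # agglomerative clustering
    labels = fcluster(Z, t=3, criterion="maxclust")
    print(labels)  # fragments sharing a label are candidate "matches"
    ```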

  6. The role of baryons in creating statistically significant planes of satellites around Milky Way-mass galaxies

    NASA Astrophysics Data System (ADS)

    Ahmed, Sheehan H.; Brooks, Alyson M.; Christensen, Charlotte R.

    2017-04-01

    We investigate whether the inclusion of baryonic physics influences the formation of thin, coherently rotating planes of satellites such as those seen around the Milky Way and Andromeda. For four Milky Way-mass simulations, each run both as dark matter-only and with baryons included, we are able to identify a planar configuration that significantly maximizes the number of plane satellite members. The maximum plane member satellites are consistently different between the dark matter-only and baryonic versions of the same run due to the fact that satellites are both more likely to be destroyed and to infall later in the baryonic runs. Hence, studying satellite planes in dark matter-only simulations is misleading, because they will be composed of different satellite members than those that would exist if baryons were included. Additionally, the destruction of satellites in the baryonic runs leads to less radially concentrated satellite distributions, a result that is critical to making planes that are statistically significant compared to a random distribution. Since all planes pass through the centre of the galaxy, it is much harder to create a plane of a given height from a random distribution if the satellites have a low radial concentration. We identify Andromeda's low radial satellite concentration as a key reason why the plane in Andromeda is highly significant. Despite this, when corotation is considered, none of the satellite planes identified for the simulated galaxies are as statistically significant as the observed planes around the Milky Way and Andromeda, even in the baryonic runs.

  7. Decadal power in land air temperatures: Is it statistically significant?

    NASA Astrophysics Data System (ADS)

    Thejll, Peter A.

    2001-12-01

    The geographical distribution and properties of the well-known 10-11 year signal in terrestrial temperature records is investigated. By analyzing the Global Historical Climate Network data for surface air temperatures we verify that the signal is strongest in North America and is similar in nature to that reported earlier by R. G. Currie. The decadal signal is statistically significant for individual stations, but it is not possible to show that the signal is statistically significant globally, using strict tests. In North America, during the twentieth century, the decadal variability in the solar activity cycle is associated with the decadal part of the North Atlantic Oscillation index series in such a way that both of these signals correspond to the same spatial pattern of cooling and warming. A method for testing statistical results with Monte Carlo trials on data fields with specified temporal structure and specific spatial correlation retained is presented.
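    A compact illustration of the Monte Carlo testing idea described above: build surrogate series that retain the observed lag-1 autocorrelation, recompute the decadal band power for each, and locate the observed power in that null distribution. This sketch keeps only the temporal structure (the paper's spatial-correlation component is omitted), and the series length, band limits, and replicate count are assumptions.

    ```python
    import numpy as np

    def ar1_surrogate(x, rng):
        """Surrogate series with the same lag-1 autocorrelation and variance."""
        x = x - x.mean()
        r1 = np.corrcoef(x[:-1], x[1:])[0, 1]
        noise = rng.normal(scale=np.sqrt(1 - r1**2) * x.std(), size=x.size)
        s = np.zeros(x.size)
        for t in range(1, x.size):
            s[t] = r1 * s[t - 1] + noise[t]
        return s

    def decadal_power(x, dt=1.0):
        freqs = np.fft.rfftfreq(x.size, dt)
        power = np.abs(np.fft.rfft(x - x.mean()))**2
        band = (freqs >= 1/11) & (freqs <= 1/10)    # 10-11 yr band
        return power[band].sum()

    rng = np.random.default_rng(1)
    temps = rng.normal(size=120)                    # placeholder: 120 yr record
    obs = decadal_power(temps)
    null = [decadal_power(ar1_surrogate(temps, rng)) for _ in range(999)]
    p = (1 + sum(n >= obs for n in null)) / 1000.0
    print(f"Monte Carlo p-value for decadal band power: {p:.3f}")
    ```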

  8. Detecting Statistically Significant Communities of Triangle Motifs in Undirected Networks

    DTIC Science & Technology

    2016-04-26

    Final report, covering 15 Oct 2014 to 14 Jan 2015: detecting statistically significant clusters of triangle motifs. The project extends the work of Perry et al. [6] by developing a statistical framework that supports the detection of triangle motif-based clusters in complex networks, motivating, a priori, the need for triangle motif-based clustering, and develops an algorithm for clustering undirected networks in which the triangle configuration was…

  9. A randomized trial in a massive online open course shows people don't know what a statistically significant relationship looks like, but they can learn.

    PubMed

    Fisher, Aaron; Anderson, G Brooke; Peng, Roger; Leek, Jeff

    2014-01-01

    Scatterplots are the most common way for statisticians, scientists, and the public to visually detect relationships between measured variables. At the same time, and despite widely publicized controversy, P-values remain the most commonly used measure to statistically justify relationships identified between variables. Here we measure the ability to detect statistically significant relationships from scatterplots in a randomized trial of 2,039 students in a statistics massive open online course (MOOC). Each subject was shown a random set of scatterplots and asked to visually determine if the underlying relationships were statistically significant at the P < 0.05 level. Subjects correctly classified only 47.4% (95% CI [45.1%-49.7%]) of statistically significant relationships, and 74.6% (95% CI [72.5%-76.6%]) of non-significant relationships. Adding visual aids such as a best fit line or scatterplot smooth increased the probability a relationship was called significant, regardless of whether the relationship was actually significant. Classification of statistically significant relationships improved on repeat attempts of the survey, although classification of non-significant relationships did not. Our results suggest: (1) that evidence-based data analysis can be used to identify weaknesses in theoretical procedures in the hands of average users, (2) data analysts can be trained to improve detection of statistically significant results with practice, but (3) data analysts have incorrect intuition about what statistically significant relationships look like, particularly for small effects. We have built a web tool for people to compare scatterplots with their corresponding p-values which is available here: http://glimmer.rstudio.com/afisher/EDA/.

  10. A randomized trial in a massive online open course shows people don’t know what a statistically significant relationship looks like, but they can learn

    PubMed Central

    Fisher, Aaron; Anderson, G. Brooke; Peng, Roger

    2014-01-01

    Scatterplots are the most common way for statisticians, scientists, and the public to visually detect relationships between measured variables. At the same time, and despite widely publicized controversy, P-values remain the most commonly used measure to statistically justify relationships identified between variables. Here we measure the ability to detect statistically significant relationships from scatterplots in a randomized trial of 2,039 students in a statistics massive open online course (MOOC). Each subject was shown a random set of scatterplots and asked to visually determine if the underlying relationships were statistically significant at the P < 0.05 level. Subjects correctly classified only 47.4% (95% CI [45.1%–49.7%]) of statistically significant relationships, and 74.6% (95% CI [72.5%–76.6%]) of non-significant relationships. Adding visual aids such as a best fit line or scatterplot smooth increased the probability a relationship was called significant, regardless of whether the relationship was actually significant. Classification of statistically significant relationships improved on repeat attempts of the survey, although classification of non-significant relationships did not. Our results suggest: (1) that evidence-based data analysis can be used to identify weaknesses in theoretical procedures in the hands of average users, (2) data analysts can be trained to improve detection of statistically significant results with practice, but (3) data analysts have incorrect intuition about what statistically significant relationships look like, particularly for small effects. We have built a web tool for people to compare scatterplots with their corresponding p-values which is available here: http://glimmer.rstudio.com/afisher/EDA/. PMID:25337457

  11. Reporting of statistically significant results at ClinicalTrials.gov for completed superiority randomized controlled trials.

    PubMed

    Dechartres, Agnes; Bond, Elizabeth G; Scheer, Jordan; Riveros, Carolina; Atal, Ignacio; Ravaud, Philippe

    2016-11-30

    Publication bias and other reporting bias have been well documented for journal articles, but no study has evaluated the nature of results posted at ClinicalTrials.gov. We aimed to assess how many randomized controlled trials (RCTs) with results posted at ClinicalTrials.gov report statistically significant results and whether the proportion of trials with significant results differs when no treatment effect estimate or p-value is posted. We searched ClinicalTrials.gov in June 2015 for all studies with results posted. We included completed RCTs with a superiority hypothesis and considered results for the first primary outcome with results posted. For each trial, we assessed whether a treatment effect estimate and/or p-value was reported at ClinicalTrials.gov and if yes, whether results were statistically significant. If no treatment effect estimate or p-value was reported, we calculated the treatment effect and corresponding p-value using results per arm posted at ClinicalTrials.gov when sufficient data were reported. From the 17,536 studies with results posted at ClinicalTrials.gov, we identified 2823 completed phase 3 or 4 randomized trials with a superiority hypothesis. Of these, 1400 (50%) reported a treatment effect estimate and/or p-value. Results were statistically significant for 844 trials (60%), with a median p-value of 0.01 (Q1-Q3: 0.001-0.26). For the 1423 trials with no treatment effect estimate or p-value posted, we could calculate the treatment effect and corresponding p-value using results reported per arm for 929 (65%). For 494 trials (35%), p-values could not be calculated mainly because of insufficient reporting, censored data, or repeated measurements over time. For the 929 trials we could calculate p-values, we found statistically significant results for 342 (37%), with a median p-value of 0.19 (Q1-Q3: 0.005-0.59). Half of the trials with results posted at ClinicalTrials.gov reported a treatment effect estimate and/or p-value, with significant
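    Where a trial posts per-arm results but no effect estimate or p-value, the recalculation described above can be illustrated for a binary outcome with a two-proportion z-test; the counts below are hypothetical, and the procedure for other outcome types would differ.

    ```python
    from statsmodels.stats.proportion import proportions_ztest

    # Hypothetical per-arm results posted without an effect estimate:
    events = [44, 28]   # events in treatment and control arms
    n = [210, 205]      # participants analysed per arm

    stat, p = proportions_ztest(events, n)   # two-sample test of proportions
    effect = events[0] / n[0] - events[1] / n[1]
    print(f"risk difference = {effect:.3f}, p = {p:.3f}")
    ```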

  12. Sibling Competition & Growth Tradeoffs. Biological vs. Statistical Significance.

    PubMed

    Kramer, Karen L; Veile, Amanda; Otárola-Castillo, Erik

    2016-01-01

    Early childhood growth has many downstream effects on future health and reproduction and is an important measure of offspring quality. While a tradeoff between family size and child growth outcomes is theoretically predicted in high-fertility societies, empirical evidence is mixed. This is often attributed to phenotypic variation in parental condition. However, inconsistent study results may also arise because family size confounds the potentially differential effects that older and younger siblings can have on young children's growth. Additionally, inconsistent results might reflect that the biological significance associated with different growth trajectories is poorly understood. This paper addresses these concerns by tracking children's monthly gains in height and weight from weaning to age five in a high fertility Maya community. We predict that: 1) as an aggregate measure family size will not have a major impact on child growth during the post weaning period; 2) competition from young siblings will negatively impact child growth during the post weaning period; 3) however because of their economic value, older siblings will have a negligible effect on young children's growth. Accounting for parental condition, we use linear mixed models to evaluate the effects that family size, younger and older siblings have on children's growth. Congruent with our expectations, it is younger siblings who have the most detrimental effect on children's growth. While we find statistical evidence of a quantity/quality tradeoff effect, the biological significance of these results is negligible in early childhood. Our findings help to resolve why quantity/quality studies have had inconsistent results by showing that sibling competition varies with sibling age composition, not just family size, and that biological significance is distinct from statistical significance.

  13. On Interestingness Measures for Mining Statistically Significant and Novel Clinical Associations from EMRs

    PubMed Central

    Abar, Orhan; Charnigo, Richard J.; Rayapati, Abner

    2017-01-01

    Association rule mining has received significant attention from both the data mining and machine learning communities. While data mining researchers focus more on designing efficient algorithms to mine rules from large datasets, the learning community has explored applications of rule mining to classification. A major problem with rule mining algorithms is the explosion of rules even for moderate sized datasets making it very difficult for end users to identify both statistically significant and potentially novel rules that could lead to interesting new insights and hypotheses. Researchers have proposed many domain independent interestingness measures using which, one can rank the rules and potentially glean useful rules from the top ranked ones. However, these measures have not been fully explored for rule mining in clinical datasets owing to the relatively large sizes of the datasets often encountered in healthcare and also due to limited access to domain experts for review/analysis. In this paper, using an electronic medical record (EMR) dataset of diagnoses and medications from over three million patient visits to the University of Kentucky medical center and affiliated clinics, we conduct a thorough evaluation of dozens of interestingness measures proposed in data mining literature, including some new composite measures. Using cumulative relevance metrics from information retrieval, we compare these interestingness measures against human judgments obtained from a practicing psychiatrist for association rules involving the depressive disorders class as the consequent. Our results not only surface new interesting associations for depressive disorders but also indicate classes of interestingness measures that weight rule novelty and statistical strength in contrasting ways, offering new insights for end users in identifying interesting rules. PMID:28736771

  14. Tipping points in the arctic: eyeballing or statistical significance?

    PubMed

    Carstensen, Jacob; Weydmann, Agata

    2012-02-01

Arctic ecosystems have experienced and are projected to experience continued large increases in temperature and declines in sea ice cover. It has been hypothesized that small changes in ecosystem drivers can fundamentally alter ecosystem functioning, and that this might be particularly pronounced for Arctic ecosystems. We present a suite of simple statistical analyses to identify changes in the statistical properties of data, emphasizing that changes in the standard error should be considered in addition to changes in mean properties. The methods are exemplified using sea ice extent, and suggest that the loss rate of sea ice accelerated by a factor of ~5 in 1996, as reported in other studies, but increases in random fluctuations, as an early warning signal, were observed already in 1990. We recommend employing the proposed methods more systematically for analyzing tipping points to document effects of climate change in the Arctic.
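    A toy illustration of the kind of simple statistical property tracking described above: sliding-window means and standard errors, with a growing standard error acting as an early-warning signal. The series and window length are invented; the authors' exact test suite is not reproduced here.

    ```python
    import numpy as np

    def rolling_stats(x, window=10):
        """Mean and standard error of the mean in sliding windows."""
        means, ses = [], []
        for i in range(len(x) - window + 1):
            w = x[i:i + window]
            means.append(w.mean())
            ses.append(w.std(ddof=1) / np.sqrt(window))
        return np.array(means), np.array(ses)

    rng = np.random.default_rng(2)
    # Placeholder series: slow decline with growing random fluctuations.
    t = np.arange(40)
    ice = 8.0 - 0.05 * t + rng.normal(scale=0.1 + 0.01 * t)

    means, ses = rolling_stats(ice)
    first = int(np.argmax(ses > 1.5 * ses[0]))
    print("window where SE first exceeds its initial level by 50%:", first)
    ```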

  15. Your Chi-Square Test Is Statistically Significant: Now What?

    ERIC Educational Resources Information Center

    Sharpe, Donald

    2015-01-01

    Applied researchers have employed chi-square tests for more than one hundred years. This paper addresses the question of how one should follow a statistically significant chi-square test result in order to determine the source of that result. Four approaches were evaluated: calculating residuals, comparing cells, ransacking, and partitioning. Data…
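    The record is truncated, but the first of the four follow-up approaches it names, calculating residuals, is straightforward to illustrate. The sketch below computes adjusted standardized residuals for a hypothetical 2x2 table; cells with |residual| greater than about 2 are the usual suspects behind a significant chi-square.

    ```python
    import numpy as np
    from scipy.stats import chi2_contingency

    table = np.array([[30, 10],
                      [20, 40]])                 # hypothetical 2x2 table
    chi2, p, dof, expected = chi2_contingency(table)

    # Adjusted standardized residuals: which cells drive the result?
    n = table.sum()
    row = table.sum(axis=1, keepdims=True) / n   # row proportions
    col = table.sum(axis=0, keepdims=True) / n   # column proportions
    adj = (table - expected) / np.sqrt(expected * (1 - row) * (1 - col))
    print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
    print(np.round(adj, 2))                      # |value| > 2 flags a cell
    ```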

  16. Statistical Significance Testing in Second Language Research: Basic Problems and Suggestions for Reform

    ERIC Educational Resources Information Center

    Norris, John M.

    2015-01-01

    Traditions of statistical significance testing in second language (L2) quantitative research are strongly entrenched in how researchers design studies, select analyses, and interpret results. However, statistical significance tests using "p" values are commonly misinterpreted by researchers, reviewers, readers, and others, leading to…

  17. Sibling Competition & Growth Tradeoffs. Biological vs. Statistical Significance

    PubMed Central

    Kramer, Karen L.; Veile, Amanda; Otárola-Castillo, Erik

    2016-01-01

    Early childhood growth has many downstream effects on future health and reproduction and is an important measure of offspring quality. While a tradeoff between family size and child growth outcomes is theoretically predicted in high-fertility societies, empirical evidence is mixed. This is often attributed to phenotypic variation in parental condition. However, inconsistent study results may also arise because family size confounds the potentially differential effects that older and younger siblings can have on young children’s growth. Additionally, inconsistent results might reflect that the biological significance associated with different growth trajectories is poorly understood. This paper addresses these concerns by tracking children’s monthly gains in height and weight from weaning to age five in a high fertility Maya community. We predict that: 1) as an aggregate measure family size will not have a major impact on child growth during the post weaning period; 2) competition from young siblings will negatively impact child growth during the post weaning period; 3) however because of their economic value, older siblings will have a negligible effect on young children’s growth. Accounting for parental condition, we use linear mixed models to evaluate the effects that family size, younger and older siblings have on children’s growth. Congruent with our expectations, it is younger siblings who have the most detrimental effect on children’s growth. While we find statistical evidence of a quantity/quality tradeoff effect, the biological significance of these results is negligible in early childhood. Our findings help to resolve why quantity/quality studies have had inconsistent results by showing that sibling competition varies with sibling age composition, not just family size, and that biological significance is distinct from statistical significance. PMID:26938742

  18. Identifying duplicate content using statistically improbable phrases

    PubMed Central

    Errami, Mounir; Sun, Zhaohui; George, Angela C.; Long, Tara C.; Skinner, Michael A.; Wren, Jonathan D.; Garner, Harold R.

    2010-01-01

Motivation: Document similarity metrics such as PubMed's ‘Find related articles’ feature, which have been primarily used to identify studies with similar topics, can now also be used to detect duplicated or potentially plagiarized papers within literature reference databases. However, the CPU-intensive nature of document comparison has limited MEDLINE text similarity studies to the comparison of abstracts, which constitute only a small fraction of a publication's total text. Extending searches to include text archived by online search engines would drastically increase comparison ability. For large-scale studies, submitting short phrases encased in direct quotes to search engines for exact matches would be optimal for both individual queries and programmatic interfaces. We have derived a method of analyzing statistically improbable phrases (SIPs) for assistance in identifying duplicate content. Results: When applied to MEDLINE citations, this method substantially improves upon previous algorithms in the detection of duplicate citations, yielding a precision and recall of 78.9% (versus 50.3% for eTBLAST) and 99.6% (versus 99.8% for eTBLAST), respectively. Availability: Similar citations identified by this work are freely accessible in the Déjà vu database, under the SIP discovery method category at http://dejavu.vbi.vt.edu/dejavu/ Contact: merrami@collin.edu PMID:20472545

  19. The statistics of identifying differentially expressed genes in Expresso and TM4: a comparison

    PubMed Central

    Sioson, Allan A; Mane, Shrinivasrao P; Li, Pinghua; Sha, Wei; Heath, Lenwood S; Bohnert, Hans J; Grene, Ruth

    2006-01-01

    Background Analysis of DNA microarray data takes as input spot intensity measurements from scanner software and returns differential expression of genes between two conditions, together with a statistical significance assessment. This process typically consists of two steps: data normalization and identification of differentially expressed genes through statistical analysis. The Expresso microarray experiment management system implements these steps with a two-stage, log-linear ANOVA mixed model technique, tailored to individual experimental designs. The complement of tools in TM4, on the other hand, is based on a number of preset design choices that limit its flexibility. In the TM4 microarray analysis suite, normalization, filter, and analysis methods form an analysis pipeline. TM4 computes integrated intensity values (IIV) from the average intensities and spot pixel counts returned by the scanner software as input to its normalization steps. By contrast, Expresso can use either IIV data or median intensity values (MIV). Here, we compare Expresso and TM4 analysis of two experiments and assess the results against qRT-PCR data. Results The Expresso analysis using MIV data consistently identifies more genes as differentially expressed, when compared to Expresso analysis with IIV data. The typical TM4 normalization and filtering pipeline corrects systematic intensity-specific bias on a per microarray basis. Subsequent statistical analysis with Expresso or a TM4 t-test can effectively identify differentially expressed genes. The best agreement with qRT-PCR data is obtained through the use of Expresso analysis and MIV data. Conclusion The results of this research are of practical value to biologists who analyze microarray data sets. The TM4 normalization and filtering pipeline corrects microarray-specific systematic bias and complements the normalization stage in Expresso analysis. The results of Expresso using MIV data have the best agreement with qRT-PCR results. In

  20. Power, effects, confidence, and significance: an investigation of statistical practices in nursing research.

    PubMed

    Gaskin, Cadeyrn J; Happell, Brenda

    2014-05-01

    To (a) assess the statistical power of nursing research to detect small, medium, and large effect sizes; (b) estimate the experiment-wise Type I error rate in these studies; and (c) assess the extent to which (i) a priori power analyses, (ii) effect sizes (and interpretations thereof), and (iii) confidence intervals were reported. Statistical review. Papers published in the 2011 volumes of the 10 highest ranked nursing journals, based on their 5-year impact factors. Papers were assessed for statistical power, control of experiment-wise Type I error, reporting of a priori power analyses, reporting and interpretation of effect sizes, and reporting of confidence intervals. The analyses were based on 333 papers, from which 10,337 inferential statistics were identified. The median power to detect small, medium, and large effect sizes was .40 (interquartile range [IQR]=.24-.71), .98 (IQR=.85-1.00), and 1.00 (IQR=1.00-1.00), respectively. The median experiment-wise Type I error rate was .54 (IQR=.26-.80). A priori power analyses were reported in 28% of papers. Effect sizes were routinely reported for Spearman's rank correlations (100% of papers in which this test was used), Poisson regressions (100%), odds ratios (100%), Kendall's tau correlations (100%), Pearson's correlations (99%), logistic regressions (98%), structural equation modelling/confirmatory factor analyses/path analyses (97%), and linear regressions (83%), but were reported less often for two-proportion z tests (50%), analyses of variance/analyses of covariance/multivariate analyses of variance (18%), t tests (8%), Wilcoxon's tests (8%), Chi-squared tests (8%), and Fisher's exact tests (7%), and not reported for sign tests, Friedman's tests, McNemar's tests, multi-level models, and Kruskal-Wallis tests. Effect sizes were infrequently interpreted. Confidence intervals were reported in 28% of papers. The use, reporting, and interpretation of inferential statistics in nursing research need substantial
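    As an illustration of the a priori power analyses the review found under-reported, here is a minimal statsmodels sketch for an independent-samples t-test using Cohen's conventional small, medium, and large effect sizes; the per-group sample size is hypothetical.

    ```python
    from statsmodels.stats.power import TTestIndPower

    analysis = TTestIndPower()
    for label, es in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
        power = analysis.power(effect_size=es, nobs1=50, alpha=0.05)
        print(f"{label} effect, n=50 per group: power = {power:.2f}")

    # Sample size per group needed for 80% power on a small effect:
    n_needed = analysis.solve_power(effect_size=0.2, power=0.8, alpha=0.05)
    print(f"n per group for 80% power (small effect): {n_needed:.0f}")
    ```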

  1. Statistical significance versus clinical relevance.

    PubMed

    van Rijn, Marieke H C; Bech, Anneke; Bouyer, Jean; van den Brand, Jan A J G

    2017-04-01

In March this year, the American Statistical Association (ASA) posted a statement on the correct use of P-values, in response to a growing concern that the P-value is commonly misused and misinterpreted. We aim to translate these warnings given by the ASA into a language more easily understood by clinicians and researchers without a deep background in statistics. Moreover, we intend to illustrate the limitations of P-values, even when used and interpreted correctly, and bring more attention to the clinical relevance of study findings using two recently reported studies as examples. We argue that P-values are often misinterpreted. A common mistake is saying that P < 0.05 means that the null hypothesis is false, and P ≥ 0.05 means that the null hypothesis is true. The correct interpretation of a P-value of 0.05 is that if the null hypothesis were indeed true, a similar or more extreme result would occur 5% of the time upon repeating the study in a similar sample. In other words, the P-value informs about the likelihood of the data given the null hypothesis and not the other way around. A possible alternative related to the P-value is the confidence interval (CI). It provides more information on the magnitude of an effect and the imprecision with which that effect was estimated. However, there is no magic bullet to replace P-values and stop erroneous interpretation of scientific results. Scientists and readers alike should make themselves familiar with the correct, nuanced interpretation of statistical tests, P-values and CIs. © The Author 2017. Published by Oxford University Press on behalf of ERA-EDTA. All rights reserved.

  2. SigTree: A Microbial Community Analysis Tool to Identify and Visualize Significantly Responsive Branches in a Phylogenetic Tree.

    PubMed

    Stevens, John R; Jones, Todd R; Lefevre, Michael; Ganesan, Balasubramanian; Weimer, Bart C

    2017-01-01

Microbial community analysis experiments to assess the effect of a treatment intervention (or environmental change) on the relative abundance levels of multiple related microbial species (or operational taxonomic units) simultaneously using high throughput genomics are becoming increasingly common. Within the framework of the evolutionary phylogeny of all species considered in the experiment, this translates to a statistical need to identify the phylogenetic branches that exhibit a significant consensus response (in terms of operational taxonomic unit abundance) to the intervention. We present the R software package SigTree, a collection of flexible tools that make use of meta-analysis methods and regular expressions to identify and visualize significantly responsive branches in a phylogenetic tree, while appropriately adjusting for multiple comparisons.

  3. Using statistical text classification to identify health information technology incidents

    PubMed Central

    Chai, Kevin E K; Anthony, Stephen; Coiera, Enrico; Magrabi, Farah

    2013-01-01

    Objective To examine the feasibility of using statistical text classification to automatically identify health information technology (HIT) incidents in the USA Food and Drug Administration (FDA) Manufacturer and User Facility Device Experience (MAUDE) database. Design We used a subset of 570 272 incidents including 1534 HIT incidents reported to MAUDE between 1 January 2008 and 1 July 2010. Text classifiers using regularized logistic regression were evaluated with both ‘balanced’ (50% HIT) and ‘stratified’ (0.297% HIT) datasets for training, validation, and testing. Dataset preparation, feature extraction, feature selection, cross-validation, classification, performance evaluation, and error analysis were performed iteratively to further improve the classifiers. Feature-selection techniques such as removing short words and stop words, stemming, lemmatization, and principal component analysis were examined. Measurements κ statistic, F1 score, precision and recall. Results Classification performance was similar on both the stratified (0.954 F1 score) and balanced (0.995 F1 score) datasets. Stemming was the most effective technique, reducing the feature set size to 79% while maintaining comparable performance. Training with balanced datasets improved recall (0.989) but reduced precision (0.165). Conclusions Statistical text classification appears to be a feasible method for identifying HIT reports within large databases of incidents. Automated identification should enable more HIT problems to be detected, analyzed, and addressed in a timely manner. Semi-supervised learning may be necessary when applying machine learning to big data analysis of patient safety incidents and requires further investigation. PMID:23666777
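    A minimal sketch of regularized logistic regression text classification in the spirit of the study, using TF-IDF features; the example reports and labels are invented, and the authors' feature-selection pipeline (stemming, stop-word removal, PCA) is not reproduced.

    ```python
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.metrics import f1_score

    # Hypothetical incident narratives; 1 = HIT-related, 0 = other.
    reports = ["software froze during order entry",
               "catheter tip fractured on removal",
               "wrong patient record displayed by interface",
               "pump battery failed mid-infusion"]
    labels = [1, 0, 1, 0]

    clf = make_pipeline(TfidfVectorizer(stop_words="english"),
                        LogisticRegression(C=1.0))  # L2-regularized by default
    clf.fit(reports, labels)
    print(f1_score(labels, clf.predict(reports)))   # resubstitution F1, for show
    ```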

  4. Use of multivariate statistics to identify unreliable data obtained using CASA.

    PubMed

    Martínez, Luis Becerril; Crispín, Rubén Huerta; Mendoza, Maximino Méndez; Gallegos, Oswaldo Hernández; Martínez, Andrés Aragón

    2013-06-01

    In order to identify unreliable data in a dataset of motility parameters obtained from a pilot study acquired by a veterinarian with experience in boar semen handling, but without experience in the operation of a computer assisted sperm analysis (CASA) system, a multivariate graphical and statistical analysis was performed. Sixteen boar semen samples were aliquoted then incubated with varying concentrations of progesterone from 0 to 3.33 µg/ml and analyzed in a CASA system. After standardization of the data, Chernoff faces were pictured for each measurement, and a principal component analysis (PCA) was used to reduce the dimensionality and pre-process the data before hierarchical clustering. The first twelve individual measurements showed abnormal features when Chernoff faces were drawn. PCA revealed that principal components 1 and 2 explained 63.08% of the variance in the dataset. Values of principal components for each individual measurement of semen samples were mapped to identify differences among treatment or among boars. Twelve individual measurements presented low values of principal component 1. Confidence ellipses on the map of principal components showed no statistically significant effects for treatment or boar. Hierarchical clustering realized on two first principal components produced three clusters. Cluster 1 contained evaluations of the two first samples in each treatment, each one of a different boar. With the exception of one individual measurement, all other measurements in cluster 1 were the same as observed in abnormal Chernoff faces. Unreliable data in cluster 1 are probably related to the operator inexperience with a CASA system. These findings could be used to objectively evaluate the skill level of an operator of a CASA system. This may be particularly useful in the quality control of semen analysis using CASA systems.

  5. Statistical Significance and Effect Size: Two Sides of a Coin.

    ERIC Educational Resources Information Center

    Fan, Xitao

    This paper suggests that statistical significance testing and effect size are two sides of the same coin; they complement each other, but do not substitute for one another. Good research practice requires that both should be taken into consideration to make sound quantitative decisions. A Monte Carlo simulation experiment was conducted, and a…

  6. Significant Statistics: Viewed with a Contextual Lens

    ERIC Educational Resources Information Center

    Tait-McCutcheon, Sandi

    2010-01-01

    This paper examines the pedagogical and organisational changes three lead teachers made to their statistics teaching and learning programs. The lead teachers posed the research question: What would the effect of contextually integrating statistical investigations and literacies into other curriculum areas be on student achievement? By finding the…

  7. "What If" Analyses: Ways to Interpret Statistical Significance Test Results Using EXCEL or "R"

    ERIC Educational Resources Information Center

    Ozturk, Elif

    2012-01-01

    The present paper aims to review two motivations to conduct "what if" analyses using Excel and "R" to understand the statistical significance tests through the sample size context. "What if" analyses can be used to teach students what statistical significance tests really do and in applied research either prospectively to estimate what sample size…

  8. Hard, harder, hardest: principal stratification, statistical identifiability, and the inherent difficulty of finding surrogate endpoints.

    PubMed

    Wolfson, Julian; Henn, Lisa

    2014-01-01

    In many areas of clinical investigation there is great interest in identifying and validating surrogate endpoints, biomarkers that can be measured a relatively short time after a treatment has been administered and that can reliably predict the effect of treatment on the clinical outcome of interest. However, despite dramatic advances in the ability to measure biomarkers, the recent history of clinical research is littered with failed surrogates. In this paper, we present a statistical perspective on why identifying surrogate endpoints is so difficult. We view the problem from the framework of causal inference, with a particular focus on the technique of principal stratification (PS), an approach which is appealing because the resulting estimands are not biased by unmeasured confounding. In many settings, PS estimands are not statistically identifiable and their degree of non-identifiability can be thought of as representing the statistical difficulty of assessing the surrogate value of a biomarker. In this work, we examine the identifiability issue and present key simplifying assumptions and enhanced study designs that enable the partial or full identification of PS estimands. We also present example situations where these assumptions and designs may or may not be feasible, providing insight into the problem characteristics which make the statistical evaluation of surrogate endpoints so challenging.

  9. Hard, harder, hardest: principal stratification, statistical identifiability, and the inherent difficulty of finding surrogate endpoints

    PubMed Central

    2014-01-01

    In many areas of clinical investigation there is great interest in identifying and validating surrogate endpoints, biomarkers that can be measured a relatively short time after a treatment has been administered and that can reliably predict the effect of treatment on the clinical outcome of interest. However, despite dramatic advances in the ability to measure biomarkers, the recent history of clinical research is littered with failed surrogates. In this paper, we present a statistical perspective on why identifying surrogate endpoints is so difficult. We view the problem from the framework of causal inference, with a particular focus on the technique of principal stratification (PS), an approach which is appealing because the resulting estimands are not biased by unmeasured confounding. In many settings, PS estimands are not statistically identifiable and their degree of non-identifiability can be thought of as representing the statistical difficulty of assessing the surrogate value of a biomarker. In this work, we examine the identifiability issue and present key simplifying assumptions and enhanced study designs that enable the partial or full identification of PS estimands. We also present example situations where these assumptions and designs may or may not be feasible, providing insight into the problem characteristics which make the statistical evaluation of surrogate endpoints so challenging. PMID:25342953

  10. Statistical significance of the rich-club phenomenon in complex networks

    NASA Astrophysics Data System (ADS)

    Jiang, Zhi-Qiang; Zhou, Wei-Xing

    2008-04-01

    We propose that the rich-club phenomenon in complex networks should be defined in the spirit of bootstrapping, in which a null model is adopted to assess the statistical significance of the rich-club detected. Our method can serve as a definition of the rich-club phenomenon and is applied to analyze three real networks and three model networks. The results show significant improvement compared with previously reported results. We report a dilemma with an exceptional example, showing that there does not exist an omnipotent definition for the rich-club phenomenon.
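    The null-model idea can be illustrated with networkx: compute the rich-club coefficient of a network, rewire the network while preserving its degree sequence, and take the ratio. A single rewired replicate is used here for brevity; the assessment proposed above would use an ensemble, and the graph below is a placeholder, not one of the networks analyzed.

    ```python
    import networkx as nx

    G = nx.barabasi_albert_graph(200, 3, seed=3)     # placeholder network
    rc = nx.rich_club_coefficient(G, normalized=False)

    # Degree-preserving null model: rewire edges, then recompute.
    R = G.copy()
    nx.double_edge_swap(R, nswap=10 * G.number_of_edges(),
                        max_tries=10**6, seed=3)
    rc_null = nx.rich_club_coefficient(R, normalized=False)

    ratio = {k: rc[k] / rc_null[k] for k in rc if rc_null.get(k, 0) > 0}
    print({k: round(v, 2) for k, v in sorted(ratio.items())[:5]})
    # ratios well above 1 suggest a rich club beyond degree-sequence effects
    ```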

  11. [Continuity of hospital identifiers in hospital discharge data - Analysis of the nationwide German DRG Statistics from 2005 to 2013].

    PubMed

    Nimptsch, Ulrike; Wengler, Annelene; Mansky, Thomas

    2016-11-01

In Germany, nationwide hospital discharge data (DRG statistics provided by the research data centers of the Federal Statistical Office and the Statistical Offices of the 'Länder') are increasingly used as a data source for health services research. Within these data, hospitals can be separated via their hospital identifier ([Institutionskennzeichen] IK). However, this hospital identifier primarily designates the invoicing unit and is not necessarily equivalent to one hospital location. Aiming to investigate the direction and extent of possible bias in hospital-level analyses, this study examines the continuity of the hospital identifier within a cross-sectional and longitudinal approach and compares the results to official hospital census statistics. Within the DRG statistics from 2005 to 2013 the annual number of hospitals as classified by hospital identifiers was counted for each year of observation. The annual number of hospitals derived from DRG statistics was compared to the number of hospitals in the official census statistics 'Grunddaten der Krankenhäuser'. Subsequently, the temporal continuity of hospital identifiers in the DRG statistics was analyzed within cohorts of hospitals. Until 2013, the annual number of hospital identifiers in the DRG statistics fell by 175 (from 1,725 to 1,550). This decline affected only providers with small or medium case volume. The number of hospitals identified in the DRG statistics was lower than the number given in the census statistics (e.g., in 2013 1,550 IK vs. 1,668 hospitals in the census statistics). The longitudinal analyses revealed that the majority of hospital identifiers persisted in the years of observation, while one fifth of hospital identifiers changed. In cross-sectional studies of German hospital discharge data the separation of hospitals via the hospital identifier might lead to underestimating the number of hospitals and consequent overestimation of caseload per hospital. Discontinuities of hospital

  12. Fostering Students' Statistical Literacy through Significant Learning Experience

    ERIC Educational Resources Information Center

    Krishnan, Saras

    2015-01-01

    A major objective of statistics education is to develop students' statistical literacy that enables them to be educated users of data in context. Teaching statistics in today's educational settings is not an easy feat because teachers have a huge task in keeping up with the demands of the new generation of learners. The present day students have…

  13. Distinguishing between statistical significance and practical/clinical meaningfulness using statistical inference.

    PubMed

    Wilkinson, Michael

    2014-03-01

    Decisions about support for predictions of theories in light of data are made using statistical inference. The dominant approach in sport and exercise science is the Neyman-Pearson (N-P) significance-testing approach. When applied correctly it provides a reliable procedure for making dichotomous decisions for accepting or rejecting zero-effect null hypotheses with known and controlled long-run error rates. Type I and type II error rates must be specified in advance and the latter controlled by conducting an a priori sample size calculation. The N-P approach does not provide the probability of hypotheses or indicate the strength of support for hypotheses in light of data, yet many scientists believe it does. Outcomes of analyses allow conclusions only about the existence of non-zero effects, and provide no information about the likely size of true effects or their practical/clinical value. Bayesian inference can show how much support data provide for different hypotheses, and how personal convictions should be altered in light of data, but the approach is complicated by formulating probability distributions about prior subjective estimates of population effects. A pragmatic solution is magnitude-based inference, which allows scientists to estimate the true magnitude of population effects and how likely they are to exceed an effect magnitude of practical/clinical importance, thereby integrating elements of subjective Bayesian-style thinking. While this approach is gaining acceptance, progress might be hastened if scientists appreciate the shortcomings of traditional N-P null hypothesis significance testing.
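    One common formulation of the magnitude-based reasoning described above, under a normal approximation of the sampling distribution: given an effect estimate, its standard error, and a smallest practically important effect, compute the chances that the true effect is beneficial, trivial, or harmful. All numbers below are hypothetical.

    ```python
    from scipy.stats import norm

    # Hypothetical: observed effect 1.2 units, SE 0.6, and a smallest
    # practically important effect (threshold) of 0.5 units.
    effect, se, threshold = 1.2, 0.6, 0.5

    p_beneficial = 1 - norm.cdf(threshold, loc=effect, scale=se)
    p_trivial = (norm.cdf(threshold, loc=effect, scale=se)
                 - norm.cdf(-threshold, loc=effect, scale=se))
    p_harmful = norm.cdf(-threshold, loc=effect, scale=se)
    print(f"beneficial {p_beneficial:.2f}, trivial {p_trivial:.2f}, "
          f"harmful {p_harmful:.2f}")
    ```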

  14. A Practical Method for Identifying Significant Change Scores

    ERIC Educational Resources Information Center

    Cascio, Wayne F.; Kurtines, William M.

    1977-01-01

    A test of significance for identifying individuals who are most influenced by an experimental treatment as measured by pre-post test change score is presented. The technique requires true difference scores, the reliability of obtained differences, and their standard error of measurement. (Author/JKS)

  15. Statistical Analyses of Scatterplots to Identify Important Factors in Large-Scale Simulations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kleijnen, J.P.C.; Helton, J.C.

    1999-04-01

The robustness of procedures for identifying patterns in scatterplots generated in Monte Carlo sensitivity analyses is investigated. These procedures are based on attempts to detect increasingly complex patterns in the scatterplots under consideration and involve the identification of (1) linear relationships with correlation coefficients, (2) monotonic relationships with rank correlation coefficients, (3) trends in central tendency as defined by means, medians and the Kruskal-Wallis statistic, (4) trends in variability as defined by variances and interquartile ranges, and (5) deviations from randomness as defined by the chi-square statistic. The following two topics related to the robustness of these procedures are considered for a sequence of example analyses with a large model for two-phase fluid flow: the presence of Type I and Type II errors, and the stability of results obtained with independent Latin hypercube samples. Observations from analysis include: (1) Type I errors are unavoidable, (2) Type II errors can occur when inappropriate analysis procedures are used, (3) physical explanations should always be sought for why statistical procedures identify variables as being important, and (4) the identification of important variables tends to be stable for independent Latin hypercube samples.
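    The first three pattern-detection procedures in this list map directly onto standard scipy tests; the sketch below applies them to simulated placeholder data rather than output from the two-phase fluid-flow model.

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    x = rng.uniform(size=200)                       # hypothetical input factor
    y = x**2 + rng.normal(scale=0.1, size=200)      # hypothetical model output

    print(stats.pearsonr(x, y))                     # (1) linear relationship
    print(stats.spearmanr(x, y))                    # (2) monotonic relationship
    # (3) trend in central tendency across bins of x:
    groups = [y[(x >= lo) & (x < lo + 0.2)] for lo in np.arange(0, 1, 0.2)]
    print(stats.kruskal(*groups))
    ```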

  16. Statistics Refresher for Molecular Imaging Technologists, Part 2: Accuracy of Interpretation, Significance, and Variance.

    PubMed

    Farrell, Mary Beth

    2018-06-01

This article is the second part of a continuing education series reviewing basic statistics that nuclear medicine and molecular imaging technologists should understand. In this article, the statistics for evaluating interpretation accuracy, significance, and variance are discussed. Throughout the article, actual statistics are pulled from the published literature. We begin by explaining 2 methods for quantifying interpretive accuracy: interreader and intrareader reliability. Agreement among readers can be expressed simply as a percentage. However, the Cohen κ-statistic is a more robust measure of agreement that accounts for chance. The higher the κ-statistic, the higher the agreement between readers. When 3 or more readers are being compared, the Fleiss κ-statistic is used. Significance testing determines whether the difference between 2 conditions or interventions is meaningful. Statistical significance is usually expressed using a number called a probability (P) value. Calculation of the P value is beyond the scope of this review. However, knowing how to interpret P values is important for understanding the scientific literature. Generally, a P value of less than 0.05 is considered significant and indicates that the results of the experiment are due to more than just chance. Variance, standard deviation (SD), confidence interval, and standard error (SE) explain the dispersion of data around a mean of a sample drawn from a population. SD is commonly reported in the literature. A small SD indicates that there is not much variation in the sample data. Many biologic measurements fall into what is referred to as a normal distribution taking the shape of a bell curve. In a normal distribution, 68% of the data will fall within 1 SD, 95% will fall within 2 SDs, and 99.7% will fall within 3 SDs. Confidence interval defines the range of possible values within which the population parameter is likely to lie and gives an idea of the precision of the statistic being
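    For the interreader-agreement portion, a one-line illustration of the chance-corrected κ-statistic with scikit-learn; the two readers' interpretations below are invented.

    ```python
    from sklearn.metrics import cohen_kappa_score

    # Hypothetical reads by two interpreters: 1 = abnormal, 0 = normal.
    reader_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
    reader_b = [1, 0, 0, 1, 0, 0, 1, 0, 1, 0]
    print(cohen_kappa_score(reader_a, reader_b))  # chance-corrected agreement
    ```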

  17. Using the Bootstrap Method for a Statistical Significance Test of Differences between Summary Histograms

    NASA Technical Reports Server (NTRS)

    Xu, Kuan-Man

    2006-01-01

    A new method is proposed to compare statistical differences between summary histograms, which are the histograms summed over a large ensemble of individual histograms. It consists of choosing a distance statistic for measuring the difference between summary histograms and using a bootstrap procedure to calculate the statistical significance level. Bootstrapping is an approach to statistical inference that makes few assumptions about the underlying probability distribution that describes the data. Three distance statistics are compared in this study. They are the Euclidean distance, the Jeffries-Matusita distance and the Kuiper distance. The data used in testing the bootstrap method are satellite measurements of cloud systems called cloud objects. Each cloud object is defined as a contiguous region/patch composed of individual footprints or fields of view. A histogram of measured values over footprints is generated for each parameter of each cloud object and then summary histograms are accumulated over all individual histograms in a given cloud-object size category. The results of statistical hypothesis tests using all three distances as test statistics are generally similar, indicating the validity of the proposed method. The Euclidean distance is determined to be most suitable after comparing the statistical tests of several parameters with distinct probability distributions among three cloud-object size categories. Impacts on the statistical significance levels resulting from differences in the total lengths of satellite footprint data between two size categories are also discussed.
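    A condensed sketch of the proposed procedure for one of the three distances (Euclidean): pool the two samples, resample under the no-difference null to build a distribution of distances between summary histograms, and locate the observed distance in it. Sample sizes, bins, and replicate count are assumptions.

    ```python
    import numpy as np

    def euclidean(h1, h2):
        return np.sqrt(((h1 - h2) ** 2).sum())

    rng = np.random.default_rng(5)
    a = rng.normal(0.0, 1.0, 5000)     # placeholder measurements, category A
    b = rng.normal(0.2, 1.0, 5000)     # placeholder measurements, category B
    bins = np.linspace(-4, 4, 25)
    ha = np.histogram(a, bins, density=True)[0]
    hb = np.histogram(b, bins, density=True)[0]
    obs = euclidean(ha, hb)

    pooled = np.concatenate([a, b])
    null = []
    for _ in range(999):               # resample under "no difference"
        pa = rng.choice(pooled, a.size)
        pb = rng.choice(pooled, b.size)
        null.append(euclidean(np.histogram(pa, bins, density=True)[0],
                              np.histogram(pb, bins, density=True)[0]))
    p = (1 + sum(n >= obs for n in null)) / 1000.0
    print(f"bootstrap p-value for the Euclidean distance: {p:.3f}")
    ```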

  18. Mass spectrometry-based protein identification with accurate statistical significance assignment.

    PubMed

    Alves, Gelio; Yu, Yi-Kuo

    2015-03-01

    Assigning statistical significance accurately has become increasingly important as metadata of many types, often assembled in hierarchies, are constructed and combined for further biological analyses. Statistical inaccuracy of metadata at any level may propagate to downstream analyses, undermining the validity of scientific conclusions thus drawn. From the perspective of mass spectrometry-based proteomics, even though accurate statistics for peptide identification can now be achieved, accurate protein level statistics remain challenging. We have constructed a protein ID method that combines peptide evidences of a candidate protein based on a rigorous formula derived earlier; in this formula the database P-value of every peptide is weighted, prior to the final combination, according to the number of proteins it maps to. We have also shown that this protein ID method provides accurate protein level E-value, eliminating the need of using empirical post-processing methods for type-I error control. Using a known protein mixture, we find that this protein ID method, when combined with the Sorić formula, yields accurate values for the proportion of false discoveries. In terms of retrieval efficacy, the results from our method are comparable with other methods tested. The source code, implemented in C++ on a linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit. Published by Oxford University Press 2014. This work is written by US Government employees and is in the public domain in the US.

  19. Using the 3-Sigma Limit to Identify Significantly Long Pauses during Composing.

    ERIC Educational Resources Information Center

    Kelly, Leonard P.; Nolan, Thomas W.

    Ten deaf college freshmen and a comparison group of five hearing students participated in a study of a method to identify long pauses in written composition that have important statistical properties. Subjects first wrote two initial drafts of short stories they had viewed on video tape. Later, they revised and recopied their two originals and one…

  20. Novel statistical framework to identify differentially expressed genes allowing transcriptomic background differences.

    PubMed

    Ling, Zhi-Qiang; Wang, Yi; Mukaisho, Kenichi; Hattori, Takanori; Tatsuta, Takeshi; Ge, Ming-Hua; Jin, Li; Mao, Wei-Min; Sugihara, Hiroyuki

    2010-06-01

    Tests of differentially expressed genes (DEGs) from microarray experiments are based on the null hypothesis that genes that are irrelevant to the phenotype/stimulus are expressed equally in the target and control samples. However, this strict hypothesis is not always true, as there can be several transcriptomic background differences between target and control samples, including different cell/tissue types, different cell cycle stages and different biological donors. These differences lead to increased false positives, which have little biological/medical significance. In this article, we propose a statistical framework to identify DEGs between target and control samples from expression microarray data allowing transcriptomic background differences between these samples by introducing a modified null hypothesis that the gene expression background difference is normally distributed. We use an iterative procedure to perform robust estimation of the null hypothesis and identify DEGs as outliers. We evaluated our method using our own triplicate microarray experiment, followed by validations with reverse transcription-polymerase chain reaction (RT-PCR) and on the MicroArray Quality Control dataset. The evaluations suggest that our technique (i) results in less false positive and false negative results, as measured by the degree of agreement with RT-PCR of the same samples, (ii) can be applied to different microarray platforms and results in better reproducibility as measured by the degree of DEG identification concordance both intra- and inter-platforms and (iii) can be applied efficiently with only a few microarray replicates. Based on these evaluations, we propose that this method not only identifies more reliable and biologically/medically significant DEG, but also reduces the power-cost tradeoff problem in the microarray field. Source code and binaries freely available for download at http://comonca.org.cn/fdca/resources/softwares/deg.zip.
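    The iterative robust estimation of the background null can be sketched as simple sigma-clipping on per-gene expression differences; this is a simplification for illustration, not the authors' exact estimator, and all data below are simulated.

    ```python
    import numpy as np

    def robust_deg(log_ratios, k=3.0, n_iter=10):
        """Iteratively fit a normal background of expression differences
        and flag genes outside mean +/- k*sigma as candidate DEGs."""
        mask = np.ones(log_ratios.size, dtype=bool)
        for _ in range(n_iter):
            mu = log_ratios[mask].mean()
            sd = log_ratios[mask].std(ddof=1)
            new_mask = np.abs(log_ratios - mu) <= k * sd
            if np.array_equal(new_mask, mask):
                break
            mask = new_mask
        return ~mask   # outliers relative to the fitted background

    rng = np.random.default_rng(6)
    background = rng.normal(0.3, 0.4, 990)   # non-zero background difference
    true_degs = rng.normal(3.0, 0.5, 10)     # genuinely changed genes
    flags = robust_deg(np.concatenate([background, true_degs]))
    print(flags.sum(), "genes flagged")
    ```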

  1. Strategies for Testing Statistical and Practical Significance in Detecting DIF with Logistic Regression Models

    ERIC Educational Resources Information Center

    Fidalgo, Angel M.; Alavi, Seyed Mohammad; Amirian, Seyed Mohammad Reza

    2014-01-01

    This study examines three controversial aspects in differential item functioning (DIF) detection by logistic regression (LR) models: first, the relative effectiveness of different analytical strategies for detecting DIF; second, the suitability of the Wald statistic for determining the statistical significance of the parameters of interest; and…
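
    For orientation, the standard LR DIF setup regresses the item response on a matching criterion (e.g., total score), group membership, and their interaction, and forms Wald statistics as estimate divided by standard error. The sketch below, on simulated data with uniform DIF built in, is a generic illustration and not the authors' exact analytical strategies:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 1000
ability = rng.normal(size=n)                     # matching criterion
group = rng.integers(0, 2, size=n)               # 0 = reference, 1 = focal
true_logit = -0.2 + 1.1 * ability - 0.5 * group  # uniform DIF built in
y = (rng.random(n) < 1 / (1 + np.exp(-true_logit))).astype(int)

X = sm.add_constant(np.column_stack([ability, group, ability * group]))
fit = sm.Logit(y, X).fit(disp=0)

wald_z = fit.params / fit.bse                    # Wald statistic per parameter
print("Wald z for the group effect (uniform DIF):", round(wald_z[2], 2))
print("Wald z for the interaction (nonuniform DIF):", round(wald_z[3], 2))
```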

  2. GSHSite: Exploiting an Iteratively Statistical Method to Identify S-Glutathionylation Sites with Substrate Specificity

    PubMed Central

    Chen, Yi-Ju; Lu, Cheng-Tsung; Huang, Kai-Yao; Wu, Hsin-Yi; Chen, Yu-Ju; Lee, Tzong-Yi

    2015-01-01

    S-glutathionylation, the covalent attachment of a glutathione (GSH) to the sulfur atom of cysteine, is a selective and reversible protein post-translational modification (PTM) that regulates protein activity, localization, and stability. Despite its implication in the regulation of protein functions and cell signaling, the substrate specificity of cysteine S-glutathionylation remains unknown. Based on a total of 1783 experimentally identified S-glutathionylation sites from mouse macrophages, this work presents an informatics investigation on S-glutathionylation sites including structural factors such as the flanking amino acids composition and the accessible surface area (ASA). TwoSampleLogo presents that positively charged amino acids flanking the S-glutathionylated cysteine may influence the formation of S-glutathionylation in closed three-dimensional environment. A statistical method is further applied to iteratively detect the conserved substrate motifs with statistical significance. Support vector machine (SVM) is then applied to generate predictive model considering the substrate motifs. According to five-fold cross-validation, the SVMs trained with substrate motifs could achieve an enhanced sensitivity, specificity, and accuracy, and provides a promising performance in an independent test set. The effectiveness of the proposed method is demonstrated by the correct identification of previously reported S-glutathionylation sites of mouse thioredoxin (TXN) and human protein tyrosine phosphatase 1b (PTP1B). Finally, the constructed models are adopted to implement an effective web-based tool, named GSHSite (http://csb.cse.yzu.edu.tw/GSHSite/), for identifying uncharacterized GSH substrate sites on the protein sequences. PMID:25849935

  3. Clinical relevance vs. statistical significance: Using neck outcomes in patients with temporomandibular disorders as an example.

    PubMed

    Armijo-Olivo, Susan; Warren, Sharon; Fuentes, Jorge; Magee, David J

    2011-12-01

    Statistical significance has been used extensively to evaluate the results of research studies. Nevertheless, it offers only limited information to clinicians. The assessment of clinical relevance can facilitate the interpretation of the research results into clinical practice. The objective of this study was to explore different methods to evaluate the clinical relevance of the results using a cross-sectional study as an example comparing different neck outcomes between subjects with temporomandibular disorders and healthy controls. Subjects were compared for head and cervical posture, maximal cervical muscle strength, endurance of the cervical flexor and extensor muscles, and electromyographic activity of the cervical flexor muscles during the CranioCervical Flexion Test (CCFT). The evaluation of clinical relevance of the results was performed based on the effect size (ES), minimal important difference (MID), and clinical judgement. The results of this study show that it is possible to have statistical significance without having clinical relevance, to have both statistical significance and clinical relevance, to have clinical relevance without having statistical significance, or to have neither statistical significance nor clinical relevance. The evaluation of clinical relevance in clinical research is crucial to simplify the transfer of knowledge from research into practice. Clinical researchers should present the clinical relevance of their results. Copyright © 2011 Elsevier Ltd. All rights reserved.
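
    One common way to operationalize the distinction is to report an effect size alongside a minimal important difference (MID): a comparison can clear the p < 0.05 bar yet fall short of the MID, or vice versa. A minimal sketch with hypothetical endurance scores and an assumed MID, not the study's actual outcomes:

```python
import numpy as np

def cohens_d(x, y):
    """Cohen's d using the pooled standard deviation."""
    nx, ny = len(x), len(y)
    pooled_sd = np.sqrt(((nx - 1) * np.var(x, ddof=1) +
                         (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2))
    return (np.mean(x) - np.mean(y)) / pooled_sd

# Hypothetical cervical-flexor endurance times (s); assumed MID of 10 s.
controls = np.array([38.0, 41, 35, 39, 36])
tmd = np.array([25.0, 30, 28, 22, 27])
mid = 10.0

diff = controls.mean() - tmd.mean()
print(f"d = {cohens_d(controls, tmd):.2f}; mean difference = {diff:.1f} s "
      f"({'exceeds' if diff > mid else 'falls short of'} the MID)")
```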

  4. Identifying Frequent Users of an Urban Emergency Medical Service Using Descriptive Statistics and Regression Analyses.

    PubMed

    Norman, Chenelle; Mello, Michael; Choi, Bryan

    2016-01-01

    This retrospective cohort study provides a descriptive analysis of a population that frequently uses an urban emergency medical service (EMS) and identifies factors that contribute to use among all frequent users. For purposes of this study we divided frequent users into the following groups: low-frequent users (4 EMS transports in 2012), medium-frequent users (5 to 6 EMS transports in 2012), high-frequent users (7 to 10 EMS transports in 2012) and super-frequent users (11 or more EMS transports in 2012). Overall, we identified 539 individuals as frequent users. For all groups of EMS frequent users (i.e. low, medium, high and super) one or more hospital admissions, receiving a referral for follow-up care upon discharge, and having no insurance were significantly associated with frequent EMS use (P<0.05). Within the diagnostic categories, 41.61% of super-frequent users had a diagnosis of "primarily substance abuse/misuse" and among low-frequent users a majority, 53.33%, were identified as having a "reoccurring (medical) diagnosis." Lastly, relative risk ratios for the highest group of users, super-frequent users, were 3.34 (95% CI [1.90-5.87]) for obtaining at least one referral for follow-up care, 13.67 (95% CI [5.60-33.34]) for having four or more hospital admissions and 5.95 (95% CI [1.80-19.63]) for having a diagnosis of primarily substance abuse/misuse. Findings from this study demonstrate that among low- and medium-frequent users a majority of patients are using EMS for reoccurring medical conditions. This could potentially be avoided with better care management. In addition, this study adds to the current literature that illustrates a strong correlation between substance abuse/misuse and high/super-frequent EMS use. For the subgroup analysis among individuals 65 years of age and older, we did not find any of the independent variables included in our model to be significantly associated with frequent EMS use.

  5. Assessing Statistically Significant Heavy-Metal Concentrations in Abandoned Mine Areas via Hot Spot Analysis of Portable XRF Data

    PubMed Central

    Kim, Sung-Min; Choi, Yosoon

    2017-01-01

    To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1–4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required. PMID:28629168

  6. Assessing Statistically Significant Heavy-Metal Concentrations in Abandoned Mine Areas via Hot Spot Analysis of Portable XRF Data.

    PubMed

    Kim, Sung-Min; Choi, Yosoon

    2017-06-18

    To develop appropriate measures to prevent soil contamination in abandoned mining areas, an understanding of the spatial variation of the potentially toxic trace elements (PTEs) in the soil is necessary. For the purpose of effective soil sampling, this study uses hot spot analysis, which calculates a z-score based on the Getis-Ord Gi* statistic to identify a statistically significant hot spot sample. To constitute a statistically significant hot spot, a feature with a high value should also be surrounded by other features with high values. Using relatively cost- and time-effective portable X-ray fluorescence (PXRF) analysis, sufficient input data are acquired from the Busan abandoned mine and used for hot spot analysis. To calibrate the PXRF data, which have a relatively low accuracy, the PXRF analysis data are transformed using the inductively coupled plasma atomic emission spectrometry (ICP-AES) data. The transformed PXRF data of the Busan abandoned mine are classified into four groups according to their normalized content and z-scores: high content with a high z-score (HH), high content with a low z-score (HL), low content with a high z-score (LH), and low content with a low z-score (LL). The HL and LH cases may be due to measurement errors. Additional or complementary surveys are required for the areas surrounding these suspect samples or for significant hot spot areas. The soil sampling is conducted according to a four-phase procedure in which the hot spot analysis and proposed group classification method are employed to support the development of a sampling plan for the following phase. Overall, 30, 50, 80, and 100 samples are investigated and analyzed in phases 1-4, respectively. The method implemented in this case study may be utilized in the field for the assessment of statistically significant soil contamination and the identification of areas for which an additional survey is required.
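
    The Gi* statistic used in both versions of this record has a closed form: the weighted sum of values around each sample is compared with its expectation and variance under spatial randomness, yielding a z-score per sample. A self-contained sketch with binary distance-band weights (the coordinates, values, and bandwidth are illustrative, not the Busan data):

```python
import numpy as np

def getis_ord_gi_star(x, coords, bandwidth):
    """Z-scores of the Getis-Ord Gi* statistic for point samples.

    Binary weights: neighbors within `bandwidth`, self included (the
    'star' variant). Large positive z-scores mark hot spots whose high
    values are surrounded by other high values.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    w = (d <= bandwidth).astype(float)        # includes the diagonal (self)
    xbar = x.mean()
    s = np.sqrt((x ** 2).mean() - xbar ** 2)
    wi = w.sum(axis=1)
    num = w @ x - xbar * wi
    den = s * np.sqrt((n * (w ** 2).sum(axis=1) - wi ** 2) / (n - 1))
    return num / den

rng = np.random.default_rng(1)
coords = rng.uniform(0, 100, size=(50, 2))        # sample locations (m)
pb = rng.lognormal(3, 1, size=50)                 # e.g., Pb content (mg/kg)
z = getis_ord_gi_star(pb, coords, bandwidth=25.0)
print("hot-spot samples (z > 1.96):", np.flatnonzero(z > 1.96))
```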

  7. A new statistic for identifying batch effects in high-throughput genomic data that uses guided principal component analysis.

    PubMed

    Reese, Sarah E; Archer, Kellie J; Therneau, Terry M; Atkinson, Elizabeth J; Vachon, Celine M; de Andrade, Mariza; Kocher, Jean-Pierre A; Eckel-Passow, Jeanette E

    2013-11-15

    Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal component analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present an extension of PCA to quantify the existence of batch effects, called guided PCA (gPCA). We describe a test statistic that uses gPCA to test whether a batch effect exists. We apply our proposed test statistic derived using gPCA to simulated data and to two copy number variation case studies: the first study consisted of 614 samples from a breast cancer family study using Illumina Human 660 bead-chip arrays, whereas the second case study consisted of 703 samples from a family blood pressure study that used Affymetrix SNP Array 6.0. We demonstrate that our statistic has good statistical properties and is able to identify significant batch effects in two copy number variation case studies. We developed a new statistic that uses gPCA to identify whether batch effects exist in high-throughput genomic data. Although our examples pertain to copy number data, gPCA is general and can be used on other data types as well. The gPCA R package (Available via CRAN) provides functionality and data to perform the methods in this article. reesese@vcu.edu
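
    A rough sketch of the gPCA idea as described: compare the variance of the data projected on the first batch-guided principal component (from an SVD of the batch-indicator matrix times the data) against the first unguided component, and assess the ratio by permuting batch labels. This is an assumption-laden reimplementation for illustration; the gPCA R package on CRAN is the authoritative version.

```python
import numpy as np

def gpca_delta(Y, batch):
    """Guided-PCA statistic: batch-guided vs. unguided PC1 variance.

    Y: (samples x features) data; batch: integer labels 0..b-1.
    A value near 1 suggests batch is a dominant source of variance.
    """
    Y = Y - Y.mean(axis=0)
    X = np.eye(batch.max() + 1)[batch]                      # batch indicators
    _, _, Vu = np.linalg.svd(Y, full_matrices=False)        # unguided PCA
    _, _, Vg = np.linalg.svd(X.T @ Y, full_matrices=False)  # guided PCA
    return np.var(Y @ Vg[0]) / np.var(Y @ Vu[0])

def gpca_pvalue(Y, batch, n_perm=999, seed=0):
    """Permutation test: shuffle batch labels and recompute delta."""
    rng = np.random.default_rng(seed)
    obs = gpca_delta(Y, batch)
    null = np.array([gpca_delta(Y, rng.permutation(batch))
                     for _ in range(n_perm)])
    return obs, (1 + np.sum(null >= obs)) / (n_perm + 1)
```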

  8. Statistical Parametric Mapping to Identify Differences between Consensus-Based Joint Patterns during Gait in Children with Cerebral Palsy.

    PubMed

    Nieuwenhuys, Angela; Papageorgiou, Eirini; Desloovere, Kaat; Molenaers, Guy; De Laet, Tinne

    2017-01-01

    Experts recently identified 49 joint motion patterns in children with cerebral palsy during a Delphi consensus study. Pattern definitions were therefore the result of subjective expert opinion. The present study aims to provide objective, quantitative data supporting the identification of these consensus-based patterns. To do so, statistical parametric mapping was used to compare the mean kinematic waveforms of 154 trials of typically developing children (n = 56) to the mean kinematic waveforms of 1719 trials of children with cerebral palsy (n = 356), which were classified following the classification rules of the Delphi study. Three hypotheses stated that: (a) joint motion patterns with 'no or minor gait deviations' (n = 11 patterns) do not differ significantly from the gait pattern of typically developing children; (b) all other pathological joint motion patterns (n = 38 patterns) differ from typically developing gait and the locations of difference within the gait cycle, highlighted by statistical parametric mapping, concur with the consensus-based classification rules. (c) all joint motion patterns at the level of each joint (n = 49 patterns) differ from each other during at least one phase of the gait cycle. Results showed that: (a) ten patterns with 'no or minor gait deviations' differed somewhat unexpectedly from typically developing gait, but these differences were generally small (≤3°); (b) all other joint motion patterns (n = 38) differed from typically developing gait and the significant locations within the gait cycle that were indicated by the statistical analyses, coincided well with the classification rules; (c) joint motion patterns at the level of each joint significantly differed from each other, apart from two sagittal plane pelvic patterns. In addition to these results, for several joints, statistical analyses indicated other significant areas during the gait cycle that were not included in the pattern definitions of the consensus study

  9. A Statistical Method of Identifying Interactions in Neuron–Glia Systems Based on Functional Multicell Ca2+ Imaging

    PubMed Central

    Nakae, Ken; Ikegaya, Yuji; Ishikawa, Tomoe; Oba, Shigeyuki; Urakubo, Hidetoshi; Koyama, Masanori; Ishii, Shin

    2014-01-01

    Crosstalk between neurons and glia may constitute a significant part of information processing in the brain. We present a novel method of statistically identifying interactions in a neuron–glia network. We attempted to identify neuron–glia interactions from neuronal and glial activities via maximum-a-posteriori (MAP)-based parameter estimation by developing a generalized linear model (GLM) of a neuron–glia network. The interactions of interest included functional connectivity and response functions. We evaluated the cross-validated likelihood of GLMs that resulted from the addition or removal of connections to confirm the existence of specific neuron-to-glia or glia-to-neuron connections. We accepted an addition or removal only when the modification improved the cross-validated likelihood. We applied the method to a high-throughput, multicellular in vitro Ca2+ imaging dataset obtained from the CA3 region of a rat hippocampus, and then evaluated the reliability of connectivity estimates using a statistical test based on a surrogate method. Our findings based on the estimated connectivity were in good agreement with currently available physiological knowledge, suggesting our method can elucidate undiscovered functions of neuron–glia systems. PMID:25393874

  10. Statistical downscaling rainfall using artificial neural network: significantly wetter Bangkok?

    NASA Astrophysics Data System (ADS)

    Vu, Minh Tue; Aribarg, Thannob; Supratid, Siriporn; Raghavan, Srivatsan V.; Liong, Shie-Yui

    2016-11-01

    The artificial neural network (ANN) is an established technique with a flexible mathematical structure that is capable of identifying complex nonlinear relationships between input and output data. The present study utilizes an ANN as a method for statistically downscaling global climate models (GCMs) during the rainy season at meteorological site locations in Bangkok, Thailand. The study illustrates the application of feed-forward back-propagation networks using large-scale predictor variables derived from both ERA-Interim reanalysis data and present-day/future GCM data. The predictors are first selected over different grid boxes surrounding the Bangkok region and then screened using principal component analysis (PCA) to filter the best-correlated predictors for ANN training. The downscaled reanalysis results for the present-day climate show good agreement with station precipitation, with a correlation coefficient of 0.8 and a Nash-Sutcliffe efficiency of 0.65. The final downscaled results for four GCMs show an increasing trend of precipitation for the rainy season over Bangkok by the end of the twenty-first century. The extreme values of precipitation determined using statistical indices show strong increases in wetness. These findings will be useful for policy makers pondering adaptation measures against flooding, such as whether the current drainage network system is sufficient to meet the changing climate, and for planning a range of related adaptation/mitigation measures.
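
    A minimal sketch of the pipeline described, PCA screening of large-scale predictors followed by a feed-forward network, using scikit-learn in place of the authors' setup; the array shapes, component count, and network size are placeholders:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical training set: rows are rainy-season days; columns are
# flattened large-scale predictor fields over grid boxes around Bangkok.
predictors = rng.normal(size=(3000, 120))
station_rain = np.maximum(0, predictors[:, :3].sum(axis=1) + rng.normal(size=3000))

model = make_pipeline(
    StandardScaler(),
    PCA(n_components=10),   # screen correlated predictors, as with PCA filtering
    MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0),
)
model.fit(predictors[:2500], station_rain[:2500])
print("R^2 on held-out days:",
      round(model.score(predictors[2500:], station_rain[2500:]), 2))
```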

  11. Statistical significance of task related deep brain EEG dynamic changes in the time-frequency domain.

    PubMed

    Chládek, J; Brázdil, M; Halámek, J; Plešinger, F; Jurák, P

    2013-01-01

    We present an off-line analysis procedure for exploring brain activity recorded from intra-cerebral electroencephalographic data (SEEG). The objective is to determine the statistical differences between different types of stimulations in the time-frequency domain. The procedure is based on computing relative signal power change and subsequent statistical analysis. An example of characteristic statistically significant event-related de/synchronization (ERD/ERS) detected across different frequency bands following different oddball stimuli is presented. The method is used for off-line functional classification of different brain areas.
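
    The core quantity, the relative signal power change, is (P_event - P_base) / P_base within a frequency band, with negative values indicating event-related desynchronization (ERD) and positive values synchronization (ERS). A minimal sketch; the window choices and the Wilcoxon signed-rank test across trials are stand-ins for the paper's exact procedure:

```python
import numpy as np
from scipy.stats import wilcoxon

def band_power(x, fs, band):
    """Mean spectral power of the last axis of x within a frequency band."""
    freqs = np.fft.rfftfreq(x.shape[-1], d=1 / fs)
    psd = np.abs(np.fft.rfft(x, axis=-1)) ** 2
    sel = (freqs >= band[0]) & (freqs < band[1])
    return psd[..., sel].mean(axis=-1)

def erd_ers(trials, fs, onset, baseline=(-1.0, 0.0), window=(0.2, 1.2), band=(8, 12)):
    """Per-trial relative band-power change around a stimulus.

    trials: (n_trials, n_samples) SEEG epochs; onset: sample index of
    the stimulus; baseline/window in seconds relative to onset.
    """
    b0, b1 = (onset + int(t * fs) for t in baseline)
    w0, w1 = (onset + int(t * fs) for t in window)
    p_base = band_power(trials[:, b0:b1], fs, band)
    p_event = band_power(trials[:, w0:w1], fs, band)
    rel = (p_event - p_base) / p_base       # < 0: ERD, > 0: ERS
    return rel, wilcoxon(rel)               # test median change against zero
```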

  12. Cluster Size Statistic and Cluster Mass Statistic: Two Novel Methods for Identifying Changes in Functional Connectivity Between Groups or Conditions

    PubMed Central

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods – the cluster size statistic (CSS) and cluster mass statistic (CMS) – are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73%N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity. PMID:24906136

  13. Cluster size statistic and cluster mass statistic: two novel methods for identifying changes in functional connectivity between groups or conditions.

    PubMed

    Ing, Alex; Schwarzbauer, Christian

    2014-01-01

    Functional connectivity has become an increasingly important area of research in recent years. At a typical spatial resolution, approximately 300 million connections link each voxel in the brain with every other. This pattern of connectivity is known as the functional connectome. Connectivity is often compared between experimental groups and conditions. Standard methods used to control the type 1 error rate are likely to be insensitive when comparisons are carried out across the whole connectome, due to the huge number of statistical tests involved. To address this problem, two new cluster based methods--the cluster size statistic (CSS) and cluster mass statistic (CMS)--are introduced to control the family wise error rate across all connectivity values. These methods operate within a statistical framework similar to the cluster based methods used in conventional task based fMRI. Both methods are data driven, permutation based and require minimal statistical assumptions. Here, the performance of each procedure is evaluated in a receiver operator characteristic (ROC) analysis, utilising a simulated dataset. The relative sensitivity of each method is also tested on real data: BOLD (blood oxygen level dependent) fMRI scans were carried out on twelve subjects under normal conditions and during the hypercapnic state (induced through the inhalation of 6% CO2 in 21% O2 and 73%N2). Both CSS and CMS detected significant changes in connectivity between normal and hypercapnic states. A family wise error correction carried out at the individual connection level exhibited no significant changes in connectivity.
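
    A compact sketch of the logic shared by CSS and CMS: threshold the connection-wise t-map, find contiguous supra-threshold clusters, and compare the observed maximum cluster size and mass against a permutation null, which controls the family-wise error rate. Treating the connectivity matrix as a lattice for cluster contiguity is a simplification of the connectome-space cluster definition used in the paper:

```python
import numpy as np
from scipy import ndimage, stats

def css_cms_pvalues(group_a, group_b, t_thresh=2.0, n_perm=1000, seed=0):
    """Permutation p-values for the largest cluster size and mass.

    group_a, group_b: (subjects, nodes, nodes) connectivity matrices.
    """
    data = np.concatenate([group_a, group_b])
    labels = np.r_[np.zeros(len(group_a)), np.ones(len(group_b))]

    def max_size_mass(lab_vec):
        t = stats.ttest_ind(data[lab_vec == 0], data[lab_vec == 1], axis=0).statistic
        clusters, n = ndimage.label(np.abs(t) > t_thresh)
        if n == 0:
            return 0.0, 0.0
        sizes = ndimage.sum(np.ones_like(t), clusters, index=range(1, n + 1))
        masses = ndimage.sum(np.abs(t), clusters, index=range(1, n + 1))
        return sizes.max(), masses.max()

    obs_size, obs_mass = max_size_mass(labels)
    rng = np.random.default_rng(seed)
    null = np.array([max_size_mass(rng.permutation(labels)) for _ in range(n_perm)])
    p_size = (1 + np.sum(null[:, 0] >= obs_size)) / (n_perm + 1)
    p_mass = (1 + np.sum(null[:, 1] >= obs_mass)) / (n_perm + 1)
    return p_size, p_mass
```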

  14. Intensive inpatient treatment for bulimia nervosa: Statistical and clinical significance of symptom changes.

    PubMed

    Diedrich, Alice; Schlegl, Sandra; Greetfeld, Martin; Fumi, Markus; Voderholzer, Ulrich

    2018-03-01

    This study examines the statistical and clinical significance of symptom changes during an intensive inpatient treatment program with a strong psychotherapeutic focus for individuals with severe bulimia nervosa. 295 consecutively admitted bulimic patients were administered the Structured Interview for Anorexic and Bulimic Syndromes-Self-Rating (SIAB-S), the Eating Disorder Inventory-2 (EDI-2), the Brief Symptom Inventory (BSI), and the Beck Depression Inventory-II (BDI-II) at treatment intake and discharge. Results indicated statistically significant symptom reductions with large effect sizes regarding severity of binge eating and compensatory behavior (SIAB-S), overall eating disorder symptom severity (EDI-2), overall psychopathology (BSI), and depressive symptom severity (BDI-II) even when controlling for antidepressant medication. The majority of patients showed either reliable (EDI-2: 33.7%, BSI: 34.8%, BDI-II: 18.1%) or even clinically significant symptom changes (EDI-2: 43.2%, BSI: 33.9%, BDI-II: 56.9%). Patients with clinically significant improvement were less distressed at intake and less likely to suffer from a comorbid borderline personality disorder when compared with those who did not improve to a clinically significant extent. Findings indicate that intensive psychotherapeutic inpatient treatment may be effective in about 75% of severely affected bulimic patients. For the remaining non-responding patients, inpatient treatment might be improved through an even stronger focus on the reduction of comorbid borderline personality traits.
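
    Reliable and clinically significant change of the kind reported here is conventionally assessed with the Jacobson-Truax reliable change index (RCI). A minimal sketch; the SD and reliability values are assumed placeholders, not the study's norms:

```python
import numpy as np

def reliable_change_index(pre, post, sd, reliability):
    """Jacobson-Truax RCI: (post - pre) / SE_diff.

    SE_diff = SD * sqrt(2) * sqrt(1 - r_xx); |RCI| > 1.96 marks a
    change unlikely to be explained by measurement error alone.
    """
    se_diff = sd * np.sqrt(2.0) * np.sqrt(1.0 - reliability)
    return (np.asarray(post) - np.asarray(pre)) / se_diff

# Hypothetical EDI-2 total scores at intake and discharge.
rci = reliable_change_index(pre=[310, 295], post=[240, 288],
                            sd=45.0, reliability=0.90)
print(rci)   # values below -1.96 indicate reliable improvement
```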

  15. A Statistical Approach to Identify Superluminous Supernovae and Probe Their Diversity

    NASA Astrophysics Data System (ADS)

    Inserra, C.; Prajs, S.; Gutierrez, C. P.; Angus, C.; Smith, M.; Sullivan, M.

    2018-02-01

    We investigate the identification of hydrogen-poor superluminous supernovae (SLSNe I) using a photometric analysis, without including an arbitrary magnitude threshold. We assemble a homogeneous sample of previously classified SLSNe I from the literature, and fit their light curves using Gaussian processes. From the fits, we identify four photometric parameters that have a high statistical significance when correlated, and combine them in a parameter space that conveys information on their luminosity and color evolution. This parameter space presents a new definition for SLSNe I, which can be used to analyze existing and future transient data sets. We find that 90% of previously classified SLSNe I meet our new definition. We also examine the evidence for two subclasses of SLSNe I, combining their photometric evolution with spectroscopic information, namely the photospheric velocity and its gradient. A cluster analysis reveals the presence of two distinct groups. “Fast” SLSNe show fast light curves and color evolution, large velocities, and a large velocity gradient. “Slow” SLSNe show slow light curve and color evolution, small expansion velocities, and an almost non-existent velocity gradient. Finally, we discuss the impact of our analyses in the understanding of the powering engine of SLSNe, and their implementation as cosmological probes in current and future surveys.
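
    A minimal sketch of the light-curve fitting step, a Gaussian process regression over phase from which photometric parameters can then be read off; the kernel and photometry are made up, and the 30-day decline rate shown is an illustrative quantity, not one of the paper's four parameters:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Hypothetical SLSN photometry: rest-frame phase (days) and magnitudes.
phase = np.array([-20.0, -10, -5, 0, 8, 15, 30, 45, 60])[:, None]
mag = np.array([22.4, 21.3, 21.0, 20.8, 21.0, 21.4, 22.1, 22.8, 23.3])

kernel = 1.0 * RBF(length_scale=20.0) + WhiteKernel(noise_level=0.01)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(phase, mag)

grid = np.linspace(-25, 70, 400)[:, None]
mu, sd = gp.predict(grid, return_std=True)     # smooth curve + uncertainty

peak_phase = grid[np.argmin(mu), 0]            # brightest = minimum magnitude
dm30 = mu[np.argmin(np.abs(grid[:, 0] - (peak_phase + 30)))] - mu.min()
print(f"peak at {peak_phase:.1f} d; decline over 30 d = {dm30:.2f} mag")
```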

  16. Statistical Parametric Mapping to Identify Differences between Consensus-Based Joint Patterns during Gait in Children with Cerebral Palsy

    PubMed Central

    Papageorgiou, Eirini; Desloovere, Kaat; Molenaers, Guy; De Laet, Tinne

    2017-01-01

    Experts recently identified 49 joint motion patterns in children with cerebral palsy during a Delphi consensus study. Pattern definitions were therefore the result of subjective expert opinion. The present study aims to provide objective, quantitative data supporting the identification of these consensus-based patterns. To do so, statistical parametric mapping was used to compare the mean kinematic waveforms of 154 trials of typically developing children (n = 56) to the mean kinematic waveforms of 1719 trials of children with cerebral palsy (n = 356), which were classified following the classification rules of the Delphi study. Three hypotheses stated that: (a) joint motion patterns with ‘no or minor gait deviations’ (n = 11 patterns) do not differ significantly from the gait pattern of typically developing children; (b) all other pathological joint motion patterns (n = 38 patterns) differ from typically developing gait and the locations of difference within the gait cycle, highlighted by statistical parametric mapping, concur with the consensus-based classification rules. (c) all joint motion patterns at the level of each joint (n = 49 patterns) differ from each other during at least one phase of the gait cycle. Results showed that: (a) ten patterns with ‘no or minor gait deviations’ differed somewhat unexpectedly from typically developing gait, but these differences were generally small (≤3°); (b) all other joint motion patterns (n = 38) differed from typically developing gait and the significant locations within the gait cycle that were indicated by the statistical analyses, coincided well with the classification rules; (c) joint motion patterns at the level of each joint significantly differed from each other, apart from two sagittal plane pelvic patterns. In addition to these results, for several joints, statistical analyses indicated other significant areas during the gait cycle that were not included in the pattern definitions of the consensus

  17. Identifying Measurement Disturbance Effects Using Rasch Item Fit Statistics and the Logit Residual Index.

    ERIC Educational Resources Information Center

    Mount, Robert E.; Schumacker, Randall E.

    1998-01-01

    A Monte Carlo study was conducted using simulated dichotomous data to determine the effects of guessing on Rasch item fit statistics and the Logit Residual Index. Results indicate that no significant differences were found between the mean Rasch item fit statistics for each distribution type as the probability of guessing the correct answer…

  18. Using the Descriptive Bootstrap to Evaluate Result Replicability (Because Statistical Significance Doesn't)

    ERIC Educational Resources Information Center

    Spinella, Sarah

    2011-01-01

    As result replicability is essential to science and difficult to achieve through external replicability, the present paper notes the insufficiency of null hypothesis statistical significance testing (NHSST) and explains the bootstrap as a plausible alternative, with a heuristic example to illustrate the bootstrap method. The bootstrap relies on…
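
    A minimal sketch of the descriptive bootstrap: resample the observed data with replacement many times, recompute the statistic each time, and inspect the spread of the resampled estimates as evidence of result (in)stability:

```python
import numpy as np

def bootstrap_ci(sample, statistic=np.mean, n_boot=10000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for an arbitrary statistic."""
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample, dtype=float)
    boots = np.array([statistic(rng.choice(sample, size=sample.size, replace=True))
                      for _ in range(n_boot)])
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2])

scores = [12, 15, 9, 22, 18, 14, 11, 20, 16, 13]
print("95% bootstrap CI for the mean:", bootstrap_ci(scores))
```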

  19. Statistical tests and identifiability conditions for pooling and analyzing multisite datasets

    PubMed Central

    Zhou, Hao Henry; Singh, Vikas; Johnson, Sterling C.; Wahba, Grace

    2018-01-01

    When sample sizes are small, the ability to identify weak (but scientifically interesting) associations between a set of predictors and a response may be enhanced by pooling existing datasets. However, variations in acquisition methods and the distribution of participants or observations between datasets, especially due to the distributional shifts in some predictors, may obfuscate real effects when datasets are combined. We present a rigorous statistical treatment of this problem and identify conditions where we can correct the distributional shift. We also provide an algorithm for the situation where the correction is identifiable. We analyze various properties of the framework for testing model fit, constructing confidence intervals, and evaluating consistency characteristics. Our technical development is motivated by Alzheimer’s disease (AD) studies, and we present empirical results showing that our framework enables harmonizing of protein biomarkers, even when the assays across sites differ. Our contribution may, in part, mitigate a bottleneck that researchers face in clinical research when pooling smaller sized datasets and may offer benefits when the subjects of interest are difficult to recruit or when resources prohibit large single-site studies. PMID:29386387

  20. Statistical tests and identifiability conditions for pooling and analyzing multisite datasets.

    PubMed

    Zhou, Hao Henry; Singh, Vikas; Johnson, Sterling C; Wahba, Grace

    2018-02-13

    When sample sizes are small, the ability to identify weak (but scientifically interesting) associations between a set of predictors and a response may be enhanced by pooling existing datasets. However, variations in acquisition methods and the distribution of participants or observations between datasets, especially due to the distributional shifts in some predictors, may obfuscate real effects when datasets are combined. We present a rigorous statistical treatment of this problem and identify conditions where we can correct the distributional shift. We also provide an algorithm for the situation where the correction is identifiable. We analyze various properties of the framework for testing model fit, constructing confidence intervals, and evaluating consistency characteristics. Our technical development is motivated by Alzheimer's disease (AD) studies, and we present empirical results showing that our framework enables harmonizing of protein biomarkers, even when the assays across sites differ. Our contribution may, in part, mitigate a bottleneck that researchers face in clinical research when pooling smaller sized datasets and may offer benefits when the subjects of interest are difficult to recruit or when resources prohibit large single-site studies. Copyright © 2018 the Author(s). Published by PNAS.

  1. Comment on “Two statistics for evaluating parameter identifiability and error reduction” by John Doherty and Randall J. Hunt

    USGS Publications Warehouse

    Hill, Mary C.

    2010-01-01

    Doherty and Hunt (2009) present important ideas for first-order second-moment sensitivity analysis, but five issues are discussed in this comment. First, considering the composite-scaled sensitivity (CSS) jointly with parameter correlation coefficients (PCC) in a CSS/PCC analysis addresses the difficulties with CSS mentioned in the introduction. Second, their new parameter identifiability statistic is actually likely to do a poor job of evaluating parameter identifiability in common situations. The statistic instead performs the very useful role of showing how model parameters are included in the estimated singular value decomposition (SVD) parameters. Its close relation to CSS is shown. Third, the idea from p. 125 that a suitable truncation point for SVD parameters can be identified using the prediction variance is challenged using results from Moore and Doherty (2005). Fourth, the relative error reduction statistic of Doherty and Hunt is shown to belong to an emerging set of statistics here named perturbed calculated variance statistics. Finally, the perturbed calculated variance statistics OPR and PPR mentioned on p. 121 are shown to explicitly include the parameter null-space component of uncertainty. Indeed, OPR and PPR results that account for null-space uncertainty have appeared in the literature since 2000.

  2. Searching for statistically significant regulatory modules.

    PubMed

    Bailey, Timothy L; Noble, William Stafford

    2003-10-01

    The regulatory machinery controlling gene expression is complex, frequently requiring multiple, simultaneous DNA-protein interactions. The rate at which a gene is transcribed may depend upon the presence or absence of a collection of transcription factors bound to the DNA near the gene. Locating transcription factor binding sites in genomic DNA is difficult because the individual sites are small and tend to occur frequently by chance. True binding sites may be identified by their tendency to occur in clusters, sometimes known as regulatory modules. We describe an algorithm for detecting occurrences of regulatory modules in genomic DNA. The algorithm, called mcast, takes as input a DNA database and a collection of binding site motifs that are known to operate in concert. mcast uses a motif-based hidden Markov model with several novel features. The model incorporates motif-specific p-values, thereby allowing scores from motifs of different widths and specificities to be compared directly. The p-value scoring also allows mcast to only accept motif occurrences with significance below a user-specified threshold, while still assigning better scores to motif occurrences with lower p-values. mcast can search long DNA sequences, modeling length distributions between motifs within a regulatory module, but ignoring length distributions between modules. The algorithm produces a list of predicted regulatory modules, ranked by E-value. We validate the algorithm using simulated data as well as real data sets from fruitfly and human. http://meme.sdsc.edu/MCAST/paper

  3. Identifying Wave-Particle Interactions in the Solar Wind using Statistical Correlations

    NASA Astrophysics Data System (ADS)

    Broiles, T. W.; Jian, L. K.; Gary, S. P.; Lepri, S. T.; Stevens, M. L.

    2017-12-01

    Heavy ions are a trace component of the solar wind, which can resonate with plasma waves, causing heating and acceleration relative to the bulk plasma. While wave-particle interactions are generally accepted as the cause of heavy ion heating and acceleration, observations to constrain the physics are lacking. In this work, we statistically link specific wave modes to heavy ion heating and acceleration. We have computed the Fast Fourier Transform (FFT) of transverse and compressional magnetic waves between 0 and 5.5 Hz using 9 days of ACE and Wind Magnetometer data. The FFTs are averaged over plasma measurement cycles to compute statistical correlations between magnetic wave power at each discrete frequency, and ion kinetic properties measured by ACE/SWICS and Wind/SWE. The results show that lower frequency transverse oscillations (< 0.2 Hz) and higher frequency compressional oscillations (> 0.4 Hz) are positively correlated with enhancements in the heavy ion thermal and drift speeds. Moreover, the correlation results for the He2+ and O6+ were similar on most days. The correlations were often weak, but most days had some frequencies that correlated with statistical significance. This work suggests that the solar wind heavy ions are possibly being heated and accelerated by both transverse and compressional waves at different frequencies.

  4. Statistically significant performance results of a mine detector and fusion algorithm from an x-band high-resolution SAR

    NASA Astrophysics Data System (ADS)

    Williams, Arnold C.; Pachowicz, Peter W.

    2004-09-01

    Current mine detection research indicates that no single sensor or single look from a sensor will detect mines/minefields in a real-time manner at a performance level suitable for a forward maneuver unit. Hence, the integrated development of detectors and fusion algorithms is of primary importance. A problem in this development process has been the evaluation of these algorithms with relatively small data sets, leading to anecdotal and frequently overtrained results. These anecdotal results are often unreliable and conflicting among various sensors and algorithms. Consequently, the physical phenomena that ought to be exploited and the performance benefits of this exploitation are often ambiguous. The Army RDECOM CERDEC Night Vision and Electronic Sensors Directorate has collected large amounts of multisensor data such that statistically significant evaluations of detection and fusion algorithms can be obtained. Even with these large data sets, care must be taken in algorithm design and data processing to achieve statistically significant performance results for combined detectors and fusion algorithms. This paper discusses statistically significant detection and combined multilook fusion results for the Ellipse Detector (ED) and the Piecewise Level Fusion Algorithm (PLFA). These statistically significant performance results are characterized by ROC curves that have been obtained through processing this multilook data for the high resolution SAR data of the Veridian X-Band radar. We discuss the implications of these results for mine detection and the importance of statistical significance, sample size, ground truth, and algorithm design in performance evaluation.

  5. Automation method to identify the geological structure of seabed using spatial statistic analysis of echo sounding data

    NASA Astrophysics Data System (ADS)

    Kwon, O.; Kim, W.; Kim, J.

    2017-12-01

    Recently, construction of subsea tunnels has increased globally. For the safe construction of a subsea tunnel, identifying geological structures, including faults, at the design and construction stages is critically important. Unlike tunnels on land, however, it is very difficult to obtain data on the geological structure of the seabed because of the limits of marine geological surveys. This study addresses these difficulties by developing a technology that automatically identifies the geological structure of the seabed using echo sounding data. When investigating a potential site for a deep subsea tunnel, boreholes and geophysical investigations face technical and economic limits, whereas echo sounding data are easily obtainable and comparatively reliable. This study is therefore aimed at developing an algorithm that identifies large-scale geological structures of the seabed using a geostatistical approach, building on the structural-geology principle that topographic features indicate geological structure. The basic concept of the algorithm is outlined as follows: (1) convert the seabed topography to grid data using echo sounding data; (2) apply a moving window of optimal size to the grid data; (3) estimate the spatial statistics of the grid data in the window area; (4) set a percentile standard for the spatial statistics; (5) display the values satisfying the standard on the map; (6) visualize the geological structure on the map. The important elements of this study include the optimal size of the moving window, the choice of optimal spatial statistics, and the determination of an optimal percentile standard. To determine these optimal elements, numerous simulations were implemented. Eventually, a user program based on R was developed using the optimal analysis algorithm. The user program was designed to display the variations of the various spatial statistics, leading to easy analysis of geological structure depending on the variation of spatial statistics.
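
    A minimal sketch of steps (2)-(4), using the local standard deviation of depth as the spatial statistic; Python stands in for the authors' R program, and the window size and percentile standard are illustrative:

```python
import numpy as np
from scipy import ndimage

def flag_structures(depth_grid, window=9, percentile=95):
    """Moving-window spatial statistic over a bathymetry grid.

    Computes the local standard deviation of depth in a window x window
    neighborhood and flags cells above a percentile standard; abrupt
    local relief serves as a proxy for lineaments such as faults.
    """
    local_mean = ndimage.uniform_filter(depth_grid, size=window)
    local_sq = ndimage.uniform_filter(depth_grid ** 2, size=window)
    local_sd = np.sqrt(np.maximum(local_sq - local_mean ** 2, 0.0))
    return local_sd > np.percentile(local_sd, percentile)

grid = np.random.default_rng(0).normal(-50.0, 1.0, size=(200, 200))
grid[:, 100:] -= 15.0                      # synthetic fault scarp
mask = flag_structures(grid)
print("flagged cells:", int(mask.sum()))   # concentrated along the scarp
```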

  6. Identifying sources of fugitive emissions in industrial facilities using trajectory statistical methods

    NASA Astrophysics Data System (ADS)

    Brereton, Carol A.; Johnson, Matthew R.

    2012-05-01

    Fugitive pollutant sources from the oil and gas industry are typically quite difficult to find within industrial plants and refineries, yet they are a significant contributor of global greenhouse gas emissions. A novel approach for locating fugitive emission sources using computationally efficient trajectory statistical methods (TSM) has been investigated in detailed proof-of-concept simulations. Four TSMs were examined in a variety of source emissions scenarios developed using transient CFD simulations on the simplified geometry of an actual gas plant: potential source contribution function (PSCF), concentration weighted trajectory (CWT), residence time weighted concentration (RTWC), and quantitative transport bias analysis (QTBA). Quantitative comparisons were made using a correlation measure based on search area from the source(s). PSCF, CWT and RTWC could all distinguish areas near major sources from the surroundings. QTBA successfully located sources in only some cases, even when provided with a large data set. RTWC, given sufficient domain trajectory coverage, distinguished source areas best, but otherwise could produce false source predictions. Using RTWC in conjunction with CWT could overcome this issue as well as reduce sensitivity to noise in the data. The results demonstrate that TSMs are a promising approach for identifying fugitive emissions sources within complex facility geometries.
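
    Of the four TSMs, the concentration-weighted trajectory (CWT) is the most direct to sketch: each grid cell receives the residence-time-weighted mean of the concentrations attached to the trajectories passing through it, and high-valued cells point to likely sources. A minimal sketch, assuming back-trajectories have already been discretized to grid-cell visits:

```python
import numpy as np

def concentration_weighted_trajectory(trajectories, concentrations, shape):
    """CWT field over a grid (sketched).

    trajectories: list of (k, 2) integer arrays of cells visited by each
    trajectory (one entry per time step, so counts act as residence time);
    concentrations: the measurement attached to each trajectory.
    """
    weighted = np.zeros(shape)
    residence = np.zeros(shape)
    for cells, conc in zip(trajectories, concentrations):
        for i, j in cells:
            weighted[i, j] += conc
            residence[i, j] += 1.0
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(residence > 0, weighted / residence, np.nan)
```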

  7. Identifying significant gene‐environment interactions using a combination of screening testing and hierarchical false discovery rate control

    PubMed Central

    Shen, Li; Saykin, Andrew J.; Williams, Scott M.; Moore, Jason H.

    2016-01-01

    Although gene‐environment (G×E) interactions play an important role in many biological systems, detecting these interactions within genome‐wide data can be challenging due to the loss in statistical power incurred by multiple hypothesis correction. To address the challenge of poor power and the limitations of existing multistage methods, we recently developed a screening‐testing approach for G×E interaction detection that combines elastic net penalized regression with joint estimation to support a single omnibus test for the presence of G×E interactions. In our original work on this technique, however, we did not assess type I error control or power and evaluated the method using just a single, small bladder cancer data set. In this paper, we extend the original method in two important directions and provide a more rigorous performance evaluation. First, we introduce a hierarchical false discovery rate approach to formally assess the significance of individual G×E interactions. Second, to support the analysis of truly genome‐wide data sets, we incorporate a score statistic‐based prescreening step to reduce the number of single nucleotide polymorphisms prior to fitting the first stage penalized regression model. To assess the statistical properties of our method, we compare the type I error rate and statistical power of our approach with competing techniques using both simple simulation designs as well as designs based on real disease architectures. Finally, we demonstrate the ability of our approach to identify biologically plausible SNP‐education interactions relative to Alzheimer's disease status using genome‐wide association study data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). PMID:27578615

  8. Identifying Significant Changes in Cerebrovascular Reactivity to Carbon Dioxide.

    PubMed

    Sobczyk, O; Crawley, A P; Poublanc, J; Sam, K; Mandell, D M; Mikulis, D J; Duffin, J; Fisher, J A

    2016-05-01

    Changes in cerebrovascular reactivity can be used to assess disease progression and response to therapy but require discrimination of pathology from normal test-to-test variability. Such variability is due to variations in methodology, technology, and physiology with time. With uniform test conditions, our aim was to determine the test-to-test variability of cerebrovascular reactivity in healthy subjects and in patients with known cerebrovascular disease. Cerebrovascular reactivity was calculated as the blood oxygen level-dependent MR imaging response divided by the change in the carbon dioxide stimulus. Two standardized cerebrovascular reactivity tests were conducted at 3T in 15 healthy men (36.7 ± 16.1 years of age) within a 4-month period and were coregistered into standard space to yield voxelwise mean cerebrovascular reactivity interval difference measures, composing a reference interval difference atlas. Cerebrovascular reactivity interval difference maps were prepared for 11 male patients. For each patient, the test-retest difference of each voxel was scored statistically as a z-value relative to the corresponding voxel mean difference in the reference atlas and then color-coded and superimposed on the anatomic images to create cerebrovascular reactivity interval difference z-maps. There were no significant test-to-test differences in cerebrovascular reactivity in either gray or white matter (mean gray matter, P = .431; mean white matter, P = .857; paired t test) in the healthy cohort. The patient cerebrovascular reactivity interval difference z-maps indicated regions where cerebrovascular reactivity increased or decreased and the probability that the changes were significant. Accounting for normal test-to-test differences in cerebrovascular reactivity enables the assessment of significant changes in disease status (stability, progression, or regression) in patients with time. © 2016 by American Journal of Neuroradiology.

  9. CorSig: a general framework for estimating statistical significance of correlation and its application to gene co-expression analysis.

    PubMed

    Wang, Hong-Qiang; Tsai, Chung-Jui

    2013-01-01

    With the rapid increase of omics data, correlation analysis has become an indispensable tool for inferring meaningful associations from a large number of observations. Pearson correlation coefficient (PCC) and its variants are widely used for such purposes. However, it remains challenging to test whether an observed association is reliable both statistically and biologically. We present here a new method, CorSig, for statistical inference of correlation significance. CorSig is based on a biology-informed null hypothesis, i.e., testing whether the true PCC (ρ) between two variables is statistically larger than a user-specified PCC cutoff (τ), as opposed to the simple null hypothesis of ρ = 0 in existing methods, i.e., testing whether an association can be declared without a threshold. CorSig incorporates Fisher's Z transformation of the observed PCC (r), which facilitates use of standard techniques for p-value computation and multiple testing corrections. We compared CorSig against two methods: one uses a minimum PCC cutoff while the other (Zhu's procedure) controls correlation strength and statistical significance in two discrete steps. CorSig consistently outperformed these methods in various simulation data scenarios by balancing between false positives and false negatives. When tested on real-world Populus microarray data, CorSig effectively identified co-expressed genes in the flavonoid pathway, and discriminated between closely related gene family members for their differential association with flavonoid and lignin pathways. The p-values obtained by CorSig can be used as a stand-alone parameter for stratification of co-expressed genes according to their correlation strength in lieu of an arbitrary cutoff. CorSig requires one single tunable parameter, and can be readily extended to other correlation measures. Thus, CorSig should be useful for a wide range of applications, particularly for network analysis of high-dimensional genomic data. A web server for
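
    The heart of CorSig is its biology-informed null hypothesis, H0: rho <= tau, tested through Fisher's Z transformation of the observed PCC; atanh(r) is approximately normal with standard error 1/sqrt(n - 3). A minimal sketch of the single-test case (multiple-testing correction omitted):

```python
import numpy as np
from scipy.stats import norm

def corsig_pvalue(r, n, tau=0.5):
    """One-sided p-value for H0: rho <= tau vs H1: rho > tau."""
    z = (np.arctanh(r) - np.arctanh(tau)) * np.sqrt(n - 3)
    return norm.sf(z)

# r = 0.6 from 20 samples is not convincingly above tau = 0.5 ...
print(corsig_pvalue(0.60, n=20))    # ~ 0.28
# ... but the same r from 500 samples is.
print(corsig_pvalue(0.60, n=500))   # ~ 7e-4
```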

  10. How to get statistically significant effects in any ERP experiment (and why you shouldn't).

    PubMed

    Luck, Steven J; Gaspelin, Nicholas

    2017-01-01

    ERP experiments generate massive datasets, often containing thousands of values for each participant, even after averaging. The richness of these datasets can be very useful in testing sophisticated hypotheses, but this richness also creates many opportunities to obtain effects that are statistically significant but do not reflect true differences among groups or conditions (bogus effects). The purpose of this paper is to demonstrate how common and seemingly innocuous methods for quantifying and analyzing ERP effects can lead to very high rates of significant but bogus effects, with the likelihood of obtaining at least one such bogus effect exceeding 50% in many experiments. We focus on two specific problems: using the grand-averaged data to select the time windows and electrode sites for quantifying component amplitudes and latencies, and using one or more multifactor statistical analyses. Reanalyses of prior data and simulations of typical experimental designs are used to show how these problems can greatly increase the likelihood of significant but bogus results. Several strategies are described for avoiding these problems and for increasing the likelihood that significant effects actually reflect true differences among groups or conditions. © 2016 Society for Psychophysiological Research.

  11. How to Get Statistically Significant Effects in Any ERP Experiment (and Why You Shouldn’t)

    PubMed Central

    Luck, Steven J.; Gaspelin, Nicholas

    2016-01-01

    Event-related potential (ERP) experiments generate massive data sets, often containing thousands of values for each participant, even after averaging. The richness of these data sets can be very useful in testing sophisticated hypotheses, but this richness also creates many opportunities to obtain effects that are statistically significant but do not reflect true differences among groups or conditions (bogus effects). The purpose of this paper is to demonstrate how common and seemingly innocuous methods for quantifying and analyzing ERP effects can lead to very high rates of significant-but-bogus effects, with the likelihood of obtaining at least one such bogus effect exceeding 50% in many experiments. We focus on two specific problems: using the grand average data to select the time windows and electrode sites for quantifying component amplitudes and latencies, and using one or more multi-factor statistical analyses. Re-analyses of prior data and simulations of typical experimental designs are used to show how these problems can greatly increase the likelihood of significant-but-bogus results. Several strategies are described for avoiding these problems and for increasing the likelihood that significant effects actually reflect true differences among groups or conditions. PMID:28000253

  12. Significant Association of Urinary Toxic Metals and Autism-Related Symptoms—A Nonlinear Statistical Analysis with Cross Validation

    PubMed Central

    Adams, James; Kruger, Uwe; Geis, Elizabeth; Gehn, Eva; Fimbres, Valeria; Pollard, Elena; Mitchell, Jessica; Ingram, Julie; Hellmers, Robert; Quig, David; Hahn, Juergen

    2017-01-01

    Introduction A number of previous studies examined a possible association of toxic metals and autism, and over half of those studies suggest that toxic metal levels are different in individuals with Autism Spectrum Disorders (ASD). Additionally, several studies found that those levels correlate with the severity of ASD. Methods In order to further investigate these points, this paper performs the most detailed statistical analysis to date of a data set in this field. First morning urine samples were collected from 67 children and adults with ASD and 50 neurotypical controls of similar age and gender. The samples were analyzed to determine the levels of 10 urinary toxic metals (UTM). Autism-related symptoms were assessed with eleven behavioral measures. Statistical analysis was used to distinguish participants on the ASD spectrum and neurotypical participants based upon the UTM data alone. The analysis also included examining the association of autism severity with toxic metal excretion data using linear and nonlinear analysis. “Leave-one-out” cross-validation was used to ensure statistical independence of results. Results and Discussion Average excretion levels of several toxic metals (lead, tin, thallium, antimony) were significantly higher in the ASD group. However, ASD classification using univariate statistics proved difficult due to large variability, but nonlinear multivariate statistical analysis significantly improved ASD classification with Type I/II errors of 15% and 18%, respectively. These results clearly indicate that the urinary toxic metal excretion profiles of participants in the ASD group were significantly different from those of the neurotypical participants. Similarly, nonlinear methods determined a significantly stronger association between the behavioral measures and toxic metal excretion. The association was strongest for the Aberrant Behavior Checklist (including subscales on Irritability, Stereotypy, Hyperactivity, and Inappropriate

  13. Statistical significance estimation of a signal within the GooFit framework on GPUs

    NASA Astrophysics Data System (ADS)

    Cristella, Leonardo; Di Florio, Adriano; Pompili, Alexis

    2017-03-01

    In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented in both the ROOT/RooFit and GooFit frameworks with the purpose of estimating the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B+ → J/ψϕK+. GooFit is an open data-analysis tool under development that interfaces ROOT/RooFit to the CUDA platform on nVidia GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performance with respect to the RooFit application parallelised on multiple CPUs by means of the PROOF-Lite tool. The considerable resulting speed-up, evident when comparing concurrent GooFit processes allowed by the CUDA Multi Process Service and a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood ratio test statistic in different situations in which the Wilks Theorem may or may not apply because its regularity conditions are not satisfied.

  14. A Modified Delphi to Identify the Significant Works Pertaining to the Understanding of Reading Comprehension and Content Analysis of the Identified Works

    ERIC Educational Resources Information Center

    Zunker, Norma D.; Pearce, Daniel L.

    2012-01-01

    The first part of this study explored the significant works pertaining to the understanding of reading comprehension using a Modified Delphi Method. A panel of reading comprehension experts identified 19 works they considered to be significant to the understanding of reading comprehension. The panel of experts identified the reasons they…

  15. Use of a spatial scan statistic to identify clusters of births occurring outside Ghanaian health facilities for targeted intervention.

    PubMed

    Bosomprah, Samuel; Dotse-Gborgbortsi, Winfred; Aboagye, Patrick; Matthews, Zoe

    2016-11-01

    To identify and evaluate clusters of births that occurred outside health facilities in Ghana for targeted intervention. A retrospective study was conducted using a convenience sample of live births registered in Ghanaian health facilities from January 1 to December 31, 2014. Data were extracted from the district health information system. A spatial scan statistic was used to investigate clusters of home births through a discrete Poisson probability model. Scanning with a circular spatial window was conducted only for clusters with high rates of such deliveries. The district was used as the geographic unit of analysis. The likelihood P value was estimated using Monte Carlo simulations. Ten statistically significant clusters with a high rate of home birth were identified. The relative risks ranged from 1.43 ("least likely" cluster; P=0.001) to 1.95 ("most likely" cluster; P=0.001). The relative risks of the top five "most likely" clusters ranged from 1.68 to 1.95; these clusters were located in the Ashanti, Brong Ahafo, Western, Eastern, and Greater Accra regions. Health facility records, geospatial techniques, and geographic information systems provided locally relevant information to assist policy makers in delivering targeted interventions to small geographic areas. Copyright © 2016 International Federation of Gynecology and Obstetrics. Published by Elsevier Ireland Ltd. All rights reserved.
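
    The scan statistic used here is, in essence, a Poisson log-likelihood ratio maximised over circular windows and calibrated with Monte Carlo replicates, in the spirit of Kulldorff's method. A compact sketch under simplifying assumptions (synthetic district centroids and counts; binomial resampling as an approximation to the conditional null):

```python
# Toy circular spatial scan for high rates of home births under a Poisson
# model, with a Monte Carlo P value. All coordinates and counts are invented.
import numpy as np

rng = np.random.default_rng(2)
n_dist = 60
xy = rng.uniform(0, 10, (n_dist, 2))        # district centroids
births = rng.integers(200, 2000, size=n_dist)
home = rng.binomial(births, 0.25)           # observed home births
home[:5] = rng.binomial(births[:5], 0.45)   # plant a high-rate cluster

def llr(c, n, C, N):
    """Poisson log-likelihood ratio; scan only high-rate windows."""
    out = C - c
    if c == 0 or c / n <= out / (N - n):
        return 0.0
    return (c * np.log(c / n) + out * np.log(out / (N - n))
            - C * np.log(C / N))

def max_llr(cases):
    C, N = cases.sum(), births.sum()
    best = 0.0
    for i in range(n_dist):                 # grow a circle around each district
        order = np.argsort(np.hypot(*(xy - xy[i]).T))
        c = n = 0
        for j in order[: n_dist // 2]:      # windows up to half the districts
            c += cases[j]
            n += births[j]
            best = max(best, llr(c, n, C, N))
    return best

t_obs = max_llr(home)
rate = home.sum() / births.sum()
t_null = [max_llr(rng.binomial(births, rate)) for _ in range(99)]
p = (1 + sum(t >= t_obs for t in t_null)) / 100
print(f"max log-likelihood ratio = {t_obs:.1f}, Monte Carlo P = {p:.2f}")
```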

  16. The intriguing evolution of effect sizes in biomedical research over time: smaller but more often statistically significant.

    PubMed

    Monsarrat, Paul; Vergnes, Jean-Noel

    2018-01-01

    In medicine, effect sizes (ESs) allow the effects of independent variables (including risk/protective factors or treatment interventions) on dependent variables (e.g., health outcomes) to be quantified. Given that many public health decisions and health care policies are based on ES estimates, it is important to assess how ESs are used in the biomedical literature and to investigate potential trends in their reporting over time. Through a big data approach, the text mining process automatically extracted 814 120 ESs from 13 322 754 PubMed abstracts. Eligible ESs were risk ratio, odds ratio, and hazard ratio, along with their confidence intervals. Here we show a remarkable decrease of ES values in PubMed abstracts between 1990 and 2015 while, concomitantly, results became more often statistically significant. Medians of ES values have decreased over time for both "risk" and "protective" values. This trend was found in nearly all fields of biomedical research, with the most marked downward tendency in genetics. Over the same period, the proportion of statistically significant ESs increased regularly: among the abstracts with at least 1 ES, 74% were statistically significant in 1990-1995, vs 85% in 2010-2015. Whereas decreasing ESs could reflect an intrinsic evolution in biomedical research, the concomitant increase in statistically significant results is more intriguing. Although it is likely that growing sample sizes in biomedical research could explain these results, another explanation may lie in the "publish or perish" context of scientific research, with a possibly growing orientation toward sensationalism in research reports. Important provisions must be made to improve the credibility of biomedical research and limit waste of resources. © The Authors 2017. Published by Oxford University Press.
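
    The extraction step can be approximated with a regular expression over abstract text, flagging an ES as statistically significant when its 95% CI excludes 1. The pattern below is a deliberate simplification of whatever the study's actual pipeline does:

```python
# Hedged sketch of mining risk/odds/hazard ratios with 95% CIs from text.
# The regex is a simplification for illustration, not the study's pipeline.
import re

PATTERN = re.compile(
    r"\b(?:OR|RR|HR)[\s=:]+([\d.]+)[,;\s]*"       # point estimate
    r"\(?\s*95%\s*CI[,:\s]+([\d.]+)\s*(?:-|–|to)\s*([\d.]+)\)?",
    re.IGNORECASE,
)

def extract_effect_sizes(abstract: str):
    out = []
    for m in PATTERN.finditer(abstract):
        es, lo, hi = map(float, m.groups())
        out.append({"es": es, "ci": (lo, hi),
                    "significant": lo > 1.0 or hi < 1.0})  # CI excludes 1
    return out

text = ("Smoking was associated with the outcome (OR 1.62, 95% CI 1.20-2.18)"
        " whereas exercise was not (HR 0.91, 95% CI 0.78 to 1.06).")
for hit in extract_effect_sizes(text):
    print(hit)
```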

  17. Use of electronic data and existing screening tools to identify clinically significant obstructive sleep apnea.

    PubMed

    Severson, Carl A; Pendharkar, Sachin R; Ronksley, Paul E; Tsai, Willis H

    2015-01-01

    To assess the ability of electronic health data and existing screening tools to identify clinically significant obstructive sleep apnea (OSA), as defined by symptomatic or severe OSA. The present retrospective cohort study of 1041 patients referred for sleep diagnostic testing was undertaken at a tertiary sleep centre in Calgary, Alberta. A diagnosis of clinically significant OSA or an alternative sleep diagnosis was assigned to each patient through blinded independent chart review by two sleep physicians. Predictive variables were identified from online questionnaire data, and diagnostic algorithms were developed. The performance of electronically derived algorithms for identifying patients with clinically significant OSA was determined. Diagnostic performance of these algorithms was compared with versions of the STOP-Bang questionnaire and adjusted neck circumference score (ANC) derived from electronic data. Electronic questionnaire data were highly sensitive (>95%) for identifying clinically significant OSA, but not specific. The respiratory disturbance index determined by sleep diagnostic testing was very specific (specificity ≥95%) for clinically relevant disease, but not sensitive (<35%). Derived algorithms had similar accuracy to the STOP-Bang or ANC, but required fewer questions and calculations. These data suggest that a two-step process using a small number of clinical variables (maximizing sensitivity) and objective diagnostic testing (maximizing specificity) is required to identify clinically significant OSA. When used in an online setting, simple algorithms can identify clinically relevant OSA with similar performance to existing decision rules such as the STOP-Bang or ANC.

  18. Identifying Minefields and Verifying Clearance: Adapting Statistical Methods for UXO Target Detection

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Gilbert, Richard O.; O'Brien, Robert F.; Wilson, John E.

    2003-09-01

    It may not be feasible to completely survey large tracts of land suspected of containing minefields. It is desirable to develop a characterization protocol that will confidently identify minefields within these large land tracts if they exist. Naturally, surveying areas of greatest concern and most likely locations would be necessary but will not provide the needed confidence that an unknown minefield had not eluded detection. Once minefields are detected, methods are needed to bound the area that will require detailed mine detection surveys. The US Department of Defense Strategic Environmental Research and Development Program (SERDP) is sponsoring the development of statistical survey methods and tools for detecting potential UXO targets. These methods may be directly applicable to demining efforts. Statistical methods are employed to determine the optimal geophysical survey transect spacing to have confidence of detecting target areas of a critical size, shape, and anomaly density. Other methods under development determine the proportion of a land area that must be surveyed to confidently conclude that there are no UXO present. Adaptive sampling schemes are also being developed as an approach for bounding the target areas. These methods and tools will be presented and the status of relevant research in this area will be discussed.

  19. A Note on Comparing the Power of Test Statistics at Low Significance Levels.

    PubMed

    Morris, Nathan; Elston, Robert

    2011-01-01

    It is an obvious fact that the power of a test statistic is dependent upon the significance (alpha) level at which the test is performed. It is perhaps a less obvious fact that the relative performance of two statistics in terms of power is also a function of the alpha level. Through numerous personal discussions, we have noted that even some competent statisticians have the mistaken intuition that relative power comparisons at traditional levels such as α = 0.05 will be roughly similar to relative power comparisons at very low levels, such as the level α = 5 × 10⁻⁸, which is commonly used in genome-wide association studies. In this brief note, we demonstrate that this notion is in fact quite wrong, especially with respect to comparing tests with differing degrees of freedom. In fact, at very low alpha levels the cost of additional degrees of freedom is often comparatively low. Thus we recommend that statisticians exercise caution when interpreting the results of power comparison studies which use alpha levels that will not be used in practice.
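
    The point is easy to reproduce numerically. Assuming chi-square tests whose power is governed by a noncentral chi-square distribution with a shared (arbitrarily chosen) noncentrality parameter, the relative cost of an extra degree of freedom shrinks as alpha drops from 0.05 to 5 × 10⁻⁸:

```python
# Compare the power of 1-df and 2-df chi-square tests at a conventional and
# a genome-wide alpha level. The noncentrality value is illustrative only.
from scipy.stats import chi2, ncx2

ncp = 30.0  # noncentrality parameter shared by both tests (arbitrary)
for alpha in (0.05, 5e-8):
    for df in (1, 2):
        crit = chi2.ppf(1 - alpha, df)     # rejection threshold
        power = ncx2.sf(crit, df, ncp)     # P(reject | alternative)
        print(f"alpha={alpha:<7g} df={df}  power={power:.3f}")
```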

  20. Null-space and statistical significance of first-arrival traveltime inversion

    NASA Astrophysics Data System (ADS)

    Morozov, Igor B.

    2004-03-01

    The strong uncertainty inherent in the traveltime inversion of first arrivals from surface sources is usually removed by using a priori constraints or regularization. This leads to the null-space (data-independent model variability) being inadequately sampled, and consequently, model uncertainties may be underestimated in traditional (such as checkerboard) resolution tests. To measure the full null-space model uncertainties, we use unconstrained Monte Carlo inversion and examine the statistics of the resulting model ensembles. In an application to 1-D first-arrival traveltime inversion, the τ-p method is used to build a set of models that are equivalent to the IASP91 model within small, ~0.02 per cent, time deviations. The resulting velocity variances are much larger, ~2-3 per cent within the regions above the mantle discontinuities, and are interpreted as being due to the null-space. Depth-variant depth averaging is required for constraining the velocities within meaningful bounds, and the averaging scalelength could also be used as a measure of depth resolution. Velocity variances show structure-dependent, negative correlation with the depth-averaging scalelength. Neither the smoothest (Herglotz-Wiechert) nor the mean velocity-depth functions reproduce the discontinuities in the IASP91 model; however, the discontinuities can be identified by the increased null-space velocity (co-)variances. Although derived for a 1-D case, the above conclusions also relate to higher dimensions.

  1. An application of statistics to comparative metagenomics

    PubMed Central

    Rodriguez-Brito, Beltran; Rohwer, Forest; Edwards, Robert A

    2006-01-01

    Background Metagenomics, sequence analyses of genomic DNA isolated directly from the environments, can be used to identify organisms and model community dynamics of a particular ecosystem. Metagenomics also has the potential to identify significantly different metabolic potential in different environments. Results Here we use a statistical method to compare curated subsystems, to predict the physiology, metabolism, and ecology from metagenomes. This approach can be used to identify those subsystems that are significantly different between metagenome sequences. Subsystems that were overrepresented in the Sargasso Sea and Acid Mine Drainage metagenome when compared to non-redundant databases were identified. Conclusion The methodology described herein applies statistics to the comparisons of metabolic potential in metagenomes. This analysis reveals those subsystems that are more, or less, represented in the different environments that are compared. These differences in metabolic potential lead to several testable hypotheses about physiology and metabolism of microbes from these ecosystems. PMID:16549025

  2. An application of statistics to comparative metagenomics.

    PubMed

    Rodriguez-Brito, Beltran; Rohwer, Forest; Edwards, Robert A

    2006-03-20

    Metagenomics, sequence analyses of genomic DNA isolated directly from the environments, can be used to identify organisms and model community dynamics of a particular ecosystem. Metagenomics also has the potential to identify significantly different metabolic potential in different environments. Here we use a statistical method to compare curated subsystems, to predict the physiology, metabolism, and ecology from metagenomes. This approach can be used to identify those subsystems that are significantly different between metagenome sequences. Subsystems that were overrepresented in the Sargasso Sea and Acid Mine Drainage metagenome when compared to non-redundant databases were identified. The methodology described herein applies statistics to the comparisons of metabolic potential in metagenomes. This analysis reveals those subsystems that are more, or less, represented in the different environments that are compared. These differences in metabolic potential lead to several testable hypotheses about physiology and metabolism of microbes from these ecosystems.
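
    At its core, a single subsystem comparison of the kind described reduces to a test on counts. A minimal sketch with invented counts and Fisher's exact test (the paper's exact statistic may differ):

```python
# Test whether a subsystem is over-represented in one metagenome relative to
# another by comparing hit counts. All counts are hypothetical placeholders.
from scipy.stats import fisher_exact

# rows: (hits to this subsystem, hits to all other subsystems)
sargasso = (145, 9855)      # hypothetical counts, metagenome A
mine_drainage = (52, 6948)  # hypothetical counts, metagenome B

table = [list(sargasso), list(mine_drainage)]
odds, p = fisher_exact(table, alternative="two-sided")
print(f"odds ratio = {odds:.2f}, P = {p:.2e}")
```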

  3. Bulk tank somatic cell counts analyzed by statistical process control tools to identify and monitor subclinical mastitis incidence.

    PubMed

    Lukas, J M; Hawkins, D M; Kinsel, M L; Reneau, J K

    2005-11-01

    The objective of this study was to examine the relationship between monthly Dairy Herd Improvement (DHI) subclinical mastitis and new infection rate estimates and daily bulk tank somatic cell count (SCC) summarized by statistical process control tools. Dairy Herd Improvement Association test-day subclinical mastitis and new infection rate estimates along with daily or every other day bulk tank SCC data were collected for 12 mo of 2003 from 275 Upper Midwest dairy herds. Herds were divided into 5 herd production categories. A linear score [LNS = ln(BTSCC/100,000)/0.693147 + 3] was calculated for each individual bulk tank SCC. For both the raw SCC and the transformed data, the mean and sigma were calculated using the statistical quality control individual measurement and moving range chart procedure of Statistical Analysis System. One hundred eighty-three herds of the 275 herds from the study data set were then randomly selected and the raw (method 1) and transformed (method 2) bulk tank SCC mean and sigma were used to develop models for predicting subclinical mastitis and new infection rate estimates. Herd production category was also included in all models as 5 dummy variables. Models were validated by calculating estimates of subclinical mastitis and new infection rates for the remaining 92 herds and plotting them against observed values of each of the dependent variables. Only herd production category and bulk tank SCC mean were significant and remained in the final models. High R² values (0.83 and 0.81 for methods 1 and 2, respectively) indicated a strong correlation between bulk tank SCC and a herd's subclinical mastitis prevalence. The standard errors of the estimate were 4.02 and 4.28% for methods 1 and 2, respectively, and decreased with increasing herd production. As a case study, Shewhart Individual Measurement Charts were plotted from the bulk tank SCC to identify shifts in mastitis incidence. Four of 5 charts examined signaled a change in bulk tank SCC before…
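
    Both transformations in the Methods are straightforward to reproduce. The sketch below computes the linear score (dividing by 0.693147 = ln 2 makes it log2(BTSCC/100,000) + 3) and Shewhart individual-measurement control limits from the average moving range, on a fabricated SCC series:

```python
# Linear score and Shewhart individual/moving-range chart limits, computed
# on a fabricated bulk tank SCC series (the study used real herd data).
import numpy as np

rng = np.random.default_rng(3)
scc = rng.lognormal(mean=np.log(250_000), sigma=0.25, size=60)  # cells/mL

lns = np.log(scc / 100_000) / 0.693147 + 3      # linear score (log2 scale)

mr = np.abs(np.diff(lns))                       # moving ranges of size 2
center = lns.mean()
sigma_hat = mr.mean() / 1.128                   # d2 constant for n = 2
ucl, lcl = center + 3 * sigma_hat, center - 3 * sigma_hat
print(f"center = {center:.2f}  UCL = {ucl:.2f}  LCL = {lcl:.2f}")
print("out-of-control points:", np.where((lns > ucl) | (lns < lcl))[0])
```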

  4. Testing for significance of phase synchronisation dynamics in the EEG.

    PubMed

    Daly, Ian; Sweeney-Reed, Catherine M; Nasuto, Slawomir J

    2013-06-01

    A number of tests exist to check for statistical significance of phase synchronisation within the electroencephalogram (EEG); however, the majority suffer from a lack of generality and applicability. They may also fail to account for temporal dynamics in the phase synchronisation, regarding synchronisation as a constant state instead of a dynamical process. Therefore, a novel test is developed to identify the statistical significance of phase synchronisation, based upon a combination of work characterising temporal dynamics of multivariate time-series and Markov modelling. We show how this method is better able to assess the significance of phase synchronisation than a range of commonly used significance tests. We also show how the method may be applied to identify and classify significantly different phase synchronisation dynamics in both univariate and multivariate datasets.
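
    For orientation, one common static baseline that such tests are compared against is the phase-locking value (PLV) judged against surrogates. The sketch below implements only that baseline, on synthetic coupled signals with a shared phase drift; the paper's contribution, modelling the temporal dynamics of synchronisation, is not reproduced here.

```python
# Phase-locking value with circular-shift surrogates: a static significance
# test for phase synchronisation. Signals are synthetic stand-ins for EEG.
import numpy as np
from scipy.signal import hilbert

rng = np.random.default_rng(4)
n = 2000
t = np.arange(n) / 200.0                            # 200 Hz sampling
drift = np.cumsum(rng.standard_normal(n)) * 0.1     # shared phase wander
x = np.cos(2 * np.pi * 6 * t + drift) + 0.3 * rng.standard_normal(n)
y = np.cos(2 * np.pi * 6 * t + drift + 0.8) + 0.3 * rng.standard_normal(n)

def plv(a, b):
    """Phase-locking value from instantaneous Hilbert phases."""
    d = np.angle(hilbert(a)) - np.angle(hilbert(b))
    return np.abs(np.mean(np.exp(1j * d)))

obs = plv(x, y)
# Surrogates: circularly shift one signal to break the shared phase drift.
surr = [plv(x, np.roll(y, rng.integers(200, n - 200))) for _ in range(199)]
p = (1 + sum(s >= obs for s in surr)) / 200
print(f"PLV = {obs:.2f}, surrogate p-value = {p:.3f}")
```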

  5. Power analysis as a tool to identify statistically informative indicators for monitoring coral reef disturbances.

    PubMed

    Van Wynsberge, Simon; Gilbert, Antoine; Guillemot, Nicolas; Heintz, Tom; Tremblay-Boyer, Laura

    2017-07-01

    Extensive biological field surveys are costly and time-consuming. To optimize sampling and ensure regular monitoring over the long term, identifying informative indicators of anthropogenic disturbances is a priority. In this study, we generated 1800 candidate indicators by combining metrics measured from coral, fish, and macro-invertebrate assemblages surveyed from 2006 to 2012 in the vicinity of an ongoing mining project in the Voh-Koné-Pouembout lagoon, New Caledonia. We performed a power analysis to identify a subset of indicators which would best discriminate temporal changes due to a simulated chronic anthropogenic impact. Only 4% of tested indicators were likely to detect a 10% annual decrease of values with sufficient power (>0.80). Coral metrics generally yielded higher statistical power than macro-invertebrates and fishes because of lower natural variability and higher occurrence. For the same reasons, higher taxonomic ranks provided higher power than lower taxonomic ranks. Nevertheless, a number of families of common sedentary or sessile macro-invertebrates and fishes also performed well in detecting changes: Echinometridae, Isognomidae, Muricidae, Tridacninae, Arcidae, and Turbinidae for macro-invertebrates and Pomacentridae, Labridae, and Chaetodontidae for fishes. Interestingly, these families did not provide high power in all geomorphological strata, suggesting that the ability of indicators to detect anthropogenic impacts was closely linked to reef geomorphology. This study provides a first operational step toward identifying statistically relevant indicators of anthropogenic disturbances in New Caledonia's coral reefs, which can be useful in similar tropical reef ecosystems where little information is available regarding the responses of ecological indicators to anthropogenic disturbances.
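
    The power analysis described can be emulated by simulation: impose a 10% annual decline on an indicator, add natural variability, and count how often a trend test rejects at the chosen alpha. The survey design and variability levels below are invented:

```python
# Simulation-based power to detect a 10% annual decrease in an indicator
# under lognormal natural variability. All design parameters are invented.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(5)

def power(cv, n_sim=1000, years=7, n_sites=10, alpha=0.05):
    """Fraction of simulations in which a log-linear trend test rejects."""
    yr = np.repeat(np.arange(years), n_sites)
    mean = 100 * 0.9 ** np.arange(years)        # 10% annual decrease
    sigma = np.sqrt(np.log(1 + cv**2))          # lognormal shape from CV
    hits = 0
    for _ in range(n_sim):
        obs = rng.lognormal(np.log(mean[yr]) - sigma**2 / 2, sigma)
        hits += linregress(yr, np.log(obs)).pvalue < alpha
    return hits / n_sim

for cv in (0.2, 0.5, 1.0):                      # natural variability levels
    print(f"CV = {cv:.1f}  power = {power(cv):.2f}")
```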

  6. The accuracy of physical examination in identifying significant pathologies in penetrating thoracic trauma.

    PubMed

    Kong, V Y; Sartorius, B; Clarke, D L

    2015-12-01

    Accurate physical examination (PE) remains a key component in the assessment of penetrating thoracic trauma (PTT), despite the increasing availability of advanced radiological imaging. Evidence regarding the accuracy of PE in identifying significant pathology following PTT is limited. A retrospective review of 405 patients was undertaken over a twelve-month period to determine the accuracy of PE in identifying significant pathology (SP) subsequently confirmed on chest radiographs (CXRs) in patients who sustained stab injuries to the thorax. Ninety-seven per cent (372/405) of patients were males, and the mean age was 24 years. The weapons involved were knives in 98% (398/405), screwdrivers in 1% (3/405) and unknown in the remaining 1%. Fifty-nine per cent (238/405) of all injuries were on the left side. There were 306 (76%) SPs identified on CXR. Ninety-nine (24%) CXRs were entirely normal. Based on PE alone, 223 (55%) patients were thought to have SPs present, 182 (45%) patients were thought to have no SPs. The overall sensitivity of PE in identifying SPs was 68% (63-73, 95% CI), with a specificity of 86% (77-92, 95% CI). The PPV of PE was 94% (90-97, 95% CI) and the NPV was 47% (39-54, 95% CI). The sensitivity of PE for identifying a pneumothorax was 59% (51-66, 95% CI), with a specificity of 96% (89-99, 95% CI) and the sensitivity of PE for identifying a haemothorax was 79% (72-86, 95% CI), with a specificity of 96% (89-99, 95% CI). PE is inaccurate in identifying SPs in PTT. The increased reliance on advanced radiological imaging and the subsequent reduced emphasis on PE may have contributed to rapid deskilling amongst surgical residents. The importance of PE must be repeatedly re-emphasised.
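
    All four reported accuracy measures follow from a single 2 × 2 table. The counts below are approximate back-calculations from the abstract's percentages, so they reproduce the reported values only to within rounding:

```python
# Sensitivity, specificity, PPV, and NPV from a 2x2 table. Counts are
# approximate reconstructions from the reported percentages (not the
# paper's raw table), so small rounding differences are expected.
tp, fn = 208, 98    # significant pathology present: PE positive / negative
fp, tn = 14, 85     # no significant pathology:      PE positive / negative

sensitivity = tp / (tp + fn)   # 208/306
specificity = tn / (tn + fp)   # 85/99
ppv = tp / (tp + fp)           # 208/222
npv = tn / (tn + fn)           # 85/183
print(f"sensitivity={sensitivity:.0%} specificity={specificity:.0%} "
      f"PPV={ppv:.0%} NPV={npv:.0%}")
```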

  7. Using Statistical Process Control Charts to Identify the Steroids Era in Major League Baseball: An Educational Exercise

    ERIC Educational Resources Information Center

    Hill, Stephen E.; Schvaneveldt, Shane J.

    2011-01-01

    This article presents an educational exercise in which statistical process control charts are constructed and used to identify the Steroids Era in American professional baseball. During this period (roughly 1993 until the present), numerous baseball players were alleged or proven to have used banned, performance-enhancing drugs. Also observed…

  8. Use of Tests of Statistical Significance and Other Analytic Choices in a School Psychology Journal: Review of Practices and Suggested Alternatives.

    ERIC Educational Resources Information Center

    Snyder, Patricia A.; Thompson, Bruce

    The use of tests of statistical significance was explored, first by reviewing some criticisms of contemporary practice in the use of statistical tests as reflected in a series of articles in the "American Psychologist" and in the appointment of a "Task Force on Statistical Inference" by the American Psychological Association…

  9. Identifying significant environmental features using feature recognition.

    DOT National Transportation Integrated Search

    2015-10-01

    The Department of Environmental Analysis at the Kentucky Transportation Cabinet has expressed an interest in feature-recognition capability because it may help analysts identify environmentally sensitive features in the landscape, : including those r...

  10. Consomic mouse strain selection based on effect size measurement, statistical significance testing and integrated behavioral z-scoring: focus on anxiety-related behavior and locomotion.

    PubMed

    Labots, M; Laarakker, M C; Ohl, F; van Lith, H A

    2016-06-29

    Selecting chromosome substitution strains (CSSs, also called consomic strains/lines) used in the search for quantitative trait loci (QTLs) consistently requires the identification of the respective phenotypic trait of interest and is simply based on a significant difference between a consomic and host strain. However, statistical significance as represented by P values does not necessarily imply practical importance. We therefore propose a method that pays attention to both the statistical significance and the actual size of the observed effect. The present paper extends this approach and describes in more detail the use of effect size measures (Cohen's d and partial eta squared, ηp²) together with the P value as statistical selection parameters for the chromosomal assignment of QTLs influencing anxiety-related behavior and locomotion in laboratory mice. The effect size measures were based on integrated behavioral z-scoring and were calculated in three experiments: (A) a complete consomic male mouse panel with A/J as the donor strain and C57BL/6J as the host strain. This panel, including host and donor strains, was analyzed in the modified Hole Board (mHB). The consomic line with chromosome 19 from A/J (CSS-19A) was selected since it showed increased anxiety-related behavior, but similar locomotion compared to its host. (B) Following experiment A, female CSS-19A mice were compared with their C57BL/6J counterparts; however, no significant differences were found and effect sizes were close to zero. (C) A different consomic mouse strain (CSS-19PWD), with chromosome 19 from PWD/PhJ transferred on the genetic background of C57BL/6J, was compared with its host strain. Here, in contrast with CSS-19A, overall anxiety was decreased in CSS-19PWD males compared with C57BL/6J, whereas locomotion was not. This new method shows an improved way to identify CSSs for QTL analysis for anxiety-related behavior using a combination of statistical significance testing and effect…
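
    Both selection parameters named in this abstract are standard quantities. A sketch computing Cohen's d (pooled-SD version) and partial eta squared on fabricated behavioral z-scores:

```python
# Cohen's d and partial eta squared for a consomic-vs-host comparison,
# computed on fabricated behavioral z-scores (group sizes are invented).
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(6)
host = rng.normal(0.0, 1.0, 12)      # e.g. C57BL/6J z-scores (hypothetical)
consomic = rng.normal(0.9, 1.0, 12)  # e.g. CSS-19A z-scores (hypothetical)

# Cohen's d with pooled standard deviation
n1, n2 = len(host), len(consomic)
sp = np.sqrt(((n1 - 1) * host.var(ddof=1) + (n2 - 1) * consomic.var(ddof=1))
             / (n1 + n2 - 2))
d = (consomic.mean() - host.mean()) / sp

# Partial eta squared: SS_effect / (SS_effect + SS_error)
grand = np.concatenate([host, consomic]).mean()
ss_eff = n1 * (host.mean() - grand) ** 2 + n2 * (consomic.mean() - grand) ** 2
ss_err = (((host - host.mean()) ** 2).sum()
          + ((consomic - consomic.mean()) ** 2).sum())
eta_p2 = ss_eff / (ss_eff + ss_err)

f, p = f_oneway(host, consomic)
print(f"d = {d:.2f}  partial eta^2 = {eta_p2:.2f}  P = {p:.3f}")
```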

  11. 42 CFR 137.22 - May the Secretary consider uncorrected significant and material audit exceptions identified...

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... and material audit exceptions identified regarding centralized financial and administrative functions... Tribes for Participation in Self-Governance Planning Phase § 137.22 May the Secretary consider uncorrected significant and material audit exceptions identified regarding centralized financial and...

  12. Assay optimization: a statistical design of experiments approach.

    PubMed

    Altekar, Maneesha; Homon, Carol A; Kashem, Mohammed A; Mason, Steven W; Nelson, Richard M; Patnaude, Lori A; Yingling, Jeffrey; Taylor, Paul B

    2007-03-01

    With the transition from manual to robotic high-throughput screening (HTS) in the last several years, assay optimization has become a significant bottleneck. Recent advances in robotic liquid handling have made it feasible to reduce assay optimization timelines with the application of statistically designed experiments. When implemented, they can efficiently optimize assays by rapidly identifying significant factors, complex interactions, and nonlinear responses. This article focuses on the use of statistically designed experiments in assay optimization.
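
    A sketch of the kind of two-level factorial screening design such work relies on: build the design matrix, run (here, simulate) the assay, and estimate main effects and an interaction by regression. The factor names and response model are invented:

```python
# Two-level full factorial screening design with effect estimation by least
# squares. Factors and the simulated response are hypothetical placeholders.
import numpy as np
from itertools import product

rng = np.random.default_rng(10)
factors = ["enzyme_conc", "incubation_time", "DMSO_pct"]
design = np.array(list(product([-1, 1], repeat=3)), dtype=float)  # 8 runs

# Simulated assay signal: strong effects of factors 0 and 1, plus their
# interaction, plus noise (stand-in for real plate measurements).
signal = (100 + 12 * design[:, 0] + 8 * design[:, 1]
          + 5 * design[:, 0] * design[:, 1] + rng.normal(0, 2, 8))

# Model matrix: intercept, three main effects, and the 0x1 interaction.
X = np.column_stack([np.ones(8), design, design[:, 0] * design[:, 1]])
coef, *_ = np.linalg.lstsq(X, signal, rcond=None)
for name, c in zip(["intercept", *factors, "enzyme x time"], coef):
    print(f"{name:>15}: {c:+.1f}")
```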

  13. A systematic review and meta-analysis of acute stroke unit care: What’s beyond the statistical significance?

    PubMed Central

    2013-01-01

    Background The benefits of stroke unit care in terms of reducing death, dependency and institutional care were demonstrated in a 2009 Cochrane review carried out by the Stroke Unit Trialists’ Collaboration. Methods As requested by the Belgian health authorities, a systematic review and meta-analysis of the effect of acute stroke units was performed. Clinical trials mentioned in the original Cochrane review were included. In addition, an electronic database search on Medline, Embase, the Cochrane Central Register of Controlled Trials, and Physiotherapy Evidence Database (PEDro) was conducted to identify trials published since 2006. Trials investigating acute stroke units compared to alternative care were eligible for inclusion. Study quality was appraised according to the criteria recommended by the Scottish Intercollegiate Guidelines Network (SIGN) and the GRADE system. In the meta-analysis, dichotomous outcomes were estimated by calculating odds ratios (OR) and continuous outcomes were estimated by calculating standardized mean differences. The weight of a study was calculated based on inverse variance. Results Evidence from eight trials comparing acute stroke unit and conventional care (general medical ward) was retained for the main synthesis and analysis. The findings from this study were broadly in line with the original Cochrane review: acute stroke units can improve survival and independence, as well as reduce the chance of hospitalization and the length of inpatient stay. The effect of stroke unit care on mortality was less conclusive, reaching only a borderline level of significance (OR 0.84, 95% CI 0.70 to 1.00, P = 0.05). This improvement became statistically non-significant (OR 0.87, 95% CI 0.74 to 1.03, P = 0.12) when data from two unpublished trials (Goteborg-Ostra and Svendborg) were added to the analysis. After also adding two further trials (Beijing, Stockholm) with very short observation periods (until discharge), the…
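
    The pooling described in the Methods is fixed-effect inverse-variance weighting on the log odds-ratio scale. A minimal sketch with invented trial results, backing out each standard error from its confidence interval:

```python
# Fixed-effect inverse-variance meta-analysis of odds ratios on the log
# scale. Trial values below are invented, not the stroke unit trials.
import numpy as np

# (odds ratio, 95% CI lower, 95% CI upper) per hypothetical trial
trials = [(0.78, 0.55, 1.10), (0.90, 0.68, 1.19), (0.81, 0.60, 1.09)]

log_or = np.log([t[0] for t in trials])
# back out each standard error from the CI width: (ln hi - ln lo) / (2*1.96)
se = np.array([(np.log(hi) - np.log(lo)) / (2 * 1.96) for _, lo, hi in trials])
w = 1 / se**2                                   # inverse-variance weights
pooled = np.sum(w * log_or) / w.sum()
pooled_se = np.sqrt(1 / w.sum())
lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
print(f"pooled OR = {np.exp(pooled):.2f} "
      f"(95% CI {np.exp(lo):.2f} to {np.exp(hi):.2f})")
```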

  14. Testing earthquake prediction algorithms: Statistically significant advance prediction of the largest earthquakes in the Circum-Pacific, 1992-1997

    USGS Publications Warehouse

    Kossobokov, V.G.; Romashkova, L.L.; Keilis-Borok, V. I.; Healy, J.H.

    1999-01-01

    Algorithms M8 and MSc (i.e., the Mendocino Scenario) were used in a real-time intermediate-term research prediction of the strongest earthquakes in the Circum-Pacific seismic belt. Predictions are made by M8 first. Then, the areas of alarm are reduced by MSc at the cost that some earthquakes are missed in the second approximation of prediction. In 1992-1997, five earthquakes of magnitude 8 and above occurred in the test area: all of them were predicted by M8, and MSc correctly identified the locations of four of them. The space-time volume of the alarms is 36% and 18%, respectively, when estimated with a normalized product measure of empirical distribution of epicenters and uniform time. The statistical significance of the achieved results is beyond 99% both for M8 and MSc. For magnitude 7.5+, 10 out of 19 earthquakes were predicted by M8 in 40% and five were predicted by M8-MSc in 13% of the total volume considered. This implies a significance level of 81% for M8 and 92% for M8-MSc. The lower significance levels might result from a global change in seismic regime in 1993-1996, when the rate of the largest events doubled and all of them became exclusively normal or reversed faults. The predictions are fully reproducible; the algorithms M8 and MSc in complete formal definitions were published before we started our experiment [Keilis-Borok, V.I., Kossobokov, V.G., 1990. Premonitory activation of seismic flow: Algorithm M8, Phys. Earth and Planet. Inter. 61, 73-83; Kossobokov, V.G., Keilis-Borok, V.I., Smith, S.W., 1990. Localization of intermediate-term earthquake prediction, J. Geophys. Res., 95, 19763-19772; Healy, J.H., Kossobokov, V.G., Dewey, J.W., 1992. A test to evaluate the earthquake prediction algorithm, M8. U.S. Geol. Surv. OFR 92-401]. M8 is available from the IASPEI Software Library [Healy, J.H., Keilis-Borok, V.I., Lee, W.H.K. (Eds.), 1997. Algorithms for Earthquake Statistics and Prediction, Vol. 6. IASPEI Software Library]. © 1999 Elsevier

  15. Back to the basics: Identifying and addressing underlying challenges in achieving high quality and relevant health statistics for indigenous populations in Canada

    PubMed Central

    Smylie, Janet; Firestone, Michelle

    2015-01-01

    Canada is known internationally for excellence in both the quality and public policy relevance of its health and social statistics. There is, however, a double standard with respect to the relevance and quality of statistics for Indigenous populations in Canada. Indigenous-specific health and social statistics gathering is informed by unique ethical, rights-based, policy and practice imperatives regarding the need for Indigenous participation and leadership in Indigenous data processes throughout the spectrum of indicator development, data collection, management, analysis and use. We demonstrate how current Indigenous data quality challenges, including misclassification errors and non-response bias, systematically contribute to a significant underestimate of inequities in health determinants, health status, and health care access between Indigenous and non-Indigenous people in Canada. The major quality challenge underlying these errors and biases is the lack of Indigenous-specific identifiers that are consistent and relevant in major health and social data sources. The recent removal of an Indigenous identity question from the Canadian census has resulted in further deterioration of an already suboptimal system. A revision of core health data sources to include relevant, consistent, and inclusive Indigenous self-identification is urgently required. These changes need to be carried out in partnership with Indigenous peoples and their representative and governing organizations. PMID:26793283

  16. Understanding Statistics and Statistics Education: A Chinese Perspective

    ERIC Educational Resources Information Center

    Shi, Ning-Zhong; He, Xuming; Tao, Jian

    2009-01-01

    In recent years, statistics education in China has made great strides. However, there still exists a fairly large gap with the advanced levels of statistics education in more developed countries. In this paper, we identify some existing problems in statistics education in Chinese schools and make some proposals as to how they may be overcome. We…

  17. Are Mammographically Occult Additional Tumors Identified More Than 2 Cm Away From the Primary Breast Cancer on MRI Clinically Significant?

    PubMed

    Goodman, Sarah; Mango, Victoria; Friedlander, Lauren; Desperito, Elise; Wynn, Ralph; Ha, Richard

    2018-06-08

    To evaluate the clinical significance of mammographically occult additional tumors identified more than 2 cm away from the primary breast cancer on preoperative magnetic resonance imaging (MRI). An Institutional Review Board-approved review of consecutive preoperative breast MRIs performed from 1/1/08 to 12/31/14 yielded 667 patients with breast cancer. These patients underwent further assessment to identify biopsy-proven mammographically occult breast tumors located more than 2 cm away from the edge of the primary tumor. Additional MRI characteristics of the primary and secondary tumors and pathology were reviewed. Statistical analysis was performed using SPSS (v. 24). Of 667 patients with breast cancer, 129 patients had 150 additional ipsilateral mammographically occult tumors that were more than 2 cm away from the edge of the primary tumor. One hundred twelve of 129 (86.8%) patients had one additional tumor and 17/129 (13.2%) had two or more additional tumors. In 71/129 (55.0%), additional tumors were located in a different quadrant and in 58/129 (45.0%) additional tumors were in the same quadrant but ≥2 cm away. Overall, primary tumor size was significantly larger (mean 1.87 ± 1.25 cm) than the additional tumors (mean 0.79 ± 0.61 cm, p < 0.001). However, in 20/129 (15.5%) the additional tumor was larger and in 26/129 (20.2%) the additional tumor was ≥1 cm. The primary tumor was significantly more likely to be invasive (81.4%, 105/129) compared to additional tumors (70%, 105/150, p = 0.03). In 9/129 (7.0%) patients, additional tumors yielded unsuspected invasive cancer or higher tumor grade. The additional tumor was more likely to be nonmass lesion type (37.3% vs 24%, p = 0.02) and focus lesion type (10% vs 0.08%, p < 0.001) compared to the primary tumor. Mammographically occult additional tumors identified more than 2 cm away from the primary breast tumor on MRI are unlikely to be surgically treated if undiagnosed and may be clinically significant.

  18. "Clinical" Significance: "Clinical" Significance and "Practical" Significance are NOT the Same Things

    ERIC Educational Resources Information Center

    Peterson, Lisa S.

    2008-01-01

    Clinical significance is an important concept in research, particularly in education and the social sciences. The present article first compares clinical significance to other measures of "significance" in statistics. The major methods used to determine clinical significance are explained and the strengths and weaknesses of clinical significance…

  19. Genome-wide association study identifies TF as a significant modifier gene of iron metabolism in HFE hemochromatosis.

    PubMed

    de Tayrac, Marie; Roth, Marie-Paule; Jouanolle, Anne-Marie; Coppin, Hélène; le Gac, Gérald; Piperno, Alberto; Férec, Claude; Pelucchi, Sara; Scotet, Virginie; Bardou-Jacquet, Edouard; Ropert, Martine; Bouvet, Régis; Génin, Emmanuelle; Mosser, Jean; Deugnier, Yves

    2015-03-01

    Hereditary hemochromatosis (HH) is the most common form of genetic iron loading disease. It is mainly related to the homozygous C282Y/C282Y mutation in the HFE gene that is, however, a necessary but not a sufficient condition to develop clinical and even biochemical HH. This suggests that modifier genes are likely involved in the expressivity of the disease. Our aim was to identify such modifier genes. We performed a genome-wide association study (GWAS) using DNA collected from 474 unrelated C282Y homozygotes. Associations were examined for both quantitative iron burden indices and clinical outcomes with 534,213 single nucleotide polymorphisms (SNP) genotypes, with replication analyses in an independent sample of 748 C282Y homozygotes from four different European centres. One SNP met genome-wide statistical significance for association with transferrin concentration (rs3811647, GWAS p value of 7×10(-9) and replication p value of 5×10(-13)). This SNP, located within intron 11 of the TF gene, had a pleiotropic effect on serum iron (GWAS p value of 4.9×10(-6) and replication p value of 3.2×10(-6)). Both serum transferrin and iron levels were associated with serum ferritin levels, amount of iron removed and global clinical stage (p<0.01). Serum iron levels were also associated with fibrosis stage (p<0.0001). This GWAS, the largest one performed so far in unselected HFE-associated HH (HFE-HH) patients, identified the rs3811647 polymorphism in the TF gene as the only SNP significantly associated with iron metabolism through serum transferrin and iron levels. Because these two outcomes were clearly associated with the biochemical and clinical expression of the disease, an indirect link between the rs3811647 polymorphism and the phenotypic presentation of HFE-HH is likely. Copyright © 2014 European Association for the Study of the Liver. Published by Elsevier B.V. All rights reserved.

  20. The distribution of P-values in medical research articles suggested selective reporting associated with statistical significance.

    PubMed

    Perneger, Thomas V; Combescure, Christophe

    2017-07-01

    Published P-values provide a window into the global enterprise of medical research. The aim of this study was to use the distribution of published P-values to estimate the relative frequencies of null and alternative hypotheses and to seek irregularities suggestive of publication bias. This cross-sectional study included P-values published in 120 medical research articles in 2016 (30 each from the BMJ, JAMA, Lancet, and New England Journal of Medicine). The observed distribution of P-values was compared with expected distributions under the null hypothesis (i.e., uniform between 0 and 1) and the alternative hypothesis (strictly decreasing from 0 to 1). P-values were categorized according to conventional levels of statistical significance and in one-percent intervals. Among 4,158 recorded P-values, 26.1% were highly significant (P < 0.001), 9.1% were moderately significant (P ≥ 0.001 to < 0.01), 11.7% were weakly significant (P ≥ 0.01 to < 0.05), and 53.2% were nonsignificant (P ≥ 0.05). We noted three irregularities: (1) a high proportion of P-values <0.001, especially in observational studies; (2) an excess of P-values equal to 1; and (3) about twice as many P-values just below 0.05 as just above it. The latter finding was seen in both randomized trials and observational studies, and in most types of analyses, excepting heterogeneity tests and interaction tests. Under plausible assumptions, we estimate that about half of the tested hypotheses were null and the other half were alternative. This analysis suggests that statistical tests published in medical journals are not a random sample of null and alternative hypotheses but that selective reporting is prevalent. In particular, significant results are about twice as likely to be reported as nonsignificant results. Copyright © 2017 Elsevier Inc. All rights reserved.

  1. The thresholds for statistical and clinical significance – a five-step procedure for evaluation of intervention effects in randomised clinical trials

    PubMed Central

    2014-01-01

    Background Thresholds for statistical significance are insufficiently demonstrated by 95% confidence intervals or P-values when assessing results from randomised clinical trials. First, a P-value only shows the probability of getting a result assuming that the null hypothesis is true and does not reflect the probability of getting a result assuming an alternative hypothesis to the null hypothesis is true. Second, a confidence interval or a P-value showing significance may be caused by multiplicity. Third, statistical significance does not necessarily result in clinical significance. Therefore, assessment of intervention effects in randomised clinical trials deserves more rigour in order to become more valid. Methods Several methodologies for assessing the statistical and clinical significance of intervention effects in randomised clinical trials were considered. Balancing simplicity and comprehensiveness, a simple five-step procedure was developed. Results For a more valid assessment of results from a randomised clinical trial we propose the following five steps: (1) report the confidence intervals and the exact P-values; (2) report the Bayes factor for the primary outcome, defined as the ratio of the probability that a given trial result is compatible with a ‘null’ effect (corresponding to the P-value) to the probability that the trial result is compatible with the intervention effect hypothesised in the sample size calculation; (3) adjust the confidence intervals and the statistical significance threshold if the trial is stopped early or if interim analyses have been conducted; (4) adjust the confidence intervals and the P-values for multiplicity due to number of outcome comparisons; and (5) assess clinical significance of the trial results. Conclusions If the proposed five-step procedure is followed, this may increase the validity of assessments of intervention effects in randomised clinical trials. PMID:24588900
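
    Step (2) admits a compact illustration under a normal approximation for the effect estimate: the Bayes factor is the likelihood of the observed estimate under a null effect divided by its likelihood under the effect hypothesised in the sample size calculation. The numbers below are illustrative.

```python
# Bayes factor as described in step (2), under a normal approximation for
# the effect estimate. All numbers are illustrative placeholders.
from scipy.stats import norm

observed, se = 0.25, 0.15   # estimated effect and its standard error
hypothesised = 0.40         # effect assumed in the sample size calculation

bf = (norm.pdf(observed, loc=0.0, scale=se)
      / norm.pdf(observed, loc=hypothesised, scale=se))
# Values below 1 favour the hypothesised intervention effect over the null.
print(f"Bayes factor (null vs hypothesised) = {bf:.2f}")
```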

  2. A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation

    PubMed Central

    Eddy, Sean R.

    2008-01-01

    Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (λ) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty (“Forward” scores), but the expected distribution of Forward scores remains unknown. Here, I conjecture that both expected score distributions have simple, predictable forms when full probabilistic modeling methods are used. For a probabilistic model of local sequence alignment, optimal alignment bit scores (“Viterbi” scores) are Gumbel-distributed with constant λ = log 2, and the high scoring tail of Forward scores is exponential with the same constant λ. Simulation studies support these conjectures over a wide range of profile/sequence comparisons, using 9,318 profile-hidden Markov models from the Pfam database. This enables efficient and accurate determination of expectation values (E-values) for both Viterbi and Forward scores for probabilistic local alignments. PMID:18516236
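
    The practical payoff of the conjecture is that computing a P-value or E-value then requires only the Gumbel location parameter, since λ is pinned at ln 2 for bit scores. A sketch with a hypothetical location parameter:

```python
# P-values and E-values from a Gumbel null distribution with the conjectured
# constant lambda = ln 2 for bit scores. The location parameter mu is a
# hypothetical value standing in for one fit from random-sequence simulation.
import numpy as np

LAMBDA = np.log(2)          # conjectured constant for bit scores

def gumbel_pvalue(score, mu):
    """P(S >= score) for a Gumbel(mu, 1/lambda) null distribution."""
    return 1.0 - np.exp(-np.exp(-LAMBDA * (score - mu)))

mu = -3.2                   # hypothetical location parameter
for s in (0.0, 10.0, 20.0):
    p = gumbel_pvalue(s, mu)
    print(f"bit score {s:5.1f}  P = {p:.3g}  E(n=10000) = {10000 * p:.3g}")
```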

  3. Conducting tests for statistically significant differences using forest inventory data

    Treesearch

    James A. Westfall; Scott A. Pugh; John W. Coulston

    2013-01-01

    Many forest inventory and monitoring programs are based on a sample of ground plots from which estimates of forest resources are derived. In addition to evaluating metrics such as number of trees or amount of cubic wood volume, it is often desirable to make comparisons between resource attributes. To properly conduct statistical tests for differences, it is imperative...

  4. To be or not to be associated: power study of four statistical modeling approaches to identify parasite associations in cross-sectional studies

    PubMed Central

    Vaumourin, Elise; Vourc'h, Gwenaël; Telfer, Sandra; Lambin, Xavier; Salih, Diaeldin; Seitzer, Ulrike; Morand, Serge; Charbonnel, Nathalie; Vayssier-Taussat, Muriel; Gasqui, Patrick

    2014-01-01

    A growing number of studies are reporting simultaneous infections by parasites in many different hosts. The detection of whether these parasites are significantly associated is important in medicine and epidemiology. Numerous approaches to detect associations are available, but only a few provide statistical tests. Furthermore, they generally test for an overall detection of association and do not identify which parasite is associated with which other one. Here, we developed a new approach, the association screening approach, to detect both the overall pattern and the detail of multi-parasite associations. We studied the power of this new approach and of three other known ones (i.e., the generalized chi-square, the network and the multinomial GLM approaches) to identify parasite associations either due to parasite interactions or to confounding factors. We applied these four approaches to detect associations within two populations of multi-infected hosts: (1) rodents infected with Bartonella sp., Babesia microti and Anaplasma phagocytophilum and (2) a bovine population infected with Theileria sp. and Babesia sp. We found that the best power was obtained with the screening model and the generalized chi-square test. Differentiating between associations due to confounding factors and those due to parasite interactions was not possible. The screening approach significantly identified associations between Bartonella doshiae and B. microti, and between T. parva, T. mutans, and T. velifera. Thus, the screening approach was relevant to test the overall presence of parasite associations and identify the parasite combinations that are significantly over- or under-represented. Unraveling whether the associations are due to real biological interactions or confounding factors should be further investigated. Nevertheless, in the age of genomics and the advent of new technologies, it is a considerable asset to speed up research focusing on the mechanisms driving interactions between…
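
    The simplest building block underlying all four approaches is a pairwise co-occurrence test. The sketch below runs Fisher's exact test over every parasite pair with a Bonferroni correction, on random stand-in data containing one planted association; the paper's screening approach additionally models complete multi-infection profiles, which this omits.

```python
# Pairwise parasite co-occurrence testing on presence/absence data with a
# Bonferroni correction. Host data are random stand-ins, not the study's.
import numpy as np
from itertools import combinations
from scipy.stats import fisher_exact

rng = np.random.default_rng(7)
parasites = ["Bartonella", "B.microti", "A.phago"]
X = rng.binomial(1, 0.3, (200, 3))               # hosts x parasites (0/1)
X[:, 1] |= X[:, 0] & rng.binomial(1, 0.6, 200)   # plant one association

pairs = list(combinations(range(3), 2))
for i, j in pairs:
    # 2x2 table of joint presence/absence for the pair (i, j)
    table = [[np.sum((X[:, i] == a) & (X[:, j] == b)) for b in (1, 0)]
             for a in (1, 0)]
    odds, p = fisher_exact(table)
    flag = "*" if p < 0.05 / len(pairs) else " "  # Bonferroni over pairs
    print(f"{parasites[i]:>10} x {parasites[j]:<10} P = {p:.4f} {flag}")
```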

  5. Statistical significance of seasonal warming/cooling trends

    NASA Astrophysics Data System (ADS)

    Ludescher, Josef; Bunde, Armin; Schellnhuber, Hans Joachim

    2017-04-01

    The question of whether a seasonal climate trend (e.g., the increase of summer temperatures in Antarctica in the last decades) is of anthropogenic or natural origin is of great importance for mitigation and adaptation measures alike. The conventional significance analysis assumes that (i) the seasonal climate trends can be quantified by linear regression, (ii) the different seasonal records can be treated as independent records, and (iii) the persistence in each of these seasonal records can be characterized by short-term memory described by an autoregressive process of first order. Here we show that assumption (ii) is not valid, owing to strong intra-annual correlations between the different seasons. We also show that, even in the absence of correlations, for Gaussian white noise, the conventional analysis leads to a strong overestimation of the significance of the seasonal trends, because multiple testing has not been taken into account. In addition, when the data exhibit long-term memory (which is the case in most climate records), assumption (iii) leads to a further overestimation of the trend significance. Combining Monte Carlo simulations with the Holm-Bonferroni method, we demonstrate how to obtain reliable estimates of the significance of the seasonal climate trends in long-term correlated records. For an illustration, we apply our method to representative temperature records from West Antarctica, which is one of the fastest-warming places on Earth and belongs to the crucial tipping elements in the Earth system.
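
    A sketch of the proposed recipe under simplifying assumptions: persistence is estimated from the detrended record and modelled as AR(1) (the paper also treats long-term memory, which this omits), trend significance comes from Monte Carlo surrogates, and Holm-Bonferroni handles the multiplicity across seasons. All four records are fabricated.

```python
# Monte Carlo trend significance with AR(1) surrogates and Holm-Bonferroni
# correction across seasons. Records are fabricated for illustration.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(8)

def mc_trend_pvalue(series, n_sur=2000):
    """One-sided Monte Carlo p-value of a linear trend against an AR(1)
    null whose persistence is estimated from the detrended record."""
    yrs = np.arange(series.size)
    fit = linregress(yrs, series)
    resid = series - (fit.intercept + fit.slope * yrs)
    rho = np.corrcoef(resid[:-1], resid[1:])[0, 1]
    innov = resid.std(ddof=1) * np.sqrt(max(1 - rho**2, 1e-6))
    exceed = 0
    for _ in range(n_sur):
        sur = np.empty(series.size)
        sur[0] = rng.normal(0, resid.std(ddof=1))
        for t in range(1, series.size):
            sur[t] = rho * sur[t - 1] + rng.normal(0, innov)
        exceed += linregress(yrs, sur).slope >= fit.slope
    return (exceed + 1) / (n_sur + 1)

# Four fabricated 60-year seasonal records; only "JJA" has a real trend.
n = 60
seasons = {s: rng.standard_normal(n) for s in ("DJF", "MAM", "SON")}
seasons["JJA"] = 0.15 * np.arange(n) + rng.standard_normal(n)

# Holm-Bonferroni step-down across the four seasonal tests
results = sorted((mc_trend_pvalue(x), s) for s, x in seasons.items())
m = len(results)
rejected = True
for k, (p, s) in enumerate(results):
    rejected = rejected and p <= 0.05 / (m - k)
    print(f"{s}: p = {p:.4f} ({'significant' if rejected else 'not significant'})")
```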

  6. A Network-Based Method to Assess the Statistical Significance of Mild Co-Regulation Effects

    PubMed Central

    Horvát, Emőke-Ágnes; Zhang, Jitao David; Uhlmann, Stefan; Sahin, Özgür; Zweig, Katharina Anna

    2013-01-01

    Recent development of high-throughput, multiplexing technology has initiated projects that systematically investigate interactions between two types of components in biological networks, for instance transcription factors and promoter sequences, or microRNAs (miRNAs) and mRNAs. In terms of network biology, such screening approaches primarily attempt to elucidate relations between biological components of two distinct types, which can be represented as edges between nodes in a bipartite graph. However, it is often desirable not only to determine regulatory relationships between nodes of different types, but also to understand the connection patterns of nodes of the same type. Especially interesting is the co-occurrence of two nodes of the same type, i.e., the number of their common neighbours, which current high-throughput screening analysis fails to address. The co-occurrence gives the number of circumstances under which both of the biological components are influenced in the same way. Here we present SICORE, a novel network-based method to detect pairs of nodes with a statistically significant co-occurrence. We first show the stability of the proposed method on artificial data sets: when randomly adding and deleting observations we obtain reliable results even with noise exceeding the expected level in large-scale experiments. Subsequently, we illustrate the viability of the method based on the analysis of a proteomic screening data set to reveal regulatory patterns of human microRNAs targeting proteins in the EGFR-driven cell cycle signalling system. Since statistically significant co-occurrence may indicate functional synergy and the mechanisms underlying canalization, and thus hold promise in drug target identification and therapeutic development, we provide a platform-independent implementation of SICORE with a graphical user interface as a novel tool in the arsenal of high-throughput screening analysis. PMID:24039936

  7. [Big data in official statistics].

    PubMed

    Zwick, Markus

    2015-08-01

    The concept of "big data" stands to change the face of official statistics over the coming years, having an impact on almost all aspects of data production. The tasks of future statisticians will not necessarily be to produce new data, but rather to identify and make use of existing data to adequately describe social and economic phenomena. Until big data can be used correctly in official statistics, a lot of questions need to be answered and problems solved: the quality of data, data protection, privacy, and the sustainable availability are some of the more pressing issues to be addressed. The essential skills of official statisticians will undoubtedly change, and this implies a number of challenges to be faced by statistical education systems, in universities, and inside the statistical offices. The national statistical offices of the European Union have concluded a concrete strategy for exploring the possibilities of big data for official statistics, by means of the Big Data Roadmap and Action Plan 1.0. This is an important first step and will have a significant influence on implementing the concept of big data inside the statistical offices of Germany.

  8. Statistical Genomic Approach Identifies Association between FSHR Polymorphisms and Polycystic Ovary Morphology in Women with Polycystic Ovary Syndrome

    PubMed Central

    Du, Tao; Duan, Yu; Li, Kaiwen; Zhao, Xiaomiao; Ni, Renmin; Li, Yu; Yang, Dongzi

    2015-01-01

    Background. Single-nucleotide polymorphisms (SNPs) in the follicle stimulating hormone receptor (FSHR) gene are associated with PCOS. However, their relationship to the polycystic ovary (PCO) morphology remains unknown. This study aimed to investigate whether PCOS-related SNPs in the FSHR gene are associated with PCO in women with PCOS. Methods. Patients were grouped into PCO (n = 384) and non-PCO (n = 63) groups. Genomic genotypes were profiled using Affymetrix human genome SNP chip 6. Two polymorphisms (rs2268361 and rs2349415) of FSHR were analyzed using a statistical approach. Results. Significant differences were found in the genotype distributions of rs2268361 between the PCO and non-PCO groups (27.6% GG, 53.4% GA, and 19.0% AA versus 33.3% GG, 36.5% GA, and 30.2% AA), while no significant differences were found in the genotype distributions of rs2349415. When rs2268361 was considered, there were statistically significant differences in serum follicle stimulating hormone, estradiol, and sex hormone binding globulin between genotypes in the PCO group. In the case of the rs2349415 SNP, only serum sex hormone binding globulin differed statistically between genotypes in the PCO group. Conclusions. Functional variants in the FSHR gene may contribute to PCO susceptibility in women with PCOS. PMID:26273622
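
    The genotype comparison reported for rs2268361 reduces to a chi-square test on a 3 × 2 contingency table. The counts below are back-calculated from the abstract's percentages and group sizes, so they are approximate:

```python
# Chi-square test of genotype distributions between PCO and non-PCO groups.
# Counts are approximate back-calculations from the reported percentages.
from scipy.stats import chi2_contingency

#           GG   GA   AA
pco     = [106, 205,  73]   # n = 384: 27.6% / 53.4% / 19.0%
non_pco = [ 21,  23,  19]   # n = 63:  33.3% / 36.5% / 30.2%

chi2, p, dof, _ = chi2_contingency([pco, non_pco])
print(f"chi2 = {chi2:.2f}, dof = {dof}, P = {p:.4f}")
```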

  9. A Powerful Procedure for Pathway-Based Meta-analysis Using Summary Statistics Identifies 43 Pathways Associated with Type II Diabetes in European Populations

    PubMed Central

    Zhang, Han; Wheeler, William; Hyland, Paula L.; Yang, Yifan; Shi, Jianxin; Chatterjee, Nilanjan; Yu, Kai

    2016-01-01

    Meta-analysis of multiple genome-wide association studies (GWAS) has become an effective approach for detecting single nucleotide polymorphism (SNP) associations with complex traits. However, it is difficult to integrate the readily accessible SNP-level summary statistics from a meta-analysis into more powerful multi-marker testing procedures, which generally require individual-level genetic data. We developed a general procedure called Summary based Adaptive Rank Truncated Product (sARTP) for conducting gene and pathway meta-analysis that uses only SNP-level summary statistics in combination with genotype correlation estimated from a panel of individual-level genetic data. We demonstrated the validity and power advantage of sARTP through empirical and simulated data. We conducted a comprehensive pathway-based meta-analysis with sARTP on type 2 diabetes (T2D) by integrating SNP-level summary statistics from two large studies consisting of 19,809 T2D cases and 111,181 controls with European ancestry. Among 4,713 candidate pathways from which genes in neighborhoods of 170 GWAS established T2D loci were excluded, we detected 43 T2D globally significant pathways (with Bonferroni corrected p-values < 0.05), which included the insulin signaling pathway and T2D pathway defined by KEGG, as well as the pathways defined according to specific gene expression patterns on pancreatic adenocarcinoma, hepatocellular carcinoma, and bladder carcinoma. Using summary data from 8 eastern Asian T2D GWAS with 6,952 cases and 11,865 controls, we showed 7 out of the 43 pathways identified in European populations remained significant in eastern Asians at a false discovery rate of 0.1. We created an R package and a web-based tool for sARTP with the capability to analyze pathways with thousands of genes and tens of thousands of SNPs. PMID:27362418

  10. A Powerful Procedure for Pathway-Based Meta-analysis Using Summary Statistics Identifies 43 Pathways Associated with Type II Diabetes in European Populations.

    PubMed

    Zhang, Han; Wheeler, William; Hyland, Paula L; Yang, Yifan; Shi, Jianxin; Chatterjee, Nilanjan; Yu, Kai

    2016-06-01

    Meta-analysis of multiple genome-wide association studies (GWAS) has become an effective approach for detecting single nucleotide polymorphism (SNP) associations with complex traits. However, it is difficult to integrate the readily accessible SNP-level summary statistics from a meta-analysis into more powerful multi-marker testing procedures, which generally require individual-level genetic data. We developed a general procedure called Summary based Adaptive Rank Truncated Product (sARTP) for conducting gene and pathway meta-analysis that uses only SNP-level summary statistics in combination with genotype correlation estimated from a panel of individual-level genetic data. We demonstrated the validity and power advantage of sARTP through empirical and simulated data. We conducted a comprehensive pathway-based meta-analysis with sARTP on type 2 diabetes (T2D) by integrating SNP-level summary statistics from two large studies consisting of 19,809 T2D cases and 111,181 controls with European ancestry. Among 4,713 candidate pathways from which genes in neighborhoods of 170 GWAS established T2D loci were excluded, we detected 43 T2D globally significant pathways (with Bonferroni corrected p-values < 0.05), which included the insulin signaling pathway and T2D pathway defined by KEGG, as well as the pathways defined according to specific gene expression patterns on pancreatic adenocarcinoma, hepatocellular carcinoma, and bladder carcinoma. Using summary data from 8 eastern Asian T2D GWAS with 6,952 cases and 11,865 controls, we showed 7 out of the 43 pathways identified in European populations remained significant in eastern Asians at a false discovery rate of 0.1. We created an R package and a web-based tool for sARTP with the capability to analyze pathways with thousands of genes and tens of thousands of SNPs.
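
    The "adaptive rank truncated product" idea that sARTP builds on can be illustrated in a few lines: for several truncation points K, take the product of the K smallest SNP P-values, calibrate each product against null replicates, and calibrate the adaptive minimum over K the same way. The toy below uses uniform draws in place of genotype-based permutations and omits the summary-statistics machinery that distinguishes sARTP.

```python
# Toy adaptive rank truncated product (ARTP) pathway test. Uniform draws
# stand in for genotype-based permutations; this is not the sARTP software.
import numpy as np

rng = np.random.default_rng(9)
KS = (1, 5, 10)                      # candidate truncation points

def rtp(pvals, k):
    """Rank truncated product: product of the k smallest p-values."""
    return np.sort(pvals)[:k].prod()

def artp_pvalue(pvals, n_perm=2000):
    null = rng.uniform(size=(n_perm, pvals.size))   # null replicates
    null_stats = np.array([[rtp(r, k) for k in KS] for r in null])
    obs_stats = np.array([rtp(pvals, k) for k in KS])
    obs_p = (null_stats <= obs_stats).mean(axis=0)  # per-K calibration
    # per-K p-values of every null replicate, to calibrate the adaptive min
    ranks = (null_stats[:, None, :] <= null_stats[None, :, :]).mean(axis=0)
    return (1 + np.sum(ranks.min(axis=1) <= obs_p.min())) / (n_perm + 1)

snp_p = rng.uniform(size=50)         # pathway of 50 SNP p-values
snp_p[:3] = [1e-4, 5e-4, 2e-3]       # plant a few associated SNPs
print(f"ARTP pathway p-value = {artp_pvalue(snp_p):.4f}")
```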

  11. IGESS: a statistical approach to integrating individual-level genotype data and summary statistics in genome-wide association studies.

    PubMed

    Dai, Mingwei; Ming, Jingsi; Cai, Mingxuan; Liu, Jin; Yang, Can; Wan, Xiang; Xu, Zongben

    2017-09-15

    Results from genome-wide association studies (GWAS) suggest that a complex phenotype is often affected by many variants with small effects, known as 'polygenicity'. Tens of thousands of samples are often required to ensure adequate statistical power to identify these variants with small effects. However, it is often the case that a research group can only get approval for access to individual-level genotype data with a limited sample size (e.g., a few hundred or a few thousand). Meanwhile, summary statistics generated using single-variant-based analysis are becoming publicly available, and the sample sizes associated with these summary statistics datasets are usually quite large. How to make the most efficient use of existing abundant data resources largely remains an open question. In this study, we propose a statistical approach, IGESS, to increase the statistical power of identifying risk variants and improve the accuracy of risk prediction by integrating individual-level genotype data and summary statistics. An efficient algorithm based on variational inference is developed to handle the genome-wide analysis. Through comprehensive simulation studies, we demonstrated the advantages of IGESS over methods which take either individual-level data or summary statistics data as input. We applied IGESS to perform an integrative analysis of Crohn's disease from WTCCC and summary statistics from other studies. IGESS was able to significantly increase the statistical power of identifying risk variants and to improve the risk prediction accuracy from 63.2% (±0.4%) to 69.4% (±0.1%) using about 240,000 variants. The IGESS software is available at https://github.com/daviddaigithub/IGESS .

  12. Synthetic velocity gradient tensors and the identification of statistically significant aspects of the structure of turbulence

    NASA Astrophysics Data System (ADS)

    Keylock, Christopher J.

    2017-08-01

    A method is presented for deriving random velocity gradient tensors given a source tensor. These synthetic tensors are constrained to lie within mathematical bounds of the non-normality of the source tensor, but we do not impose direct constraints upon the scalar quantities typically derived from the velocity gradient tensor and studied in fluid mechanics. Hence, it becomes possible to test hypotheses about the statistical significance of these scalar quantities at a point. Having presented our method and the associated mathematical concepts, we apply it to homogeneous, isotropic turbulence to test the utility of the approach for a case where the behavior of the tensor is well understood. We show that, as well as the concentration of data along the Vieillefosse tail, actual turbulence is also preferentially located in the quadrant where there is both excess enstrophy (Q > 0) and excess enstrophy production (R < 0). We also examine the topology implied by the strain eigenvalues and find that, for the statistically significant results, there is a particularly strong relative preference for the formation of disklike structures in the (Q < 0, R < 0) quadrant. With the method shown to be useful for a turbulence that is already well understood, it should be of even greater utility for studying the complex flows seen in industry and the environment.
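
    As context for the Q-R quadrants above, here is a minimal sketch of how the invariants are computed from a single tensor sample, assuming an incompressible (traceless) velocity gradient tensor; the paper's synthetic-tensor generation and significance testing are not reproduced.

```python
# Q-R invariants and strain eigenvalues of a traceless tensor A.
import numpy as np

def qr_invariants(A):
    # For incompressible flow: Q = -tr(A^2)/2, R = -det(A).
    return -0.5 * np.trace(A @ A), -np.linalg.det(A)

def strain_eigenvalues(A):
    S = 0.5 * (A + A.T)                    # strain-rate tensor
    return np.sort(np.linalg.eigvalsh(S))[::-1]

rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))
A -= np.eye(3) * np.trace(A) / 3.0         # enforce zero trace
Q, R = qr_invariants(A)
print(Q, R, strain_eigenvalues(A))   # two positive strain eigenvalues
                                     # indicate disklike straining
```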

  13. Avalanche Statistics Identify Intrinsic Stellar Processes near Criticality in KIC 8462852

    NASA Astrophysics Data System (ADS)

    Sheikh, Mohammed A.; Weaver, Richard L.; Dahmen, Karin A.

    2016-12-01

    The star KIC 8462852 (Tabby's star) has shown anomalous drops in light flux. We perform a statistical analysis of the more numerous smaller dimming events by using methods found useful for avalanches in ferromagnetism and plastic flow. Scaling exponents for avalanche statistics and temporal profiles of the flux during the dimming events are close to mean-field predictions. Scaling collapses suggest that this star may be near a nonequilibrium critical point. The large events are interpreted as avalanches with modified dynamics, limited by the system size and outside the scaling regime.

  14. Statistical Significance of Periodicity and Log-Periodicity with Heavy-Tailed Correlated Noise

    NASA Astrophysics Data System (ADS)

    Zhou, Wei-Xing; Sornette, Didier

    We estimate the probability that random noise, of several plausible standard distributions, creates a false alarm that a periodicity (or log-periodicity) is found in a time series. The solution of this problem is already known for independent Gaussian-distributed noise. We investigate more general situations with non-Gaussian correlated noises and present synthetic tests on the detectability and statistical significance of periodic components. A periodic component of a time series is usually detected by some sort of Fourier analysis. Here, we use the Lomb periodogram analysis, which is suitable for, and outperforms Fourier transforms on, unevenly sampled time series. We examine the false-alarm probability of the largest spectral peak of the Lomb periodogram in the presence of power-law distributed noises and of short-range and long-range fractional-Gaussian noises. Increasing heavy-tailedness (respectively, correlations describing persistence) tends to decrease (respectively, increase) the false-alarm probability of finding a large spurious Lomb peak. Increasing anti-persistence tends to decrease the false-alarm probability. We also study the interplay between heavy-tailedness and long-range correlations. In order to fully determine if a Lomb peak signals a genuine rather than a spurious periodicity, one should in principle characterize the Lomb peak height, its width and its relations to other peaks in the complete spectrum. As a step towards this full characterization, we construct the joint distribution of the frequency position (relative to other peaks) and of the height of the highest peak of the power spectrum. We also provide the distributions of the ratio of the highest Lomb peak to the second highest one. Using the insight obtained by the present statistical study, we re-examine previously reported claims of "log-periodicity" and find that the credibility for log-periodicity in 2D freely decaying turbulence is weakened while it is strengthened for fracture, for the
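
    A minimal Monte Carlo sketch of the false-alarm question for the highest Lomb peak. It assumes scipy's lombscargle (which takes angular frequencies) and uses a permutation null that preserves a heavy-tailed marginal distribution but, unlike the fractional-Gaussian nulls studied in the paper, destroys temporal correlation; all parameters are illustrative.

```python
# False-alarm probability of the largest Lomb periodogram peak,
# estimated by permuting the unevenly sampled values.
import numpy as np
from scipy.signal import lombscargle

def max_lomb_peak(t, y, freqs):
    return lombscargle(t, y - y.mean(), freqs).max()

def false_alarm_prob(t, y, freqs, n_sim=200, seed=0):
    rng = np.random.default_rng(seed)
    peak_obs = max_lomb_peak(t, y, freqs)
    peaks = np.array([max_lomb_peak(t, rng.permutation(y), freqs)
                      for _ in range(n_sim)])
    return (peaks >= peak_obs).mean()

# Unevenly sampled series: weak sinusoid plus heavy-tailed noise.
rng = np.random.default_rng(42)
t = np.sort(rng.uniform(0, 100, 300))
y = 0.6 * np.sin(2 * np.pi * t / 7.3) + rng.standard_t(df=3, size=300)
freqs = np.linspace(0.05, 5.0, 2000)    # angular frequencies
print(false_alarm_prob(t, y, freqs))
```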

  15. Statistical trend analysis and extreme distribution of significant wave height from 1958 to 1999 - an application to the Italian Seas

    NASA Astrophysics Data System (ADS)

    Martucci, G.; Carniel, S.; Chiggiato, J.; Sclavo, M.; Lionello, P.; Galati, M. B.

    2010-06-01

    The study is a statistical analysis of sea-state time series derived using the wave model WAM forced by the ERA-40 dataset in selected areas near the Italian coasts. For the period 1 January 1958 to 31 December 1999 the analysis yields: (i) the existence of a negative trend in the annual- and winter-averaged sea-state heights; (ii) the existence of a turning point in the late 1980s in the annual-averaged trend of sea-state heights at a site in the Northern Adriatic Sea; (iii) the overall absence of a significant trend in the annual-averaged mean durations of sea states over thresholds; (iv) the assessment of the extreme values on a time scale of a thousand years. The analysis uses two methods to obtain samples of extremes from the independent sea states: the r-largest annual maxima and the peak-over-threshold. The two methods show statistical differences in retrieving the return values and, more generally, in describing the significant wave field. The r-largest annual maxima method provides more reliable predictions of the extreme values, especially for small return periods (<100 years). Finally, the study statistically establishes the existence of decadal negative trends in the significant wave heights and thereby conveys useful information on the wave climatology of the Italian seas during the second half of the 20th century.
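
    The peak-over-threshold side of such an analysis can be sketched with a generalized Pareto fit to threshold exceedances; this is a simplified stand-in (no declustering shown) with illustrative synthetic data, not the study's WAM-derived series.

```python
# Peaks-over-threshold return levels via a generalized Pareto fit.
import numpy as np
from scipy.stats import genpareto

def pot_return_level(exceedances, threshold, events_per_year, T):
    # Fit GPD to excesses over the threshold (shape xi, scale sigma).
    xi, _, sigma = genpareto.fit(exceedances - threshold, floc=0)
    lam = events_per_year
    if abs(xi) < 1e-6:                   # xi -> 0: exponential tail
        return threshold + sigma * np.log(lam * T)
    return threshold + (sigma / xi) * ((lam * T) ** xi - 1.0)

# Illustrative "significant wave height" record: 42 years, daily values.
rng = np.random.default_rng(7)
hs = rng.gumbel(2.0, 0.5, size=365 * 42)
u = np.quantile(hs, 0.98)
exc = hs[hs > u]
print(pot_return_level(exc, u, len(exc) / 42.0, T=1000.0))
```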

  16. On the statistical significance of excess events: Remarks of caution and the need for a standard method of calculation

    NASA Technical Reports Server (NTRS)

    Staubert, R.

    1985-01-01

    Methods for calculating the statistical significance of excess events and the interpretation of the formally derived values are discussed. It is argued that a simple formula for a conservative estimate should generally be used in order to provide a common understanding of quoted values.
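
    The abstract does not reproduce the formula it recommends, so the following shows the kind of simple, conservative estimate commonly quoted for an excess of counts over background, with an exact Poisson tail alongside for comparison; treat it as an illustration rather than the author's prescription.

```python
# Significance of an excess of counts over background.
import numpy as np
from scipy import stats

def simple_significance(n_obs, n_bkg):
    # Conservative quadrature estimate, both counts treated as Poisson.
    return (n_obs - n_bkg) / np.sqrt(n_obs + n_bkg)

def poisson_tail(n_obs, mu_bkg):
    # Exact probability of >= n_obs counts from a known background mean,
    # and the equivalent one-sided z-value.
    p = stats.poisson.sf(n_obs - 1, mu_bkg)
    return p, stats.norm.isf(p)

print(simple_significance(130, 100))    # ~1.98 "sigma"
print(poisson_tail(130, 100.0))         # p ~ 0.002, z ~ 2.9
```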

  17. Genome-wide association study for rotator cuff tears identifies two significant single-nucleotide polymorphisms.

    PubMed

    Tashjian, Robert Z; Granger, Erin K; Farnham, James M; Cannon-Albright, Lisa A; Teerlink, Craig C

    2016-02-01

    The precise etiology of rotator cuff disease is unknown, but prior evidence suggests a role for genetic factors. Limited data exist identifying specific genes associated with rotator cuff tearing. The purpose of this study was to identify specific genes or genetic variants associated with rotator cuff tearing by a genome-wide association study with an independent set of rotator cuff tear cases. A set of 311 full-thickness rotator cuff tear cases genotyped on the Illumina 5M single-nucleotide polymorphism (SNP) platform were used in a genome-wide association study with 2641 genetically matched white population controls available from the Illumina iControls database. Tests of association were performed with GEMMA software at 257,558 SNPs that compose the intersection of Illumina SNP platforms and that passed general quality control metrics. SNPs were considered significant if P < 1.94 × 10(-7) (Bonferroni correction: 0.05/257,558). Tests of association revealed 2 significantly associated SNPs, one occurring in SAP30BP (rs820218; P = 3.8E-9) on chromosome 17q25 and another occurring in SASH1 (rs12527089; P = 1.9E-7) on chromosome 6q24. This study represents the first attempt to identify genetic factors influencing rotator cuff tearing by a genome-wide association study using a dense/complete set of SNPs. Two SNPs were significantly associated with rotator cuff tearing, residing in SAP30BP on chromosome 17 and SASH1 on chromosome 6. Both genes are associated with the cellular process of apoptosis. Identification of potential genes or genetic variants associated with rotator cuff tearing may help in identifying individuals at risk for the development of rotator cuff tearing. Copyright © 2016 Journal of Shoulder and Elbow Surgery Board of Trustees. Published by Elsevier Inc. All rights reserved.

  18. 40 CFR 141.723 - Requirements to respond to significant deficiencies identified in sanitary surveys performed by EPA.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    40 CFR 141.723 (Title 40, Protection of Environment; National Primary Drinking Water Regulations; Enhanced Treatment for Cryptosporidium; Requirements for Sanitary Surveys Performed by EPA): Requirements to respond to significant deficiencies identified in sanitary surveys performed by EPA.

  19. 40 CFR 141.723 - Requirements to respond to significant deficiencies identified in sanitary surveys performed by EPA.

    Code of Federal Regulations, 2014 CFR

    2014-07-01

    40 CFR 141.723 (Title 40, Protection of Environment; National Primary Drinking Water Regulations; Enhanced Treatment for Cryptosporidium; Requirements for Sanitary Surveys Performed by EPA): Requirements to respond to significant deficiencies identified in sanitary surveys performed by EPA.

  20. 40 CFR 141.723 - Requirements to respond to significant deficiencies identified in sanitary surveys performed by EPA.

    Code of Federal Regulations, 2013 CFR

    2013-07-01

    40 CFR 141.723 (Title 40, Protection of Environment; National Primary Drinking Water Regulations; Enhanced Treatment for Cryptosporidium; Requirements for Sanitary Surveys Performed by EPA): Requirements to respond to significant deficiencies identified in sanitary surveys performed by EPA.

  1. 40 CFR 141.723 - Requirements to respond to significant deficiencies identified in sanitary surveys performed by EPA.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    40 CFR 141.723 (Title 40, Protection of Environment; National Primary Drinking Water Regulations; Enhanced Treatment for Cryptosporidium; Requirements for Sanitary Surveys Performed by EPA): Requirements to respond to significant deficiencies identified in sanitary surveys performed by EPA.

  2. 40 CFR 141.723 - Requirements to respond to significant deficiencies identified in sanitary surveys performed by EPA.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    40 CFR 141.723 (Title 40, Protection of Environment; National Primary Drinking Water Regulations; Enhanced Treatment for Cryptosporidium; Requirements for Sanitary Surveys Performed by EPA): Requirements to respond to significant deficiencies identified in sanitary surveys performed by EPA.

  3. A tree-based statistical classification algorithm (CHAID) for identifying variables responsible for the occurrence of faecal indicator bacteria during waterworks operations

    NASA Astrophysics Data System (ADS)

    Bichler, Andrea; Neumaier, Arnold; Hofmann, Thilo

    2014-11-01

    Microbial contamination of groundwater used for drinking water can affect public health and is of major concern to local water authorities and water suppliers. Potential hazards need to be identified in order to protect raw water resources. We propose a non-parametric data mining technique for exploring the presence of total coliforms (TC) in a groundwater abstraction well and its relationship to readily available, continuous time series of hydrometric monitoring parameters (seven-year records of precipitation, river water levels, and groundwater heads). The original monitoring parameters were used to create an extensive generic dataset of explanatory variables by considering different accumulation or averaging periods, as well as temporal offsets of the explanatory variables. A classification tree based on the Chi-Squared Automatic Interaction Detection (CHAID) recursive partitioning algorithm revealed statistically significant relationships between precipitation and the presence of TC in both a production well and a nearby monitoring well. Different secondary explanatory variables were identified for the two wells. Elevated water levels and short-term water table fluctuations in the nearby river were found to be associated with TC in the observation well. The presence of TC in the production well was found to relate to elevated groundwater heads and fluctuations in groundwater levels. The generic variables created proved useful for increasing significance levels. The tree-based model was used to predict the occurrence of TC on the basis of hydrometric variables.
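
    The core CHAID step, choosing the split with the smallest Bonferroni-adjusted chi-squared p-value, can be sketched as follows. This is an illustration with hypothetical variable names, not the authors' pipeline, and it omits CHAID's category merging and recursive tree growth.

```python
# CHAID-style split selection: pick the predictor whose cross-tab with
# the target has the smallest Bonferroni-adjusted chi-squared p-value.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def best_chaid_split(df, predictors, target):
    pvals = {}
    for col in predictors:
        table = pd.crosstab(df[col], df[target])
        if table.shape[0] > 1 and table.shape[1] > 1:
            _, p, _, _ = chi2_contingency(table)
            pvals[col] = min(1.0, p * len(predictors))  # Bonferroni
    return min(pvals.items(), key=lambda kv: kv[1])

# Hypothetical discretized hydrometric variables vs. coliform presence.
rng = np.random.default_rng(3)
n = 500
rain = rng.choice(["low", "mid", "high"], n)
tc = ((rain == "high") & (rng.random(n) < 0.4)) | (rng.random(n) < 0.05)
df = pd.DataFrame({"precip_3d": rain,
                   "river_level": rng.choice(["low", "high"], n),
                   "TC": tc})
print(best_chaid_split(df, ["precip_3d", "river_level"], "TC"))
```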

  4. The sumLINK statistic for genetic linkage analysis in the presence of heterogeneity.

    PubMed

    Christensen, G B; Knight, S; Camp, N J

    2009-11-01

    We present the "sumLINK" statistic--the sum of multipoint LOD scores for the subset of pedigrees with nominally significant linkage evidence at a given locus--as an alternative to common methods to identify susceptibility loci in the presence of heterogeneity. We also suggest the "sumLOD" statistic (the sum of positive multipoint LOD scores) as a companion to the sumLINK. sumLINK analysis identifies genetic regions of extreme consistency across pedigrees without regard to negative evidence from unlinked or uninformative pedigrees. Significance is determined by an innovative permutation procedure based on genome shuffling that randomizes linkage information across pedigrees. This procedure for generating the empirical null distribution may be useful for other linkage-based statistics as well. Using 500 genome-wide analyses of simulated null data, we show that the genome shuffling procedure results in the correct type 1 error rates for both the sumLINK and sumLOD. The power of the statistics was tested using 100 sets of simulated genome-wide data from the alternative hypothesis from GAW13. Finally, we illustrate the statistics in an analysis of 190 aggressive prostate cancer pedigrees from the International Consortium for Prostate Cancer Genetics, where we identified a new susceptibility locus. We propose that the sumLINK and sumLOD are ideal for collaborative projects and meta-analyses, as they do not require any sharing of identifiable data between contributing institutions. Further, loci identified with the sumLINK have good potential for gene localization via statistical recombinant mapping, as, by definition, several linked pedigrees contribute to each peak.

  5. Protein Sectors: Statistical Coupling Analysis versus Conservation

    PubMed Central

    Teşileanu, Tiberiu; Colwell, Lucy J.; Leibler, Stanislas

    2015-01-01

    Statistical coupling analysis (SCA) is a method for analyzing multiple sequence alignments that was used to identify groups of coevolving residues termed “sectors”. The method applies spectral analysis to a matrix obtained by combining correlation information with sequence conservation. It has been asserted that the protein sectors identified by SCA are functionally significant, with different sectors controlling different biochemical properties of the protein. Here we reconsider the available experimental data and note that it involves almost exclusively proteins with a single sector. We show that in this case sequence conservation is the dominating factor in SCA, and can alone be used to make statistically equivalent functional predictions. Therefore, we suggest shifting the experimental focus to proteins for which SCA identifies several sectors. Correlations in protein alignments, which have been shown to be informative in a number of independent studies, would then be less dominated by sequence conservation. PMID:25723535

  6. Sigsearch: a new term for post hoc unplanned search for statistically significant relationships with the intent to create publishable findings.

    PubMed

    Hashim, Muhammad Jawad

    2010-09-01

    Post-hoc secondary data analysis with no prespecified hypotheses has been discouraged by textbook authors and journal editors alike. Unfortunately no single term describes this phenomenon succinctly. I would like to coin the term "sigsearch" to define this practice and bring it within the teaching lexicon of statistics courses. Sigsearch would include any unplanned, post-hoc search for statistical significance using multiple comparisons of subgroups. It would also include data analysis with outcomes other than the prespecified primary outcome measure of a study as well as secondary data analyses of earlier research.

  7. To Identify the Important Soil Properties Affecting Dinoseb Adsorption with Statistical Analysis

    PubMed Central

    Guan, Yiqing; Wei, Jianhui; Zhang, Danrong; Zu, Mingjuan; Zhang, Liru

    2013-01-01

    Investigating the influence of soil characteristics on the dinoseb adsorption parameter with different statistical methods is valuable for explicitly determining the extent of these influences. The correlation coefficients and the direct and indirect effects of soil characteristic factors on the dinoseb adsorption parameter were analyzed through bivariate correlation analysis and path analysis. With stepwise regression analysis, the factors which had little influence on the adsorption parameter were excluded. Results indicate that pH and CEC had a moderate relationship with, and a lower direct effect on, the dinoseb adsorption parameter due to multicollinearity with other soil factors, and that organic carbon and clay contents were the most significant soil factors affecting the dinoseb adsorption process. A regression was thereby set up to explore the relationship between the dinoseb adsorption parameter and the two soil factors: the soil organic carbon and clay contents. 92% of the variation in the dinoseb sorption coefficient could be attributed to the variation in the soil organic carbon and clay contents. PMID:23737715

  8. Statistical Analysis of Big Data on Pharmacogenomics

    PubMed Central

    Fan, Jianqing; Liu, Han

    2013-01-01

    This paper discusses statistical methods for estimating complex correlation structure from large pharmacogenomic datasets. We selectively review several prominent statistical methods for estimating large covariance matrices for understanding correlation structure, inverse covariance matrices for network modeling, large-scale simultaneous tests for selecting significantly differentially expressed genes, proteins and genetic markers for complex diseases, and high-dimensional variable selection for identifying important molecules for understanding molecular mechanisms in pharmacogenomics. Their applications to gene network estimation and biomarker selection are used to illustrate the methodological power. Several new challenges of Big Data analysis, including complex data distribution, missing data, measurement error, spurious correlation, endogeneity, and the need for robust statistical methods, are also discussed. PMID:23602905

  9. Genomic Characterization of Vulvar (Pre)cancers Identifies Distinct Molecular Subtypes with Prognostic Significance.

    PubMed

    Nooij, Linda S; Ter Haar, Natalja T; Ruano, Dina; Rakislova, Natalia; van Wezel, Tom; Smit, Vincent T H B M; Trimbos, Baptist J B M Z; Ordi, Jaume; van Poelgeest, Mariette I E; Bosse, Tjalling

    2017-11-15

    Purpose: Vulvar cancer (VC) can be subclassified by human papillomavirus (HPV) status. HPV-negative VCs frequently harbor TP53 mutations; however, in-depth analysis of other potential molecular genetic alterations is lacking. We comprehensively assessed somatic mutations in a large series of vulvar (pre)cancers. Experimental Design: We performed targeted next-generation sequencing (17 genes), p53 immunohistochemistry and HPV testing on 36 VC and 82 precursors (sequencing cohort). Subsequently, the prognostic significance of the three subtypes identified in the sequencing cohort was assessed in a series of 236 VC patients (follow-up cohort). Results: Frequent recurrent mutations were identified in HPV-negative vulvar (pre)cancers in TP53 (42% and 68%), NOTCH1 (28% and 41%), and HRAS (20% and 31%). Mutation frequency in HPV-positive vulvar (pre)cancers was significantly lower (P = 0.001). Furthermore, a substantial subset of the HPV-negative precursors (35/60, 58.3%) and VC (10/29, 34.5%) were TP53 wild-type (wt), suggesting a third, previously undescribed, molecular subtype. Clinical outcomes in the three different subtypes (HPV+, HPV-/p53wt, HPV-/p53abn) were evaluated in a follow-up cohort consisting of 236 VC patients. The local recurrence rate was 5.3% for HPV+, 16.3% for HPV-/p53wt and 22.6% for HPV-/p53abn tumors (P = 0.044). HPV positivity remained an independent prognostic factor for favorable outcome in the multivariable analysis (P = 0.020). Conclusions: HPV- and HPV+ vulvar (pre)cancers display striking differences in somatic mutation patterns. HPV-/p53wt VC appear to be a distinct clinicopathologic subgroup with frequent NOTCH1 mutations. HPV+ VC have a significantly lower local recurrence rate, independent of clinicopathological variables, opening opportunities for reducing overtreatment in VC. Clin Cancer Res; 23(22); 6781-9. ©2017 AACR.

  10. Business Statistics Education: Content and Software in Undergraduate Business Statistics Courses.

    ERIC Educational Resources Information Center

    Tabatabai, Manouchehr; Gamble, Ralph

    1997-01-01

    Survey responses from 204 of 500 business schools identified the topics covered most often in business statistics I and II courses. The most popular software at both levels was Minitab. Most schools required both statistics I and II. (SK)

  11. Scalable detection of statistically significant communities and hierarchies, using message passing for modularity

    PubMed Central

    Zhang, Pan; Moore, Cristopher

    2014-01-01

    Modularity is a popular measure of community structure. However, maximizing the modularity can lead to many competing partitions, with almost the same modularity, that are poorly correlated with each other. It can also produce illusory "communities" in random graphs where none exist. We address this problem by using the modularity as a Hamiltonian at finite temperature and using an efficient belief propagation algorithm to obtain the consensus of many partitions with high modularity, rather than looking for a single partition that maximizes it. We show analytically and numerically that the proposed algorithm works all the way down to the detectability transition in networks generated by the stochastic block model. It also performs well on real-world networks, revealing large communities in some networks where previous work has claimed no communities exist. Finally we show that by applying our algorithm recursively, subdividing communities until no statistically significant subcommunities can be found, we can detect hierarchical structure in real-world networks more efficiently than previous methods. PMID:25489096

  12. Common Scientific and Statistical Errors in Obesity Research

    PubMed Central

    George, Brandon J.; Beasley, T. Mark; Brown, Andrew W.; Dawson, John; Dimova, Rositsa; Divers, Jasmin; Goldsby, TaShauna U.; Heo, Moonseong; Kaiser, Kathryn A.; Keith, Scott; Kim, Mimi Y.; Li, Peng; Mehta, Tapan; Oakes, J. Michael; Skinner, Asheley; Stuart, Elizabeth; Allison, David B.

    2015-01-01

    We identify 10 common errors and problems in the statistical analysis, design, interpretation, and reporting of obesity research and discuss how they can be avoided. The 10 topics are: 1) misinterpretation of statistical significance, 2) inappropriate testing against baseline values, 3) excessive and undisclosed multiple testing and “p-value hacking,” 4) mishandling of clustering in cluster randomized trials, 5) misconceptions about nonparametric tests, 6) mishandling of missing data, 7) miscalculation of effect sizes, 8) ignoring regression to the mean, 9) ignoring confirmation bias, and 10) insufficient statistical reporting. We hope that discussion of these errors can improve the quality of obesity research by helping researchers to implement proper statistical practice and to know when to seek the help of a statistician. PMID:27028280

  13. I Cannot Read My Statistics Textbook: The Relationship between Reading Ability and Statistics Anxiety

    ERIC Educational Resources Information Center

    Collins, Kathleen M. T.; Onwuegbuzie, Anthony J.

    2007-01-01

    Although several antecedents of statistics anxiety have been identified, many of these factors are relatively immutable (e.g., gender) and, at best, identify students who are at risk for debilitative levels of statistics anxiety, thereby having only minimal implications for intervention. Furthermore, the few interventions that have been designed…

  14. The challenge of identifying greenhouse gas-induced climatic change

    NASA Technical Reports Server (NTRS)

    Maccracken, Michael C.

    1992-01-01

    Meeting the challenge of identifying greenhouse gas-induced climatic change involves three steps. First, observations of critical variables must be assembled, evaluated, and analyzed to determine that there has been a statistically significant change. Second, reliable theoretical (model) calculations must be conducted to provide a definitive set of changes for which to search. Third, a quantitative and statistically significant association must be made between the projected and observed changes to exclude the possibility that the changes are due to natural variability or other factors. This paper provides a qualitative overview of scientific progress in successfully fulfilling these three steps.

  15. Identifying the controls of wildfire activity in Namibia using multivariate statistics

    NASA Astrophysics Data System (ADS)

    Mayr, Manuel; Le Roux, Johan; Samimi, Cyrus

    2015-04-01

    Despite large areas of Namibia being unaffected by fire due to aridity, substantial burning in the northern and north-eastern parts of the country is observed every year. Within the fire-affected regions, a strong spatial and inter-annual variability characterizes the dry-season fire situation. In order to understand these patterns, it is critical to identify the causative factors behind fire occurrence and to examine their interactions in detail. Furthermore, most studies dealing with causative factors focus either on the local or the regional scale. However, these scales seem inappropriate from a management perspective, as fire-related strategic action plans are most often set up nationwide. Here, we present an examination of the fire regimes of Namibia based on a dataset compiled by Le Roux (2011). A decade-spanning fire record (1994-2003) derived from NOAA's Advanced Very High Resolution Radiometer (AVHRR) imagery was used to generate four fire regime metrics (Burned Area, Fire Season Length, Month of Peak Fire Season, and Fire Return Period) and quantitative information on vegetation and phenology derived from Normalized Difference Vegetation Index (NDVI) time series. Further variables contained in this dataset relate to climate, biodiversity, and human activities. Le Roux (2011) analyzed the correlations between the fire metrics mentioned above and the predictor variables. We hypothesize that linear correlations (as estimated by correlation coefficients) oversimplify the interactions between response and predictor variables. For instance, moderate population densities could induce the highest number of fires, whereas the complete absence of humans removes one major source of ignition. Around highly populated areas, by contrast, fuels are usually reduced and space is more fragmented; thus, the initiation and spread of a potential fire could likewise be inhibited. From a total of over 40 explanatory variables, we will initially use

  16. On the Use of Biomineral Oxygen Isotope Data to Identify Human Migrants in the Archaeological Record: Intra-Sample Variation, Statistical Methods and Geographical Considerations

    PubMed Central

    Lightfoot, Emma; O’Connell, Tamsin C.

    2016-01-01

    Oxygen isotope analysis of archaeological skeletal remains is an increasingly popular tool to study past human migrations. It is based on the assumption that human body chemistry preserves the δ18O of precipitation in such a way as to be useful for identifying migrants and, potentially, their homelands. In this study, the first such global survey, we draw on published human tooth enamel and bone bioapatite data to explore the validity of using oxygen isotope analyses to identify migrants in the archaeological record. We use human δ18O results to show that there are large variations in human oxygen isotope values within a population sample. This may relate to physiological factors influencing the preservation of the primary isotope signal, or to human activities (such as brewing, boiling, stewing, differential access to water sources and so on) causing variation in ingested water and food isotope values. We compare the number of outliers identified using various statistical methods. We determine that the most appropriate method for identifying migrants depends on the data but is likely to be the IQR or the median absolute deviation from the median under most archaeological circumstances. Finally, through a spatial assessment of the dataset, we show that the degree of overlap in human isotope values from different locations across Europe is such that identifying individuals' homelands on the basis of oxygen isotope analysis alone is not possible for the regions analysed to date. Oxygen isotope analysis is a valid method for identifying first-generation migrants from an archaeological site when used appropriately; however, it is difficult to identify migrants using statistical methods for a sample size of less than c. 25 individuals. In the absence of local previous analyses, each sample should be treated as an individual dataset and statistical techniques can be used to identify migrants, but in most cases pinpointing a specific homeland should
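
    The two outlier rules the paper favours are easy to state precisely; a minimal sketch with illustrative δ18O values (not data from the study):

```python
# Outlier flagging by IQR fences and by MAD distance from the median.
import numpy as np

def iqr_outliers(x, k=1.5):
    q1, q3 = np.percentile(x, [25, 75])
    return (x < q1 - k * (q3 - q1)) | (x > q3 + k * (q3 - q1))

def mad_outliers(x, k=3.0):
    med = np.median(x)
    mad = 1.4826 * np.median(np.abs(x - med))  # ~sigma if data were normal
    return np.abs(x - med) > k * mad

d18o = np.array([25.8, 26.1, 26.3, 25.9, 26.0, 26.2, 28.4, 25.7])
print(iqr_outliers(d18o))   # flags the 28.4 individual
print(mad_outliers(d18o))
```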

  17. HYPE: a WFD tool for the identification of significant and sustained upward trends in groundwater time series

    NASA Astrophysics Data System (ADS)

    Lopez, Benjamin; Croiset, Nolwenn; Laurence, Gourcy

    2014-05-01

    Directive 2006/118/EC on the protection of groundwater against pollution and deterioration (the Groundwater Directive, adopted under the Water Framework Directive, WFD) asks Member States to identify significant and sustained upward trends in all bodies or groups of bodies of groundwater that are characterised as being at risk in accordance with Annex II to Directive 2000/60/EC. The Directive indicates that the procedure for the identification of significant and sustained upward trends must be based on a statistical method. Moreover, for significant increases of concentrations of pollutants, trend reversals must be identified, which requires the ability to detect significant trend reversals. A specific tool, named HYPE, has been developed in order to help stakeholders working on groundwater trend assessment. The R-encoded tool HYPE provides statistical analysis of groundwater time series. It follows several studies on the relevance of statistical tests for groundwater data series (Lopez et al., 2011) and other case studies on this theme (Bourgine et al., 2012). It integrates the most powerful and robust statistical tests for hydrogeological applications. HYPE is linked to the French national database on groundwater data (ADES), so monitoring data gathered by the Water Agencies can be processed directly. HYPE has two main modules: a characterisation module, which allows users to visualize time series, calculates the main statistical characteristics and provides graphical representations; and a trend module, which identifies significant breaks, trends and trend reversals in time series, providing a results table and graphical representations. Additional modules are implemented to identify regional and seasonal trends and to sample time series in a relevant way. HYPE was used successfully in 2012 by the French Water Agencies to satisfy requirements of the WFD concerning the characterisation of groundwater bodies' qualitative status and the evaluation of the risk of non-achievement of
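
    The record is truncated before naming HYPE's specific tests, so as an illustration of the kind of trend statistic such a tool builds on, here is a minimal Mann-Kendall test (no-ties variance, two-sided p-value); it is a sketch, not HYPE's implementation.

```python
# Mann-Kendall trend test (no ties assumed in the variance).
import numpy as np
from scipy.stats import norm

def mann_kendall(x):
    x = np.asarray(x, dtype=float)
    n = len(x)
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1)
            for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    z = 0.0 if s == 0 else (s - np.sign(s)) / np.sqrt(var_s)
    return z, 2 * norm.sf(abs(z))        # z-score, two-sided p-value

rng = np.random.default_rng(5)
series = np.cumsum(rng.normal(0.05, 1.0, 120))   # weak upward drift
print(mann_kendall(series))
```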

  18. Combining the Power of Statistical Analyses and Community Interviews to Identify Adoption Barriers for Stormwater Best-Management Practices

    NASA Astrophysics Data System (ADS)

    Hoover, F. A.; Bowling, L. C.; Prokopy, L. S.

    2015-12-01

    Urban stormwater is an ongoing management concern in municipalities of all sizes. In both combined and separated sewer systems, pollutants from stormwater runoff enter the natural waterway system during heavy rain events. Urban flooding during frequent and more intense storms is also a growing concern. Therefore, stormwater best-management practices (BMPs) are being implemented in efforts to reduce and manage stormwater pollution and overflow. The majority of BMP water quality studies focus on the small-scale, individual effects of the BMP and the change in water quality directly from the runoff of these infrastructures. At the watershed scale, it is difficult to establish statistically whether or not these BMPs are making a difference in water quality, given that watershed-scale monitoring is often costly and time consuming, relying on significant sources of funds which a city may not have. Hence, there is a need to quantify the level of sampling needed to detect the water quality impact of BMPs at the watershed scale. In this study, a power analysis was performed on data from an urban watershed in Lafayette, Indiana, to determine the frequency of sampling required to detect a significant change in water quality measurements. Using the R platform, results indicate that detecting a significant change in watershed-level water quality would require hundreds of weekly measurements, even when improvement is present. The second part of this study investigates whether the difficulty in demonstrating water quality change represents a barrier to adoption of stormwater BMPs. Semi-structured interviews of community residents and organizations in Chicago, IL are being used to investigate residents' understanding of water quality and best-management practices and to identify their attitudes and perceptions towards stormwater BMPs. Second-round interviews will examine how information on uncertainty in water quality improvements influences their BMP attitudes and perceptions.
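
    The headline finding (hundreds of weekly measurements) is consistent with a standard closed-form power calculation; the sketch below uses illustrative numbers, not the study's data.

```python
# Samples per group needed to detect a mean shift `delta` with noise
# level `sd` (two-sided z approximation to the two-sample test).
import numpy as np
from scipy.stats import norm

def n_per_group(delta, sd, alpha=0.05, power=0.8):
    z_a, z_b = norm.isf(alpha / 2), norm.ppf(power)
    return int(np.ceil(2 * ((z_a + z_b) * sd / delta) ** 2))

# Detecting a 10% reduction in a constituent whose mean is 1.0 and
# whose week-to-week standard deviation is 0.5 (CV = 50%):
print(n_per_group(delta=0.1, sd=0.5))   # -> 393 samples per period
```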

  19. The environment associated with significant tornadoes in Bangladesh

    NASA Astrophysics Data System (ADS)

    Bikos, Dan; Finch, Jonathan; Case, Jonathan L.

    2016-01-01

    This paper investigates the environmental parameters favoring significant tornadoes in Bangladesh through a simulation of ten high-impact events. A climatological perspective is first presented on classifying significant tornadoes in Bangladesh, noting the challenges since reports of tornadoes are not documented in a formal manner. The statistical relationship between United States and Bangladesh tornado-related deaths suggests that significant tornadoes do occur in Bangladesh so this paper identifies the most significant tornadic events and analyzes the environmental conditions associated with these events. Given the scarcity of observational data to assess the near-storm environment in this region, high-resolution (3-km horizontal grid spacing) numerical weather prediction simulations are performed for events identified to be associated with a significant tornado. In comparison to similar events over the United States, significant tornado environments in Bangladesh are characterized by relatively high convective available potential energy, sufficient deep-layer vertical shear, and a propensity for deviant (i.e., well to the right of the mean flow) storm motion along a low-level convergence boundary.

  20. 28 CFR 22.21 - Use of identifiable data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    28 CFR 22.21 (Title 28, Judicial Administration; Confidentiality of Identifiable Research and Statistical Information): Use of identifiable data. Research or statistical information identifiable to a private person may be used only for research or statistical purposes.

  1. Wildfire cluster detection using space-time scan statistics

    NASA Astrophysics Data System (ADS)

    Tonini, M.; Tuia, D.; Ratle, F.; Kanevski, M.

    2009-04-01

    The aim of the present study is to identify spatio-temporal clusters of fire sequences using space-time scan statistics. These statistical methods are specifically designed to detect clusters and assess their significance. Basically, scan statistics work by comparing the set of events occurring inside a scanning window (a space-time cylinder for spatio-temporal data) with those that lie outside. Windows of increasing size scan the zone across space and time: the likelihood ratio is calculated for each window (comparing the ratio of observed to expected cases inside and outside), the window with the maximum value is taken as the most probable cluster, and so on. Under the null hypothesis of spatial and temporal randomness, these events are distributed according to a known discrete-state random process (Poisson or Bernoulli), whose parameters can be estimated. Given this assumption, it is possible to test whether or not the null hypothesis holds in a specific area. In order to deal with fire data, the space-time permutation scan statistic has been applied, since it does not require the explicit specification of the population at risk in each cylinder. The case study is Florida daily fire detection using the Moderate Resolution Imaging Spectroradiometer (MODIS) active fire product during the period 2003-2006. As a result, statistically significant clusters have been identified. Performing the analyses over the entire study period, three out of the five most likely clusters were identified in the forest areas in the north of the country; the other two clusters cover a large zone in the south, corresponding to agricultural land and the prairies in the Everglades. Furthermore, the analyses were performed separately for the four years to assess whether the wildfires recur each year during the same period. It emerges that clusters of forest fires are more frequent in hot seasons (spring and summer), while in the southern areas they are widely
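
    A toy version of the space-time permutation scan statistic can make the mechanics concrete. Here each "cylinder" is one region crossed with a day interval (real implementations grow circular spatial windows over many centroids); cylinders are scored with the Poisson generalized likelihood ratio and the maximum is calibrated by permutations that preserve the spatial and temporal marginals. All data are synthetic.

```python
# Toy space-time permutation scan: counts is a regions x days table.
import numpy as np

def glr(c, e, total):
    # Poisson generalized likelihood ratio for an overdense cylinder.
    if c <= e:
        return 0.0
    out = c * np.log(c / e)
    if total > c:
        out += (total - c) * np.log((total - c) / (total - e))
    return out

def scan(counts):
    total = counts.sum()
    best = 0.0
    for r in range(counts.shape[0]):              # toy spatial window
        for t0 in range(counts.shape[1]):         # day interval
            for t1 in range(t0 + 1, counts.shape[1] + 1):
                c = counts[r, t0:t1].sum()
                e = counts[r].sum() * counts[:, t0:t1].sum() / total
                best = max(best, glr(c, e, total))
    return best

def scan_pvalue(counts, n_perm=199, seed=0):
    # Permute case days against case regions: preserves both marginals,
    # which is what the space-time permutation null conditions on.
    rng = np.random.default_rng(seed)
    regions = np.repeat(np.arange(counts.shape[0]), counts.sum(axis=1))
    days = np.repeat(np.arange(counts.shape[1]), counts.sum(axis=0))
    obs, hits = scan(counts), 0
    for _ in range(n_perm):
        perm = np.zeros_like(counts)
        np.add.at(perm, (regions, rng.permutation(days)), 1)
        hits += scan(perm) >= obs
    return (hits + 1) / (n_perm + 1)

rng = np.random.default_rng(6)
counts = rng.poisson(2.0, size=(8, 30))
counts[3, 10:15] += rng.poisson(4.0, size=5)      # injected cluster
print(scan_pvalue(counts))
```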

  2. Statistical trend analysis and extreme distribution of significant wave height from 1958 to 1999 - an application to the Italian Seas

    NASA Astrophysics Data System (ADS)

    Martucci, G.; Carniel, S.; Chiggiato, J.; Sclavo, M.; Lionello, P.; Galati, M. B.

    2009-09-01

    The study is a statistical analysis of sea-state time series derived using the wave model WAM forced by the ERA-40 dataset in selected areas near the Italian coasts. For the period 1 January 1958 to 31 December 1999 the analysis yields: (i) the existence of a negative trend in the annual- and winter-averaged sea-state heights; (ii) the existence of a turning point in the late 1970s in the annual-averaged trend of sea-state heights at a site in the Northern Adriatic Sea; (iii) the overall absence of a significant trend in the annual-averaged mean durations of sea states over thresholds; (iv) the assessment of the extreme values on a time scale of a thousand years. The analysis uses two methods to obtain samples of extremes from the independent sea states: the r-largest annual maxima and the peak-over-threshold. The two methods show statistical differences in retrieving the return values and, more generally, in describing the significant wave field. The study shows the existence of decadal negative trends in the significant wave heights and thereby conveys useful information on the wave climatology of the Italian seas during the second half of the 20th century.

  3. Identifying irregularly shaped crime hot-spots using a multiobjective evolutionary algorithm

    NASA Astrophysics Data System (ADS)

    Wu, Xiaolan; Grubesic, Tony H.

    2010-12-01

    Spatial cluster detection techniques are widely used in criminology, geography, epidemiology, and other fields. In particular, spatial scan statistics are popular and efficient techniques for detecting areas of elevated crime or disease events. The majority of spatial scan approaches attempt to delineate geographic zones by evaluating the significance of clusters using likelihood ratio statistics tested with the Poisson distribution. While this can be effective, many scan statistics give preference to circular clusters, diminishing their ability to identify elongated and/or irregular shaped clusters. Although adjusting the shape of the scan window can mitigate some of these problems, both the significance of irregular clusters and their spatial structure must be accounted for in a meaningful way. This paper utilizes a multiobjective evolutionary algorithm to find clusters with maximum significance while quantitatively tracking their geographic structure. Crime data for the city of Cincinnati are utilized to demonstrate the advantages of the new approach and highlight its benefits versus more traditional scan statistics.

  4. SU-F-BRD-05: Dosimetric Comparison of Protocol-Based SBRT Lung Treatment Modalities: Statistically Significant VMAT Advantages Over Fixed- Beam IMRT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Best, R; Harrell, A; Geesey, C

    2014-06-15

    Purpose: The purpose of this study is to inter-compare and find statistically significant differences between flattened-field fixed-beam (FB) IMRT and flattening-filter-free (FFF) volumetric modulated arc therapy (VMAT) for stereotactic body radiation therapy (SBRT). Methods: SBRT plans using FB IMRT and FFF VMAT were generated for fifteen SBRT lung patients using 6 MV beams. For each patient, both IMRT and VMAT plans were created for comparison. Plans were generated utilizing RTOG 0915 (peripheral, 10 patients) and RTOG 0813 (medial, 5 patients) lung protocols. Target dose, critical structure dose, and treatment time were compared and tested for statistical significance. Parameters of interest included prescription isodose surface coverage, target dose heterogeneity, high dose spillage (location and volume), low dose spillage (location and volume), lung dose spillage, and critical structure maximum- and volumetric-dose limits. Results: For all criteria, we found equivalent or higher conformality with VMAT plans as well as reduced critical structure doses. Several differences passed a Student's t-test of significance: VMAT reduced the high dose spillage, evaluated with the conformality index (CI), by an average of 9.4%±15.1% (p=0.030) compared to IMRT. VMAT plans reduced the lung volume receiving 20 Gy by 16.2%±15.0% (p=0.016) compared with IMRT. For the RTOG 0915 peripheral lesions, the volumes of lung receiving 12.4 Gy and 11.6 Gy were reduced by 27.0%±13.8% and 27.5%±12.6% (for both, p<0.001) in VMAT plans. Of the 26 protocol pass/fail criteria, VMAT plans were able to achieve an average of 0.2±0.7 (p=0.026) more constraints than the IMRT plans. Conclusions: FFF VMAT has dosimetric advantages over fixed-beam IMRT for lung SBRT. Significant advantages included increased dose conformity and reduced organs-at-risk doses. The overall improvements in terms of protocol pass/fail criteria were more modest and will require more patient data to establish

  5. Funding source and primary outcome changes in clinical trials registered on ClinicalTrials.gov are associated with the reporting of a statistically significant primary outcome: a cross-sectional study.

    PubMed

    Ramagopalan, Sreeram V; Skingsley, Andrew P; Handunnetthi, Lahiru; Magnus, Daniel; Klingel, Michelle; Pakpoor, Julia; Goldacre, Ben

    2015-01-01

    We and others have shown that a significant proportion of interventional trials registered on ClinicalTrials.gov have their primary outcomes altered after the listed study start and completion dates. The objectives of this study were to investigate whether changes made to primary outcomes are associated with the likelihood of reporting a statistically significant primary outcome on ClinicalTrials.gov. A cross-sectional analysis of all interventional clinical trials registered on ClinicalTrials.gov as of 20 November 2014 was performed. The main outcome was any change made to the initially listed primary outcome and the time of the change in relation to the trial start and end date. 13,238 completed interventional trials were registered with ClinicalTrials.gov that also had study results posted on the website. 2555 (19.3%) had one or more statistically significant primary outcomes. Statistical analysis showed that registration year, funding source and primary outcome change after trial completion were associated with reporting a statistically significant primary outcome. Funding source and primary outcome change after trial completion are associated with a statistically significant primary outcome report on ClinicalTrials.gov.

  6. Relating genes to function: identifying enriched transcription factors using the ENCODE ChIP-Seq significance tool.

    PubMed

    Auerbach, Raymond K; Chen, Bin; Butte, Atul J

    2013-08-01

    Biological analysis has shifted from identifying genes and transcripts to mapping these genes and transcripts to biological functions. The ENCODE Project has generated hundreds of ChIP-Seq experiments spanning multiple transcription factors and cell lines for public use, but tools for a biomedical scientist to analyze these data are either non-existent or tailored to narrow biological questions. We present the ENCODE ChIP-Seq Significance Tool, a flexible web application leveraging public ENCODE data to identify enriched transcription factors in a gene or transcript list for comparative analyses. The ENCODE ChIP-Seq Significance Tool is written in JavaScript on the client side and has been tested on the Google Chrome, Apple Safari and Mozilla Firefox browsers. Server-side scripts are written in PHP and leverage R and a MySQL database. The tool is available at http://encodeqt.stanford.edu.
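
    The enrichment computation behind tools of this kind reduces to a hypergeometric tail per transcription factor; the sketch below uses hypothetical set sizes, and the tool's actual statistics and multiple-testing corrections may differ.

```python
# Hypergeometric enrichment of a TF's ChIP-Seq targets in a gene list.
from scipy.stats import hypergeom

def tf_enrichment(universe, tf_targets, gene_list):
    M = len(universe)                       # background genes
    n = len(tf_targets & universe)          # TF targets in background
    N = len(gene_list & universe)           # submitted genes
    k = len(tf_targets & gene_list & universe)
    return hypergeom.sf(k - 1, M, n, N)     # P(overlap >= k)

universe = set(range(20000))
tf_targets = set(range(1200))               # hypothetical target set
gene_list = set(range(150)) | set(range(5000, 5050))
p = tf_enrichment(universe, tf_targets, gene_list)
print(p, min(1.0, p * 200))                 # Bonferroni over ~200 TFs
```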

  7. Why significant variables aren't automatically good predictors.

    PubMed

    Lo, Adeline; Chernoff, Herman; Zheng, Tian; Lo, Shaw-Hwa

    2015-11-10

    Thus far, genome-wide association studies (GWAS) have been disappointing in the inability of investigators to use the results of identified, statistically significant variants in complex diseases to make predictions useful for personalized medicine. Why are significant variables not leading to good prediction of outcomes? We point out that this problem is prevalent in simple as well as complex data, in the sciences as well as the social sciences. We offer a brief explanation and some statistical insights on why higher significance cannot automatically imply stronger predictivity and illustrate through simulations and a real breast cancer example. We also demonstrate that highly predictive variables do not necessarily appear as highly significant, thus evading the researcher using significance-based methods. We point out that what makes variables good for prediction versus significance depends on different properties of the underlying distributions. If prediction is the goal, we must lay aside significance as the only selection standard. We suggest that progress in prediction requires efforts toward a new research agenda of searching for a novel criterion to retrieve highly predictive variables rather than highly significant variables. We offer an alternative approach that was not designed for significance, the partition retention method, which was very effective predicting on a long-studied breast cancer data set, by reducing the classification error rate from 30% to 8%.
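
    The paper's central point is easy to reproduce in a self-contained simulation: with large samples, a tiny distributional shift yields an extreme p-value yet almost no classification accuracy. The numbers below are illustrative, not from the paper.

```python
# A variable can be extremely significant yet barely predictive.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50000
x0 = rng.normal(0.00, 1.0, n)               # controls
x1 = rng.normal(0.05, 1.0, n)               # cases: tiny mean shift
print(stats.ttest_ind(x1, x0).pvalue)       # ~1e-15: "highly significant"
# Best single-threshold classifier is still near chance (~51%):
acc = 0.5 * ((x1 > 0.025).mean() + (x0 <= 0.025).mean())
print(acc)
```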

  8. Functional genomics annotation of a statistical epistasis network associated with bladder cancer susceptibility.

    PubMed

    Hu, Ting; Pan, Qinxin; Andrew, Angeline S; Langer, Jillian M; Cole, Michael D; Tomlinson, Craig R; Karagas, Margaret R; Moore, Jason H

    2014-04-11

    Several different genetic and environmental factors have been identified as independent risk factors for bladder cancer in population-based studies. Recent studies have turned to understanding the role of gene-gene and gene-environment interactions in determining risk. We previously developed the bioinformatics framework of statistical epistasis networks (SEN) to characterize the global structure of interacting genetic factors associated with a particular disease or clinical outcome. By applying SEN to a population-based study of bladder cancer among Caucasians in New Hampshire, we were able to identify a set of connected genetic factors with strong and significant interaction effects on bladder cancer susceptibility. To support our statistical findings using networks, in the present study, we performed pathway enrichment analyses on the set of genes identified using SEN, and found that they are associated with the carcinogen benzo[a]pyrene, a component of tobacco smoke. We further carried out an mRNA expression microarray experiment to validate statistical genetic interactions, and to determine if the set of genes identified in the SEN were differentially expressed in a normal bladder cell line and a bladder cancer cell line in the presence or absence of benzo[a]pyrene. Significant nonrandom sets of genes from the SEN were found to be differentially expressed in response to benzo[a]pyrene in both the normal bladder cells and the bladder cancer cells. In addition, the patterns of gene expression were significantly different between these two cell types. The enrichment analyses and the gene expression microarray results support the idea that SEN analysis of bladder cancer in population-based studies is able to identify biologically meaningful statistical patterns. These results bring us a step closer to a systems genetic approach to understanding cancer susceptibility that integrates population and laboratory-based studies.

  9. Significant lexical relationships

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pedersen, T.; Kayaalp, M.; Bruce, R.

    Statistical NLP inevitably deals with a large number of rare events. As a consequence, NLP data often violates the assumptions implicit in traditional statistical procedures such as significance testing. We describe a significance test, an exact conditional test, that is appropriate for NLP data and can be performed using freely available software. We apply this test to the study of lexical relationships and demonstrate that the results obtained using this test are both theoretically more reliable and different from the results obtained using previously applied tests.
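
    For a bigram contingency table, an exact conditional test of the kind the authors recommend can be carried out with Fisher's exact test; the counts below are made up, and the paper's own test and software may differ in details.

```python
# Fisher's exact test on a bigram contingency table (made-up counts).
from scipy.stats import fisher_exact

#                  w2 = "tea"   w2 != "tea"
# w1 = "strong"        8            492
# w1 != "strong"      30         199470
odds, p = fisher_exact([[8, 492], [30, 199470]], alternative="greater")
print(odds, p)   # large odds ratio, small exact p-value
```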

  10. Teaching Biology through Statistics: Application of Statistical Methods in Genetics and Zoology Courses

    PubMed Central

    Colon-Berlingeri, Migdalisel; Burrowes, Patricia A.

    2011-01-01

    Incorporation of mathematics into biology curricula is critical to underscore for undergraduate students the relevance of mathematics to most fields of biology and the usefulness of developing quantitative process skills demanded in modern biology. At our institution, we have made significant changes to better integrate mathematics into the undergraduate biology curriculum. The curricular revision included changes in the suggested course sequence, addition of statistics and precalculus as prerequisites to core science courses, and incorporating interdisciplinary (math–biology) learning activities in genetics and zoology courses. In this article, we describe the activities developed for these two courses and the assessment tools used to measure the learning that took place with respect to biology and statistics. We distinguished the effectiveness of these learning opportunities in helping students improve their understanding of the math and statistical concepts addressed and, more importantly, their ability to apply them to solve a biological problem. We also identified areas that need emphasis in both biology and mathematics courses. In light of our observations, we recommend best practices that biology and mathematics academic departments can implement to train undergraduates for the demands of modern biology. PMID:21885822

  11. Teaching biology through statistics: application of statistical methods in genetics and zoology courses.

    PubMed

    Colon-Berlingeri, Migdalisel; Burrowes, Patricia A

    2011-01-01

    Incorporation of mathematics into biology curricula is critical to underscore for undergraduate students the relevance of mathematics to most fields of biology and the usefulness of developing quantitative process skills demanded in modern biology. At our institution, we have made significant changes to better integrate mathematics into the undergraduate biology curriculum. The curricular revision included changes in the suggested course sequence, addition of statistics and precalculus as prerequisites to core science courses, and incorporating interdisciplinary (math-biology) learning activities in genetics and zoology courses. In this article, we describe the activities developed for these two courses and the assessment tools used to measure the learning that took place with respect to biology and statistics. We distinguished the effectiveness of these learning opportunities in helping students improve their understanding of the math and statistical concepts addressed and, more importantly, their ability to apply them to solve a biological problem. We also identified areas that need emphasis in both biology and mathematics courses. In light of our observations, we recommend best practices that biology and mathematics academic departments can implement to train undergraduates for the demands of modern biology.

  12. Statistics, Uncertainty, and Transmitted Variation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wendelberger, Joanne Roth

    2014-11-05

    The field of Statistics provides methods for modeling and understanding data and making decisions in the presence of uncertainty. When examining response functions, variation present in the input variables will be transmitted via the response function to the output variables. This phenomenon can potentially have significant impacts on the uncertainty associated with results from subsequent analysis. This presentation will examine the concept of transmitted variation, its impact on designed experiments, and a method for identifying and estimating sources of transmitted variation in certain settings.
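
    A minimal Monte Carlo sketch of transmitted variation (the response function and input distributions below are invented for illustration; they are not from the presentation):

        import numpy as np

        rng = np.random.default_rng(0)

        def response(x1, x2):
            # illustrative response function, not from the presentation
            return 3.0 * x1 + x1 * x2

        # variation in the inputs is transmitted through the response function
        x1 = rng.normal(10.0, 0.5, 100_000)   # input 1: mean 10, sd 0.5
        x2 = rng.normal(2.0, 0.2, 100_000)    # input 2: mean 2, sd 0.2
        y = response(x1, x2)
        print(f"Monte Carlo sd of output: {y.std(ddof=1):.3f}")

        # first-order (delta-method) approximation for comparison:
        # Var(y) ~ (3 + mu2)^2 Var(x1) + mu1^2 Var(x2)
        approx_sd = np.sqrt((3 + 2.0) ** 2 * 0.5 ** 2 + 10.0 ** 2 * 0.2 ** 2)
        print(f"delta-method sd: {approx_sd:.3f}")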

  13. New Statistics for Testing Differential Expression of Pathways from Microarray Data

    NASA Astrophysics Data System (ADS)

    Siu, Hoicheong; Dong, Hua; Jin, Li; Xiong, Momiao

    Exploring biological meaning from microarray data is very important but remains a great challenge. Here, we developed three new statistics: a linear combination test, a quadratic test and a de-correlation test to identify differentially expressed pathways from gene expression profiles. We apply our statistics to two rheumatoid arthritis datasets. Notably, our results reveal three significant pathways and 275 genes in common across the two datasets. The pathways we found are meaningful for uncovering the disease mechanisms of rheumatoid arthritis, which implies that our statistics are a powerful tool in the functional analysis of gene expression data.
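
    The record does not give the test formulas; as a rough stand-in for a linear-combination-type statistic, the sketch below sums per-gene Welch t-statistics over a pathway and calibrates the sum by permuting sample labels (toy data throughout):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(1)

        def pathway_sum_t(expr_a, expr_b, pathway_genes):
            # sum of per-gene Welch t-statistics over the pathway
            t, _ = stats.ttest_ind(expr_a[pathway_genes], expr_b[pathway_genes],
                                   axis=1, equal_var=False)
            return t.sum()

        # toy data: 1000 genes x (12 cases, 12 controls); genes 0-19 form a pathway
        cases = rng.normal(0, 1, (1000, 12))
        controls = rng.normal(0, 1, (1000, 12))
        cases[:20] += 0.8                      # shift the pathway genes in cases
        pathway = np.arange(20)

        obs = pathway_sum_t(cases, controls, pathway)
        pooled = np.hstack([cases, controls])  # pool samples, then permute labels
        null = np.empty(2000)
        for i in range(2000):
            perm = rng.permutation(24)
            null[i] = pathway_sum_t(pooled[:, perm[:12]], pooled[:, perm[12:]], pathway)
        p = (np.abs(null) >= abs(obs)).mean()
        print(f"observed statistic = {obs:.1f}, permutation p = {p:.4f}")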

  14. Identifying hearing loss by means of iridology.

    PubMed

    Stearn, Natalie; Swanepoel, De Wet

    2006-11-13

    Isolated reports of hearing loss presenting as markings on the iris exist, but to date the effectiveness of iridology in identifying hearing loss has not been investigated. This study therefore aimed to determine the efficacy of iridological analysis in the identification of moderate to profound sensorineural hearing loss in adolescents. A controlled trial was conducted with an iridologist, blind to the actual hearing status of participants, analyzing the irises of participants with and without hearing loss. Fifty hearing-impaired and fifty normal-hearing subjects, between the ages of 15 and 19 years, controlled for gender, participated in the study. An experienced iridologist analyzed the randomised set of participants' irises. Iridological analysis correctly identified hearing status in 70% of cases, with a false negative rate of 41% and a false positive rate of 19%; the respective sensitivity and specificity rates therefore came to 59% and 81%. Iridological analysis of hearing status indicated a statistically significant relationship to actual hearing status (P < 0.05). Although statistically significant, the sensitivity and specificity rates for identifying hearing loss by iridology were not comparable to those of traditional audiological screening procedures.
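
    The reported rates can be checked against an approximate confusion-matrix reconstruction (counts rounded to whole participants, so the percentages are reproduced only approximately):

        from scipy.stats import chi2_contingency

        # 50 hearing-impaired and 50 normal-hearing participants
        tp, fn = 29, 21   # ~59% sensitivity among the hearing impaired
        tn, fp = 41, 9    # ~81% specificity (19% false positives) among normal hearing
        sensitivity = tp / (tp + fn)
        specificity = tn / (tn + fp)
        accuracy = (tp + tn) / (tp + fn + tn + fp)   # ~70% correct identification
        chi2, p, _, _ = chi2_contingency([[tp, fn], [fp, tn]])
        print(f"sens={sensitivity:.2f} spec={specificity:.2f} "
              f"acc={accuracy:.2f} chi-square p={p:.4f}")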

  15. Weighted Feature Significance: A Simple, Interpretable Model of Compound Toxicity Based on the Statistical Enrichment of Structural Features

    PubMed Central

    Huang, Ruili; Southall, Noel; Xia, Menghang; Cho, Ming-Hsuang; Jadhav, Ajit; Nguyen, Dac-Trung; Inglese, James; Tice, Raymond R.; Austin, Christopher P.

    2009-01-01

    In support of the U.S. Tox21 program, we have developed a simple and chemically intuitive model we call weighted feature significance (WFS) to predict the toxicological activity of compounds, based on the statistical enrichment of structural features in toxic compounds. We trained and tested the model on the following: (1) data from quantitative high-throughput screening cytotoxicity and caspase activation assays conducted at the National Institutes of Health Chemical Genomics Center, (2) data from Salmonella typhimurium reverse mutagenicity assays conducted by the U.S. National Toxicology Program, and (3) hepatotoxicity data published in the Registry of Toxic Effects of Chemical Substances. Enrichments of structural features in toxic compounds are evaluated for their statistical significance and compiled into a simple additive model of toxicity and then used to score new compounds for potential toxicity. The predictive power of the model for cytotoxicity was validated using an independent set of compounds from the U.S. Environmental Protection Agency tested also at the National Institutes of Health Chemical Genomics Center. We compared the performance of our WFS approach with classical classification methods such as Naive Bayesian clustering and support vector machines. In most test cases, WFS showed similar or slightly better predictive power, especially in the prediction of hepatotoxic compounds, where WFS appeared to have the best performance among the three methods. The new algorithm has the important advantages of simplicity, power, interpretability, and ease of implementation. PMID:19805409
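
    A schematic of the enrichment-then-additive-scoring idea (this is not the published WFS implementation; the feature names and counts are invented):

        import numpy as np
        from scipy.stats import fisher_exact

        def wfs_weights(feat_counts_toxic, feat_counts_nontoxic, n_toxic, n_nontoxic):
            # per-feature weights: -log10 p of one-sided Fisher enrichment tests
            weights = {}
            for feat in set(feat_counts_toxic) | set(feat_counts_nontoxic):
                a = feat_counts_toxic.get(feat, 0)       # toxic compounds with feature
                b = n_toxic - a
                c = feat_counts_nontoxic.get(feat, 0)    # non-toxic with feature
                d = n_nontoxic - c
                _, p = fisher_exact([[a, b], [c, d]], alternative="greater")
                if p < 0.05:                             # keep significant enrichments
                    weights[feat] = -np.log10(p)
            return weights

        def score_compound(compound_features, weights):
            # simple additive toxicity score over the compound's structural features
            return sum(weights.get(f, 0.0) for f in compound_features)

        # toy example with two hypothetical structural features
        w = wfs_weights({"nitro": 40, "phenol": 12}, {"nitro": 5, "phenol": 30}, 100, 400)
        print(w, score_compound({"nitro", "phenol"}, w))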

  16. Are there pollination syndromes in the Australian epacrids (Ericaceae: Styphelioideae)? A novel statistical method to identify key floral traits per syndrome

    PubMed Central

    Johnson, Karen A.

    2013-01-01

    Background and Aims Convergent floral traits hypothesized as attracting particular pollinators are known as pollination syndromes. Floral diversity suggests that the Australian epacrid flora may be adapted to pollinator type. Currently there are empirical data on the pollination systems for 87 species (approx. 15 % of Australian epacrids). This provides an opportunity to test for pollination syndromes and their important morphological traits in an iconic element of the Australian flora. Methods Data on epacrid–pollinator relationships were obtained from published literature and field observation. A multivariate approach was used to test whether epacrid floral attributes related to pollinator profiles. Statistical classification was then used to rank floral attributes according to their predictive value. Data sets excluding mixed pollination systems were used to test the predictive power of statistical classification to identify pollination models. Key Results Floral attributes are correlated with bird, fly and bee pollination. Using floral attributes identified as correlating with pollinator type, bird pollination is classified with 86 % accuracy, red flowers being the most important predictor. Fly and bee pollination are classified with 78 and 69 % accuracy, but have a lack of individually important floral predictors. Excluding mixed pollination systems improved the accuracy of the prediction of both bee and fly pollination systems. Conclusions Although most epacrids have generalized pollination systems, a correlation between bird pollination and red, long-tubed epacrids is found. Statistical classification highlights the relative importance of each floral attribute in relation to pollinator type and proves useful in classifying epacrids to bird, fly and bee pollination systems. PMID:23681546
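
    The record does not name the classifier here; a random forest is one statistical classifier that yields the kind of per-attribute importance ranking described (the trait encoding and data below are entirely hypothetical):

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(2)

        # hypothetical encoding: rows = species, columns = floral attributes
        n = 87
        X = np.column_stack([
            rng.integers(0, 2, n),        # red flower (0/1)
            rng.gamma(4.0, 3.0, n),       # corolla tube length (mm)
            rng.gamma(3.0, 2.0, n),       # flower width (mm)
            rng.integers(0, 2, n),        # scented (0/1)
        ])
        y = rng.choice(["bird", "bee", "fly"], n)   # pollinator profiles (toy labels)

        clf = RandomForestClassifier(n_estimators=500, random_state=0)
        print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
        clf.fit(X, y)
        for name, imp in zip(["red", "tube_len", "width", "scent"],
                             clf.feature_importances_):
            print(f"{name}: {imp:.2f}")   # ranks attributes by predictive value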

  17. Comparison of Four Views to Single-view Ultrasound Protocols to Identify Clinically Significant Pneumothorax.

    PubMed

    Helland, Gregg; Gaspari, Romolo; Licciardo, Samuel; Sanseverino, Alexandra; Torres, Ulises; Emhoff, Timothy; Blehar, David

    2016-10-01

    Ultrasound (US) has been shown to be effective at identifying a pneumothorax (PTX); however, the additional value of adding multiple views has not been studied. Single- and four-view protocols have both been described in the literature. The objective of this study was to compare the diagnostic accuracy of single-view versus four-view lung US to detect clinically significant PTX in trauma patients. This was a randomized, prospective trial on trauma patients. Adult patients with acute traumatic injury undergoing computed tomography (CT) scan of the chest were eligible for enrollment. Patients were randomized to a single view or four views of each hemithorax prior to any imaging. USs were performed and interpreted by credentialed physicians using a 7.5-MHz linear array transducer on a portable US machine, with digital clips recorded for later review. Attending radiologist interpretation of the chest CT was reviewed for presence or absence of PTX, with descriptions of small foci of air or minimal PTX categorized as clinically insignificant. A total of 260 patients were enrolled over a 2-year period. A total of 139 patients received a single view of each chest wall and 121 patients received four views. There were a total of 49 patients that had a PTX (19%), and 29 of these were clinically significant (11%). In diagnosis of any PTX, both single-view and four-view techniques showed poor sensitivity (54.2% and 68%) but high specificity (99% and 98%). For clinically significant PTX, single-view US demonstrated a sensitivity of 93% (95% confidence interval [CI] = 64.1% to 99.6%) and a specificity of 99.2% (95% CI = 95.5% to 99.9%), with a sensitivity of 93.3% (95% CI = 66% to 99.7%) and a specificity of 98% (95% CI = 92.1% to 99.7%) for four views. Single-view and four-view chest wall USs demonstrate comparable sensitivity and specificity for PTX. The additional time to obtain four views should be weighed against the absence of additional diagnostic yield over a single view when screening for clinically significant PTX.
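
    The wide confidence intervals reflect the small number of clinically significant PTX cases; an exact (Clopper-Pearson) interval of the kind reported can be computed as below (14 of 15 detected is an illustrative count consistent with ~93% sensitivity, not the study's actual cell count):

        from statsmodels.stats.proportion import proportion_confint

        detected, total = 14, 15
        sens = detected / total
        lo, hi = proportion_confint(detected, total, alpha=0.05, method="beta")
        print(f"sensitivity = {sens:.1%}, exact 95% CI = ({lo:.1%}, {hi:.1%})")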

  18. Identifying patients with hypertension: a case for auditing electronic health record data.

    PubMed

    Baus, Adam; Hendryx, Michael; Pollard, Cecil

    2012-01-01

    Problems in the structure, consistency, and completeness of electronic health record data are barriers to outcomes research, quality improvement, and practice redesign. This nonexperimental retrospective study examines the utility of importing de-identified electronic health record data into an external system to identify patients with and at risk for essential hypertension. We find a statistically significant increase in cases based on combined use of diagnostic and free-text coding (mean = 1,256.1, 95% CI 1,232.3-1,279.7) compared to diagnostic coding alone (mean = 1,174.5, 95% CI 1,150.5-1,198.3). While it is not surprising that significantly more patients are identified when broadening search criteria, the implications are critical for quality of care, the movement toward the National Committee for Quality Assurance's Patient-Centered Medical Home program, and meaningful use of electronic health records. Further, we find a statistically significant increase in potential cases based on the last two or more blood pressure readings greater than or equal to 140/90 mm Hg (mean = 1,353.9, 95% CI 1,329.9-1,377.9).

  19. COMPARATIVE DIVERSITY OF FECAL BACTERIA IN AGRICULTURALLY SIGNIFICANT ANIMALS TO IDENTIFY ALTERNATIVE TARGETS FOR MICROBIAL SOURCE TRACKING

    EPA Science Inventory

    Animals of agricultural significance contribute a large percentage of fecal pollution to waterways via runoff contamination. The premise of microbial source tracking is to utilize fecal bacteria to identify target populations which are directly correlated to specific animal feces...

  20. A Large-Scale Multi-ancestry Genome-wide Study Accounting for Smoking Behavior Identifies Multiple Significant Loci for Blood Pressure.

    PubMed

    Sung, Yun J; Winkler, Thomas W; de Las Fuentes, Lisa; Bentley, Amy R; Brown, Michael R; Kraja, Aldi T; Schwander, Karen; Ntalla, Ioanna; Guo, Xiuqing; Franceschini, Nora; Lu, Yingchang; Cheng, Ching-Yu; Sim, Xueling; Vojinovic, Dina; Marten, Jonathan; Musani, Solomon K; Li, Changwei; Feitosa, Mary F; Kilpeläinen, Tuomas O; Richard, Melissa A; Noordam, Raymond; Aslibekyan, Stella; Aschard, Hugues; Bartz, Traci M; Dorajoo, Rajkumar; Liu, Yongmei; Manning, Alisa K; Rankinen, Tuomo; Smith, Albert Vernon; Tajuddin, Salman M; Tayo, Bamidele O; Warren, Helen R; Zhao, Wei; Zhou, Yanhua; Matoba, Nana; Sofer, Tamar; Alver, Maris; Amini, Marzyeh; Boissel, Mathilde; Chai, Jin Fang; Chen, Xu; Divers, Jasmin; Gandin, Ilaria; Gao, Chuan; Giulianini, Franco; Goel, Anuj; Harris, Sarah E; Hartwig, Fernando Pires; Horimoto, Andrea R V R; Hsu, Fang-Chi; Jackson, Anne U; Kähönen, Mika; Kasturiratne, Anuradhani; Kühnel, Brigitte; Leander, Karin; Lee, Wen-Jane; Lin, Keng-Hung; 'an Luan, Jian; McKenzie, Colin A; Meian, He; Nelson, Christopher P; Rauramaa, Rainer; Schupf, Nicole; Scott, Robert A; Sheu, Wayne H H; Stančáková, Alena; Takeuchi, Fumihiko; van der Most, Peter J; Varga, Tibor V; Wang, Heming; Wang, Yajuan; Ware, Erin B; Weiss, Stefan; Wen, Wanqing; Yanek, Lisa R; Zhang, Weihua; Zhao, Jing Hua; Afaq, Saima; Alfred, Tamuno; Amin, Najaf; Arking, Dan; Aung, Tin; Barr, R Graham; Bielak, Lawrence F; Boerwinkle, Eric; Bottinger, Erwin P; Braund, Peter S; Brody, Jennifer A; Broeckel, Ulrich; Cabrera, Claudia P; Cade, Brian; Caizheng, Yu; Campbell, Archie; Canouil, Mickaël; Chakravarti, Aravinda; Chauhan, Ganesh; Christensen, Kaare; Cocca, Massimiliano; Collins, Francis S; Connell, John M; de Mutsert, Renée; de Silva, H Janaka; Debette, Stephanie; Dörr, Marcus; Duan, Qing; Eaton, Charles B; Ehret, Georg; Evangelou, Evangelos; Faul, Jessica D; Fisher, Virginia A; Forouhi, Nita G; Franco, Oscar H; Friedlander, Yechiel; Gao, He; Gigante, Bruna; Graff, Misa; Gu, C Charles; Gu, Dongfeng; Gupta, Preeti; Hagenaars, Saskia P; Harris, Tamara B; He, Jiang; Heikkinen, Sami; Heng, Chew-Kiat; Hirata, Makoto; Hofman, Albert; Howard, Barbara V; Hunt, Steven; Irvin, Marguerite R; Jia, Yucheng; Joehanes, Roby; Justice, Anne E; Katsuya, Tomohiro; Kaufman, Joel; Kerrison, Nicola D; Khor, Chiea Chuen; Koh, Woon-Puay; Koistinen, Heikki A; Komulainen, Pirjo; Kooperberg, Charles; Krieger, Jose E; Kubo, Michiaki; Kuusisto, Johanna; Langefeld, Carl D; Langenberg, Claudia; Launer, Lenore J; Lehne, Benjamin; Lewis, Cora E; Li, Yize; Lim, Sing Hui; Lin, Shiow; Liu, Ching-Ti; Liu, Jianjun; Liu, Jingmin; Liu, Kiang; Liu, Yeheng; Loh, Marie; Lohman, Kurt K; Long, Jirong; Louie, Tin; Mägi, Reedik; Mahajan, Anubha; Meitinger, Thomas; Metspalu, Andres; Milani, Lili; Momozawa, Yukihide; Morris, Andrew P; Mosley, Thomas H; Munson, Peter; Murray, Alison D; Nalls, Mike A; Nasri, Ubaydah; Norris, Jill M; North, Kari; Ogunniyi, Adesola; Padmanabhan, Sandosh; Palmas, Walter R; Palmer, Nicholette D; Pankow, James S; Pedersen, Nancy L; Peters, Annette; Peyser, Patricia A; Polasek, Ozren; Raitakari, Olli T; Renström, Frida; Rice, Treva K; Ridker, Paul M; Robino, Antonietta; Robinson, Jennifer G; Rose, Lynda M; Rudan, Igor; Sabanayagam, Charumathi; Salako, Babatunde L; Sandow, Kevin; Schmidt, Carsten O; Schreiner, Pamela J; Scott, William R; Seshadri, Sudha; Sever, Peter; Sitlani, Colleen M; Smith, Jennifer A; Snieder, Harold; Starr, John M; Strauch, Konstantin; Tang, Hua; Taylor, Kent D; Teo, Yik Ying; Tham, Yih Chung; Uitterlinden, André G; 
Waldenberger, Melanie; Wang, Lihua; Wang, Ya X; Wei, Wen Bin; Williams, Christine; Wilson, Gregory; Wojczynski, Mary K; Yao, Jie; Yuan, Jian-Min; Zonderman, Alan B; Becker, Diane M; Boehnke, Michael; Bowden, Donald W; Chambers, John C; Chen, Yii-Der Ida; de Faire, Ulf; Deary, Ian J; Esko, Tõnu; Farrall, Martin; Forrester, Terrence; Franks, Paul W; Freedman, Barry I; Froguel, Philippe; Gasparini, Paolo; Gieger, Christian; Horta, Bernardo Lessa; Hung, Yi-Jen; Jonas, Jost B; Kato, Norihiro; Kooner, Jaspal S; Laakso, Markku; Lehtimäki, Terho; Liang, Kae-Woei; Magnusson, Patrik K E; Newman, Anne B; Oldehinkel, Albertine J; Pereira, Alexandre C; Redline, Susan; Rettig, Rainer; Samani, Nilesh J; Scott, James; Shu, Xiao-Ou; van der Harst, Pim; Wagenknecht, Lynne E; Wareham, Nicholas J; Watkins, Hugh; Weir, David R; Wickremasinghe, Ananda R; Wu, Tangchun; Zheng, Wei; Kamatani, Yoichiro; Laurie, Cathy C; Bouchard, Claude; Cooper, Richard S; Evans, Michele K; Gudnason, Vilmundur; Kardia, Sharon L R; Kritchevsky, Stephen B; Levy, Daniel; O'Connell, Jeff R; Psaty, Bruce M; van Dam, Rob M; Sims, Mario; Arnett, Donna K; Mook-Kanamori, Dennis O; Kelly, Tanika N; Fox, Ervin R; Hayward, Caroline; Fornage, Myriam; Rotimi, Charles N; Province, Michael A; van Duijn, Cornelia M; Tai, E Shyong; Wong, Tien Yin; Loos, Ruth J F; Reiner, Alex P; Rotter, Jerome I; Zhu, Xiaofeng; Bierut, Laura J; Gauderman, W James; Caulfield, Mark J; Elliott, Paul; Rice, Kenneth; Munroe, Patricia B; Morrison, Alanna C; Cupples, L Adrienne; Rao, Dabeeru C; Chasman, Daniel I

    2018-03-01

    Genome-wide association analysis advanced understanding of blood pressure (BP), a major risk factor for vascular conditions such as coronary heart disease and stroke. Accounting for smoking behavior may help identify BP loci and extend our knowledge of its genetic architecture. We performed genome-wide association meta-analyses of systolic and diastolic BP incorporating gene-smoking interactions in 610,091 individuals. Stage 1 analysis examined ∼18.8 million SNPs and small insertion/deletion variants in 129,913 individuals from four ancestries (European, African, Asian, and Hispanic) with follow-up analysis of promising variants in 480,178 additional individuals from five ancestries. We identified 15 loci that were genome-wide significant (p < 5 × 10^-8) in stage 1 and formally replicated in stage 2. A combined stage 1 and 2 meta-analysis identified 66 additional genome-wide significant loci (13, 35, and 18 loci in European, African, and trans-ancestry, respectively). A total of 56 known BP loci were also identified by our results (p < 5 × 10^-8). Of the newly identified loci, ten showed significant interaction with smoking status, but none of them were replicated in stage 2. Several loci were identified in African ancestry, highlighting the importance of genetic studies in diverse populations. The identified loci show strong evidence for regulatory features and support shared pathophysiology with cardiometabolic and addiction traits. They also highlight a role in BP regulation for biological candidates such as modulators of vascular structure and function (CDKN1B, BCAR1-CFDP1, PXDN, EEA1), ciliopathies (SDCCAG8, RPGRIP1L), telomere maintenance (TNKS, PINX1, AKTIP), and central dopaminergic signaling (MSRA, EBF2). Copyright © 2018 American Society of Human Genetics. All rights reserved.

  1. Perspectives on statistics education: observations from statistical consulting in an academic nursing environment.

    PubMed

    Hayat, Matthew J; Schmiege, Sarah J; Cook, Paul F

    2014-04-01

    Statistics knowledge is essential for understanding the nursing and health care literature, as well as for applying rigorous science in nursing research. Statistical consultants providing services to faculty and students in an academic nursing program have the opportunity to identify gaps and challenges in statistics education for nursing students. This information may be useful to curriculum committees and statistics educators. This article aims to provide perspective on statistics education stemming from the experiences of three experienced statistics educators who regularly collaborate and consult with nurse investigators. The authors share their knowledge and express their views about data management, data screening and manipulation, statistical software, types of scientific investigation, and advanced statistical topics not covered in the usual coursework. The suggestions provided promote a call for data to study these topics. Relevant data about statistics education can assist educators in developing comprehensive statistics coursework for nursing students. Copyright 2014, SLACK Incorporated.

  2. Calculating stage duration statistics in multistage diseases.

    PubMed

    Komarova, Natalia L; Thalhauser, Craig J

    2011-01-01

    Many human diseases are characterized by multiple stages of progression. While the typical sequence of disease progression can be identified, there may be large individual variations among patients. Identifying mean stage durations and their variations is critical for the statistical hypothesis testing needed to determine if treatment is having a significant effect on the progression, or if a new therapy is showing a delay of progression through a multistage disease. In this paper we focus on two methods for extracting stage duration statistics from longitudinal datasets: an extension of the linear regression technique, and a counting algorithm. Both are non-iterative, non-parametric and computationally cheap methods, which makes them invaluable tools for studying the epidemiology of diseases, with a goal of identifying different patterns of progression by using bioinformatics methodologies. Here we show that the regression method performs well for calculating the mean stage durations under a wide variety of assumptions; however, its generalization to variance calculations fails under realistic assumptions about the data collection procedure. On the other hand, the counting method yields reliable estimations for both means and variances of stage durations. Applications to Alzheimer disease progression are discussed.
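
    A minimal sketch of a counting-style estimator (a simplification: the paper's algorithm handles censoring and irregular sampling more carefully):

        import numpy as np

        def stage_durations(patients):
            # duration in a stage = span between first visits at consecutive stages
            durations = {}
            for times, stages in patients:          # one (times, stages) pair per patient
                first_seen = {}
                for t, s in zip(times, stages):
                    first_seen.setdefault(s, t)
                ordered = sorted(first_seen.items())
                for (s, t0), (_, t1) in zip(ordered, ordered[1:]):
                    durations.setdefault(s, []).append(t1 - t0)
            return {s: (np.mean(d), np.var(d, ddof=1)) for s, d in durations.items()}

        # toy longitudinal data: (visit times in months, stage observed at each visit)
        patients = [([0, 6, 12, 24], [1, 1, 2, 3]),
                    ([0, 12, 18, 30], [1, 2, 2, 3]),
                    ([0, 6, 18, 36], [1, 2, 3, 3])]
        print(stage_durations(patients))   # {stage: (mean duration, variance)}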

  3. 28 CFR 22.25 - Final disposition of identifiable materials.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... RESEARCH AND STATISTICAL INFORMATION § 22.25 Final disposition of identifiable materials. Upon completion of a research or statistical project the security of identifiable research or statistical information...

  4. 28 CFR 22.25 - Final disposition of identifiable materials.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... RESEARCH AND STATISTICAL INFORMATION § 22.25 Final disposition of identifiable materials. Upon completion of a research or statistical project the security of identifiable research or statistical information...

  5. The Development of Statistical Models for Predicting Surgical Site Infections in Japan: Toward a Statistical Model-Based Standardized Infection Ratio.

    PubMed

    Fukuda, Haruhisa; Kuroki, Manabu

    2016-03-01

    To develop and internally validate a surgical site infection (SSI) prediction model for Japan. Retrospective observational cohort study. We analyzed surveillance data submitted to the Japan Nosocomial Infections Surveillance system for patients who had undergone target surgical procedures from January 1, 2010, through December 31, 2012. Logistic regression analyses were used to develop statistical models for predicting SSIs. An SSI prediction model was constructed for each of the procedure categories by statistically selecting the appropriate risk factors from among the collected surveillance data and determining their optimal categorization. Standard bootstrapping techniques were applied to assess potential overfitting. The C-index was used to compare the predictive performances of the new statistical models with those of models based on conventional risk index variables. The study sample comprised 349,987 cases from 428 participant hospitals throughout Japan, and the overall SSI incidence was 7.0%. The C-indices of the new statistical models were significantly higher than those of the conventional risk index models in 21 (67.7%) of the 31 procedure categories (P<.05). No significant overfitting was detected. Japan-specific SSI prediction models were shown to generally have higher accuracy than conventional risk index models. These new models may have applications in assessing hospital performance and identifying high-risk patients in specific procedure categories.
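
    A compact sketch of the modelling loop described (fit a logistic model, compute the C-index, and assess overfitting via bootstrap optimism); the predictors, coefficients, and data are invented, and sklearn stands in for whatever software the authors used:

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import roc_auc_score

        rng = np.random.default_rng(3)

        # toy surveillance data: [operation duration (min), ASA score, wound class]
        n = 5000
        X = np.column_stack([rng.normal(120, 40, n),
                             rng.integers(1, 5, n),
                             rng.integers(1, 4, n)])
        logit = -5 + 0.01 * X[:, 0] + 0.4 * X[:, 1]        # hypothetical true risk
        y = rng.random(n) < 1 / (1 + np.exp(-logit))       # roughly 7% SSI incidence

        model = LogisticRegression(max_iter=1000).fit(X, y)
        apparent_c = roc_auc_score(y, model.predict_proba(X)[:, 1])

        optimism = []                                      # bootstrap optimism correction
        for _ in range(200):
            idx = rng.integers(0, n, n)
            m = LogisticRegression(max_iter=1000).fit(X[idx], y[idx])
            boot_c = roc_auc_score(y[idx], m.predict_proba(X[idx])[:, 1])
            test_c = roc_auc_score(y, m.predict_proba(X)[:, 1])
            optimism.append(boot_c - test_c)
        print(f"apparent C = {apparent_c:.3f}, "
              f"optimism-corrected C = {apparent_c - np.mean(optimism):.3f}")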

  6. Kernel Density Surface Modelling as a Means to Identify Significant Concentrations of Vulnerable Marine Ecosystem Indicators

    PubMed Central

    Kenchington, Ellen; Murillo, Francisco Javier; Lirette, Camille; Sacau, Mar; Koen-Alonso, Mariano; Kenny, Andrew; Ollerhead, Neil; Wareham, Vonda; Beazley, Lindsay

    2014-01-01

    The United Nations General Assembly Resolution 61/105, concerning sustainable fisheries in the marine ecosystem, calls for the protection of vulnerable marine ecosystems (VME) from destructive fishing practices. Subsequently, the Food and Agriculture Organization (FAO) produced guidelines for identification of VME indicator species/taxa to assist in the implementation of the resolution, but recommended the development of case-specific operational definitions for their application. We applied kernel density estimation (KDE) to research vessel trawl survey data from inside the fishing footprint of the Northwest Atlantic Fisheries Organization (NAFO) Regulatory Area in the high seas of the northwest Atlantic to create biomass density surfaces for four VME indicator taxa: large-sized sponges, sea pens, small and large gorgonian corals. These VME indicator taxa were identified previously by NAFO using the fragility, life history characteristics and structural complexity criteria presented by FAO, along with an evaluation of their recovery trajectories. KDE, a non-parametric neighbour-based smoothing function, has been used previously in ecology to identify hotspots, that is, areas of relatively high biomass/abundance. We present a novel approach of examining relative changes in area under polygons created from encircling successive biomass categories on the KDE surface to identify "significant concentrations" of biomass, which we equate to VMEs. This allows identification of the VMEs from the broader distribution of the species in the study area. We provide independent assessments of the VMEs so identified using underwater images, benthic sampling with other gear types (dredges, cores), and/or published species distribution models of probability of occurrence, as available. For each VME indicator taxon we provide a brief review of their ecological function which will be important in future assessments of significant adverse impact on these habitats here and elsewhere.

  7. Kernel density surface modelling as a means to identify significant concentrations of vulnerable marine ecosystem indicators.

    PubMed

    Kenchington, Ellen; Murillo, Francisco Javier; Lirette, Camille; Sacau, Mar; Koen-Alonso, Mariano; Kenny, Andrew; Ollerhead, Neil; Wareham, Vonda; Beazley, Lindsay

    2014-01-01

    The United Nations General Assembly Resolution 61/105, concerning sustainable fisheries in the marine ecosystem, calls for the protection of vulnerable marine ecosystems (VME) from destructive fishing practices. Subsequently, the Food and Agriculture Organization (FAO) produced guidelines for identification of VME indicator species/taxa to assist in the implementation of the resolution, but recommended the development of case-specific operational definitions for their application. We applied kernel density estimation (KDE) to research vessel trawl survey data from inside the fishing footprint of the Northwest Atlantic Fisheries Organization (NAFO) Regulatory Area in the high seas of the northwest Atlantic to create biomass density surfaces for four VME indicator taxa: large-sized sponges, sea pens, small and large gorgonian corals. These VME indicator taxa were identified previously by NAFO using the fragility, life history characteristics and structural complexity criteria presented by FAO, along with an evaluation of their recovery trajectories. KDE, a non-parametric neighbour-based smoothing function, has been used previously in ecology to identify hotspots, that is, areas of relatively high biomass/abundance. We present a novel approach of examining relative changes in area under polygons created from encircling successive biomass categories on the KDE surface to identify "significant concentrations" of biomass, which we equate to VMEs. This allows identification of the VMEs from the broader distribution of the species in the study area. We provide independent assessments of the VMEs so identified using underwater images, benthic sampling with other gear types (dredges, cores), and/or published species distribution models of probability of occurrence, as available. For each VME indicator taxon we provide a brief review of their ecological function which will be important in future assessments of significant adverse impact on these habitats here and elsewhere.
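
    A minimal sketch of the KDE step and the area-under-successive-thresholds idea (station locations and biomass values are simulated; the authors' polygon analysis on real survey data is more involved):

        import numpy as np
        from scipy.stats import gaussian_kde

        rng = np.random.default_rng(4)

        # toy trawl stations: positions (km) with, e.g., sponge biomass (kg) as weights
        xy = np.vstack([rng.uniform(0, 100, 300), rng.uniform(0, 100, 300)])
        biomass = rng.lognormal(0, 1.5, 300)

        kde = gaussian_kde(xy, weights=biomass)     # biomass-weighted density surface
        gx, gy = np.mgrid[0:100:200j, 0:100:200j]
        surface = kde(np.vstack([gx.ravel(), gy.ravel()])).reshape(gx.shape)

        # area enclosed above successive density thresholds; a sharp break in how
        # the area shrinks is the kind of signal used to delimit "significant
        # concentrations" relative to the broader distribution
        cell_area = (100 / 200) ** 2                # km^2 per grid cell
        for q in (0.5, 0.6, 0.7, 0.8, 0.9):
            level = np.quantile(surface, q)
            area = (surface >= level).sum() * cell_area
            print(f"area above {q:.0%} quantile: {area:.0f} km^2")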

  8. Identifiability of PBPK Models with Applications to ...

    EPA Pesticide Factsheets

    Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss different types of identifiability that occur in PBPK models and give reasons why they occur. We particularly focus on how the mathematical structure of a PBPK model and lack of appropriate data can lead to statistical models in which it is impossible to estimate at least some parameters precisely. Methods are reviewed which can determine whether a purely linear PBPK model is globally identifiable. We propose a theorem which determines when identifiability at a set of finite and specific values of the mathematical PBPK model (global discrete identifiability) implies identifiability of the statistical model. However, we are unable to establish conditions that imply global discrete identifiability, and conclude that the only safe approach to analysis of PBPK models involves Bayesian analysis with truncated priors. Finally, computational issues regarding posterior simulations of PBPK models are discussed. The methodology is very general and can be applied to numerous PBPK models which can be expressed as linear time-invariant systems. A real data set of a PBPK model for exposure to dimethyl arsinic acid (DMA(V)) is presented to illustrate the proposed methodology.

  9. 28 CFR 22.22 - Revelation of identifiable data.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... STATISTICAL INFORMATION § 22.22 Revelation of identifiable data. (a) Except as noted in paragraph (b) of this section, research and statistical information relating to a private person may be revealed in identifiable... Act. (3) Persons or organizations for research or statistical purposes. Information may only be...

  10. Time series, periodograms, and significance

    NASA Astrophysics Data System (ADS)

    Hernandez, G.

    1999-05-01

    The geophysical literature shows a wide and conflicting usage of methods employed to extract meaningful information on coherent oscillations from measurements. This makes it difficult, if not impossible, to relate the findings reported by different authors. Therefore, we have undertaken a critical investigation of the tests and methodology used for determining the presence of statistically significant coherent oscillations in periodograms derived from time series. Statistical significance tests are only valid when performed on the independent frequencies present in a measurement. Both the number of possible independent frequencies in a periodogram and the significance tests are determined by the number of degrees of freedom, which is the number of true independent measurements present in the time series, rather than the number of sample points in the measurement. The number of degrees of freedom is an intrinsic property of the data, and it must be determined from the serial coherence of the time series. As part of this investigation, a detailed study has been performed which clearly illustrates the deleterious effects that the apparently innocent and commonly used processes of filtering, de-trending, and tapering of data have on periodogram analysis and the consequent difficulties in the interpretation of the statistical significance thus derived. For the sake of clarity, a specific example of actual field measurements containing unevenly-spaced measurements, gaps, etc., as well as synthetic examples, have been used to illustrate the periodogram approach, and pitfalls, leading to the (statistical) significance tests for the presence of coherent oscillations. Among the insights of this investigation are: (1) the concept of a time series being (statistically) band limited by its own serial coherence and thus having a critical sampling rate which defines one of the necessary requirements for the proper statistical design of an experiment; (2) the design of a critical
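
    For unevenly spaced series like those discussed, the Lomb-Scargle periodogram with a false-alarm probability is one standard way to attach significance to a peak; a sketch on synthetic data using astropy:

        import numpy as np
        from astropy.timeseries import LombScargle

        rng = np.random.default_rng(5)

        # unevenly sampled series: one coherent oscillation (period 7.3) plus noise
        t = np.sort(rng.uniform(0, 100, 120))
        y = 0.5 * np.sin(2 * np.pi * t / 7.3) + rng.normal(0, 1, 120)

        ls = LombScargle(t, y)
        freq, power = ls.autopower()
        best = power.argmax()
        # note: filtering, de-trending, or tapering the data beforehand would
        # alter the effective degrees of freedom and bias this significance level
        print(f"best period = {1 / freq[best]:.2f}")
        print(f"false-alarm probability = {ls.false_alarm_probability(power[best]):.3g}")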

  11. Statistics in the pharmacy literature.

    PubMed

    Lee, Charlene M; Soin, Herpreet K; Einarson, Thomas R

    2004-09-01

    Research in statistical methods is essential for maintenance of high quality of the published literature. To update previous reports of the types and frequencies of statistical terms and procedures in research studies of selected professional pharmacy journals. We obtained all research articles published in 2001 in 6 journals: American Journal of Health-System Pharmacy, The Annals of Pharmacotherapy, Canadian Journal of Hospital Pharmacy, Formulary, Hospital Pharmacy, and Journal of the American Pharmaceutical Association. Two independent reviewers identified and recorded descriptive and inferential statistical terms/procedures found in the methods, results, and discussion sections of each article. Results were determined by tallying the total number of times, as well as the percentage, that each statistical term or procedure appeared in the articles. One hundred forty-four articles were included. Ninety-eight percent employed descriptive statistics; of these, 28% used only descriptive statistics. The most common descriptive statistical terms were percentage (90%), mean (74%), standard deviation (58%), and range (46%). Sixty-nine percent of the articles used inferential statistics, the most frequent being chi-square (33%), Student's t-test (26%), Pearson's correlation coefficient r (18%), ANOVA (14%), and logistic regression (11%). Statistical terms and procedures were found in nearly all of the research articles published in pharmacy journals. Thus, pharmacy education should aim to provide current and future pharmacists with an understanding of the common statistical terms and procedures identified to facilitate the appropriate appraisal and consequential utilization of the information available in research articles.

  12. Identifying individuality and variability in team tactics by means of statistical shape analysis and multilayer perceptrons.

    PubMed

    Jäger, Jörg M; Schöllhorn, Wolfgang I

    2012-04-01

    Offensive and defensive systems of play represent important aspects of team sports. They include the players' positions at certain situations during a match, i.e., when players have to be on specific positions on the court. Patterns of play emerge based on the formations of the players on the court. Recognition of these patterns is important to react adequately and to adjust own strategies to the opponent. Furthermore, the ability to apply variable patterns of play seems to be promising since they make it harder for the opponent to adjust. The purpose of this study is to identify different team tactical patterns in volleyball and to analyze differences in variability. Overall 120 standard situations of six national teams in women's volleyball are analyzed during a world championship tournament. Twenty situations from each national team are chosen, including the base defence position (start configuration) and the two players block with middle back deep (end configuration). The shapes of the defence formations at the start and end configurations during the defence of each national team as well as the variability of these defence formations are statistically analyzed. Furthermore these shapes data are used to train multilayer perceptrons in order to test whether artificial neural networks can recognize the teams by their tactical patterns. Results show significant differences between the national teams in both the base defence position at the start and the two players block with middle back deep at the end of the standard defence situation. Furthermore, the national teams show significant differences in variability of the defence systems and start-positions are more variable than the end-positions. Multilayer perceptrons are able to recognize the teams at an average of 98.5%. It is concluded that defence systems in team sports are highly individual at a competitive level and variable even in standard situations. Artificial neural networks can be used to recognize such team-specific tactical patterns.
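
    A toy version of the recognition step: train a multilayer perceptron on flattened formation coordinates and estimate recognition accuracy by cross-validation (the coordinate encoding and data are invented, not the study's shape-analysis features):

        import numpy as np
        from sklearn.neural_network import MLPClassifier
        from sklearn.model_selection import cross_val_score

        rng = np.random.default_rng(6)

        # 6 players x (x, y) court coordinates, flattened; 20 defence situations
        # per team for 6 teams, each a noisy copy of a team-specific formation
        X, y = [], []
        for team in range(6):
            base = rng.uniform(1, 8, 12)                 # team-specific base formation
            for _ in range(20):
                X.append(base + rng.normal(0, 0.4, 12))  # within-team variability
                y.append(team)
        X, y = np.array(X), np.array(y)

        mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
        print(f"recognition accuracy: {cross_val_score(mlp, X, y, cv=5).mean():.1%}")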

  13. Factors related to student performance in statistics courses in Lebanon

    NASA Astrophysics Data System (ADS)

    Naccache, Hiba Salim

    The purpose of the present study was to identify factors that may contribute to business students in Lebanese universities having difficulty in introductory and advanced statistics courses. Two statistics courses are required for business majors at Lebanese universities. Students are not obliged to be enrolled in any math courses prior to taking statistics courses. Drawing on recent educational research, this dissertation attempted to identify the relationship between (1) students' scores on Lebanese university math admissions tests; (2) students' scores on a test of very basic mathematical concepts; (3) students' scores on the Survey of Attitudes Toward Statistics (SATS); (4) course performance as measured by students' final scores in the course; and (5) their scores on the final exam. Data were collected from 561 students enrolled in multiple sections of two courses: 307 students in the introductory statistics course and 260 in the advanced statistics course in seven campuses across Lebanon over one semester. The multiple regression results revealed four significant relationships at the introductory level: between students' scores on the math quiz and (1) their final exam scores and (2) their final averages, and between the Cognitive subscale of the SATS and (3) their final exam scores and (4) their final averages. These four significant relationships were also found at the advanced level. In addition, two more significant relationships were found between students' final averages and the Effort (5) and Affect (6) subscales. No relationship was found between students' scores on the Lebanese admissions math tests and either their final exam scores or their final averages, in either the introductory or the advanced course. Although these results were consistent across course formats and instructors, they may encourage Lebanese universities

  14. Statistical Literacy as a Function of Online versus Hybrid Course Delivery Format for an Introductory Graduate Statistics Course

    ERIC Educational Resources Information Center

    Hahs-Vaughn, Debbie L.; Acquaye, Hannah; Griffith, Matthew D.; Jo, Hang; Matthews, Ken; Acharya, Parul

    2017-01-01

    Statistical literacy refers to understanding fundamental statistical concepts. Assessment of statistical literacy can take the forms of tasks that require students to identify, translate, compute, read, and interpret data. In addition, statistical instruction can take many forms encompassing course delivery format such as face-to-face, hybrid,…

  15. Identifying Patients with Hypertension: A Case for Auditing Electronic Health Record Data

    PubMed Central

    Baus, Adam; Hendryx, Michael; Pollard, Cecil

    2012-01-01

    Problems in the structure, consistency, and completeness of electronic health record data are barriers to outcomes research, quality improvement, and practice redesign. This nonexperimental retrospective study examines the utility of importing de-identified electronic health record data into an external system to identify patients with and at risk for essential hypertension. We find a statistically significant increase in cases based on combined use of diagnostic and free-text coding (mean = 1,256.1, 95% CI 1,232.3-1,279.7) compared to diagnostic coding alone (mean = 1,174.5, 95% CI 1,150.5-1,198.3). While it is not surprising that significantly more patients are identified when broadening search criteria, the implications are critical for quality of care, the movement toward the National Committee for Quality Assurance's Patient-Centered Medical Home program, and meaningful use of electronic health records. Further, we find a statistically significant increase in potential cases based on the last two or more blood pressure readings greater than or equal to 140/90 mm Hg (mean = 1,353.9, 95% CI 1,329.9-1,377.9). PMID:22737097

  16. Statistical assessment of the learning curves of health technologies.

    PubMed

    Ramsay, C R; Grant, A M; Wallace, S A; Garthwaite, P H; Monk, A F; Russell, I T

    2001-01-01

    The second dataset was a case series of consecutive laparoscopic cholecystectomy procedures performed by ten surgeons; the third was randomised trial data derived from the laparoscopic procedure arm of a multicentre trial of groin hernia repair, supplemented by data from non-randomised operations performed during the trial. RESULTS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: Of 4571 abstracts identified, 272 (6%) were later included in the study after review of the full paper. Some 51% of studies assessed a surgical minimal access technique and 95% were case series. The statistical method used most often (60%) was splitting the data into consecutive parts (such as halves or thirds), with only 14% attempting a more formal statistical analysis. The reporting of the studies was poor, with 31% giving no details of data collection methods. RESULTS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: Of 9431 abstracts assessed, 115 (1%) were deemed appropriate for further investigation and, of these, 18 were included in the study. All of the methods for complex data sets were identified in the non-clinical literature. These were discriminant analysis, two-stage estimation of learning rates, generalised estimating equations, multilevel models, latent curve models, time series models and stochastic parameter models. In addition, eight new shapes of learning curves were identified. RESULTS - TESTING OF STATISTICAL METHODS: No one particular shape of learning curve performed significantly better than another. The performance of 'operation time' as a proxy for learning differed between the three procedures. Multilevel modelling using the laparoscopic cholecystectomy data demonstrated and measured surgeon-specific and confounding effects. The inclusion of non-randomised cases, despite the possible limitations of the method, enhanced the interpretation of learning effects. CONCLUSIONS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: The statistical methods used for assessing learning effects

  17. Statistical significance approximation in local trend analysis of high-throughput time-series data using the theory of Markov chains.

    PubMed

    Xia, Li C; Ai, Dongmei; Cram, Jacob A; Liang, Xiaoyi; Fuhrman, Jed A; Sun, Fengzhu

    2015-09-21

    Local trend (i.e. shape) analysis of time series data reveals co-changing patterns in dynamics of biological systems. However, slow permutation procedures to evaluate the statistical significance of local trend scores have limited its applications to high-throughput time series data analysis, e.g., data from studies based on next-generation sequencing technology. By extending the theories for the tail probability of the range of sum of Markovian random variables, we propose formulae for approximating the statistical significance of local trend scores. Using simulations and real data, we show that the approximate p-value is close to that obtained using a large number of permutations (starting at time points >20 with no delay and >30 with delay of at most three time steps) in that the non-zero decimals of the p-values obtained by the approximation and the permutations are mostly the same when the approximate p-value is less than 0.05. In addition, the approximate p-value is slightly larger than that based on permutations, making hypothesis testing based on the approximate p-value conservative. The approximation enables efficient calculation of p-values for pairwise local trend analysis, making large-scale all-versus-all comparisons possible. We also propose a hybrid approach by integrating the approximation and permutations to obtain accurate p-values for significantly associated pairs. We further demonstrate its use with the analysis of the Plymouth Marine Laboratory (PML) microbial community time series from high-throughput sequencing data and found interesting organism co-occurrence dynamic patterns. The software tool is integrated into the eLSA software package that now provides accelerated local trend and similarity analysis pipelines for time series data. The package is freely available from the eLSA website: http://bitbucket.org/charade/elsa.
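
    A much-simplified sketch of the idea, using the slow permutation p-value that the paper's approximation replaces (the real eLSA statistic allows delays and uses a normalized local-alignment score, not a longest matching run):

        import numpy as np

        def trend(series):
            # map a series to its local trend (shape) sequence: -1, 0, +1
            return np.sign(np.diff(series))

        def local_trend_score(x, y):
            # longest run of time points where the two series change the same way
            match = trend(x) == trend(y)
            best = run = 0
            for m in match:
                run = run + 1 if m else 0
                best = max(best, run)
            return best

        rng = np.random.default_rng(7)
        x = np.cumsum(rng.normal(size=40))
        y = x + rng.normal(0, 0.5, 40)            # co-varying series
        obs = local_trend_score(x, y)
        null = [local_trend_score(rng.permutation(x), y) for _ in range(5000)]
        p = (np.asarray(null) >= obs).mean()
        print(f"score = {obs}, permutation p = {p:.4f}")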

  18. Detection of Clostridium difficile infection clusters, using the temporal scan statistic, in a community hospital in southern Ontario, Canada, 2006-2011.

    PubMed

    Faires, Meredith C; Pearl, David L; Ciccotelli, William A; Berke, Olaf; Reid-Smith, Richard J; Weese, J Scott

    2014-05-12

    In hospitals, Clostridium difficile infection (CDI) surveillance relies on unvalidated guidelines or threshold criteria to identify outbreaks. This can result in false-positive and -negative cluster alarms. The application of statistical methods to identify and understand CDI clusters may be a useful alternative or complement to standard surveillance techniques. The objectives of this study were to investigate the utility of the temporal scan statistic for detecting CDI clusters and determine if there are significant differences in the rate of CDI cases by month, season, and year in a community hospital. Bacteriology reports of patients identified with a CDI from August 2006 to February 2011 were collected. For patients detected with CDI from March 2010 to February 2011, stool specimens were obtained. Clostridium difficile isolates were characterized by ribotyping and investigated for the presence of toxin genes by PCR. CDI clusters were investigated using a retrospective temporal scan test statistic. Statistically significant clusters were compared to known CDI outbreaks within the hospital. A negative binomial regression model was used to identify associations between year, season, month and the rate of CDI cases. Overall, 86 CDI cases were identified. Eighteen specimens were analyzed and nine ribotypes were classified, with ribotype 027 (n = 6) the most prevalent. The temporal scan statistic identified significant CDI clusters at the hospital (n = 5), service (n = 6), and ward (n = 4) levels (P ≤ 0.05). Three clusters were concordant with the one C. difficile outbreak identified by hospital personnel. Two clusters were identified as potential outbreaks. The negative binomial model indicated years 2007-2010 (P ≤ 0.05) had decreased CDI rates compared to 2006 and spring had an increased CDI rate compared to the fall (P = 0.023). Application of the temporal scan statistic identified several clusters, including potential outbreaks not detected by hospital personnel.
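
    A simplified, purely temporal scan with Monte Carlo inference (fixed window length, whereas scan statistics typically maximize over windows of varying size; all counts below are simulated):

        import numpy as np

        rng = np.random.default_rng(8)

        def temporal_scan(monthly_cases, window=3, n_sim=999):
            # max cases in any `window`-month period, with a Monte Carlo p-value
            # conditional on the total count (homogeneous null)
            total, n = monthly_cases.sum(), len(monthly_cases)
            obs = np.convolve(monthly_cases, np.ones(window, int), "valid").max()
            exceed = 0
            for _ in range(n_sim):
                sim = np.bincount(rng.integers(0, n, total), minlength=n)
                exceed += np.convolve(sim, np.ones(window, int), "valid").max() >= obs
            return obs, (exceed + 1) / (n_sim + 1)

        # toy CDI counts over 55 months with a seeded 3-month cluster
        counts = rng.poisson(1.5, 55)
        counts[30:33] += np.array([4, 6, 5])
        peak, p = temporal_scan(counts)
        print(f"peak 3-month count = {peak}, Monte Carlo p = {p:.3f}")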

  19. Neuro magnetic resonance spectroscopy using wavelet decomposition and statistical testing identifies biochemical changes in people with spinal cord injury and pain.

    PubMed

    Stanwell, Peter; Siddall, Philip; Keshava, Nirmal; Cocuzzo, Daniel; Ramadan, Saadallah; Lin, Alexander; Herbert, David; Craig, Ashley; Tran, Yvonne; Middleton, James; Gautam, Shiva; Cousins, Michael; Mountford, Carolyn

    2010-11-01

    Spinal cord injury (SCI) can be accompanied by chronic pain, the mechanisms for which are poorly understood. Here we report that magnetic resonance spectroscopy measurements from the brain, collected at 3T, and processed using wavelet-based feature extraction and classification algorithms, can identify biochemical changes that distinguish control subjects from subjects with SCI as well as subdividing the SCI group into those with and without chronic pain. The results from control subjects (n=10) were compared to those with SCI (n=10). The SCI cohort was made up of subjects with chronic neuropathic pain (n=5) and those without chronic pain (n=5). The wavelet-based decomposition of frequency domain MRS signals employs statistical significance testing to identify features best suited to discriminate different classes. Moreover, the features benefit from careful attention to the post-processing of the spectroscopy data prior to the comparison of the three cohorts. The spectroscopy data, from the thalamus, best distinguished control subjects without SCI from those with SCI with a sensitivity and specificity of 0.9 (Percentage of Correct Classification). The spectroscopy data obtained from the prefrontal cortex and anterior cingulate cortex both distinguished between SCI subjects with chronic neuropathic pain and those without pain with a sensitivity and specificity of 1.0. In this study, where two underlying mechanisms co-exist (i.e. SCI and pain), the thalamic changes appear to be linked more strongly to SCI, while the anterior cingulate cortex and prefrontal cortex changes appear to be specifically linked to the presence of pain. Copyright 2010 Elsevier Inc. All rights reserved.

  20. [Completeness of mortality statistics in Navarra, Spain].

    PubMed

    Moreno-Iribas, Conchi; Guevara, Marcela; Díaz-González, Jorge; Álvarez-Arruti, Nerea; Casado, Itziar; Delfrade, Josu; Larumbe, Emilia; Aguirre, Jesús; Floristán, Yugo

    2013-01-01

    Women in the region of Navarra, Spain, have one of the highest life expectancies at birth in Europe. The aim of this study is to assess the completeness of the official mortality statistics of Navarra in 2009 and the impact of the under-registration of deaths on life expectancy estimates. We compared the number of deaths in Navarra according to the official statistics from the Instituto Nacional de Estadística (INE) with the number derived from multiple-source case-finding: the electronic health record, the Instituto Navarro de Medicina Legal, and late-arriving INE data. 5,249 deaths were identified, of which 103 were not included in the official mortality statistics. Taking into account only deaths that occurred in Spain, which are the only ones considered for the official statistics, the completeness was 98.4%. After correcting for the undercount, estimated life expectancy at birth in 2009 fell from 86.6 to 86.4 years in women and from 80.0 to 79.6 years in men. The results of this study ruled out the existence of significant under-registration in the official mortality statistics, confirming the exceptional longevity of women in Navarra, who are in the top position in Europe with a life expectancy at birth of 86.4 years.

  1. Pooled Genome-Wide Analysis to Identify Novel Risk Loci for Pediatric Allergic Asthma

    PubMed Central

    Ricci, Giampaolo; Astolfi, Annalisa; Remondini, Daniel; Cipriani, Francesca; Formica, Serena; Dondi, Arianna; Pession, Andrea

    2011-01-01

    Background Genome-wide association studies of pooled DNA samples were shown to be a valuable tool to identify candidate SNPs associated with a phenotype. No such study has so far been applied to childhood allergic asthma, even though the very high complexity of asthma genetics makes it an appropriate field for exploring the potential of the pooled GWAS approach. Methodology/Principal Findings We performed a pooled GWAS and individual genotyping in 269 children with allergic respiratory diseases, comparing allergic children with and without asthma. We used a modular approach to identify the most significant loci associated with asthma by combining silhouette statistics and a physical distance method with cluster-adapted thresholding. We found 97% concordance between the pooled GWAS and individual genotyping, with 36 out of 37 top-scoring SNPs significant at the individual genotyping level. The most significant SNP is located inside the coding sequence of C5, an already identified asthma susceptibility gene, while the other loci regulate functions that are relevant to bronchial physiopathology, such as immune- or inflammation-mediated mechanisms and airway smooth muscle contraction. Integration with gene expression data showed that almost half of the putative susceptibility genes are differentially expressed in experimental asthma mouse models. Conclusion/Significance Combined silhouette statistics and cluster-adapted physical distance threshold analysis of pooled GWAS data is an efficient method to identify candidate SNPs associated with asthma development in an allergic pediatric population. PMID:21359210

  2. Statistical dependency in visual scanning

    NASA Technical Reports Server (NTRS)

    Ellis, Stephen R.; Stark, Lawrence

    1986-01-01

    A method to identify statistical dependencies in the positions of eye fixations is developed and applied to eye movement data from subjects who viewed dynamic displays of air traffic and judged future relative position of aircraft. Analysis of approximately 23,000 fixations on points of interest on the display identified statistical dependencies in scanning that were independent of the physical placement of the points of interest. Identification of these dependencies is inconsistent with random-sampling-based theories used to model visual search and information seeking.

  3. Significant Adverse Events and Outcomes After Medical Abortion

    PubMed Central

    Cleland, Kelly; Creinin, Mitchell D.; Nucatola, Deborah; Nshom, Montsine; Trussell, James

    2013-01-01

    Objective To analyze rates of significant adverse events and outcomes in women having a medical abortion at Planned Parenthood health centers in 2009 and 2010, and to identify changes in the rates of adverse events and outcomes between the 2 years. Methods In this database review we analyzed data from Planned Parenthood affiliates that provided medical abortion in 2009 and 2010, almost exclusively using an evidence-based buccal misoprostol regimen. We evaluated the incidence of six clinically significant adverse events (hospital admission, blood transfusion, emergency room treatment, intravenous antibiotics administration, infection, and death) and two significant outcomes (ongoing pregnancy and ectopic pregnancy diagnosed after medical abortion treatment was initiated). We calculated an overall rate as well as rates for each event and identified changes between the 2 years. Results Amongst 233,805 medical abortions provided in 2009 and 2010, significant adverse events or outcomes were reported in 1,530 cases (0.65%). There was no statistically significant difference in overall rates between years. The most common significant outcome was ongoing intrauterine pregnancy (0.50%); significant adverse events occurred in 0.16% of cases. One patient death occurred due to an undiagnosed ectopic pregnancy. Only rates for emergency room treatment and blood transfusion differed by year, and were slightly higher in 2010. Conclusion Review of this large dataset reinforces the safety of the evidence-based medical abortion regimen. PMID:23262942

  4. Randomized Trial of Plaque-Identifying Toothpaste: Decreasing Plaque and Inflammation.

    PubMed

    Fasula, Kim; Evans, Carla A; Boyd, Linda; Giblin, Lori; Belavsky, Benjamin Z; Hetzel, Scott; McBride, Patrick; DeMets, David L; Hennekens, Charles H

    2017-06-01

    Randomized data are sparse about whether a plaque-identifying toothpaste reduces dental plaque and nonexistent for inflammation. Inflammation is intimately involved in the pathogenesis of atherosclerosis and is accurately measured by high-sensitivity C-reactive protein (hs-CRP), a sensitive marker for cardiovascular disease. The hypotheses that Plaque HD (TJA Health LLC, Joliet, Ill), a plaque-identifying toothpaste, produces statistically significant reductions in dental plaque and hs-CRP were tested in this randomized trial. Sixty-one apparently healthy subjects aged 19 to 44 years were assigned at random to this plaque-identifying (n = 31) or placebo toothpaste (n = 30) for 60 days. Changes from baseline to follow-up in dental plaque and hs-CRP were assessed. In an intention-to-treat analysis, the plaque-identifying toothpaste reduced mean plaque score by 49%, compared with a 24% reduction in placebo (P = .001). In a prespecified subgroup analysis of 38 subjects with baseline levels >0.5 mg/L, the plaque-identifying toothpaste reduced hs-CRP by 29%, compared with a 25% increase in placebo toothpaste (P = .041). This plaque-identifying toothpaste produced statistically significant reductions in dental plaque and hs-CRP. The observed reduction in dental plaque confirms and extends a previous observation. The observed reduction in inflammation supports the hypothesis of a reduction in risks of cardiovascular disease. The direct test of this hypothesis requires a large-scale randomized trial of sufficient size and duration designed a priori to do so. Such a finding would have major clinical and public health implications. Copyright © 2017 Elsevier Inc. All rights reserved.

  5. Integrating Gene Expression with Summary Association Statistics to Identify Genes Associated with 30 Complex Traits.

    PubMed

    Mancuso, Nicholas; Shi, Huwenbo; Goddard, Pagé; Kichaev, Gleb; Gusev, Alexander; Pasaniuc, Bogdan

    2017-03-02

    Although genome-wide association studies (GWASs) have identified thousands of risk loci for many complex traits and diseases, the causal variants and genes at these loci remain largely unknown. Here, we introduce a method for estimating the local genetic correlation between gene expression and a complex trait and utilize it to estimate the genetic correlation due to predicted expression between pairs of traits. We integrated gene expression measurements from 45 expression panels with summary GWAS data to perform 30 multi-tissue transcriptome-wide association studies (TWASs). We identified 1,196 genes whose expression is associated with these traits; of these, 168 reside more than 0.5 Mb away from any previously reported GWAS significant variant. We then used our approach to find 43 pairs of traits with significant genetic correlation at the level of predicted expression; of these, eight were not found through genetic correlation at the SNP level. Finally, we used bi-directional regression to find evidence that BMI causally influences triglyceride levels and that triglyceride levels causally influence low-density lipoprotein. Together, our results provide insight into the role of gene expression in the susceptibility of complex traits and diseases. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  6. Antecedents of students' achievement in statistics

    NASA Astrophysics Data System (ADS)

    Awaludin, Izyan Syazana; Razak, Ruzanna Ab; Harris, Hezlin; Selamat, Zarehan

    2015-02-01

    The applications of statistics in most fields are vast. Many degree programmes at local universities require students to enroll in at least one statistics course, and the standard of these courses varies across programmes because students come from diverse academic backgrounds, some far removed from statistics. The high failure rate in statistics courses among non-science stream students has been a concern every year. The purpose of this research is to investigate the antecedents of students' achievement in statistics. A total of 272 students participated in the survey. Multiple linear regression was applied to examine the relationship between the factors and achievement. We found that statistics anxiety was a significant predictor of students' achievement, and that students' age has a significant effect on achievement: older students are more likely to obtain lower scores in statistics. Students' level of study also has a significant impact on their achievement in statistics.
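
    A minimal sketch of the kind of multiple linear regression described, with hypothetical variable names and values (the survey data are not reproduced here):

    ```python
    # Toy data: achievement score regressed on anxiety, age, and level of study.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.DataFrame({
        "score":   [72, 65, 58, 80, 49, 61, 77, 55],
        "anxiety": [2.1, 3.4, 4.0, 1.8, 4.5, 3.0, 2.0, 3.8],  # 1-5 scale
        "age":     [20, 23, 28, 19, 31, 25, 21, 29],
        "level":   [1, 1, 2, 1, 3, 2, 1, 3],                  # year of study
    })
    fit = smf.ols("score ~ anxiety + age + level", data=df).fit()
    print(fit.params)   # negative anxiety and age coefficients mirror the findings
    ```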

  7. Multi-trait analysis of genome-wide association summary statistics using MTAG.

    PubMed

    Turley, Patrick; Walters, Raymond K; Maghzian, Omeed; Okbay, Aysu; Lee, James J; Fontana, Mark Alan; Nguyen-Viet, Tuan Anh; Wedow, Robbee; Zacher, Meghan; Furlotte, Nicholas A; Magnusson, Patrik; Oskarsson, Sven; Johannesson, Magnus; Visscher, Peter M; Laibson, David; Cesarini, David; Neale, Benjamin M; Benjamin, Daniel J

    2018-02-01

    We introduce multi-trait analysis of GWAS (MTAG), a method for joint analysis of summary statistics from genome-wide association studies (GWAS) of different traits, possibly from overlapping samples. We apply MTAG to summary statistics for depressive symptoms (Neff = 354,862), neuroticism (N = 168,105), and subjective well-being (N = 388,538). As compared to the 32, 9, and 13 genome-wide significant loci identified in the single-trait GWAS (most of which are themselves novel), MTAG increases the number of associated loci to 64, 37, and 49, respectively. Moreover, association statistics from MTAG yield more informative bioinformatics analyses and increase the variance explained by polygenic scores by approximately 25%, matching theoretical expectations.

  8. Worry, Intolerance of Uncertainty, and Statistics Anxiety

    ERIC Educational Resources Information Center

    Williams, Amanda S.

    2013-01-01

    Statistics anxiety is a problem for most graduate students. This study investigates the relationship between intolerance of uncertainty, worry, and statistics anxiety. Intolerance of uncertainty was significantly related to worry, and worry was significantly related to three types of statistics anxiety. Six types of statistics anxiety were…

  9. Reducing statistics anxiety and enhancing statistics learning achievement: effectiveness of a one-minute strategy.

    PubMed

    Chiou, Chei-Chang; Wang, Yu-Min; Lee, Li-Tze

    2014-08-01

    Statistical knowledge is widely used in academia; however, statistics teachers struggle with the issue of how to reduce students' statistics anxiety and enhance students' statistics learning. This study assesses the effectiveness of a "one-minute paper strategy" in reducing students' statistics-related anxiety and in improving students' statistics-related achievement. Participants were 77 undergraduates from two classes enrolled in applied statistics courses. An experiment was implemented according to a pretest/posttest comparison group design. The quasi-experimental design showed that the one-minute paper strategy significantly reduced students' statistics anxiety and improved students' statistics learning achievement. The strategy was a better instructional tool than the textbook exercise for reducing students' statistics anxiety and improving students' statistics achievement.

  10. Statistical learning algorithms for identifying contrasting tillage practices with landsat thematic mapper data

    USDA-ARS?s Scientific Manuscript database

    Tillage management practices have a direct impact on water holding capacity, evaporation, carbon sequestration, and water quality. This study examines the feasibility of two statistical learning algorithms, Least Square Support Vector Machine (LSSVM) and Relevance Vector Machine (RVM), for cla...

  11. Performance of cancer cluster Q-statistics for case-control residential histories

    PubMed Central

    Sloan, Chantel D.; Jacquez, Geoffrey M.; Gallagher, Carolyn M.; Ward, Mary H.; Raaschou-Nielsen, Ole; Nordsborg, Rikke Baastrup; Meliker, Jaymie R.

    2012-01-01

    Few investigations of health event clustering have evaluated residential mobility, though causative exposures for chronic diseases such as cancer often occur long before diagnosis. Recently developed Q-statistics incorporate human mobility into disease cluster investigations by quantifying space- and time-dependent nearest neighbor relationships. Using residential histories from two cancer case-control studies, we created simulated clusters to examine Q-statistic performance. Results suggest that the intersection of cases with significant clustering over their life course, Qi, with cases who are constituents of significant local clusters at given times, Qit, yielded the best performance, which improved with increasing cluster size. Upon comparison, a larger proportion of true positives was detected with Kulldorff's spatial scan method if the time of clustering was provided. We recommend using Q-statistics to identify when and where clustering may have occurred, followed by the scan method to localize the candidate clusters. Future work should investigate the generalizability of these findings. PMID:23149326
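
    A toy illustration of the local Q-statistic idea: for each case, count how many of its k nearest neighbours (at one time slice) are also cases. The published statistics extend this across residential histories over time and assess significance by Monte Carlo randomization; all coordinates and labels below are synthetic.

    ```python
    import numpy as np
    from scipy.spatial import cKDTree

    rng = np.random.default_rng(1)
    coords = rng.uniform(0, 10, size=(200, 2))   # residence locations at time t
    is_case = rng.random(200) < 0.3              # case/control labels
    k = 5

    tree = cKDTree(coords)
    _, idx = tree.query(coords, k=k + 1)         # k+1: each point is its own neighbour
    q_it = is_case[idx[:, 1:]].sum(axis=1)       # cases among the k nearest neighbours
    print("mean Q_it among cases:", q_it[is_case].mean())
    ```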

  12. Comparison of untreated adolescent idiopathic scoliosis with normal controls: a review and statistical analysis of the literature.

    PubMed

    Rushton, Paul R P; Grevitt, Michael P

    2013-04-20

    Review and statistical analysis of studies evaluating health-related quality of life (HRQOL) in adolescents with untreated adolescent idiopathic scoliosis (AIS) using Scoliosis Research Society (SRS) outcomes. The aims were to apply published normative values and minimum clinically important differences (MCIDs) for the SRS-22r to the literature, and to identify whether the HRQOL of adolescents with untreated AIS differs from that of unaffected peers and whether any differences are clinically relevant. The effect of untreated AIS on adolescent HRQOL is uncertain; the lack of published normative values and an MCID for the SRS-22r has so far hindered interpretation of previous studies, and the publication of these background data allows those studies to be re-examined. Using suitable inclusion criteria, a literature search identified studies examining HRQOL in untreated adolescents with AIS. Each cohort was analyzed individually. Statistically significant differences were identified by using 95% confidence intervals for the difference in SRS-22r domain mean scores between the cohorts with AIS and the published data for unaffected adolescents. If the lower bound of the confidence interval was greater than the MCID, the difference was considered clinically significant. Of the 21 included patient cohorts, 81% reported statistically worse pain than those unaffected, yet in only 5% of cohorts was this difference clinically important. Of the 11 cohorts examining patient self-image, 91% reported statistically worse scores than those unaffected; in 73% of cohorts this difference was clinically significant. Affected cohorts tended to score well in the function/activity and mental health domains, and differences from those unaffected rarely reached clinically significant values. Pain and self-image tend to be statistically lower among cohorts with AIS than those unaffected. The literature to date suggests that it is only self-image which consistently differs
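
    The clinical-significance rule used above is easy to state in code. An illustrative check with made-up domain means, standard error, and MCID:

    ```python
    # All numbers are illustrative, not taken from any included cohort.
    mean_unaffected, mean_ais = 4.1, 3.6   # SRS-22r self-image domain means
    se_diff = 0.08                         # standard error of the difference
    mcid = 0.20                            # assumed MCID for the domain

    diff = mean_unaffected - mean_ais
    lo, hi = diff - 1.96 * se_diff, diff + 1.96 * se_diff
    print(f"difference {diff:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
    print("clinically significant" if lo > mcid else "not clinically significant")
    ```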

  13. Combining a nontargeted and targeted metabolomics approach to identify metabolic pathways significantly altered in polycystic ovary syndrome.

    PubMed

    Chang, Alice Y; Lalia, Antigoni Z; Jenkins, Gregory D; Dutta, Tumpa; Carter, Rickey E; Singh, Ravinder J; Nair, K Sreekumaran

    2017-06-01

    Polycystic ovary syndrome (PCOS) is a condition of androgen excess and chronic anovulation frequently associated with insulin resistance. We combined a nontargeted and targeted metabolomics approach to identify pathways and metabolites that distinguished PCOS from metabolic syndrome (MetS). Twenty obese women with PCOS were compared with 18 obese women without PCOS. Both groups met criteria for MetS but could not have diabetes mellitus or take medications that treat PCOS or affect lipids or insulin sensitivity. Insulin sensitivity was derived from the frequently sampled intravenous glucose tolerance test. A nontargeted metabolomics approach was performed on fasting plasma samples to identify differentially expressed metabolites, which were further evaluated by principal component and pathway enrichment analysis. Quantitative targeted metabolomics was then applied on candidate metabolites. Measured metabolites were tested for associations with PCOS and clinical variables by logistic and linear regression analyses. This multiethnic, obese sample was matched by age (PCOS, 37±6; MetS, 40±6 years) and body mass index (BMI) (PCOS, 34.6±5.1; MetS, 33.7±5.2 kg/m2). Principal component analysis of the nontargeted metabolomics data showed distinct group separation of PCOS from MetS controls. From the subset of 385 differentially expressed metabolites, 22% were identified by accurate mass, resulting in 19 canonical pathways significantly altered in PCOS, including amino acid, lipid, steroid, carbohydrate, and vitamin D metabolism. Targeted metabolomics identified many essential amino acids, including branched-chain amino acids (BCAA), that were elevated in PCOS compared with MetS. PCOS was most associated with BCAA (P=.02), essential amino acids (P=.03), the essential amino acid lysine (P=.02), and the lysine metabolite α-aminoadipic acid (P=.02) in models adjusted for surrogate variables representing technical variation in metabolites. No significant differences between

  15. A novel complete-case analysis to determine statistical significance between treatments in an intention-to-treat population of randomized clinical trials involving missing data.

    PubMed

    Liu, Wei; Ding, Jinhui

    2018-04-01

    The application of the principle of the intention-to-treat (ITT) to the analysis of clinical trials is challenged in the presence of missing outcome data. The consequences of stopping an assigned treatment in a withdrawn subject are unknown. It is difficult to make a single assumption about missing mechanisms for all clinical trials because there are complicated reactions in the human body to drugs due to the presence of complex biological networks, leading to data missing randomly or non-randomly. Currently there is no statistical method that can tell whether a difference between two treatments in the ITT population of a randomized clinical trial with missing data is significant at a pre-specified level. Making no assumptions about the missing mechanisms, we propose a generalized complete-case (GCC) analysis based on the data of completers. An evaluation of the impact of missing data on the ITT analysis reveals that a statistically significant GCC result implies a significant treatment effect in the ITT population at a pre-specified significance level unless, relative to the comparator, the test drug is poisonous to the non-completers as documented in their medical records. Applications of the GCC analysis are illustrated using literature data, and its properties and limits are discussed.

  16. Genome-wide significant localization for working and spatial memory: Identifying genes for psychosis using models of cognition.

    PubMed

    Knowles, Emma E M; Carless, Melanie A; de Almeida, Marcio A A; Curran, Joanne E; McKay, D Reese; Sprooten, Emma; Dyer, Thomas D; Göring, Harald H; Olvera, Rene; Fox, Peter; Almasy, Laura; Duggirala, Ravi; Kent, Jack W; Blangero, John; Glahn, David C

    2014-01-01

    It is well established that risk for developing psychosis is largely mediated by the influence of genes, but identifying precisely which genes underlie that risk has been problematic. Focusing on endophenotypes, rather than illness risk, is one solution to this problem. Impaired cognition is a well-established endophenotype of psychosis. Here we aimed to characterize the genetic architecture of cognition using phenotypically detailed models as opposed to relying on general IQ or individual neuropsychological measures. In so doing we hoped to identify genes that mediate cognitive ability, which might also contribute to psychosis risk. Hierarchical factor models of genetically clustered cognitive traits were subjected to linkage analysis followed by QTL region-specific association analyses in a sample of 1,269 Mexican American individuals from extended pedigrees. We identified four genome-wide significant QTLs, two for working and two for spatial memory, and a number of plausible and interesting candidate genes. The creation of detailed models of cognition seemingly enhanced the power to detect genetic effects on cognition and provided a number of possible candidate genes for psychosis. © 2013 Wiley Periodicals, Inc.

  17. Statistical Methods Applied to Gamma-ray Spectroscopy Algorithms in Nuclear Security Missions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fagan, Deborah K.; Robinson, Sean M.; Runkle, Robert C.

    2012-10-01

    In a wide range of nuclear security missions, gamma-ray spectroscopy is a critical research and development priority. One particularly relevant challenge is the interdiction of special nuclear material, for which gamma-ray spectroscopy supports the goals of detecting and identifying gamma-ray sources. This manuscript examines the existing set of spectroscopy methods, attempts to categorize them by the statistical methods on which they rely, and identifies methods that have yet to be considered. Our examination shows that current methods effectively estimate the effect of counting uncertainty but in many cases do not address larger sources of decision uncertainty, ones that are significantly more complex. We thus explore the premise that significantly improving algorithm performance requires greater coupling between the problem physics that drives data acquisition and the statistical methods that analyze such data. Untapped statistical methods, such as Bayesian model averaging and hierarchical and empirical Bayes methods, have the potential to reduce decision uncertainty by more rigorously and comprehensively incorporating all sources of uncertainty. We expect that application of such methods will demonstrate progress in meeting the needs of nuclear security missions by improving on the existing numerical infrastructure for which these analyses have not been conducted.
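
    As one example of the "untapped" methods named above, Bayesian model averaging can be approximated from the BIC values of candidate models. A schematic sketch with invented BICs:

    ```python
    # Approximate posterior model probabilities from BIC; values are made up.
    import numpy as np

    bic = np.array([1012.4, 1009.8, 1015.1])   # candidate spectral models' BICs
    delta = bic - bic.min()
    weights = np.exp(-0.5 * delta)
    weights /= weights.sum()                   # approximate P(model | data)
    print(dict(zip(["model_A", "model_B", "model_C"], weights.round(3))))
    # Averaged estimates would weight each model's output by these probabilities.
    ```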

  18. A method to identify aperiodic disturbances in the ionosphere

    NASA Astrophysics Data System (ADS)

    Wang, J.-S.; Chen, Z.; Huang, C.-M.

    2014-05-01

    In this paper, variations in the ionospheric F2 layer's critical frequency are decomposed into their periodic and aperiodic components. The latter include disturbances caused both by geophysical impacts on the ionosphere and by random noise. The spectral whitening method (SWM), a signal-processing technique used in statistical estimation and/or detection, was used to identify aperiodic components in the ionosphere. The whitening algorithm adopted herein divides the Fourier transform of the observed data series by a real envelope function; as a result, periodic components are suppressed and aperiodic components emerge as the dominant contributors. The SWM was validated for identification of aperiodic components by applying it to a synthetic data set that reproduced the significant periodic features of ionospheric observations and contained artificial (and hence controllable) disturbances. Although the random noise was somewhat enhanced by post-processing, the artificial disturbances could still be clearly identified. The SWM was then applied to real ionospheric observations and was found to be more sensitive than the often-used monthly median method for identifying geomagnetic effects. In addition, disturbances detected by the SWM were characterized by a Gaussian-type probability density function over all timescales, which further simplifies statistical analysis and suggests that the disturbances thus identified can be compared regardless of timescale.
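
    A minimal numerical sketch of the whitening step as described, assuming a moving-average envelope of the spectral magnitude (the paper's exact envelope function may differ):

    ```python
    import numpy as np

    def spectral_whiten(x, win=11):
        """Divide the spectrum by a smooth real envelope of its magnitude."""
        F = np.fft.rfft(x)
        envelope = np.convolve(np.abs(F), np.ones(win) / win, mode="same") + 1e-12
        return np.fft.irfft(F / envelope, n=len(x))

    t = np.arange(2000, dtype=float)
    x = np.sin(2 * np.pi * t / 27) + np.sin(2 * np.pi * t / 365)  # periodic parts
    x[1000:1010] += 3.0                                           # aperiodic disturbance
    residual = spectral_whiten(x)
    # Periodic components are suppressed; the disturbance should dominate.
    print("largest residual near index:", int(np.argmax(np.abs(residual))))
    ```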

  19. Statistical analysis of long-term monitoring data for persistent organic pollutants in the atmosphere at 20 monitoring stations broadly indicates declining concentrations.

    PubMed

    Kong, Deguo; MacLeod, Matthew; Hung, Hayley; Cousins, Ian T

    2014-11-04

    During recent decades concentrations of persistent organic pollutants (POPs) in the atmosphere have been monitored at multiple stations worldwide. We used three statistical methods to analyze a total of 748 time series of selected POPs in the atmosphere to determine if there are statistically significant reductions in levels of POPs that have had control actions enacted to restrict or eliminate manufacture, use and emissions. Significant decreasing trends were identified in 560 (75%) of the 748 time series collected from the Arctic, North America, and Europe, indicating that the atmospheric concentrations of these POPs are generally decreasing, consistent with the overall effectiveness of emission control actions. Statistically significant trends in synthetic time series could be reliably identified with the improved Mann-Kendall (iMK) test and the digital filtration (DF) technique in time series longer than 5 years. The temporal trends of new (or emerging) POPs in the atmosphere are often unclear because time series are too short. A statistical detrending method based on the iMK test was not able to identify abrupt changes in the rates of decline of atmospheric POP concentrations encoded into synthetic time series.
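
    For reference, the basic (unmodified) Mann-Kendall test underlying the iMK variant can be written in a few lines; the serial-correlation correction of the improved test is omitted here:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    x = 10 * np.exp(-0.1 * np.arange(20)) + rng.normal(0, 0.3, 20)  # annual means

    n = len(x)
    s = sum(np.sign(x[j] - x[i]) for i in range(n - 1) for j in range(i + 1, n))
    var_s = n * (n - 1) * (2 * n + 5) / 18            # variance assuming no ties
    z = (s - np.sign(s)) / np.sqrt(var_s)             # continuity-corrected z
    p = 2 * stats.norm.sf(abs(z))
    print(f"S = {s:.0f}, z = {z:.2f}, p = {p:.4f}")   # a significant decline
    ```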

  20. Normal Distribution of CD8+ T-Cell-Derived ELISPOT Counts within Replicates Justifies the Reliance on Parametric Statistics for Identifying Positive Responses.

    PubMed

    Karulin, Alexey Y; Caspell, Richard; Dittrich, Marcus; Lehmann, Paul V

    2015-03-02

    Accurate assessment of positive ELISPOT responses for low frequencies of antigen-specific T-cells is controversial. In particular, it is still unknown whether ELISPOT counts within replicate wells follow a theoretical distribution function, and thus whether high-power parametric statistics can be used to discriminate between positive and negative wells. We studied experimental distributions of spot counts for up to 120 replicate wells of IFN-γ production by CD8+ T-cells responding to EBV LMP2A (426-434) peptide in human PBMC. The cells were tested in serial dilutions covering a wide range of average spot counts per condition, from just a few to hundreds of spots per well. Statistical analysis of the data using diagnostic Q-Q plots and the Shapiro-Wilk normality test showed that, over the entire dynamic range, ELISPOT spot counts within replicate wells followed a normal distribution. This result implies that Student's t-test and ANOVA are suited to identify positive responses. We also show experimentally that borderline responses can be reliably detected by involving more replicate wells, plating higher numbers of PBMC, addition of IL-7, or a combination of these. Furthermore, we have experimentally verified that the number of replicates needed for detection of weak responses can be calculated using parametric statistics.
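
    The paper's logic, a normality check followed by a parametric comparison, can be sketched with scipy on simulated replicate counts:

    ```python
    # Simulated spot counts; real data were IFN-gamma ELISPOT replicate wells.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    medium_only = rng.normal(loc=5, scale=2, size=24).clip(0)   # negative controls
    antigen = rng.normal(loc=9, scale=3, size=24).clip(0)       # stimulated wells

    print(f"Shapiro-Wilk p (antigen wells): {stats.shapiro(antigen).pvalue:.3f}")
    t, p = stats.ttest_ind(antigen, medium_only, alternative="greater")
    print(f"one-sided t-test: t = {t:.2f}, p = {p:.4f}")
    ```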

  1. A scan statistic for identifying optimal risk windows in vaccine safety studies using self-controlled case series design.

    PubMed

    Xu, Stanley; Hambidge, Simon J; McClure, David L; Daley, Matthew F; Glanz, Jason M

    2013-08-30

    In the examination of the association between vaccines and rare adverse events after vaccination in postlicensure observational studies, it is challenging to define appropriate risk windows because prelicensure RCTs provide little insight on the timing of specific adverse events. Past vaccine safety studies have often used prespecified risk windows based on prior publications, biological understanding of the vaccine, and expert opinion. Recently, a data-driven approach was developed to identify appropriate risk windows for vaccine safety studies that use the self-controlled case series design. This approach employs both the maximum incidence rate ratio and the linear relation between the estimated incidence rate ratio and the inverse of average person time at risk, given a specified risk window. In this paper, we present a scan statistic that can identify appropriate risk windows in vaccine safety studies using the self-controlled case series design while taking into account the dependence of time intervals within an individual and while adjusting for time-varying covariates such as age and seasonality. This approach uses the maximum likelihood ratio test based on fixed-effects models, which has been used for analyzing data from self-controlled case series design in addition to conditional Poisson models. Copyright © 2013 John Wiley & Sons, Ltd.
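
    A schematic scan over candidate risk windows, keeping the window that maximizes the incidence rate ratio; the published scan statistic instead maximizes a conditional likelihood ratio with adjustment for age and seasonality, which this sketch omits:

    ```python
    import numpy as np

    days_to_event = np.array([2, 3, 3, 5, 9, 14, 30, 41])  # synthetic onset days
    follow_up = 42.0                                       # observed days per subject
    n_subjects = 500

    best = (None, 0.0)
    for w in range(1, 42):                                 # candidate windows [0, w)
        inside = int(np.sum(days_to_event < w))
        outside = len(days_to_event) - inside
        rate_in = inside / (n_subjects * w)
        rate_out = outside / (n_subjects * (follow_up - w))
        irr = rate_in / rate_out if rate_out > 0 else float("inf")
        if irr > best[1]:
            best = (w, irr)
    print("best window: days 0-%d, IRR = %.1f" % best)
    ```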

  2. The use of the temporal scan statistic to detect methicillin-resistant Staphylococcus aureus clusters in a community hospital.

    PubMed

    Faires, Meredith C; Pearl, David L; Ciccotelli, William A; Berke, Olaf; Reid-Smith, Richard J; Weese, J Scott

    2014-07-08

    In healthcare facilities, conventional surveillance techniques using rule-based guidelines may result in under- or over-reporting of methicillin-resistant Staphylococcus aureus (MRSA) outbreaks, as these guidelines are generally unvalidated. The objectives of this study were to investigate the utility of the temporal scan statistic for detecting MRSA clusters, validate clusters using molecular techniques and hospital records, and determine significant differences in the rate of MRSA cases using regression models. Patients admitted to a community hospital between August 2006 and February 2011, and identified with MRSA >48 hours after hospital admission, were included in this study. Between March 2010 and February 2011, MRSA specimens were obtained for spa typing. MRSA clusters were investigated using a retrospective temporal scan statistic. Tests were conducted on a monthly scale and significant clusters were compared to MRSA outbreaks identified by hospital personnel. Associations between the rate of MRSA cases and the variables year, month, and season were investigated using a negative binomial regression model. During the study period, 735 MRSA cases were identified and 167 MRSA isolates were spa typed. Nine different spa types were identified, with spa type 2/t002 (88.6%) the most prevalent. The temporal scan statistic identified significant MRSA clusters at the hospital (n=2), service (n=16), and ward (n=10) levels (P ≤ 0.05). Seven clusters were concordant with nine MRSA outbreaks identified by hospital staff. For the remaining clusters, seven events may have been equivalent to true outbreaks and six clusters demonstrated possible transmission events. The regression analysis indicated that years 2009-2011, compared to 2006, and the months March and April, compared to January, were associated with an increase in the rate of MRSA cases (P ≤ 0.05). The application of the temporal scan statistic identified several MRSA clusters that were not detected by hospital
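
    A sketch of the rate model described, fitting a negative binomial GLM of monthly case counts on year and month; the counts below are simulated, not hospital surveillance data:

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(4)
    df = pd.DataFrame({
        "year": np.repeat([2007, 2008, 2009, 2010], 12),
        "month": np.tile(range(1, 13), 4),
    })
    df["cases"] = rng.poisson(10 + 2 * (df["year"] >= 2009) + (df["month"] <= 4))

    model = smf.glm("cases ~ C(year) + C(month)", data=df,
                    family=sm.families.NegativeBinomial(alpha=0.5)).fit()
    print(model.params.filter(like="year"))   # log rate ratios vs. the baseline year
    ```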

  3. 50 CFR 600.410 - Collection and maintenance of statistics.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... 50 Wildlife and Fisheries 10 2011-10-01 2011-10-01 false Collection and maintenance of statistics... of Statistics § 600.410 Collection and maintenance of statistics. (a) General. (1) All statistics..., the Assistant Administrator will remove all identifying particulars from the statistics if doing so is...

  4. 50 CFR 600.410 - Collection and maintenance of statistics.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 50 Wildlife and Fisheries 8 2010-10-01 2010-10-01 false Collection and maintenance of statistics... of Statistics § 600.410 Collection and maintenance of statistics. (a) General. (1) All statistics..., the Assistant Administrator will remove all identifying particulars from the statistics if doing so is...

  5. Use of statistical procedures in Brazilian and international dental journals.

    PubMed

    Ambrosano, Gláucia Maria Bovi; Reis, André Figueiredo; Giannini, Marcelo; Pereira, Antônio Carlos

    2004-01-01

    A descriptive survey was performed in order to assess the statistical content and quality of Brazilian and international dental journals, and compare their evolution throughout the last decades. The authors identified the reporting and accuracy of statistical techniques in 1000 papers published from 1970 to 2000 in seven dental journals: three Brazilian (Brazilian Dental Journal, Revista de Odontologia da Universidade de Sao Paulo and Revista de Odontologia da UNESP) and four international journals (Journal of the American Dental Association, Journal of Dental Research, Caries Research and Journal of Periodontology). Papers were divided into two time periods: from 1970 to 1989, and from 1990 to 2000. A slight increase in the number of articles that presented some form of statistical technique was noticed for Brazilian journals (from 61.0 to 66.7%), whereas for international journals, a significant increase was observed (65.8 to 92.6%). In addition, a decrease in the number of statistical errors was verified. The most commonly used statistical tests as well as the most frequent errors found in dental journals were assessed. Hopefully, this investigation will encourage dental educators to better plan the teaching of biostatistics, and to improve the statistical quality of submitted manuscripts.

  6. Statistical Characteristics of Single Sort of Grape Bulgarian Wines

    NASA Astrophysics Data System (ADS)

    Boyadzhiev, D.

    2008-10-01

    The aim of this paper is to evaluate the differences in the values of the eight basic physicochemical indices of single-variety Bulgarian grape wines (white and red), which are obligatory for the standardization of finished production in the winery. Statistically significant differences in the values across sorts and vintages are established, and possibilities for identifying the sort and the vintage on the basis of these indices by applying discriminant analysis are discussed.
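
    Discriminant analysis for sort identification can be illustrated with scikit-learn's built-in wine data (13 physicochemical measurements, 3 cultivars), standing in here for the Bulgarian indices:

    ```python
    from sklearn.datasets import load_wine
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
    from sklearn.model_selection import cross_val_score

    X, y = load_wine(return_X_y=True)
    scores = cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5)
    print("cross-validated sort-identification accuracy: %.2f" % scores.mean())
    ```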

  7. Identifying cases of undiagnosed, clinically significant COPD in primary care: qualitative insight from patients in the target population

    PubMed Central

    Leidy, Nancy K; Kim, Katherine; Bacci, Elizabeth D; Yawn, Barbara P; Mannino, David M; Thomashow, Byron M; Barr, R Graham; Rennard, Stephen I; Houfek, Julia F; Han, Meilan K; Meldrum, Catherine A; Make, Barry J; Bowler, Russ P; Steenrod, Anna W; Murray, Lindsey T; Walsh, John W; Martinez, Fernando

    2015-01-01

    Background: Many cases of chronic obstructive pulmonary disease (COPD) are diagnosed only after significant loss of lung function or during exacerbations. Aims: This study is part of a multi-method approach to develop a new screening instrument for identifying undiagnosed, clinically significant COPD in primary care. Methods: Subjects with varied histories of COPD diagnosis, risk factors and history of exacerbations were recruited through five US clinics (four pulmonary, one primary care). Phase I: Eight focus groups and six telephone interviews were conducted to elicit descriptions of risk factors for COPD, recent or historical acute respiratory events, and symptoms to inform the development of candidate items for the new questionnaire. Phase II: A new cohort of subjects participated in cognitive interviews to assess and modify candidate items. Two peak expiratory flow (PEF) devices (electronic, manual) were assessed for use in screening. Results: Of 77 subjects, 50 participated in Phase I and 27 in Phase II. Six themes informed item development: exposure (smoking, second-hand smoke); health history (family history of lung problems, recurrent chest infections); recent history of respiratory events (clinic visits, hospitalisations); symptoms (respiratory, non-respiratory); impact (activity limitations); and attribution (age, obesity). PEF devices were rated easy to use; electronic values were significantly higher than manual (P<0.0001). Revisions were made to the draft items on the basis of cognitive interviews. Conclusions: Forty-eight candidate items are ready for quantitative testing to select the best, smallest set of questions that, together with PEF, can efficiently identify patients in need of diagnostic evaluation for clinically significant COPD. PMID:26028486

  8. Radar error statistics for the space shuttle

    NASA Technical Reports Server (NTRS)

    Lear, W. M.

    1979-01-01

    Radar error statistics of C-band and S-band that are recommended for use with the groundtracking programs to process space shuttle tracking data are presented. The statistics are divided into two parts: bias error statistics, using the subscript B, and high frequency error statistics, using the subscript q. Bias errors may be slowly varying to constant. High frequency random errors (noise) are rapidly varying and may or may not be correlated from sample to sample. Bias errors were mainly due to hardware defects and to errors in correction for atmospheric refraction effects. High frequency noise was mainly due to hardware and due to atmospheric scintillation. Three types of atmospheric scintillation were identified: horizontal, vertical, and line of sight. This was the first time that horizontal and line of sight scintillations were identified.

  9. Analysis of variance to assess statistical significance of Laplacian estimation accuracy improvement due to novel variable inter-ring distances concentric ring electrodes.

    PubMed

    Makeyev, Oleksandr; Joe, Cody; Lee, Colin; Besio, Walter G

    2017-07-01

    Concentric ring electrodes have shown promise in non-invasive electrophysiological measurement demonstrating their superiority to conventional disc electrodes, in particular, in accuracy of Laplacian estimation. Recently, we have proposed novel variable inter-ring distances concentric ring electrodes. Analytic and finite element method modeling results for linearly increasing distances electrode configurations suggested they may decrease the truncation error resulting in more accurate Laplacian estimates compared to currently used constant inter-ring distances configurations. This study assesses statistical significance of Laplacian estimation accuracy improvement due to novel variable inter-ring distances concentric ring electrodes. Full factorial design of analysis of variance was used with one categorical and two numerical factors: the inter-ring distances, the electrode diameter, and the number of concentric rings in the electrode. The response variables were the Relative Error and the Maximum Error of Laplacian estimation computed using a finite element method model for each of the combinations of levels of three factors. Effects of the main factors and their interactions on Relative Error and Maximum Error were assessed and the obtained results suggest that all three factors have statistically significant effects in the model confirming the potential of using inter-ring distances as a means of improving accuracy of Laplacian estimation.
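
    The described design maps directly onto a standard ANOVA table. An illustrative three-factor model with synthetic response values (not the paper's finite element results):

    ```python
    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    rng = np.random.default_rng(5)
    cells = pd.MultiIndex.from_product(
        [["constant", "linearly_increasing"], [1.0, 2.0, 3.0], [2, 3]],
        names=["distances", "diameter", "n_rings"]).to_frame(index=False)
    df = pd.concat([cells] * 4, ignore_index=True)          # 4 replicates per cell
    df["rel_error"] = (0.05
                       - 0.01 * (df["distances"] == "linearly_increasing")
                       + 0.005 * df["diameter"]
                       + rng.normal(0, 0.002, len(df)))

    fit = smf.ols("rel_error ~ C(distances) * diameter * n_rings", data=df).fit()
    print(anova_lm(fit, typ=2))
    ```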

  10. 28 CFR 22.22 - Revelation of identifiable data.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... STATISTICAL INFORMATION § 22.22 Revelation of identifiable data. (a) Except as noted in paragraph (b) of this section, research and statistical information relating to a private person may be revealed in identifiable... sections 223(a)(12)(A), 223(a)(13), 223(a)(14), and 243 of the Juvenile Justice and Delinquency Prevention...

  11. Accidents in Malaysian construction industry: statistical data and court cases.

    PubMed

    Chong, Heap Yih; Low, Thuan Siang

    2014-01-01

    Safety and health issues remain critical to the construction industry due to its working environment and the complexity of working practices. This research adopts two approaches, statistical data and court cases, to identify the causes and behavior underlying construction safety and health issues in Malaysia. Factual data for the period 2000-2009 were retrieved to identify the causes and agents that contributed to health issues. Moreover, court cases were tabulated and analyzed to identify legal patterns of parties involved in construction site accidents. The two approaches produced consistent results and highlighted a significant reduction in the rate of accidents per construction project in Malaysia.

  12. Statistical analysis of Geopotential Height (GH) timeseries based on Tsallis non-extensive statistical mechanics

    NASA Astrophysics Data System (ADS)

    Karakatsanis, L. P.; Iliopoulos, A. C.; Pavlos, E. G.; Pavlos, G. P.

    2018-02-01

    In this paper, we perform statistical analysis of time series deriving from Earth's climate. The time series concern Geopotential Height (GH) and correspond to temporal and spatial components of the global distribution of monthly average values during the period 1948-2012. The analysis is based on Tsallis non-extensive statistical mechanics, and in particular on the estimation of Tsallis' q-triplet, namely {qstat, qsens, qrel}, the reconstructed phase space, the estimation of the correlation dimension, and the Hurst exponent from rescaled range analysis (R/S). The deviation of the Tsallis q-triplet from unity indicates a non-Gaussian (Tsallis q-Gaussian) non-extensive character with heavy-tailed probability density functions (PDFs), multifractal behavior, and long-range dependences for all time series considered. Noticeable differences in the q-triplet estimates were also found between time series from distinct spatial or temporal regions. Moreover, the reconstructed phase space revealed a lower-dimensional fractal set in the GH dynamical phase space (strong self-organization), and the estimation of the Hurst exponent indicated multifractality, non-Gaussianity, and persistence. The analysis provides significant information for identifying and characterizing the dynamical characteristics of Earth's climate.
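
    Of the quantities listed, the Hurst exponent is the most compact to demonstrate. A rescaled-range (R/S) estimator sketch; white noise should give H near 0.5, while persistent climate series give H > 0.5:

    ```python
    import numpy as np

    def hurst_rs(x, window_sizes=(8, 16, 32, 64, 128)):
        """Estimate H from the scaling R/S ~ w**H over window sizes w."""
        rs = []
        for w in window_sizes:
            chunks = x[: len(x) // w * w].reshape(-1, w)
            dev = np.cumsum(chunks - chunks.mean(axis=1, keepdims=True), axis=1)
            r = dev.max(axis=1) - dev.min(axis=1)   # range of cumulative deviations
            s = chunks.std(axis=1, ddof=1)
            rs.append(np.mean(r / s))
        slope, _ = np.polyfit(np.log(window_sizes), np.log(rs), 1)
        return slope

    noise = np.random.default_rng(6).normal(size=4096)
    print("H (white noise): %.2f" % hurst_rs(noise))
    ```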

  13. Evaluating statistical and clinical significance of intervention effects in single-case experimental designs: an SPSS method to analyze univariate data.

    PubMed

    Maric, Marija; de Haan, Else; Hogendoorn, Sanne M; Wolters, Lidewij H; Huizenga, Hilde M

    2015-03-01

    Single-case experimental designs are useful methods in clinical research practice to investigate individual client progress. Their proliferation might have been hampered by methodological challenges such as the difficulty applying existing statistical procedures. In this article, we describe a data-analytic method to analyze univariate (i.e., one symptom) single-case data using the common package SPSS. This method can help the clinical researcher to investigate whether an intervention works as compared with a baseline period or another intervention type, and to determine whether symptom improvement is clinically significant. First, we describe the statistical method in a conceptual way and show how it can be implemented in SPSS. Simulation studies were performed to determine the number of observation points required per intervention phase. Second, to illustrate this method and its implications, we present a case study of an adolescent with anxiety disorders treated with cognitive-behavioral therapy techniques in an outpatient psychotherapy clinic, whose symptoms were regularly assessed before each session. We provide a description of the data analyses and results of this case study. Finally, we discuss the advantages and shortcomings of the proposed method. Copyright © 2014. Published by Elsevier Ltd.

  14. Types of pelvic floor dysfunctions in nulliparous, vaginal delivery, and cesarean section female patients with obstructed defecation syndrome identified by echodefecography.

    PubMed

    Murad-Regadas, Sthela M; Regadas, Francisco Sérgio P; Rodrigues, Lusmar V; Oliveira, Leticia; Barreto, Rosilma G L; de Souza, Marcellus H L P; Silva, Flavio Roberto S

    2009-10-01

    This study aims to show pelvic floor dysfunctions in women with obstructed defecation syndrome (ODS), comparing nulliparous women to those with vaginal delivery or cesarean section using echodefecography (ECD). Three hundred seventy female patients with ODS were reviewed retrospectively and divided into Group I (105 nulliparous), Group II (165 with at least one vaginal delivery), and Group III (100 delivered only by cesarean section). All patients had been submitted to ECD to identify pelvic floor dysfunctions. No statistical significance was found between the groups with regard to anorectocele grade. Intussusception was identified in 40% from G I, 55.0% from G II, and 30.0% from G III, with statistical significance between Groups I and II. Intussusception was associated with significant anorectocele in 24.8%, 36.3%, and 18% of patients from G I, II, and III, respectively. Anismus was identified in 39.0% from G I, 28.5% from G II, and 60% from G III, with statistical significance between Groups I and III. Anismus was associated with significant anorectocele in 22.8%, 15.7%, and 24% of patients from G I, II, and III, respectively. Sigmoidocele/enterocele was identified in 7.6% from G I and 10.9% from G II, and was associated with significant rectocele in 3.8% and 7.3% of patients from G I and II, respectively. The distribution of pelvic floor dysfunctions showed no specific pattern across the groups, suggesting the absence of a correlation between these dysfunctions and vaginal delivery.

  15. Performance studies of GooFit on GPUs vs RooFit on CPUs while estimating the statistical significance of a new physical signal

    NASA Astrophysics Data System (ADS)

    Di Florio, Adriano

    2017-10-01

    In order to test the computing capabilities of GPUs with respect to traditional CPU cores, a high-statistics toy Monte Carlo technique has been implemented both in ROOT/RooFit and in GooFit frameworks with the purpose of estimating the statistical significance of the structure observed by CMS close to the kinematical boundary of the J/ψϕ invariant mass in the three-body decay B+ → J/ψϕK+. GooFit is an open-source data-analysis tool under development that interfaces ROOT/RooFit to the CUDA platform on nVidia GPUs. The optimized GooFit application running on GPUs hosted by servers in the Bari Tier2 provides striking speed-up performance with respect to the RooFit application parallelised on multiple CPUs by means of the PROOF-Lite tool. The considerable resulting speed-up, evident when comparing concurrent GooFit processes allowed by the CUDA Multi Process Service with a RooFit/PROOF-Lite process with multiple CPU workers, is presented and discussed in detail. By means of GooFit it has also been possible to explore the behaviour of a likelihood-ratio test statistic in different situations in which the Wilks theorem may or may not apply because its regularity conditions are not satisfied.
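
    The toy Monte Carlo logic itself is independent of GPUs: build the null distribution of the test statistic from many pseudo-experiments and convert the observed value into a p-value. A schematic sketch in which a chi-square draw stands in for each toy's likelihood fit:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    n_toys = 100_000
    # Stand-in for 2*Delta(lnL) per background-only pseudo-experiment; toys
    # avoid relying on Wilks' theorem when its conditions are doubtful.
    null_stat = rng.chisquare(df=2, size=n_toys)

    observed = 18.3   # hypothetical observed test statistic
    p = (np.sum(null_stat >= observed) + 1) / (n_toys + 1)
    print(f"p = {p:.1e}, significance = {stats.norm.isf(p):.1f} sigma")
    ```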

  16. Spatial statistical analysis of tree deaths using airborne digital imagery

    NASA Astrophysics Data System (ADS)

    Chang, Ya-Mei; Baddeley, Adrian; Wallace, Jeremy; Canci, Michael

    2013-04-01

    High resolution digital airborne imagery offers unprecedented opportunities for observation and monitoring of vegetation, providing the potential to identify, locate and track individual vegetation objects over time. Analytical tools are required to quantify relevant information. In this paper, locations of trees over a large area of native woodland vegetation were identified using morphological image analysis techniques. Methods of spatial point process statistics were then applied to estimate the spatially-varying tree death risk, and to show that it is significantly non-uniform. [Tree deaths over the area were detected in our previous work (Wallace et al., 2008).] The study area is a major source of ground water for the city of Perth, and the work was motivated by the need to understand and quantify vegetation changes in the context of water extraction and drying climate. The influence of hydrological variables on tree death risk was investigated using spatial statistics (graphical exploratory methods, spatial point pattern modelling and diagnostics).

  17. Evaluation of undergraduate nursing students' attitudes towards statistics courses, before and after a course in applied statistics.

    PubMed

    Hagen, Brad; Awosoga, Olu; Kellett, Peter; Dei, Samuel Ofori

    2013-09-01

    Undergraduate nursing students must often take a course in statistics, yet there is scant research to inform teaching pedagogy. The objectives of this study were to assess nursing students' overall attitudes towards statistics courses - including (among other things) overall fear and anxiety, preferred learning and teaching styles, and the perceived utility and benefit of taking a statistics course - before and after taking a mandatory course in applied statistics. The authors used a pre-experimental research design (a one-group pre-test/post-test research design), by administering a survey to nursing students at the beginning and end of the course. The study was conducted at a University in Western Canada that offers an undergraduate Bachelor of Nursing degree. Participants included 104 nursing students, in the third year of a four-year nursing program, taking a course in statistics. Although students only reported moderate anxiety towards statistics, student anxiety about statistics had dropped by approximately 40% by the end of the course. Students also reported a considerable and positive change in their attitudes towards learning in groups by the end of the course, a potential reflection of the team-based learning that was used. Students identified preferred learning and teaching approaches, including the use of real-life examples, visual teaching aids, clear explanations, timely feedback, and a well-paced course. Students also identified preferred instructor characteristics, such as patience, approachability, in-depth knowledge of statistics, and a sense of humor. Unfortunately, students only indicated moderate agreement with the idea that statistics would be useful and relevant to their careers, even by the end of the course. Our findings validate anecdotal reports on statistics teaching pedagogy, although more research is clearly needed, particularly on how to increase students' perceptions of the benefit and utility of statistics courses for their nursing

  18. The significance of spatial resolution: Identifying forest cover from satellite data

    Treesearch

    Dumitru Salajanu; Charles E. Olson

    2001-01-01

    Twenty-five years ago, a National Academy of Sciences report identified species identification as a requirement if satellite data are to reach their full potential in forest inventory and monitoring; the report suggested that improving spatial resolution to 10 meters would probably be required (Committee on Remote Sensing Programs for Earth Resource Surveys [CORSPERS]...

  19. The effect of system nonlinearities on system noise statistics

    NASA Technical Reports Server (NTRS)

    Robinson, L. H., Jr.

    1971-01-01

    The effects of nonlinearities in a baseline communications system on the system noise amplitude statistics are studied. So that a meaningful identification of system nonlinearities can be made, the baseline system is assumed to transmit a single biphase-modulated signal through a relay satellite to the receiving equipment. The significant nonlinearities thus identified include square-law or product devices (e.g., in the carrier reference recovery loops in the receivers), bandpass limiters, and traveling wave tube amplifiers.

  20. Mathematical Anxiety among Business Statistics Students.

    ERIC Educational Resources Information Center

    High, Robert V.

    A survey instrument was developed to identify sources of mathematics anxiety among undergraduate business students in a statistics class. A number of statistics classes were selected at two colleges in Long Island, New York. A final sample of n=102 respondents indicated that there was a relationship between the mathematics grade in prior…

  1. Students' attitudes towards learning statistics

    NASA Astrophysics Data System (ADS)

    Ghulami, Hassan Rahnaward; Hamid, Mohd Rashid Ab; Zakaria, Roslinazairimah

    2015-05-01

    Positive attitude towards learning is vital in order to master the core content of the subject matter under study. This is no exception in learning statistics courses, especially at the university level. Therefore, this study investigates students' attitudes towards learning statistics. Six variables or constructs have been identified: affect, cognitive competence, value, difficulty, interest, and effort. The instrument used for the study is a questionnaire adopted and adapted from the reliable Survey of Attitudes towards Statistics (SATS©) instrument. The study was conducted among engineering undergraduate students at a university on the East Coast of Malaysia. The respondents consist of students who were taking the applied statistics course from different faculties. The results are analysed by descriptive analysis and contribute to a descriptive understanding of students' attitudes towards the teaching and learning process of statistics.

  2. Statistical Significance and Baseline Monitoring.

    DTIC Science & Technology

    1984-07-01

    Observed versus nominal α levels for multivariate tests of the data sets (50 runs of 4 groups each) were compared by tabulating the cumulative proportion of the observations found for each nominal level. The observed α values are always higher than nominal levels; virtually all nominal α levels are below 0.20. In other words, the discriminant analysis models

  3. Statistics used in current nursing research.

    PubMed

    Zellner, Kathleen; Boerst, Connie J; Tabb, Wil

    2007-02-01

    Undergraduate nursing research courses should emphasize the statistics most commonly used in the nursing literature to strengthen students' and beginning researchers' understanding of them. To determine the most commonly used statistics, we reviewed all quantitative research articles published in 13 nursing journals in 2000. The findings supported Beitz's categorization of kinds of statistics. Ten primary statistics used in 80% of nursing research published in 2000 were identified. We recommend that the appropriate use of those top 10 statistics be emphasized in undergraduate nursing education and that the nursing profession continue to advocate for the use of methods (e.g., power analysis, odds ratio) that may contribute to the advancement of nursing research.

  4. Teaching Statistics in Biology: Using Inquiry-based Learning to Strengthen Understanding of Statistical Analysis in Biology Laboratory Courses

    PubMed Central

    2008-01-01

    There is an increasing need for students in the biological sciences to build a strong foundation in quantitative approaches to data analyses. Although most science, engineering, and math field majors are required to take at least one statistics course, statistical analysis is poorly integrated into undergraduate biology course work, particularly at the lower-division level. Elements of statistics were incorporated into an introductory biology course, including a review of statistics concepts and opportunity for students to perform statistical analysis in a biological context. Learning gains were measured with an 11-item statistics learning survey instrument developed for the course. Students showed a statistically significant 25% (p < 0.005) increase in statistics knowledge after completing introductory biology. Students improved their scores on the survey after completing introductory biology, even if they had previously completed an introductory statistics course (9%, improvement p < 0.005). Students retested 1 yr after completing introductory biology showed no loss of their statistics knowledge as measured by this instrument, suggesting that the use of statistics in biology course work may aid long-term retention of statistics knowledge. No statistically significant differences in learning were detected between male and female students in the study. PMID:18765754

  6. Preselection statistics and Random Forest classification identify population informative single nucleotide polymorphisms in cosmopolitan and autochthonous cattle breeds.

    PubMed

    Bertolini, F; Galimberti, G; Schiavo, G; Mastrangelo, S; Di Gerlando, R; Strillacci, M G; Bagnato, A; Portolano, B; Fontanesi, L

    2018-01-01

    Commercial single nucleotide polymorphism (SNP) arrays have been recently developed for several species and can be used to identify informative markers to differentiate breeds or populations for several downstream applications. To identify the most discriminating genetic markers among thousands of genotyped SNPs, a few statistical approaches have been proposed. In this work, we compared several methods of SNP preselection (Delta, Fst and principal component analyses (PCA)) in addition to Random Forest classifications to analyse SNP data from six dairy cattle breeds, including cosmopolitan (Holstein, Brown and Simmental) and autochthonous Italian breeds raised in two different regions and subjected to limited or no breeding programmes (Cinisara and Modicana, raised only in Sicily, and Reggiana, raised only in Emilia Romagna). From these classifications, two panels of 96 and 48 SNPs containing the most discriminant SNPs were created for each preselection method. These panels were evaluated in terms of their ability to discriminate as a whole and breed-by-breed, as well as linkage disequilibrium within each panel. The obtained results showed that for the 48-SNP panel, the error rate increased mainly for the autochthonous breeds, probably as a consequence of their admixed origin, lower selection pressure, and ascertainment bias in the construction of the SNP chip. The 96-SNP panels were generally better able to discriminate all breeds. The panel derived by PCA-chrom (obtained by preselection chromosome by chromosome) identified informative SNPs that were particularly useful for the assignment of minor breeds, reaching the lowest Out Of Bag error even in the Cinisara, whose value was quite high in all other panels. Moreover, this panel also contained the lowest number of SNPs in linkage disequilibrium. Several selected SNPs are located near genes affecting breed-specific phenotypic traits (coat colour and stature) or associated with production traits. In
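
    A compact sketch of the preselection-plus-Random-Forest workflow on simulated genotypes, using an allele-frequency Delta for preselection and out-of-bag (OOB) error for evaluation (the study also used Fst and PCA preselection):

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(8)
    n_per_breed, n_snps = 60, 500
    freqs = rng.uniform(0.1, 0.9, size=(2, n_snps))      # two breeds' allele freqs
    X = np.vstack([rng.binomial(2, freqs[b], size=(n_per_breed, n_snps))
                   for b in range(2)])                   # 0/1/2 genotype counts
    y = np.repeat([0, 1], n_per_breed)

    delta = np.abs(X[y == 0].mean(axis=0) - X[y == 1].mean(axis=0)) / 2
    panel = np.argsort(delta)[-96:]                      # top-96 SNP panel

    rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
    rf.fit(X[:, panel], y)
    print("OOB error with 96-SNP panel: %.3f" % (1 - rf.oob_score_))
    ```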

  7. Genome-wide association study identifies multiple loci associated with bladder cancer risk

    PubMed Central

    Figueroa, Jonine D.; Ye, Yuanqing; Siddiq, Afshan; Garcia-Closas, Montserrat; Chatterjee, Nilanjan; Prokunina-Olsson, Ludmila; Cortessis, Victoria K.; Kooperberg, Charles; Cussenot, Olivier; Benhamou, Simone; Prescott, Jennifer; Porru, Stefano; Dinney, Colin P.; Malats, Núria; Baris, Dalsu; Purdue, Mark; Jacobs, Eric J.; Albanes, Demetrius; Wang, Zhaoming; Deng, Xiang; Chung, Charles C.; Tang, Wei; Bas Bueno-de-Mesquita, H.; Trichopoulos, Dimitrios; Ljungberg, Börje; Clavel-Chapelon, Françoise; Weiderpass, Elisabete; Krogh, Vittorio; Dorronsoro, Miren; Travis, Ruth; Tjønneland, Anne; Brenan, Paul; Chang-Claude, Jenny; Riboli, Elio; Conti, David; Gago-Dominguez, Manuela; Stern, Mariana C.; Pike, Malcolm C.; Van Den Berg, David; Yuan, Jian-Min; Hohensee, Chancellor; Rodabough, Rebecca; Cancel-Tassin, Geraldine; Roupret, Morgan; Comperat, Eva; Chen, Constance; De Vivo, Immaculata; Giovannucci, Edward; Hunter, David J.; Kraft, Peter; Lindstrom, Sara; Carta, Angela; Pavanello, Sofia; Arici, Cecilia; Mastrangelo, Giuseppe; Kamat, Ashish M.; Lerner, Seth P.; Barton Grossman, H.; Lin, Jie; Gu, Jian; Pu, Xia; Hutchinson, Amy; Burdette, Laurie; Wheeler, William; Kogevinas, Manolis; Tardón, Adonina; Serra, Consol; Carrato, Alfredo; García-Closas, Reina; Lloreta, Josep; Schwenn, Molly; Karagas, Margaret R.; Johnson, Alison; Schned, Alan; Armenti, Karla R.; Hosain, G.M.; Andriole, Gerald; Grubb, Robert; Black, Amanda; Ryan Diver, W.; Gapstur, Susan M.; Weinstein, Stephanie J.; Virtamo, Jarmo; Haiman, Chris A.; Landi, Maria T.; Caporaso, Neil; Fraumeni, Joseph F.; Vineis, Paolo; Wu, Xifeng; Silverman, Debra T.; Chanock, Stephen; Rothman, Nathaniel

    2014-01-01

    Candidate gene and genome-wide association studies (GWAS) have identified 11 independent susceptibility loci associated with bladder cancer risk. To discover additional risk variants, we conducted a new GWAS of 2422 bladder cancer cases and 5751 controls, followed by a meta-analysis with two independently published bladder cancer GWAS, resulting in a combined analysis of 6911 cases and 11 814 controls of European descent. TaqMan genotyping of 13 promising single nucleotide polymorphisms with P < 1 × 10⁻⁵ was pursued in a follow-up set of 801 cases and 1307 controls. Two new loci achieved genome-wide statistical significance: rs10936599 on 3q26.2 (P = 4.53 × 10⁻⁹) and rs907611 on 11p15.5 (P = 4.11 × 10⁻⁸). Two notable loci that approached genome-wide statistical significance were also identified: rs6104690 on 20p12.2 (P = 7.13 × 10⁻⁷) and rs4510656 on 6p22.3 (P = 6.98 × 10⁻⁷); these require further study for confirmation. In conclusion, our study has identified new susceptibility alleles for bladder cancer risk that require fine-mapping and laboratory investigation, which could further our understanding of the biological underpinnings of bladder carcinogenesis. PMID:24163127

  8. PROMISE: a tool to identify genomic features with a specific biologically interesting pattern of associations with multiple endpoint variables.

    PubMed

    Pounds, Stan; Cheng, Cheng; Cao, Xueyuan; Crews, Kristine R; Plunkett, William; Gandhi, Varsha; Rubnitz, Jeffrey; Ribeiro, Raul C; Downing, James R; Lamba, Jatinder

    2009-08-15

    In some applications, prior biological knowledge can be used to define a specific pattern of association of multiple endpoint variables with a genomic variable that is biologically most interesting. However, to our knowledge, there is no statistical procedure designed to detect specific patterns of association with multiple endpoint variables. Projection onto the most interesting statistical evidence (PROMISE) is proposed as a general procedure to identify genomic variables that exhibit a specific biologically interesting pattern of association with multiple endpoint variables. Biological knowledge of the endpoint variables is used to define a vector that represents the biologically most interesting values for statistics that characterize the associations of the endpoint variables with a genomic variable. A test statistic is defined as the dot-product of the vector of the observed association statistics and the vector of the most interesting values of the association statistics. By definition, this test statistic is proportional to the length of the projection of the observed vector of correlations onto the vector of most interesting associations. Statistical significance is determined via permutation. In simulation studies and an example application, PROMISE shows greater statistical power to identify genes with the interesting pattern of associations than classical multivariate procedures, individual endpoint analyses or listing genes that have the pattern of interest and are significant in more than one individual endpoint analysis. Documented R routines are freely available from www.stjuderesearch.org/depts/biostats and will soon be available as a Bioconductor package from www.bioconductor.org.
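
    The core of the PROMISE statistic is concrete enough to sketch: correlate the genomic variable with each endpoint, project the resulting vector onto the predefined "most interesting" pattern, and calibrate by permutation. The toy example below assumes three endpoints and a made-up pattern vector; it illustrates the idea and is not the published R routines.

    ```python
    # Minimal sketch of the PROMISE statistic: dot-product projection of observed
    # association statistics onto a biologically defined pattern, with a
    # permutation p-value. Endpoints and the pattern vector are illustrative.
    import numpy as np

    def promise_stat(gene, endpoints, interesting):
        # Observed association statistics: Pearson correlation with each endpoint.
        obs = np.array([np.corrcoef(gene, e)[0, 1] for e in endpoints])
        return obs @ interesting  # projection onto the "most interesting" direction

    rng = np.random.default_rng(1)
    n = 80
    gene = rng.normal(size=n)                  # genomic variable (e.g., expression)
    endpoints = [gene + rng.normal(size=n),    # endpoint 1: positive association expected
                 -gene + rng.normal(size=n),   # endpoint 2: negative association expected
                 rng.normal(size=n)]           # endpoint 3: no association expected
    interesting = np.array([1.0, -1.0, 0.0])   # biologically defined pattern vector

    t_obs = promise_stat(gene, endpoints, interesting)
    # Permutation null: shuffle the genomic variable, keeping endpoint structure intact.
    null = np.array([promise_stat(rng.permutation(gene), endpoints, interesting)
                     for _ in range(2000)])
    p = (1 + np.sum(null >= t_obs)) / (1 + len(null))
    print(f"PROMISE statistic = {t_obs:.3f}, permutation p = {p:.4f}")
    ```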

  9. A study of statistics anxiety levels of graduate dental hygiene students.

    PubMed

    Welch, Paul S; Jacks, Mary E; Smiley, Lynn A; Walden, Carolyn E; Clark, William D; Nguyen, Carol A

    2015-02-01

    In light of increased emphasis on evidence-based practice in the profession of dental hygiene, it is important that today's dental hygienists comprehend statistical measures to fully understand research articles and thereby apply scientific evidence to practice. Therefore, the purpose of this study was to investigate statistics anxiety among graduate dental hygiene students in the U.S. A web-based, self-report, anonymous survey was emailed to directors of 17 MSDH programs in the U.S. with a request to distribute it to graduate students. The survey collected data on statistics anxiety, sociodemographic characteristics and evidence-based practice. Statistics anxiety was assessed using the Statistical Anxiety Rating Scale. The study significance level was α=0.05. Only 8 of the 17 invited programs participated in the study. Statistical Anxiety Rating Scale data revealed that graduate dental hygiene students experience low to moderate levels of statistics anxiety. Specifically, the level of anxiety on the Interpretation Anxiety factor indicated this population could struggle with making sense of scientific research. A decisive majority (92%) of students indicated statistics is essential for evidence-based practice and should be a required course for all dental hygienists. This study served to identify statistics anxiety in a previously unexplored population. The findings should be useful in both theory building and in practical applications. Furthermore, the results can be used to direct future research. Copyright © 2015 The American Dental Hygienists’ Association.

  10. When is Chemical Similarity Significant? The Statistical Distribution of Chemical Similarity Scores and Its Extreme Values

    PubMed Central

    Baldi, Pierre

    2010-01-01

    As repositories of chemical molecules continue to expand and become more open, it becomes increasingly important to develop tools to search them efficiently and assess the statistical significance of chemical similarity scores. Here we develop a general framework for understanding, modeling, predicting, and approximating the distribution of chemical similarity scores and its extreme values in large databases. The framework can be applied to different chemical representations and similarity measures but is demonstrated here using the most common binary fingerprints with the Tanimoto similarity measure. After introducing several probabilistic models of fingerprints, including the Conditional Gaussian Uniform model, we show that the distribution of Tanimoto scores can be approximated by the distribution of the ratio of two correlated Normal random variables associated with the corresponding unions and intersections. This remains true when the distribution of similarity scores is conditioned on the size of the query molecules in order to derive more fine-grained results and improve chemical retrieval. The corresponding extreme value distributions for the maximum scores are approximated by Weibull distributions. From these various distributions and their analytical forms, Z-scores, E-values, and p-values are derived to assess the significance of similarity scores. In addition, the framework allows one to predict the value of standard chemical retrieval metrics, such as Sensitivity and Specificity at fixed thresholds, or ROC (Receiver Operating Characteristic) curves at multiple thresholds, and to detect outliers in the form of atypical molecules. Numerous and diverse experiments, carried out in part with large sets of molecules from ChemDB, show remarkable agreement between theory and empirical results. PMID:20540577
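
    A purely empirical version of this program is easy to prototype: sample fingerprints, record the maximum Tanimoto score per database, and fit an extreme-value law to the maxima. The sketch below uses random binary fingerprints and SciPy's weibull_min as a stand-in for the paper's analytical Weibull forms; all parameters (bit length, density, the observed best score) are invented.

    ```python
    # Empirical Tanimoto score distribution and an extreme-value fit for the
    # maximum score. Fingerprints are random binary vectors; the Weibull
    # parameterization here is illustrative, not the paper's derivation.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n_bits, density, db_size, trials = 1024, 0.1, 1000, 100

    query = rng.random(n_bits) < density
    maxima = []
    for _ in range(trials):
        db = rng.random((db_size, n_bits)) < density
        inter = np.logical_and(db, query).sum(axis=1)   # fingerprint intersections
        union = np.logical_or(db, query).sum(axis=1)    # fingerprint unions
        maxima.append((inter / union).max())            # best Tanimoto score in this database
    maxima = np.array(maxima)

    # Z-score of an observed best hit against the empirical max distribution,
    # plus a Weibull fit to the maxima.
    best = 0.2
    z = (best - maxima.mean()) / maxima.std()
    shape, loc, scale = stats.weibull_min.fit(maxima)
    p = stats.weibull_min.sf(best, shape, loc=loc, scale=scale)
    print(f"Z = {z:.2f}, Weibull tail p ≈ {p:.3g}")
    ```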

  11. Meteor trail footprint statistics

    NASA Astrophysics Data System (ADS)

    Mui, S. Y.; Ellicott, R. C.

    Footprint statistics derived from field-test data are presented. The statistics are the probability that two receivers will lie in the same footprint. The dependence of the footprint statistics on the transmitter range, link orientation, and antenna polarization is examined. Empirical expressions for the footprint statistics are presented. The need to distinguish the instantaneous footprint, which is the area illuminated at a particular instant, from the composite footprint, which is the total area illuminated during the lifetime of the meteor trail, is explained. The statistics for the instantaneous and composite footprints have been found to be similar. The only significant difference lies in the parameter that represents the probability of two colocated receivers being in the same footprint. The composite footprint statistics can be used to calculate the space diversity gain of a multiple-receiver system. The instantaneous footprint statistics are useful in the evaluation of the interference probability in a network of meteor burst communication nodes.

  12. Genome-wide Association Study Identifies New Loci for Resistance to Leptosphaeria maculans in Canola

    PubMed Central

    Raman, Harsh; Raman, Rosy; Coombes, Neil; Song, Jie; Diffey, Simon; Kilian, Andrzej; Lindbeck, Kurt; Barbulescu, Denise M.; Batley, Jacqueline; Edwards, David; Salisbury, Phil A.; Marcroft, Steve

    2016-01-01

    Key message “We identified both qualitative and quantitative resistance loci to Leptosphaeria maculans, a fungal pathogen causing blackleg disease in canola. Several genome-wide significant associations were detected at known and new loci for blackleg resistance. We further validated statistically significant associations in four genetic mapping populations, demonstrating that GWAS marker loci are indeed associated with resistance to L. maculans. One of the novel loci identified for the first time, Rlm12, conveys adult plant resistance in canola.” Blackleg, caused by Leptosphaeria maculans, is a significant disease which affects the sustainable production of canola (Brassica napus). This study reports a genome-wide association study based on 18,804 polymorphic SNPs to identify loci associated with qualitative and quantitative resistance to L. maculans. Genomic regions delimited by 694 significant SNP markers, associated with resistance evaluated using 12 single-spore isolates and pathotypes from four canola stubbles, were identified. Several significant associations were detected at known disease resistance loci, including in the vicinity of the recently cloned Rlm2/LepR3 genes, and at new loci on chromosomes A01/C01, A02/C02, A03/C03, A05/C05, A06, A08, and A09. In addition, we validated statistically significant associations on A01, A07, and A10 in four genetic mapping populations, demonstrating that GWAS marker loci are indeed associated with resistance to L. maculans. One of the novel loci identified for the first time, Rlm12, conveys adult plant resistance and mapped within 13.2 kb of an Arabidopsis R gene of the TIR-NBS class. We showed that resistance loci are located in the vicinity of R genes of Arabidopsis thaliana and Brassica napus on the sequenced genome of B. napus cv. Darmor-bzh. Significantly associated SNP markers provide a valuable tool to enrich germplasm for favorable alleles in order to improve the level of resistance to L. maculans in

  13. Statistical lamb wave localization based on extreme value theory

    NASA Astrophysics Data System (ADS)

    Harley, Joel B.

    2018-04-01

    Guided wave localization methods based on delay-and-sum imaging, matched field processing, and other techniques have been designed and researched to create images that locate and describe structural damage. The maximum value of these images typically represents an estimated damage location. Yet, it is often unclear whether this maximum value, or any other value in the image, is a statistically significant indicator of damage. Furthermore, there are currently few, if any, approaches to assess the statistical significance of guided wave localization images. As a result, we present statistical delay-and-sum and statistical matched field processing localization methods to create statistically significant images of damage. Our framework uses constant false alarm rate statistics and extreme value theory to detect damage with little prior information. We demonstrate our methods with in situ guided wave data from an aluminum plate to detect two 0.75 cm diameter holes. Our results show an expected improvement in statistical significance as the number of sensors increases. With seventeen sensors, both methods successfully detect damage with statistical significance.
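
    A toy version of the approach: build a delay-and-sum image, then declare a detection only if its peak exceeds an extreme-value threshold calibrated on noise-only images. The geometry, wave speed, pulse model, and Gumbel fit below are illustrative assumptions, not the paper's exact CFAR construction.

    ```python
    # Sketch: delay-and-sum localization with an extreme-value detection threshold.
    # All physical parameters and the damage location are invented for illustration.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    c, fs, T = 5000.0, 1e6, 2e-3                       # wave speed (m/s), sample rate, duration
    t = np.arange(int(T * fs)) / fs
    sensors = np.array([[0.0, 0.0], [0.6, 0.0], [0.0, 0.6], [0.6, 0.6]])
    damage = np.array([0.42, 0.18])

    def record(src):
        """Noisy pulse envelope arriving at each sensor with delay from src."""
        out = []
        for s in sensors:
            tau = np.linalg.norm(s - src) / c
            out.append(np.exp(-((t - tau) * 2e4) ** 2) + 0.3 * rng.normal(size=t.size))
        return np.array(out)

    def das_image(data, grid):
        """Delay-and-sum: sample each channel at its predicted delay and sum."""
        img = np.zeros(len(grid))
        for i, p in enumerate(grid):
            taus = np.linalg.norm(sensors - p, axis=1) / c
            idx = np.clip((taus * fs).astype(int), 0, t.size - 1)
            img[i] = np.abs(data[np.arange(len(sensors)), idx]).sum()
        return img

    xs = np.linspace(0, 0.6, 31)
    grid = np.array([[x, y] for x in xs for y in xs])

    # Extreme-value threshold: fit a Gumbel law to maxima of noise-only images.
    noise_maxima = [das_image(0.3 * rng.normal(size=(4, t.size)), grid).max()
                    for _ in range(50)]
    thresh = stats.gumbel_r.ppf(0.99, *stats.gumbel_r.fit(noise_maxima))

    img = das_image(record(damage), grid)
    print("peak:", grid[img.argmax()], "significant:", img.max() > thresh)
    ```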

  14. The significance of organ prolapse in gastroschisis.

    PubMed

    Koehler, Shannon M; Szabo, Aniko; Loichinger, Matt; Peterson, Erika; Christensen, Melissa; Wagner, Amy J

    2017-12-01

    The aim of this study was to evaluate the incidence and importance of organ prolapse (stomach, bladder, reproductive organs) in gastroschisis. This is a retrospective review of gastroschisis patients from 2000 to 2014 at a single tertiary institution. Statistical analysis was performed using a chi-square test, Student's t test, log-rank test, or Cox regression analysis models. All tests were conducted as two-tailed tests, and p-values <0.05 were considered statistically significant. One hundred seventy-one gastroschisis patients were identified. Sixty-nine (40.6%) had at least one prolapsed organ besides bowel. The most commonly prolapsed organs were stomach (n=45, 26.3%), reproductive organs (n=34, 19.9%), and bladder (n=15, 8.8%). Patients with prolapsed organs were more likely to have simple gastroschisis, with significant decreases in the rates of atresia and necrosis/perforation. They progressed earlier to enteral feeds, discontinuation of parenteral nutrition, and discharge. Likewise, these patients were less likely to have complications such as central line infections, sepsis, and short gut syndrome. Gastroschisis is typically described as isolated bowel herniation, but a large proportion of patients have prolapse of other organs. Prolapsed organs are associated with simple gastroschisis and improved outcomes, most likely due to a larger fascial defect. This may be useful for prenatal and postnatal counseling of families. Case Control/Retrospective Comparative Study. Level III. Copyright © 2017 Elsevier Inc. All rights reserved.

  15. Regional and Temporal Variation in Methamphetamine-Related Incidents: Applications of Spatial and Temporal Scan Statistics

    PubMed Central

    Sudakin, Daniel L.

    2009-01-01

    Introduction This investigation utilized spatial scan statistics, geographic information systems and multiple data sources to assess spatial clustering of statewide methamphetamine-related incidents. Temporal and spatial associations with regulatory interventions to reduce access to precursor chemicals (pseudoephedrine) were also explored. Methods Four statewide data sources were utilized including regional poison control center statistics, fatality incidents, methamphetamine laboratory seizures, and hazardous substance releases involving methamphetamine laboratories. Spatial clustering of methamphetamine incidents was assessed using SaTScan™. SaTScan™ was also utilized to assess space-time clustering of methamphetamine laboratory incidents, in relation to the enactment of regulations to reduce access to pseudoephedrine. Results Five counties with a significantly higher relative risk of methamphetamine-related incidents were identified. The county identified as the most likely cluster had a significantly elevated relative risk of methamphetamine laboratories (RR=11.5), hazardous substance releases (RR=8.3), and fatalities relating to methamphetamine (RR=1.4). A significant increase in relative risk of methamphetamine laboratory incidents was apparent in this same geographic area (RR=20.7) during the time period when regulations were enacted in 2004 and 2005, restricting access to pseudoephedrine. Subsequent to the enactment of these regulations, a significantly lower rate of incidents (RR 0.111, p=0.0001) was observed over a large geographic area of the state, including regions that previously had significantly higher rates. Conclusions Spatial and temporal scan statistics can be effectively applied to multiple data sources to assess regional variation in methamphetamine-related incidents, and explore the impact of preventive regulatory interventions. PMID:19225949
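
    The scan-statistic machinery used by SaTScan™ can be illustrated with a toy Poisson version: grow circular windows around each centroid, score each window with Kulldorff's log-likelihood ratio, and calibrate the maximum by Monte Carlo. Coordinates, populations, and counts below are simulated stand-ins for the surveillance data.

    ```python
    # Toy spatial scan in the spirit of SaTScan's Poisson model. Simulated data.
    import numpy as np

    rng = np.random.default_rng(4)
    n = 36
    xy = rng.uniform(0, 100, size=(n, 2))                # county centroids
    pop = rng.integers(1000, 20000, size=n).astype(float)
    cases = rng.poisson(pop * 0.002)
    cases[0] += 40                                        # implant a cluster at county 0

    def loglr(c, e, C, E):
        """Kulldorff Poisson log-likelihood ratio for a window (c cases, e expected)."""
        if c == 0 or c / e <= (C - c) / (E - e):
            return 0.0
        out = c * np.log(c / e)
        if C > c:
            out += (C - c) * np.log((C - c) / (E - e))
        return out

    def best_cluster(cs):
        C = cs.sum()
        exp = pop * C / pop.sum()                         # expected counts; total equals C
        best = 0.0
        for i in range(n):                                # windows: i plus nearest neighbours
            order = np.argsort(np.linalg.norm(xy - xy[i], axis=1))
            for k in range(1, n // 2):
                idx = order[:k]
                best = max(best, loglr(cs[idx].sum(), exp[idx].sum(), C, C))
        return best

    obs = best_cluster(cases)
    # Monte Carlo: redistribute all cases across counties proportional to population.
    null = [best_cluster(rng.multinomial(cases.sum(), pop / pop.sum()))
            for _ in range(199)]
    p = (1 + sum(b >= obs for b in null)) / 200
    print(f"max log-LR = {obs:.1f}, Monte Carlo p = {p:.3f}")
    ```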

  16. Physical and genetic-interaction density reveals functional organization and informs significance cutoffs in genome-wide screens

    PubMed Central

    Dittmar, John C.; Pierce, Steven; Rothstein, Rodney; Reid, Robert J. D.

    2013-01-01

    Genome-wide experiments often measure quantitative differences between treated and untreated cells to identify affected strains. For these studies, statistical models are typically used to determine significance cutoffs. We developed a method termed “CLIK” (Cutoff Linked to Interaction Knowledge) that overlays biological knowledge from the interactome on screen results to derive a cutoff. The method takes advantage of the fact that groups of functionally related interacting genes often respond similarly to experimental conditions and, thus, cluster in a ranked list of screen results. We applied CLIK analysis to five screens of the yeast gene disruption library and found that it defined a significance cutoff that differed from traditional statistics. Importantly, verification experiments revealed that the CLIK cutoff correlated with the position in the rank order where the rate of true positives drops off significantly. In addition, the gene sets defined by CLIK analysis often provide further biological perspectives. For example, applying CLIK analysis retrospectively to a screen for cisplatin sensitivity allowed us to identify the importance of the Hrq1 helicase in DNA crosslink repair. Furthermore, we demonstrate the utility of CLIK to determine optimal treatment conditions by analyzing genome-wide screens at multiple rapamycin concentrations. We show that CLIK is an extremely useful tool for evaluating screen quality, determining screen cutoffs, and comparing results between screens. Furthermore, because CLIK uses previously annotated interaction data to determine biologically informed cutoffs, it provides additional insights into screen results, which supplement traditional statistical approaches. PMID:23589890
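
    The intuition behind CLIK, that interacting, functionally related genes pile up near the top of a ranked screen, can be mimicked with a crude enrichment scan over rank cutoffs. The scoring below is a simplification for illustration, not the published algorithm, and the gene list and interactome are simulated.

    ```python
    # Rough sketch of an interaction-informed cutoff: pick the rank cutoff where
    # interaction density among the top-k genes is most enriched over expectation.
    import numpy as np

    rng = np.random.default_rng(5)
    n_genes = 1000
    ranked = np.arange(n_genes)                 # genes sorted by screen score
    # Simulate an interactome enriched among the first 120 "true hit" genes.
    pairs = set()
    while len(pairs) < 600:
        a, b = (rng.integers(0, 120, size=2) if rng.random() < 0.4
                else rng.integers(0, n_genes, size=2))
        if a != b:
            pairs.add((min(a, b), max(a, b)))

    def density(k):
        """Observed / expected interacting pairs among the top-k ranked genes."""
        top = set(ranked[:k])
        obs = sum((a in top) and (b in top) for a, b in pairs)
        exp = len(pairs) * (k / n_genes) ** 2   # chance of both endpoints in top-k
        return obs / exp if exp > 0 else 0.0

    ks = range(20, 500, 10)
    enrich = [density(k) for k in ks]
    cutoff = list(ks)[int(np.argmax(enrich))]
    print(f"suggested cutoff ≈ top {cutoff} genes (enrichment {max(enrich):.1f}x)")
    ```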

  17. Statistical and molecular analyses of evolutionary significance of red-green color vision and color blindness in vertebrates.

    PubMed

    Yokoyama, Shozo; Takenaka, Naomi

    2005-04-01

    Red-green color vision is strongly suspected to enhance the survival of its possessors. Despite being red-green color blind, however, many species have successfully competed in nature, which brings into question the evolutionary advantage of achieving red-green color vision. Here, we propose a new method of identifying positive selection at individual amino acid sites with the premise that if positive Darwinian selection has driven the evolution of the protein under consideration, then it should be found mostly at the branches in the phylogenetic tree where its function had changed. The statistical and molecular methods have been applied to 29 visual pigments with the wavelengths of maximal absorption at approximately 510-540 nm (green- or middle wavelength-sensitive [MWS] pigments) and at approximately 560 nm (red- or long wavelength-sensitive [LWS] pigments), which are sampled from a diverse range of vertebrate species. The results show that the MWS pigments are positively selected through amino acid replacements S180A, Y277F, and T285A and that the LWS pigments have been subjected to strong evolutionary conservation. The fact that these positively selected M/LWS pigments are found not only in animals with red-green color vision but also in those with red-green color blindness strongly suggests that both red-green color vision and color blindness have undergone adaptive evolution independently in different species.

  18. Statistical tests to compare motif count exceptionalities

    PubMed Central

    Robin, Stéphane; Schbath, Sophie; Vandewalle, Vincent

    2007-01-01

    Background Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculations for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Just comparing the motif count p-values in each sequence is not sufficient to decide whether this motif is significantly more exceptional in one sequence than in the other. A statistical test is required. Results We develop and analyze two statistical tests, an exact binomial one and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For this purpose, motif occurrences are modeled by Poisson processes, with special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion The exact binomial test is particularly suited to small counts. For large counts, we advise using the likelihood ratio test, which is asymptotic but strongly correlated with the exact binomial test and very simple to use. PMID:17346349
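
    The exact binomial test has a one-line core: under the Poisson model, conditioning on the total count makes the count in one sequence binomial, with success probability fixed by the two expected counts. A minimal sketch with invented numbers:

    ```python
    # Exact binomial comparison of one motif's exceptionality in two sequences.
    from scipy.stats import binomtest

    n1, n2 = 43, 12          # observed occurrences of the motif in sequences 1 and 2
    e1, e2 = 25.0, 20.0      # expected counts under each sequence's composition model

    # Under H0 (equal exceptionality), n1 | n1+n2 ~ Binomial(n1+n2, e1/(e1+e2)).
    res = binomtest(n1, n=n1 + n2, p=e1 / (e1 + e2), alternative="two-sided")
    print(f"exact binomial p = {res.pvalue:.4g}")
    ```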

  19. The Development of Statistical Literacy at School

    ERIC Educational Resources Information Center

    Callingham, Rosemary; Watson, Jane M.

    2017-01-01

    Statistical literacy increasingly is considered an important outcome of schooling. There is little information, however, about appropriate expectations of students at different stages of schooling. Some progress towards this goal was made by Watson and Callingham (2005), who identified an empirical 6-level hierarchy of statistical literacy and the…

  20. Identifying and characterizing hepatitis C virus hotspots in Massachusetts: a spatial epidemiological approach.

    PubMed

    Stopka, Thomas J; Goulart, Michael A; Meyers, David J; Hutcheson, Marga; Barton, Kerri; Onofrey, Shauna; Church, Daniel; Donahue, Ashley; Chui, Kenneth K H

    2017-04-20

    Hepatitis C virus (HCV) infections have increased during the past decade but little is known about geographic clustering patterns. We used a unique analytical approach, combining geographic information systems (GIS), spatial epidemiology, and statistical modeling to identify and characterize HCV hotspots, statistically significant clusters of census tracts with elevated HCV counts and rates. We compiled sociodemographic and HCV surveillance data (n = 99,780 cases) for Massachusetts census tracts (n = 1464) from 2002 to 2013. We used a five-step spatial epidemiological approach, calculating incremental spatial autocorrelations and Getis-Ord Gi* statistics to identify clusters. We conducted logistic regression analyses to determine factors associated with the HCV hotspots. We identified nine HCV clusters, with the largest in Boston, New Bedford/Fall River, Worcester, and Springfield (p < 0.05). In multivariable analyses, we found that HCV hotspots were independently and positively associated with the percent of the population that was Hispanic (adjusted odds ratio [AOR]: 1.07; 95% confidence interval [CI]: 1.04, 1.09) and the percent of households receiving food stamps (AOR: 1.83; 95% CI: 1.22, 2.74). HCV hotspots were independently and negatively associated with the percent of the population that were high school graduates or higher (AOR: 0.91; 95% CI: 0.89, 0.93) and the percent of the population in the "other" race/ethnicity category (AOR: 0.88; 95% CI: 0.85, 0.91). We identified locations where HCV clusters were a concern, and where enhanced HCV prevention, treatment, and care can help combat the HCV epidemic in Massachusetts. GIS, spatial epidemiological and statistical analyses provided a rigorous approach to identify hotspot clusters of disease, which can inform public health policy and intervention targeting. Further studies that incorporate spatiotemporal cluster analyses, Bayesian spatial and geostatistical models, spatially weighted regression
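
    The hotspot step rests on the Getis-Ord Gi* statistic, which is simple to compute directly. The sketch below implements the textbook Gi* formula on simulated tract counts with a k-nearest-neighbour weights matrix; the study's actual weights came from incremental spatial autocorrelation, and all data here are invented.

    ```python
    # Minimal Getis-Ord Gi* hotspot computation on simulated tract counts.
    import numpy as np

    rng = np.random.default_rng(6)
    n, k = 400, 8
    xy = rng.uniform(0, 50, size=(n, 2))                  # tract centroids
    x = rng.poisson(5, size=n).astype(float)              # case counts per tract
    hot = np.linalg.norm(xy - [10, 10], axis=1) < 5
    x[hot] += rng.poisson(15, size=hot.sum())             # implant a hotspot

    # Binary kNN weights, including self (the Gi* convention).
    d = np.linalg.norm(xy[:, None] - xy[None, :], axis=2)
    W = np.zeros((n, n))
    for i in range(n):
        W[i, np.argsort(d[i])[:k + 1]] = 1.0              # self is its own nearest neighbour

    xbar, s = x.mean(), x.std()
    wsum = W.sum(axis=1)
    num = W @ x - xbar * wsum
    den = s * np.sqrt((n * (W ** 2).sum(axis=1) - wsum ** 2) / (n - 1))
    gi_star = num / den                                    # ~ standard normal under H0

    print("tracts with Gi* > 1.96:", int((gi_star > 1.96).sum()))
    ```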

  1. A statistical method (cross-validation) for bone loss region detection after spaceflight

    PubMed Central

    Zhao, Qian; Li, Wenjun; Li, Caixia; Chu, Philip W.; Kornak, John; Lang, Thomas F.

    2010-01-01

    Astronauts experience bone loss after long spaceflight missions. Identifying specific regions that undergo the greatest losses (e.g., the proximal femur) could reveal information about the processes of bone loss in disuse and disease. Methods for detecting such regions, however, remain an open problem. This paper focuses on statistical methods to detect such regions. We perform statistical parametric mapping to get t-maps of changes in images and propose a new cross-validation method to select an optimal suprathreshold for forming clusters of pixels. Once these candidate clusters are formed, we use permutation testing of longitudinal labels to derive significant changes. PMID:20632144
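
    Cluster-level permutation inference of this kind can be sketched in a few lines: threshold a voxelwise t-map, form contiguous suprathreshold clusters, and compare the largest cluster mass against a null built by flipping the longitudinal labels. The fixed threshold below replaces the paper's cross-validated choice, and the 1D "map" is simulated.

    ```python
    # Sketch of cluster-based permutation inference on a voxelwise t-map.
    import numpy as np
    from scipy import stats, ndimage

    rng = np.random.default_rng(7)
    n_subj, n_vox = 20, 500
    change = rng.normal(0, 1, size=(n_subj, n_vox))   # per-subject longitudinal change
    change[:, 200:230] -= 0.8                         # region of true bone loss

    def max_cluster_mass(data, thr):
        tmap = stats.ttest_1samp(data, 0.0).statistic
        labels, nlab = ndimage.label(np.abs(tmap) > thr)   # suprathreshold clusters
        if nlab == 0:
            return 0.0
        return max(np.abs(tmap)[labels == i].sum() for i in range(1, nlab + 1))

    thr = 2.0                                         # fixed cluster-forming threshold
    obs = max_cluster_mass(change, thr)
    # Null: randomly flip each subject's longitudinal label (sign of the change).
    null = [max_cluster_mass(change * rng.choice([-1, 1], size=(n_subj, 1)), thr)
            for _ in range(500)]
    p = (1 + sum(m >= obs for m in null)) / 501
    print(f"max cluster mass = {obs:.1f}, permutation p = {p:.3f}")
    ```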

  2. Methodology to identify risk-significant components for inservice inspection and testing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, M.T.; Hartley, R.S.; Jones, J.L. Jr.

    1992-08-01

    Periodic inspection and testing of vital system components should be performed to ensure the safe and reliable operation of Department of Energy (DOE) nuclear processing facilities. Probabilistic techniques may be used to help identify and rank components by their relative risk. A risk-based ranking would allow varied DOE sites to implement inspection and testing programs in an effective and cost-efficient manner. This report describes a methodology that can be used to rank components, while addressing multiple risk issues.

  3. A Technology-Based Statistical Reasoning Assessment Tool in Descriptive Statistics for Secondary School Students

    ERIC Educational Resources Information Center

    Chan, Shiau Wei; Ismail, Zaleha

    2014-01-01

    The focus of assessment in statistics has gradually shifted from traditional assessment towards alternative assessment where more attention has been paid to the core statistical concepts such as center, variability, and distribution. In spite of this, there are comparatively few assessments that combine the significant three types of statistical…

  4. Timescales for detecting a significant acceleration in sea level rise

    PubMed Central

    Haigh, Ivan D.; Wahl, Thomas; Rohling, Eelco J.; Price, René M.; Pattiaratchi, Charitha B.; Calafat, Francisco M.; Dangendorf, Sönke

    2014-01-01

    There is observational evidence that global sea level is rising and there is concern that the rate of rise will increase, significantly threatening coastal communities. However, considerable debate remains as to whether the rate of sea level rise is currently increasing and, if so, by how much. Here we provide new insights into sea level accelerations by applying the main methods that have been used previously to search for accelerations in historical data, to identify the timings (with uncertainties) at which accelerations might first be recognized in a statistically significant manner (if not apparent already) in sea level records that we have artificially extended to 2100. We find that the most important approach to earliest possible detection of a significant sea level acceleration lies in improved understanding (and subsequent removal) of interannual to multidecadal variability in sea level records. PMID:24728012
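
    The workhorse test in this literature is simple: fit a quadratic to the sea-level series and ask whether the curvature term is distinguishable from zero given the noise. A sketch on a synthetic record (the acceleration, noise level, and record length are assumptions) shows why interannual variability dominates detectability:

    ```python
    # Quadratic-fit acceleration test on a synthetic annual sea-level series.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    years = np.arange(1900, 2101)
    t = years - years.mean()
    true_accel = 0.01                              # mm/yr^2, assumed for the simulation
    y = 3.0 * t + 0.5 * true_accel * t ** 2 + rng.normal(0, 25, size=t.size)

    X = np.column_stack([np.ones_like(t), t, t ** 2])
    beta, res, *_ = np.linalg.lstsq(X, y, rcond=None)
    dof = len(y) - X.shape[1]
    sigma2 = res[0] / dof                          # residual variance estimate
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X).diagonal())
    t_c = beta[2] / se[2]                          # t-statistic for the quadratic term
    p = 2 * stats.t.sf(abs(t_c), dof)
    print(f"acceleration = {2 * beta[2]:.4f} mm/yr^2, p = {p:.3g}")
    ```

    Shortening the record or inflating the noise term in this sketch quickly pushes the p-value above conventional thresholds, which is the abstract's point about removing interannual-to-multidecadal variability.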

  5. How to read a paper. Statistics for the non-statistician. II: "Significant" relations and their pitfalls.

    PubMed

    Greenhalgh, T

    1997-08-16

    It is possible to be seriously misled by taking the statistical competence (and/or the intellectual honesty) of authors for granted. Some common errors committed (deliberately or inadvertently) by the authors of papers are given in the final box.

  6. Identifying the most significant indicators of the total road safety performance index.

    PubMed

    Tešić, Milan; Hermans, Elke; Lipovac, Krsto; Pešić, Dalibor

    2018-04-01

    The review of the national and international literature dealing with the assessment of road safety levels has shown great efforts by authors who have tried to define a methodology for calculating a composite road safety index for a territory (region, state, etc.). The procedure for obtaining a composite road safety index for an area has been largely harmonized. The question that has not yet been fully resolved concerns the selection of indicators. There is a wide range of road safety indicators used to describe the road safety situation in a territory. A road safety performance index (RSPI) obtained on the basis of a larger number of safety performance indicators (SPIs) enables decision makers to define earlier goal-oriented actions more precisely. Moreover, recording a broader, comprehensive set of SPIs helps identify the strengths and weaknesses of a country's road safety system. Providing high-quality national and international databases that include comparable SPIs is difficult, since many countries have only a small number of identical indicators available for use. Therefore, there is a need to calculate a road safety performance index with a limited number of indicators (RSPIlnn) that will provide a comparison of sufficient quality for as many countries as possible. The application of the Data Envelopment Analysis (DEA) method and correlation analysis has helped to check whether the RSPIlnn is likely to be of sufficient quality. A strong correlation between the RSPIlnn and the RSPI was identified using the proposed methodology. Based on this, the most contributing indicators, and methodologies for gradual monitoring of SPIs, were defined for each country analyzed. The indicator monitoring phases in the analyzed countries were defined as follows: Phase 1 - indicators relating to alcohol, speed and protective systems; Phase 2 - indicators relating to roads; and Phase 3 - indicators relating to

  7. RED: A Java-MySQL Software for Identifying and Visualizing RNA Editing Sites Using Rule-Based and Statistical Filters.

    PubMed

    Sun, Yongmei; Li, Xing; Wu, Di; Pan, Qi; Ji, Yuefeng; Ren, Hong; Ding, Keyue

    2016-01-01

    RNA editing is one of the post- or co-transcriptional processes that can lead to amino acid substitutions in protein sequences, alternative pre-mRNA splicing, and changes in gene expression levels. Although several methods have been suggested to identify RNA editing sites, challenges remain in distinguishing true RNA editing sites from their genomic counterparts and from technical artifacts. In addition, a software framework to identify and visualize potential RNA editing sites has been lacking. Here, we present RED (RNA Editing sites Detector), software for the identification of RNA editing sites that integrates multiple rule-based and statistical filters. The potential RNA editing sites can be visualized at the genome and the site levels via a graphical user interface (GUI). To improve performance, we used the MySQL database management system (DBMS) for high-throughput data storage and query. We demonstrated the validity and utility of RED by identifying the presence and absence of experimentally validated C→U RNA editing sites, in comparison with REDItools, a command-line tool for high-throughput investigation of RNA editing. In an analysis of a sample dataset with 28 experimentally validated C→U RNA editing sites, RED had sensitivity and specificity of 0.64 and 0.5. In comparison, REDItools had better sensitivity (0.75) but similar specificity (0.5). RED is easy-to-use, platform-independent, Java-based software, and can be applied to RNA-seq data with or without DNA sequencing data. The package is freely available under the GPLv3 license at http://github.com/REDetector/RED or https://sourceforge.net/projects/redetector.
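
    The rule-based half of such a pipeline reduces to a chain of per-site predicates. The filter below is a loose, invented illustration of the kind of thresholds RED combines (coverage, supporting reads, base quality, known-SNP exclusion); the field names and cutoffs are not from the actual software.

    ```python
    # Hypothetical rule-based filtering of candidate RNA editing sites.
    candidates = [
        # (chrom, pos, ref, alt, coverage, alt_reads, mean_base_quality, known_SNP)
        ("chr1", 10234, "C", "T", 42, 11, 34.0, False),
        ("chr1", 22019, "C", "T", 6, 2, 18.0, False),    # fails coverage/quality
        ("chr2", 5512, "C", "T", 55, 20, 35.5, True),    # fails the known-SNP filter
        ("chr3", 90821, "A", "G", 30, 9, 31.0, False),   # not a C->U candidate
    ]

    def passes(site, min_cov=10, min_alt=3, min_q=25.0, edit=("C", "T")):
        """Return True if a candidate site survives all rule-based filters."""
        chrom, pos, ref, alt, cov, alt_reads, q, snp = site
        return ((ref, alt) == edit and cov >= min_cov and alt_reads >= min_alt
                and q >= min_q and not snp)

    kept = [s for s in candidates if passes(s)]
    print(f"{len(kept)} of {len(candidates)} candidate sites pass the rule-based filters")
    ```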

  10. A Multinomial Model for Identifying Significant Pure-Tone Threshold Shifts

    ERIC Educational Resources Information Center

    Schlauch, Robert S.; Carney, Edward

    2007-01-01

    Purpose: Significant threshold differences on retest for pure-tone audiometry are often evaluated by application of ad hoc rules, such as a shift in a pure-tone average or in 2 adjacent frequencies that exceeds a predefined amount. Rules that are so derived do not consider the probability of observing a particular audiogram. Methods: A general…

  11. Using scan statistics for congenital anomalies surveillance: the EUROCAT methodology.

    PubMed

    Teljeur, Conor; Kelly, Alan; Loane, Maria; Densem, James; Dolk, Helen

    2015-11-01

    Scan statistics have been used extensively to identify temporal clusters of health events. We describe the temporal cluster detection methodology adopted by the EUROCAT (European Surveillance of Congenital Anomalies) monitoring system. Since 2001, EUROCAT has implemented a variable window width scan statistic for detecting unusual temporal aggregations of congenital anomaly cases. The scan windows are based on numbers of cases rather than being defined by time. The methodology is embedded in the EUROCAT Central Database for annual application to centrally held registry data. The methodology was incrementally adapted to improve its utility and to address statistical issues. Simulation exercises were used to determine the power of the methodology to identify periods of raised risk (of 1-18 months). In order to operationalize the scan methodology, a number of adaptations were needed, including: estimating date of conception as the unit of time; deciding the maximum length (in time) and recency of clusters of interest; reporting multiple and overlapping significant clusters; replacing the Monte Carlo simulation with a lookup table to reduce computation time; and placing a threshold on underlying population change and estimating the false positive rate by simulation. Exploration of power found that raised-risk periods lasting 1 month are unlikely to be detected except when the relative risk and case counts are high. The variable window width scan statistic is a useful tool for the surveillance of congenital anomalies. Numerous adaptations have improved the utility of the original methodology in the context of temporal cluster detection in congenital anomalies.
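
    A toy version of a case-defined scan window makes the idea concrete: slide over k consecutive cases, find the shortest time span containing them, and calibrate that minimum span by Monte Carlo under a uniform-occurrence null (a simplification of EUROCAT's expected-count model). All dates are simulated.

    ```python
    # Toy variable-window scan where windows are defined by numbers of cases.
    import numpy as np

    rng = np.random.default_rng(9)
    span_days = 10 * 365
    dates = np.sort(rng.uniform(0, span_days, size=120))   # estimated conception dates
    dates[60:75] = rng.uniform(2000, 2060, size=15)        # implant a tight cluster
    dates = np.sort(dates)

    def min_span(d, k):
        """Shortest time span covering any k consecutive cases."""
        return (d[k - 1:] - d[:-(k - 1)]).min()

    k = 10
    obs = min_span(dates, k)
    # Monte Carlo null: the same number of cases spread uniformly over the period.
    null = [min_span(np.sort(rng.uniform(0, span_days, size=dates.size)), k)
            for _ in range(999)]
    p = (1 + sum(m <= obs for m in null)) / 1000
    print(f"shortest {k}-case window = {obs:.0f} days, Monte Carlo p = {p:.3f}")
    ```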

  12. From Correlates to Causes: Can Quasi-Experimental Studies and Statistical Innovations Bring Us Closer to Identifying the Causes of Antisocial Behavior?

    PubMed Central

    Jaffee, Sara R.; Strait, Luciana B.; Odgers, Candice L.

    2011-01-01

    Longitudinal, epidemiological studies have identified robust risk factors for youth antisocial behavior, including harsh and coercive discipline, maltreatment, smoking during pregnancy, divorce, teen parenthood, peer deviance, parental psychopathology, and social disadvantage. Nevertheless, because this literature is largely based on observational studies, it remains unclear whether these risk factors have truly causal effects. Identifying causal risk factors for antisocial behavior would be informative for intervention efforts and for studies that test whether individuals are differentially susceptible to risk exposures. In this paper, we identify the challenges to causal inference posed by observational studies and describe quasi-experimental methods and statistical innovations that may move us beyond discussions of risk factors to allow for stronger causal inference. We then review studies that use these methods and we evaluate whether robust risk factors identified from observational studies are likely to play a causal role in the emergence and development of youth antisocial behavior. For most of the risk factors we review, there is evidence that they have causal effects. However, these effects are typically smaller than those reported in observational studies, suggesting that familial confounding, social selection, and misidentification might also explain some of the association between risk exposures and antisocial behavior. For some risk factors (e.g., smoking during pregnancy, parent alcohol problems) the evidence is weak that they have environmentally mediated effects on youth antisocial behavior. We discuss the implications of these findings for intervention efforts to reduce antisocial behavior and for basic research on the etiology and course of antisocial behavior. PMID:22023141

  14. Empirical Bayes scan statistics for detecting clusters of disease risk variants in genetic studies.

    PubMed

    McCallum, Kenneth J; Ionita-Laza, Iuliana

    2015-12-01

    Recent developments of high-throughput genomic technologies offer an unprecedented detailed view of the genetic variation in various human populations, and promise to lead to significant progress in understanding the genetic basis of complex diseases. Despite this tremendous advance in data generation, it remains very challenging to analyze and interpret these data due to their sparse and high-dimensional nature. Here, we propose novel applications and new developments of empirical Bayes scan statistics to identify genomic regions significantly enriched with disease risk variants. We show that the proposed empirical Bayes methodology can be substantially more powerful than existing scan statistics methods especially so in the presence of many non-disease risk variants, and in situations when there is a mixture of risk and protective variants. Furthermore, the empirical Bayes approach has greater flexibility to accommodate covariates such as functional prediction scores and additional biomarkers. As proof-of-concept we apply the proposed methods to a whole-exome sequencing study for autism spectrum disorders and identify several promising candidate genes. © 2015, The International Biometric Society.

  15. Image-guided radiotherapy quality control: Statistical process control using image similarity metrics.

    PubMed

    Shiraishi, Satomi; Grams, Michael P; Fong de Los Santos, Luis E

    2018-05-01

    , respectively. Patient-specific control charts using NCC evaluated daily variation and identified statistically significant deviations. This study also showed that subjective evaluations of the images were not always consistent. Population control charts identified a patient whose tracking metrics were significantly lower than those of other patients. The patient-specific action limits identified registrations that warranted immediate evaluation by an expert. When effective displacements in the anterior-posterior direction were compared to 3DoF couch displacements, the agreement was ±1 mm for seven of 10 patients for both C-spine and mandible RTVs. Qualitative review alone of IGRT images can result in inconsistent feedback to the IGRT process. Registration tracking using NCC objectively identifies statistically significant deviations. When used in conjunction with the current image review process, this tool can assist in improving the safety and consistency of the IGRT process. © 2018 American Association of Physicists in Medicine.
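
    The control-chart logic described here is straightforward to sketch: estimate a per-patient baseline mean and sigma from early sessions, set 3-sigma limits, and flag sessions whose registration metric falls outside them. The NCC values below are simulated, and the baseline length is an assumption.

    ```python
    # Sketch of a per-patient control chart on a registration-tracking metric (NCC).
    import numpy as np

    rng = np.random.default_rng(10)
    ncc = np.clip(0.92 + 0.015 * rng.normal(size=30), 0, 1)   # 30 daily NCC values
    ncc[22] = 0.80                                            # a registration anomaly

    baseline = ncc[:10]                                       # early sessions set the limits
    center, sigma = baseline.mean(), baseline.std(ddof=1)
    lcl, ucl = center - 3 * sigma, center + 3 * sigma         # 3-sigma control limits

    flags = np.where((ncc < lcl) | (ncc > ucl))[0]
    print(f"control limits [{lcl:.3f}, {ucl:.3f}]; flagged sessions: {flags.tolist()}")
    ```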

  16. Identifying familiar strangers in human encounter networks

    NASA Astrophysics Data System (ADS)

    Liang, Di; Li, Xiang; Zhang, Yi-Qing

    2016-10-01

    Familiar strangers, pairs of individuals who encounter each other repeatedly but never get to know each other, were first described four decades ago, yet an effective method to identify them has been lacking. Here we propose a novel method called the familiar stranger classifier (FSC) to identify familiar strangers in three empirical datasets and classify human relationships into four types, i.e., familiar stranger (FS), in-role (IR), friend (F) and stranger (S). The analyses of the human encounter networks show that the average number of FS one may encounter is finite but larger than the Dunbar number, and their encounters are structurally more stable and denser than those of S, indicating that FS encounters are not limited by social capacity and are more robust than in the random scenario. Moreover, the temporal statistics of encounters between FS over the whole time span show strong periodicity, which differs from the bursts of encounters within a single day, suggesting the significance of longitudinal patterns of human encounters. The proposed method to identify FS provides a valid framework for understanding human encounter patterns and analysing complex human social behaviors.

  17. Assessing attitudes towards statistics among medical students: psychometric properties of the Serbian version of the Survey of Attitudes Towards Statistics (SATS).

    PubMed

    Stanisavljevic, Dejana; Trajkovic, Goran; Marinkovic, Jelena; Bukumiric, Zoran; Cirkovic, Andja; Milic, Natasa

    2014-01-01

    Medical statistics has become important and relevant for future doctors, enabling them to practice evidence-based medicine. Recent studies report that students' attitudes towards statistics play an important role in their statistics achievement. The aim of the study was to test the psychometric properties of the Serbian version of the Survey of Attitudes Towards Statistics (SATS) in order to acquire a valid instrument to measure attitudes inside the Serbian educational context. The validation study was performed on a cohort of 417 medical students who were enrolled in an obligatory introductory statistics course. The SATS adaptation was based on an internationally accepted methodology for translation and cultural adaptation. Psychometric properties of the Serbian version of the SATS were analyzed through the examination of factorial structure and internal consistency. Most medical students held positive attitudes towards statistics. The average total SATS score was above neutral (4.3±0.8) and varied from 1.9 to 6.2. Confirmatory factor analysis validated the six-factor structure of the questionnaire (Affect, Cognitive Competence, Value, Difficulty, Interest and Effort). Values for the fit indices TLI (0.940) and CFI (0.961) were above the cut-off of ≥0.90. The RMSEA value of 0.064 (0.051-0.078) was below the suggested value of ≤0.08. Cronbach's alpha for the entire scale was 0.90, indicating scale reliability. In a multivariate regression model, self-rated ability in mathematics and current grade point average were significantly associated with the total SATS score after adjusting for age and gender. The present study provided evidence for the appropriate metric properties of the Serbian version of the SATS. Confirmatory factor analysis validated the six-factor structure of the scale. The SATS may be a reliable and valid instrument for identifying medical students' attitudes towards statistics in the Serbian educational context.

  18. Learning Predictive Statistics: Strategies and Brain Mechanisms.

    PubMed

    Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe

    2017-08-30

    When immersed in a new environment, we are challenged to decipher initially incomprehensible streams of sensory information. However, quite rapidly, the brain finds structure and meaning in these incoming signals, helping us to predict and prepare ourselves for future actions. This skill relies on extracting the statistics of event streams in the environment that contain regularities of variable complexity from simple repetitive patterns to complex probabilistic combinations. Here, we test the brain mechanisms that mediate our ability to adapt to the environment's statistics and predict upcoming events. By combining behavioral training and multisession fMRI in human participants (male and female), we track the corticostriatal mechanisms that mediate learning of temporal sequences as they change in structure complexity. We show that learning of predictive structures relates to individual decision strategy; that is, selecting the most probable outcome in a given context (maximizing) versus matching the exact sequence statistics. These strategies engage distinct human brain regions: maximizing engages dorsolateral prefrontal, cingulate, sensory-motor regions, and basal ganglia (dorsal caudate, putamen), whereas matching engages occipitotemporal regions (including the hippocampus) and basal ganglia (ventral caudate). Our findings provide evidence for distinct corticostriatal mechanisms that facilitate our ability to extract behaviorally relevant statistics to make predictions. SIGNIFICANCE STATEMENT Making predictions about future events relies on interpreting streams of information that may initially appear incomprehensible. Past work has studied how humans identify repetitive patterns and associative pairings. However, the natural environment contains regularities that vary in complexity from simple repetition to complex probabilistic combinations. Here, we combine behavior and multisession fMRI to track the brain mechanisms that mediate our ability to adapt to

  19. The interprocess NIR sampling as an alternative approach to multivariate statistical process control for identifying sources of product-quality variability.

    PubMed

    Marković, Snežana; Kerč, Janez; Horvat, Matej

    2017-03-01

    We present a new approach to identifying sources of variability within a manufacturing process: NIR measurements of samples of intermediate material after each consecutive unit operation (the interprocess NIR sampling technique). In addition, we summarize the development of a multivariate statistical process control (MSPC) model for the production of an enteric-coated pellet product of the proton-pump inhibitor class. By developing provisional NIR calibration models, the identification of critical process points yields results comparable to the established MSPC modeling procedure. Both approaches are shown to lead to the same conclusion, identifying parameters of extrusion/spheronization and characteristics of lactose that have the greatest influence on the end-product's enteric coating performance. The proposed approach enables quicker and easier identification of variability sources during the manufacturing process, especially in cases where historical process data are not readily available. In the presented case, changes in lactose characteristics influence the performance of the extrusion/spheronization process step. The pellet cores produced using one lactose source (considered less suitable) were on average larger and more fragile, leading to breakage of the cores during subsequent fluid bed operations. These results were confirmed by additional experimental analyses illuminating the underlying mechanism of fracture of oblong pellets during the pellet coating process, which leads to compromised film coating.

  20. Integrated genomic and transcriptomic analysis of human brain metastases identifies alterations of potential clinical significance.

    PubMed

    Saunus, Jodi M; Quinn, Michael C J; Patch, Ann-Marie; Pearson, John V; Bailey, Peter J; Nones, Katia; McCart Reed, Amy E; Miller, David; Wilson, Peter J; Al-Ejeh, Fares; Mariasegaram, Mythily; Lau, Queenie; Withers, Teresa; Jeffree, Rosalind L; Reid, Lynne E; Da Silva, Leonard; Matsika, Admire; Niland, Colleen M; Cummings, Margaret C; Bruxner, Timothy J C; Christ, Angelika N; Harliwong, Ivon; Idrisoglu, Senel; Manning, Suzanne; Nourse, Craig; Nourbakhsh, Ehsan; Wani, Shivangi; Anderson, Matthew J; Fink, J Lynn; Holmes, Oliver; Kazakoff, Stephen; Leonard, Conrad; Newell, Felicity; Taylor, Darrin; Waddell, Nick; Wood, Scott; Xu, Qinying; Kassahn, Karin S; Narayanan, Vairavan; Taib, Nur Aishah; Teo, Soo-Hwang; Chow, Yock Ping; kConFab; Jat, Parmjit S; Brandner, Sebastian; Flanagan, Adrienne M; Khanna, Kum Kum; Chenevix-Trench, Georgia; Grimmond, Sean M; Simpson, Peter T; Waddell, Nicola; Lakhani, Sunil R

    2015-11-01

    Treatment options for patients with brain metastases (BMs) have limited efficacy and the mortality rate is virtually 100%. Targeted therapy is critically under-utilized, and our understanding of mechanisms underpinning metastatic outgrowth in the brain is limited. To address these deficiencies, we investigated the genomic and transcriptomic landscapes of 36 BMs from breast, lung, melanoma and oesophageal cancers, using DNA copy-number analysis and exome- and RNA-sequencing. The key findings were as follows. (a) Identification of novel candidates with possible roles in BM development, including the significantly mutated genes DSC2, ST7, PIK3R1 and SMC5, and the DNA repair, ERBB-HER signalling, axon guidance and protein kinase-A signalling pathways. (b) Mutational signature analysis was applied to successfully identify the primary cancer type for two BMs with unknown origins. (c) Actionable genomic alterations were identified in 31/36 BMs (86%); in one case we retrospectively identified ERBB2 amplification representing apparent HER2 status conversion, then confirmed progressive enrichment for HER2-positivity across four consecutive metastatic deposits by IHC and SISH, resulting in the deployment of HER2-targeted therapy for the patient. (d) In the ERBB/HER pathway, ERBB2 expression correlated with ERBB3 (r² = 0.496; p < 0.0001) and HER3 and HER4 were frequently activated in an independent cohort of 167 archival BMs from seven primary cancer types: 57.6% and 52.6% of cases were phospho-HER3(Y1222) or phospho-HER4(Y1162) membrane-positive, respectively. The HER3 ligands NRG1/2 were barely detectable by RNAseq, with NRG1 (8p12) genomic loss in 63.6% of breast cancer BMs, suggesting a microenvironmental source of ligand. In summary, this is the first study to characterize the genomic landscapes of BMs. The data revealed novel candidates, potential clinical applications for genomic profiling of resectable BMs, and highlighted the possibility of therapeutically targeting

  1. MONKEY: Identifying conserved transcription-factor binding sitesin multiple alignments using a binding site-specific evolutionarymodel

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Moses, Alan M.; Chiang, Derek Y.; Pollard, Daniel A.

    2004-10-28

    We introduce a method (MONKEY) to identify conserved transcription-factor binding sites in multispecies alignments. MONKEY employs probabilistic models of factor specificity and binding site evolution, on which basis we compute the likelihood that putative sites are conserved and assign statistical significance to each hit. Using genomes from the genus Saccharomyces, we illustrate how the significance of real sites increases with evolutionary distance and explore the relationship between conservation and function.
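
    The factor-specificity half of such a scheme can be sketched with a position weight matrix (PWM) log-likelihood ratio score and an empirical background p-value; MONKEY's distinguishing ingredient, the evolutionary model across aligned genomes, is omitted here. The motif and sequences are invented.

    ```python
    # PWM log-likelihood ratio scoring of a candidate binding site, with an
    # empirical p-value from random background sequence. Toy motif, not MONKEY's.
    import numpy as np

    rng = np.random.default_rng(11)
    alphabet = "ACGT"
    # Toy 6-bp motif: per-position base probabilities (rows: A, C, G, T after .T).
    pwm = np.array([[.7, .1, .1, .1], [.1, .7, .1, .1], [.1, .1, .7, .1],
                    [.1, .1, .1, .7], [.7, .1, .1, .1], [.1, .1, .7, .1]]).T
    bg = np.array([0.25] * 4)                  # uniform background model

    def score(seq):
        """log2 likelihood ratio of seq under the PWM vs. the background."""
        idx = [alphabet.index(b) for b in seq]
        return sum(np.log2(pwm[i, j] / bg[i]) for j, i in enumerate(idx))

    site = "ACGTAG"
    obs = score(site)
    null = [score("".join(rng.choice(list(alphabet), size=6))) for _ in range(5000)]
    p = (1 + sum(s >= obs for s in null)) / 5001
    print(f"log2 LR = {obs:.2f}, empirical p = {p:.4g}")
    ```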

  2. Computational algebraic geometry for statistical modeling FY09Q2 progress.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Thompson, David C.; Rojas, Joseph Maurice; Pebay, Philippe Pierre

    2009-03-01

    This is a progress report on polynomial system solving for statistical modeling. This quarter we have developed our first model of shock response data and an algorithm for identifying the chamber cone containing a polynomial system in n variables with n+k terms within polynomial time - a significant improvement over previous algorithms, all having exponential worst-case complexity. We have implemented and verified the chamber cone algorithm for n+3 and are working to extend the implementation to handle arbitrary k. Later sections of this report explain chamber cones in more detail; the next section provides an overview of the project and how the current progress fits into it.

  3. PROMISE: a tool to identify genomic features with a specific biologically interesting pattern of associations with multiple endpoint variables

    PubMed Central

    Pounds, Stan; Cheng, Cheng; Cao, Xueyuan; Crews, Kristine R.; Plunkett, William; Gandhi, Varsha; Rubnitz, Jeffrey; Ribeiro, Raul C.; Downing, James R.; Lamba, Jatinder

    2009-01-01

    Motivation: In some applications, prior biological knowledge can be used to define a specific pattern of association of multiple endpoint variables with a genomic variable that is biologically most interesting. However, to our knowledge, there is no statistical procedure designed to detect specific patterns of association with multiple endpoint variables. Results: Projection onto the most interesting statistical evidence (PROMISE) is proposed as a general procedure to identify genomic variables that exhibit a specific biologically interesting pattern of association with multiple endpoint variables. Biological knowledge of the endpoint variables is used to define a vector that represents the biologically most interesting values for statistics that characterize the associations of the endpoint variables with a genomic variable. A test statistic is defined as the dot-product of the vector of the observed association statistics and the vector of the most interesting values of the association statistics. By definition, this test statistic is proportional to the length of the projection of the observed vector of correlations onto the vector of most interesting associations. Statistical significance is determined via permutation. In simulation studies and an example application, PROMISE shows greater statistical power to identify genes with the interesting pattern of associations than classical multivariate procedures, individual endpoint analyses or listing genes that have the pattern of interest and are significant in more than one individual endpoint analysis. Availability: Documented R routines are freely available from www.stjuderesearch.org/depts/biostats and will soon be available as a Bioconductor package from www.bioconductor.org. Contact: stanley.pounds@stjude.org Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19528086
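
    The projection-plus-permutation idea lends itself to a short sketch: compute one association statistic per endpoint, take the dot product with the prespecified "most interesting" pattern vector, and calibrate the result by permuting subjects. The sketch below is a minimal illustration assuming Pearson correlation as the per-endpoint association statistic; the function and variable names are illustrative and are not the API of the published R routines.

```python
import numpy as np

def promise_stat(assoc_stats: np.ndarray, interesting: np.ndarray) -> float:
    """Dot product of the observed association statistics with the
    biologically 'most interesting' pattern (proportional to the length
    of the projection onto that pattern)."""
    return float(assoc_stats @ interesting)

def promise_test(genomic, endpoints, interesting, n_perm=10_000, seed=0):
    """genomic: (n_subjects,) values of one genomic variable.
    endpoints: (n_subjects, n_endpoints) endpoint variables.
    interesting: (n_endpoints,) hypothesized pattern of associations.
    Pearson correlation is used as the association statistic here; the
    published procedure allows other choices."""
    rng = np.random.default_rng(seed)

    def assoc(g):
        # Association of the genomic variable with each endpoint
        return np.array([np.corrcoef(g, endpoints[:, j])[0, 1]
                         for j in range(endpoints.shape[1])])

    observed = promise_stat(assoc(genomic), interesting)
    perms = np.array([promise_stat(assoc(rng.permutation(genomic)), interesting)
                      for _ in range(n_perm)])
    p_value = float(np.mean(np.abs(perms) >= abs(observed)))
    return observed, p_value
```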

  4. Statistics Anxiety among Postgraduate Students

    ERIC Educational Resources Information Center

    Koh, Denise; Zawi, Mohd Khairi

    2014-01-01

    Most postgraduate programmes that have research components require students to take at least one course in research statistics. Not all postgraduate programmes are science based; a significant number of postgraduate students from the social sciences will also be taking statistics courses as they try to complete their…

  5. Statistics Using Just One Formula

    ERIC Educational Resources Information Center

    Rosenthal, Jeffrey S.

    2018-01-01

    This article advocates that introductory statistics be taught by basing all calculations on a single simple margin-of-error formula and deriving all of the standard introductory statistical concepts (confidence intervals, significance tests, comparisons of means and proportions, etc.) from that one formula. It is argued that this approach will…

  6. Comparison of a video-based assessment and a multiple stimulus assessment to identify preferred jobs for individuals with significant intellectual disabilities.

    PubMed

    Horrocks, Erin L; Morgan, Robert L

    2009-01-01

    The authors compare two methods of identifying job preferences for individuals with significant intellectual disabilities. Three individuals with intellectual disabilities between the ages of 19 and 21 participated in a video-based preference assessment and a multiple stimulus without replacement (MSWO) assessment. Stimulus preference assessment procedures typically involve giving participants access to the selected stimuli to increase the probability that participants will associate the selected choice with the actual stimuli. Although individuals did not have access to the selected stimuli in the video-based assessment, results indicated that both assessments identified the same highest preference job for all participants. Results are discussed in terms of using a video-based assessment to accurately identify job preferences for individuals with developmental disabilities.

  7. A GIS and statistical approach to identify variables that control water quality in hydrothermally altered and mineralized watersheds, Silverton, Colorado, USA

    USGS Publications Warehouse

    Yager, Douglas B.; Johnson, Raymond H.; Rockwell, Barnaby W.; Caine, Jonathan S.; Smith, Kathleen S.

    2013-01-01

    Hydrothermally altered bedrock in the Silverton mining area, southwest Colorado, USA, contains sulfide minerals that weather to produce acidic and metal-rich leachate that is toxic to aquatic life. This study utilized a geographic information system (GIS) and statistical approach to identify watershed-scale geologic variables in the Silverton area that influence water quality. GIS analysis of mineral maps produced using remote sensing datasets including Landsat Thematic Mapper, advanced spaceborne thermal emission and reflection radiometer, and a hybrid airborne visible infrared imaging spectrometer and field-based product enabled areas of alteration to be quantified. Correlations between water quality signatures determined at watershed outlets, and alteration types intersecting both total watershed areas and GIS-buffered areas along streams were tested using linear regression analysis. Despite remote sensing datasets having varying watershed area coverage due to vegetation cover and differing mineral mapping capabilities, each dataset was useful for delineating acid-generating bedrock. Areas of quartz–sericite–pyrite mapped by AVIRIS have the highest correlations with acidic surface water and elevated iron and aluminum concentrations. Alkalinity was only correlated with area of acid neutralizing, propylitically altered bedrock containing calcite and chlorite mapped by AVIRIS. Total watershed area of acid-generating bedrock is more significantly correlated with acidic and metal-rich surface water when compared with acid-generating bedrock intersected by GIS-buffered areas along streams. This methodology could be useful in assessing the possible effects that alteration type area has in either generating or neutralizing acidity in unmined watersheds and in areas where new mining is planned.
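
    The core regression step described, relating a water quality signature at a watershed outlet to the area of a mapped alteration type, reduces to ordinary least squares on watershed-level summaries. A minimal sketch with hypothetical numbers (not the study's data):

```python
import numpy as np
from scipy.stats import linregress

# Hypothetical watershed summaries: fraction of each watershed mapped as
# acid-generating quartz-sericite-pyrite alteration, and pH at the outlet.
qsp_fraction = np.array([0.02, 0.10, 0.18, 0.25, 0.33, 0.41, 0.55])
outlet_ph = np.array([6.8, 6.1, 5.4, 4.9, 4.5, 4.1, 3.6])

fit = linregress(qsp_fraction, outlet_ph)
print(f"slope={fit.slope:.2f}  r^2={fit.rvalue**2:.3f}  p={fit.pvalue:.4g}")
```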

  8. Inferring causal relationships between phenotypes using summary statistics from genome-wide association studies.

    PubMed

    Meng, Xiang-He; Shen, Hui; Chen, Xiang-Ding; Xiao, Hong-Mei; Deng, Hong-Wen

    2018-03-01

    Genome-wide association studies (GWAS) have successfully identified numerous genetic variants associated with diverse complex phenotypes and diseases, and provided tremendous opportunities for further analyses using summary association statistics. Recently, Pickrell et al. developed a robust method for causal inference using independent putative causal SNPs. However, this method may fail to infer the causal relationship between two phenotypes when only a limited number of independent putative causal SNPs are identified. Here, we extended Pickrell's method to make it more applicable to general situations. We extended the causal inference method by replacing the putative causal SNPs with the lead SNPs (the set of the most significant SNPs in each independent locus) and tested the performance of our extended method using both simulation and empirical data. Simulations suggested that when the same number of genetic variants is used, our extended method had a similar distribution of the test statistic under the null model as well as comparable power under the causal model compared with the original method by Pickrell et al. In practice, however, our extended method would generally be more powerful because the number of independent lead SNPs is often larger than the number of independent putative causal SNPs; including more SNPs, on the other hand, did not cause more false positives. By applying our extended method to summary statistics from GWAS for blood metabolites and femoral neck bone mineral density (FN-BMD), we successfully identified ten blood metabolites that may causally influence FN-BMD. We extended a causal inference method for inferring a putative causal relationship between two phenotypes using summary statistics from GWAS, and identified a number of potential causal metabolites for FN-BMD, which may provide novel insights into the pathophysiological mechanisms underlying osteoporosis.

  9. Identifying medical students at risk of underperformance from significant stressors.

    PubMed

    Wilkinson, Tim J; McKenzie, Jan M; Ali, Anthony N; Rudland, Joy; Carter, Frances A; Bell, Caroline J

    2016-02-02

    Stress is associated with poorer academic performance but identifying vulnerable students is less clear. A series of earthquakes and disrupted learning environments created an opportunity to explore the relationships among stress, student factors, support and academic performance within a medical course. The outcomes were deviations from expected performances on end of year written and clinical examinations. The predictors were questionnaire-based measures of connectedness/support, impact of the earthquakes, safety, depression, anxiety, stress, resilience and personality. The response rate was 77%. Poorer than expected performance on all examinations was associated with greater disruptions to living arrangements and fewer years in the country; on the written examination with not having a place to study; and on the clinical examination with relationship status, not having the support of others, less extroversion, and feeling less safe. There was a suggestion of a beneficial association with some markers of stress. We show that academic performance is assisted by students having a secure physical and emotional base. The students who are most vulnerable are those with fewer social networks, and those who are recent immigrants.

  10. [Statistical analysis of German radiologic periodicals: developmental trends in the last 10 years].

    PubMed

    Golder, W

    1999-09-01

    To identify which statistical tests are applied in German radiological publications, to what extent their use has changed during the last decade, and which factors might be responsible for this development. The major articles published in "ROFO" and "DER RADIOLOGE" during 1988, 1993 and 1998 were reviewed for statistical content. The contributions were classified by principal focus and radiological subspecialty. The methods used were assigned to descriptive, basal and advanced statistics. Sample size, significance level and power were established. The use of experts' assistance was monitored. Finally, we calculated the so-called cumulative accessibility of the publications. 525 contributions were found to be eligible. In 1988, 87% used descriptive statistics only, 12.5% basal, and 0.5% advanced statistics. The corresponding figures in 1993 and 1998 are 62 and 49%, 32 and 41%, and 6 and 10%, respectively. Statistical techniques were most likely to be used in research on musculoskeletal imaging and articles dedicated to MRI. Six basic categories of statistical methods account for the complete statistical analysis appearing in 90% of the articles. ROC analysis is the single most common advanced technique. Authors increasingly make use of statistical experts' advice and statistical software. During the last decade, the use of statistical methods in German radiological journals has fundamentally improved, both quantitatively and qualitatively. Presently, advanced techniques account for 20% of the pertinent statistical tests. This development seems to be promoted by the increasing availability of statistical analysis software.

  11. Statistical identification of stimulus-activated network nodes in multi-neuron voltage-sensitive dye optical recordings.

    PubMed

    Fathiazar, Elham; Anemuller, Jorn; Kretzberg, Jutta

    2016-08-01

    Voltage-Sensitive Dye (VSD) imaging is an optical imaging method that allows measuring the graded voltage changes of multiple neurons simultaneously. In neuroscience, this method is used to reveal networks of neurons involved in certain tasks. However, the recorded relative dye fluorescence changes are usually low and signals are superimposed by noise and artifacts. Therefore, establishing a reliable method to identify which cells are activated by specific stimulus conditions is the first step to identify functional networks. In this paper, we present a statistical method to identify stimulus-activated network nodes as cells, whose activities during sensory network stimulation differ significantly from the un-stimulated control condition. This method is demonstrated based on voltage-sensitive dye recordings from up to 100 neurons in a ganglion of the medicinal leech responding to tactile skin stimulation. Without relying on any prior physiological knowledge, the network nodes identified by our statistical analysis were found to match well with published cell types involved in tactile stimulus processing and to be consistent across stimulus conditions and preparations.

  12. Statistical analysis of soil geochemical data to identify pathfinders associated with mineral deposits: An example from the Coles Hill uranium deposit, Virginia, USA

    USGS Publications Warehouse

    Levitan, Denise M.; Zipper, Carl E.; Donovan, Patricia; Schreiber, Madeline E.; Seal, Robert; Engle, Mark A.; Chermak, John A.; Bodnar, Robert J.; Johnson, Daniel K.; Aylor, Joseph G.

    2015-01-01

    Soil geochemical anomalies can be used to identify pathfinders in exploration for ore deposits. In this study, compositional data analysis is used with multivariate statistical methods to analyse soil geochemical data collected from the Coles Hill uranium deposit, Virginia, USA, to identify pathfinders associated with this deposit. Elemental compositions and relationships were compared between the collected Coles Hill soil and reference soil samples extracted from a regional subset of a national-scale geochemical survey. Results show that pathfinders for the Coles Hill deposit include light rare earth elements (La and Ce), which, when normalised by their Al content, are correlated with U/Al, and elevated Th/Al values, which are not correlated with U/Al, supporting decoupling of U from Th during soil generation. These results can be used in genetic and weathering models of the Coles Hill deposit, and can also be applied to future prospecting for similar U deposits in the eastern United States, and in regions with similar geological/climatic conditions.

  13. Identifying significant predictors of head-on conflicts on two-lane rural roads using inductive loop detectors data.

    PubMed

    Shariat-Mohaymany, Afshin; Tavakoli-Kashani, Ali; Nosrati, Hadi; Ranjbari, Andisheh

    2011-12-01

    To identify the significant factors that influence head-on conflicts resulting from dangerous overtaking maneuvers on 2-lane rural roads in Iran. A traffic conflict technique was applied to 12 two-lane rural roads in order to investigate the potential situations for accidents to occur and thus to identify the geometric and traffic factors affecting traffic conflicts. Traffic data were collected via the inductive loop detectors installed on these roads, and geometric characteristics were obtained through field observations. Two groups of data were then analyzed independently by Pearson's chi-square test to evaluate their relationship to traffic conflicts. The independent variables were percentage of time spent following (PTSF), percentage of heavy vehicles, directional distribution of traffic (DDT), mean speed, speed standard deviation, section type, road width, longitudinal slope, holiday or workday, and lighting condition. It was indicated that increasing the PTSF, decreasing the percentage of heavy vehicles, increasing the mean speed (up to 75 km/h), increasing DDT in the range of 0 to 60 percent, and decreasing the standard deviation of speed significantly increased the occurrence of traffic conflicts. It was also revealed that traffic conflicts occur more frequently on curve sections and on workdays. The variables road width, slope, and lighting condition were found to have a minor effect on conflict occurrence. To reduce the number of head-on conflicts on the aforementioned roads, some remedial measures are suggested, such as not constructing long "No Passing" zones and constructing passing lanes where necessary; keeping road width at the standard value; constructing roads with horizontal curves and a high radius and using appropriate road markings and overtaking-forbidden signs where it is impossible to modify the radius; providing enough light and installing caution signs/devices on the roads; and intensifying police control and supervision on workdays.

  14. The Heuristic Value of p in Inductive Statistical Inference

    PubMed Central

    Krueger, Joachim I.; Heck, Patrick R.

    2017-01-01

    Many statistical methods yield the probability of the observed data – or data more extreme – under the assumption that a particular hypothesis is true. This probability is commonly known as ‘the’ p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say. PMID:28649206

  15. The Heuristic Value of p in Inductive Statistical Inference.

    PubMed

    Krueger, Joachim I; Heck, Patrick R

    2017-01-01

    Many statistical methods yield the probability of the observed data - or data more extreme - under the assumption that a particular hypothesis is true. This probability is commonly known as 'the' p-value. (Null Hypothesis) Significance Testing ([NH]ST) is the most prominent of these methods. The p-value has been subjected to much speculation, analysis, and criticism. We explore how well the p-value predicts what researchers presumably seek: the probability of the hypothesis being true given the evidence, and the probability of reproducing significant results. We also explore the effect of sample size on inferential accuracy, bias, and error. In a series of simulation experiments, we find that the p-value performs quite well as a heuristic cue in inductive inference, although there are identifiable limits to its usefulness. We conclude that despite its general usefulness, the p-value cannot bear the full burden of inductive inference; it is but one of several heuristic cues available to the data analyst. Depending on the inferential challenge at hand, investigators may supplement their reports with effect size estimates, Bayes factors, or other suitable statistics, to communicate what they think the data say.

  16. Progressive statistics for studies in sports medicine and exercise science.

    PubMed

    Hopkins, William G; Marshall, Stephen W; Batterham, Alan M; Hanin, Juri

    2009-01-01

    Statistical guidelines and expert statements are now available to assist in the analysis and reporting of studies in some biomedical disciplines. We present here a more progressive resource for sample-based studies, meta-analyses, and case studies in sports medicine and exercise science. We offer forthright advice on the following controversial or novel issues: using precision of estimation for inferences about population effects in preference to null-hypothesis testing, which is inadequate for assessing clinical or practical importance; justifying sample size via acceptable precision or confidence for clinical decisions rather than via adequate power for statistical significance; showing SD rather than SEM, to better communicate the magnitude of differences in means and nonuniformity of error; avoiding purely nonparametric analyses, which cannot provide inferences about magnitude and are unnecessary; using regression statistics in validity studies, in preference to the impractical and biased limits of agreement; making greater use of qualitative methods to enrich sample-based quantitative projects; and seeking ethics approval for public access to the depersonalized raw data of a study, to address the need for more scrutiny of research and better meta-analyses. Advice on less contentious issues includes the following: using covariates in linear models to adjust for confounders, to account for individual differences, and to identify potential mechanisms of an effect; using log transformation to deal with nonuniformity of effects and error; identifying and deleting outliers; presenting descriptive, effect, and inferential statistics in appropriate formats; and contending with bias arising from problems with sampling, assignment, blinding, measurement error, and researchers' prejudices. This article should advance the field by stimulating debate, promoting innovative approaches, and serving as a useful checklist for authors, reviewers, and editors.

  17. Using Statistical Process Control to Enhance Student Progression

    ERIC Educational Resources Information Center

    Hanna, Mark D.; Raichura, Nilesh; Bernardes, Ednilson

    2012-01-01

    Public interest in educational outcomes has markedly increased in the most recent decade; however, quality management and statistical process control have not deeply penetrated the management of academic institutions. This paper presents results of an attempt to use Statistical Process Control (SPC) to identify a key impediment to continuous…

  18. Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science.

    PubMed

    Veldkamp, Coosje L S; Nuijten, Michèle B; Dominguez-Alvarez, Linda; van Assen, Marcel A L M; Wicherts, Jelte M

    2014-01-01

    Statistical analysis is error prone. A best practice for researchers using statistics would therefore be to share data among co-authors, allowing double-checking of executed tasks just as co-pilots do in aviation. To document the extent to which this 'co-piloting' currently occurs in psychology, we surveyed the authors of 697 articles published in six top psychology journals and asked them whether they had collaborated on four aspects of analyzing data and reporting results, and whether the described data had been shared between the authors. We acquired responses for 49.6% of the articles and found that co-piloting on statistical analysis and reporting results is quite uncommon among psychologists, while data sharing among co-authors seems reasonably but not completely standard. We then used an automated procedure to study the prevalence of statistical reporting errors in the articles in our sample and examined the relationship between reporting errors and co-piloting. Overall, 63% of the articles contained at least one p-value that was inconsistent with the reported test statistic and the accompanying degrees of freedom, and 20% of the articles contained at least one p-value that was inconsistent to such a degree that it may have affected decisions about statistical significance. Overall, the probability that a given p-value was inconsistent was over 10%. Co-piloting was not found to be associated with reporting errors.
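
    The automated consistency check described can be approximated in a few lines: recompute the p-value implied by a reported test statistic and its degrees of freedom, and compare it with the reported p. The sketch below handles only two-sided t tests, and the tolerance and function name are assumptions; the study's actual procedure covers more test types and rounding conventions.

```python
from scipy import stats

def check_t_report(t_value: float, df: int, reported_p: float,
                   tol: float = 0.0005) -> bool:
    """Recompute the two-sided p-value from a reported t statistic and its
    degrees of freedom; return False when it disagrees with the reported p
    by more than the (assumed) rounding tolerance."""
    recomputed = 2 * stats.t.sf(abs(t_value), df)
    return abs(recomputed - reported_p) <= tol

# An article reporting t(28) = 2.05, p = .03 would be flagged, since the
# recomputed two-sided p is about .0499:
print(check_t_report(2.05, 28, 0.03))  # -> False
```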

  19. Statistics 101 for Radiologists.

    PubMed

    Anvari, Arash; Halpern, Elkan F; Samir, Anthony E

    2015-10-01

    Diagnostic tests have wide clinical applications, including screening, diagnosis, measuring treatment effect, and determining prognosis. Interpreting diagnostic test results requires an understanding of key statistical concepts used to evaluate test efficacy. This review explains descriptive statistics and discusses probability, including mutually exclusive and independent events and conditional probability. In the inferential statistics section, a statistical perspective on study design is provided, together with an explanation of how to select appropriate statistical tests. Key concepts in recruiting study samples are discussed, including representativeness and random sampling. Variable types are defined, including predictor, outcome, and covariate variables, and the relationship of these variables to one another. In the hypothesis testing section, we explain how to determine if observed differences between groups are likely to be due to chance. We explain type I and II errors, statistical significance, and study power, followed by an explanation of effect sizes and how confidence intervals can be used to generalize observed effect sizes to the larger population. Statistical tests are explained in four categories: t tests and analysis of variance, proportion analysis tests, nonparametric tests, and regression techniques. We discuss sensitivity, specificity, accuracy, receiver operating characteristic analysis, and likelihood ratios. Measures of reliability and agreement, including κ statistics, intraclass correlation coefficients, and Bland-Altman graphs and analysis, are introduced. © RSNA, 2015.

  20. American Indian Population Statistics.

    ERIC Educational Resources Information Center

    Thomason, Timothy C., Ed.

    This report summarizes American Indian and Alaska Native (AI/AN) population statistics from the 1990 Census. In 1990 there were about 2 million persons who identified themselves as American Indians in the United States, a 38 percent increase over the 1980 census. More than half of the Indian population lived in six states, with Oklahoma having the…

  1. Forest statistics of western Kentucky

    Treesearch

    The Forest Survey Organization Central States Forest Experiment Station

    1950-01-01

    This Survey Release presents the more significant preliminary statistics on the forest area and timber volume for the western region of Kentucky. Similar reports for the remainder of the state will be published as soon as statistical tabulations are completed. Later, an analytical report for the state will be published which will interpret forest area, timber volume,...

  2. Forest statistics of southern Indiana

    Treesearch

    The Forest Survey Organization Central States Forest Experiment Station

    1951-01-01

    This Survey Release presents the more significant preliminary statistics on the forest area and timber volume for each of the three regions of southern Indiana. A similar report will be published for the two northern Indiana regions. Later, an analytical report for the state will be published which will interpret statistics on forest area, timber volume, growth, and...

  3. 28 CFR 22.22 - Revelation of identifiable data.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    Title 28, Section 22.22 (Judicial Administration; DEPARTMENT OF JUSTICE; CONFIDENTIALITY OF IDENTIFIABLE RESEARCH AND STATISTICAL INFORMATION), 2012-07-01: Revelation of identifiable data. (a) Except as noted in paragraph (b) of this...

  4. Myths and Misconceptions Revisited - What are the (Statistically Significant) methods to prevent employee injuries

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Potts, T.T.; Hylko, J.M.; Almond, D.

    2007-07-01

    A company's overall safety program becomes an important consideration to continue performing work and for procuring future contract awards. When injuries or accidents occur, the employer ultimately loses on two counts - increased medical costs and employee absences. This paper summarizes the human and organizational components that contributed to successful safety programs implemented by WESKEM, LLC's Environmental, Safety, and Health Departments located in Paducah, Kentucky, and Oak Ridge, Tennessee. The philosophy of 'safety, compliance, and then production' and programmatic components implemented at the start of the contracts were qualitatively identified as contributing factors resulting in a significant accumulation of safe work hours and an Experience Modification Rate (EMR) of <1.0. Furthermore, a study by the Associated General Contractors of America quantitatively validated components, already found in the WESKEM, LLC programs, as contributing factors to prevent employee accidents and injuries. Therefore, an investment in the human and organizational components now can pay dividends later by reducing the EMR, which is the key to reducing Workers' Compensation premiums. Also, knowing your employees' demographics and taking an active approach to evaluate and prevent fatigue may help employees balance work and non-work responsibilities. In turn, this approach can assist employers in maintaining a healthy and productive workforce. For these reasons, it is essential that safety needs be considered as the starting point when performing work. (authors)

  5. An Assessment Blueprint for EncStat: A Statistics Anxiety Intervention Program.

    ERIC Educational Resources Information Center

    Watson, Freda S.; Lang, Thomas R.; Kromrey, Jeffrey D.; Ferron, John M.; Hess, Melinda R.; Hogarty, Kristine Y.

    EncStat (Encouraged about Statistics) is a multimedia program being developed to identify and assist students with statistics anxiety or negative attitudes about statistics. This study explored the validity of the assessment instruments included in EncStat with respect to their diagnostic value for statistics anxiety and negative attitudes about…

  6. Statistical testing and power analysis for brain-wide association study.

    PubMed

    Gong, Weikang; Wan, Lin; Lu, Wenlian; Ma, Liang; Cheng, Fan; Cheng, Wei; Grünewald, Stefan; Feng, Jianfeng

    2018-04-05

    The identification of connexel-wise associations, which involves examining functional connectivities between pairwise voxels across the whole brain, is both statistically and computationally challenging. Although such a connexel-wise methodology has recently been adopted by brain-wide association studies (BWAS) to identify connectivity changes in several mental disorders, such as schizophrenia, autism and depression, the multiple correction and power analysis methods designed specifically for connexel-wise analysis are still lacking. Therefore, we herein report the development of a rigorous statistical framework for connexel-wise significance testing based on the Gaussian random field theory. It includes controlling the family-wise error rate (FWER) of multiple hypothesis testings using topological inference methods, and calculating power and sample size for a connexel-wise study. Our theoretical framework can control the false-positive rate accurately, as validated empirically using two resting-state fMRI datasets. Compared with Bonferroni correction and false discovery rate (FDR), it can reduce false-positive rate and increase statistical power by appropriately utilizing the spatial information of fMRI data. Importantly, our method bypasses the need of non-parametric permutation to correct for multiple comparison, thus, it can efficiently tackle large datasets with high resolution fMRI images. The utility of our method is shown in a case-control study. Our approach can identify altered functional connectivities in a major depression disorder dataset, whereas existing methods fail. A software package is available at https://github.com/weikanggong/BWAS. Copyright © 2018 Elsevier B.V. All rights reserved.

  7. Assessing the statistical significance of the achieved classification error of classifiers constructed using serum peptide profiles, and a prescription for random sampling repeated studies for massive high-throughput genomic and proteomic studies.

    PubMed

    Lyons-Weiler, James; Pelikan, Richard; Zeh, Herbert J; Whitcomb, David C; Malehorn, David E; Bigbee, William L; Hauskrecht, Milos

    2005-01-01

    Peptide profiles generated using SELDI/MALDI time of flight mass spectrometry provide a promising source of patient-specific information with high potential impact on the early detection and classification of cancer and other diseases. The new profiling technology comes, however, with numerous challenges and concerns. Particularly important are concerns of reproducibility of classification results and their significance. In this work we describe a computational validation framework, called PACE (Permutation-Achieved Classification Error), that lets us assess, for a given classification model, the significance of the Achieved Classification Error (ACE) on the profile data. The framework compares the performance statistic of the classifier on true data samples and checks if these are consistent with the behavior of the classifier on the same data with randomly reassigned class labels. A statistically significant ACE increases our belief that a discriminative signal was found in the data. The advantage of PACE analysis is that it can be easily combined with any classification model and is relatively easy to interpret. PACE analysis does not protect researchers against confounding in the experimental design, or other sources of systematic or random error. We use PACE analysis to assess significance of classification results we have achieved on a number of published data sets. The results show that many of these datasets indeed possess a signal that leads to a statistically significant ACE.
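
    The PACE idea can be sketched directly: estimate the classifier's achieved (cross-validated) error on the true labels, then locate it within the error distribution obtained after randomly reassigning class labels. The classifier, the cross-validation scheme, and the names below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def pace_pvalue(X, y, n_perm=200, seed=0):
    """Permutation-Achieved Classification Error sketch: the p-value is the
    fraction of label permutations whose cross-validated error is at least
    as low as the error achieved on the true labels."""
    rng = np.random.default_rng(seed)

    def cv_error(labels):
        acc = cross_val_score(LogisticRegression(max_iter=1000), X, labels,
                              cv=5, scoring="accuracy").mean()
        return 1.0 - acc

    ace = cv_error(y)
    perm_errors = np.array([cv_error(rng.permutation(y)) for _ in range(n_perm)])
    return float(np.mean(perm_errors <= ace))

# Tiny synthetic demonstration (hypothetical data)
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=60) > 0).astype(int)
print(pace_pvalue(X, y))
```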

  8. Prognostic significance of obstructive uropathy in advanced prostate cancer.

    PubMed

    Oefelein, Michael G

    2004-06-01

    To report the incidence and prognostic implications of obstructive uropathy (OU) in patients with advanced prostate cancer receiving androgen deprivation therapy and to define the impact initial local therapy has on the development of OU in patients with prostate cancer who develop recurrence and begin androgen deprivation therapy. From a population of 260 patients with advanced prostate cancer diagnosed between 1986 and 2003, OU was identified in 51 patients. The OU treatment options included ureteral stent, percutaneous nephrostomy, transurethral resection of the prostate, Foley catheter placement, and urinary diversion. Overall survival and the factors that influenced survival were calculated using standard statistical methods. OU was diagnosed in 15 (16%) of 80 patients who received local therapy with curative intent and in whom local therapy subsequently failed and in 36 (19%) of 180 patients who had never received local therapy (P = 0.7, chi-square test). Of these 51 patients, 39 had bladder neck obstruction and 16 had ureteral obstruction. Overall survival was significantly worse for the men with OU compared with those without OU (41 versus 54 months). OU was associated with tumor stage and androgen-insensitive prostate cancer. OU results in significantly reduced survival in men with prostate cancer. In a select group of patients with prostate cancer with progression after local therapy (primarily radiotherapy), no statistically significant reduction in the development of OU was observed relative to patients matched for stage, grade, and pretreatment prostate-specific antigen level treated with androgen deprivation therapy alone. Aggressive advanced stage and hormone-insensitive disease are variables associated with OU.

  9. Complication Rates of Ostomy Surgery Are High and Vary Significantly Between Hospitals

    PubMed Central

    Sheetz, Kyle H.; Waits, Seth A.; Krell, Robert W.; Morris, Arden M.; Englesbe, Michael J.; Mullard, Andrew; Campbell, Darrell A.; Hendren, Samantha

    2014-01-01

    Background: Ostomy surgery is common and has traditionally been associated with high rates of morbidity and mortality, suggesting an important target for quality improvement. Objective: To evaluate the variation in outcomes after ostomy creation surgery within Michigan in order to identify targets for quality improvement. Design: Retrospective cohort study. Setting: The 34-hospital Michigan Surgical Quality Collaborative (MSQC). Patients: Patients undergoing ostomy creation surgery from 2006 to 2011. Main outcome measures: We evaluated hospitals' morbidity and mortality rates after risk-adjustment (age, comorbidities, emergency v. elective, procedure type). Results: 4,250 patients underwent ostomy creation surgery; 3,866 (91.0%) procedures were open and 384 (9.0%) were laparoscopic. Unadjusted morbidity and mortality rates were 43.9% and 10.7%, respectively. Unadjusted morbidity rates for specific procedures ranged from 32.7% for ostomy-creation-only procedures to 47.8% for Hartmann's procedures. Risk-adjusted morbidity rates varied significantly between hospitals, ranging from 31.2% (95%CI 18.4-43.9) to 60.8% (95%CI 48.9-72.6). There were five statistically-significant high-outlier hospitals and three statistically-significant low-outlier hospitals for risk-adjusted morbidity. The pattern of complication types was similar between high- and low-outlier hospitals. Case volume, operative duration, and use of laparoscopic surgery did not explain the variation in morbidity rates across hospitals. Conclusions: Morbidity and mortality rates for modern ostomy surgery are high. While this type of surgery has received little attention in healthcare policy, these data reveal that it is both common and uncommonly morbid. Variation in hospital performance provides an opportunity to identify quality improvement practices that could be disseminated among hospitals. PMID:24819104

  10. Teach a Confidence Interval for the Median in the First Statistics Course

    ERIC Educational Resources Information Center

    Howington, Eric B.

    2017-01-01

    Few introductory statistics courses consider statistical inference for the median. This article argues in favour of adding a confidence interval for the median to the first statistics course. Several methods suitable for introductory statistics students are identified and briefly reviewed.
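
    One method suitable for introductory students is the distribution-free interval based on order statistics: for n observations, the interval between the k-th smallest and k-th largest values covers the median with probability 1 - 2P(Bin(n, 1/2) <= k - 1). A minimal sketch, assuming this order-statistic variant (the article may review others):

```python
import numpy as np
from scipy.stats import binom

def median_ci(sample, confidence=0.95):
    """Distribution-free confidence interval for the median: the interval
    [x_(k), x_(n-k+1)] has coverage 1 - 2*P(Binomial(n, 1/2) <= k - 1)."""
    x = np.sort(np.asarray(sample, dtype=float))
    n = len(x)
    alpha = 1 - confidence
    # Largest k with P(Bin(n, 1/2) <= k - 1) <= alpha/2 (conservative choice)
    k = max(int(binom.ppf(alpha / 2, n, 0.5)), 1)
    return x[k - 1], x[n - k]

data = [3.1, 4.7, 5.0, 5.2, 5.9, 6.3, 7.8, 9.4, 10.1, 12.6]
print(median_ci(data))  # for n = 10 this is (x_(2), x_(9))
```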

  11. Statistical competencies for medical research learners: What is fundamental?

    PubMed

    Enders, Felicity T; Lindsell, Christopher J; Welty, Leah J; Benn, Emma K T; Perkins, Susan M; Mayo, Matthew S; Rahbar, Mohammad H; Kidwell, Kelley M; Thurston, Sally W; Spratt, Heidi; Grambow, Steven C; Larson, Joseph; Carter, Rickey E; Pollock, Brad H; Oster, Robert A

    2017-06-01

    It is increasingly essential for medical researchers to be literate in statistics, but the requisite degree of literacy is not the same for every statistical competency in translational research. Statistical competency can range from 'fundamental' (necessary for all) to 'specialized' (necessary for only some). In this study, we determine the degree to which each competency is fundamental or specialized. We surveyed members of 4 professional organizations, targeting doctorally trained biostatisticians and epidemiologists who taught statistics to medical research learners in the past 5 years. Respondents rated 24 educational competencies on a 5-point Likert scale anchored by 'fundamental' and 'specialized.' There were 112 responses. Nineteen of 24 competencies were fundamental. The competencies considered most fundamental were assessing sources of bias and variation (95%), recognizing one's own limits with regard to statistics (93%), and identifying the strengths and limitations of study designs (93%). The least endorsed items were meta-analysis (34%) and stopping rules (18%). We have identified the statistical competencies needed by all medical researchers. These competencies should be considered when designing statistical curricula for medical researchers and should inform which topics are taught in graduate programs and evidence-based medicine courses where learners need to read and understand the medical research literature.

  12. Expression of thymidylate synthase (TS) and its prognostic significance in patients with cutaneous angiosarcoma.

    PubMed

    Shimizu, A; Kaira, K; Okubo, Y; Utsumi, D; Bolag, A; Yasuda, M; Takahashi, K; Ishikawa, O

    2017-01-01

    Cutaneous angiosarcoma (CA) is extremely rare, and little is known about the biological significance of possible biomarkers for chemotherapeutic agents. Thymidylate synthase (TS) is an attractive target for cancer treatment in various human neoplasms. It remains unclear whether the expression of TS is associated with the clinicopathological features of CA patients. The aim of this study was to elucidate the relationship between TS expression and the clinicopathological significance in CA patients. Fifty-one patients with CA were included in this study. TS expression and Ki-67 labeling index were examined using immunohistochemical analysis. TS was positively expressed in 39% (20/51) of CA patients. No statistically significant prognostic factor was identified as a predictor of overall survival (OS) for all patients by univariate analysis, whereas a significant prognostic variable for progression free survival (PFS) was found to be the clinical stage. In addition, both univariate and multivariate analyses confirmed that positive expression of TS was a significant predictor of worse PFS in CA patients of clinical stage 1. Positive TS expression in CA was identified as a significant predictor of worse outcome in patients of clinical stage 1.

  13. Statistics of Statisticians: Critical Mass of Statistics and Operational Research Groups

    NASA Astrophysics Data System (ADS)

    Kenna, Ralph; Berche, Bertrand

    Using a recently developed model, inspired by mean field theory in statistical physics, and data from the UK's Research Assessment Exercise, we analyse the relationship between the qualities of statistics and operational research groups and the quantities of researchers in them. Similar to other academic disciplines, we provide evidence for a linear dependency of quality on quantity up to an upper critical mass, which is interpreted as the average maximum number of colleagues with whom a researcher can communicate meaningfully within a research group. The model also predicts a lower critical mass, which research groups should strive to achieve to avoid extinction. For statistics and operational research, the lower critical mass is estimated to be 9 ± 3. The upper critical mass, beyond which research quality does not significantly depend on group size, is 17 ± 6.

  14. BETTER STATISTICS FOR BETTER DECISIONS: REJECTING NULL HYPOTHESIS STATISTICAL TESTS IN FAVOR OF REPLICATION STATISTICS

    PubMed Central

    SANABRIA, FEDERICO; KILLEEN, PETER R.

    2008-01-01

    Despite being under challenge for the past 50 years, null hypothesis significance testing (NHST) remains dominant in the scientific field for want of viable alternatives. NHST, along with its significance level p, is inadequate for most of the uses to which it is put, a flaw that is of particular interest to educational practitioners who too often must use it to sanctify their research. In this article, we review the failure of NHST and propose p(rep), the probability of replicating an effect, as a more useful statistic for evaluating research and aiding practical decision making. PMID:19122766

  15. Using statistical process control for monitoring the prevalence of hospital-acquired pressure ulcers.

    PubMed

    Kottner, Jan; Halfens, Ruud

    2010-05-01

    Institutionally acquired pressure ulcers are used as outcome indicators to assess the quality of pressure ulcer prevention programs. Determining whether quality improvement projects that aim to decrease the proportions of institutionally acquired pressure ulcers lead to real changes in clinical practice depends on the measurement method and statistical analysis used. To examine whether nosocomial pressure ulcer prevalence rates in hospitals in the Netherlands changed, a secondary data analysis using different statistical approaches was conducted of annual (1998-2008) nationwide nursing-sensitive health problem prevalence studies in the Netherlands. Institutions that participated regularly in all survey years were identified. Risk-adjusted nosocomial pressure ulcers prevalence rates, grade 2 to 4 (European Pressure Ulcer Advisory Panel system) were calculated per year and hospital. Descriptive statistics, chi-square trend tests, and P charts based on statistical process control (SPC) were applied and compared. Six of the 905 healthcare institutions participated in every survey year and 11,444 patients in these six hospitals were identified as being at risk for pressure ulcers. Prevalence rates per year ranged from 0.05 to 0.22. Chi-square trend tests revealed statistically significant downward trends in four hospitals but based on SPC methods, prevalence rates of five hospitals varied by chance only. Results of chi-square trend tests and SPC methods were not comparable, making it impossible to decide which approach is more appropriate. P charts provide more valuable information than single P values and are more helpful for monitoring institutional performance. Empirical evidence about the decrease of nosocomial pressure ulcer prevalence rates in the Netherlands is contradictory and limited.
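
    A P chart of the kind applied here places three-sigma binomial limits around the pooled proportion; periods falling outside the limits signal special-cause variation, while points within them vary by chance only. A minimal sketch with hypothetical counts (not the Dutch survey data):

```python
import numpy as np

# Hypothetical periods: nosocomial pressure ulcer cases and patients at risk
cases = np.array([12, 9, 14, 11, 10, 13, 8, 15, 12, 10, 11, 9])
at_risk = np.array([130, 120, 140, 125, 118, 135, 110, 150, 128, 122, 126, 115])

p_i = cases / at_risk                  # per-period proportions
p_bar = cases.sum() / at_risk.sum()    # centre line (pooled proportion)
sigma = np.sqrt(p_bar * (1 - p_bar) / at_risk)
ucl = p_bar + 3 * sigma
lcl = np.clip(p_bar - 3 * sigma, 0, None)

for i, (p, lo, hi) in enumerate(zip(p_i, lcl, ucl), start=1):
    flag = "special cause" if (p < lo or p > hi) else "common cause"
    print(f"period {i:2d}: p={p:.3f}  limits=({lo:.3f}, {hi:.3f})  {flag}")
```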

  16. Lessons from Inferentialism for Statistics Education

    ERIC Educational Resources Information Center

    Bakker, Arthur; Derry, Jan

    2011-01-01

    This theoretical paper relates recent interest in informal statistical inference (ISI) to the semantic theory termed inferentialism, a significant development in contemporary philosophy, which places inference at the heart of human knowing. This theory assists epistemological reflection on challenges in statistics education encountered when…

  17. Enhancing Dairy Manufacturing through customer feedback: A statistical approach

    NASA Astrophysics Data System (ADS)

    Vineesh, D.; Anbuudayasankar, S. P.; Narassima, M. S.

    2018-02-01

    Dairy products have become an inevitable part of the habitual diet. This study aims to investigate consumers' satisfaction with dairy products so as to provide useful information for manufacturers, which would serve as useful input for enriching the quality of the products delivered. The study involved consumers of dairy products from various demographic backgrounds across South India. The questionnaire focussed on quality aspects of dairy products and also on the service provided. A customer satisfaction model was developed based on various factors identified, with robust hypotheses that govern the use of the product. The developed model proved to be statistically significant as it passed the required statistical tests for reliability, construct validity and interdependency between the constructs. Some major concerns detected were regarding the fat content, taste and odour of packaged milk. A minor proportion of respondents (15.64%) were dissatisfied with the quality of service provided, which is another issue to be addressed to eliminate the sense of dissatisfaction in the minds of consumers.

  18. Rapid Statistical Learning Supporting Word Extraction From Continuous Speech.

    PubMed

    Batterink, Laura J

    2017-07-01

    The identification of words in continuous speech, known as speech segmentation, is a critical early step in language acquisition. This process is partially supported by statistical learning, the ability to extract patterns from the environment. Given that speech segmentation represents a potential bottleneck for language acquisition, patterns in speech may be extracted very rapidly, without extensive exposure. This hypothesis was examined by exposing participants to continuous speech streams composed of novel repeating nonsense words. Learning was measured on-line using a reaction time task. After merely one exposure to an embedded novel word, learners demonstrated significant learning effects, as revealed by faster responses to predictable than to unpredictable syllables. These results demonstrate that learners gained sensitivity to the statistical structure of unfamiliar speech on a very rapid timescale. This ability may play an essential role in early stages of language acquisition, allowing learners to rapidly identify word candidates and "break in" to an unfamiliar language.

  19. Regional variation in the severity of pesticide exposure outcomes: applications of geographic information systems and spatial scan statistics.

    PubMed

    Sudakin, Daniel L; Power, Laura E

    2009-03-01

    Geographic information systems and spatial scan statistics have been utilized to assess regional clustering of symptomatic pesticide exposures reported to a state Poison Control Center (PCC) during a single year. In the present study, we analyzed five subsequent years of PCC data to test whether there are significant geographic differences in pesticide exposure incidents resulting in serious (moderate, major, and fatal) medical outcomes. A PCC provided the data on unintentional pesticide exposures for the time period 2001-2005. The geographic location of the caller, the location where the exposure occurred, the exposure route, and the medical outcome were abstracted. There were 273 incidents resulting in moderate effects (n = 261), major effects (n = 10), or fatalities (n = 2). Spatial scan statistics identified a geographic area consisting of two adjacent counties (one urban, one rural), where statistically significant clustering of serious outcomes was observed. The relative risk of moderate, major, and fatal outcomes was 2.0 in this spatial cluster (p = 0.0005). PCC data, geographic information systems, and spatial scan statistics can identify clustering of serious outcomes from human exposure to pesticides. These analyses may be useful for public health officials to target preventive interventions. Further investigation is warranted to understand better the potential explanations for geographical clustering, and to assess whether preventive interventions have an impact on reducing pesticide exposure incidents resulting in serious medical outcomes.
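
    The clustering step can be sketched with a Kulldorff-style likelihood-ratio scan: for each candidate region, compare the rate of serious outcomes inside versus outside and keep the region that maximizes the log-likelihood ratio; in practice, significance is then obtained by Monte Carlo replication (as in SaTScan-type software). The data, the candidate regions, and all names below are hypothetical:

```python
import numpy as np

def llr(cases_in, n_in, cases_tot, n_tot):
    """Kulldorff-style log-likelihood ratio for a candidate cluster under a
    Bernoulli model, evaluated only when the inside rate exceeds the outside
    rate (one-sided scan for elevated risk)."""
    c_out, n_out = cases_tot - cases_in, n_tot - n_in
    p_in, p_out, p_all = cases_in / n_in, c_out / n_out, cases_tot / n_tot
    if p_in <= p_out:
        return 0.0
    def ll(c, n, p):
        return c * np.log(p + 1e-12) + (n - c) * np.log(1 - p + 1e-12)
    return (ll(cases_in, n_in, p_in) + ll(c_out, n_out, p_out)
            - ll(cases_tot, n_tot, p_all))

# Hypothetical county-level data: serious outcomes / reported exposures
cases = np.array([4, 3, 18, 15, 2, 5])
exposures = np.array([60, 55, 70, 65, 50, 58])
# Candidate clusters (pairs of adjacent counties, for illustration only)
candidates = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]

best = max(candidates,
           key=lambda idx: llr(cases[list(idx)].sum(),
                               exposures[list(idx)].sum(),
                               cases.sum(), exposures.sum()))
print("most likely cluster:", best)  # significance would come from Monte Carlo
```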

  20. Statistical Reporting Errors and Collaboration on Statistical Analyses in Psychological Science

    PubMed Central

    Veldkamp, Coosje L. S.; Nuijten, Michèle B.; Dominguez-Alvarez, Linda; van Assen, Marcel A. L. M.; Wicherts, Jelte M.

    2014-01-01

    Statistical analysis is error prone. A best practice for researchers using statistics would therefore be to share data among co-authors, allowing double-checking of executed tasks just as co-pilots do in aviation. To document the extent to which this ‘co-piloting’ currently occurs in psychology, we surveyed the authors of 697 articles published in six top psychology journals and asked them whether they had collaborated on four aspects of analyzing data and reporting results, and whether the described data had been shared between the authors. We acquired responses for 49.6% of the articles and found that co-piloting on statistical analysis and reporting results is quite uncommon among psychologists, while data sharing among co-authors seems reasonably but not completely standard. We then used an automated procedure to study the prevalence of statistical reporting errors in the articles in our sample and examined the relationship between reporting errors and co-piloting. Overall, 63% of the articles contained at least one p-value that was inconsistent with the reported test statistic and the accompanying degrees of freedom, and 20% of the articles contained at least one p-value that was inconsistent to such a degree that it may have affected decisions about statistical significance. Overall, the probability that a given p-value was inconsistent was over 10%. Co-piloting was not found to be associated with reporting errors. PMID:25493918

  1. Pisces did not have increased heart failure: data-driven comparisons of binary proportions between levels of a categorical variable can result in incorrect statistical significance levels.

    PubMed

    Austin, Peter C; Goldwasser, Meredith A

    2008-03-01

    We examined the impact on statistical inference when a chi-square test is used to compare the proportion of successes in the level of a categorical variable that has the highest observed proportion of successes with the proportion of successes in all other levels of the categorical variable combined. Monte Carlo simulations and a case study examining the association between astrological sign and hospitalization for heart failure. A standard chi-square test results in an inflation of the type I error rate, with the type I error rate increasing as the number of levels of the categorical variable increases. Using a standard chi-square test, the hospitalization rate for Pisces was statistically significantly different from that of the other 11 astrological signs combined (P=0.026). After accounting for the fact that the selection of Pisces was based on it having the highest observed proportion of heart failure hospitalizations, subjects born under the sign of Pisces no longer had a significantly higher rate of heart failure hospitalization compared to the other residents of Ontario (P=0.152). Post hoc comparisons of the proportions of successes across different levels of a categorical variable can result in incorrect inferences.
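
    The inflation is easy to reproduce by simulation: draw success counts for every level under one common true rate, select the level with the highest observed proportion, and test it against the remaining levels combined with a standard chi-square test. A minimal sketch (the parameters are illustrative, not the paper's simulation design):

```python
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(1)
n_levels, n_per_level, n_sims, alpha = 12, 1000, 2000, 0.05
base_rate = 0.05  # identical true success rate for every level (sign)

false_pos = 0
for _ in range(n_sims):
    successes = rng.binomial(n_per_level, base_rate, size=n_levels)
    top = successes.argmax()  # data-driven choice of the 'extreme' level
    rest = successes.sum() - successes[top]
    table = np.array([
        [successes[top], n_per_level - successes[top]],
        [rest, (n_levels - 1) * n_per_level - rest],
    ])
    _, p, _, _ = chi2_contingency(table)
    false_pos += p < alpha

print(f"empirical type I error: {false_pos / n_sims:.3f} (nominal {alpha})")
```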

  2. Statistical methods for detecting periodic fragments in DNA sequence data

    PubMed Central

    2011-01-01

    Background: Period 10 dinucleotides are structurally and functionally validated factors that influence the ability of DNA to form nucleosomes, histone core octamers. Robust identification of periodic signals in DNA sequences is therefore required to understand nucleosome organisation in genomes. While various techniques for identifying periodic components in genomic sequences have been proposed or adopted, the requirements for such techniques have not been considered in detail and confirmatory testing for a priori specified periods has not been developed. Results: We compared the estimation accuracy and suitability for confirmatory testing of autocorrelation, discrete Fourier transform (DFT), integer period discrete Fourier transform (IPDFT) and a previously proposed Hybrid measure. A number of different statistical significance procedures were evaluated but a blockwise bootstrap proved superior. When applied to synthetic data whose period-10 signal had been eroded, or for which the signal was approximately period-10, the Hybrid technique exhibited superior properties during exploratory period estimation. In contrast, confirmatory testing using the blockwise bootstrap procedure identified IPDFT as having the greatest statistical power. These properties were validated on yeast sequences defined from a ChIP-chip study, where the Hybrid metric confirmed the expected dominance of period-10 in nucleosome associated DNA but IPDFT identified more significant occurrences of period-10. Application to the whole genomes of yeast and mouse identified ~ 21% and ~ 19% respectively of these genomes as spanned by period-10 nucleosome positioning sequences (NPS). Conclusions: For estimating the dominant period, we find the Hybrid period estimation method empirically to be the most effective for both eroded and approximate periodicity. The blockwise bootstrap was found to be effective as a significance measure, performing particularly well in the problem of period detection in the…
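
    A simplified stand-in for the confirmatory IPDFT-plus-blockwise-bootstrap procedure: measure the spectral power of a 0/1 dinucleotide indicator sequence at the a priori period, then resample contiguous blocks to build the null distribution while preserving local sequence structure. The block length, the exact power definition, and the names are assumptions of this sketch:

```python
import numpy as np

def period_power(x, period=10):
    """Squared magnitude of the Fourier component at an integer period
    (a simplified version of the quantity IPDFT-style tests evaluate)."""
    k = np.arange(len(x))
    comp = np.sum(x * np.exp(-2j * np.pi * k / period))
    return float(np.abs(comp) ** 2 / len(x))

def blockwise_bootstrap_p(x, period=10, block=50, n_boot=2000, seed=0):
    """Confirmatory test for an a priori period: resample blocks of the
    sequence and count how often the resampled power reaches the observed."""
    rng = np.random.default_rng(seed)
    observed = period_power(x, period)
    n_blocks = len(x) // block
    blocks = x[: n_blocks * block].reshape(n_blocks, block)
    hits = 0
    for _ in range(n_boot):
        resampled = blocks[rng.integers(0, n_blocks, n_blocks)].ravel()
        hits += period_power(resampled, period) >= observed
    return hits / n_boot

# Hypothetical indicator sequence with an enriched period-10 signal
rng = np.random.default_rng(1)
seq = (rng.random(1000) < 0.05).astype(float)
seq[::10] = rng.random(100) < 0.25
print(blockwise_bootstrap_p(seq))
```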

  3. A discrete wavelet spectrum approach for identifying non-monotonic trends in hydroclimate data

    NASA Astrophysics Data System (ADS)

    Sang, Yan-Fang; Sun, Fubao; Singh, Vijay P.; Xie, Ping; Sun, Jian

    2018-01-01

    The hydroclimatic process changes non-monotonically, and identifying its trends is a great challenge. Building on discrete wavelet transform theory, we developed a discrete wavelet spectrum (DWS) approach for identifying non-monotonic trends in hydroclimate time series and evaluating their statistical significance. After validating the DWS approach on two typical synthetic time series, we examined annual temperature and potential evaporation over China from 1961 to 2013 and found that the DWS approach detected both the warming and the warming hiatus in temperature, as well as the reversed changes in potential evaporation. Further, the identified non-monotonic trends showed stable significance when the time series was longer than about 30 years (i.e., the widely defined climate timescale). The significance of the obviously non-monotonic trends in potential evaporation measured at 150 stations in China was underestimated by the Mann-Kendall test, which failed to detect them. The DWS approach overcame this problem and detected significant non-monotonic trends at 380 stations, which helps in understanding and interpreting the spatiotemporal variability of the hydroclimatic process. Our results suggest that non-monotonic trends of hydroclimate time series and their significance should be carefully identified, and the DWS approach proposed here has the potential for wide use in the hydrological and climate sciences.
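
    The DWS itself is defined in the paper; as a rough illustration of the underlying idea only (isolating a slowly varying, possibly non-monotonic trend with a discrete wavelet decomposition), here is a sketch using PyWavelets. The wavelet choice and decomposition level are my assumptions, not the authors' settings:

      import numpy as np
      import pywt

      def wavelet_trend(x, wavelet="db4", level=2):
          # keep only the coarsest (approximation) coefficients and
          # reconstruct them as a candidate non-monotonic trend
          coeffs = pywt.wavedec(x, wavelet, level=level)
          coeffs = [coeffs[0]] + [np.zeros_like(c) for c in coeffs[1:]]
          return pywt.waverec(coeffs, wavelet)[: len(x)]

      # toy series: warming followed by a hiatus, plus noise
      t = np.arange(1961, 2014)
      signal = np.where(t < 1998, 0.02 * (t - 1961), 0.74)
      x = signal + np.random.default_rng(1).normal(0, 0.1, t.size)
      trend = wavelet_trend(x)   # smooth estimate of the non-monotonic trend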

  4. Identifying the hydrochemical characteristics of rivers and groundwater by multivariate statistical analysis in the Sanjiang Plain, China

    NASA Astrophysics Data System (ADS)

    Cao, Yingjie; Tang, Changyuan; Song, Xianfang; Liu, Changming; Zhang, Yinghua

    2016-06-01

    Two multivariate statistical techniques, factor analysis (FA) and discriminant analysis (DA), are applied to study river and groundwater hydrochemistry and its controlling processes in the Sanjiang Plain of northeast China. Factor analysis identifies five factors, which account for 79.65 % of the total variance in the dataset. Four factors bearing specific meanings as controlling processes of river and groundwater hydrochemistry are divided into two groups, a "natural hydrochemistry evolution" group and a "pollution" group. The "natural hydrochemistry evolution" group includes the salinity factor (factor 1), caused by rock weathering, and the residence time factor (factor 2), reflecting groundwater travel time. The "pollution" group represents groundwater quality deterioration due to geogenic pollution caused by elevated Fe and Mn (factor 3) and elevated nitrate (NO3-) introduced by human activities such as agricultural exploitation (factor 5). The hydrochemical differences and hydraulic connections among rivers (surface water, SW), shallow groundwater (SG) and deep groundwater (DG) are evaluated using the factor scores obtained from FA and DA (Fisher's method). The results show that the river water is characterized by low salinity and slight pollution, whereas the shallow groundwater has the highest salinity and severe pollution. The SW is well separated from SG and DG by Fisher's discriminant function, but SG and DG cannot be well separated, showing their hydrochemical similarity and emphasizing the hydraulic connection between SG and DG.

  5. Statistical Analysis of the Links between Blocking and Nor'easters

    NASA Astrophysics Data System (ADS)

    Booth, J. F.; Pfahl, S.

    2015-12-01

    Nor'easters can be loosely defined as extratropical cyclones that develop as they progress northward along the eastern coast of North America. This path makes it possible for these storms to generate storm surge along the coastline and/or heavy precipitation or snow inland. In the present analysis, the path of the storms is investigated relative to the behavior of upstream blocking events over the North Atlantic Ocean. Two separate Lagrangian tracking methods are used to identify the extratropical cyclone paths and the blocking events. Using the cyclone paths, Nor'easters are identified, and blocking statistics are calculated for the days prior to, during and following the occurrence of the Nor'easters. The path, strength and intensification rates of the cyclones are compared with the strength and location of the blocks. When a Nor'easter occurs, the likelihood of the presence of a block at the southeast tip of Greenland is statistically significantly increased; i.e., a block concurrent with a Nor'easter happens more often than by random coincidence. However, no significant link between the strength of the storms and the strength of the block was identified. These results suggest that the presence of the block mainly affects the path of the Nor'easters. On the other hand, in the event of blocking at the southeast tip of Greenland, the likelihood of a Nor'easter, as opposed to a different type of storm, is no greater than what one might expect from randomly sampling cyclone tracks. The results confirm a long-held understanding in forecast meteorology that upstream blocking is a necessary but not sufficient condition for generating a Nor'easter.

  6. The Statistical Power of Planned Comparisons.

    ERIC Educational Resources Information Center

    Benton, Roberta L.

    Basic principles underlying statistical power are examined; and issues pertaining to effect size, sample size, error variance, and significance level are highlighted via the use of specific hypothetical examples. Analysis of variance (ANOVA) and related methods remain popular, although other procedures sometimes have more statistical power against…

  7. Generating an Empirical Probability Distribution for the Andrews-Pregibon Statistic.

    ERIC Educational Resources Information Center

    Jarrell, Michele G.

    A probability distribution was developed for the Andrews-Pregibon (AP) statistic. The statistic, developed by D. F. Andrews and D. Pregibon (1978), identifies multivariate outliers. It is a ratio of the determinant of the data matrix with an observation deleted to the determinant of the entire data matrix. Although the AP statistic has been used…

  8. Sex differences in discriminative power of volleyball game-related statistics.

    PubMed

    João, Paulo Vicente; Leite, Nuno; Mesquita, Isabel; Sampaio, Jaime

    2010-12-01

    To identify sex differences in volleyball game-related statistics, the game-related statistics of several World Championships in 2007 (N=132) were analyzed using the VIS software from the International Volleyball Federation. Discriminant analysis was used to identify the game-related statistics that best discriminated performances by sex. The analysis emphasized fault serves (SC = -.40), shot spikes (SC = .40), and reception digs (SC = .31). Considerable variability was evident in the game-related statistics profiles: men's volleyball games were better characterized by terminal actions (errors of service), whereas women's volleyball games were characterized by continuous actions (in defense and attack). These differences may be related to the anthropometric and physiological differences between women and men and their influence on performance profiles.

  9. 49 CFR 520.5 - Guidelines for identifying major actions significantly affecting the environment.

    Code of Federal Regulations, 2011 CFR

    2011-10-01

    ... impact but which have a potential for significantly affecting the environment; (2) Any proposed action... relating to the environment; (ii) has a significantly detrimental impact on air or water quality or on... vehicles or motor vehicle equipment; and (13) Any other action that causes significant environment impact...

  10. 49 CFR 520.5 - Guidelines for identifying major actions significantly affecting the environment.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... significantly affecting the environment. 520.5 Section 520.5 Transportation Other Regulations Relating to... significantly affecting the environment. (a) General guidelines. The phrase, “major Federal actions significantly affecting the quality of the human environment,” as used in this part, shall be construed with a...

  11. Public health information and statistics dissemination efforts for Indonesia on the Internet.

    PubMed

    Hanani, Febiana; Kobayashi, Takashi; Jo, Eitetsu; Nakajima, Sawako; Oyama, Hiroshi

    2011-01-01

    To elucidate current issues related to health statistics dissemination efforts on the Internet in Indonesia and to propose a new dissemination website as a solution. A cross-sectional survey was conducted. Sources of statistics were identified using link relationships and Google™ search. For each site, the menus used to locate statistics, the modes of presentation, the means of access to statistics, and the available statistics were assessed. Assessment results were used to derive a design specification; a prototype system was developed and evaluated with a usability test. Forty-nine sources were identified on 18 governmental, 8 international and 5 non-governmental websites. Of the 49 menus identified, 33% used non-intuitive titles and led to inefficient searches; 69% of them were on government websites. Of the 31 websites, only 39% and 23% used graphs/charts and maps for presentation. Further, only 32%, 39% and 19% provided query, export and print features, respectively. While >50% of sources reported morbidity, risk factor and service provision statistics, <40% of sources reported health resource and mortality statistics. A statistics portal website was developed using the Joomla!™ content management system. A usability test demonstrated its potential to improve data accessibility. Government efforts to disseminate statistics in Indonesia are supported by non-governmental and international organizations, but the existing information may not be very useful because it is: a) not widely distributed, b) difficult to locate, and c) not effectively communicated. Actions are needed to ensure information usability, and one such action is the development of a statistics portal website.

  12. The relationship between procrastination, learning strategies and statistics anxiety among Iranian college students: a canonical correlation analysis.

    PubMed

    Vahedi, Shahrum; Farrokhi, Farahman; Gahramani, Farahnaz; Issazadegan, Ali

    2012-01-01

    Approximately 66-80% of graduate students experience statistics anxiety, and some researchers propose that many students identify statistics courses as the most anxiety-inducing courses in their academic curricula. As such, it is likely that statistics anxiety is, in part, responsible for many students delaying enrollment in these courses for as long as possible. This paper proposes a canonical model treating academic procrastination (AP) and learning strategies (LS) as predictor variables and statistics anxiety (SA) as the explained variable. A questionnaire survey was used for data collection, and 246 female college students participated in this study. To examine the mutually independent relations between the procrastination, learning strategies and statistics anxiety variables, a canonical correlation analysis was computed. Findings show that two canonical functions were statistically significant. The set of variables (metacognitive self-regulation, source management, preparing homework, preparing for tests and preparing term papers) helped predict changes in statistics anxiety with respect to fearful behavior, attitude towards math and class, and performance, but not anxiety. These findings could be used in educational and psychological interventions in the context of statistics anxiety reduction.

  13. Genome-wide identification of significant aberrations in cancer genome.

    PubMed

    Yuan, Xiguo; Yu, Guoqiang; Hou, Xuchu; Shih, Ie-Ming; Clarke, Robert; Zhang, Junying; Hoffman, Eric P; Wang, Roger R; Zhang, Zhen; Wang, Yue

    2012-07-27

    Somatic Copy Number Alterations (CNAs) in human genomes are present in almost all human cancers. Systematic efforts to characterize such structural variants must effectively distinguish significant consensus events from random background aberrations. Here we introduce Significant Aberration in Cancer (SAIC), a new method for characterizing and assessing the statistical significance of recurrent CNA units. Three main features of SAIC include: (1) exploiting the intrinsic correlation among consecutive probes to assign a score to each CNA unit instead of single probes; (2) performing permutations on CNA units that preserve correlations inherent in the copy number data; and (3) iteratively detecting Significant Copy Number Aberrations (SCAs) and estimating an unbiased null distribution by applying an SCA-exclusive permutation scheme. We test and compare the performance of SAIC against four peer methods (GISTIC, STAC, KC-SMART, CMDS) on a large number of simulation datasets. Experimental results show that SAIC outperforms the peer methods in terms of a larger area under the receiver operating characteristic (ROC) curve and increased detection power. We then apply SAIC to analyze structural genomic aberrations acquired in four real cancer genome-wide copy number data sets (ovarian cancer, metastatic prostate cancer, lung adenocarcinoma, glioblastoma). When compared with previously reported results, SAIC successfully identifies most SCAs known to be of biological significance and associated with oncogenes (e.g., KRAS, CCNE1, and MYC) or tumor suppressor genes (e.g., CDKN2A/B). Furthermore, SAIC identifies a number of novel SCAs in these copy number data that encompass tumor-related genes and may warrant further studies. Supported by a well-grounded theoretical framework, SAIC has been developed and used to identify SCAs in various cancer copy number data sets, providing useful information to study the landscape of cancer genomes. Open-source and platform-independent SAIC software is
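
    As a toy illustration of the unit-level permutation idea (a drastic simplification of SAIC, not its implementation), the sketch below scores each CNA unit by its mean aberration across samples and builds a genome-wide null by permuting units within each sample, which preserves each sample's amplitude distribution:

      import numpy as np

      rng = np.random.default_rng(0)
      n_samples, n_units = 100, 2000
      cn = rng.normal(0, 0.2, size=(n_samples, n_units))   # toy copy-number ratios
      cn[:, 500:510] += 0.4                                # a recurrent amplification

      scores = cn.mean(axis=0)                             # per-unit consensus score
      null_max = np.empty(200)
      for b in range(200):
          perm = np.array([rng.permutation(row) for row in cn])
          null_max[b] = perm.mean(axis=0).max()            # genome-wide maximum

      threshold = np.quantile(null_max, 0.95)              # controls family-wise error
      significant_units = np.where(scores > threshold)[0]  # recovers units 500-509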

  14. Statistical benchmark for BosonSampling

    NASA Astrophysics Data System (ADS)

    Walschaers, Mattia; Kuipers, Jack; Urbina, Juan-Diego; Mayer, Klaus; Tichy, Malte Christopher; Richter, Klaus; Buchleitner, Andreas

    2016-03-01

    Boson samplers—set-ups that generate complex many-particle output states through the transmission of elementary many-particle input states across a multitude of mutually coupled modes—promise the efficient quantum simulation of a classically intractable computational task, and challenge the extended Church-Turing thesis, one of the fundamental dogmas of computer science. However, as in all experimental quantum simulations of truly complex systems, one crucial problem remains: how to certify that a given experimental measurement record unambiguously results from enforcing the claimed dynamics, on bosons, fermions or distinguishable particles? Here we offer a statistical solution to the certification problem, identifying an unambiguous statistical signature of many-body quantum interference upon transmission across a multimode, random scattering device. We show that statistical analysis of only partial information on the output state allows one to characterise the imparted dynamics through particle-type-specific features of the emerging interference patterns. The relevant statistical quantifiers are classically computable, define a falsifiable benchmark for BosonSampling, and reveal distinctive features of many-particle quantum dynamics, which go much beyond mere bunching or anti-bunching effects.

  15. Assessing Attitudes towards Statistics among Medical Students: Psychometric Properties of the Serbian Version of the Survey of Attitudes Towards Statistics (SATS)

    PubMed Central

    Stanisavljevic, Dejana; Trajkovic, Goran; Marinkovic, Jelena; Bukumiric, Zoran; Cirkovic, Andja; Milic, Natasa

    2014-01-01

    Background Medical statistics has become important and relevant for future doctors, enabling them to practice evidence-based medicine. Recent studies report that students’ attitudes towards statistics play an important role in their statistics achievement. The aim of this study was to test the psychometric properties of the Serbian version of the Survey of Attitudes Towards Statistics (SATS) in order to acquire a valid instrument for measuring attitudes in the Serbian educational context. Methods The validation study was performed on a cohort of 417 medical students who were enrolled in an obligatory introductory statistics course. The SATS adaptation was based on an internationally accepted methodology for translation and cultural adaptation. Psychometric properties of the Serbian version of the SATS were analyzed through examination of the factorial structure and internal consistency. Results Most medical students held positive attitudes towards statistics. The average total SATS score was above neutral (4.3±0.8) and varied from 1.9 to 6.2. Confirmatory factor analysis validated the six-factor structure of the questionnaire (Affect, Cognitive Competence, Value, Difficulty, Interest and Effort). Values for the fit indices TLI (0.940) and CFI (0.961) were above the cut-off of ≥0.90. The RMSEA value of 0.064 (0.051–0.078) was below the suggested value of ≤0.08. Cronbach’s alpha for the entire scale was 0.90, indicating scale reliability. In a multivariate regression model, self-rating of ability in mathematics and current grade point average were significantly associated with the total SATS score after adjusting for age and gender. Conclusion The present study provided evidence for the appropriate metric properties of the Serbian version of the SATS. Confirmatory factor analysis validated the six-factor structure of the scale. The SATS may be a reliable and valid instrument for identifying medical students’ attitudes towards statistics in the Serbian

  16. Computed statistics at streamgages, and methods for estimating low-flow frequency statistics and development of regional regression equations for estimating low-flow frequency statistics at ungaged locations in Missouri

    USGS Publications Warehouse

    Southard, Rodney E.

    2013-01-01

    located in Region 1, 120 were located in Region 2, and 10 were located in Region 3. Streamgages located outside of Missouri were selected to extend the range of data used for the independent variables in the regression analyses. Streamgages included in the regression analyses had 10 or more years of record and were considered to be minimally affected by anthropogenic activities or trends. Regional regression analyses identified three characteristics as statistically significant for the development of regional equations. For Region 1, drainage area, longest flow path, and streamflow-variability index were statistically significant. The range in the standard error of estimate for Region 1 is 79.6 to 94.2 percent. For Region 2, drainage area and streamflow-variability index were statistically significant, and the range in the standard error of estimate is 48.2 to 72.1 percent. For Region 3, drainage area and streamflow-variability index also were statistically significant, with a range in the standard error of estimate of 48.1 to 96.2 percent. Limitations on estimating low-flow frequency statistics at ungaged locations depend on the method used. The first method outlined for use in Missouri uses power curve equations developed to estimate the selected statistics for ungaged locations on 28 selected streams with multiple streamgages located on the same stream. A second method uses a drainage-area ratio to compute statistics at an ungaged location using data from a single streamgage on the same stream with 10 or more years of record. Ungaged locations on these streams may use the ratio of the drainage area at the ungaged location to the drainage area at the streamgage location to scale the selected statistic value from the streamgage location to the ungaged location. This method can be used if the drainage area of the ungaged location is within 40 to 150 percent of the streamgage drainage area. The third method is the use of the regional regression equations

  17. Understanding Statistics - Cancer Statistics

    Cancer.gov

    Annual reports of U.S. cancer statistics including new cases, deaths, trends, survival, prevalence, lifetime risk, and progress toward Healthy People targets, plus statistical summaries for a number of common cancer types.

  18. P-Value Club: Teaching Significance Level on the Dance Floor

    ERIC Educational Resources Information Center

    Gray, Jennifer

    2010-01-01

    Courses: Beginning research methods and statistics courses, as well as advanced communication courses that require reading research articles and completing research projects involving statistics. Objective: Students will understand the difference between significant and nonsignificant statistical results based on p-value.

  19. 49 CFR 520.5 - Guidelines for identifying major actions significantly affecting the environment.

    Code of Federal Regulations, 2012 CFR

    2012-10-01

    ... indirectly result in a significant increase in the problem of solid waste, as in the disposal of motor... directly or indirectly result in a significant increase in noise levels, either within a motor vehicle's... significant increase in the energy or fuel necessary to operate a motor vehicle, including but not limited to...

  20. 49 CFR 520.5 - Guidelines for identifying major actions significantly affecting the environment.

    Code of Federal Regulations, 2014 CFR

    2014-10-01

    ... indirectly result in a significant increase in the problem of solid waste, as in the disposal of motor... directly or indirectly result in a significant increase in noise levels, either within a motor vehicle's... significant increase in the energy or fuel necessary to operate a motor vehicle, including but not limited to...

  1. 49 CFR 520.5 - Guidelines for identifying major actions significantly affecting the environment.

    Code of Federal Regulations, 2013 CFR

    2013-10-01

    ... indirectly result in a significant increase in the problem of solid waste, as in the disposal of motor... directly or indirectly result in a significant increase in noise levels, either within a motor vehicle's... significant increase in the energy or fuel necessary to operate a motor vehicle, including but not limited to...

  2. Identifying seismic electric signals upon significant periodic data loss. The case of Japan

    NASA Astrophysics Data System (ADS)

    Varotsos, P.; Skordas, E. S.; Sarlis, N. V.; Lazaridou, M. S.

    2011-12-01

    In many cases of geophysical interest, it happens that for substantial parts of the time of data collection, high noise prevents any attempt to extract a useful signal. Data for such time segments are removed from further analysis. This is the case, for example, in the geoelectrical field measurements at some sites in Japan, where high noise - due mainly to leakage currents from DC-driven trains - prevails during almost 70% of the 24-hour operational time. In particular, the low-noise time occurs from 00:00 to 06:00 and from 22:00 to 24:00 local time (LT), when nearby DC-driven trains cease service, i.e., only about 30% of the 24 h. Thus, the question arises whether it is still possible to identify seismic electric signals [P. Varotsos and K. Alexopoulos, Tectonophysics 110 (1984) 73-98; 99-125] upon removing the noisy data segments lasting from 06:00 to 22:00 every day. We show that even in such a case, the identification of seismic electric signals, which are long-range correlated signals [PA Varotsos, NV Sarlis and ES Skordas, Phys. Rev. E 66 (2002), 011902], may be possible [PA Varotsos, NV Sarlis and ES Skordas, Tectonophysics 503 (2011) 189-194]. The key point is the use of the following two modern methods: the natural time analysis [PA Varotsos, NV Sarlis and ES Skordas, Natural Time Analysis: The New View of Time (2011) Springer-Verlag Berlin-Heidelberg] of the remaining data and the Detrended Fluctuation Analysis (DFA). Our main conclusion is that the distinction between seismic electric signal activities (critical dynamics) and artificial noise remains possible even after periodically removing a significant portion of the data.

  3. A statistical approach to identify, monitor, and manage incomplete curated data sets.

    PubMed

    Howe, Douglas G

    2018-04-02

    Many biological knowledge bases gather data through expert curation of published literature. High data volume, selective partial curation, delays in access, and publication of data prior to the ability to curate it can result in incomplete curation of published data. Knowing which data sets are incomplete, and how incomplete they are, remains a challenge. Awareness that a data set may be incomplete is important for proper interpretation and for avoiding flawed hypothesis generation, and can justify further exploration of published literature for additional relevant data. Computational methods to assess data set completeness are needed. One such method is presented here. In this work, a multivariate linear regression model was used to identify genes in the Zebrafish Information Network (ZFIN) Database having incomplete curated gene expression data sets. Starting with 36,655 gene records from ZFIN, data aggregation, cleansing, and filtering reduced the set to 9870 gene records suitable for training and testing the model to predict the number of expression experiments per gene. Feature engineering and selection identified the following predictive variables: the number of journal publications; the number of journal publications already attributed for gene expression annotation; the percent of journal publications already attributed for expression data; the gene symbol; and the number of transgenic constructs associated with each gene. Twenty-five percent of the gene records (2483 genes) were used to train the model. The remaining 7387 genes were used to test the model. Of the 7387 tested genes, 122 and 165 were identified as missing expression annotations because their residuals fell outside the model's lower or upper 95% confidence interval, respectively. The model had precision of 0.97 and recall of 0.71 at the lower 95% confidence interval and precision of 0.76 and recall of 0.73 at the upper 95% confidence interval. This method can be used to
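
    A minimal sketch of the flagging step, with made-up features standing in for the ZFIN predictors (this illustrates the approach, not the published model):

      import numpy as np
      from sklearn.linear_model import LinearRegression

      rng = np.random.default_rng(0)
      n = 1000
      pubs = rng.poisson(8, n)                      # publications per gene (toy)
      attributed = rng.binomial(pubs, 0.6)          # pubs already curated (toy)
      X = np.column_stack([pubs, attributed])
      y = attributed + rng.normal(0, 1.5, n)        # observed expression experiments

      model = LinearRegression().fit(X, y)
      resid = y - model.predict(X)
      lo, hi = np.quantile(resid, [0.025, 0.975])   # approximate 95% residual band
      flagged = np.where(resid < lo)[0]             # fewer annotations than predicted
      print(f"{flagged.size} genes flagged as possibly incompletely curated")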

  4. Confounding and Statistical Significance of Indirect Effects: Childhood Adversity, Education, Smoking, and Anxious and Depressive Symptomatology.

    PubMed

    Sheikh, Mashhood Ahmed

    2017-01-01

    association between childhood adversity and ADS in adulthood. However, when education was excluded as a mediator-response confounding variable, the indirect effect of childhood adversity on ADS in adulthood was statistically significant ( p < 0.05). This study shows that a careful inclusion of potential confounding variables is important when assessing mediation.

  5. Confounding and Statistical Significance of Indirect Effects: Childhood Adversity, Education, Smoking, and Anxious and Depressive Symptomatology

    PubMed Central

    Sheikh, Mashhood Ahmed

    2017-01-01

    association between childhood adversity and ADS in adulthood. However, when education was excluded as a mediator-response confounding variable, the indirect effect of childhood adversity on ADS in adulthood was statistically significant (p < 0.05). This study shows that a careful inclusion of potential confounding variables is important when assessing mediation. PMID:28824498

  6. Probability of identification: a statistical model for the validation of qualitative botanical identification methods.

    PubMed

    LaBudde, Robert A; Harnly, James M

    2012-01-01

    A qualitative botanical identification method (BIM) is an analytical procedure that returns a binary result (1 = Identified, 0 = Not Identified). A BIM may be used by a buyer, manufacturer, or regulator to determine whether a botanical material being tested is the same as the target (desired) material, or whether it contains excessive nontarget (undesirable) material. This report describes the development of validation studies for a BIM based on the proportion of replicates identified, or probability of identification (POI), as the basic observed statistic. The statistical procedures proposed for data analysis follow closely those for the probability of detection, and harmonize the statistical concepts and parameters between quantitative and qualitative method validation. Use of POI statistics also harmonizes statistical concepts for botanical, microbiological, toxin, and other analyte identification methods that produce binary results. The POI statistical model provides a tool for graphical representation of response curves for qualitative methods, reporting of descriptive statistics, and application of performance requirements. Single-collaborator and multicollaborative study examples are given.
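
    At its simplest, the observed POI is a binomial proportion; a sketch with illustrative numbers (the Wilson interval is one reasonable choice, not necessarily the paper's):

      import numpy as np
      from statsmodels.stats.proportion import proportion_confint

      replicates = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 1])  # 1 = Identified
      poi = replicates.mean()
      lo, hi = proportion_confint(replicates.sum(), replicates.size,
                                  alpha=0.05, method="wilson")
      print(f"POI = {poi:.2f}, 95% Wilson CI = ({lo:.2f}, {hi:.2f})")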

  7. The Misidentified Identifiability Problem of Bayesian Knowledge Tracing

    ERIC Educational Resources Information Center

    Doroudi, Shayan; Brunskill, Emma

    2017-01-01

    In this paper, we investigate two purported problems with Bayesian Knowledge Tracing (BKT), a popular statistical model of student learning: "identifiability" and "semantic model degeneracy." In 2007, Beck and Chang stated that BKT is susceptible to an "identifiability problem"--various models with different…

  8. Identifying tectonic parameters that affect tsunamigenesis

    NASA Astrophysics Data System (ADS)

    van Zelst, I.; Brizzi, S.; Heuret, A.; Funiciello, F.; van Dinther, Y.

    2016-12-01

    The role of tectonics in tsunami generation is at present poorly understood. However, the fact that some regions produce more tsunamis than others indicates that tectonics could influence tsunamigenesis. Here, we complement a global earthquake database that contains geometrical, mechanical, and seismicity parameters of subduction zones with tsunami data. We statistically analyse the database to identify the tectonic parameters that affect tsunamigenesis. The Pearson's product-moment correlation coefficients reveal high positive correlations of 0.65 between, amongst others, the maximum water height of tsunamis and the seismic coupling in a subduction zone. However, these correlations are mainly caused by outliers. The Spearman's rank correlation coefficient results in statistically significant correlations of 0.60 between the number of tsunamis in a subduction zone and subduction velocity (positive correlation) and the sediment thickness at the trench (negative correlation). Interestingly, there is a positive correlation between the latter and tsunami magnitude. These bivariate statistical methods are extended to a binary decision tree (BDT) and multivariate analysis. Using the BDT, the tectonic parameters that distinguish between subduction zones with tsunamigenic and non-tsunamigenic earthquakes are identified. To assess physical causality of the tectonic parameters with regard to tsunamigenesis, we complement our analysis by a numerical study of the most promising parameters using a geodynamic seismic cycle model. We show that the inclusion of sediments on the subducting plate results in an increase in splay fault activity, which could lead to larger vertical seafloor displacements due to their steeper dips and hence a larger tsunamigenic potential. We also show that the splay fault is the preferred rupture path for a strongly velocity strengthening friction regime in the shallow part of the subduction zone, which again increases the tsunamigenic potential.

  9. Risk of Clinically Significant Prostate Cancer Associated With Prostate Imaging Reporting and Data System Category 3 (Equivocal) Lesions Identified on Multiparametric Prostate MRI.

    PubMed

    Sheridan, Alison D; Nath, Sameer K; Syed, Jamil S; Aneja, Sanjay; Sprenkle, Preston C; Weinreb, Jeffrey C; Spektor, Michael

    2018-02-01

    The objective of this study is to determine the frequency of clinically significant cancer (CSC) in Prostate Imaging Reporting and Data System (PI-RADS) category 3 (equivocal) lesions prospectively identified on multiparametric prostate MRI and to identify risk factors (RFs) for CSC that may aid in decision making. Between January 2015 and July 2016, a total of 977 consecutively seen men underwent multiparametric prostate MRI, and 342 underwent MRI-ultrasound (US) fusion targeted biopsy. A total of 474 lesions were retrospectively reviewed, and 111 were scored as PI-RADS category 3 and were visualized using a 3-T MRI scanner. Multiparametric prostate MR images were prospectively interpreted by body subspecialty radiologists trained to use PI-RADS version 2. CSC was defined as a Gleason score of at least 7 on targeted biopsy. A multivariate logistic regression model was constructed to identify the RFs associated with CSC. Of the 111 PI-RADS category 3 lesions, 81 (73.0%) were benign, 11 (9.9%) were clinically insignificant (Gleason score, 6), and 19 (17.1%) were clinically significant. On multivariate analysis, three RFs were identified as significant predictors of CSC: older patient age (odds ratio [OR], 1.13; p = 0.002), smaller prostate volume (OR, 0.94; p = 0.008), and abnormal digital rectal examination (DRE) findings (OR, 3.92; p = 0.03). For PI-RADS category 3 lesions associated with zero, one, two, or three RFs, the risk of CSC was 4%, 16%, 62%, and 100%, respectively. PI-RADS category 3 lesions for which two or more RFs were noted (e.g., age ≥ 70 years, gland size ≤ 36 mL, or abnormal DRE findings) had a CSC detection rate of 67% with a sensitivity of 53%, a specificity of 95%, a positive predictive value of 67%, and a negative predictive value of 91%. Incorporating clinical parameters into risk stratification algorithms may improve the ability to detect clinically significant disease among PI-RADS category 3 lesions and may aid in the decision to

  10. Statistical analyses to support guidelines for marine avian sampling. Final report

    USGS Publications Warehouse

    Kinlan, Brian P.; Zipkin, Elise; O'Connell, Allan F.; Caldow, Chris

    2012-01-01

    Interest in development of offshore renewable energy facilities has led to a need for high-quality, statistically robust information on marine wildlife distributions. A practical approach is described to estimate the amount of sampling effort required to have sufficient statistical power to identify species-specific “hotspots” and “coldspots” of marine bird abundance and occurrence in an offshore environment divided into discrete spatial units (e.g., lease blocks), where “hotspots” and “coldspots” are defined relative to a reference (e.g., regional) mean abundance and/or occurrence probability for each species of interest. For example, a location with average abundance or occurrence three times larger than the mean (3x effect size) could be defined as a “hotspot,” and a location three times smaller than the mean (1/3x effect size) as a “coldspot.” The choice of the effect size used to define hotspots and coldspots will generally depend on a combination of ecological and regulatory considerations. A method is also developed for testing the statistical significance of possible hotspots and coldspots. Both methods are illustrated with historical seabird survey data from the USGS Avian Compendium Database. Our approach consists of five main components: 1. A review of the primary scientific literature on statistical modeling of animal group size and avian count data to develop a candidate set of statistical distributions that have been used or may be useful to model seabird counts. 2. Statistical power curves for one-sample, one-tailed Monte Carlo significance tests of differences of observed small-sample means from a specified reference distribution. These curves show the power to detect "hotspots" or "coldspots" of occurrence and abundance at a range of effect sizes, given assumptions which we discuss. 3. A model selection procedure, based on maximum likelihood fits of models in the candidate set, to determine an appropriate statistical
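
    Component 2 lends itself to a compact illustration. The sketch below is my construction, not the report's code; the negative-binomial reference distribution and all parameter values are assumptions. It estimates the power of a one-sample, one-tailed Monte Carlo test to flag a 3x "hotspot" from a small number of surveys:

      import numpy as np

      rng = np.random.default_rng(0)
      ref_mean, disp, n_surveys, n_mc = 5.0, 0.5, 10, 999

      def nb_draws(mean, disp, size):
          # negative binomial parameterised by mean and dispersion
          return rng.negative_binomial(disp, disp / (disp + mean), size)

      def mc_pvalue(obs_mean):
          # null: the block's surveys are draws from the reference distribution
          null_means = nb_draws(ref_mean, disp, (n_mc, n_surveys)).mean(axis=1)
          return (1 + np.sum(null_means >= obs_mean)) / (n_mc + 1)

      # power to detect a 3x hotspot with n_surveys visits
      hits = sum(mc_pvalue(nb_draws(3 * ref_mean, disp, n_surveys).mean()) < 0.05
                 for _ in range(500))
      print(f"estimated power at 3x effect size: {hits / 500:.2f}")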

  11. Simple Statistics: - Summarized!

    ERIC Educational Resources Information Center

    Blai, Boris, Jr.

    Statistics is an essential tool for making proper judgment decisions. It is concerned with probability distribution models, testing of hypotheses, significance tests and other means of determining the correctness of deductions and the most likely outcome of decisions. Measures of central tendency include the mean, median and mode. A second…

  12. Complication rates of ostomy surgery are high and vary significantly between hospitals.

    PubMed

    Sheetz, Kyle H; Waits, Seth A; Krell, Robert W; Morris, Arden M; Englesbe, Michael J; Mullard, Andrew; Campbell, Darrell A; Hendren, Samantha

    2014-05-01

    Ostomy surgery is common and has traditionally been associated with high rates of morbidity and mortality, suggesting an important target for quality improvement. The purpose of this work was to evaluate the variation in outcomes after ostomy creation surgery within Michigan to identify targets for quality improvement. This was a retrospective cohort study. The study took place within the 34-hospital Michigan Surgical Quality Collaborative. Patients included were those undergoing ostomy creation surgery between 2006 and 2011. We evaluated hospital morbidity and mortality rates after risk adjustment (age, comorbidities, emergency vs elective, and procedure type). A total of 4250 patients underwent ostomy creation surgery; 3866 procedures (91.0%) were open and 384 (9.0%) were laparoscopic. Unadjusted morbidity and mortality rates were 43.9% and 10.7%. Unadjusted morbidity rates for specific procedures ranged from 32.7% for ostomy-creation-only procedures to 47.8% for Hartmann procedures. Risk-adjusted morbidity rates varied significantly between hospitals, ranging from 31.2% (95% CI, 18.4-43.9) to 60.8% (95% CI, 48.9-72.6). There were 5 statistically significant high-outlier hospitals and 3 statistically significant low-outlier hospitals for risk-adjusted morbidity. The pattern of complication types was similar between high- and low-outlier hospitals. Case volume, operative duration, and use of laparoscopic surgery did not explain the variation in morbidity rates across hospitals. This work was limited by its retrospective study design, by unmeasured variation in case severity, and by our inability to differentiate between colostomies and ileostomies because of the use of Current Procedural Terminology codes. Morbidity and mortality rates for modern ostomy surgery are high. Although this type of surgery has received little attention in healthcare policy, these data reveal that it is both common and uncommonly morbid. Variation in hospital performance provides an

  13. [The principal components analysis--method to classify the statistical variables with applications in medicine].

    PubMed

    Dascălu, Cristina Gena; Antohe, Magda Ecaterina

    2009-01-01

    Based on eigenvalue and eigenvector analysis, principal component analysis aims to identify the subspace of principal components within a set of parameters that is sufficient to characterize the whole set. Interpreting the data under analysis as a cloud of points, we find through geometrical transformations the directions along which the cloud's dispersion is maximal--the lines that pass through the cloud's center of gravity and have a maximal density of points around them (by defining an appropriate criterion function and minimizing it). This method can be used successfully to simplify the statistical analysis of questionnaires--because it helps us select from a set of items only the most relevant ones, which cover the variation of the whole data set. For instance, in the presented sample we started from a questionnaire with 28 items and, applying principal component analysis, we identified 7 principal components--or main items--a fact that significantly simplifies the further statistical analysis of the data.
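
    A minimal sketch of this item-reduction use of PCA (synthetic responses; the 28 items and 7 components mirror the example above, nothing else is from the paper):

      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.preprocessing import StandardScaler

      rng = np.random.default_rng(0)
      X = rng.normal(size=(200, 28))        # 200 respondents, 28 questionnaire items
      Xs = StandardScaler().fit_transform(X)

      pca = PCA(n_components=7).fit(Xs)
      print("variance explained:", round(float(pca.explained_variance_ratio_.sum()), 2))
      # the items loading most heavily on each component are the candidates to keep
      top_items = np.abs(pca.components_).argmax(axis=1)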

  14. Identification of reliable gridded reference data for statistical downscaling methods in Alberta

    NASA Astrophysics Data System (ADS)

    Eum, H. I.; Gupta, A.

    2017-12-01

    Climate models provide essential information for assessing the impacts of climate change at regional and global scales, and statistical downscaling methods are applied to prepare climate model data for applications such as hydrologic and ecologic modelling at the watershed scale. Because the reliability and the spatial and temporal resolution of statistically downscaled climate data depend mainly on the reference data, identifying the most reliable reference data is crucial for statistical downscaling. A growing number of gridded climate products are available for the key climate variables that are the main inputs to regional modelling systems. However, inconsistencies in these climate products - for example, different combinations of climate variables, varying data domains and data lengths, and data accuracy varying with the physiographic characteristics of the landscape - have caused significant challenges in selecting the most suitable reference climate data for various environmental studies and modelling. Employing several observation-based daily gridded climate products available in the public domain, i.e., thin-plate-spline regression products (ANUSPLIN and TPS), an inverse distance method (Alberta Townships), a numerical climate model (the North American Regional Reanalysis) and an optimum interpolation technique (the Canadian Precipitation Analysis), this study evaluates the accuracy of the climate products at each grid point by comparison with the Adjusted and Homogenized Canadian Climate Data (AHCCD) observations for precipitation and minimum and maximum temperature over the province of Alberta. Based on the performance of the climate products at the AHCCD stations, we ranked the reliability of these publicly available climate products with respect to station elevations discretized into several classes. According to the rank of the climate products for each elevation class, we identified the most reliable climate products based on the elevation of target points. A web-based system

  15. Algorithm for Identifying Erroneous Rain-Gauge Readings

    NASA Technical Reports Server (NTRS)

    Rickman, Doug

    2005-01-01

    An algorithm analyzes rain-gauge data to identify statistical outliers that could be deemed to be erroneous readings. Heretofore, analyses of this type have been performed in burdensome manual procedures that have involved subjective judgements. Sometimes, the analyses have included computational assistance for detecting values falling outside of arbitrary limits. The analyses have been performed without statistically valid knowledge of the spatial and temporal variations of precipitation within rain events. In contrast, the present algorithm makes it possible to automate such an analysis, makes the analysis objective, takes account of the spatial distribution of rain gauges in conjunction with the statistical nature of spatial variations in rainfall readings, and minimizes the use of arbitrary criteria. The algorithm implements an iterative process that involves nonparametric statistics.
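
    The summary does not spell out NASA's exact procedure; the sketch below shows one plausible nonparametric neighborhood test in the same spirit (the k-nearest-neighbour median/MAD rule and all values are my assumptions):

      import numpy as np
      from scipy.spatial import cKDTree

      def flag_outliers(coords, readings, k=5, thresh=4.0):
          # flag a gauge whose reading sits far outside the median +/- MAD
          # band of its k nearest neighbours
          tree = cKDTree(coords)
          _, idx = tree.query(coords, k=k + 1)   # nearest neighbour is the gauge itself
          flags = np.zeros(len(readings), dtype=bool)
          for i, nbrs in enumerate(idx):
              vals = readings[nbrs[1:]]
              med = np.median(vals)
              mad = np.median(np.abs(vals - med)) or 1e-9
              flags[i] = abs(readings[i] - med) / mad > thresh
          return flags

      rng = np.random.default_rng(0)
      coords = rng.uniform(0, 100, (50, 2))      # gauge locations (toy)
      vals = rng.gamma(2.0, 5.0, 50)             # rainfall readings (toy)
      vals[7] = 300.0                            # an implausible reading
      print(np.where(flag_outliers(coords, vals))[0])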

  16. New insights into old methods for identifying causal rare variants.

    PubMed

    Wang, Haitian; Huang, Chien-Hsun; Lo, Shaw-Hwa; Zheng, Tian; Hu, Inchi

    2011-11-29

    The advance of high-throughput next-generation sequencing technology makes possible the analysis of rare variants. However, the investigation of rare variants in unrelated-individuals data sets faces the challenge of low power, and most methods circumvent the difficulty by using various collapsing procedures based on genes, pathways, or gene clusters. We suggest a new way to identify causal rare variants using the F-statistic and sliced inverse regression. The procedure is tested on the data set provided by the Genetic Analysis Workshop 17 (GAW17). After preliminary data reduction, we ranked markers according to their F-statistic values. Top-ranked markers were then subjected to sliced inverse regression, and those with higher absolute coefficients in the most significant sliced inverse regression direction were selected. The procedure yields good false discovery rates for the GAW17 data and thus is a promising method for future study on rare variants.
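
    A sketch of the first, ranking stage on synthetic genotypes (the sliced-inverse-regression step that follows is omitted here; the data and cutoffs are invented):

      import numpy as np
      from scipy.stats import f_oneway

      rng = np.random.default_rng(0)
      n, m = 400, 1000
      genotypes = rng.binomial(2, 0.02, size=(n, m))   # rare variants, 0/1/2 copies
      trait = rng.normal(size=n)                       # quantitative trait (toy)

      f_stats = np.zeros(m)
      for j in range(m):
          groups = [trait[genotypes[:, j] == g] for g in (0, 1, 2)]
          groups = [g for g in groups if g.size > 1]   # drop empty/singleton groups
          f_stats[j] = f_oneway(*groups).statistic if len(groups) > 1 else 0.0

      top_markers = np.argsort(f_stats)[::-1][:50]     # candidates passed on to SIR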

  17. Fully Bayesian tests of neutrality using genealogical summary statistics.

    PubMed

    Drummond, Alexei J; Suchard, Marc A

    2008-10-31

    Many data summary statistics have been developed to detect departures from neutral expectations of evolutionary models. However, questions about the neutrality of the evolution of genetic loci within natural populations remain difficult to assess. One critical cause of this difficulty is that most methods for testing neutrality make simplifying assumptions simultaneously about the mutational model and the population size model. Consequently, rejecting the null hypothesis of neutrality under these methods could result from violations of either or both assumptions, making interpretation troublesome. Here we harness posterior predictive simulation to exploit summary statistics of both the data and model parameters to test the goodness-of-fit of standard models of evolution. We apply the method to test the selective neutrality of molecular evolution in non-recombining gene genealogies, and we demonstrate the utility of our method on four real data sets, identifying significant departures from neutrality in human influenza A virus, even after controlling for variation in population size. Importantly, by employing a full model-based Bayesian analysis, our method separates the effects of demography from the effects of selection. The method also allows multiple summary statistics to be used in concert, thus potentially increasing sensitivity. Furthermore, our method remains useful in situations where analytical expectations and variances of summary statistics are not available. This aspect has great potential for the analysis of temporally spaced data, an expanding area previously hindered by limited availability of theory and methods.
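
    The posterior predictive machinery generalizes beyond genealogies. Here is a generic sketch of a posterior predictive check with a summary statistic, in which a toy exponential model stands in for the paper's far richer coalescent simulation:

      import numpy as np

      rng = np.random.default_rng(0)
      observed = rng.exponential(1.0, 100)             # stand-in observed data
      stat = np.var                                    # stand-in summary statistic

      # conjugate posterior for the exponential rate under a Gamma(1, 1) prior
      posterior_rates = rng.gamma(observed.size + 1, 1 / (1 + observed.sum()), 1000)

      # simulate a replicate data set per posterior draw and compare statistics
      reps = np.array([stat(rng.exponential(1 / r, observed.size))
                       for r in posterior_rates])
      ppp = np.mean(reps >= stat(observed))            # posterior predictive p-value
      print(f"posterior predictive p-value: {ppp:.2f}")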

  18. A statistical approach to selecting and confirming validation targets in -omics experiments

    PubMed Central

    2012-01-01

    Background Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results. PMID:22738145
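
    The arithmetic behind the approach is a binomial confidence interval on a small random validation sample; a sketch with made-up numbers:

      from statsmodels.stats.proportion import proportion_confint

      list_size = 800          # significant results in the full list
      sampled = 30             # randomly chosen for manual confirmation
      confirmed = 27           # of those, confirmed in the lab

      lo, hi = proportion_confint(confirmed, sampled, alpha=0.05, method="wilson")
      print(f"estimated confirmation rate: {confirmed / sampled:.2f} "
            f"(95% CI {lo:.2f}-{hi:.2f}) for all {list_size} results")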

  19. Joint source based morphometry identifies linked gray and white matter group differences.

    PubMed

    Xu, Lai; Pearlson, Godfrey; Calhoun, Vince D

    2009-02-01

    We present a multivariate approach called joint source based morphometry (jSBM) to identify linked gray and white matter regions that differ between groups. In jSBM, joint independent component analysis (jICA) is used to decompose preprocessed gray and white matter images into joint sources, and statistical analysis is used to determine the significant joint sources showing group differences and their relationship to other variables of interest (e.g., age or sex). The identified joint sources are groupings of linked gray and white matter regions with common covariation among subjects. In this study, we first provide a simulation to validate the jSBM approach. To illustrate our method on real data, jSBM is then applied to structural magnetic resonance imaging (sMRI) data obtained from 120 chronic schizophrenia patients and 120 healthy controls to identify group differences. jSBM identified four joint sources as significantly associated with schizophrenia. Linked gray-white matter regions identified in each of the joint sources included: 1) temporal--corpus callosum, 2) occipital/frontal--inferior fronto-occipital fasciculus, 3) frontal/parietal/occipital/temporal--superior longitudinal fasciculus and 4) parietal/frontal--thalamus. Age effects on all four joint sources were significant, but sex effects were significant only for the third joint source. Our findings demonstrate that jSBM can exploit the natural linkage between gray and white matter by incorporating them into a unified framework. This approach is applicable to a wide variety of problems to study linked gray and white matter group differences.
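
    A very rough sketch of the jICA step, using scikit-learn's FastICA as a stand-in for the toolbox the authors actually used, with random arrays standing in for preprocessed gray- and white-matter images:

      import numpy as np
      from sklearn.decomposition import FastICA

      rng = np.random.default_rng(0)
      n_subj, n_gm, n_wm = 240, 2000, 2000
      gm = rng.normal(size=(n_subj, n_gm))   # gray-matter features per subject
      wm = rng.normal(size=(n_subj, n_wm))   # white-matter features per subject

      joint = np.hstack([gm, wm])            # subjects x (GM + WM) features
      ica = FastICA(n_components=4, random_state=0, max_iter=500)
      loadings = ica.fit_transform(joint)    # per-subject loading on each joint source
      maps = ica.components_                 # each row links a GM and a WM pattern
      gm_maps, wm_maps = maps[:, :n_gm], maps[:, n_gm:]
      # group differences are then tested on the subject loadings (e.g., a t-test)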

  20. Joint source based morphometry identifies linked gray and white matter group differences

    PubMed Central

    Xu, Lai; Pearlson, Godfrey; Calhoun, Vince D.

    2009-01-01

    We present a multivariate approach called joint source based morphometry (jSBM) to identify linked gray and white matter regions that differ between groups. In jSBM, joint independent component analysis (jICA) is used to decompose preprocessed gray and white matter images into joint sources, and statistical analysis is used to determine the significant joint sources showing group differences and their relationship to other variables of interest (e.g., age or sex). The identified joint sources are groupings of linked gray and white matter regions with common covariation among subjects. In this study, we first provide a simulation to validate the jSBM approach. To illustrate our method on real data, jSBM is then applied to structural magnetic resonance imaging (sMRI) data obtained from 120 chronic schizophrenia patients and 120 healthy controls to identify group differences. jSBM identified four joint sources as significantly associated with schizophrenia. Linked gray–white matter regions identified in each of the joint sources included: 1) temporal — corpus callosum, 2) occipital/frontal — inferior fronto-occipital fasciculus, 3) frontal/parietal/occipital/temporal — superior longitudinal fasciculus and 4) parietal/frontal — thalamus. Age effects on all four joint sources were significant, but sex effects were significant only for the third joint source. Our findings demonstrate that jSBM can exploit the natural linkage between gray and white matter by incorporating them into a unified framework. This approach is applicable to a wide variety of problems to study linked gray and white matter group differences. PMID:18992825

  1. Predicting Success in Psychological Statistics Courses.

    PubMed

    Lester, David

    2016-06-01

    Many students perform poorly in courses on psychological statistics, and it is useful to be able to predict which students will have difficulties. In a study of 93 undergraduates enrolled in Statistical Methods (18 men, 75 women; M age = 22.0 years, SD = 5.1), performance was significantly associated with sex (female students performed better) and with proficiency in algebra in a linear regression analysis. Anxiety about statistics was not associated with course performance, indicating that basic mathematical skills are the best correlate of performance in statistics courses and can be used to stream students into classes by ability. © The Author(s) 2016.

  2. A Statistics Curriculum for the Undergraduate Chemistry Major

    ERIC Educational Resources Information Center

    Schlotter, Nicholas E.

    2013-01-01

    Our ability to statistically analyze data has grown significantly with the maturing of computer hardware and software. However, the evolution of our statistics capabilities has taken place without a corresponding evolution in the curriculum for the undergraduate chemistry major. Most faculty understand the need for a statistical educational…

  3. Seeing Prehistory through New Lenses: Using Geophysical and Statistical Analysis to Identify Fresh Perspectives of a 15th Century Mandan Occupation

    NASA Astrophysics Data System (ADS)

    Mitchum, Amber Marie

    Great Plains prehistoric research has evolved over the course of a century, with many sites like Huff Village (32MO11) in North Dakota recently coming back to the forefront of discussion through new technological applications. Through most of its studies and excavations, Huff Village was interpreted as the final stage of the Middle Missouri tradition. Although the site was long thought to contain only the systematically placed long-rectangular structure types of its Middle Missouri predecessors, recent magnetic gradiometry and topographic mapping data revealed circular structure types that deviated from long-held traditions, highlighting new associations with Coalescent groups. A compact system for food storage was also discovered, with more than 1,500 storage pits visible inside and outside of all structures delineated. Archaeological applications of these new technologies have provided a near-complete picture of this 15th century Mandan expression, allowing new questions to be raised about its previous taxonomic placement. Using a combination of GIS and statistical analysis, an attempt is made to examine quantitatively whether it truly represented the Terminal Middle Missouri variant, or whether Huff diverged in new directions. Statistical analysis disagrees with previous conclusions that a patterned layout of structures existed: point pattern analysis and Ripley’s K function show significant clustering among structures. Clustering of external storage pits also emerged from similar analysis, highlighting a connection between external storage features and the structures they surround. A combination of documented defensive features, a much higher estimate of the caloric support for the population present, and a short occupation leads us to believe that a significant transition was occurring that incorporated attributes of both the Middle Missouri tradition and the Coalescent tradition. With more refined taxonomies currently developing, it is hoped that these data will help
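
    For readers unfamiliar with the test, a bare-bones Ripley's K comparison against a complete-spatial-randomness (CSR) envelope looks like the sketch below (synthetic coordinates, naive estimator with no edge correction; not the thesis's analysis):

      import numpy as np
      from scipy.spatial.distance import pdist

      rng = np.random.default_rng(0)

      def ripley_k(pts, r, area):
          # naive estimator: area * (pairs within r) * 2 / (n * (n - 1))
          n = len(pts)
          return area * 2 * np.sum(pdist(pts) < r) / (n * (n - 1))

      pts = rng.uniform(0, 100, (60, 2))           # structure centres (toy)
      r, area = 15.0, 100.0 * 100.0
      k_obs = ripley_k(pts, r, area)
      k_csr = [ripley_k(rng.uniform(0, 100, (60, 2)), r, area) for _ in range(199)]
      print("clustered" if k_obs > np.quantile(k_csr, 0.975) else "consistent with CSR")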

  4. Beyond imperviousness: A statistical approach to identifying functional differences between development morphologies on variable source area-type response in urbanized watersheds

    NASA Astrophysics Data System (ADS)

    Lim, T. C.

    2016-12-01

    Empirical evidence has shown linkages between urbanization, hydrological regime change, and degradation of water quality and aquatic habitat. Percent imperviousness has long been suggested as the dominant source of these negative changes. However, recent research identifying alternative pathways of runoff production at the watershed scale has called into question percent impervious surface area's primacy in urban runoff production compared to other aspects of urbanization, including change in vegetative cover, imported water and water leakages, and the presence of drainage infrastructure. In this research I show how a robust statistical methodology can detect evidence of variable source area (VSA)-type hydrologic response associated with incremental hydraulic connectivity in watersheds. I then use logistic regression to explore how evidence of VSA-type response relates to the physical and meteorological characteristics of the watershed. I find that impervious surface area is highly correlated with development, but does not add significant explanatory power beyond percent developed in predicting VSA-type response. Other aspects of development morphology, including percent developed open space and type of drainage infrastructure, also do not add to the explanatory power of undeveloped land in predicting VSA-type response. Within only developed areas, the effect of developed open space was found to be more similar to that of total impervious area than to undeveloped land. These findings were consistent when tested across a national cross-section of urbanized watersheds, a higher-resolution dataset of Baltimore Metropolitan Area watersheds, and a subsample of watersheds confirmed not to be served by combined sewer systems. These findings suggest that land development policies focused on lot coverage should be revisited, and that more focus should be placed on preserving native vegetation and soil conditions alongside development.
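
    The nested-model comparison described here can be sketched as a likelihood-ratio test between logistic regressions. Everything below is synthetic and the variable names are hypothetical:

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(0)
      n = 300
      developed = rng.uniform(0, 100, n)                 # percent developed
      impervious = developed * rng.uniform(0.3, 0.6, n)  # tightly correlated with it
      vsa = rng.binomial(1, 1 / (1 + np.exp(0.05 * (developed - 50))))

      base = sm.Logit(vsa, sm.add_constant(developed)).fit(disp=0)
      full = sm.Logit(vsa, sm.add_constant(
          np.column_stack([developed, impervious]))).fit(disp=0)
      lr = 2 * (full.llf - base.llf)   # does imperviousness add explanatory power?
      print(f"likelihood-ratio statistic: {lr:.2f} on 1 df")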

  5. Accuracy Rates of Ancestry Estimation by Forensic Anthropologists Using Identified Forensic Cases.

    PubMed

    Thomas, Richard M; Parks, Connie L; Richard, Adam H

    2017-07-01

    A common task in forensic anthropology involves the estimation of the ancestry of a decedent by comparing their skeletal morphology and measurements to skeletons of individuals from known geographic groups. However, the accuracy rates of ancestry estimation methods in actual forensic casework have rarely been studied. This article uses 99 forensic cases with identified skeletal remains to develop accuracy rates for ancestry estimations conducted by forensic anthropologists. The overall rate of correct ancestry estimation from these cases is 90.9%, which is comparable to most research-derived rates and those reported by individual practitioners. Statistical tests showed no significant difference in accuracy rates depending on examiner education level or on the estimated or identified ancestry. More recent cases showed a significantly higher accuracy rate. The incorporation of metric analyses into the ancestry estimate in these cases led to a higher accuracy rate. © 2017 American Academy of Forensic Sciences.

  6. a Statistical Theory of the Epilepsies.

    NASA Astrophysics Data System (ADS)

    Thomas, Kuryan

    1988-12-01

    A new physical and mathematical model for the epilepsies is proposed, based on the theory of bond percolation on finite lattices. Within this model, the onset of seizures in the brain is identified with the appearance of spanning clusters of neurons engaged in the spurious and uncontrollable electrical activity characteristic of seizures. It is proposed that the fraction of excitatory to inhibitory synapses can be identified with a bond probability, and that the bond probability is a randomly varying quantity displaying Gaussian statistics. The consequences of the proposed model for the treatment of the epilepsies are explored. The nature of the data on the epilepsies which can be acquired in a clinical setting is described. It is shown that such data can be analyzed to provide preliminary support for the bond percolation hypothesis, and to quantify the efficacy of anti-epileptic drugs in a treatment program. The results of a battery of statistical tests on seizure distributions are discussed. The physical theory of the electroencephalogram (EEG) is described, and extant models of the electrical activity measured by the EEG are discussed, with an emphasis on their physical behavior. A proposal is made to explain the difference between the power spectra of electrical activity measured with cranial probes and with the EEG. Statistical tests on the characteristic EEG manifestations of epileptic activity are conducted, and their results described. Computer simulations of a correlated bond percolating system are constructed. It is shown that the statistical properties of the results of such a simulation are strongly suggestive of the statistical properties of clinical data. The study finds no contradictions between the predictions of the bond percolation model and the observed properties of the available data. Suggestions are made for further research and for techniques based on the proposed model which may be used for tuning the effects of anti-epileptic drugs.
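
    The core of the model, a spanning cluster appearing when the bond probability fluctuates around the percolation threshold, can be sketched in a few lines. The lattice size, noise level, and Gaussian parameters below are illustrative choices, not values from the dissertation.

    ```python
    import numpy as np

    def spans(L, p, rng):
        """Bond percolation on an L x L lattice: open each bond with
        probability p and report whether an open cluster spans top to
        bottom (union-find with path halving)."""
        parent = list(range(L * L))

        def find(i):
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        for r in range(L):
            for c in range(L):
                i = r * L + c
                if c + 1 < L and rng.random() < p:      # horizontal bond
                    parent[find(i)] = find(i + 1)
                if r + 1 < L and rng.random() < p:      # vertical bond
                    parent[find(i)] = find(i + L)
        top = {find(c) for c in range(L)}
        return any(find((L - 1) * L + c) in top for c in range(L))

    # Bond probability varies as a Gaussian around the 2-D bond-percolation
    # threshold (0.5), mimicking a randomly fluctuating excitatory/inhibitory
    # balance; spanning episodes play the role of seizures.
    rng = np.random.default_rng(1)
    ps = np.clip(rng.normal(0.5, 0.05, size=200), 0, 1)
    rate = np.mean([spans(30, p, rng) for p in ps])
    print(f"fraction of spanning ('seizure') episodes: {rate:.2f}")
    ```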

  7. A Tablet-PC Software Application for Statistics Classes

    ERIC Educational Resources Information Center

    Probst, Alexandre C.

    2014-01-01

    A significant deficiency in the area of introductory statistics education exists: Student performance on standardized assessments after a full semester statistics course is poor and students report a very low desire to learn statistics. Research on the current generation of students indicates an affinity for technology and for multitasking.…

  8. Clinical progress of human papillomavirus genotypes and their persistent infection in subjects with atypical squamous cells of undetermined significance cytology: Statistical and latent Dirichlet allocation analysis

    PubMed Central

    Kim, Yee Suk; Lee, Sungin; Zong, Nansu; Kahng, Jimin

    2017-01-01

    The present study aimed to investigate differences in prognosis based on human papillomavirus (HPV) infection, persistent infection and genotype variations for patients exhibiting atypical squamous cells of undetermined significance (ASCUS) in their initial Papanicolaou (PAP) test results. A latent Dirichlet allocation (LDA)-based tool was developed that may offer a facilitated means of communication to be employed during patient-doctor consultations. The present study assessed 491 patients (139 HPV-positive and 352 HPV-negative cases) with a PAP test result of ASCUS with a follow-up period ≥2 years. Patients underwent PAP and HPV DNA chip tests between January 2006 and January 2009. The HPV-positive subjects were followed up with at least 2 instances of PAP and HPV DNA chip tests. The most common genotypes observed were HPV-16 (25.9%, 36/139), HPV-52 (14.4%, 20/139), HPV-58 (13.7%, 19/139), HPV-56 (11.5%, 16/139), HPV-51 (9.4%, 13/139) and HPV-18 (8.6%, 12/139). A total of 33.3% (12/36) of patients positive for HPV-16 had cervical intraepithelial neoplasia (CIN)2 or a worse result, which was significantly higher than the prevalence of CIN2 of 1.8% (8/455) in patients negative for HPV-16 (P<0.001), while no significant association was identified for other genotypes in terms of genotype and clinical progress. There was a significant association between clearance and good prognosis (P<0.001). Persistent infection was higher in patients aged ≥51 years (38.7%) than in those aged ≤50 years (20.4%; P=0.036). Progression from persistent infection to CIN2 or worse (19/34, 55.9%) was higher than from clearance (0/105, 0.0%; P<0.001). In the LDA analysis, using symmetric Dirichlet priors α=0.1 and β=0.01, and clusters (k)=5 or 10 provided the most meaningful groupings. Statistical and LDA analyses produced consistent results regarding the association between persistent infection of HPV-16, old age and long infection period with a clinical progression of CIN2 or worse
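
    The abstract reports LDA with symmetric Dirichlet priors α=0.1 and β=0.01 and k=5 or 10. A minimal sketch with scikit-learn's LatentDirichletAllocation is shown below; the patient-by-feature count matrix is a synthetic stand-in, since the study's exact encoding of genotypes and follow-up events is not given here.

    ```python
    import numpy as np
    from sklearn.decomposition import LatentDirichletAllocation

    # Hypothetical patient-by-feature count matrix (139 HPV-positive patients,
    # 20 coded features such as genotype hits and follow-up events).
    rng = np.random.default_rng(0)
    X = rng.poisson(1.0, size=(139, 20))

    # Symmetric priors alpha=0.1 (doc-topic) and beta=0.01 (topic-word),
    # k=5 clusters, matching the settings reported in the abstract.
    lda = LatentDirichletAllocation(n_components=5, doc_topic_prior=0.1,
                                    topic_word_prior=0.01, random_state=0)
    theta = lda.fit_transform(X)            # per-patient cluster memberships
    print(theta.argmax(axis=1)[:10])        # dominant cluster, first 10 patients
    ```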

  9. Cutpoints for Low Appendicular Lean Mass That Identify Older Adults With Clinically Significant Weakness

    PubMed Central

    Peters, Katherine W.; Shardell, Michelle D.; McLean, Robert R.; Dam, Thuy-Tien L.; Kenny, Anne M.; Fragala, Maren S.; Harris, Tamara B.; Kiel, Douglas P.; Guralnik, Jack M.; Ferrucci, Luigi; Kritchevsky, Stephen B.; Vassileva, Maria T.; Studenski, Stephanie A.; Alley, Dawn E.

    2014-01-01

    Background. Low lean mass is potentially clinically important in older persons, but criteria have not been empirically validated. As part of the FNIH (Foundation for the National Institutes of Health) Sarcopenia Project, this analysis sought to identify cutpoints in lean mass by dual-energy x-ray absorptiometry that discriminate the presence or absence of weakness (defined in a previous report in the series as grip strength <26kg in men and <16kg in women). Methods. In pooled cross-sectional data stratified by sex (7,582 men and 3,688 women), classification and regression tree (CART) analysis was used to derive cutpoints for appendicular lean body mass (ALM) that best discriminated the presence or absence of weakness. Mixed-effects logistic regression was used to quantify the strength of the association between lean mass category and weakness. Results. In primary analyses, CART models identified cutpoints for low lean mass (ALM <19.75kg in men and <15.02kg in women). Sensitivity analyses using ALM divided by body mass index (BMI: ALMBMI) identified a secondary definition (ALMBMI <0.789 in men and ALMBMI <0.512 in women). As expected, after accounting for study and age, low lean mass (compared with higher lean mass) was associated with weakness by both the primary (men, odds ratio [OR]: 6.9 [95% CI: 5.4, 8.9]; women, OR: 3.6 [95% CI: 2.9, 4.3]) and secondary definitions (men, OR: 4.3 [95% CI: 3.4, 5.5]; women, OR: 2.2 [95% CI: 1.8, 2.8]). Conclusions. ALM cutpoints derived from a large, diverse sample of older adults identified lean mass thresholds below which older adults had a higher likelihood of weakness. PMID:24737559
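
    Deriving a single discriminating cutpoint with CART amounts to fitting a depth-1 tree. The sketch below uses scikit-learn on synthetic data (the FNIH pooled data are not reproduced here), planting the men's ALM cutpoint of 19.75 kg to show that the stump recovers it.

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    # Synthetic ALM (kg) and weakness indicator; weakness probability rises
    # steeply below the planted threshold of 19.75 kg.
    rng = np.random.default_rng(0)
    alm = rng.normal(22, 4, size=2000)
    weak = (rng.random(2000) < 1 / (1 + np.exp(alm - 19.75))).astype(int)

    # A depth-1 CART tree ("stump") finds the single ALM split that best
    # discriminates the presence or absence of weakness.
    stump = DecisionTreeClassifier(max_depth=1).fit(alm.reshape(-1, 1), weak)
    print(f"derived cutpoint: ALM < {stump.tree_.threshold[0]:.2f} kg")
    ```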

  10. Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining

    PubMed Central

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such a way that the significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data

  11. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    PubMed

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from the biological datasets. At first, a novel statistical strategy has been utilized to eliminate the insignificant/low-significant/redundant genes in such a way that the significance level must satisfy the data distribution property (viz., either normal distribution or non-normal distribution). The data is then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than the other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time, and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to know how much the evolved rules are able to describe accurately the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors with other rule-based classifiers. Statistical significance tests are also performed for verifying the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers is also starting with the same post-discretized data

  12. Identifiability of PBPK Models with Applications to Dimethylarsinic Acid Exposure

    EPA Science Inventory

    Any statistical model should be identifiable in order for estimates and tests using it to be meaningful. We consider statistical analysis of physiologically-based pharmacokinetic (PBPK) models in which parameters cannot be estimated precisely from available data, and discuss diff...

  13. The Relationship Between Procrastination, Learning Strategies and Statistics Anxiety Among Iranian College Students: A Canonical Correlation Analysis

    PubMed Central

    Vahedi, Shahrum; Farrokhi, Farahman; Gahramani, Farahnaz; Issazadegan, Ali

    2012-01-01

    Objective: Approximately 66-80% of graduate students experience statistics anxiety and some researchers propose that many students identify statistics courses as the most anxiety-inducing courses in their academic curriculums. As such, it is likely that statistics anxiety is, in part, responsible for many students delaying enrollment in these courses for as long as possible. This paper proposes a canonical model by treating academic procrastination (AP) and learning strategies (LS) as predictor variables and statistics anxiety (SA) as the explained variable. Methods: A questionnaire survey was used for data collection and 246 female college students participated in this study. To examine the mutually independent relations between procrastination, learning strategies and statistics anxiety variables, a canonical correlation analysis was computed. Results: Findings show that two canonical functions were statistically significant. The set of variables (metacognitive self-regulation, source management, preparing homework, preparing for tests and preparing term papers) helped predict changes in statistics anxiety with respect to fearful behavior, attitude towards math and class, and performance, but not anxiety. Conclusion: These findings could be used in educational and psychological interventions in the context of statistics anxiety reduction. PMID:24644468
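
    A canonical correlation analysis of this kind can be sketched with scikit-learn's CCA. The score matrices below are synthetic placeholders for the questionnaire scales (predictor set: procrastination and learning-strategy scores; explained set: statistics-anxiety subscales).

    ```python
    import numpy as np
    from sklearn.cross_decomposition import CCA

    # Hypothetical standardized scale scores for 246 students.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(246, 5))        # e.g. metacognitive self-regulation, ...
    Y = rng.normal(size=(246, 4))        # e.g. fearful behavior, attitude, ...
    Y[:, 0] += 0.5 * X[:, 0]             # plant one cross-set association

    cca = CCA(n_components=2).fit(X, Y)
    U, V = cca.transform(X, Y)
    # Canonical correlations: correlation between paired canonical variates
    r = [np.corrcoef(U[:, i], V[:, i])[0, 1] for i in range(2)]
    print(np.round(r, 3))
    ```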

  14. Across-cohort QC analyses of GWAS summary statistics from complex traits.

    PubMed

    Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M

    2016-01-01

    Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy.
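
    Step (II), recovering ancestry structure from reported allele frequencies, is essentially a PCA of the cohort-by-SNP frequency matrix. The sketch below uses synthetic frequencies rather than GIANT Consortium data; outlying cohorts would stand apart in the leading components.

    ```python
    import numpy as np
    from sklearn.decomposition import PCA

    # Hypothetical cohort-by-SNP matrix of reported allele frequencies
    # (40 cohorts, 5,000 SNPs); one cohort is shifted to act as an outlier.
    rng = np.random.default_rng(0)
    freqs = np.clip(rng.normal(0.3, 0.05, size=(40, 5000)), 0.01, 0.99)
    freqs[0] += 0.05                       # planted outlier cohort

    # Centre per SNP and project cohorts onto the leading components.
    pcs = PCA(n_components=2).fit_transform(freqs - freqs.mean(axis=0))
    print(np.round(pcs[:3], 2))            # cohort 0 separates on PC1/PC2
    ```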

  15. Across-cohort QC analyses of GWAS summary statistics from complex traits

    PubMed Central

    Chen, Guo-Bo; Lee, Sang Hong; Robinson, Matthew R; Trzaskowski, Maciej; Zhu, Zhi-Xiang; Winkler, Thomas W; Day, Felix R; Croteau-Chonka, Damien C; Wood, Andrew R; Locke, Adam E; Kutalik, Zoltán; Loos, Ruth J F; Frayling, Timothy M; Hirschhorn, Joel N; Yang, Jian; Wray, Naomi R; Visscher, Peter M

    2017-01-01

    Genome-wide association studies (GWASs) have been successful in discovering SNP trait associations for many quantitative traits and common diseases. Typically, the effect sizes of SNP alleles are very small and this requires large genome-wide association meta-analyses (GWAMAs) to maximize statistical power. A trend towards ever-larger GWAMA is likely to continue, yet dealing with summary statistics from hundreds of cohorts increases logistical and quality control problems, including unknown sample overlap, and these can lead to both false positive and false negative findings. In this study, we propose four metrics and visualization tools for GWAMA, using summary statistics from cohort-level GWASs. We propose methods to examine the concordance between demographic information, and summary statistics and methods to investigate sample overlap. (I) We use the population genetics Fst statistic to verify the genetic origin of each cohort and their geographic location, and demonstrate using GWAMA data from the GIANT Consortium that geographic locations of cohorts can be recovered and outlier cohorts can be detected. (II) We conduct principal component analysis based on reported allele frequencies, and are able to recover the ancestral information for each cohort. (III) We propose a new statistic that uses the reported allelic effect sizes and their standard errors to identify significant sample overlap or heterogeneity between pairs of cohorts. (IV) To quantify unknown sample overlap across all pairs of cohorts, we propose a method that uses randomly generated genetic predictors that does not require the sharing of individual-level genotype data and does not breach individual privacy. PMID:27552965

  16. Evaluation of Solid Rocket Motor Component Data Using a Commercially Available Statistical Software Package

    NASA Technical Reports Server (NTRS)

    Stefanski, Philip L.

    2015-01-01

    Commercially available software packages today allow users to quickly perform the routine evaluations of (1) descriptive statistics to numerically and graphically summarize both sample and population data, (2) inferential statistics that draws conclusions about a given population from samples taken of it, (3) probability determinations that can be used to generate estimates of reliability allowables, and finally (4) the setup of designed experiments and analysis of their data to identify significant material and process characteristics for application in both product manufacturing and performance enhancement. This paper presents examples of analysis and experimental design work that has been conducted using Statgraphics® statistical software to obtain useful information with regard to solid rocket motor propellants and internal insulation material. Data were obtained from a number of programs (Shuttle, Constellation, and Space Launch System) and sources that include solid propellant burn rate strands, tensile specimens, sub-scale test motors, full-scale operational motors, rubber insulation specimens, and sub-scale rubber insulation analog samples. Besides facilitating the experimental design process to yield meaningful results, statistical software has demonstrated its ability to quickly perform complex data analyses and yield significant findings that might otherwise have gone unnoticed. One caveat to these successes is that useful results not only derive from the inherent power of the software package, but also from the skill and understanding of the data analyst.

  17. An operational definition of a statistically meaningful trend.

    PubMed

    Bryhn, Andreas C; Dimberg, Peter H

    2011-04-28

    Linear trend analysis of time series is standard procedure in many scientific disciplines. If the number of data is large, a trend may be statistically significant even if data are scattered far from the trend line. This study introduces and tests a quality criterion for time trends referred to as statistical meaningfulness, which is a stricter quality criterion for trends than high statistical significance. The time series is divided into intervals and interval mean values are calculated. Thereafter, r(2) and p values are calculated from regressions concerning time and interval mean values. If r(2) ≥ 0.65 at p ≤ 0.05 in any of these regressions, then the trend is regarded as statistically meaningful. Out of ten investigated time series from different scientific disciplines, five displayed statistically meaningful trends. A Microsoft Excel application (add-in) was developed which can perform statistical meaningfulness tests and which may increase the operationality of the test. The presented method for distinguishing statistically meaningful trends should be reasonably uncomplicated for researchers with basic statistics skills and may thus be useful for determining which trends are worth analysing further, for instance with respect to causal factors. The method can also be used for determining which segments of a time trend may be particularly worthwhile to focus on.
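
    The test is simple enough to sketch directly. The version below uses a single division into equal-count intervals (the paper examines several divisions and accepts a trend if any regression qualifies), so treat it as a simplified illustration rather than a reimplementation of the Excel add-in.

    ```python
    import numpy as np
    from scipy.stats import linregress

    def statistically_meaningful(t, y, n_intervals=5):
        """Split the series into intervals, regress interval means on interval
        mean times, and require r^2 >= 0.65 at p <= 0.05."""
        chunks = np.array_split(np.argsort(t), n_intervals)
        tm = np.array([t[idx].mean() for idx in chunks])
        ym = np.array([y[idx].mean() for idx in chunks])
        res = linregress(tm, ym)
        return res.rvalue ** 2 >= 0.65 and res.pvalue <= 0.05, res

    # Noisy but steadily rising synthetic series
    rng = np.random.default_rng(0)
    t = np.arange(100.0)
    y = 0.05 * t + rng.normal(0, 1, 100)
    ok, res = statistically_meaningful(t, y)
    print(ok, f"r2 = {res.rvalue**2:.2f}, p = {res.pvalue:.3g}")
    ```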

  18. Mutual interference between statistical summary perception and statistical learning.

    PubMed

    Zhao, Jiaying; Ngo, Nhi; McKendrick, Ryan; Turk-Browne, Nicholas B

    2011-09-01

    The visual system is an efficient statistician, extracting statistical summaries over sets of objects (statistical summary perception) and statistical regularities among individual objects (statistical learning). Although these two kinds of statistical processing have been studied extensively in isolation, their relationship is not yet understood. We first examined how statistical summary perception influences statistical learning by manipulating the task that participants performed over sets of objects containing statistical regularities (Experiment 1). Participants who performed a summary task showed no statistical learning of the regularities, whereas those who performed control tasks showed robust learning. We then examined how statistical learning influences statistical summary perception by manipulating whether the sets being summarized contained regularities (Experiment 2) and whether such regularities had already been learned (Experiment 3). The accuracy of summary judgments improved when regularities were removed and when learning had occurred in advance. In sum, calculating summary statistics impeded statistical learning, and extracting statistical regularities impeded statistical summary perception. This mutual interference suggests that statistical summary perception and statistical learning are fundamentally related.

  19. Unsupervised Scalable Statistical Method for Identifying Influential Users in Online Social Networks.

    PubMed

    Azcorra, A; Chiroque, L F; Cuevas, R; Fernández Anta, A; Laniado, H; Lillo, R E; Romo, J; Sguera, C

    2018-05-03

    Billions of users interact intensively every day via Online Social Networks (OSNs) such as Facebook, Twitter, or Google+. This makes OSNs an invaluable source of information, and channel of actuation, for sectors like advertising, marketing, or politics. To get the most of OSNs, analysts need to identify influential users that can be leveraged for promoting products, distributing messages, or improving the image of companies. In this report we propose a new unsupervised method, Massive Unsupervised Outlier Detection (MUOD), based on outlier detection, for providing support in the identification of influential users. MUOD is scalable, and can hence be used in large OSNs. Moreover, it labels the outliers as of shape, magnitude, or amplitude, depending on their features. This allows classifying the outlier users into multiple different classes, which are likely to include different types of influential users. Applying MUOD to a subset of roughly 400 million Google+ users has allowed identifying and discriminating automatically sets of outlier users, which present features associated with different definitions of influential users, like capacity to attract engagement, capacity to attract a large number of followers, or high infection capacity.

  20. Spectral and cross-spectral analysis of uneven time series with the smoothed Lomb-Scargle periodogram and Monte Carlo evaluation of statistical significance

    NASA Astrophysics Data System (ADS)

    Pardo-Igúzquiza, Eulogio; Rodríguez-Tovar, Francisco J.

    2012-12-01

    Many spectral analysis techniques have been designed assuming sequences taken with a constant sampling interval. However, there are empirical time series in the geosciences (sediment cores, fossil abundance data, isotope analysis, …) that do not follow regular sampling because of missing data, gapped data, random sampling or incomplete sequences, among other reasons. In general, interpolating an uneven series in order to obtain a succession with a constant sampling interval alters the spectral content of the series. In such cases it is preferable to follow an approach that works with the uneven data directly, avoiding the need for an explicit interpolation step. The Lomb-Scargle periodogram is a popular choice in such circumstances, as there are programs available in the public domain for its computation. One new computer program for spectral analysis improves the standard Lomb-Scargle periodogram approach in two ways: (1) It explicitly adjusts the statistical significance to any bias introduced by variance reduction smoothing, and (2) it uses a permutation test to evaluate confidence levels, which is better suited than parametric methods when neighbouring frequencies are highly correlated. Another novel program for cross-spectral analysis offers the advantage of estimating the Lomb-Scargle cross-periodogram of two uneven time series defined on the same interval, and it evaluates the confidence levels of the estimated cross-spectra by a non-parametric computer intensive permutation test. Thus, the cross-spectrum, the squared coherence spectrum, the phase spectrum, and the Monte Carlo statistical significance of the cross-spectrum and the squared-coherence spectrum can be obtained. Both of the programs are written in ANSI Fortran 77, in view of its simplicity and compatibility. The program code is in the public domain, provided on the website of the journal (http://www.iamg.org/index.php/publisher/articleview/frmArticleID/112/). Different examples (with simulated and
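
    The permutation approach to significance can be sketched with scipy's Lomb-Scargle routine: shuffle the observed values over the same uneven sampling times and ask how often the largest permuted peak matches or exceeds the observed one. This illustrates the idea only; the published programs additionally handle smoothing bias and cross-spectra.

    ```python
    import numpy as np
    from scipy.signal import lombscargle

    # Unevenly sampled synthetic series with a period of about 20 time units
    rng = np.random.default_rng(0)
    t = np.sort(rng.uniform(0, 200, 80))
    y = np.sin(2 * np.pi * t / 20) + rng.normal(0, 0.8, t.size)

    w = 2 * np.pi * np.linspace(0.01, 0.2, 200)    # angular test frequencies
    pgram = lombscargle(t, y - y.mean(), w)

    # Permutation test: destroy any temporal structure, keep the sampling.
    null_max = [lombscargle(t, rng.permutation(y) - y.mean(), w).max()
                for _ in range(999)]
    p = (1 + np.sum(np.array(null_max) >= pgram.max())) / 1000
    print(f"peak period = {2 * np.pi / w[pgram.argmax()]:.1f}, p = {p:.3f}")
    ```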

  1. MetaboLyzer: A Novel Statistical Workflow for Analyzing Post-Processed LC/MS Metabolomics Data

    PubMed Central

    Mak, Tytus D.; Laiakis, Evagelia C.; Goudarzi, Maryam; Fornace, Albert J.

    2014-01-01

    Metabolomics, the global study of small molecules in a particular system, has in the last few years risen to become a primary –omics platform for the study of metabolic processes. With the ever-increasing pool of quantitative data yielded from metabolomic research, specialized methods and tools with which to analyze and extract meaningful conclusions from these data are becoming more and more crucial. Furthermore, the depth of knowledge and expertise required to undertake a metabolomics oriented study is a daunting obstacle to investigators new to the field. As such, we have created a new statistical analysis workflow, MetaboLyzer, which aims to both simplify analysis for investigators new to metabolomics, as well as provide experienced investigators the flexibility to conduct sophisticated analysis. MetaboLyzer’s workflow is specifically tailored to the unique characteristics and idiosyncrasies of postprocessed liquid chromatography/mass spectrometry (LC/MS) based metabolomic datasets. It utilizes a wide gamut of statistical tests, procedures, and methodologies that belong to classical biostatistics, as well as several novel statistical techniques that we have developed specifically for metabolomics data. Furthermore, MetaboLyzer conducts rapid putative ion identification and putative biologically relevant analysis via incorporation of four major small molecule databases: KEGG, HMDB, Lipid Maps, and BioCyc. MetaboLyzer incorporates these aspects into a comprehensive workflow that outputs easy to understand statistically significant and potentially biologically relevant information in the form of heatmaps, volcano plots, 3D visualization plots, correlation maps, and metabolic pathway hit histograms. For demonstration purposes, a urine metabolomics data set from a previously reported radiobiology study in which samples were collected from mice exposed to gamma radiation was analyzed. MetaboLyzer was able to identify 243 statistically significant ions out of a
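
    The significance-screening step in a workflow of this kind typically pairs a per-ion test with false-discovery-rate control. The sketch below shows one common variant (per-ion t-tests with Benjamini-Hochberg correction); it is an assumption-laden stand-in, not MetaboLyzer's exact test battery, and the intensity matrices are synthetic.

    ```python
    import numpy as np
    from scipy.stats import ttest_ind
    from statsmodels.stats.multitest import multipletests

    # Synthetic ion-intensity matrices (samples x ions) for two groups,
    # with a handful of responsive ions planted in the treated group.
    rng = np.random.default_rng(0)
    ctrl = rng.lognormal(size=(10, 500))
    trt = rng.lognormal(size=(10, 500))
    trt[:, :25] *= 3

    t, p = ttest_ind(trt, ctrl, axis=0)
    reject, p_adj, _, _ = multipletests(p, alpha=0.05, method="fdr_bh")
    log2fc = np.log2(trt.mean(axis=0) / ctrl.mean(axis=0))
    print(f"{reject.sum()} ions significant after FDR control")
    # A volcano plot is log2fc on x against -log10(p_adj) on y.
    ```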

  2. A critique of the use of indicator-species scores for identifying thresholds in species responses

    USGS Publications Warehouse

    Cuffney, Thomas F.; Qian, Song S.

    2013-01-01

    Identification of ecological thresholds is important both for theoretical and applied ecology. Recently, Baker and King (2010, King and Baker 2010) proposed a method, threshold indicator analysis (TITAN), to calculate species and community thresholds based on indicator species scores adapted from Dufrêne and Legendre (1997). We tested the ability of TITAN to detect thresholds using models with (broken-stick, disjointed broken-stick, dose-response, step-function, Gaussian) and without (linear) definitive thresholds. TITAN accurately and consistently detected thresholds in step-function models, but not in models characterized by abrupt changes in response slopes or response direction. Threshold detection in TITAN was very sensitive to the distribution of 0 values, which caused TITAN to identify thresholds associated with relatively small differences in the distribution of 0 values while ignoring thresholds associated with large changes in abundance. Threshold identification and tests of statistical significance were based on the same data permutations, resulting in inflated estimates of statistical significance. Application of bootstrapping to the split-point problem that underlies TITAN led to underestimates of the confidence intervals of thresholds. Bias in the derivation of the z-scores used to identify TITAN thresholds and skewness in the distribution of data along the gradient produced TITAN thresholds that were much more similar than the actual thresholds. This tendency may account for the synchronicity of thresholds reported in TITAN analyses. The thresholds identified by TITAN represented disparate characteristics of species responses; this, coupled with the inability of TITAN to identify thresholds accurately and consistently, does not support the aggregation of individual species thresholds into a community threshold.

  3. The unrealized promise of infant statistical word-referent learning

    PubMed Central

    Smith, Linda B.; Suanda, Sumarga H.; Yu, Chen

    2014-01-01

    Recent theory and experiments offer a new solution as to how infant learners may break into word learning, by using cross-situational statistics to find the underlying word-referent mappings. Computational models demonstrate the in-principle plausibility of this statistical learning solution and experimental evidence shows that infants can aggregate and make statistically appropriate decisions from word-referent co-occurrence data. We review these contributions and then identify the gaps in current knowledge that prevent a confident conclusion about whether cross-situational learning is the mechanism through which infants break into word learning. We propose an agenda to address that gap that focuses on detailing the statistics in the learning environment and the cognitive processes that make use of those statistics. PMID:24637154

  4. Statistical-dynamical modeling of the cloud-to-ground lightning activity in Portugal

    NASA Astrophysics Data System (ADS)

    Sousa, J. F.; Fragoso, M.; Mendes, S.; Corte-Real, J.; Santos, J. A.

    2013-10-01

    The present study employs a dataset of cloud-to-ground discharges over Portugal, collected by the Portuguese lightning detection network in the period of 2003-2009, to identify dynamically coherent lightning regimes in Portugal and to implement a statistical-dynamical modeling of the daily discharges over the country. For this purpose, the high-resolution MERRA reanalysis is used. Three lightning regimes are then identified for Portugal: WREG, WREM and SREG. WREG is a typical cold-core cut-off low. WREM is connected to strong frontal systems driven by remote low pressure systems at higher latitudes over the North Atlantic. SREG is a combination of an inverted trough and a mid-tropospheric cold core near Portugal. The statistical-dynamical modeling is based on logistic regressions (statistical component) developed for each regime separately (dynamical component). It is shown that the strength of the lightning activity (either strong or weak) for each regime is consistently modeled by a set of suitable dynamical predictors (65-70% of efficiency). The difference of the equivalent potential temperature in the 700-500 hPa layer is the best predictor for the three regimes, while the best 4-layer lifted index is still important for all regimes, but with much weaker significance. Six other predictors are more suitable for a specific regime. For the purpose of validating the modeling approach, a regional-scale climate model simulation is carried out for a very intense lightning episode.

  5. TGMI: an efficient algorithm for identifying pathway regulators through evaluation of triple-gene mutual interaction

    PubMed Central

    Gunasekara, Chathura; Zhang, Kui; Deng, Wenping; Brown, Laura

    2018-01-01

    Despite their important roles, the regulators for most metabolic pathways and biological processes remain elusive. Presently, methods for identifying metabolic pathway and biological process regulators are intensively sought after. We developed a novel algorithm called triple-gene mutual interaction (TGMI) for identifying these regulators using high-throughput gene expression data. It first calculated the regulatory interactions among triple gene blocks (two pathway genes and one transcription factor (TF)) using conditional mutual information, and then identified significantly interacting triple genes using a novel mutual interaction measure (MIM), which was shown to reflect the strength of regulatory interactions within each triple gene block. The TGMI calculated the MIM for each triple gene block and then examined its statistical significance using bootstrap. Finally, the frequencies of all TFs present in all significantly interacting triple gene blocks were calculated and ranked. We showed that the TFs with higher frequencies were usually genuine pathway regulators upon evaluating multiple pathways in plants, animals and yeast. Comparison of TGMI with several other algorithms demonstrated its higher accuracy. Therefore, TGMI will be a valuable tool that can help biologists to identify regulators of metabolic pathways and biological processes from the ever-growing high-throughput gene expression data in public repositories. PMID:29579312

  6. Investigating spousal concordance of diabetes through statistical analysis and data mining.

    PubMed

    Wang, Jong-Yi; Liu, Chiu-Shong; Lung, Chi-Hsuan; Yang, Ya-Tun; Lin, Ming-Hung

    2017-01-01

    Spousal clustering of diabetes merits attention. Whether old-age vulnerability or a shared family environment determines the concordance of diabetes is also uncertain. This study investigated the spousal concordance of diabetes and compared the risk of diabetes concordance between couples and noncouples by using nationally representative data. A total of 22,572 individuals identified from the 2002-2013 National Health Insurance Research Database of Taiwan constituted 5,643 couples and 5,643 noncouples through 1:1 dual propensity score matching (PSM). Factors associated with concordance in both spouses with diabetes were analyzed at the individual level. The risk of diabetes concordance between couples and noncouples was compared at the couple level. Logistic regression was the main statistical method, and statistical data were analyzed using SAS 9.4. The C&RT and Apriori data mining algorithms, run in IBM SPSS Modeler 13, served as a supplement to the statistical analysis. High odds of the spousal concordance of diabetes were associated with old age, middle levels of urbanization, and high comorbidities (all P < 0.05). The dual PSM analysis revealed that the risk of diabetes concordance was significantly higher in couples (5.19%) than in noncouples (0.09%; OR = 61.743, P < 0.0001). A high concordance rate of diabetes in couples may indicate the influences of assortative mating and a shared environment. Diabetes in a spouse implicates its risk in the partner. Family-based diabetes care that emphasizes the screening of couples at risk of diabetes by using the identified risk factors is suggested in prospective clinical practice interventions.
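
    The 1:1 propensity score matching step can be sketched as a propensity model followed by greedy nearest-neighbour pairing without replacement. This is one common PSM variant under illustrative covariates, not necessarily the procedure the authors ran in SAS.

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.neighbors import NearestNeighbors

    def match_one_to_one(X, treated):
        """Fit P(treated | X), then greedily pair each treated unit with the
        nearest-propensity unused control (no replacement)."""
        ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
        t_idx = np.where(treated == 1)[0]
        c_idx = np.where(treated == 0)[0]
        nn = NearestNeighbors(n_neighbors=len(c_idx)).fit(ps[c_idx, None])
        pairs, used = [], set()
        for i in t_idx:
            _, order = nn.kneighbors([[ps[i]]])
            for j in order[0]:                    # closest unused control
                if j not in used:
                    used.add(j)
                    pairs.append((i, c_idx[j]))
                    break
        return pairs

    # Hypothetical covariates (age, urbanization level, comorbidity score)
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 3))
    treated = (rng.random(1000) < 0.5).astype(int)
    print(len(match_one_to_one(X, treated)), "matched pairs")
    ```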

  7. Investigating spousal concordance of diabetes through statistical analysis and data mining

    PubMed Central

    Liu, Chiu-Shong; Lung, Chi-Hsuan; Yang, Ya-Tun; Lin, Ming-Hung

    2017-01-01

    Objective Spousal clustering of diabetes merits attention. Whether old-age vulnerability or a shared family environment determines the concordance of diabetes is also uncertain. This study investigated the spousal concordance of diabetes and compared the risk of diabetes concordance between couples and noncouples by using nationally representative data. Methods A total of 22,572 individuals identified from the 2002–2013 National Health Insurance Research Database of Taiwan constituted 5,643 couples and 5,643 noncouples through 1:1 dual propensity score matching (PSM). Factors associated with concordance in both spouses with diabetes were analyzed at the individual level. The risk of diabetes concordance between couples and noncouples was compared at the couple level. Logistic regression was the main statistical method, and statistical data were analyzed using SAS 9.4. The C&RT and Apriori data mining algorithms, run in IBM SPSS Modeler 13, served as a supplement to the statistical analysis. Results High odds of the spousal concordance of diabetes were associated with old age, middle levels of urbanization, and high comorbidities (all P < 0.05). The dual PSM analysis revealed that the risk of diabetes concordance was significantly higher in couples (5.19%) than in noncouples (0.09%; OR = 61.743, P < 0.0001). Conclusions A high concordance rate of diabetes in couples may indicate the influences of assortative mating and a shared environment. Diabetes in a spouse implicates its risk in the partner. Family-based diabetes care that emphasizes the screening of couples at risk of diabetes by using the identified risk factors is suggested in prospective clinical practice interventions. PMID:28817654

  8. Statistics Anxiety, State Anxiety during an Examination, and Academic Achievement

    ERIC Educational Resources Information Center

    Macher, Daniel; Paechter, Manuela; Papousek, Ilona; Ruggeri, Kai; Freudenthaler, H. Harald; Arendasy, Martin

    2013-01-01

    Background: A large proportion of students identify statistics courses as the most anxiety-inducing courses in their curriculum. Many students feel impaired by feelings of state anxiety in the examination and therefore probably show lower achievements. Aims: The study investigates how statistics anxiety, attitudes (e.g., interest, mathematical…

  9. A novel approach to identify genes that determine grain protein deviation in cereals.

    PubMed

    Mosleth, Ellen F; Wan, Yongfang; Lysenko, Artem; Chope, Gemma A; Penson, Simon P; Shewry, Peter R; Hawkesford, Malcolm J

    2015-06-01

    Grain yield and protein content were determined for six wheat cultivars grown over 3 years at multiple sites and at multiple nitrogen (N) fertilizer inputs. Although grain protein content was negatively correlated with yield, some grain samples had higher protein contents than expected based on their yields, a trait referred to as grain protein deviation (GPD). We used novel statistical approaches to identify gene transcripts significantly related to GPD across environments. The yield and protein content were initially adjusted for nitrogen fertilizer inputs and then adjusted for yield (to remove the negative correlation with protein content), resulting in a parameter termed corrected GPD. Significant genetic variation in corrected GPD was observed for six cultivars grown over a range of environmental conditions (a total of 584 samples). Gene transcript profiles were determined in a subset of 161 samples of developing grain to identify transcripts contributing to GPD. Principal component analysis (PCA), analysis of variance (ANOVA) and means of scores regression (MSR) were used to identify individual principal components (PCs) correlating with GPD alone. Scores of the selected PCs, which were significantly related to GPD and protein content but not to the yield and significantly affected by cultivar, were identified as reflecting a multivariate pattern of gene expression related to genetic variation in GPD. Transcripts with consistent variation along the selected PCs were identified by an approach hereby called one-block means of scores regression (one-block MSR). © 2014 The Authors. Plant Biotechnology Journal published by Society for Experimental Biology and The Association of Applied Biologists and John Wiley & Sons Ltd.

  10. Identifying Epigenetic Biomarkers using Maximal Relevance and Minimal Redundancy Based Feature Selection for Multi-Omics Data.

    PubMed

    Mallik, Saurav; Bhadra, Tapas; Maulik, Ujjwal

    2017-01-01

    Epigenetic Biomarker discovery is an important task in bioinformatics. In this article, we develop a new framework for identifying statistically significant epigenetic biomarkers using maximal-relevance and minimal-redundancy criterion based feature (gene) selection for multi-omics datasets. Firstly, we determine the genes that have both expression as well as methylation values, and follow normal distribution. Similarly, we identify the genes which consist of both expression and methylation values, but do not follow normal distribution. For each case, we utilize a gene-selection method that provides maximal-relevant, but variable-weighted minimum-redundant genes as top ranked genes. For statistical validation, we apply the t-test on both the expression and methylation data consisting of only the normally distributed top ranked genes to determine how many of them are both differentially expressed and methylated. Similarly, we utilize the Limma package for performing a non-parametric Empirical Bayes test on both expression and methylation data comprising only the non-normally distributed top ranked genes to identify how many of them are both differentially expressed and methylated. We finally report the top-ranking significant gene-markers with biological validation. Moreover, our framework improves positive predictive rate and reduces false positive rate in marker identification. In addition, we provide a comparative analysis of our gene-selection method as well as other methods based on classification performances obtained using several well-known classifiers.

  11. Statistics of Magnetic Reconnection X-Lines in Kinetic Turbulence

    NASA Astrophysics Data System (ADS)

    Haggerty, C. C.; Parashar, T.; Matthaeus, W. H.; Shay, M. A.; Wan, M.; Servidio, S.; Wu, P.

    2016-12-01

    In this work we examine the statistics of magnetic reconnection (x-lines) and their associated reconnection rates in intermittent current sheets generated in turbulent plasmas. Although such statistics have been studied previously for fluid simulations (e.g. [1]), they have not yet been generalized to fully kinetic particle-in-cell (PIC) simulations. A significant problem with PIC simulations, however, is electrostatic fluctuations generated due to numerical particle counting statistics. We find that analyzing gradients of the magnetic vector potential from the raw PIC field data identifies numerous artificial (or non-physical) x-points. Using small Orszag-Tang vortex PIC simulations, we analyze x-line identification and show that these artificial x-lines can be removed using sub-Debye length filtering of the data. We examine how turbulent properties such as the magnetic spectrum and scale dependent kurtosis are affected by particle noise and sub-Debye length filtering. We subsequently apply these analysis methods to a large scale kinetic PIC turbulence simulation. Consistent with previous fluid models, we find a range of normalized reconnection rates as large as ½, with the bulk of the rates being less than approximately 0.1. [1] Servidio, S., W. H. Matthaeus, M. A. Shay, P. A. Cassak, and P. Dmitruk (2009), Magnetic reconnection and two-dimensional magnetohydrodynamic turbulence, Phys. Rev. Lett., 102, 115003.
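
    Identifying x-points from gradients of the vector potential amounts to locating saddle points of A. The sketch below marks grid cells where the gradient magnitude is locally minimal and the Hessian determinant is negative; on real PIC fields a noise-suppression step (such as the sub-Debye-length filtering described above) should precede it. The synthetic field is chosen to have one saddle.

    ```python
    import numpy as np

    def find_x_points(A):
        """Candidate X-points of a 2-D vector potential A: cells where the
        gradient magnitude is a local minimum and the Hessian determinant
        is negative (a saddle of A)."""
        Ay, Ax = np.gradient(A)                  # d/drow, d/dcol
        g = np.hypot(Ax, Ay)
        Ayy, _ = np.gradient(Ay)
        Axy, Axx = np.gradient(Ax)
        detH = Axx * Ayy - Axy ** 2
        pts = []
        for i in range(1, A.shape[0] - 1):
            for j in range(1, A.shape[1] - 1):
                if g[i, j] == g[i-1:i+2, j-1:j+2].min() and detH[i, j] < 0:
                    pts.append((i, j))
        return pts

    # Synthetic potential with a saddle at the centre: A = x * y
    x = np.linspace(-1, 1, 64)
    X, Y = np.meshgrid(x, x)
    print(find_x_points(X * Y))                  # points near the grid centre
    ```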

  12. Solar Radio Burst Statistics and Implications for Space Weather Effects

    NASA Astrophysics Data System (ADS)

    Giersch, O. D.; Kennewell, J.; Lynch, M.

    2017-11-01

    Solar radio bursts have the potential to affect space and terrestrial navigation, communication, and other technical systems that are sometimes overlooked. However, a series of extreme L-band solar radio bursts in December 2006 has renewed interest in these effects over the last decade. In this paper we point out significant deficiencies in the solar radio data archives of the National Centers for Environmental Information (NCEI) that are used by most researchers in analyzing and producing statistics on solar radio burst phenomena. In particular, we examine the records submitted by the United States Air Force (USAF) Radio Solar Telescope Network (RSTN) and its predecessors from the period 1966 to 2010. Besides identifying substantial missing burst records, we show that different observatories can have statistically different burst distributions, particularly at 245 MHz. We also point out that different solar cycles may show statistically different distributions and that it is a mistake to assume that the Sun shows similar behavior in different sunspot cycles. Large solar radio bursts are not confined to the period around sunspot maximum, and predictions of such events that utilize historical data will invariably be underestimates due to archive data deficiencies. It is important that researchers and forecasters use historical occurrence frequencies with caution in attempting to predict future cycles.

  13. [Statistics for statistics?--Thoughts about psychological tools].

    PubMed

    Berger, Uwe; Stöbel-Richter, Yve

    2007-12-01

    Statistical methods occupy a prominent place in psychologists' training. Because these contents are regarded as difficult to understand and hard to learn, students fear them. Those who do not aspire to a research career at a university quickly forget the drilled material. Furthermore, because at first glance it does not apply to the work with patients and other target groups, the methodological education as a whole has often been questioned. For many psychological practitioners, statistical education makes sense only as a means of commanding respect from other professions, namely physicians. For their own practice, statistics is rarely taken seriously as a professional tool. The reason seems clear: statistics treats numbers, while psychotherapy treats subjects. So, is statistics an end in itself? With this article, we try to answer the question of whether and how statistical methods are represented within psychotherapeutic and psychological research. We therefore analyzed 46 original articles from a complete volume of the journal Psychotherapy, Psychosomatics, Psychological Medicine (PPmP). Within the volume, 28 different analysis methods were applied, of which 89 per cent were directly based on statistics. Being able to write and critically read original articles, the backbone of research, presumes a high degree of statistical education. To ignore statistics means to ignore research, and ultimately to leave one's own professional work open to arbitrariness.

  14. Automatic identification of bacterial types using statistical imaging methods

    NASA Astrophysics Data System (ADS)

    Trattner, Sigal; Greenspan, Hayit; Tepper, Gapi; Abboud, Shimon

    2003-05-01

    The objective of the current study is to develop an automatic tool to identify bacterial types using computer-vision and statistical modeling techniques. Bacteriophage (phage)-typing methods are used to identify and extract representative profiles of bacterial types, such as Staphylococcus aureus. Current systems rely on the subjective reading of plaque profiles by a human expert. This process is time-consuming and prone to errors, especially as technology is enabling the increase in the number of phages used for typing. The statistical methodology presented in this work provides for an automated, objective and robust analysis of visual data, along with the ability to cope with increasing data volumes.

  15. Statistics of Natural Communication Signals Observed in the Wild Identify Important Yet Neglected Stimulus Regimes in Weakly Electric Fish.

    PubMed

    Henninger, Jörg; Krahe, Rüdiger; Kirschbaum, Frank; Grewe, Jan; Benda, Jan

    2018-06-13

    Sensory systems evolve in the ecological niches that each species is occupying. Accordingly, encoding of natural stimuli by sensory neurons is expected to be adapted to the statistics of these stimuli. For a direct quantification of sensory scenes, we tracked natural communication behavior of male and female weakly electric fish, Apteronotus rostratus, in their Neotropical rainforest habitat with high spatiotemporal resolution over several days. In the context of courtship, we observed large quantities of electrocommunication signals. Echo responses, acknowledgment signals, and their synchronizing role in spawning demonstrated the behavioral relevance of these signals. In both courtship and aggressive contexts, we observed robust behavioral responses in stimulus regimes that have so far been neglected in electrophysiological studies of this well characterized sensory system and that are well beyond the range of known best frequency and amplitude tuning of the electroreceptor afferents' firing rate modulation. Our results emphasize the importance of quantifying sensory scenes derived from freely behaving animals in their natural habitats for understanding the function and evolution of neural systems. SIGNIFICANCE STATEMENT The processing mechanisms of sensory systems have evolved in the context of the natural lives of organisms. To understand the functioning of sensory systems therefore requires probing them in the stimulus regimes in which they evolved. We took advantage of the continuously generated electric fields of weakly electric fish to explore electrosensory stimulus statistics in their natural Neotropical habitat. Unexpectedly, many of the electrocommunication signals recorded during courtship, spawning, and aggression had much smaller amplitudes or higher frequencies than stimuli used so far in neurophysiological characterizations of the electrosensory system. Our results demonstrate that quantifying sensory scenes derived from freely behaving animals in

  16. Mass detection, localization and estimation for wind turbine blades based on statistical pattern recognition

    NASA Astrophysics Data System (ADS)

    Colone, L.; Hovgaard, M. K.; Glavind, L.; Brincker, R.

    2018-07-01

    A method for mass change detection on wind turbine blades using natural frequencies is presented. The approach is based on two statistical tests. The first test decides if there is a significant mass change and the second test is a statistical group classification based on Linear Discriminant Analysis. The frequencies are identified by means of Operational Modal Analysis using natural excitation. Based on the assumption of Gaussianity of the frequencies, a multi-class statistical model is developed by combining finite element model sensitivities in 10 classes of change location on the blade, the smallest area being 1/5 of the span. The method is experimentally validated for a full scale wind turbine blade in a test setup and loaded by natural wind. Mass change from natural causes was imitated with sand bags and the algorithm was observed to perform well with an experimental detection rate of 1, localization rate of 0.88 and mass estimation rate of 0.72.

  17. Statistics and Title VII Proof: Prima Facie Case and Rebuttal.

    ERIC Educational Resources Information Center

    Whitten, David

    1978-01-01

    The method and means by which statistics can raise a prima facie case of Title VII violation are analyzed. A standard is identified that can be applied to determine whether a statistical disparity is sufficient to shift the burden to the employer to rebut a prima facie case of discrimination. (LBH)

  18. Statistical methods for identifying and bounding a UXO target area or minefield

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    McKinstry, Craig A.; Pulsipher, Brent A.; Gilbert, Richard O.

    2003-09-18

    The sampling unit for minefield or UXO area characterization is typically represented by a geographical block or transect swath that lends itself to characterization by geophysical instrumentation such as mobile sensor arrays. New spatially based statistical survey methods and tools, more appropriate for these unique sampling units, have been developed and implemented at PNNL (Visual Sample Plan software, ver. 2.0) with support from the US Department of Defense. Though originally developed to support UXO detection and removal efforts, these tools may also be used in current form or adapted to support demining efforts and aid in the development of new sensors and detection technologies by explicitly incorporating both sampling and detection error in performance assessments. These tools may be used to (1) determine transect designs for detecting and bounding target areas of critical size, shape, and density of detectable items of interest with a specified confidence probability, (2) evaluate the probability that target areas of a specified size, shape and density have not been missed by a systematic or meandering transect survey, and (3) support post-removal verification by calculating the number of transects required to achieve a specified confidence probability that no UXO or mines have been missed.
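
    Use case (1), choosing a transect spacing that detects a target area of a given size with specified confidence, can be illustrated with a small Monte Carlo. This is a geometric stand-in for what Visual Sample Plan computes analytically (and for more general elliptical targets and detection errors); the parameter values are illustrative.

    ```python
    import numpy as np

    def detect_prob(target_radius, spacing, width, trials=100_000, seed=0):
        """Probability that parallel transect swaths of the given width and
        spacing intersect a circular target whose centre falls uniformly at
        random between transects."""
        rng = np.random.default_rng(seed)
        # By symmetry only the distance from the target centre to the nearest
        # transect centreline matters; it is uniform on [0, spacing / 2].
        offset = rng.uniform(0, spacing / 2, trials)
        return (offset <= width / 2 + target_radius).mean()

    for s in (20.0, 50.0, 100.0):
        print(f"spacing {s:5.1f} m -> P(detect) = "
              f"{detect_prob(target_radius=10.0, spacing=s, width=2.0):.2f}")
    ```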

  19. Effects of Consecutive Basketball Games on the Game-Related Statistics that Discriminate Winner and Losing Teams

    PubMed Central

    Ibáñez, Sergio J.; García, Javier; Feu, Sebastian; Lorenzo, Alberto; Sampaio, Jaime

    2009-01-01

    The aim of the present study was to identify the game-related statistics that discriminated between winning and losing basketball teams in each of three consecutive games played in a condensed tournament format. The data were obtained from the Spanish Basketball Federation and included game-related statistics from the Under-20 league (2005-2006 and 2006-2007 seasons). A total of 223 games were analyzed with the following game-related statistics: two and three-point field goals (made and missed), free-throws (made and missed), offensive and defensive rebounds, assists, steals, turnovers, blocks (made and received), fouls committed, ball possessions and offensive rating. Results showed that winning teams in this competition had better values in all game-related statistics, with the exception of three-point field goals made, free-throws missed and turnovers (p ≥ 0.05). The main effect of game number was only identified in turnovers, with a statistically significant decrease between the second and third game. No interaction was found in the analysed variables. A discriminant analysis identified two-point field goals made, defensive rebounds and assists as discriminators between winning and losing teams in all three games. In addition to these, only three-point field goals made contributed to discriminating teams in game three, suggesting a moderate effect of fatigue. Coaches may benefit from being aware of this variation in determinant game-related statistics and also from using offensive and defensive strategies in the third game that explore or hide three-point field-goal performance. Key points Overall team performances along the three consecutive games were very similar, not confirming an accumulated fatigue effect. The results from the three-point field goals in the third game suggested that winning teams were able to shoot better from longer distances and this could be the result of exhibiting higher conditioning status and

  20. Statistical testing of baseline differences in sports medicine RCTs: a systematic evaluation.

    PubMed

    Peterson, Ross L; Tran, Matthew; Koffel, Jonathan; Stovitz, Steven D

    2017-01-01

    The CONSORT (Consolidated Standards of Reporting Trials) statement discourages reporting statistical tests of baseline differences between groups in randomised controlled trials (RCTs). However, this practice is still common in many medical fields. Our aim was to determine the prevalence of this practice in leading sports medicine journals. We conducted a comprehensive search in Medline through PubMed to identify RCTs published in the years 2005 and 2015 from 10 high-impact sports medicine journals. Two reviewers independently confirmed the trial design and reached consensus on which articles contained statistical tests of baseline differences. Our search strategy identified a total of 324 RCTs, with 85 from the year 2005 and 239 from the year 2015. Overall, 64.8% of studies (95% CI (59.6, 70.0)) reported statistical tests of baseline differences; broken down by year, this percentage was 67.1% in 2005 (95% CI (57.1, 77.1)) and 64.0% in 2015 (95% CI (57.9, 70.1)). Although discouraged by the CONSORT statement, statistical testing of baseline differences remains highly prevalent in sports medicine RCTs. Statistical testing of baseline differences can mislead authors; for example, by failing to identify meaningful baseline differences in small studies. Journals that ask authors to follow the CONSORT statement guidelines should recognise that many manuscripts are ignoring the recommendation against statistical testing of baseline differences.
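
    The reported confidence intervals follow from the normal approximation for a proportion; a quick check in Python (the count 210 is inferred from the reported 64.8% of 324, and the CI formula is an assumption, since the abstract does not state the method used):

    ```python
    import math

    def prop_ci(successes: int, n: int, z: float = 1.96) -> tuple[float, float, float]:
        """Normal-approximation 95% confidence interval for a proportion."""
        p = successes / n
        half = z * math.sqrt(p * (1 - p) / n)
        return p, p - half, p + half

    # 210/324 is the count implied by the reported overall prevalence of 64.8%.
    p, lo, hi = prop_ci(210, 324)
    print(f"{p:.1%} (95% CI {lo:.1%}, {hi:.1%})")  # 64.8% (95% CI 59.6%, 70.0%)
    ```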

  1. Tools for Basic Statistical Analysis

    NASA Technical Reports Server (NTRS)

    Luz, Paul L.

    2005-01-01

    Statistical Analysis Toolset is a collection of eight Microsoft Excel spreadsheet programs, each of which performs calculations pertaining to an aspect of statistical analysis. These programs present input and output data in user-friendly, menu-driven formats, with automatic execution. The following types of calculations are performed: Descriptive statistics are computed for a set of data x(i) (i = 1, 2, 3 . . . ) entered by the user. Normal Distribution Estimates will calculate the statistical value that corresponds to cumulative probability values, given a sample mean and standard deviation of the normal distribution. Normal Distribution from Two Data Points will extend and generate a cumulative normal distribution for the user, given two data points and their associated probability values. Two programs perform two-way analysis of variance (ANOVA) with no replication or generalized ANOVA for two factors with four levels and three repetitions. Linear Regression-ANOVA will curve-fit data to a linear equation y = f(x) and will perform an ANOVA to check its significance.
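
    For readers without Excel, the same calculations are a few lines in Python; a minimal sketch with toy data, using scipy's linregress as a stand-in for the toolset's Linear Regression-ANOVA program:

    ```python
    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])    # toy predictor
    y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0])  # toy response

    # Descriptive statistics for a data set entered by the user
    print(f"mean={y.mean():.2f}, sd={y.std(ddof=1):.2f}, n={y.size}")

    # Linear regression y = a + b*x; the p-value is the ANOVA-style test of the
    # slope against zero, i.e. the significance check the toolset performs
    fit = stats.linregress(x, y)
    print(f"y = {fit.intercept:.2f} + {fit.slope:.2f}x, p = {fit.pvalue:.2e}")
    ```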

  2. A novel chi-square statistic for detecting group differences between pathways in systems epidemiology.

    PubMed

    Yuan, Zhongshang; Ji, Jiadong; Zhang, Tao; Liu, Yi; Zhang, Xiaoshuai; Chen, Wei; Xue, Fuzhong

    2016-12-20

    Traditional epidemiology often pays more attention to the identification of a single factor than to the pathway related to a disease, which makes it difficult to explore the disease mechanism. Systems epidemiology aims to integrate putative lifestyle exposures and biomarkers extracted from multiple omics platforms to offer new insights into the pathway mechanisms that underlie disease at the human population level. One key but inadequately addressed question is how to develop powerful statistics to identify whether one candidate pathway is associated with a disease. Bearing in mind that a pathway difference can result not only from changes in the nodes but also from changes in the edges, we propose a novel statistic for detecting group differences between pathways which, in principle, captures both node changes and edge changes while simultaneously accounting for the pathway structure. The proposed test has been proven to follow the chi-square distribution, and various simulations have shown it has better performance than other existing methods. Integrating genome-wide DNA methylation data, we analyzed one real data set from the Bogalusa cohort study and identified a potential pathway, Smoking → SOCS3 → PIK3R1, that was strongly associated with abdominal obesity. The proposed test was powerful and efficient at identifying pathway differences between two groups, and it can be extended to other disciplines that involve statistical comparisons between pathways. The source code in R is available on our website. Copyright © 2016 John Wiley & Sons, Ltd.

  3. Why publishing everything is more effective than selective publishing of statistically significant results.

    PubMed

    van Assen, Marcel A L M; van Aert, Robbie C M; Nuijten, Michèle B; Wicherts, Jelte M

    2014-01-01

    De Winter and Happee examined whether science based on selective publishing of significant results may be effective in accurate estimation of population effects, and whether this is even more effective than a science in which all results are published (i.e., a science without publication bias). Based on their simulation study they concluded that "selective publishing yields a more accurate meta-analytic estimation of the true effect than publishing everything, (and that) publishing nonreplicable results while placing null results in the file drawer can be beneficial for the scientific collective" (p.4). Using their scenario with a small to medium population effect size, we show that publishing everything is more effective for the scientific collective than selective publishing of significant results. Additionally, we examined a scenario with a null effect, which provides a more dramatic illustration of the superiority of publishing everything over selective publishing. Publishing everything is more effective than only reporting significant outcomes.

  4. Why Publishing Everything Is More Effective than Selective Publishing of Statistically Significant Results

    PubMed Central

    van Assen, Marcel A. L. M.; van Aert, Robbie C. M.; Nuijten, Michèle B.; Wicherts, Jelte M.

    2014-01-01

    Background De Winter and Happee [1] examined whether science based on selective publishing of significant results may be effective in accurate estimation of population effects, and whether this is even more effective than a science in which all results are published (i.e., a science without publication bias). Based on their simulation study they concluded that “selective publishing yields a more accurate meta-analytic estimation of the true effect than publishing everything, (and that) publishing nonreplicable results while placing null results in the file drawer can be beneficial for the scientific collective” (p.4). Methods and Findings Using their scenario with a small to medium population effect size, we show that publishing everything is more effective for the scientific collective than selective publishing of significant results. Additionally, we examined a scenario with a null effect, which provides a more dramatic illustration of the superiority of publishing everything over selective publishing. Conclusion Publishing everything is more effective than only reporting significant outcomes. PMID:24465448

  5. The efficacy of adult christian support groups in coping with the death of a significant loved one.

    PubMed

    Goodman, Herbert; Stone, Mark H

    2009-09-01

    Psychologists sometimes minimize important resources, such as religion and spiritual beliefs, for coping with bereavement. The alienation of therapeutic psychology from religious values contrasts with professional and public interest in religious experience and commitment. A supportive viewpoint has come about partially as a result of recognizing important values which clinicians have found absent in many of their clients. Until spiritual belief systems become integrated into the work of clinicians, clients may not be able to fully integrate these beliefs into their coping with loss. The key finding of this study was that individuals who participated in Christian and secular support groups showed no statistically significant difference in their mean endorsement of negative criteria on the BHS, and no statistically significant difference in their mean endorsement of positive criteria on the RCOPE. Thus, a Christian-oriented approach was no less effective than a psychologically oriented one. In both groups, clients frequently identified a spiritual connection to a specific or generalized higher power, which they credited with facilitating the management of their coping.

  6. Student Self-Grading in Social Statistics

    ERIC Educational Resources Information Center

    Edwards, Nelta M.

    2007-01-01

    This article analyzes a social statistics class that engaged in self-grading. Students liked self-grading because they identified their own mistakes, it reinforced what they learned, and they received immediate feedback. Some students worried that others would cheat, but this assertion was not confirmed in the data and the possibility of cheating…

  7. Tables of Significance Points for the Variance-Weighted Kolmogorov-Smirnov Statistics.

    DTIC Science & Technology

    1981-02-19

  8. Statistics for Learning Genetics

    NASA Astrophysics Data System (ADS)

    Charles, Abigail Sheena

    This study investigated the knowledge and skills that biology students may need to help them understand statistics/mathematics as it applies to genetics. The data are based on analyses of current representative genetics texts, practicing genetics professors' perspectives, and more directly, students' perceptions of, and performance in, doing statistically-based genetics problems. This issue is at the emerging edge of modern college-level genetics instruction, and this study attempts to identify key theoretical components for creating a specialized biological statistics curriculum. The goal of this curriculum will be to prepare biology students with the skills for assimilating quantitatively-based genetic processes, increasingly at the forefront of modern genetics. To fulfill this, two college-level classes at two universities were surveyed, one located in the northeastern US and the other in the West Indies. The sample comprised 42 students, 9 of whom were given a supplementary interview. Interviews were also administered to professors in the field in order to gain insight into the teaching of statistics in genetics. Key findings indicated that students had very little to no background in statistics (55%). Although students did perform well on exams, with 60% of the population receiving an A or B grade, 77% of them did not offer good explanations on a probability question associated with the normal distribution provided in the survey. The scope and presentation of the applicable statistics/mathematics in some of the most used textbooks in genetics teaching, as well as in the genetics syllabi used by instructors, do not help the issue. The textbooks oftentimes either did not give effective explanations for students or completely left out certain topics; the same omissions appeared in the genetics syllabi reviewed for this study. Nonetheless

  9. Statistics Canada's Definition and Classification of Postsecondary and Adult Education Providers in Canada. Culture, Tourism and the Centre for Education Statistics. Research Paper. Catalogue no. 81-595-M No. 071

    ERIC Educational Resources Information Center

    Orton, Larry

    2009-01-01

    This document outlines the definitions and the typology now used by Statistics Canada's Centre for Education Statistics to identify, classify and delineate the universities, colleges and other providers of postsecondary and adult education in Canada for which basic enrollments, graduates, professors and finance statistics are produced. These new…

  10. Targeted Resequencing and Functional Testing Identifies Low-Frequency Missense Variants in the Gene Encoding GARP as Significant Contributors to Atopic Dermatitis Risk.

    PubMed

    Manz, Judith; Rodríguez, Elke; ElSharawy, Abdou; Oesau, Eva-Maria; Petersen, Britt-Sabina; Baurecht, Hansjörg; Mayr, Gabriele; Weber, Susanne; Harder, Jürgen; Reischl, Eva; Schwarz, Agatha; Novak, Natalija; Franke, Andre; Weidinger, Stephan

    2016-12-01

    Gene-mapping studies have consistently identified a susceptibility locus for atopic dermatitis and other inflammatory diseases on chromosome band 11q13.5, with the strongest association observed for a common variant located in an intergenic region between the two annotated genes C11orf30 and LRRC32. Using a targeted resequencing approach we identified low-frequency and rare missense mutations within the LRRC32 gene encoding the protein GARP, a receptor on activated regulatory T cells that binds latent transforming growth factor-β. Subsequent association testing in more than 2,000 atopic dermatitis patients and 2,000 control subjects showed a significant excess of these LRRC32 variants in individuals with atopic dermatitis. Structural protein modeling and bioinformatic analysis predicted a disruption of protein transport upon these variants, and overexpression assays in CD4 + CD25 - T cells showed a significant reduction in surface expression of the mutated protein. Consistently, flow cytometric (FACS) analyses of different T-cell subtypes obtained from atopic dermatitis patients showed a significantly reduced surface expression of GARP and a reduced conversion of CD4 + CD25 - T cells into regulatory T cells, along with lower expression of latency-associated protein upon stimulation in carriers of the LRRC32 A407T variant. These results link inherited disturbances of transforming growth factor-β signaling with atopic dermatitis risk. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  11. Identifying Balance Measures Most Likely to Identify Recent Falls.

    PubMed

    Criter, Robin E; Honaker, Julie A

    2016-01-01

    Falls sustained by older adults are an increasing health care issue. Early identification of those at risk for falling can lead to successful prevention of falls. Balance complaints are common among individuals who fall or are at risk for falling. The purpose of this study was to evaluate the clinical utility of a multifaceted balance protocol used for fall risk screening, with the hypothesis that this protocol would successfully identify individuals who had a recent fall (within the previous 12 months). This is a retrospective review of 30 individuals who self-referred for a free fall risk screening. Measures included case history, Activities-Specific Balance Confidence Scale, modified Clinical Test of Sensory Interaction on Balance, Timed Up and Go test, and Dynamic Visual Acuity. Statistical analyses were focused on the ability of the test protocol to identify a fall within the past 12 months and included descriptive statistics, clinical utility indices, logistic regression, receiver operating characteristic curve, area under the curve analysis, effect size (Cohen d), and Spearman correlation coefficients. All individuals who self-referred for this free screening had current imbalance complaints, and were typically women (70%), had a mean age of 77.2 years, and had a fear of falling (70%). Almost half (46.7%) reported at least 1 lifetime fall and 40.0% within the past 12 months. Regression analysis suggested that the Timed Up and Go test was the most important indicator of a recent fall. A cutoff score of 12 or more seconds was optimal (sensitivity: 83.3%; specificity: 61.1%). Older adults with current complaints of imbalance have a higher rate of falls, fall-related injury, and fear of falling than the general community-dwelling public. The Timed Up and Go test is useful for determining recent fall history in individuals with imbalance.
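
    Translating the reported cutoff into code makes the sensitivity/specificity arithmetic concrete; a sketch with hypothetical records (the function name and toy values are illustrative, not the study's dataset):

    ```python
    import numpy as np

    def screen_stats(tug_seconds, fell_past_year, cutoff=12.0):
        """Sensitivity and specificity of flagging 'fall risk' when the
        Timed Up and Go time is at or above the cutoff (12 s in the study)."""
        t = np.asarray(tug_seconds, dtype=float)
        fell = np.asarray(fell_past_year, dtype=bool)
        flagged = t >= cutoff
        sensitivity = (flagged & fell).sum() / fell.sum()
        specificity = (~flagged & ~fell).sum() / (~fell).sum()
        return sensitivity, specificity

    # Hypothetical screening records, not the study's data
    times = [9.8, 14.2, 11.5, 16.0, 8.9, 13.1]
    fell = [0, 1, 0, 1, 0, 1]
    print(screen_stats(times, fell))  # (1.0, 1.0) on this toy set
    ```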

  12. Identifying optimal threshold statistics for elimination of hookworm using a stochastic simulation model.

    PubMed

    Truscott, James E; Werkman, Marleen; Wright, James E; Farrell, Sam H; Sarkar, Rajiv; Ásbjörnsdóttir, Kristjana; Anderson, Roy M

    2017-06-30

    There is an increased focus on whether mass drug administration (MDA) programmes alone can interrupt the transmission of soil-transmitted helminths (STH). Mathematical models can be used to model these interventions and are increasingly being implemented to inform investigators about expected trial outcome and the choice of optimum study design. One key factor is the choice of threshold for detecting elimination. However, there are currently no thresholds defined for STH regarding breaking transmission. We develop a simulation of an elimination study, based on the DeWorm3 project, using an individual-based stochastic disease transmission model in conjunction with models of MDA, sampling, diagnostics and the construction of study clusters. The simulation is then used to analyse the relationship between the study end-point elimination threshold and whether elimination is achieved in the long term within the model. We analyse the quality of a range of statistics in terms of the positive predictive values (PPV) and how they depend on a range of covariates, including threshold values, baseline prevalence, measurement time point and how clusters are constructed. End-point infection prevalence performs well in discriminating between villages that achieve interruption of transmission and those that do not, although the quality of the threshold is sensitive to baseline prevalence and threshold value. Optimal post-treatment prevalence threshold value for determining elimination is in the range 2% or less when the baseline prevalence range is broad. For multiple clusters of communities, both the probability of elimination and the ability of thresholds to detect it are strongly dependent on the size of the cluster and the size distribution of the constituent communities. Number of communities in a cluster is a key indicator of probability of elimination and PPV. Extending the time, post-study endpoint, at which the threshold statistic is measured improves PPV value in
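
    The PPV criterion used to grade candidate thresholds is simple to state in code; a minimal sketch with hypothetical tallies:

    ```python
    def ppv(true_calls: int, false_calls: int) -> float:
        """Positive predictive value of an elimination threshold: among clusters
        the end-point statistic declares 'transmission interrupted', the fraction
        in which transmission is truly interrupted in the long-term simulation."""
        return true_calls / (true_calls + false_calls)

    # Hypothetical simulation tally: 180 correct elimination calls, 20 false ones.
    print(ppv(180, 20))  # 0.9
    ```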

  13. Novel genes identified in a high-density genome wide association study for nicotine dependence.

    PubMed

    Bierut, Laura Jean; Madden, Pamela A F; Breslau, Naomi; Johnson, Eric O; Hatsukami, Dorothy; Pomerleau, Ovide F; Swan, Gary E; Rutter, Joni; Bertelsen, Sarah; Fox, Louis; Fugman, Douglas; Goate, Alison M; Hinrichs, Anthony L; Konvicka, Karel; Martin, Nicholas G; Montgomery, Grant W; Saccone, Nancy L; Saccone, Scott F; Wang, Jen C; Chase, Gary A; Rice, John P; Ballinger, Dennis G

    2007-01-01

    Tobacco use is a leading contributor to disability and death worldwide, and genetic factors contribute in part to the development of nicotine dependence. To identify novel genes for which natural variation contributes to the development of nicotine dependence, we performed a comprehensive genome wide association study using nicotine dependent smokers as cases and non-dependent smokers as controls. To allow the efficient, rapid, and cost effective screen of the genome, the study was carried out using a two-stage design. In the first stage, genotyping of over 2.4 million single nucleotide polymorphisms (SNPs) was completed in case and control pools. In the second stage, we selected SNPs for individual genotyping based on the most significant allele frequency differences between cases and controls from the pooled results. Individual genotyping was performed in 1050 cases and 879 controls using 31,960 selected SNPs. The primary analysis, a logistic regression model with covariates of age, gender, genotype and gender by genotype interaction, identified 35 SNPs with P-values less than 10^-4 (minimum P-value 1.53 × 10^-6). Although none of the individual findings is statistically significant after correcting for multiple tests, additional statistical analyses support the existence of true findings in this group. Our study nominates several novel genes, such as Neurexin 1 (NRXN1), in the development of nicotine dependence while also identifying a known candidate gene, the beta3 nicotinic cholinergic receptor. This work anticipates the future directions of large-scale genome wide association studies with state-of-the-art methodological approaches and sharing of data with the scientific community.
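
    For context on the multiple-testing remark, the Bonferroni bar for the stage-2 scan is easy to reproduce; a back-of-envelope check, not necessarily the correction the authors applied:

    ```python
    # Bonferroni threshold for the 31,960 individually genotyped stage-2 SNPs
    n_stage2 = 31_960
    threshold = 0.05 / n_stage2
    print(f"{threshold:.2e}")  # ~1.56e-06
    # Whether the minimum observed P-value (1.53e-06) clears the bar depends on
    # whether one corrects for the 31,960 stage-2 tests or for the 2.4 million
    # SNPs screened in stage 1.
    ```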

  14. Whither Statistics Education Research?

    ERIC Educational Resources Information Center

    Watson, Jane

    2016-01-01

    This year marks the 25th anniversary of the publication of a "National Statement on Mathematics for Australian Schools", which was the first curriculum statement this country had including "Chance and Data" as a significant component. It is hence an opportune time to survey the history of the related statistics education…

  15. Clinical significance in nursing research: A discussion and descriptive analysis.

    PubMed

    Polit, Denise F

    2017-08-01

    It is widely understood that statistical significance should not be equated with clinical significance, but the topic of clinical significance has not received much attention in the nursing literature. By contrast, interest in conceptualizing and operationalizing clinical significance has been a "hot topic" in other health care fields for several decades. The major purpose of this paper is to briefly describe recent advances in defining and quantifying clinical significance. The overview covers both group-level indicators of clinical significance (e.g., effect size indexes), and individual-level benchmarks (e.g., the minimal important change index). A secondary purpose is to describe the extent to which developments in clinical significance have penetrated the nursing literature. A descriptive analysis of a sample of primary research articles published in three high-impact nursing research journals in 2016 was undertaken. A total of 362 articles were electronically searched for terms relating to statistical and clinical significance. Of the 362 articles, 261 were reports of quantitative studies, the vast majority of which (93%) included a formal evaluation of the statistical significance of the results. By contrast, the term "clinical significance" or related surrogate terms were found in only 33 papers, and most often the term was used informally, without explicit definition or assessment. Raising consciousness about clinical significance should be an important priority among nurse researchers. Several recommendations are offered to improve the visibility and salience of clinical significance in nursing science. Copyright © 2017 Elsevier Ltd. All rights reserved.

  16. Impaired Statistical Learning in Developmental Dyslexia

    PubMed Central

    Thiessen, Erik D.; Holt, Lori L.

    2015-01-01

    Purpose Developmental dyslexia (DD) is commonly thought to arise from phonological impairments. However, an emerging perspective is that a more general procedural learning deficit, not specific to phonological processing, may underlie DD. The current study examined if individuals with DD are capable of extracting statistical regularities across sequences of passively experienced speech and nonspeech sounds. Such statistical learning is believed to be domain-general, to draw upon procedural learning systems, and to relate to language outcomes. Method DD and control groups were familiarized with a continuous stream of syllables or sine-wave tones, the ordering of which was defined by high or low transitional probabilities across adjacent stimulus pairs. Participants subsequently judged two 3-stimulus test items with either high or low statistical coherence as being the most similar to the sounds heard during familiarization. Results As with control participants, the DD group was sensitive to the transitional probability structure of the familiarization materials as evidenced by above-chance performance. However, the performance of participants with DD was significantly poorer than controls across linguistic and nonlinguistic stimuli. In addition, reading-related measures were significantly correlated with statistical learning performance of both speech and nonspeech material. Conclusion Results are discussed in light of procedural learning impairments among participants with DD. PMID:25860795

  17. PyEvolve: a toolkit for statistical modelling of molecular evolution.

    PubMed

    Butterfield, Andrew; Vedagiri, Vivek; Lang, Edward; Lawrence, Cath; Wakefield, Matthew J; Isaev, Alexander; Huttley, Gavin A

    2004-01-05

    Examining the distribution of variation has proven an extremely profitable technique in the effort to identify sequences of biological significance. Most approaches in the field, however, evaluate only the conserved portions of sequences - ignoring the biological significance of sequence differences. A suite of sophisticated likelihood based statistical models from the field of molecular evolution provides the basis for extracting the information from the full distribution of sequence variation. The number of different problems to which phylogeny-based maximum likelihood calculations can be applied is extensive. Available software packages that can perform likelihood calculations suffer from a lack of flexibility and scalability, or employ error-prone approaches to model parameterisation. Here we describe the implementation of PyEvolve, a toolkit for the application of existing, and development of new, statistical methods for molecular evolution. We present the object architecture and design schema of PyEvolve, which includes an adaptable multi-level parallelisation schema. The approach for defining new methods is illustrated by implementing a novel dinucleotide model of substitution that includes a parameter for mutation of methylated CpGs, which required 8 lines of standard Python code to define. Benchmarking was performed using either a dinucleotide or codon substitution model applied to an alignment of BRCA1 sequences from 20 mammals, or a 10 species subset. Up to five-fold parallel performance gains over serial were recorded. Compared to leading alternative software, PyEvolve exhibited significantly better real world performance for parameter rich models with a large data set, reducing the time required for optimisation from approximately 10 days to approximately 6 hours. PyEvolve provides flexible functionality that can be used either for statistical modelling of molecular evolution, or the development of new methods in the field. The toolkit can be used

  18. Statistical Signal Process in R Language in the Pharmacovigilance Programme of India.

    PubMed

    Kumar, Aman; Ahuja, Jitin; Shrivastava, Tarani Prakash; Kumar, Vipin; Kalaiselvan, Vivekanandan

    2018-05-01

    The Ministry of Health & Family Welfare, Government of India, initiated the Pharmacovigilance Programme of India (PvPI) in July 2010. The purpose of the PvPI is to collect data on adverse reactions due to medications, analyze them, and use the results to recommend informed regulatory interventions, besides communicating risk to health care professionals and the public. The goal of the present study was to apply statistical tools in R programming to find the relationship between drugs and ADRs for signal detection. Four statistical parameters were proposed for quantitative signal detection: IC025, PRR and PRRlb, chi-square, and N11; we calculated these 4 values using R programming. We analyzed 78,983 drug-ADR combinations, with a total count of 420,060 reports across combinations. The calculation of the statistical parameters uses 3 variables: (1) N11 (number of counts), (2) N1. (drug margin), and (3) N.1 (ADR margin). The structure and calculation of these 4 statistical parameters in R are easily understandable. On the basis of the IC value (IC > 0), out of the 78,983 drug-ADR combinations, we found 8,667 combinations to be significantly associated. The calculation of statistical parameters in R is time saving and allows new signals in the Indian ICSR (Individual Case Safety Reports) database to be identified easily.
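
    The parameters named here are standard 2×2 disproportionality statistics. A hedged Python sketch using the commonly published formulas (Evans' PRR with its 95% lower bound, a shrunk information component with Norén's IC025 approximation); this is an illustration of the quantities, not the PvPI's R code:

    ```python
    import math

    def disproportionality(n11: int, n1_: int, n_1: int, n: int) -> dict:
        """2x2 signal-detection statistics from a spontaneous-report database.
        n11 = drug & ADR count, n1_ = drug margin, n_1 = ADR margin, n = total.
        Formulas are the commonly published ones and may differ in detail from
        the PvPI implementation."""
        e11 = n1_ * n_1 / n                                   # expected under independence
        prr = (n11 / n1_) / ((n_1 - n11) / (n - n1_))
        se_log_prr = math.sqrt(1/n11 - 1/n1_ + 1/(n_1 - n11) - 1/(n - n1_))
        prr_lb = math.exp(math.log(prr) - 1.96 * se_log_prr)  # 95% lower bound
        ic = math.log2((n11 + 0.5) / (e11 + 0.5))             # information component
        ic025 = ic - 3.3 * (n11 + 0.5) ** -0.5 - 2.0 * (n11 + 0.5) ** -1.5
        return {"PRR": prr, "PRR_lb": prr_lb, "IC": ic, "IC025": ic025}

    # Hypothetical counts for one drug-ADR combination
    print(disproportionality(n11=20, n1_=200, n_1=100, n=10_000))
    ```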

  19. A risk-based approach to management of leachables utilizing statistical analysis of extractables.

    PubMed

    Stults, Cheryl L M; Mikl, Jaromir; Whelehan, Oliver; Morrical, Bradley; Duffield, William; Nagao, Lee M

    2015-04-01

    To incorporate quality by design concepts into the management of leachables, an emphasis is often put on understanding the extractable profile for the materials of construction for manufacturing disposables, container-closure, or delivery systems. Component manufacturing processes may also impact the extractable profile. An approach was developed to (1) identify critical components that may be sources of leachables, (2) enable an understanding of manufacturing process factors that affect extractable profiles, (3) determine if quantitative models can be developed that predict the effect of those key factors, and (4) evaluate the practical impact of the key factors on the product. A risk evaluation for an inhalation product identified injection molding as a key process. Designed experiments were performed to evaluate the impact of molding process parameters on the extractable profile from an ABS inhaler component. Statistical analysis of the resulting GC chromatographic profiles identified processing factors that were correlated with peak levels in the extractable profiles. The combination of statistically significant molding process parameters was different for different types of extractable compounds. ANOVA models were used to obtain optimal process settings and predict extractable levels for a selected number of compounds. The proposed paradigm may be applied to evaluate the impact of material composition and processing parameters on extractable profiles and utilized to manage product leachables early in the development process and throughout the product lifecycle.

  20. Statistical significance of hair analysis of clenbuterol to discriminate therapeutic use from contamination.

    PubMed

    Krumbholz, Aniko; Anielski, Patricia; Gfrerer, Lena; Graw, Matthias; Geyer, Hans; Schänzer, Wilhelm; Dvorak, Jiri; Thieme, Detlef

    2014-01-01

    Clenbuterol is a well-established β2-agonist, which is prohibited in sports and strictly regulated for use in the livestock industry. During the last few years, clenbuterol-positive results in doping controls and in samples from residents or travellers from a high-risk country were suspected to be related to the illegal use of clenbuterol for fattening. A sensitive liquid chromatography-tandem mass spectrometry (LC-MS/MS) method was developed to detect low clenbuterol residues in hair with a detection limit of 0.02 pg/mg. A sub-therapeutic application study and a field study with volunteers, who have a high risk of contamination, were performed. For the application study, a total dosage of 30 µg clenbuterol was applied to 20 healthy volunteers over 5 consecutive days. One month after the beginning of the application, clenbuterol was detected in the proximal hair segment (0-1 cm) in concentrations between 0.43 and 4.76 pg/mg. For the second part, samples of 66 Mexican soccer players were analyzed. In 89% of these volunteers, clenbuterol was detectable in their hair at concentrations between 0.02 and 1.90 pg/mg. A comparison of both parts showed no statistically significant difference between sub-therapeutic application and contamination. In contrast, discrimination from typical abuse of clenbuterol is apparently possible. These findings make it possible to evaluate results of real doping control samples. Copyright © 2014 John Wiley & Sons, Ltd.

  1. Interpreting “statistical hypothesis testing” results in clinical research

    PubMed Central

    Sarmukaddam, Sanjeev B.

    2012-01-01

    The difference between "clinical significance and statistical significance" should be kept in mind while interpreting "statistical hypothesis testing" results in clinical research. This fact is already known to many, but it is pointed out here again because the philosophy of "statistical hypothesis testing" is sometimes criticized unnecessarily, mainly due to a failure to consider this distinction. Randomized controlled trials are similarly criticized in error. That a scientific method may not be applicable in some peculiar or particular situation does not mean that the method is useless. Also remember that "statistical hypothesis testing" is not for decision making; the field of "decision analysis" is very much an integral part of the science of statistics. It is not correct to say that "confidence intervals have nothing to do with confidence" unless one understands the meaning of the word "confidence" as used in the context of a confidence interval. Interpretation of the results of every study should always consider all possible alternative explanations, such as chance, bias, and confounding. Statistical tests in inferential statistics are, in general, designed to answer the question "How likely is it that the difference found in the random sample(s) is due to chance?", and therefore relying only on statistical significance in making clinical decisions should be avoided. PMID:22707861

  2. Estimating order statistics of network degrees

    NASA Astrophysics Data System (ADS)

    Chu, J.; Nadarajah, S.

    2018-01-01

    We model the order statistics of network degrees of big data sets by a range of generalised beta distributions. A three parameter beta distribution due to Libby and Novick (1982) is shown to give the best overall fit for at least four big data sets. The fit of this distribution is significantly better than the fit suggested by Olhede and Wolfe (2012) across the whole range of order statistics for all four data sets.
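
    A rough flavour of the fitting step, assuming a toy heavy-tailed degree sample and scipy's two-parameter beta on (0, 1) as a stand-in for the three-parameter Libby and Novick distribution used in the paper:

    ```python
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    degrees = np.sort(rng.zipf(2.5, size=5000).astype(float))  # toy heavy-tailed degrees
    scaled = degrees / (degrees.max() + 1.0)                   # map order statistics into (0, 1)

    # Two-parameter beta on (0, 1) as a simple stand-in for the generalised
    # beta distributions fitted in the paper
    a, b, loc, scale = stats.beta.fit(scaled, floc=0, fscale=1)
    print(f"fitted beta shape parameters: a={a:.3f}, b={b:.3f}")
    ```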

  3. Statistical aspects of solar flares

    NASA Technical Reports Server (NTRS)

    Wilson, Robert M.

    1987-01-01

    A survey of the statistical properties of 850 H alpha solar flares during 1975 is presented. Comparison of the results found here with those reported elsewhere for different epochs is accomplished. Distributions of rise time, decay time, and duration are given, as are the mean, mode, median, and 90th percentile values. Proportions by selected groupings are also determined. For flares in general, mean values for rise time, decay time, and duration are 5.2 ± 0.4 min, and 18.1 ± 1.1 min, respectively. Subflares, accounting for nearly 90 percent of the flares, had mean values lower than those found for flares of H alpha importance greater than 1, and the differences are statistically significant. Likewise, flares of bright and normal relative brightness have mean values of decay time and duration that are significantly longer than those computed for faint flares, and mass-motion related flares are significantly longer than non-mass-motion related flares. Seventy-three percent of the mass-motion related flares are categorized as being a two-ribbon flare and/or being accompanied by a high-speed dark filament. Slow rise time flares (rise time greater than 5 min) have a mean value for duration that is significantly longer than that computed for fast rise time flares, and long-lived duration flares (duration greater than 18 min) have a mean value for rise time that is significantly longer than that computed for short-lived duration flares, suggesting a positive linear relationship between rise time and duration for flares. Monthly occurrence rates for flares in general and by group are found to be linearly related in a positive sense to monthly sunspot number. Statistical testing reveals the association between sunspot number and numbers of flares to be significant at the 95 percent level of confidence, and the t statistic for slope is significant at greater than the 99 percent level of confidence. Dependent upon the specific fit, between 58 percent and 94 percent of

  4. Attitude towards statistics and performance among post-graduate students

    NASA Astrophysics Data System (ADS)

    Rosli, Mira Khalisa; Maat, Siti Mistima

    2017-05-01

    Mastering statistics is a necessity for students, especially for post-graduates who are involved in research. The purpose of this research was to identify attitudes towards statistics among post-graduates of the Faculty of Education, UKM, Bangi, and to determine the relationship between those attitudes and the post-graduates' performance. 173 post-graduate students were chosen randomly to participate in the study. These students were registered in the Research Methodology II course introduced by the faculty. A survey of attitude toward statistics using a 5-point Likert scale was used for data collection. The instrument consists of four components: affective, cognitive competency, value and difficulty. The data were analyzed using SPSS version 22 to produce descriptive and inferential statistics. The results showed a moderate positive relationship between attitude towards statistics and students' performance. In conclusion, educators need to assess students' attitudes towards the course to accomplish the learning outcomes.

  5. Identifying customer-focused performance measures : final report 655.

    DOT National Transportation Integrated Search

    2010-10-01

    The Arizona Department of Transportation (ADOT) completed a comprehensive customer satisfaction : assessment in July 2009. ADOT commissioned the assessment to acquire statistically valid data from residents : and community leaders to help it identify...

  6. Statistical issues in signal extraction from microarrays

    NASA Astrophysics Data System (ADS)

    Bergemann, Tracy; Quiaoit, Filemon; Delrow, Jeffrey J.; Zhao, Lue Ping

    2001-06-01

    Microarray technologies are increasingly used in biomedical research to study genome-wide expression profiles in the post genomic era. Their popularity is largely due to their high throughput and economical affordability. For example, microarrays have been applied to studies of cell cycle, regulatory circuitry, cancer cell lines, tumor tissues, and drug discoveries. One obstacle facing the continued success of applying microarray technologies, however, is the random variation present on microarrays: within signal spots, between spots and among chips. In addition, signals extracted by available software packages seem to vary significantly. Despite a variety of software packages, it appears that there are two major approaches to signal extraction. One approach is to focus on the identification of signal regions and hence estimation of signal levels above background levels. The other approach is to use the distribution of intensity values as a way of identifying relevant signals. Building upon both approaches, the objective of our work is to develop a method that is statistically rigorous and also efficient and robust. Statistical issues to be considered here include: (1) how to refine grid alignment so that the overall variation is minimized, (2) how to estimate the signal levels relative to the local background levels as well as the variance of this estimate, and (3) how to integrate red and green channel signals so that the ratio of interest is stable, simultaneously relaxing distributional assumptions.

  7. Childhood-compared to adolescent-onset bipolar disorder has more statistically significant clinical correlates.

    PubMed

    Holtzman, Jessica N; Miller, Shefali; Hooshmand, Farnaz; Wang, Po W; Chang, Kiki D; Hill, Shelley J; Rasgon, Natalie L; Ketter, Terence A

    2015-07-01

    The strengths and limitations of considering childhood-and adolescent-onset bipolar disorder (BD) separately versus together remain to be established. We assessed this issue. BD patients referred to the Stanford Bipolar Disorder Clinic during 2000-2011 were assessed with the Systematic Treatment Enhancement Program for BD Affective Disorders Evaluation. Patients with childhood- and adolescent-onset were compared to those with adult-onset for 7 unfavorable bipolar illness characteristics with replicated associations with early-onset patients. Among 502 BD outpatients, those with childhood- (<13 years, N=110) and adolescent- (13-18 years, N=218) onset had significantly higher rates for 4/7 unfavorable illness characteristics, including lifetime comorbid anxiety disorder, at least ten lifetime mood episodes, lifetime alcohol use disorder, and prior suicide attempt, than those with adult-onset (>18 years, N=174). Childhood- but not adolescent-onset BD patients also had significantly higher rates of first-degree relative with mood disorder, lifetime substance use disorder, and rapid cycling in the prior year. Patients with pooled childhood/adolescent - compared to adult-onset had significantly higher rates for 5/7 of these unfavorable illness characteristics, while patients with childhood- compared to adolescent-onset had significantly higher rates for 4/7 of these unfavorable illness characteristics. Caucasian, insured, suburban, low substance abuse, American specialty clinic-referred sample limits generalizability. Onset age is based on retrospective recall. Childhood- compared to adolescent-onset BD was more robustly related to unfavorable bipolar illness characteristics, so pooling these groups attenuated such relationships. Further study is warranted to determine the extent to which adolescent-onset BD represents an intermediate phenotype between childhood- and adult-onset BD. Copyright © 2015 Elsevier B.V. All rights reserved.

  8. On Statistical Analysis of Neuroimages with Imperfect Registration

    PubMed Central

    Kim, Won Hwa; Ravi, Sathya N.; Johnson, Sterling C.; Okonkwo, Ozioma C.; Singh, Vikas

    2016-01-01

    A variety of studies in neuroscience/neuroimaging seek to perform statistical inference on the acquired brain image scans for diagnosis as well as understanding the pathological manifestation of diseases. To do so, an important first step is to register (or co-register) all of the image data into a common coordinate system. This permits meaningful comparison of the intensities at each voxel across groups (e.g., diseased versus healthy) to evaluate the effects of the disease and/or use machine learning algorithms in a subsequent step. But errors in the underlying registration make this problematic: they either decrease the statistical power or make the follow-up inference tasks less effective/accurate. In this paper, we derive a novel algorithm which offers immunity to local errors in the underlying deformation field obtained from registration procedures. By deriving a deformation invariant representation of the image, the downstream analysis can be made more robust as if one had access to a (hypothetical) far superior registration procedure. Our algorithm is based on recent work on scattering transform. Using this as a starting point, we show how results from harmonic analysis (especially, non-Euclidean wavelets) yields strategies for designing deformation and additive noise invariant representations of large 3-D brain image volumes. We present a set of results on synthetic and real brain images where we achieve robust statistical analysis even in the presence of substantial deformation errors; here, standard analysis procedures significantly under-perform and fail to identify the true signal. PMID:27042168

  9. Identifying natural flow regimes using fish communities

    NASA Astrophysics Data System (ADS)

    Chang, Fi-John; Tsai, Wen-Ping; Wu, Tzu-Ching; Chen, Hung-kwai; Herricks, Edwin E.

    2011-10-01

    Modern water resources management has adopted natural flow regimes as reasonable targets for river restoration and conservation. The characterization of a natural flow regime begins with the development of hydrologic statistics from flow records. However, little guidance exists for defining the period of record needed for regime determination. In Taiwan, the Taiwan Eco-hydrological Indicator System (TEIS), a group of hydrologic statistics selected for fisheries relevance, is being used to evaluate ecological flows. The TEIS consists of a group of hydrologic statistics selected to characterize the relationships between flow and the life history of indigenous species. Using the TEIS and biosurvey data for Taiwan, this paper identifies the length of hydrologic record sufficient for natural flow regime characterization. To define the ecological hydrology of fish communities, this study connected hydrologic statistics to fish communities by using methods to define antecedent conditions that influence existing community composition. A moving average method was applied to TEIS statistics to reflect the effects of antecedent flow condition and a point-biserial correlation method was used to relate fisheries collections with TEIS statistics. The resulting fish species-TEIS (FISH-TEIS) hydrologic statistics matrix takes full advantage of historical flows and fisheries data. The analysis indicates that, in the watersheds analyzed, averaging TEIS statistics for the present year and 3 years prior to the sampling date, termed MA(4), is sufficient to develop a natural flow regime. This result suggests that flow regimes based on hydrologic statistics for the period of record can be replaced by regimes developed for sampled fish communities.
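
    The MA(4) smoothing and the point-biserial step translate directly into a few lines of Python; a sketch with toy data (pandas rolling mean over 4 years, scipy's pointbiserialr; all values hypothetical):

    ```python
    import numpy as np
    import pandas as pd
    from scipy import stats

    rng = np.random.default_rng(1)
    years = pd.Index(range(1990, 2010), name="year")

    # One TEIS-style hydrologic statistic per year (toy values)
    teis_stat = pd.Series(rng.gamma(shape=5.0, scale=20.0, size=len(years)), index=years)

    # MA(4): average of the sampling year and the three prior years
    ma4 = teis_stat.rolling(window=4).mean()

    # Presence/absence of a fish species in each year's survey (toy pattern)
    present = np.tile([0, 1], len(years) // 2)

    valid = ma4.notna().to_numpy()
    r_pb, p = stats.pointbiserialr(present[valid], ma4.to_numpy()[valid])
    print(f"point-biserial r = {r_pb:.2f}, p = {p:.2f}")
    ```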

  10. The Relationship between Statistics Self-Efficacy, Statistics Anxiety, and Performance in an Introductory Graduate Statistics Course

    ERIC Educational Resources Information Center

    Schneider, William R.

    2011-01-01

    The purpose of this study was to determine the relationship between statistics self-efficacy, statistics anxiety, and performance in introductory graduate statistics courses. The study design compared two statistics self-efficacy measures developed by Finney and Schraw (2003), a statistics anxiety measure developed by Cruise and Wilkins (1980),…

  11. Identification of key micro-organisms involved in Douchi fermentation by statistical analysis and their use in an experimental fermentation.

    PubMed

    Chen, C; Xiang, J Y; Hu, W; Xie, Y B; Wang, T J; Cui, J W; Xu, Y; Liu, Z; Xiang, H; Xie, Q

    2015-11-01

    To screen and identify safe micro-organisms used during Douchi fermentation, and verify the feasibility of producing high-quality Douchi using these identified micro-organisms. PCR-denaturing gradient gel electrophoresis (DGGE) and automatic amino-acid analyser were used to investigate the microbial diversity and free amino acids (FAAs) content of 10 commercial Douchi samples. The correlations between microbial communities and FAAs were analysed by statistical analysis. Ten strains with significant positive correlation were identified. Then an experiment on Douchi fermentation by identified strains was carried out, and the nutritional composition in Douchi was analysed. Results showed that FAAs and relative content of isoflavone aglycones in verification Douchi samples were generally higher than those in commercial Douchi samples. Our study indicated that fungi, yeasts, Bacillus and lactic acid bacteria were the key players in Douchi fermentation, and with identified probiotic micro-organisms participating in fermentation, a higher quality Douchi product was produced. This is the first report to analyse and confirm the key micro-organisms during Douchi fermentation by statistical analysis. This work proves fermentation micro-organisms to be the key influencing factor of Douchi quality, and demonstrates the feasibility of fermenting Douchi using identified starter micro-organisms. © 2015 The Society for Applied Microbiology.

  12. Spatial Statistics for Tumor Cell Counting and Classification

    NASA Astrophysics Data System (ADS)

    Wirjadi, Oliver; Kim, Yoo-Jin; Breuel, Thomas

    Counting and classifying cells in histological sections is a standard task in histology. One example is the grading of meningiomas, benign tumors of the meninges, which requires assessing the fraction of proliferating cells in an image. As this process is very time consuming when performed manually, automation is required. To address such problems, we propose a novel application of Markov point process methods in computer vision, leading to algorithms for computing the locations of circular objects in images. In contrast to previous algorithms using such spatial statistics methods in image analysis, the present one is fully trainable. This is achieved by combining point process methods with statistical classifiers. Using simulated data, the method proposed in this paper will be shown to be more accurate and more robust to noise than standard image processing methods. On the publicly available SIMCEP benchmark for cell image analysis algorithms, the cell count performance of the present method is significantly more accurate than results published elsewhere, especially when cells form dense clusters. Furthermore, the proposed system performs as well as a state-of-the-art algorithm for the computer-aided histological grading of meningiomas when combined with a simple k-nearest neighbor classifier for identifying proliferating cells.

  13. Survey of editors and reviewers of high-impact psychology journals: statistical and research design problems in submitted manuscripts.

    PubMed

    Harris, Alex; Reeder, Rachelle; Hyun, Jenny

    2011-01-01

    The authors surveyed 21 editors and reviewers from major psychology journals to identify and describe the statistical and design errors they encounter most often and to get their advice regarding prevention of these problems. Content analysis of the text responses revealed themes in 3 major areas: (a) problems with research design and reporting (e.g., lack of an a priori power analysis, lack of congruence between research questions and study design/analysis, failure to adequately describe statistical procedures); (b) inappropriate data analysis (e.g., improper use of analysis of variance, too many statistical tests without adjustments, inadequate strategy for addressing missing data); and (c) misinterpretation of results. If researchers attended to these common methodological and analytic issues, the scientific quality of manuscripts submitted to high-impact psychology journals might be significantly improved.

  14. A Powerful Approach to Estimating Annotation-Stratified Genetic Covariance via GWAS Summary Statistics.

    PubMed

    Lu, Qiongshi; Li, Boyang; Ou, Derek; Erlendsdottir, Margret; Powles, Ryan L; Jiang, Tony; Hu, Yiming; Chang, David; Jin, Chentian; Dai, Wei; He, Qidu; Liu, Zefeng; Mukherjee, Shubhabrata; Crane, Paul K; Zhao, Hongyu

    2017-12-07

    Despite the success of large-scale genome-wide association studies (GWASs) on complex traits, our understanding of their genetic architecture is far from complete. Jointly modeling multiple traits' genetic profiles has provided insights into the shared genetic basis of many complex traits. However, large-scale inference sets a high bar for both statistical power and biological interpretability. Here we introduce a principled framework to estimate annotation-stratified genetic covariance between traits using GWAS summary statistics. Through theoretical and numerical analyses, we demonstrate that our method provides accurate covariance estimates, thereby enabling researchers to dissect both the shared and distinct genetic architecture across traits to better understand their etiologies. Among 50 complex traits with publicly accessible GWAS summary statistics (N total ≈ 4.5 million), we identified more than 170 pairs with statistically significant genetic covariance. In particular, we found strong genetic covariance between late-onset Alzheimer disease (LOAD) and amyotrophic lateral sclerosis (ALS), two major neurodegenerative diseases, in single-nucleotide polymorphisms (SNPs) with high minor allele frequencies and in SNPs located in the predicted functional genome. Joint analysis of LOAD, ALS, and other traits highlights LOAD's correlation with cognitive traits and hints at an autoimmune component for ALS. Copyright © 2017 American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  15. Statistical Control Paradigm for Aerospace Structures Under Impulsive Disturbances

    DTIC Science & Technology

    2006-08-03

    Results indicate that the existing attitude control system with an innovative and robust statistical controller design shows significant promise for use in attitude hold mode operation. Three thrusters are used for controlling the attitude of the satellite. Then the angular momentum of the satellite with three thrusters and a

  16. A Statistical Test for Identifying the Number of Creep Regimes When Using the Wilshire Equations for Creep Property Predictions

    NASA Astrophysics Data System (ADS)

    Evans, Mark

    2016-12-01

    A new parametric approach, termed the Wilshire equations, offers the realistic potential of accurately predicting the life of materials operating at in-service conditions from accelerated test results lasting no more than 5000 hours. The success of this approach can be attributed to a well-defined linear relationship that appears to exist between various creep properties and a log transformation of the normalized stress. However, these linear trends are subject to discontinuities, the number of which appears to differ from material to material. These discontinuities have until now been (1) treated as abrupt in nature and (2) identified by eye from an inspection of simple graphical plots of the data. This article puts forward a statistical test for determining the correct number of discontinuities present within a creep data set and a method for allowing these discontinuities to occur more gradually, so that the methodology is more in line with the accepted view as to how creep mechanisms evolve with changing test conditions. These two developments are fully illustrated using creep data sets on two steel alloys. When these new procedures are applied to these steel alloys, not only do they produce more accurate and realistic looking long-term predictions of the minimum creep rate, but they also lead to different conclusions about the mechanisms determining the rates of creep from those originally put forward by Wilshire.
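
    The test described can be sketched as a comparison of residual sums of squares between a single straight line and a two-segment fit, with an F statistic deciding whether the extra regime is justified; the following Python toy (abrupt breakpoint, grid search) illustrates the idea, not Evans' exact procedure, which also allows gradual transitions:

    ```python
    import numpy as np

    def line_sse(x: np.ndarray, y: np.ndarray) -> float:
        """Residual sum of squares of an ordinary least-squares straight line."""
        X = np.column_stack([np.ones_like(x), x])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return float(np.sum((y - X @ beta) ** 2))

    def one_break_f(x: np.ndarray, y: np.ndarray) -> float:
        """F statistic for 'one discontinuity' versus 'no discontinuity' in a
        piecewise-linear fit, via a grid search over candidate breakpoints."""
        order = np.argsort(x)
        x, y = x[order], y[order]
        sse_null = line_sse(x, y)
        sse_break = min(line_sse(x[:k], y[:k]) + line_sse(x[k:], y[k:])
                        for k in range(3, len(x) - 3))
        df1, df2 = 2, len(x) - 4   # two extra parameters in the two-segment model
        return ((sse_null - sse_break) / df1) / (sse_break / df2)

    # Toy data with two regimes meeting near x = 0.5
    x = np.linspace(0.0, 1.0, 24)
    y = np.where(x < 0.5, 1.0 + 2.0 * x, 2.8 - 1.0 * x)
    y += 0.01 * np.random.default_rng(3).standard_normal(24)
    print(f"F = {one_break_f(x, y):.1f}")  # large F favours a second regime
    ```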

  17. The applications of statistical quantification techniques in nanomechanics and nanoelectronics.

    PubMed

    Mai, Wenjie; Deng, Xinwei

    2010-10-08

    Although nanoscience and nanotechnology have been developing for approximately two decades and have achieved numerous breakthroughs, the experimental results from nanomaterials with a higher noise level and poorer repeatability than those from bulk materials still remain as a practical issue, and challenge many techniques of quantification of nanomaterials. This work proposes a physical-statistical modeling approach and a global fitting statistical method to use all the available discrete data or quasi-continuous curves to quantify a few targeted physical parameters, which can provide more accurate, efficient and reliable parameter estimates, and give reasonable physical explanations. In the resonance method for measuring the elastic modulus of ZnO nanowires (Zhou et al 2006 Solid State Commun. 139 222-6), our statistical technique gives E = 128.33 GPa instead of the original E = 108 GPa, and unveils a negative bias adjustment f(0). The causes are suggested by the systematic bias in measuring the length of the nanowires. In the electronic measurement of the resistivity of a Mo nanowire (Zach et al 2000 Science 290 2120-3), the proposed new method automatically identified the importance of accounting for the Ohmic contact resistance in the model of the Ohmic behavior in nanoelectronics experiments. The 95% confidence interval of resistivity in the proposed one-step procedure is determined to be 3.57 ± 0.0274 × 10^-5 ohm cm, which should be a more reliable and precise estimate. The statistical quantification technique should find wide applications in obtaining better estimations from various systematic errors and biased effects that become more significant at the nanoscale.
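
    The Ohmic-contact correction the authors identify amounts to fitting the contact resistance jointly with the resistivity rather than forcing it to zero. A minimal sketch under assumed toy numbers (area, lengths and resistances are hypothetical, chosen near the reported 3.57 × 10^-5 ohm cm; not the authors' one-step procedure):

    ```python
    import numpy as np
    from scipy.optimize import curve_fit

    # Two-terminal resistance of nanowire segments of length L obeys
    # R(L) = R_contact + rho * L / A; fitting R_contact jointly with rho,
    # instead of assuming ideal contacts, shifts the resistivity estimate.
    A = 1e-10                                       # hypothetical cross-section, cm^2
    L = np.array([1e-4, 2e-4, 3e-4, 4e-4])          # hypothetical lengths, cm
    R = np.array([155.0, 192.0, 226.0, 263.0])      # hypothetical resistances, ohm

    model = lambda L, R_c, rho: R_c + rho * L / A
    (R_c, rho), _ = curve_fit(model, L, R, p0=[100.0, 3e-5])
    print(f"R_contact = {R_c:.0f} ohm, rho = {rho:.2e} ohm cm")
    ```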

  18. [Comment on] Statistical discrimination

    NASA Astrophysics Data System (ADS)

    Chinn, Douglas

    In the December 8, 1981, issue of Eos, a news item reported the conclusion of a National Research Council study that sexual discrimination against women with Ph.D.'s exists in the field of geophysics. Basically, the item reported that even when allowances are made for motherhood the percentage of female Ph.D.'s holding high university and corporate positions is significantly lower than the percentage of male Ph.D.'s holding the same types of positions. The sexual discrimination conclusion, based only on these statistics, assumes that there are no basic psychological differences between men and women that might cause different populations in the employment group studied. Therefore, the reasoning goes, after taking into account possible effects from differences related to anatomy, such as women stopping their careers in order to bear and raise children, the statistical distributions of positions held by male and female Ph.D.'s ought to be very similar to one another. Any significant differences between the distributions must be caused primarily by sexual discrimination.

  19. Statistical analysis of co-occurrence patterns in microbial presence-absence datasets.

    PubMed

    Mainali, Kumar P; Bewick, Sharon; Thielen, Peter; Mehoke, Thomas; Breitwieser, Florian P; Paudel, Shishir; Adhikari, Arjun; Wolfe, Joshua; Slud, Eric V; Karig, David; Fagan, William F

    2017-01-01

    Drawing on a long history in macroecology, correlation analysis of microbiome datasets is becoming a common practice for identifying relationships or shared ecological niches among bacterial taxa. However, many of the statistical issues that plague such analyses in macroscale communities remain unresolved for microbial communities. Here, we discuss problems in the analysis of microbial species correlations based on presence-absence data. We focus on presence-absence data because this information is more readily obtainable from sequencing studies, especially for whole-genome sequencing, where abundance estimation is still in its infancy. First, we show how Pearson's correlation coefficient (r) and Jaccard's index (J), two of the most common metrics for correlation analysis of presence-absence data, can contradict each other when applied to a typical microbiome dataset. In our dataset, for example, 14% of species-pairs predicted to be significantly correlated by r were not predicted to be significantly correlated using J, while 37.4% of species-pairs predicted to be significantly correlated by J were not predicted to be significantly correlated using r. Mismatch was particularly common among species-pairs with at least one rare species (<10% prevalence), explaining why r and J might differ more strongly in microbiome datasets, where there are large numbers of rare taxa. Indeed 74% of all species-pairs in our study had at least one rare species. Next, we show how Pearson's correlation coefficient can result in artificial inflation of positive taxon relationships and how this is a particular problem for microbiome studies. We then illustrate how Jaccard's index of similarity (J) can yield improvements over Pearson's correlation coefficient. However, the standard null model for Jaccard's index is flawed, and thus introduces its own set of spurious conclusions. We thus identify a better null model based on a hypergeometric distribution, which appropriately corrects for
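
    A small sketch of the contrast on synthetic presence-absence vectors: compute r and J for one species pair, then evaluate the co-occurrence count against a hypergeometric null (the tail probability of seeing at least k joint presences if presences were placed independently). The prevalences below are invented.

      import numpy as np
      from scipy.stats import pearsonr, hypergeom

      rng = np.random.default_rng(2)
      N = 200                                      # number of samples
      a = (rng.random(N) < 0.08).astype(float)     # rare taxon (8% prevalence)
      b = (rng.random(N) < 0.50).astype(float)     # common taxon (50% prevalence)

      r, _ = pearsonr(a, b)
      k = int(np.sum((a == 1) & (b == 1)))         # observed co-occurrences
      J = k / np.sum((a == 1) | (b == 1))          # Jaccard's index

      nA, nB = int(a.sum()), int(b.sum())
      p_null = hypergeom.sf(k - 1, N, nA, nB)      # P(X >= k) under the hypergeometric null
      print(f"r = {r:.3f}, J = {J:.3f}, P(X >= {k}) = {p_null:.3f}")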

  1. A novel statistical approach for identification of the master regulator transcription factor.

    PubMed

    Sikdar, Sinjini; Datta, Susmita

    2017-02-02

    Transcription factors are known to play key roles in carcinogenesis and therefore, are gaining popularity as potential therapeutic targets in drug development. A 'master regulator' transcription factor often appears to control most of the regulatory activities of the other transcription factors and the associated genes. This 'master regulator' transcription factor is at the top of the hierarchy of the transcriptomic regulation. Therefore, it is important to identify and target the master regulator transcription factor for proper understanding of the associated disease process and identifying the best therapeutic option. We present a novel two-step computational approach for identification of master regulator transcription factor in a genome. At the first step of our method we test whether there exists any master regulator transcription factor in the system. We evaluate the concordance of two ranked lists of transcription factors using a statistical measure. In case the concordance measure is statistically significant, we conclude that there is a master regulator. At the second step, our method identifies the master regulator transcription factor, if there exists one. In the simulation scenario, our method performs reasonably well in validating the existence of a master regulator when the number of subjects in each treatment group is reasonably large. In application to two real datasets, our method ensures the existence of master regulators and identifies biologically meaningful master regulators. An R code for implementing our method in a sample test data can be found in http://www.somnathdatta.org/software . We have developed a screening method of identifying the 'master regulator' transcription factor just using only the gene expression data. Understanding the regulatory structure and finding the master regulator help narrowing the search space for identifying biomarkers for complex diseases such as cancer. In addition to identifying the master regulator our
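
    The concordance step can be illustrated with a generic rank-agreement measure. Kendall's tau below is a stand-in assumption; the paper's statistical measure of concordance between the two ranked transcription-factor lists may differ.

      import numpy as np
      from scipy.stats import kendalltau

      # Two hypothetical rankings of eight transcription factors, e.g. by
      # differential expression and by inferred regulatory influence.
      rank_by_expression = np.array([1, 2, 3, 4, 5, 6, 7, 8])
      rank_by_regulation = np.array([2, 1, 3, 5, 4, 6, 8, 7])

      tau, p = kendalltau(rank_by_expression, rank_by_regulation)
      # A significantly positive tau indicates that the two orderings agree,
      # the kind of evidence used here to conclude a master regulator exists.
      print(f"tau = {tau:.2f}, p = {p:.3f}")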

  2. Relationship between Graduate Students' Statistics Self-Efficacy, Statistics Anxiety, Attitude toward Statistics, and Social Support

    ERIC Educational Resources Information Center

    Perepiczka, Michelle; Chandler, Nichelle; Becerra, Michael

    2011-01-01

    Statistics plays an integral role in graduate programs. However, numerous intra- and interpersonal factors may influence successful completion of needed coursework in this area. The authors examined the extent of the relationship between self-efficacy to learn statistics and statistics anxiety, attitude towards statistics, and social support of 166…

  3. A statistical parts-based appearance model of inter-subject variability.

    PubMed

    Toews, Matthew; Collins, D Louis; Arbel, Tal

    2006-01-01

    In this article, we present a general statistical parts-based model for representing the appearance of an image set, applied to the problem of inter-subject MR brain image matching. In contrast with global image representations such as active appearance models, the parts-based model consists of a collection of localized image parts whose appearance, geometry and occurrence frequency are quantified statistically. The parts-based approach explicitly addresses the case where one-to-one correspondence does not exist between subjects due to anatomical differences, as parts are not expected to occur in all subjects. The model can be learned automatically, discovering structures that appear with statistical regularity in a large set of subject images, and can be robustly fit to new images, all in the presence of significant inter-subject variability. As parts are derived from generic scale-invariant features, the framework can be applied in a wide variety of image contexts, in order to study the commonality of anatomical parts or to group subjects according to the parts they share. Experimentation shows that a parts-based model can be learned from a large set of MR brain images, and used to determine parts that are common within the group of subjects. Preliminary results indicate that the model can be used to automatically identify distinctive features for inter-subject image registration despite large changes in appearance.

  4. A quantitative analysis of statistical power identifies obesity endpoints for improved in vivo preclinical study design

    PubMed Central

    Selimkhanov, Jangir; Thompson, W. Clayton; Guo, Juen; Hall, Kevin D.; Musante, Cynthia J.

    2017-01-01

    The design of well-powered in vivo preclinical studies is a key element in building knowledge of disease physiology for the purpose of identifying and effectively testing potential anti-obesity drug targets. However, as a result of the complexity of the obese phenotype, there is limited understanding of the variability within and between study animals of macroscopic endpoints such as food intake and body composition. This, combined with limitations inherent in the measurement of certain endpoints, presents challenges to study design that can have significant consequences for an anti-obesity program. Here, we analyze a large, longitudinal study of mouse food intake and body composition during diet perturbation to quantify the variability and interaction of key metabolic endpoints. To demonstrate how conclusions can change as a function of study size, we show that a simulated pre-clinical study properly powered for one endpoint may lead to false conclusions based on secondary endpoints. We then propose guidelines for endpoint selection and study size estimation under different conditions to facilitate proper power calculation for a more successful in vivo study design. PMID:28392555
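
    A back-of-envelope version of the endpoint-dependent power calculation, using the standard two-sample normal approximation n = 2(z_{1-alpha/2} + z_{power})^2 * sd^2 / delta^2 per group. The effect sizes and standard deviations below are invented, chosen to show how a study sized for a tight endpoint can be badly underpowered for a noisy one.

      import numpy as np
      from scipy.stats import norm

      def n_per_group(delta, sd, alpha=0.05, power=0.80):
          # two-sample z approximation to the per-group sample size
          z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
          return int(np.ceil(2 * (z * sd / delta) ** 2))

      print(n_per_group(delta=2.0, sd=2.5))   # hypothetical body-weight change endpoint
      print(n_per_group(delta=0.5, sd=1.2))   # hypothetical daily food-intake endpoint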

  5. Flash visual evoked potentials are not specific enough to identify parieto-occipital lobe involvement in term neonates after significant hypoglycaemia.

    PubMed

    Hu, Liyuan; Gu, Qiufang; Zhu, Zhen; Yang, Chenhao; Chen, Chao; Cao, Yun; Zhou, Wenhao

    2014-08-01

    Hypoglycaemia is a significant problem in high-risk neonates and predominant parieto-occipital lobe involvement has been observed after severe hypoglycaemic insult. We explored the use of flash visual evoked potentials (FVEP) in detecting parieto-occipital lobe involvement after significant hypoglycaemia. Full-term neonates (n = 15) who underwent FVEP from January 2008 to May 2013 were compared with infants (n = 11) without hypoglycaemia or parieto-occipital lobe injury. Significant hypoglycaemia was defined as being symptomatic or needing steroids, glucagon or a glucose infusion rate of ≥12 mg/kg/min. The hypoglycaemia group exhibited delayed latency of the first positive waveform on FVEP. Hypoglycaemia was first detected later in the eight subjects with seizures (median age 51 h) than in those without (median age 22 h) (P = 0.003). Magnetic resonance imaging showed that 80% of the hypoglycaemia group exhibited occipital-lobe injuries, and they were more likely to exhibit abnormal FVEP morphology (P = 0.007) than the controls. FVEP exhibited 100% sensitivity, but only 25% specificity, for detecting injuries to the parieto-occipital lobes. Flash visual evoked potential (FVEP) was sensitive, but not sufficiently specific, in identifying parieto-occipital lobe injuries among term neonates exposed to significant hypoglycaemia. Larger studies exploring the potential role of FVEP in neonatal hypoglycaemia are required. ©2014 Foundation Acta Paediatrica. Published by John Wiley & Sons Ltd.

  6. Data-driven inference for the spatial scan statistic.

    PubMed

    Almeida, Alexandre C L; Duarte, Anderson R; Duczmal, Luiz H; Oliveira, Fernando L P; Takahashi, Ricardo H C

    2011-08-02

    Kulldorff's spatial scan statistic for aggregated area maps searches for clusters of cases without specifying their size (number of areas) or geographic location in advance. Their statistical significance is tested while adjusting for the multiple testing inherent in such a procedure. However, as is shown in this work, this adjustment is not done in an even manner for all possible cluster sizes. A modification is proposed to the usual inference test of the spatial scan statistic, incorporating additional information about the size of the most likely cluster found. A new interpretation of the results of the spatial scan statistic is proposed, posing a modified inference question: what is the probability that the null hypothesis is rejected for the original observed cases map with a most likely cluster of size k, taking into account only those most likely clusters of size k found under the null hypothesis for comparison? This question is especially important for the correctness of the decision when the p-value computed by the usual inference process is near the alpha significance level. A practical procedure is provided to make more accurate inferences about the most likely cluster found by the spatial scan statistic.
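
    A compact Monte Carlo sketch of the modified inference on an assumed 1-D map of Poisson case counts, with a deliberately simplified cluster score (Kulldorff's statistic uses the full likelihood ratio over spatial zones). The usual p-value compares the observed most-likely-cluster score with all null replicates; the modified p-value conditions on the replicates whose most likely cluster has the same size k.

      import numpy as np

      rng = np.random.default_rng(3)

      def scan_1d(cases, max_len=10):
          # most likely interval cluster: return (score, size); the score is a
          # simplified Poisson excess measure, not the full likelihood ratio
          n, total = cases.size, cases.sum()
          best, size = 0.0, 0
          for L in range(1, max_len + 1):
              for i in range(n - L + 1):
                  obs = cases[i:i + L].sum()
                  exp = total * L / n
                  if obs > exp:
                      score = obs * np.log(obs / exp) - (obs - exp)
                      if score > best:
                          best, size = score, L
          return best, size

      observed = rng.poisson(5, 50)
      observed[20:24] += 6                                    # planted cluster
      obs_score, obs_k = scan_1d(observed)

      # Null replicates: the same total case count spread at random over areas
      total = int(observed.sum())
      null = [scan_1d(rng.multinomial(total, np.full(50, 1 / 50)))
              for _ in range(999)]
      scores = np.array([s for s, _ in null])
      sizes = np.array([k for _, k in null])

      p_usual = (1 + np.sum(scores >= obs_score)) / (1 + 999)
      cond = scores[sizes == obs_k]                           # condition on size k
      p_cond = (1 + np.sum(cond >= obs_score)) / (1 + cond.size)
      print(f"k={obs_k}  usual p={p_usual:.3f}  size-conditional p={p_cond:.3f}")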

  7. Registered nurses' perceptions of rewarding and its significance.

    PubMed

    Seitovirta, Jaana; Lehtimäki, Aku-Ville; Vehviläinen-Julkunen, Katri; Mitronen, Lasse; Kvist, Tarja

    2017-11-07

    To examine reward type preferences and their relationships with the significance of rewarding perceived by registered nurses in Finland. Previous studies have found relationships between nurses' rewarding and their motivation at work, job satisfaction and organisational commitment. Data were collected in a cross-sectional, descriptive, questionnaire survey from 402 registered nurses using the Registered Nurses' Perceptions of Rewarding Scale in 2015, and analysed with descriptive and multivariate statistical methods. Registered nurses assigned slightly higher values to several non-financial than to financial rewards. The non-financial reward types appreciation and feedback from work community, worktime arrangements, work content, and opportunity to develop, influence and participate were highly related to the significance of rewarding. We identified various rewards that registered nurses value, and indications that providing an appropriate array of rewards, particularly non-financial rewards, is a highly beneficial element of nursing management. It is important to understand the value of rewards for nursing management. Nurse managers should offer diverse rewards to their registered nurses to promote excellent performance and to help efforts to secure and maintain high-quality, safe patient care. The use of appropriate rewards is especially crucial to improving registered nurses' reward satisfaction and job satisfaction globally in the nursing profession. © 2017 John Wiley & Sons Ltd.

  8. Forest statistics of Indiana

    Treesearch

    The Forest Survey Organization Central States Forest Experiment Station

    1953-01-01

    The Forest Survey is conducted in the various regions by the forest experiment stations of the Forest Service. In Indiana the project is directed by the Central States Forest Experiment Station with headquarters in Columbus, Ohio. This Survey Release presents the more significant preliminary statistics on the forest area, timber volume, timber growth, and timber drain...

  9. Forest statistics of Kentucky

    Treesearch

    The Forest Survey Organization Central States Forest Experiment Station

    1952-01-01

    The Forest Survey is conducted in the various regions by the forest experiment stations of the Forest Service. In Kentucky the project is directed by the Central States Forest Experiment Station with headquarters in Columbus, Ohio. This Survey Release presents the more significant preliminary statistics on the forest area, timber volume, timber growth, and timber drain...

  10. Statistical modeling of SRAM yield performance and circuit variability

    NASA Astrophysics Data System (ADS)

    Cheng, Qi; Chen, Yijian

    2015-03-01

    In this paper, we develop statistical models to investigate SRAM yield performance and circuit variability in the presence of a self-aligned multiple patterning (SAMP) process. It is assumed that SRAM fins are fabricated by a positive-tone (spacer-is-line) self-aligned sextuple patterning (SASP) process which accommodates two types of spacers, while gates are fabricated by a more pitch-relaxed self-aligned quadruple patterning (SAQP) process which only allows one type of spacer. A number of possible inverter and SRAM structures are identified and the related circuit multi-modality is studied using the developed failure-probability and yield models. It is shown that SRAM circuit yield is significantly impacted by the multi-modality of fins' spatial variations in a SRAM cell. The sensitivity of 6-transistor SRAM read/write failure probability to SASP process variations is calculated and the specific circuit type with the highest probability to fail in the reading/writing operation is identified. Our study suggests that the 6-transistor SRAM configuration may not be scalable to 7-nm half pitch and more robust SRAM circuit design needs to be researched.

  11. A simulation study of the strength of evidence in the recommendation of medications based on two trials with statistically significant results

    PubMed Central

    Ioannidis, John P. A.

    2017-01-01

    A typical rule that has been used for the endorsement of new medications by the Food and Drug Administration is to have two trials, each convincing on its own, demonstrating effectiveness. “Convincing” may be subjectively interpreted, but the use of p-values and the focus on statistical significance (in particular with p < .05 being coined significant) is pervasive in clinical research. Therefore, in this paper, we calculate with simulations what it means to have exactly two trials, each with p < .05, in terms of the actual strength of evidence quantified by Bayes factors. Our results show that different cases where two trials have a p-value below .05 have wildly differing Bayes factors. Bayes factors of at least 20 in favor of the alternative hypothesis are not necessarily achieved and they fail to be reached in a large proportion of cases, in particular when the true effect size is small (0.2 standard deviations) or zero. In a non-trivial number of cases, evidence actually points to the null hypothesis, in particular when the true effect size is zero, when the number of trials is large, and when the number of participants in both groups is low. We recommend use of Bayes factors as a routine tool to assess endorsement of new medications, because Bayes factors consistently quantify strength of evidence. Use of p-values may lead to paradoxical and spurious decision-making regarding the use of new medications. PMID:28273140
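
    For intuition, a sketch of one such simulation under assumptions chosen for brevity: the BIC (unit-information prior) approximation BF10 ≈ (1 + t²/ν)^(N/2) / √N stands in for the paper's Bayes factor computations. Trials with a small true effect are drawn until two reach p < .05, and the evidence they actually carry is then inspected.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(4)

      def one_trial(n_per_arm, true_effect):
          # simulate a two-arm trial; return its p-value and approximate BF10
          a = rng.normal(0.0, 1.0, n_per_arm)
          b = rng.normal(true_effect, 1.0, n_per_arm)
          t, p = stats.ttest_ind(b, a)
          N, nu = 2 * n_per_arm, 2 * n_per_arm - 2
          bf10 = (1 + t**2 / nu) ** (N / 2) / np.sqrt(N)   # BIC approximation
          return p, bf10

      bfs = []
      while len(bfs) < 2:                 # keep drawing until two 'significant' trials
          p, bf = one_trial(50, true_effect=0.2)
          if p < 0.05:
              bfs.append(bf)
      print(f"BF10 per trial: {bfs[0]:.1f}, {bfs[1]:.1f}; combined: {bfs[0]*bfs[1]:.1f}")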

  12. Multivariate two-part statistics for analysis of correlated mass spectrometry data from multiple biological specimens.

    PubMed

    Taylor, Sandra L; Ruhaak, L Renee; Weiss, Robert H; Kelly, Karen; Kim, Kyoungmi

    2017-01-01

    High through-put mass spectrometry (MS) is now being used to profile small molecular compounds across multiple biological sample types from the same subjects with the goal of leveraging information across biospecimens. Multivariate statistical methods that combine information from all biospecimens could be more powerful than the usual univariate analyses. However, missing values are common in MS data and imputation can impact between-biospecimen correlation and multivariate analysis results. We propose two multivariate two-part statistics that accommodate missing values and combine data from all biospecimens to identify differentially regulated compounds. Statistical significance is determined using a multivariate permutation null distribution. Relative to univariate tests, the multivariate procedures detected more significant compounds in three biological datasets. In a simulation study, we showed that multi-biospecimen testing procedures were more powerful than single-biospecimen methods when compounds are differentially regulated in multiple biospecimens but univariate methods can be more powerful if compounds are differentially regulated in only one biospecimen. We provide R functions to implement and illustrate our method as supplementary information. Contact: sltaylor@ucdavis.edu. Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
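
    The permutation-null machinery is generic and easy to sketch: shuffle the group labels, recompute the statistic, and take the proportion of null statistics at least as extreme. The toy statistic below (absolute mean difference for a single compound) is an assumption standing in for the paper's multivariate two-part statistic.

      import numpy as np

      rng = np.random.default_rng(5)

      def perm_pvalue(stat_fn, x, labels, n_perm=9999):
          # permutation p-value with the +1 correction to avoid p = 0
          observed = stat_fn(x, labels)
          hits = sum(stat_fn(x, rng.permutation(labels)) >= observed
                     for _ in range(n_perm))
          return (1 + hits) / (1 + n_perm)

      def mean_diff(x, g):
          return abs(x[g == 1].mean() - x[g == 0].mean())

      x = np.r_[rng.normal(0.0, 1.0, 30), rng.normal(0.8, 1.0, 30)]
      labels = np.r_[np.zeros(30, int), np.ones(30, int)]
      print(perm_pvalue(mean_diff, x, labels))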

  13. Statistical Exposé of a Multiple-Compartment Anaerobic Reactor Treating Domestic Wastewater.

    PubMed

    Pfluger, Andrew R; Hahn, Martha J; Hering, Amanda S; Munakata-Marr, Junko; Figueroa, Linda

    2018-06-01

    Mainstream anaerobic treatment of domestic wastewater is a promising energy-generating treatment strategy; however, such reactors operated in colder regions are not well characterized. Performance data from a pilot-scale, multiple-compartment anaerobic reactor taken over 786 days were subjected to comprehensive statistical analyses. Results suggest that chemical oxygen demand (COD) was a poor proxy for organics in anaerobic systems as oxygen demand from dissolved inorganic material, dissolved methane, and colloidal material influence dissolved and particulate COD measurements. Additionally, univariate and functional boxplots were useful in visualizing variability in contaminant concentrations and identifying statistical outliers. Further, significantly different dissolved organic removal and methane production were observed between operational years, suggesting that anaerobic reactor systems may not achieve steady-state performance within one year. Last, modeling multiple-compartment reactor systems will require data collected over at least two years to capture seasonal variations of the major anaerobic microbial functions occurring within each reactor compartment.

  14. Is it possible to identify a risk factor condition of hypocalcemia in patients candidates to thyroidectomy for benign disease?

    PubMed

    Del Rio, Paolo; Iapichino, Gioacchino; De Simone, Belinda; Bezer, Lamia; Arcuri, MariaFrancesca; Sianesi, Mario

    2010-01-01

    Hypocalcaemia is the most frequent complication after total thyroidectomy, and its incidence is reported at widely varying rates in the literature. We report on 227 patients undergoing surgery for benign thyroid disease. After obtaining each patient's informed consent, we prospectively collected and analyzed the following data: serum calcium levels preoperatively and in the first 24 hours after surgery, according to sex, age, duration of surgery, number of parathyroid glands identified by the surgeon, and surgical technique (open versus minimally invasive video-assisted thyroidectomy, i.e., MIVAT). We considered consecutive cases treated by the same two experienced endocrine surgeons. Hypocalcaemia was defined as a serum calcium level below 7.5 mg/dL. Pre- and postoperative mean serum calcium, with 99% confidence intervals stratified by sex, revealed a statistically significant difference in incidence in the ANOVA test (p < 0.01): females had a higher incidence of hypocalcaemia. Mean pre- and postoperative serum calcium, with 95% confidence intervals stratified by the number of parathyroid glands identified by the surgeon, showed no correlation between that number and postoperative serum calcium values. Age and pre- and postoperative serum calcium values, with 99% confidence intervals by sex, did not show statistically significant differences. We did not find a significant difference in postoperative hypocalcaemia between patients treated with conventional thyroidectomy and those treated with MIVAT. A difference between pre- and postoperative mean serum calcium occurred in all surgically treated patients. The only statistically meaningful risk factor for hypocalcaemia was female sex.

  15. An ANOVA approach for statistical comparisons of brain networks.

    PubMed

    Fraiman, Daniel; Fraiman, Ricardo

    2018-03-16

    The study of brain networks has developed extensively over the last couple of decades. By contrast, techniques for the statistical analysis of these networks are less developed. In this paper, we focus on the statistical comparison of brain networks in a nonparametric framework and discuss the associated detection and identification problems. We tested network differences between groups with an analysis of variance (ANOVA) test we developed specifically for networks. We also propose and analyse the behaviour of a new statistical procedure designed to identify different subnetworks. As an example, we show the application of this tool in resting-state fMRI data obtained from the Human Connectome Project. We identify, among other variables, that the amount of sleep the days before the scan is a relevant variable that must be controlled. Finally, we discuss the potential bias in neuroimaging findings that is generated by some behavioural and brain structure variables. Our method can also be applied to other kind of networks such as protein interaction networks, gene networks or social networks.

  16. The Effect Size Statistic: Overview of Various Choices.

    ERIC Educational Resources Information Center

    Mahadevan, Lakshmi

    Over the years, methodologists have been recommending that researchers use magnitude of effect estimates in result interpretation to highlight the distinction between statistical and practical significance (cf. R. Kirk, 1996). A magnitude of effect statistic (i.e., effect size) tells to what degree the dependent variable can be controlled,…

  17. Identifying seizure clusters in patients with epilepsy

    PubMed Central

    Lipton, R. B.; LeValley, A. J.; Hall, C. B.; Shinnar, S.

    2006-01-01

    Clinicians often encounter patients whose neurologic attacks appear to cluster. In a daily diary study, the authors explored whether clustering is a true phenomenon in epilepsy and can be identified in the clinical setting. Nearly half the subjects experienced at least one episode of three or more seizures in 24 hours; 20% also met a statistical clustering criterion. Utilizing the clinical definition of clustering should identify all seizure clusterers, and false positives can be determined with diary data. PMID:16247068

  18. Harmonic statistics

    NASA Astrophysics Data System (ADS)

    Eliazar, Iddo

    2017-05-01

    The exponential, the normal, and the Poisson statistical laws are of major importance due to their universality. Harmonic statistics are as universal as the three aforementioned laws, but yet they fall short in their 'public relations' for the following reason: the full scope of harmonic statistics cannot be described in terms of a statistical law. In this paper we describe harmonic statistics, in their full scope, via an object termed harmonic Poisson process: a Poisson process, over the positive half-line, with a harmonic intensity. The paper reviews the harmonic Poisson process, investigates its properties, and presents the connections of this object to an assortment of topics: uniform statistics, scale invariance, random multiplicative perturbations, Pareto and inverse-Pareto statistics, exponential growth and exponential decay, power-law renormalization, convergence and domains of attraction, the Langevin equation, diffusions, Benford's law, and 1/f noise.
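
    As a one-line formalization (restating the setup above, not adding to it): a harmonic Poisson process has intensity inversely proportional to position, so interval counts depend only on the ratio of the endpoints, which is exactly its scale invariance. In LaTeX:

      \lambda(x) = \frac{c}{x} \quad (x > 0), \qquad
      N[a,b] \sim \mathrm{Poisson}\!\left(\int_a^b \frac{c}{x}\,dx\right)
              = \mathrm{Poisson}\bigl(c \ln(b/a)\bigr),

      \text{hence } N[sa, sb] \overset{d}{=} N[a,b] \text{ for every scale factor } s > 0.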

  19. AIC identifies optimal representation of longitudinal dietary variables.

    PubMed

    VanBuren, John; Cavanaugh, Joseph; Marshall, Teresa; Warren, John; Levy, Steven M

    2017-09-01

    The Akaike Information Criterion (AIC) is a well-known tool for variable selection in multivariable modeling as well as a tool to help identify the optimal representation of explanatory variables. However, it has been discussed infrequently in the dental literature. The purpose of this paper is to demonstrate the use of AIC in determining the optimal representation of dietary variables in a longitudinal dental study. The Iowa Fluoride Study enrolled children at birth and dental examinations were conducted at ages 5, 9, 13, and 17. Decayed or filled surfaces (DFS) trend clusters were created based on age 13 DFS counts and age 13-17 DFS increments. Dietary intake data (water, milk, 100-percent juice, and sugar-sweetened beverages) were collected semiannually using a food frequency questionnaire. Multinomial logistic regression models were fit to predict DFS cluster membership (n=344). Multiple approaches could be used to represent the dietary data, including averaging across all collected surveys or over different shorter time periods to capture age-specific trends, or using the individual time points of dietary data. AIC helped identify the optimal representation. Averaging data for all four dietary variables for the whole period from age 9.0 to 17.0 provided a better representation in the multivariable full model (AIC=745.0) compared to other methods assessed in full models (AICs=750.6 for age 9 and 9-13 increment dietary measurements and AIC=762.3 for age 9, 13, and 17 individual measurements). The results illustrate that AIC can help researchers identify the optimal way to summarize information for inclusion in a statistical model. The method presented here can be used by researchers performing statistical modeling in dental research. This method provides an alternative approach for assessing the propriety of variable representation to significance-based procedures, which could potentially lead to improved research in the dental community. © 2017 American
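
    A minimal sketch of the comparison, assuming a binary logistic model in statsmodels in place of the study's multinomial model: fit one model on an averaged dietary variable and one on the individual time points, then prefer the representation with the lower AIC.

      import numpy as np
      import statsmodels.api as sm

      rng = np.random.default_rng(6)
      n = 300
      surveys = rng.normal(size=(n, 4))            # four hypothetical intake surveys
      avg = surveys.mean(axis=1, keepdims=True)    # candidate summary: the average
      y = (0.8 * avg.ravel() + rng.normal(size=n) > 0).astype(int)

      m_avg = sm.Logit(y, sm.add_constant(avg)).fit(disp=0)
      m_ind = sm.Logit(y, sm.add_constant(surveys)).fit(disp=0)

      # Lower AIC wins: extra time-specific parameters must pay for
      # themselves in log-likelihood to be preferred.
      print(f"AIC averaged = {m_avg.aic:.1f}, AIC individual = {m_ind.aic:.1f}")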

  20. How Does Teacher Knowledge in Statistics Impact on Teacher Listening?

    ERIC Educational Resources Information Center

    Burgess, Tim

    2012-01-01

    For teaching statistics investigations at primary school level, teacher knowledge has been identified using a framework developed from a classroom based study. Through development of the framework, three types of teacher listening problems were identified, each of which had potential impact on the students' learning. The three types of problems…

  1. Statistical relations of salt and selenium loads to geospatial characteristics of corresponding subbasins of the Colorado and Gunnison Rivers in Colorado

    USGS Publications Warehouse

    Leib, Kenneth J.; Linard, Joshua I.; Williams, Cory A.

    2012-01-01

    Elevated loads of salt and selenium can impair the quality of water for both anthropogenic and natural uses. Understanding the environmental processes controlling how salt and selenium are introduced to streams is critical to managing and mitigating the effects of elevated loads. Dominant relations between salt and selenium loads and environmental characteristics can be established by using geospatial data. The U.S. Geological Survey, in cooperation with the Bureau of Reclamation, investigated statistical relations between seasonal salt or selenium loads emanating from the Upper Colorado River Basin and geospatial data. Salt and selenium loads measured during the irrigation and nonirrigation seasons were related to geospatial variables for 168 subbasins within the Gunnison and Colorado River Basins. These geospatial variables represented subbasin characteristics of the physical environment, precipitation, geology, land use, and the irrigation network. All subbasin variables with units of area had statistically significant relations with load. The few variables that were not in units of area but were statistically significant helped to identify types of geospatial data that might influence salt and selenium loading. Following a stepwise approach, combinations of these statistically significant variables were used to develop multiple linear regression models. The models can be used to help prioritize areas where salt and selenium control projects might be most effective.

  2. Latinos in science: Identifying factors that influence the low percentage of Latino representation in the sciences

    NASA Astrophysics Data System (ADS)

    Miranda, Susan Jennifer

    A mixed methods approach was used to identify factors that influence the underrepresentation of Latinos in the domain of science. The researcher investigated the role of family influences, academic preparation, and personal motivations in determining science-related career choices by Latinos. Binary logistic regression analyses were conducted using information from Latinos gathered from the National Education Longitudinal Study of 1988 (NELS: 88) administered by the National Center for Education Statistics. For the present study, data were analyzed using participants' responses as high school seniors, college students, and post-baccalaureates. Students responded to questions on school, work, parental academic influences, personal aspirations, and self-perception. To provide more insight into the experiences of Latinos in science and support the statistical analyses, nine students majoring in science in a private, urban university located in the northeastern part of the country were interviewed. Eleven variables related to parents' academic support and students' perceptions of parental support were taken together as predictors for two separate criteria from the survey. These results identified parents' level of education and the importance of academics to parents in their teen's college choice as significant predictors in determining college major in science. When the criterion was degree in science, the significant predictor was the frequency with which parents contacted the high school as volunteers. Student interviews supported this information, demonstrating the importance of parental support in attaining a degree in science. Academic preparation was also analyzed. Students' reasons for taking science classes in high school were a significant predictor for science major; significant predictors for science degree were the emphasis placed on objectives in math and science classes and the number of courses in biology and physics. Student interviews supported this information and

  3. Enriched pathways for major depressive disorder identified from a genome-wide association study.

    PubMed

    Kao, Chung-Feng; Jia, Peilin; Zhao, Zhongming; Kuo, Po-Hsiu

    2012-11-01

    Major depressive disorder (MDD) has caused a substantial burden of disease worldwide with moderate heritability. Despite efforts through conducting numerous association studies and now, genome-wide association (GWA) studies, the success of identifying susceptibility loci for MDD has been limited, which is partially attributed to the complex nature of depression pathogenesis. A pathway-based analytic strategy to investigate the joint effects of various genes within specific biological pathways has emerged as a powerful tool for complex traits. The present study aimed to identify enriched pathways for depression using a GWA dataset for MDD. For each gene, we estimated its gene-wise p value using combined and minimum p value, separately. Canonical pathways from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and BioCarta were used. We employed four pathway-based analytic approaches (gene set enrichment analysis, hypergeometric test, sum-square statistic, sum-statistic). We adjusted for multiple testing using Benjamini & Hochberg's method to report significant pathways. We found 17 significantly enriched pathways for depression, which presented low-to-intermediate crosstalk. The top four pathways were long-term depression (p ⩽ 1×10⁻⁵), calcium signalling (p ⩽ 6×10⁻⁵), arrhythmogenic right ventricular cardiomyopathy (p ⩽ 1.6×10⁻⁴) and cell adhesion molecules (p ⩽ 2.2×10⁻⁴). In conclusion, our comprehensive pathway analyses identified promising pathways for depression that are related to neurotransmitter and neuronal systems, immune system and inflammatory response, which may be involved in the pathophysiological mechanisms underlying depression. We demonstrated that pathway enrichment analysis is promising to facilitate our understanding of complex traits through a deeper interpretation of GWA data. Application of this comprehensive analytic strategy in upcoming GWA data for depression could validate the findings reported in this study.
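
    Two of the listed ingredients, the hypergeometric pathway test and the Benjamini & Hochberg adjustment, can be sketched in a few lines. The gene and pathway counts below are invented for illustration.

      import numpy as np
      from scipy.stats import hypergeom

      def pathway_pvalue(M, n_sig, K, k):
          # P(at least k significant genes land in a pathway of size K),
          # given n_sig significant genes among M genes in total
          return hypergeom.sf(k - 1, M, K, n_sig)

      def benjamini_hochberg(pvals):
          # step-up adjusted p-values: p_(i) -> min over j >= i of p_(j) * m / j
          p = np.asarray(pvals)
          m = p.size
          order = np.argsort(p)
          raw = p[order] * m / np.arange(1, m + 1)
          adj = np.minimum.accumulate(raw[::-1])[::-1]
          out = np.empty(m)
          out[order] = np.clip(adj, 0, 1)
          return out

      pvals = [pathway_pvalue(20000, 500, K, k)
               for K, k in [(150, 12), (80, 3), (300, 9), (40, 6)]]
      print(benjamini_hochberg(pvals))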

  4. Statistical process control in nursing research.

    PubMed

    Polit, Denise F; Chaboyer, Wendy

    2012-02-01

    In intervention studies in which randomization to groups is not possible, researchers typically use quasi-experimental designs. Time series designs are strong quasi-experimental designs but are seldom used, perhaps because of technical and analytic hurdles. Statistical process control (SPC) is an alternative analytic approach to testing hypotheses about intervention effects using data collected over time. SPC, like traditional statistical methods, is a tool for understanding variation and involves the construction of control charts that distinguish between normal, random fluctuations (common cause variation), and statistically significant special cause variation that can result from an innovation. The purpose of this article is to provide an overview of SPC and to illustrate its use in a study of a nursing practice improvement intervention. Copyright © 2011 Wiley Periodicals, Inc.
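
    A minimal control-chart sketch, assuming Shewhart 3-sigma limits estimated from a baseline phase (SPC practice often estimates sigma from moving ranges instead; the numbers here are synthetic).

      import numpy as np

      rng = np.random.default_rng(7)
      baseline = rng.normal(10.0, 1.0, 24)     # pre-intervention observations
      after = rng.normal(12.0, 1.0, 12)        # post-intervention observations

      center = baseline.mean()
      sigma = baseline.std(ddof=1)             # simplification; often MR-bar/1.128
      ucl, lcl = center + 3 * sigma, center - 3 * sigma

      special = [(i, round(x, 2)) for i, x in enumerate(after, start=len(baseline))
                 if not (lcl <= x <= ucl)]     # special-cause signals
      print(f"limits = ({lcl:.2f}, {ucl:.2f}); special-cause points: {special}")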

  5. Statistics Anxiety and Business Statistics: The International Student

    ERIC Educational Resources Information Center

    Bell, James A.

    2008-01-01

    Does the international student suffer from statistics anxiety? To investigate this, the Statistics Anxiety Rating Scale (STARS) was administered to sixty-six beginning statistics students, including twelve international students and fifty-four domestic students. Due to the small number of international students, nonparametric methods were used to…

  6. Changing world extreme temperature statistics

    NASA Astrophysics Data System (ADS)

    Finkel, J. M.; Katz, J. I.

    2018-04-01

    We use the Global Historical Climatology Network-Daily database to calculate a nonparametric statistic that describes the rate at which all-time daily high and low temperature records have been set in nine geographic regions (continents or major portions of continents) during periods mostly from the mid-20th Century to the present. This statistic was defined in our earlier work on temperature records in the 48 contiguous United States. In contrast to this earlier work, we find that in every region except North America all-time high records were set at a rate significantly (at least 3σ) higher than in the null hypothesis of a stationary climate. Except in Antarctica, all-time low records were set at a rate significantly lower than in the null hypothesis. In Europe, North Africa and North Asia the rate of setting new all-time highs increased suddenly in the 1990s, suggesting a change in regional climate regime; in most other regions there was a steadier increase.
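
    The stationary null invoked here has a classical closed form: in an i.i.d. series the n-th observation is a new all-time high with probability 1/n, so the expected record count after n years is the harmonic number H_n. A quick simulation check of that baseline (not the paper's pooled regional statistic):

      import numpy as np

      rng = np.random.default_rng(8)

      def count_records(x):
          # positions where the value equals the running maximum are records
          return int(np.sum(x == np.maximum.accumulate(x)))

      n, reps = 70, 2000
      sim = [count_records(rng.normal(size=n)) for _ in range(reps)]
      H_n = np.sum(1.0 / np.arange(1, n + 1))
      print(f"simulated mean records = {np.mean(sim):.2f}, H_n = {H_n:.2f}")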

  7. A Simple Test Identifies Selection on Complex Traits.

    PubMed

    Beissinger, Tim; Kruppa, Jochen; Cavero, David; Ha, Ngoc-Thuy; Erbe, Malena; Simianer, Henner

    2018-05-01

    Important traits in agricultural, natural, and human populations are increasingly being shown to be under the control of many genes that individually contribute only a small proportion of genetic variation. However, the majority of modern tools in quantitative and population genetics, including genome-wide association studies and selection-mapping protocols, are designed to identify individual genes with large effects. We have developed an approach to identify traits that have been under selection and are controlled by large numbers of loci. In contrast to existing methods, our technique uses additive-effects estimates from all available markers, and relates these estimates to allele-frequency change over time. Using this information, we generate a composite statistic, denoted Ĝ, which can be used to test for significant evidence of selection on a trait. Our test requires pre- and postselection genotypic data but only a single time point with phenotypic information. Simulations demonstrate that Ĝ is powerful for identifying selection, particularly in situations where the trait being tested is controlled by many genes, which is precisely the scenario where classical approaches for selection mapping are least powerful. We apply this test to breeding populations of maize and chickens, where we demonstrate the successful identification of selection on traits that are documented to have been under selection. Copyright © 2018 Beissinger et al.
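
    One plausible reading of the construction, with invented data: weight each marker's allele-frequency change by its estimated additive effect and sum, then compare the total against a no-selection null (here a simple permutation of effects; the published statistic and its drift-based null differ in detail).

      import numpy as np

      rng = np.random.default_rng(9)
      L = 5000
      alpha = rng.normal(0.0, 0.05, L)         # marker additive-effect estimates
      p0 = rng.uniform(0.05, 0.95, L)          # pre-selection allele frequencies
      drift = rng.normal(0.0, 0.01, L)
      p1 = np.clip(p0 + 0.003 * np.sign(alpha) + drift, 0, 1)  # selection nudges frequencies

      G_obs = np.sum(2 * alpha * (p1 - p0))    # effect-weighted frequency change

      null = np.array([np.sum(2 * rng.permutation(alpha) * (p1 - p0))
                       for _ in range(999)])
      p = (1 + np.sum(np.abs(null) >= abs(G_obs))) / (1 + null.size)
      print(f"G = {G_obs:.3f}, permutation p = {p:.3f}")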

  8. Stable statistical representations facilitate visual search.

    PubMed

    Corbett, Jennifer E; Melcher, David

    2014-10-01

    Observers represent the average properties of object ensembles even when they cannot identify individual elements. To investigate the functional role of ensemble statistics, we examined how modulating statistical stability affects visual search. We varied the mean and/or individual sizes of an array of Gabor patches while observers searched for a tilted target. In "stable" blocks, the mean and/or local sizes of the Gabors were constant over successive displays, whereas in "unstable" baseline blocks they changed from trial to trial. Although there was no relationship between the context and the spatial location of the target, observers found targets faster (as indexed by faster correct responses and fewer saccades) as the global mean size became stable over several displays. Building statistical stability also facilitated scanning the scene, as measured by larger saccadic amplitudes, faster saccadic reaction times, and shorter fixation durations. These findings suggest a central role for peripheral visual information, creating context to free resources for detailed processing of salient targets and maintaining the illusion of visual stability.

  9. Targeting regional pediatric congenital hearing loss using a spatial scan statistic.

    PubMed

    Bush, Matthew L; Christian, Warren Jay; Bianchi, Kristin; Lester, Cathy; Schoenberg, Nancy

    2015-01-01

    Congenital hearing loss is a common problem, and timely identification and intervention are paramount for language development. Patients from rural regions may have many barriers to timely diagnosis and intervention. The purpose of this study was to examine the spatial and hospital-based distribution of failed infant hearing screening testing and pediatric congenital hearing loss throughout Kentucky. Data on live births and audiological reporting of infant hearing loss results in Kentucky from 2009 to 2011 were analyzed. The authors used spatial scan statistics to identify high-rate clusters of failed newborn screening tests and permanent congenital hearing loss (PCHL), based on the total number of live births per county. The authors conducted further analyses on PCHL and failed newborn hearing screening tests, based on birth hospital data and method of screening. The authors observed four statistically significant (p < 0.05) high-rate clusters with failed newborn hearing screenings in Kentucky, including two in the Appalachian region. Hospitals using two-stage otoacoustic emission testing demonstrated higher rates of failed screening (p = 0.009) than those using two-stage automated auditory brainstem response testing. A significant cluster of high rate of PCHL was observed in Western Kentucky. Five of the 54 birthing hospitals were found to have higher relative risk of PCHL, and two of those hospitals are located in a very rural region of Western Kentucky within the cluster. This spatial analysis in children in Kentucky has identified specific regions throughout the state with high rates of congenital hearing loss and failed newborn hearing screening tests. Further investigation regarding causative factors is warranted. This method of analysis can be useful in the setting of hearing health disparities to focus efforts on regions facing high incidence of congenital hearing loss.

  10. A SIGNIFICANCE TEST FOR THE LASSO

    PubMed Central

    Lockhart, Richard; Taylor, Jonathan; Tibshirani, Ryan J.; Tibshirani, Robert

    2014-01-01

    In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the covariance test statistic, and show that when the true model is linear, this statistic has an Exp(1) asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result for the special case of the first predictor to enter the model (i.e., testing for a single significant predictor variable against the global null) requires only weak assumptions on the predictor matrix X. On the other hand, our proof for a general step in the lasso path places further technical assumptions on X and the generative model, but still allows for the important high-dimensional case p > n, and does not necessarily require that the current lasso model achieves perfect recovery of the truly active variables. Of course, for testing the significance of an additional variable between two nested linear models, one typically uses the chi-squared test, comparing the drop in residual sum of squares (RSS) to a χ²₁ distribution. But when this additional variable is not fixed, and has been chosen adaptively or greedily, this test is no longer appropriate: adaptivity makes the drop in RSS stochastically much larger than χ²₁ under the null hypothesis. Our analysis explicitly accounts for adaptivity, as it must, since the lasso builds an adaptive sequence of linear models as the tuning parameter λ decreases. In this analysis, shrinkage plays a key role: though additional variables are chosen adaptively, the coefficients of lasso active variables are shrunken due to the ℓ1 penalty. Therefore, the test statistic (which is based on lasso fitted values) is in a sense balanced by these two opposing properties: adaptivity and
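
    For reference, the covariance test statistic at the step where the k-th variable enters the path can be written as below (following Lockhart et al.), where A is the active set just before that step and β̃_A is the lasso solution fit on the variables in A alone at λ_{k+1}; under the null it is asymptotically Exp(1):

      T_k \;=\; \frac{\langle y,\, X\hat{\beta}(\lambda_{k+1})\rangle
                \;-\; \langle y,\, X_A\tilde{\beta}_A(\lambda_{k+1})\rangle}{\sigma^2}
      \;\xrightarrow{d}\; \mathrm{Exp}(1).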

  11. 28 CFR 22.28 - Use of data identifiable to a private person for judicial, legislative or administrative purposes.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... DEPARTMENT OF JUSTICE CONFIDENTIALITY OF IDENTIFIABLE RESEARCH AND STATISTICAL INFORMATION § 22.28 Use of...) Research or statistical information identifiable to a private person shall be immune from legal process and...

  12. 28 CFR 22.28 - Use of data identifiable to a private person for judicial, legislative or administrative purposes.

    Code of Federal Regulations, 2010 CFR

    2010-07-01

    ... DEPARTMENT OF JUSTICE CONFIDENTIALITY OF IDENTIFIABLE RESEARCH AND STATISTICAL INFORMATION § 22.28 Use of...) Research or statistical information identifiable to a private person shall be immune from legal process and...

  13. AN EXPLORATION OF THE STATISTICAL SIGNATURES OF STELLAR FEEDBACK

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Boyden, Ryan D.; Offner, Stella S. R.; Koch, Eric W.

    2016-12-20

    All molecular clouds are observed to be turbulent, but the origin, means of sustenance, and evolution of the turbulence remain debated. One possibility is that stellar feedback injects enough energy into the cloud to drive observed motions on parsec scales. Recent numerical studies of molecular clouds have found that feedback from stars, such as protostellar outflows and winds, injects energy and impacts turbulence. We expand upon these studies by analyzing magnetohydrodynamic simulations of molecular clouds, including stellar winds, with a range of stellar mass-loss rates and magnetic field strengths. We generate synthetic ¹²CO(1–0) maps assuming that the simulations are at the distance of the nearby Perseus molecular cloud. By comparing the outputs from different initial conditions and evolutionary times, we identify differences in the synthetic observations and characterize these using common astrostatistics. We quantify the different statistical responses using a variety of metrics proposed in the literature. We find that multiple astrostatistics, including the principal component analysis, the spectral correlation function, and the velocity coordinate spectrum (VCS), are sensitive to changes in stellar mass-loss rates and/or time evolution. A few statistics, including the Cramer statistic and VCS, are sensitive to the magnetic field strength. These findings demonstrate that stellar feedback influences molecular cloud turbulence and can be identified and quantified observationally using such statistics.

  14. Identifying the impact of social determinants of health on disease rates using correlation analysis of area-based summary information.

    PubMed

    Song, Ruiguang; Hall, H Irene; Harrison, Kathleen McDavid; Sharpe, Tanya Telfair; Lin, Lillian S; Dean, Hazel D

    2011-01-01

    We developed a statistical tool that brings together standard, accessible, and well-understood analytic approaches and uses area-based information and other publicly available data to identify social determinants of health (SDH) that significantly affect the morbidity of a specific disease. We specified AIDS as the disease of interest and used data from the American Community Survey and the National HIV Surveillance System. Morbidity and socioeconomic variables in the two data systems were linked through geographic areas that can be identified in both systems. Correlation and partial correlation coefficients were used to measure the impact of socioeconomic factors on AIDS diagnosis rates in certain geographic areas. We developed an easily explained approach that can be used by a data analyst with access to publicly available datasets and standard statistical software to identify the impact of SDH. We found that the AIDS diagnosis rate was highly correlated with the distribution of race/ethnicity, population density, and marital status in an area. The impact of poverty, education level, and unemployment depended on other SDH variables. Area-based measures of socioeconomic variables can be used to identify risk factors associated with a disease of interest. When correlation analysis is used to identify risk factors, potential confounding from other variables must be taken into account.
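
    The partial-correlation step can be sketched by residualizing both variables on an area-level confounder and correlating the residuals. The variables below are synthetic stand-ins for the area-based survey and surveillance measures.

      import numpy as np

      rng = np.random.default_rng(10)
      n = 500
      z = rng.normal(size=n)              # confounder, e.g. population density
      x = 0.7 * z + rng.normal(size=n)    # socioeconomic factor for each area
      y = 0.7 * z + rng.normal(size=n)    # disease rate for each area

      def partial_corr(x, y, z):
          # correlate the parts of x and y not linearly explained by z
          rx = x - np.polyval(np.polyfit(z, x, 1), z)
          ry = y - np.polyval(np.polyfit(z, y, 1), z)
          return np.corrcoef(rx, ry)[0, 1]

      print(f"raw r = {np.corrcoef(x, y)[0, 1]:.2f}, "
            f"partial r = {partial_corr(x, y, z):.2f}")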

  15. Fragile entanglement statistics

    NASA Astrophysics Data System (ADS)

    Brody, Dorje C.; Hughston, Lane P.; Meier, David M.

    2015-10-01

    If X and Y are independent, Y and Z are independent, and so are X and Z, one might be tempted to conclude that X, Y, and Z are independent. But it has long been known in classical probability theory that, intuitive as it may seem, this is not true in general. In quantum mechanics one can ask whether analogous statistics can emerge for configurations of particles in certain types of entangled states. The explicit construction of such states, along with the specification of suitable sets of observables that have the purported statistical properties, is not entirely straightforward. We show that an example of such a configuration arises in the case of an N-particle GHZ state, and we are able to identify a family of observables with the property that the associated measurement outcomes are independent for any choice of 2, 3, …, N−1 of the particles, even though the measurement outcomes for all N particles are not independent. Although such states are highly entangled, the entanglement turns out to be ‘fragile’, i.e. the associated density matrix has the property that if one traces out the freedom associated with even a single particle, the resulting reduced density matrix is separable.

  16. Software Used to Generate Cancer Statistics - SEER Cancer Statistics

    Cancer.gov

    Videos that highlight topics and trends in cancer statistics and definitions of statistical terms. Also software tools for analyzing and reporting cancer statistics, which are used to compile SEER's annual reports.

  17. Oculocutaneous albinism: identifying and overcoming barriers to vision care in a Nigerian population.

    PubMed

    Udeh, N N; Eze, B I; Onwubiko, S N; Arinze, O C; Onwasigwe, E N; Umeh, R E

    2014-06-01

    To assess eye care service utilization, and identify access barriers, in a south-eastern Nigerian albino population. The study was a population-based, cross-sectional survey conducted in Enugu state between August 2011 and January 2012. Using the database of the state's Albino Foundation and tailored awareness creation, persons living with albinism were identified and recruited at two study centres. Data on participants' socio-demographics, perception of vision, visual needs, previous eye examination and/or low vision assessment, and use of glasses or low vision devices were collected. Reasons for non-utilisation of available vision care services were also obtained. Descriptive and comparative statistics were performed. A p < 0.05 was considered statistically significant. The participants (n = 153; 70 males, 83 females; sex ratio 1:1.1) were aged 23.46 ± 10.44 (SD) years (range 6-60 years). Most participants (95.4%) had no previous low vision assessment, and none (0.0%) had used a low vision device. Of the participants, 82.4% reported a previous eye examination, but 33.3% had never used spectacles despite needing them. Ignorance (88.9%) and poor access (8.5%) were the main barriers to uptake of vision care services. In Enugu, Nigeria, there is poor awareness and low utilization of vision care services among people with albinism. The identified barriers to vision care access are amenable to awareness creation and logistic changes in the provision of appropriate vision care services.

  18. Statistically optimal perception and learning: from behavior to neural representations

    PubMed Central

    Fiser, József; Berkes, Pietro; Orbán, Gergő; Lengyel, Máté

    2010-01-01

    Human perception has recently been characterized as statistical inference based on noisy and ambiguous sensory inputs. Moreover, suitable neural representations of uncertainty have been identified that could underlie such probabilistic computations. In this review, we argue that learning an internal model of the sensory environment is another key aspect of the same statistical inference procedure and thus perception and learning need to be treated jointly. We review evidence for statistically optimal learning in humans and animals, and reevaluate possible neural representations of uncertainty based on their potential to support statistically optimal learning. We propose that spontaneous activity can have a functional role in such representations leading to a new, sampling-based, framework of how the cortex represents information and uncertainty. PMID:20153683

  19. Quantifying spatial and temporal trends in beach-dune volumetric changes using spatial statistics

    NASA Astrophysics Data System (ADS)

    Eamer, Jordan B. R.; Walker, Ian J.

    2013-06-01

    Spatial statistics are generally underutilized in coastal geomorphology, despite offering great potential for identifying and quantifying spatial-temporal trends in landscape morphodynamics. In particular, local Moran's Ii provides a statistical framework for detecting clusters of significant change in an attribute (e.g., surface erosion or deposition) and quantifying how this changes over space and time. This study analyzes and interprets spatial-temporal patterns in sediment volume changes in a beach-foredune-transgressive dune complex following removal of invasive marram grass (Ammophila spp.). Results are derived by detecting significant changes in post-removal repeat DEMs derived from topographic surveys and airborne LiDAR. The study site was separated into discrete, linked geomorphic units (beach, foredune, transgressive dune complex) to facilitate sub-landscape scale analysis of volumetric change and sediment budget responses. Difference surfaces derived from a pixel-subtraction algorithm between interval DEMs and the LiDAR baseline DEM were filtered using the local Moran's Ii method and two different spatial weights (1.5 and 5 m) to detect statistically significant change. Moran's Ii results were compared with those derived from a more spatially uniform statistical method that uses a simpler Student's t distribution threshold for change detection. Morphodynamic patterns and volumetric estimates were similar between the uniform geostatistical method and Moran's Ii at a spatial weight of 5 m while the smaller spatial weight (1.5 m) consistently indicated volumetric changes of less magnitude. The larger 5 m spatial weight was most representative of broader site morphodynamics and spatial patterns while the smaller spatial weight provided volumetric changes consistent with field observations. All methods showed foredune deflation immediately following removal with increased sediment volumes into the spring via deposition at the crest and on lobes in the lee
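
    A compact local Moran's I_i sketch on a synthetic elevation-change grid, using rook-neighbour binary weights as a stand-in for the study's 1.5 m and 5 m distance-band weights (in practice each I_i is tested by conditional permutation before clusters are interpreted):

      import numpy as np

      rng = np.random.default_rng(11)
      side = 20
      dz = rng.normal(0.0, 0.05, (side, side))   # synthetic DEM difference surface
      dz[5:9, 5:9] += 0.4                        # planted deposition cluster

      z = dz - dz.mean()
      m2 = np.mean(z ** 2)

      lag = np.zeros_like(z)                     # sum of rook-neighbour values
      lag[1:, :] += z[:-1, :]; lag[:-1, :] += z[1:, :]
      lag[:, 1:] += z[:, :-1]; lag[:, :-1] += z[:, 1:]

      I = z * lag / m2                           # local Moran's I_i (Anselin 1995)
      print(f"mean I inside cluster: {I[5:9, 5:9].mean():.2f}, "
            f"outside: {I[12:, 12:].mean():.2f}")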

  20. A Bifactor Approach to Model Multifaceted Constructs in Statistical Mediation Analysis

    ERIC Educational Resources Information Center

    Gonzalez, Oscar; MacKinnon, David P.

    2018-01-01

    Statistical mediation analysis allows researchers to identify the most important mediating constructs in the causal process studied. Identifying specific mediators is especially relevant when the hypothesized mediating construct consists of multiple related facets. The general definition of the construct and its facets might relate differently to…