Bootstrap Methods: A Very Leisurely Look.
ERIC Educational Resources Information Center
Hinkle, Dennis E.; Winstead, Wayland H.
The bootstrap method, a computer-intensive statistical method of estimation, is illustrated using a simple and efficient Statistical Analysis System (SAS) routine. The utility of the method for estimating unknown parameters, including standard errors for simple statistics, regression coefficients, discriminant function coefficients, and factor…
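The resampling idea behind such a routine is easy to sketch. A minimal R example (hypothetical data; the cited article works through an equivalent SAS program) estimates the bootstrap standard error of a sample median:

```r
# Bootstrap standard error of a simple statistic (here, the median).
# The data are simulated placeholders for whatever sample is being analysed.
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)   # observed sample

B <- 2000                           # number of bootstrap resamples
boot_medians <- replicate(B, median(sample(x, replace = TRUE)))

sd(boot_medians)                    # bootstrap estimate of the standard error
```

The same loop carries over to regression, discriminant, or factor coefficients: refit the model on each resample and summarize the replicate coefficients.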
Statistical Tutorial | Center for Cancer Research
Recent advances in cancer biology have resulted in the need for increased statistical analysis of research data. This Statistical Tutorial (ST) is designed as a follow-up to Statistical Analysis of Research Data (SARD), held in April 2018. The tutorial will apply the general principles of statistical analysis of research data, including descriptive statistics, z- and t-tests of means and mean differences, simple and multiple linear regression, ANOVA tests, and the chi-squared distribution.
[Bayesian statistics in medicine -- part II: main applications and inference].
Montomoli, C; Nichelatti, M
2008-01-01
Bayesian statistics is not used only for 2-way tables; it can also serve inferential purposes. Using the basic concepts presented in the first part, this paper aims to give a simple overview of Bayesian methods by introducing their foundation (Bayes' theorem) and then applying this rule to a very simple practical example; wherever possible, the elementary steps of the analysis are compared with those of frequentist (classical) statistical analysis. Bayesian reasoning is naturally connected to medical activity, since it is quite similar to the diagnostic process.
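In the diagnostic setting the theorem simply updates a disease prior (the prevalence) with the test's sensitivity and specificity. A small R sketch with illustrative numbers (these figures are assumptions, not taken from the paper):

```r
# Bayes' theorem for a positive diagnostic test (illustrative numbers only).
prevalence  <- 0.01   # P(disease): the prior
sensitivity <- 0.95   # P(test positive | disease)
specificity <- 0.90   # P(test negative | no disease)

p_positive <- sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
posterior  <- sensitivity * prevalence / p_positive
posterior             # about 0.09: a positive test raises a 1% prior to roughly 9%
```

The low posterior despite an accurate test is the kind of result that makes the role of the prior explicit in a way frequentist summaries do not.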
ERIC Educational Resources Information Center
Haans, Antal
2018-01-01
Contrast analysis is a relatively simple but effective statistical method for testing theoretical predictions about differences between group means against the empirical data. Despite its advantages, contrast analysis is hardly used to date, perhaps because it is not implemented in a convenient manner in many statistical software packages. This…
Applying Statistics in the Undergraduate Chemistry Laboratory: Experiments with Food Dyes.
ERIC Educational Resources Information Center
Thomasson, Kathryn; Lofthus-Merschman, Sheila; Humbert, Michelle; Kulevsky, Norman
1998-01-01
Describes several experiments to teach different aspects of the statistical analysis of data using household substances and a simple analysis technique. Each experiment can be performed in three hours. Students learn about treatment of spurious data, application of a pooled variance, linear least-squares fitting, and simultaneous analysis of dyes…
Statistics without Tears: Complex Statistics with Simple Arithmetic
ERIC Educational Resources Information Center
Smith, Brian
2011-01-01
One of the often overlooked aspects of modern statistics is the analysis of time series data. Modern introductory statistics courses tend to rush to probabilistic applications involving risk and confidence. Rarely does the first level course linger on such useful and fascinating topics as time series decomposition, with its practical applications…
NAUSEA and the Principle of Supplementarity of Damping and Isolation in Noise Control.
1980-02-01
New approaches and uses of statistical energy analysis (NAUSEA) have been considered and developed in recent months. The advances were made...possible in that the requirement, in the older statistical energy analysis, that the dynamic systems be highly reverberant and the couplings between the...analytical consideration in terms of statistical energy analysis (SEA). A brief discussion and simple examples that relate to these recent advances
On the (In)Validity of Tests of Simple Mediation: Threats and Solutions
Pek, Jolynn; Hoyle, Rick H.
2015-01-01
Mediation analysis is a popular framework for identifying underlying mechanisms in social psychology. In the context of simple mediation, we review and discuss the implications of three facets of mediation analysis: (a) conceptualization of the relations between the variables, (b) statistical approaches, and (c) relevant elements of design. We also highlight the issue of equivalent models that is inherent in simple mediation. The extent to which results are meaningful stems directly from choices regarding these three facets of mediation analysis. We conclude by discussing how mediation analysis can be better applied to examine causal processes, highlight the limits of simple mediation, and make recommendations for better practice. PMID:26985234
ERIC Educational Resources Information Center
Brossart, Daniel F.; Parker, Richard I.; Olson, Elizabeth A.; Mahadevan, Lakshmi
2006-01-01
This study explored some practical issues for single-case researchers who rely on visual analysis of graphed data, but who also may consider supplemental use of promising statistical analysis techniques. The study sought to answer three major questions: (a) What is a typical range of effect sizes from these analytic techniques for data from…
Using R-Project for Free Statistical Analysis in Extension Research
ERIC Educational Resources Information Center
Mangiafico, Salvatore S.
2013-01-01
One option for Extension professionals wishing to use free statistical software is to use online calculators, which are useful for common, simple analyses. A second option is to use a free computing environment capable of performing statistical analyses, like R-project. R-project is free, cross-platform, powerful, and respected, but may be…
Ganger, Michael T; Dietz, Geoffrey D; Ewing, Sarah J
2017-12-01
qPCR has established itself as the technique of choice for the quantification of gene expression. Procedures for conducting qPCR have received significant attention; however, more rigorous approaches to the statistical analysis of qPCR data are needed. Here we develop a mathematical model, termed the Common Base Method, for analysis of qPCR data based on threshold cycle values (Cq) and efficiencies of reactions (E). The Common Base Method keeps all calculations in the log scale as long as possible by working with log10(E) · Cq, which we call the efficiency-weighted Cq value; subsequent statistical analyses are then applied in the log scale. We show how efficiency-weighted Cq values may be analyzed using a simple paired or unpaired experimental design and develop blocking methods to help reduce unexplained variation. The Common Base Method has several advantages. It allows for the incorporation of well-specific efficiencies and multiple reference genes. The method does not necessitate the pairing of samples that must be performed using traditional analysis methods in order to calculate relative expression ratios. Our method is also simple enough to be implemented in any spreadsheet or statistical software without additional scripts or proprietary components.
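A rough R sketch of the efficiency-weighted quantity and an unpaired comparison in the log scale is shown below. The Cq values and efficiencies are hypothetical, and the sign convention and reference-gene handling are simplified; the paper should be consulted for the exact formulation.

```r
# Common Base Method sketch: work with wCq = log10(E) * Cq and stay in log10 scale.
E_target <- 1.95; E_ref <- 1.90                      # assumed amplification efficiencies
cq_target_ctrl <- c(24.1, 24.4, 23.9); cq_ref_ctrl <- c(18.2, 18.0, 18.3)
cq_target_trt  <- c(22.0, 22.3, 21.8); cq_ref_trt  <- c(18.1, 18.2, 18.0)

wcq <- function(E, cq) log10(E) * cq                 # efficiency-weighted Cq value

# Reference-normalized log10 expression for each biological replicate
d_ctrl <- wcq(E_ref, cq_ref_ctrl) - wcq(E_target, cq_target_ctrl)
d_trt  <- wcq(E_ref, cq_ref_trt)  - wcq(E_target, cq_target_trt)

t.test(d_trt, d_ctrl)                                # unpaired comparison in the log scale
10^(mean(d_trt) - mean(d_ctrl))                      # fold change, back-transformed only at the end
```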
Learning investment indicators through data extension
NASA Astrophysics Data System (ADS)
Dvořák, Marek
2017-07-01
Stock prices in the form of time series were analysed using univariate and multivariate statistical methods. After simple data preprocessing in the form of logarithmic differences, we augmented this univariate time series to a multivariate representation. This method uses sliding windows to calculate several dozen new variables, using simple statistical tools such as first and second moments as well as more complicated statistics, such as autoregression coefficients and residual analysis, followed by an optional quadratic transformation that was further used for data extension. These were used as explanatory variables in a regularized logistic LASSO regression that tried to estimate the Buy-Sell Index (BSI) from real stock market data.
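A condensed R sketch of that pipeline is given below, with a simulated price path and a placeholder binary label standing in for the Buy-Sell Index; the glmnet package is assumed for the LASSO-regularized logistic fit.

```r
# Log-difference a price series, build sliding-window features, fit a LASSO logit.
library(glmnet)

set.seed(42)
price <- cumprod(c(100, exp(rnorm(500, 0, 0.01))))    # synthetic price path
r <- diff(log(price))                                 # logarithmic differences

w <- 20                                               # sliding-window length
idx <- (w + 1):(length(r) - 1)
feats <- t(sapply(idx, function(i) {
  win <- r[(i - w):(i - 1)]
  c(mean = mean(win), sd = sd(win),
    ar1 = cor(win[-1], win[-length(win)]))            # crude lag-1 autocorrelation
}))
y <- as.integer(r[idx + 1] > 0)                       # placeholder "buy" label, not a real BSI

fit <- cv.glmnet(feats, y, family = "binomial", alpha = 1)
coef(fit, s = "lambda.min")                           # window features surviving the L1 penalty
```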
Zhao, Zhihua; Zheng, Zhiqin; Roux, Clément; Delmas, Céline; Marty, Jean-Daniel; Kahn, Myrtil L; Mingotaud, Christophe
2016-08-22
Analysis of nanoparticle size through a simple 2D plot is proposed in order to extract the correlation between length and width in a collection or a mixture of anisotropic particles. Compared to the usual statistics on the length associated with a second and independent statistical analysis of the width, this simple plot easily points out the various types of nanoparticles and their (an)isotropy. For each class of nano-objects, the relationship between width and length (i.e., the strong or weak correlations between these two parameters) may suggest information concerning the nucleation/growth processes. It allows one to follow the effect on the shape and size distribution of physical or chemical processes such as simple ripening. Various electron microscopy pictures from the literature or from the authors' own syntheses are used as examples to demonstrate the efficiency and simplicity of the proposed 2D plot combined with a multivariate analysis. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
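The plot itself requires nothing more than a scatter of width against length for each particle, with per-class correlations read off directly. A small R sketch with simulated particle sizes (real measurements would come from the electron-microscopy images):

```r
# Length-width scatter for a simulated mixture of rod-like and near-isotropic particles.
set.seed(7)
rods    <- data.frame(length = rnorm(100, 40, 4), width = rnorm(100, 8, 1))
spheres <- data.frame(length = rnorm(100, 15, 2))
spheres$width <- spheres$length + rnorm(100, 0, 1)    # width tracks length: near-isotropic

sizes <- rbind(cbind(rods, type = "rod"), cbind(spheres, type = "sphere"))
plot(sizes$length, sizes$width, col = ifelse(sizes$type == "rod", 2, 4),
     xlab = "length (nm)", ylab = "width (nm)")

cor(rods$length, rods$width)       # weak length-width coupling for the anisotropic class
cor(spheres$length, spheres$width) # strong coupling for the near-isotropic class
```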
Arroyo-Hernández, M; Mellado-Romero, M A; Páramo-Díaz, P; Martín-López, C M; Cano-Egea, J M; Vilá Y Rico, J
2015-01-01
The purpose of this study is to analyze whether there is any difference between arthroscopic repair of full-thickness supraspinatus tears with a single-row technique versus a suture bridge technique. We performed a retrospective study of 123 patients with full-thickness supraspinatus tears treated between January 2009 and January 2013 in our hospital. There were 60 single-row repairs and 63 suture bridge repairs. The mean age was 62.9 years in the single-row group and 63.3 years in the suture bridge group. There were more women than men in both groups (67%). All patients were evaluated using the Constant test. The mean Constant score was 76.7 in the suture bridge group and 72.4 in the single-row group. We also performed a statistical analysis of each Constant item. Strength was higher in the suture bridge group, with a statistically significant difference (p = 0.04). The range of movement was also greater in the suture bridge group, but the difference was not statistically significant. The suture bridge technique has better clinical results than single-row repair, but the difference is not statistically significant (p = 0.298).
Statistics for nuclear engineers and scientists. Part 1. Basic statistical inference
DOE Office of Scientific and Technical Information (OSTI.GOV)
Beggs, W.J.
1981-02-01
This report is intended for the use of engineers and scientists working in the nuclear industry, especially at the Bettis Atomic Power Laboratory. It serves as the basis for several Bettis in-house statistics courses. The objectives of the report are to introduce the reader to the language and concepts of statistics and to provide a basic set of techniques to apply to problems of the collection and analysis of data. Part 1 covers subjects of basic inference. The subjects include: descriptive statistics; probability; simple inference for normally distributed populations, and for non-normal populations as well; comparison of two populations; the analysis of variance; quality control procedures; and linear regression analysis.
Tang, Qi-Yi; Zhang, Chuan-Xi
2013-04-01
A comprehensive but simple-to-use software package called DPS (Data Processing System) has been developed to execute a range of standard numerical analyses and operations used in experimental design, statistics and data mining. This program runs on standard Windows computers. Many of the functions are specific to entomological and other biological research and are not found in standard statistical software. This paper presents applications of DPS to experimental design, statistical analysis and data mining in entomology. © 2012 The Authors Insect Science © 2012 Institute of Zoology, Chinese Academy of Sciences.
Rodríguez-Arias, Miquel Angel; Rodó, Xavier
2004-03-01
Here we describe a practical, step-by-step primer to scale-dependent correlation (SDC) analysis. The analysis of transitory processes is an important but often neglected topic in ecological studies, because only a few statistical techniques detect temporary features accurately enough. We introduce here the SDC analysis, a statistical and graphical method to study transitory processes at any temporal or spatial scale. Thanks to the combination of conventional procedures and simple, well-known statistical techniques, SDC analysis becomes an improved time-domain analogue of wavelet analysis. We use several simple synthetic series to describe the method, a more complex example, full of transitory features, to compare SDC and wavelet analysis, and finally we analyze some selected ecological series to illustrate the methodology. The SDC analysis of time series of copepod abundances in the North Sea indicates that ENSO is the main climatic driver of short-term changes in population dynamics. SDC also uncovers some long-term, unexpected features in the population. Similarly, the SDC analysis of Nicholson's blowflies data locates where the proposed models fail and provides new insights about the mechanism that drives the apparent vanishing of the population cycle during the second half of the series.
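The core ingredient is an ordinary correlation evaluated within sliding windows of a chosen scale. The simplified R sketch below uses simulated series with a transitory coupling; the published SDC method adds significance assessment and comparisons across lags, so this shows only the underlying idea.

```r
# Local (windowed) correlation between two series at a single scale s.
set.seed(3)
n <- 200
x <- sin(2 * pi * (1:n) / 50) + rnorm(n, 0, 0.3)
y <- c(rnorm(100, 0, 0.3), x[101:200] + rnorm(100, 0, 0.3))  # coupling only in the second half

s <- 25                                    # window size (the "scale")
starts <- 1:(n - s + 1)
local_r <- sapply(starts, function(i) cor(x[i:(i + s - 1)], y[i:(i + s - 1)]))

plot(starts, local_r, type = "l", xlab = "window start", ylab = "local correlation")
abline(h = 0, lty = 2)                     # the transitory association stands out after t = 100
```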
More Powerful Tests of Simple Interaction Contrasts in the Two-Way Factorial Design
ERIC Educational Resources Information Center
Hancock, Gregory R.; McNeish, Daniel M.
2017-01-01
For the two-way factorial design in analysis of variance, the current article explicates and compares three methods for controlling the Type I error rate for all possible simple interaction contrasts following a statistically significant interaction, including a proposed modification to the Bonferroni procedure that increases the power of…
Calibration of Response Data Using MIRT Models with Simple and Mixed Structures
ERIC Educational Resources Information Center
Zhang, Jinming
2012-01-01
It is common to assume during a statistical analysis of a multiscale assessment that the assessment is composed of several unidimensional subtests or that it has simple structure. Under this assumption, the unidimensional and multidimensional approaches can be used to estimate item parameters. These two approaches are equivalent in parameter…
School District Enrollment Projections: A Comparison of Three Methods.
ERIC Educational Resources Information Center
Pettibone, Timothy J.; Bushan, Latha
This study assesses three methods of forecasting school enrollments: the cohort-survival method (grade progression), the statistical forecasting procedure developed by the Statistical Analysis System (SAS) Institute, and a simple ratio computation. The three methods were used to forecast school enrollments for kindergarten through grade 12 in a…
Application of Transformations in Parametric Inference
ERIC Educational Resources Information Center
Brownstein, Naomi; Pensky, Marianna
2008-01-01
The objective of the present paper is to provide a simple approach to statistical inference using the method of transformations of variables. We demonstrate performance of this powerful tool on examples of constructions of various estimation procedures, hypothesis testing, Bayes analysis and statistical inference for the stress-strength systems.…
Asymptotic Linear Spectral Statistics for Spiked Hermitian Random Matrices
NASA Astrophysics Data System (ADS)
Passemier, Damien; McKay, Matthew R.; Chen, Yang
2015-07-01
Using the Coulomb Fluid method, this paper derives central limit theorems (CLTs) for linear spectral statistics of three "spiked" Hermitian random matrix ensembles. These include Johnstone's spiked model (i.e., central Wishart with spiked correlation), non-central Wishart with rank-one non-centrality, and a related class of non-central matrices. For a generic linear statistic, we derive simple and explicit CLT expressions as the matrix dimensions grow large. For all three ensembles under consideration, we find that the primary effect of the spike is to introduce a correction term to the asymptotic mean of the linear spectral statistic, which we characterize with simple formulas. The utility of our proposed framework is demonstrated through application to three different linear statistics problems: the classical likelihood ratio test for a population covariance, the capacity analysis of multi-antenna wireless communication systems with a line-of-sight transmission path, and a classical multiple sample significance testing problem.
Bonetti, Jennifer; Quarino, Lawrence
2014-05-01
This study has shown that the combination of simple techniques with the use of multivariate statistics offers the potential for the comparative analysis of soil samples. Five samples were obtained from each of twelve state parks across New Jersey in both the summer and fall seasons. Each sample was examined using particle-size distribution, pH analysis in both water and 1 M CaCl2, and a loss on ignition technique. Data from each of the techniques were combined, and principal component analysis (PCA) and canonical discriminant analysis (CDA) were used for multivariate data transformation. Samples from different locations could be visually differentiated from one another using these multivariate plots. Hold-one-out cross-validation analysis showed error rates as low as 3.33%. Ten blind study samples were analyzed resulting in no misclassifications using Mahalanobis distance calculations and visual examinations of multivariate plots. Seasonal variation was minimal between corresponding samples, suggesting potential success in forensic applications. © 2014 American Academy of Forensic Sciences.
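The multivariate step can be reproduced in outline with a few lines of R: standardize the mixed-unit soil descriptors and inspect the leading principal components. The data frame below is simulated purely to illustrate the workflow; the study additionally applies canonical discriminant analysis and Mahalanobis-distance classification.

```r
# PCA on combined soil descriptors (texture fractions, pH, loss on ignition).
set.seed(11)
soil <- data.frame(site = rep(c("A", "B", "C"), each = 8),
                   sand = c(rnorm(8, 60, 3), rnorm(8, 40, 3), rnorm(8, 50, 3)),
                   silt = c(rnorm(8, 25, 2), rnorm(8, 35, 2), rnorm(8, 30, 2)),
                   pH_w = c(rnorm(8, 6.5, 0.2), rnorm(8, 5.8, 0.2), rnorm(8, 7.1, 0.2)),
                   loi  = c(rnorm(8, 4, 0.5), rnorm(8, 8, 0.5), rnorm(8, 6, 0.5)))

pca <- prcomp(soil[, -1], scale. = TRUE)              # standardize the mixed units
summary(pca)                                          # variance explained per component
plot(pca$x[, 1:2], col = as.integer(factor(soil$site)),
     xlab = "PC1", ylab = "PC2")                      # sites separate in PC space
```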
"Using Power Tables to Compute Statistical Power in Multilevel Experimental Designs"
ERIC Educational Resources Information Center
Konstantopoulos, Spyros
2009-01-01
Power computations for one-level experimental designs that assume simple random samples are greatly facilitated by power tables such as those presented in Cohen's book about statistical power analysis. However, in education and the social sciences experimental designs have naturally nested structures and multilevel models are needed to compute the…
Correlation and simple linear regression.
Zou, Kelly H; Tuncali, Kemal; Silverman, Stuart G
2003-06-01
In this tutorial article, the concepts of correlation and regression are reviewed and demonstrated. The authors review and compare two correlation coefficients, the Pearson correlation coefficient and the Spearman rho, for measuring linear and nonlinear relationships between two continuous variables. In the case of measuring the linear relationship between a predictor and an outcome variable, simple linear regression analysis is conducted. These statistical concepts are illustrated by using a data set from published literature to assess a computed tomography-guided interventional technique. These statistical methods are important for exploring the relationships between variables and can be applied to many radiologic studies.
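Both coefficients and the regression itself are one-liners in most statistical environments. An R sketch with a simulated predictor/outcome pair standing in for the CT-guided measurements:

```r
# Pearson vs. Spearman correlation and a simple linear regression.
set.seed(5)
x <- runif(40, 1, 10)                 # predictor
y <- 2 + 0.8 * x + rnorm(40, 0, 1)    # outcome with a linear trend plus noise

cor(x, y, method = "pearson")         # strength of the linear association
cor(x, y, method = "spearman")        # rank-based, captures monotonic association

fit <- lm(y ~ x)                      # simple linear regression
summary(fit)$coefficients             # intercept and slope with standard errors and p-values
```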
Mager, P P; Rothe, H
1990-10-01
Multicollinearity of physicochemical descriptors leads to serious consequences in quantitative structure-activity relationship (QSAR) analysis, such as incorrect estimators and test statistics of regression coefficients in the ordinary least-squares (OLS) model usually applied to QSARs. Besides the diagnosis of simple collinearity, principal component regression analysis (PCRA) also allows the diagnosis of various types of multicollinearity. Only if the absolute values of the PCRA estimators are order statistics that decrease monotonically can the effects of multicollinearity be circumvented. Otherwise, obscure phenomena may be observed, such as good data recognition but low predictive power of a QSAR model.
A Simple Test of Class-Level Genetic Association Can Reveal Novel Cardiometabolic Trait Loci.
Qian, Jing; Nunez, Sara; Reed, Eric; Reilly, Muredach P; Foulkes, Andrea S
2016-01-01
Characterizing the genetic determinants of complex diseases can be further augmented by incorporating knowledge of underlying structure or classifications of the genome, such as newly developed mappings of protein-coding genes, epigenetic marks, enhancer elements and non-coding RNAs. We apply a simple class-level testing framework, termed Genetic Class Association Testing (GenCAT), to identify protein-coding gene association with 14 cardiometabolic (CMD) related traits across 6 publicly available genome wide association (GWA) meta-analysis data resources. GenCAT uses SNP-level meta-analysis test statistics across all SNPs within a class of elements, as well as the size of the class and its unique correlation structure, to determine if the class is statistically meaningful. The novelty of findings is evaluated through investigation of regional signals. A subset of findings are validated using recently updated, larger meta-analysis resources. A simulation study is presented to characterize overall performance with respect to power, control of family-wise error and computational efficiency. All analysis is performed using the GenCAT package, R version 3.2.1. We demonstrate that class-level testing complements the common first stage minP approach that involves individual SNP-level testing followed by post-hoc ascribing of statistically significant SNPs to genes and loci. GenCAT suggests 54 protein-coding genes at 41 distinct loci for the 13 CMD traits investigated in the discovery analysis, that are beyond the discoveries of minP alone. An additional application to biological pathways demonstrates flexibility in defining genetic classes. We conclude that it would be prudent to include class-level testing as standard practice in GWA analysis. GenCAT, for example, can be used as a simple, complementary and efficient strategy for class-level testing that leverages existing data resources, requires only summary level data in the form of test statistics, and adds significant value with respect to its potential for identifying multiple novel and clinically relevant trait associations.
The Importance of Proving the Null
ERIC Educational Resources Information Center
Gallistel, C. R.
2009-01-01
Null hypotheses are simple, precise, and theoretically important. Conventional statistical analysis cannot support them; Bayesian analysis can. The challenge in a Bayesian analysis is to formulate a suitably vague alternative, because the vaguer the alternative is (the more it spreads out the unit mass of prior probability), the more the null is…
Shock transmission in coupled beams and rib stiffened structures
NASA Technical Reports Server (NTRS)
Pope, L. D.; Manning, J. E.; Scharton, T. D.
1971-01-01
Shock transmission in a simple coupled beam structure and in a ring-stringer stiffened cylinder is investigated experimentally and analytically using wave transmission and statistical energy analysis concepts. The use of the response spectrum to characterize the excitation provided to a simple beam by a force pulse is studied. Analysis of the transmission of a dilatation wave in a periodically stiffened plate indicates that the stiffeners are fairly transparent to the wave, but some of the dilatational energy is scattered into bending at each support.
Foldnes, Njål; Olsson, Ulf Henning
2016-01-01
We present and investigate a simple way to generate nonnormal data using linear combinations of independent generator (IG) variables. The simulated data have prespecified univariate skewness and kurtosis and a given covariance matrix. In contrast to the widely used Vale-Maurelli (VM) transform, the obtained data are shown to have a non-Gaussian copula. We analytically obtain asymptotic robustness conditions for the IG distribution. We show empirically that popular test statistics in covariance analysis tend to reject true models more often under the IG transform than under the VM transform. This implies that overly optimistic evaluations of estimators and fit statistics in covariance structure analysis may be tempered by including the IG transform for nonnormal data generation. We provide an implementation of the IG transform in the R environment.
An Automated Statistical Process Control Study of Inline Mixing Using Spectrophotometric Detection
ERIC Educational Resources Information Center
Dickey, Michael D.; Stewart, Michael D.; Willson, C. Grant
2006-01-01
An experiment is described, which is designed for a junior-level chemical engineering "fundamentals of measurements and data analysis" course, where students are introduced to the concept of statistical process control (SPC) through a simple inline mixing experiment. The students learn how to create and analyze control charts in an effort to…
On-Line Analysis of Southern FIA Data
Michael P. Spinney; Paul C. Van Deusen; Francis A. Roesch
2006-01-01
The Southern On-Line Estimator (SOLE) is a web-based FIA database analysis tool designed with an emphasis on modularity. The Java-based user interface is simple and intuitive to use and the R-based analysis engine is fast and stable. Each component of the program (data retrieval, statistical analysis and output) can be individually modified to accommodate major...
NASA Astrophysics Data System (ADS)
Mahmood, Ehab A.; Rana, Sohel; Hussin, Abdul Ghapor; Midi, Habshah
2016-06-01
The circular regression model may contain one or more data points which appear to be peculiar or inconsistent with the main part of the model. This may occur due to recording errors, sudden short events, sampling under abnormal conditions, etc. The existence of these data points ("outliers") in the data set causes many problems in the research results and conclusions. Therefore, we should identify them before applying statistical analysis. In this article, we aim to propose a statistic to identify outliers in both the response and explanatory variables of the simple circular regression model. Our proposed statistic is the robust circular distance RCDxy, and it is justified by three robust measurements: the proportion of detected outliers, and the masking and swamping rates.
Use of iPhone technology in improving acetabular component position in total hip arthroplasty.
Tay, Xiau Wei; Zhang, Benny Xu; Gayagay, George
2017-09-01
Improper acetabular cup positioning is associated with a high risk of complications after total hip arthroplasty. The aim of our study is to objectively compare 3 methods, namely (1) free hand, (2) alignment jig (Sputnik), and (3) iPhone application, to identify an easy, reproducible, and accurate method for improving acetabular cup placement. We designed a simple setup and carried out a simple experiment (see Method section). Using statistical analysis, the difference in inclination angles with the iPhone application compared with the freehand method was found to be statistically significant (F[2,51] = 4.17, P = .02) in the "untrained group". No statistically significant difference was detected for the other groups. This suggests a potential role for iPhone applications in helping junior surgeons overcome the steep learning curve.
Monte Carlo based statistical power analysis for mediation models: methods and software.
Zhang, Zhiyong
2014-12-01
The existing literature on statistical power analysis for mediation models often assumes data normality and is based on a less powerful Sobel test instead of the more powerful bootstrap test. This study proposes to estimate statistical power to detect mediation effects on the basis of the bootstrap method through Monte Carlo simulation. Nonnormal data with excessive skewness and kurtosis are allowed in the proposed method. A free R package called bmem is developed to conduct the power analysis discussed in this study. Four examples, including a simple mediation model, a multiple-mediator model with a latent mediator, a multiple-group mediation model, and a longitudinal mediation model, are provided to illustrate the proposed method.
Statistical Properties of Online Auctions
NASA Astrophysics Data System (ADS)
Namazi, Alireza; Schadschneider, Andreas
We characterize the statistical properties of a large number of online auctions run on eBay. Both stationary and dynamic properties, like distributions of prices, number of bids etc., as well as relations between these quantities are studied. The analysis of the data reveals surprisingly simple distributions and relations, typically of power-law form. Based on these findings we introduce a simple method to identify suspicious auctions that could be influenced by a form of fraud known as shill bidding. Furthermore the influence of bidding strategies is discussed. The results indicate that the observed behavior is related to a mixture of agents using a variety of strategies.
Thieler, E. Robert; Himmelstoss, Emily A.; Zichichi, Jessica L.; Ergul, Ayhan
2009-01-01
The Digital Shoreline Analysis System (DSAS) version 4.0 is a software extension to ESRI ArcGIS v.9.2 and above that enables a user to calculate shoreline rate-of-change statistics from multiple historic shoreline positions. A user-friendly interface of simple buttons and menus guides the user through the major steps of shoreline change analysis. Components of the extension and user guide include (1) instruction on the proper way to define a reference baseline for measurements, (2) automated and manual generation of measurement transects and metadata based on user-specified parameters, and (3) output of calculated rates of shoreline change and other statistical information. DSAS computes shoreline rates of change using four different methods: (1) endpoint rate, (2) simple linear regression, (3) weighted linear regression, and (4) least median of squares. The standard error, correlation coefficient, and confidence interval are also computed for the simple and weighted linear-regression methods. The results of all rate calculations are output to a table that can be linked to the transect file by a common attribute field. DSAS is intended to facilitate the shoreline change-calculation process and to provide rate-of-change information and the statistical data necessary to establish the reliability of the calculated results. The software is also suitable for any generic application that calculates positional change over time, such as assessing rates of change of glacier limits in sequential aerial photos, river edge boundaries, land-cover changes, and so on.
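For a single transect, the end point rate and the linear-regression rate reduce to elementary calculations. A base-R sketch with hypothetical shoreline positions (DSAS itself adds the weighted-regression, least-median-of-squares, and confidence-interval machinery):

```r
# End point rate (EPR) and linear regression rate (LRR) for one transect.
dates    <- as.Date(c("1970-06-01", "1984-05-15", "1998-07-20", "2009-04-10"))
position <- c(0.0, -8.5, -16.2, -22.9)       # shoreline position along the transect, metres

yrs <- as.numeric(dates - dates[1]) / 365.25 # elapsed time in years

# EPR: net change between the oldest and most recent shorelines
epr <- (position[length(position)] - position[1]) / (yrs[length(yrs)] - yrs[1])

# LRR: slope of position regressed on time, with its standard error
fit <- lm(position ~ yrs)
c(EPR = epr,
  LRR = unname(coef(fit)["yrs"]),
  SE  = summary(fit)$coefficients["yrs", "Std. Error"])
```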
The vulnerability of electric equipment to carbon fibers of mixed lengths: An analysis
NASA Technical Reports Server (NTRS)
Elber, W.
1980-01-01
The susceptibility of a stereo amplifier to damage from a spectrum of lengths of graphite fibers was calculated. A simple analysis was developed by which such calculations can be based on test results with fibers of uniform lengths. A statistical analysis was applied for the conversion of data for various logical failure criteria.
NASA Astrophysics Data System (ADS)
Pearl, Judea
2000-03-01
Written by one of the pre-eminent researchers in the field, this book provides a comprehensive exposition of modern analysis of causation. It shows how causality has grown from a nebulous concept into a mathematical theory with significant applications in the fields of statistics, artificial intelligence, philosophy, cognitive science, and the health and social sciences. Pearl presents a unified account of the probabilistic, manipulative, counterfactual and structural approaches to causation, and devises simple mathematical tools for analyzing the relationships between causal connections, statistical associations, actions and observations. The book will open the way for including causal analysis in the standard curriculum of statistics, artificial intelligence, business, epidemiology, social science and economics. Students in these areas will find natural models, simple identification procedures, and precise mathematical definitions of causal concepts that traditional texts have tended to evade or make unduly complicated. This book will be of interest to professionals and students in a wide variety of fields. Anyone who wishes to elucidate meaningful relationships from data, predict effects of actions and policies, assess explanations of reported events, or form theories of causal understanding and causal speech will find this book stimulating and invaluable.
The power to detect linkage in complex disease by means of simple LOD-score analyses.
Greenberg, D A; Abreu, P; Hodge, S E
1998-01-01
Maximum-likelihood analysis (via LOD score) provides the most powerful method for finding linkage when the mode of inheritance (MOI) is known. However, because one must assume an MOI, the application of LOD-score analysis to complex disease has been questioned. Although it is known that one can legitimately maximize the maximum LOD score with respect to genetic parameters, this approach raises three concerns: (1) multiple testing, (2) effect on power to detect linkage, and (3) adequacy of the approximate MOI for the true MOI. We evaluated the power of LOD scores to detect linkage when the true MOI was complex but a LOD score analysis assumed simple models. We simulated data from 14 different genetic models, including dominant and recessive at high (80%) and low (20%) penetrances, intermediate models, and several additive two-locus models. We calculated LOD scores by assuming two simple models, dominant and recessive, each with 50% penetrance, then took the higher of the two LOD scores as the raw test statistic and corrected for multiple tests. We call this test statistic "MMLS-C." We found that the ELODs for MMLS-C are >=80% of the ELOD under the true model when the ELOD for the true model is >=3. Similarly, the power to reach a given LOD score was usually >=80% that of the true model, when the power under the true model was >=60%. These results underscore that a critical factor in LOD-score analysis is the MOI at the linked locus, not that of the disease or trait per se. Thus, a limited set of simple genetic models in LOD-score analysis can work well in testing for linkage. PMID:9718328
A statistical method for measuring activation of gene regulatory networks.
Esteves, Gustavo H; Reis, Luiz F L
2018-06-13
Gene expression data analysis is of great importance for modern molecular biology, given our ability to measure the expression profiles of thousands of genes and to enable studies rooted in systems biology. In this work, we propose a simple statistical model for measuring the activation of gene regulatory networks, instead of the traditional gene co-expression networks. We present the mathematical construction of a statistical procedure for testing hypotheses regarding gene regulatory network activation. The real probability distribution of the test statistic is evaluated by a permutation-based study. To illustrate the functionality of the proposed methodology, we also present a simple example based on a small hypothetical network and the activation measurement of two KEGG networks, both based on gene expression data collected from gastric and esophageal samples. The two KEGG networks were also analyzed for a public database, available through NCBI-GEO, presented as Supplementary Material. This method was implemented in an R package that is available at the BioConductor project website under the name maigesPack.
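A generic permutation scheme for a network-level activation score can be sketched as follows; this is not the maigesPack implementation. The score here is simply the mean gene-level t statistic over the network's members, and group labels are permuted to build the null distribution; all data are simulated.

```r
# Permutation test for a network "activation" score (generic sketch).
set.seed(9)
expr  <- matrix(rnorm(100 * 12), nrow = 100)          # 100 genes x 12 samples (simulated)
group <- rep(c(0, 1), each = 6)                       # two sample classes
net_genes <- 1:15                                     # indices of the network's member genes

activation <- function(e, g) {
  tstats <- apply(e, 1, function(row) t.test(row ~ g)$statistic)
  mean(tstats[net_genes])                             # network-level score
}

obs  <- activation(expr, group)
null <- replicate(500, activation(expr, sample(group)))  # label-permutation null
mean(abs(null) >= abs(obs))                           # two-sided permutation p-value
```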
An Analysis of Variance Framework for Matrix Sampling.
ERIC Educational Resources Information Center
Sirotnik, Kenneth
Significant cost savings can be achieved with the use of matrix sampling in estimating population parameters from psychometric data. The statistical design is intuitively simple, using the framework of the two-way classification analysis of variance technique. For example, the mean and variance are derived from the performance of a certain grade…
Estimating annual bole biomass production using uncertainty analysis
Travis J. Woolley; Mark E. Harmon; Kari B. O' Connell
2007-01-01
Two common sampling methodologies coupled with a simple statistical model were evaluated to determine the accuracy and precision of annual bole biomass production (BBP) and inter-annual variability estimates using this type of approach. We performed an uncertainty analysis using Monte Carlo methods in conjunction with radial growth core data from trees in three Douglas...
A new SAS program for behavioral analysis of Electrical Penetration Graph (EPG) data
USDA-ARS?s Scientific Manuscript database
A new program is introduced that uses SAS software to duplicate output of descriptive statistics from the Sarria Excel workbook for EPG waveform analysis. Not only are publishable means and standard errors or deviations output, the user also is guided through four relatively simple sub-programs for ...
Humans make efficient use of natural image statistics when performing spatial interpolation.
D'Antona, Anthony D; Perry, Jeffrey S; Geisler, Wilson S
2013-12-16
Visual systems learn through evolution and experience over the lifespan to exploit the statistical structure of natural images when performing visual tasks. Understanding which aspects of this statistical structure are incorporated into the human nervous system is a fundamental goal in vision science. To address this goal, we measured human ability to estimate the intensity of missing image pixels in natural images. Human estimation accuracy is compared with various simple heuristics (e.g., local mean) and with optimal observers that have nearly complete knowledge of the local statistical structure of natural images. Human estimates are more accurate than those of simple heuristics, and they match the performance of an optimal observer that knows the local statistical structure of relative intensities (contrasts). This optimal observer predicts the detailed pattern of human estimation errors and hence the results place strong constraints on the underlying neural mechanisms. However, humans do not reach the performance of an optimal observer that knows the local statistical structure of the absolute intensities, which reflect both local relative intensities and local mean intensity. As predicted from a statistical analysis of natural images, human estimation accuracy is negligibly improved by expanding the context from a local patch to the whole image. Our results demonstrate that the human visual system exploits efficiently the statistical structure of natural images.
Evaluation of IOTA Simple Ultrasound Rules to Distinguish Benign and Malignant Ovarian Tumours.
Garg, Sugandha; Kaur, Amarjit; Mohi, Jaswinder Kaur; Sibia, Preet Kanwal; Kaur, Navkiran
2017-08-01
IOTA stands for the International Ovarian Tumour Analysis group. Ovarian cancer is one of the common cancers in women and is diagnosed at a later stage in the majority of cases. The limiting factor for early diagnosis is the lack of standardized terms and procedures in gynaecological sonography. The introduction of the IOTA rules has provided some consistency in defining morphological features of ovarian masses through a standardized examination technique. To evaluate the efficacy of the IOTA simple ultrasound rules in distinguishing benign and malignant ovarian tumours and establishing their use as a tool in the early diagnosis of ovarian malignancy. A hospital-based, prospective case-control study was conducted. Patients with suspected ovarian pathology were evaluated using the IOTA ultrasound rules and designated as benign or malignant. Findings were correlated with histopathological findings. Collected data were statistically analysed using the chi-square test and the kappa statistic. Out of the initial 55 patients, 50 patients who underwent surgery were included in the final analysis. The IOTA simple rules were applicable in 45 out of these 50 patients (90%). The sensitivity for the detection of malignancy in cases where the IOTA simple rules were applicable was 91.66% and the specificity was 84.84%. Accuracy was 86.66%. Classifying inconclusive cases as malignant, the sensitivity and specificity were 93% and 80%, respectively. A high level of agreement was found between USG and histopathological diagnosis, with a kappa value of 0.323. The IOTA simple ultrasound rules were highly sensitive and specific in predicting ovarian malignancy preoperatively while being reproducible and easy to train and use.
NASA Astrophysics Data System (ADS)
Mercer, Gary J.
This quantitative study examined the relationship between secondary students with math anxiety and physics performance in an inquiry-based constructivist classroom. The Revised Math Anxiety Rating Scale was used to evaluate math anxiety levels. The results were then compared to the performance on a physics standardized final examination. A simple correlation was performed, followed by a multivariate regression analysis to examine effects based on gender and prior math background. The correlation showed statistical significance between math anxiety and physics performance. The regression analysis showed statistical significance for math anxiety, physics performance, and prior math background, but did not show statistical significance for math anxiety, physics performance, and gender.
Is simple nephrectomy truly simple? Comparison with the radical alternative.
Connolly, S S; O'Brien, M Frank; Kunni, I M; Phelan, E; Conroy, R; Thornhill, J A; Grainger, R
2011-03-01
The Oxford English Dictionary defines the term "simple" as "easily done" and "uncomplicated". We tested the validity of this terminology in relation to open nephrectomy surgery. We performed a retrospective review of 215 patients undergoing open simple (n = 89) or radical (n = 126) nephrectomy in a single university-affiliated institution between 1998 and 2002. Operative time (OT), estimated blood loss (EBL), operative complications (OC) and length of stay in hospital (LOS) were analysed. Statistical analysis employed Fisher's exact test and Stata Release 8.2. Simple nephrectomy was associated with shorter OT (mean 126 vs. 144 min; p = 0.002), reduced EBL (mean 729 vs. 859 cc; p = 0.472), lower OC (9% vs. 17%; p = 0.087), and shorter LOS (mean 6 vs. 8 days; p < 0.001). All parameters suggest a favourable outcome for the simple nephrectomy group, supporting the use of this terminology. This implies that "simple" nephrectomies are truly easier to perform, with fewer complications, than their radical counterpart.
Effect of the absolute statistic on gene-sampling gene-set analysis methods.
Nam, Dougu
2017-06-01
Gene-set enrichment analysis and its modified versions have commonly been used for identifying altered functions or pathways in disease from microarray data. In particular, the simple gene-sampling gene-set analysis methods have been heavily used for datasets with only a few sample replicates. The biggest problem with this approach is the highly inflated false-positive rate. In this paper, the effect of the absolute gene statistic on gene-sampling gene-set analysis methods is systematically investigated. Thus far, the absolute gene statistic has merely been regarded as a supplementary method for capturing the bidirectional changes in each gene set. Here, it is shown that incorporating the absolute gene statistic in gene-sampling gene-set analysis substantially reduces the false-positive rate and improves the overall discriminatory ability. Its effect was investigated by power, false-positive rate, and receiver operating characteristic curve for a number of simulated and real datasets. The performances of gene-set analysis methods in one-tailed (genome-wide association study) and two-tailed (gene expression data) tests were also compared and discussed.
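The gene-sampling scheme itself is easy to illustrate, as in the R sketch below: a gene set is scored by the mean absolute statistic of its members, and the null is built by drawing random gene sets of the same size. A signed score is shown alongside for contrast, since bidirectional changes cancel in it. The per-gene statistics are simulated.

```r
# Gene-sampling gene-set test with an absolute gene statistic (sketch).
set.seed(2)
gene_t  <- rnorm(5000)                       # per-gene statistics (placeholder)
set_idx <- sample(5000, 50)                  # one gene set of size 50
gene_t[set_idx] <- gene_t[set_idx] + c(rep(1.2, 25), rep(-1.2, 25))  # bidirectional shift

score_abs <- mean(abs(gene_t[set_idx]))
null_abs  <- replicate(10000, mean(abs(gene_t[sample(5000, 50)])))
p_abs     <- mean(null_abs >= score_abs)     # absolute statistic sees the shift

score_sgn <- mean(gene_t[set_idx])
null_sgn  <- replicate(10000, mean(gene_t[sample(5000, 50)]))
p_sgn     <- 2 * min(mean(null_sgn >= score_sgn), mean(null_sgn <= score_sgn))

c(p_absolute = p_abs, p_signed = p_sgn)      # the signed score misses the cancelling changes
```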
Guo, Hui; Zhang, Zhen; Yao, Yuan; Liu, Jialin; Chang, Ruirui; Liu, Zhao; Hao, Hongyuan; Huang, Taohong; Wen, Jun; Zhou, Tingting
2018-08-30
Semen sojae praeparatum, with homology of medicine and food, is a famous traditional Chinese medicine. A simple and effective quality fingerprint analysis, coupled with chemometrics methods, was developed for quality assessment of Semen sojae praeparatum. First, similarity analysis (SA) and hierarchical clustering analysis (HCA) were applied to select the qualitative markers which obviously influence the quality of Semen sojae praeparatum. 21 chemicals were selected and characterized by high-resolution ion trap/time-of-flight mass spectrometry (LC-IT-TOF-MS). Subsequently, principal component analysis (PCA) and orthogonal partial least squares discriminant analysis (OPLS-DA) were conducted to select the quantitative markers of Semen sojae praeparatum samples from different origins. Moreover, 11 compounds with statistical significance were determined quantitatively, which provided accurate and informative data for quality evaluation. This study proposes a new strategy of "statistic analysis-based fingerprint establishment", which would be a valuable reference for further study. Copyright © 2018 Elsevier Ltd. All rights reserved.
ASSESSMENT OF SPATIAL AUTOCORRELATION IN EMPIRICAL MODELS IN ECOLOGY
Statistically assessing ecological models is inherently difficult because data are autocorrelated and this autocorrelation varies in an unknown fashion. At a simple level, the linking of a single species to a habitat type is a straightforward analysis. With some investigation int...
An Exploratory Data Analysis System for Support in Medical Decision-Making
Copeland, J. A.; Hamel, B.; Bourne, J. R.
1979-01-01
An experimental system was developed to allow retrieval and analysis of data collected during a study of neurobehavioral correlates of renal disease. After retrieving data organized in a relational data base, simple bivariate statistics of parametric and nonparametric nature could be conducted. An “exploratory” mode in which the system provided guidance in selection of appropriate statistical analyses was also available to the user. The system traversed a decision tree using the inherent qualities of the data (e.g., the identity and number of patients, tests, and time epochs) to search for the appropriate analyses to employ.
Prediction of transmission loss through an aircraft sidewall using statistical energy analysis
NASA Astrophysics Data System (ADS)
Ming, Ruisen; Sun, Jincai
1989-06-01
The transmission loss of randomly incident sound through an aircraft sidewall is investigated using statistical energy analysis. Formulas are also obtained for the simple calculation of sound transmission loss through single- and double-leaf panels. Both resonant and nonresonant sound transmissions can be easily calculated using the formulas. The formulas are used to predict sound transmission losses through a Y-7 propeller airplane panel. The panel measures 2.56 m x 1.38 m and has two windows. The agreement between predicted and measured values through most of the frequency ranges tested is quite good.
NASA Technical Reports Server (NTRS)
Djorgovski, George
1993-01-01
The existing and forthcoming data bases from NASA missions contain an abundance of information whose complexity cannot be efficiently tapped with simple statistical techniques. Powerful multivariate statistical methods already exist which can be used to harness much of the richness of these data. Automatic classification techniques have been developed to solve the problem of identifying known types of objects in multiparameter data sets, in addition to leading to the discovery of new physical phenomena and classes of objects. We propose an exploratory study and integration of promising techniques in the development of a general and modular classification/analysis system for very large data bases, which would enhance and optimize data management and the use of human research resource.
Empirical Reference Distributions for Networks of Different Size
Smith, Anna; Calder, Catherine A.; Browning, Christopher R.
2016-01-01
Network analysis has become an increasingly prevalent research tool across a vast range of scientific fields. Here, we focus on the particular issue of comparing network statistics, i.e. graph-level measures of network structural features, across multiple networks that differ in size. Although “normalized” versions of some network statistics exist, we demonstrate via simulation why direct comparison is often inappropriate. We consider normalizing network statistics relative to a simple fully parameterized reference distribution and demonstrate via simulation how this is an improvement over direct comparison, but still sometimes problematic. We propose a new adjustment method based on a reference distribution constructed as a mixture model of random graphs which reflect the dependence structure exhibited in the observed networks. We show that using simple Bernoulli models as mixture components in this reference distribution can provide adjusted network statistics that are relatively comparable across different network sizes but still describe interesting features of networks, and that this can be accomplished at relatively low computational expense. Finally, we apply this methodology to a collection of ecological networks derived from the Los Angeles Family and Neighborhood Survey activity location data. PMID:27721556
Whole-Range Assessment: A Simple Method for Analysing Allelopathic Dose-Response Data
An, Min; Pratley, J. E.; Haig, T.; Liu, D.L.
2005-01-01
Based on the typical biological responses of an organism to allelochemicals (hormesis), concepts of whole-range assessment and inhibition index were developed for improved analysis of allelopathic data. Examples of their application are presented using data drawn from the literature. The method is concise and comprehensive, and makes data grouping and multiple comparisons simple, logical, and possible. It improves data interpretation, enhances research outcomes, and is a statistically efficient summary of the plant response profiles. PMID:19330165
Energy Savings Analysis for Energy Monitoring and Control Systems
1995-01-01
for evaluating design and construction quality, and for studying the effectiveness of air-tightening AC retrofits. No simple relationship...These models of residential infiltration are based on statistical fits of...Energy Resource Center (1983) include information on air tightening in...
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yoo, Jun Soo
The bubble departure diameter and bubble release frequency were obtained through analysis of TAMU subcooled flow boiling experimental data. Numerous images of bubbles at departure were analyzed for each experimental condition to obtain reliable statistics for the measured bubble parameters. The results are provided in this report with a brief discussion.
New approach in the quantum statistical parton distribution
NASA Astrophysics Data System (ADS)
Sohaily, Sozha; Vaziri (Khamedi), Mohammad
2017-12-01
An attempt to find simple parton distribution functions (PDFs) based on a quantum statistical approach is presented. The PDFs described by the statistical model have very interesting physical properties which help in understanding the structure of partons. The longitudinal parts of the distribution functions are obtained by applying the maximum entropy principle. An interesting and simple approach to determining the statistical variables exactly, without fitting and fixing parameters, is surveyed. Analytic expressions for the x-dependent PDFs are obtained in the whole x region [0, 1], and the computed distributions are consistent with the experimental observations. The agreement with experimental data gives a robust confirmation of the presented simple statistical model.
van Rhee, Henk; Hak, Tony
2017-01-01
We present a new tool for meta‐analysis, Meta‐Essentials, which is free of charge and easy to use. In this paper, we introduce the tool and compare its features to other tools for meta‐analysis. We also provide detailed information on the validation of the tool. Although free of charge and simple, Meta‐Essentials automatically calculates effect sizes from a wide range of statistics and can be used for a wide range of meta‐analysis applications, including subgroup analysis, moderator analysis, and publication bias analyses. The confidence interval of the overall effect is automatically based on the Knapp‐Hartung adjustment of the DerSimonian‐Laird estimator. However, more advanced meta‐analysis methods such as meta‐analytical structural equation modelling and meta‐regression with multiple covariates are not available. In summary, Meta‐Essentials may prove a valuable resource for meta‐analysts, including researchers, teachers, and students. PMID:28801932
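The core computation that Meta-Essentials automates can be written out in a few lines of base R: the DerSimonian-Laird estimate of between-study variance followed by the Knapp-Hartung adjustment of the confidence interval. The effect sizes and variances below are illustrative.

```r
# Random-effects meta-analysis: DerSimonian-Laird tau^2 with a Knapp-Hartung CI.
yi <- c(0.30, 0.10, 0.45, 0.25, 0.05)    # study effect sizes (illustrative)
vi <- c(0.04, 0.03, 0.06, 0.05, 0.02)    # within-study variances
k  <- length(yi)

w    <- 1 / vi
Q    <- sum(w * (yi - sum(w * yi) / sum(w))^2)
tau2 <- max(0, (Q - (k - 1)) / (sum(w) - sum(w^2) / sum(w)))   # DerSimonian-Laird

wr <- 1 / (vi + tau2)                    # random-effects weights
mu <- sum(wr * yi) / sum(wr)             # pooled effect

# Knapp-Hartung: rescaled variance and a t distribution with k - 1 degrees of freedom
q_kh  <- sum(wr * (yi - mu)^2) / (k - 1)
se_kh <- sqrt(q_kh / sum(wr))
ci    <- mu + c(-1, 1) * qt(0.975, df = k - 1) * se_kh
c(estimate = mu, lower = ci[1], upper = ci[2])
```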
Survival analysis in hematologic malignancies: recommendations for clinicians
Delgado, Julio; Pereira, Arturo; Villamor, Neus; López-Guillermo, Armando; Rozman, Ciril
2014-01-01
The widespread availability of statistical packages has undoubtedly helped hematologists worldwide in the analysis of their data, but has also led to the inappropriate use of statistical methods. In this article, we review some basic concepts of survival analysis and also make recommendations about how and when to perform each particular test using SPSS, Stata and R. In particular, we describe a simple way of defining cut-off points for continuous variables and the appropriate and inappropriate uses of the Kaplan-Meier method and Cox proportional hazard regression models. We also provide practical advice on how to check the proportional hazards assumption and briefly review the role of relative survival and multiple imputation. PMID:25176982
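As a minimal illustration of one of the methods reviewed above, the following NumPy sketch computes the Kaplan-Meier product-limit estimate for right-censored follow-up times; it is not the authors' SPSS, Stata, or R code, and the toy data are invented.

```python
import numpy as np

def kaplan_meier(time, event):
    """Product-limit estimate of the survival function S(t) for
    right-censored data (event = 1 if observed, 0 if censored)."""
    time = np.asarray(time, float)
    event = np.asarray(event, int)
    order = np.argsort(time)
    time, event = time[order], event[order]
    surv, s = [], 1.0
    for t in np.unique(time[event == 1]):            # distinct event times
        at_risk = np.sum(time >= t)                  # n_i
        deaths = np.sum((time == t) & (event == 1))  # d_i
        s *= 1.0 - deaths / at_risk                  # S(t) = prod(1 - d_i / n_i)
        surv.append((float(t), s))
    return surv

# Toy example: follow-up times in months, 0 marks a censored observation
print(kaplan_meier([6, 7, 10, 15, 19, 25], [1, 0, 1, 1, 0, 1]))
```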
Statistical methods for astronomical data with upper limits. I - Univariate distributions
NASA Technical Reports Server (NTRS)
Feigelson, E. D.; Nelson, P. I.
1985-01-01
The statistical treatment of univariate censored data is discussed. A heuristic derivation of the Kaplan-Meier maximum-likelihood estimator from first principles is presented which results in an expression amenable to analytic error analysis. Methods for comparing two or more censored samples are given along with simple computational examples, stressing the fact that most astronomical problems involve upper limits while the standard mathematical methods require lower limits. The application of univariate survival analysis to six data sets in the recent astrophysical literature is described, and various aspects of the use of survival analysis in astronomy, such as the limitations of various two-sample tests and the role of parametric modelling, are discussed.
2008-07-07
analyzing multivariate data sets. The system was developed using the Java Development Kit (JDK) version 1.5, and it yields interactive performance on a... script and captures output from MATLAB's "regress" and "stepwisefit" utilities, which perform simple and stepwise regression, respectively. The MATLAB...Statistical Association, vol. 85, no. 411, pp. 664-675, 1990. [9] H. Hauser, F. Ledermann, and H. Doleisch, "Angular brushing of extended parallel coordinates
Yuan, Zhongshang; Liu, Hong; Zhang, Xiaoshuai; Li, Fangyu; Zhao, Jinghua; Zhang, Furen; Xue, Fuzhong
2013-01-01
Currently, the genetic variants identified by genome-wide association studies (GWAS) generally account for only a small proportion of the total heritability of complex disease. One crucial reason is the underutilization of gene-gene joint effects commonly encountered in GWAS, which include both main effects and co-association. However, gene-gene co-association is often vaguely placed within the framework of gene-gene interaction. From the causal graph perspective, we elucidate in detail the concept and rationale of gene-gene co-association as well as its relationship with traditional gene-gene interaction, and propose two simple statistics based on Fisher's r-to-z transformation to detect it. Three series of simulations further highlight that gene-gene co-association refers to the extent to which the joint effects of two genes differ from the main effects, arising not only from traditional interaction under the nearly independent condition but also from the correlation between the two genes. The proposed statistics are more powerful than logistic regression under various situations, are unaffected by linkage disequilibrium, and maintain an acceptable false positive rate as long as the reasonable GWAS data analysis roadmap is strictly followed. Furthermore, an application to gene pathway analysis associated with leprosy confirms in practice that the proposed gene-gene co-association concepts, as well as the corresponding statistics, are strongly in line with reality. PMID:23923021
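The abstract does not spell out the two proposed statistics, but the Fisher r-to-z machinery they build on is standard: compare the correlation between two genotype scores in cases with that in controls. The sketch below shows that generic two-sample comparison; the function name, the additive genotype coding, and the toy data are assumptions.

```python
import numpy as np
from scipy import stats

def co_association_z(g1_cases, g2_cases, g1_controls, g2_controls):
    """Compare the correlation between two genotype scores in cases versus
    controls using Fisher's r-to-z transformation (generic two-sample form)."""
    r_case = np.corrcoef(g1_cases, g2_cases)[0, 1]
    r_ctrl = np.corrcoef(g1_controls, g2_controls)[0, 1]
    z_case, z_ctrl = np.arctanh(r_case), np.arctanh(r_ctrl)   # r-to-z transform
    n1, n2 = len(g1_cases), len(g1_controls)
    se = np.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z_case - z_ctrl) / se
    return z, 2 * stats.norm.sf(abs(z))                       # two-sided p-value

# Toy example: additively coded genotypes (0/1/2) for two SNPs
rng = np.random.default_rng(0)
print(co_association_z(rng.integers(0, 3, 500), rng.integers(0, 3, 500),
                       rng.integers(0, 3, 500), rng.integers(0, 3, 500)))
```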
A critique of Rasch residual fit statistics.
Karabatsos, G
2000-01-01
In test analysis involving the Rasch model, a large degree of importance is placed on the "objective" measurement of individual abilities and item difficulties. The degree to which the objectivity properties are attained, of course, depends on the degree to which the data fit the Rasch model. It is therefore important to utilize fit statistics that accurately and reliably detect the person-item response inconsistencies that threaten the measurement objectivity of persons and items. Given this argument, it is somewhat surprising that far more emphasis is placed on the objective measurement of persons and items than on the measurement quality of Rasch fit statistics. This paper provides a critical analysis of the residual fit statistics of the Rasch model, arguably the most often used fit statistics, in an effort to illustrate that the task of Rasch fit analysis is not as simple and straightforward as it appears to be. The faulty statistical properties of the residual fit statistics do not allow either a convenient or a straightforward approach to Rasch fit analysis. For instance, given a residual fit statistic, the use of a single minimum critical value for misfit diagnosis across different testing situations, where the situations vary in sample and test properties, leads to both the overdetection and underdetection of misfit. To improve this situation, it is argued that psychometricians need to implement residual-free Rasch fit statistics that are based on the number of Guttman response errors, or use indices that are statistically optimal in detecting measurement disturbances.
Biological Parametric Mapping: A Statistical Toolbox for Multi-Modality Brain Image Analysis
Casanova, Ramon; Ryali, Srikanth; Baer, Aaron; Laurienti, Paul J.; Burdette, Jonathan H.; Hayasaka, Satoru; Flowers, Lynn; Wood, Frank; Maldjian, Joseph A.
2006-01-01
In recent years multiple brain MR imaging modalities have emerged; however, analysis methodologies have mainly remained modality specific. In addition, when comparing across imaging modalities, most researchers have been forced to rely on simple region-of-interest type analyses, which do not allow the voxel-by-voxel comparisons necessary to answer more sophisticated neuroscience questions. To overcome these limitations, we developed a toolbox for multimodal image analysis called biological parametric mapping (BPM), based on a voxel-wise use of the general linear model. The BPM toolbox incorporates information obtained from other modalities as regressors in a voxel-wise analysis, thereby permitting investigation of more sophisticated hypotheses. The BPM toolbox has been developed in MATLAB with a user friendly interface for performing analyses, including voxel-wise multimodal correlation, ANCOVA, and multiple regression. It has a high degree of integration with the SPM (statistical parametric mapping) software relying on it for visualization and statistical inference. Furthermore, statistical inference for a correlation field, rather than a widely-used T-field, has been implemented in the correlation analysis for more accurate results. An example with in-vivo data is presented demonstrating the potential of the BPM methodology as a tool for multimodal image analysis. PMID:17070709
Journal of Naval Science. Volume 2, Number 1
1976-01-01
has defined a probability distribution function which fits this type of data and forms the basis for statistical analysis of test results (see...Conditions to Assess the Performance of Fire-Resistant Fluids’. Wear, 28 (1974) 29. J.N.S., Vol. 2, No. 1 APPENDIX A Analysis of Fatigue Test Data...used to produce the impulse response and the equipment required for the analysis is relatively simple. The methods that must be used to produce
NASA Astrophysics Data System (ADS)
Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Sacks, David B.; Yu, Yi-Kuo
2018-06-01
Rapid and accurate identification and classification of microorganisms is of paramount importance to public health and safety. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is complicating correct microbial identification even in a simple sample due to the large number of candidates present. To properly untwine candidate microbes in samples containing one or more microbes, one needs to go beyond apparent morphology or simple "fingerprinting"; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptide-centric representations of microbes to better separate them and by augmenting our earlier analysis method that yields accurate statistical significance. Here, we present an updated analysis workflow that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using 226 MS/MS publicly available data files (each containing from 2500 to nearly 100,000 MS/MS spectra) and 4000 additional MS/MS data files, that the updated workflow can correctly identify multiple microbes at the genus and often the species level for samples containing more than one microbe. We have also shown that the proposed workflow computes accurate statistical significances, i.e., E values for identified peptides and unified E values for identified microbes. Our updated analysis workflow MiCId, a freely available software for Microorganism Classification and Identification, is available for download at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
The Importance of Proving the Null
Gallistel, C. R.
2010-01-01
Null hypotheses are simple, precise, and theoretically important. Conventional statistical analysis cannot support them; Bayesian analysis can. The challenge in a Bayesian analysis is to formulate a suitably vague alternative, because the vaguer the alternative is (the more it spreads out the unit mass of prior probability), the more the null is favored. A general solution is a sensitivity analysis: Compute the odds for or against the null as a function of the limit(s) on the vagueness of the alternative. If the odds on the null approach 1 from above as the hypothesized maximum size of the possible effect approaches 0, then the data favor the null over any vaguer alternative to it. The simple computations and the intuitive graphic representation of the analysis are illustrated by the analysis of diverse examples from the current literature. They pose 3 common experimental questions: (a) Are 2 means the same? (b) Is performance at chance? (c) Are factors additive? PMID:19348549
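A minimal sketch of the kind of sensitivity analysis described, assuming a Gaussian likelihood for an observed mean difference and a uniform alternative over [-L, L]; as L shrinks to 0 the odds on the null fall to 1 from above, as the abstract notes. The summary numbers are invented and this is not the paper's exact computation.

```python
from scipy import stats

def null_odds(d, se, max_effect):
    """Odds in favour of the null (effect = 0) versus a vague alternative
    spreading prior mass uniformly over [-max_effect, +max_effect],
    given an observed difference d with standard error se."""
    like_null = stats.norm.pdf(d, loc=0.0, scale=se)
    L = max_effect
    # Marginal likelihood under the alternative: likelihood averaged over the prior
    like_alt = (stats.norm.cdf(L, loc=d, scale=se)
                - stats.norm.cdf(-L, loc=d, scale=se)) / (2.0 * L)
    return like_null / like_alt

# Sensitivity analysis: odds on the null as the assumed maximum effect shrinks
d, se = 0.3, 1.0   # invented summary data
for L in [10, 5, 2, 1, 0.5, 0.1]:
    print(L, round(null_odds(d, se, L), 3))
```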
Ji, Jun; Ling, Jeffrey; Jiang, Helen; Wen, Qiaojun; Whitin, John C; Tian, Lu; Cohen, Harvey J; Ling, Xuefeng B
2013-03-23
Mass spectrometry (MS) has evolved to become the primary high-throughput tool for proteomics-based biomarker discovery. Multiple challenges in protein MS data analysis nonetheless remain: management of large-scale and complex data sets; MS peak identification and indexing; and high-dimensional differential analysis of peaks with false discovery rate (FDR) control based on concurrent statistical tests. "Turnkey" solutions are needed for biomarker investigations to rapidly process MS data sets and identify statistically significant peaks for subsequent validation. Here we present an efficient and effective solution, which provides experimental biologists easy access to "cloud" computing capabilities to analyze MS data. The web portal can be accessed at http://transmed.stanford.edu/ssa/. The web application supports online uploading and analysis of large-scale MS data through a simple user interface. This bioinformatic tool will facilitate the discovery of potential protein biomarkers using MS.
Advanced statistics: linear regression, part I: simple linear regression.
Marill, Keith A
2004-01-01
Simple linear regression is a mathematical technique used to model the relationship between a single independent predictor variable and a single dependent outcome variable. In this, the first of a two-part series exploring concepts in linear regression analysis, the four fundamental assumptions and the mechanics of simple linear regression are reviewed. The most common technique used to derive the regression line, the method of least squares, is described. The reader will be acquainted with other important concepts in simple linear regression, including: variable transformations, dummy variables, relationship to inference testing, and leverage. Simplified clinical examples with small datasets and graphic models are used to illustrate the points. This will provide a foundation for the second article in this series: a discussion of multiple linear regression, in which there are multiple predictor variables.
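For readers who want to see the least-squares machinery the article reviews, here is a minimal sketch; the toy dose-response data are invented and the function name is an assumption.

```python
import numpy as np

def least_squares_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b0 = y.mean() - b1 * x.mean()
    residuals = y - (b0 + b1 * x)
    r2 = 1.0 - residuals.var() / y.var()   # proportion of variance explained
    return b0, b1, r2

# Invented clinical-style toy data: predictor (x) versus outcome (y)
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 2.9, 3.8, 5.2, 5.9, 7.1]
print(least_squares_line(x, y))
```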
Prediction during statistical learning, and implications for the implicit/explicit divide
Dale, Rick; Duran, Nicholas D.; Morehead, J. Ryan
2012-01-01
Accounts of statistical learning, both implicit and explicit, often invoke predictive processes as central to learning, yet practically all experiments employ non-predictive measures during training. We argue that the common theoretical assumption of anticipation and prediction needs clearer, more direct evidence for it during learning. We offer a novel experimental context to explore prediction, and report results from a simple sequential learning task designed to promote predictive behaviors in participants as they responded to a short sequence of simple stimulus events. Predictive tendencies in participants were measured using their computer mouse, the trajectories of which served as a means of tapping into predictive behavior while participants were exposed to very short and simple sequences of events. A total of 143 participants were randomly assigned to stimulus sequences along a continuum of regularity. Analysis of computer-mouse trajectories revealed that (a) participants almost always anticipate events in some manner, (b) participants exhibit two stable patterns of behavior, either reacting to vs. predicting future events, (c) the extent to which participants predict relates to performance on a recall test, and (d) explicit reports of perceiving patterns in the brief sequence correlates with extent of prediction. We end with a discussion of implicit and explicit statistical learning and of the role prediction may play in both kinds of learning. PMID:22723817
NASA Astrophysics Data System (ADS)
Ghosh, Dipak; Sarkar, Sharmila; Sen, Sanjib; Roy, Jaya
1995-06-01
In this paper the behavior of factorial moments with rapidity window size, which is usually explained in terms of "intermittency," has been interpreted by simple quantum statistical properties of the emitting system using the concept of a "modified two-source model" as recently proposed by Ghosh and Sarkar [Phys. Lett. B 278, 465 (1992)]. The analysis has been performed using our own data on 16O-Ag/Br and 24Mg-Ag/Br interactions in the few tens of GeV energy regime.
The Evolution of Organization Analysis in ASQ, 1959-1979.
ERIC Educational Resources Information Center
Daft, Richard L.
1980-01-01
During the period 1959-1979, a sharp trend toward low-variety statistical languages took place, which may represent an organizational mapping phase in which simple, quantifiable relationships have been formally defined and measured. A broader scope of research languages will be needed in the future. (Author/IRT)
Experimental Analysis of Cell Function Using Cytoplasmic Streaming
ERIC Educational Resources Information Center
Janssens, Peter; Waldhuber, Megan
2012-01-01
This laboratory exercise investigates the phenomenon of cytoplasmic streaming in the fresh water alga "Nitella". Students use the fungal toxin cytochalasin D, an inhibitor of actin polymerization, to investigate the mechanism of streaming. Students use simple statistical methods to analyze their data. Typical student data are provided. (Contains 3…
A simple white noise analysis of neuronal light responses.
Chichilnisky, E J
2001-05-01
A white noise technique is presented for estimating the response properties of spiking visual system neurons. The technique is simple, robust, efficient and well suited to simultaneous recordings from multiple neurons. It provides a complete and easily interpretable model of light responses even for neurons that display a common form of response nonlinearity that precludes classical linear systems analysis. A theoretical justification of the technique is presented that relies only on elementary linear algebra and statistics. Implementation is described with examples. The technique and the underlying model of neural responses are validated using recordings from retinal ganglion cells, and in principle are applicable to other neurons. Advantages and disadvantages of the technique relative to classical approaches are discussed.
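The spike-triggered average at the heart of this kind of white noise analysis can be computed in a few lines. The simulated cell below (filter shape, rectifying nonlinearity, firing-rate scale) is entirely invented for illustration and is not the paper's retinal data.

```python
import numpy as np

def spike_triggered_average(stimulus, spikes, window):
    """Average the 'window' stimulus frames ending at each spike frame."""
    sta, total = np.zeros(window), 0
    for t in range(window - 1, len(stimulus)):
        if spikes[t] > 0:
            sta += spikes[t] * stimulus[t - window + 1:t + 1]
            total += spikes[t]
    return sta / total

# Simulate a cell: linear filter followed by a rectifying nonlinearity
rng = np.random.default_rng(0)
stim = rng.standard_normal(20000)                     # white-noise stimulus
filt = np.exp(-np.arange(20)[::-1] / 5.0)             # assumed toy filter (largest weight at lag 0)
drive = np.convolve(stim, filt[::-1], mode="full")[:len(stim)]
spikes = rng.poisson(0.05 * np.maximum(drive, 0.0))   # Poisson spiking
sta = spike_triggered_average(stim, spikes, window=20)
print(np.round(sta, 2))                               # recovers the filter shape up to scale
```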
NASA Technical Reports Server (NTRS)
Thanedar, B. D.
1972-01-01
A simple repetitive calculation was used to investigate what happens to the field in terms of the signal paths of disturbances originating from the energy source. The computation allowed the field to be reconstructed as a function of space and time on a statistical basis. The suggested Monte Carlo method responds to the need for a numerical method to supplement analytical methods of solution, which are valid only when the boundaries have simple shapes and not for an arbitrarily bounded medium. For the analysis, a suitable model was created from which was developed an algorithm for the estimation of acoustic pressure variations in the region under investigation. The validity of the technique was demonstrated by analysis of simple physical models with the aid of a digital computer. The Monte Carlo method is applicable to a medium which is homogeneous and is enclosed by either rectangular or curved boundaries.
Reif, David M.; Israel, Mark A.; Moore, Jason H.
2007-01-01
The biological interpretation of gene expression microarray results is a daunting challenge. For complex diseases such as cancer, wherein the body of published research is extensive, the incorporation of expert knowledge provides a useful analytical framework. We have previously developed the Exploratory Visual Analysis (EVA) software for exploring data analysis results in the context of annotation information about each gene, as well as biologically relevant groups of genes. We present EVA as a flexible combination of statistics and biological annotation that provides a straightforward visual interface for the interpretation of microarray analyses of gene expression in the most commonly occurring class of brain tumors, glioma. We demonstrate the utility of EVA for the biological interpretation of statistical results by analyzing publicly available gene expression profiles of two important glial tumors. The results of a statistical comparison between 21 malignant, high-grade glioblastoma multiforme (GBM) tumors and 19 indolent, low-grade pilocytic astrocytomas were analyzed using EVA. By using EVA to examine the results of a relatively simple statistical analysis, we were able to identify tumor class-specific gene expression patterns having both statistical and biological significance. Our interactive analysis highlighted the potential importance of genes involved in cell cycle progression, proliferation, signaling, adhesion, migration, motility, and structure, as well as candidate gene loci on a region of Chromosome 7 that has been implicated in glioma. Because EVA does not require statistical or computational expertise and has the flexibility to accommodate any type of statistical analysis, we anticipate EVA will prove a useful addition to the repertoire of computational methods used for microarray data analysis. EVA is available at no charge to academic users and can be found at http://www.epistasis.org. PMID:19390666
Shen, Feng; Du, Wenbin; Kreutz, Jason E; Fok, Alice; Ismagilov, Rustem F
2010-10-21
This paper describes a SlipChip to perform digital PCR in a very simple and inexpensive format. The fluidic path for introducing the sample combined with the PCR mixture was formed using elongated wells in the two plates of the SlipChip designed to overlap during sample loading. This fluidic path was broken up by simple slipping of the two plates that removed the overlap among wells and brought each well in contact with a reservoir preloaded with oil to generate 1280 reaction compartments (2.6 nL each) simultaneously. After thermal cycling, end-point fluorescence intensity was used to detect the presence of nucleic acid. Digital PCR on the SlipChip was tested quantitatively by using Staphylococcus aureus genomic DNA. As the concentration of the template DNA in the reaction mixture was diluted, the fraction of positive wells decreased as expected from the statistical analysis. No cross-contamination was observed during the experiments. At the extremes of the dynamic range of digital PCR the standard confidence interval determined using a normal approximation of the binomial distribution is not satisfactory. Therefore, statistical analysis based on the score method was used to establish these confidence intervals. The SlipChip provides a simple strategy to count nucleic acids by using PCR. It may find use in research applications such as single cell analysis, prenatal diagnostics, and point-of-care diagnostics. The SlipChip could become valuable for diagnostics, including applications in resource-limited areas, after integration with isothermal nucleic acid amplification technologies and visual readout.
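The counting step described above, converting the fraction of positive wells to a template concentration with a score-method (Wilson) confidence interval, can be sketched as follows. The well count and well volume come from the abstract; the number of positive wells and the function name are invented.

```python
import math

def digital_pcr_estimate(positive, total=1280, well_volume_nl=2.6, z=1.96):
    """Concentration estimate from the fraction of positive wells, with a
    Wilson score confidence interval for that fraction."""
    p_hat = positive / total
    # Wilson (score-method) interval for the binomial proportion
    denom = 1.0 + z**2 / total
    centre = (p_hat + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / total + z**2 / (4 * total**2)) / denom
    lo, hi = max(centre - half, 0.0), min(centre + half, 1.0 - 1e-9)
    # Poisson correction: mean template copies per well = -ln(1 - p)
    conc = lambda p: -math.log(1.0 - p) / well_volume_nl   # copies per nL
    return conc(p_hat), (conc(lo), conc(hi))

print(digital_pcr_estimate(positive=320))   # point estimate and 95% interval
```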
Janssen, Dirk P
2012-03-01
Psychologists, psycholinguists, and other researchers using language stimuli have been struggling for more than 30 years with the problem of how to analyze experimental data that contain two crossed random effects (items and participants). The classical analysis of variance does not apply; alternatives have been proposed but have failed to catch on, and a statistically unsatisfactory procedure of using two approximations (known as F(1) and F(2)) has become the standard. A simple and elegant solution using mixed model analysis has been available for 15 years, and recent improvements in statistical software have made mixed models analysis widely available. The aim of this article is to increase the use of mixed models by giving a concise practical introduction and by giving clear directions for undertaking the analysis in the most popular statistical packages. The article also introduces the DJMIXED add-on package for SPSS, which makes entering the models and reporting their results as straightforward as possible.
ERIC Educational Resources Information Center
Hannan, Michael T.
This document is part of a series of chapters described in SO 011 759. Stochastic models for the sociological analysis of change and the change process in quantitative variables are presented. The author lays groundwork for the statistical treatment of simple stochastic differential equations (SDEs) and discusses some of the continuities of…
Salvatore, Stefania; Bramness, Jørgen Gustav; Reid, Malcolm J; Thomas, Kevin Victor; Harman, Christopher; Røislien, Jo
2015-01-01
Wastewater-based epidemiology (WBE) is a new methodology for estimating the drug load in a population. Simple summary statistics and specification tests have typically been used to analyze WBE data, comparing differences between weekday and weekend loads. Such standard statistical methods may, however, overlook important nuanced information in the data. In this study, we apply functional data analysis (FDA) to WBE data and compare the results to those obtained from more traditional summary measures. We analysed temporal WBE data from 42 European cities, using sewage samples collected daily for one week in March 2013. For each city, the main temporal features of two selected drugs were extracted using functional principal component (FPC) analysis, along with simpler measures such as the area under the curve (AUC). The individual cities' scores on each of the temporal FPCs were then used as outcome variables in multiple linear regression analysis with various city and country characteristics as predictors. The results were compared to those of functional analysis of variance (FANOVA). The first three FPCs explained more than 99% of the temporal variation. The first component (FPC1) represented the level of the drug load, while the second and third temporal components represented the level and the timing of a weekend peak. AUC was highly correlated with FPC1, but other temporal characteristics were not captured by the simple summary measures. FANOVA was less flexible than the FPCA-based regression, though it showed concordant results. Geographical location was the main predictor for the general level of the drug load. FDA of WBE data extracts more detailed information about drug load patterns during the week, which is not identified by more traditional statistical methods. The results also suggest that regression based on FPC results is a valuable addition to FANOVA for estimating associations between temporal patterns and covariate information.
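A crude discrete version of the FPC extraction described above can be sketched with an SVD of the centred city-by-day matrix; the resulting city scores could then feed a regression as in the study. No smoothing is applied here, the toy weekly curves are invented, and this is not the authors' analysis code.

```python
import numpy as np

def functional_pcs(loads, n_components=3):
    """Crude functional PCA of city-by-day load curves.
    loads : (n_cities, n_days) array of daily loads for one week."""
    X = np.asarray(loads, float)
    mean_curve = X.mean(axis=0)
    centred = X - mean_curve
    U, S, Vt = np.linalg.svd(centred, full_matrices=False)
    components = Vt[:n_components]            # temporal FPCs (k x n_days)
    scores = centred @ components.T           # one score per city per component
    explained = S[:n_components] ** 2 / np.sum(S ** 2)
    return mean_curve, components, scores, explained

# Toy data: 8 "cities", 7 days, with a weekend bump of varying size
rng = np.random.default_rng(1)
base = np.array([1, 1, 1, 1, 1.2, 1.8, 1.6])
loads = base * rng.uniform(0.5, 2.0, size=(8, 1)) + 0.05 * rng.standard_normal((8, 7))
print(functional_pcs(loads)[3])               # variance explained by the first components
```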
Statistical science: a grammar for research.
Cox, David R
2017-06-01
I greatly appreciate the invitation to give this lecture with its century-long history. The title is a warning that the lecture is rather discursive and not highly focused and technical. The theme is simple: statistical thinking provides a unifying set of general ideas and specific methods relevant whenever appreciable natural variation is present. To be most fruitful, these ideas should merge seamlessly with subject-matter considerations. By contrast, there is sometimes a temptation to regard formal statistical analysis as a ritual to be added after the serious work has been done, a ritual to satisfy convention, referees, and regulatory agencies. I want implicitly to refute that idea.
Willis, Brian H; Riley, Richard D
2017-09-20
An important question for clinicians appraising a meta-analysis is: are the findings likely to be valid in their own practice? Does the reported effect accurately represent the effect that would occur in their own clinical population? To this end we advance the concept of statistical validity, where the parameter being estimated equals the corresponding parameter for a new independent study. Using a simple ('leave-one-out') cross-validation technique, we demonstrate how meta-analysis estimates may be tested for statistical validity using a new validation statistic, Vn, and derive its distribution. We compare this with the usual approach of investigating heterogeneity in meta-analyses and demonstrate the link between statistical validity and homogeneity. Using a simulation study, the properties of Vn and the Q statistic are compared for univariate random effects meta-analysis and a tailored meta-regression model, where information from the setting (included as model covariates) is used to calibrate the summary estimate to the setting of application. Their properties are found to be similar when there are 50 studies or more, whereas for fewer studies Vn has greater power but also a higher type 1 error rate than Q. The power and type 1 error rate of Vn are also shown to depend on the within-study variance, the between-study variance, the study sample size, and the number of studies in the meta-analysis. Finally, we apply Vn to two published meta-analyses and conclude that it usefully augments standard methods when deciding upon the likely validity of summary meta-analysis estimates in clinical practice. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd.
The Length of a Pestle: A Class Exercise in Measurement and Statistical Analysis.
ERIC Educational Resources Information Center
O'Reilly, James E.
1986-01-01
Outlines the simple exercise of measuring the length of an object as a concrete paradigm of the entire process of making chemical measurements and treating the resulting data. Discusses the procedure, significant figures, measurement error, spurious data, rejection of results, precision and accuracy, and student responses. (TW)
Code of Federal Regulations, 2010 CFR
2010-10-01
... by ACF statistical staff from the Adoption and Foster Care Analysis and Reporting System (AFCARS... primary review utilizing probability sampling methodologies. Usually, the chosen methodology will be simple random sampling, but other probability samples may be utilized, when necessary and appropriate. (3...
A Simple Method to Control Positive Baseline Trend within Data Nonoverlap
ERIC Educational Resources Information Center
Parker, Richard I.; Vannest, Kimberly J.; Davis, John L.
2014-01-01
Nonoverlap is widely used as a statistical summary of data; however, these analyses rarely correct unwanted positive baseline trend. This article presents and validates the graph rotation for overlap and trend (GROT) technique, a hand calculation method for controlling positive baseline trend within an analysis of data nonoverlap. GROT is…
Quantifying predictability in a model with statistical features of the atmosphere
Kleeman, Richard; Majda, Andrew J.; Timofeyev, Ilya
2002-01-01
The Galerkin truncated inviscid Burgers equation has recently been shown by the authors to be a simple model with many degrees of freedom, with many statistical properties similar to those occurring in dynamical systems relevant to the atmosphere. These properties include long time-correlated, large-scale modes of low frequency variability and short time-correlated “weather modes” at smaller scales. The correlation scaling in the model extends over several decades and may be explained by a simple theory. Here a thorough analysis of the nature of predictability in the idealized system is developed by using a theoretical framework developed by R.K. This analysis is based on a relative entropy functional that has been shown elsewhere by one of the authors to measure the utility of statistical predictions precisely. The analysis is facilitated by the fact that most relevant probability distributions are approximately Gaussian if the initial conditions are assumed to be so. Rather surprisingly this holds for both the equilibrium (climatological) and nonequilibrium (prediction) distributions. We find that in most cases the absolute difference in the first moments of these two distributions (the “signal” component) is the main determinant of predictive utility variations. Contrary to conventional belief in the ensemble prediction area, the dispersion of prediction ensembles is generally of secondary importance in accounting for variations in utility associated with different initial conditions. This conclusion has potentially important implications for practical weather prediction, where traditionally most attention has focused on dispersion and its variability. PMID:12429863
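In the univariate Gaussian case, the relative-entropy measure of predictive utility used in this line of work splits cleanly into a "signal" term (shift of the prediction mean away from climatology) and a "dispersion" term (change of spread). The sketch below shows that standard decomposition with invented numbers; it is not the paper's multivariate computation.

```python
import numpy as np

def predictive_utility(mu_p, var_p, mu_c, var_c):
    """Relative entropy of a Gaussian prediction N(mu_p, var_p) with respect
    to the Gaussian climatology N(mu_c, var_c), split into 'signal'
    (difference of means) and 'dispersion' (difference of spreads) parts."""
    signal = 0.5 * (mu_p - mu_c) ** 2 / var_c
    dispersion = 0.5 * (np.log(var_c / var_p) + var_p / var_c - 1.0)
    return signal + dispersion, signal, dispersion

# Example: a prediction whose mean shift dominates its utility
total, sig, disp = predictive_utility(mu_p=0.8, var_p=0.9, mu_c=0.0, var_c=1.0)
print(total, sig, disp)
```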
Mapping Quantitative Traits in Unselected Families: Algorithms and Examples
Dupuis, Josée; Shi, Jianxin; Manning, Alisa K.; Benjamin, Emelia J.; Meigs, James B.; Cupples, L. Adrienne; Siegmund, David
2009-01-01
Linkage analysis has been widely used to identify from family data genetic variants influencing quantitative traits. Common approaches have both strengths and limitations. Likelihood ratio tests typically computed in variance component analysis can accommodate large families but are highly sensitive to departure from normality assumptions. Regression-based approaches are more robust but their use has primarily been restricted to nuclear families. In this paper, we develop methods for mapping quantitative traits in moderately large pedigrees. Our methods are based on the score statistic which in contrast to the likelihood ratio statistic, can use nonparametric estimators of variability to achieve robustness of the false positive rate against departures from the hypothesized phenotypic model. Because the score statistic is easier to calculate than the likelihood ratio statistic, our basic mapping methods utilize relatively simple computer code that performs statistical analysis on output from any program that computes estimates of identity-by-descent. This simplicity also permits development and evaluation of methods to deal with multivariate and ordinal phenotypes, and with gene-gene and gene-environment interaction. We demonstrate our methods on simulated data and on fasting insulin, a quantitative trait measured in the Framingham Heart Study. PMID:19278016
Reliability Analysis of the Gradual Degradation of Semiconductor Devices.
1983-07-20
under the heading of linear models or linear statistical models [3, 4]. We have not used this material in this report. Assuming catastrophic failure when...assuming a catastrophic model. In this treatment we first modify our system loss formula and then proceed to the actual analysis. II. ANALYSIS OF... [table of unit failure times omitted] ...and are easily analyzed by simple linear regression. Since we have assumed a log normal/Arrhenius activation
Kaplan, Metin; Erol, Fatih Serhat; Bozgeyik, Zülküf; Koparan, Mehmet
2007-07-01
In the present study, the clinical effectiveness of a surgical procedure in which no draining tubes are installed following simple burr hole drainage and saline irrigation is investigated. Ten patients who had undergone operative intervention for unilateral chronic subdural hemorrhage, with a clinical grade of 2 and a hemorrhage thickness of 2 cm, were included in the study. The cerebral blood flow rates of the middle cerebral artery (MCA) were evaluated bilaterally with Doppler before and after surgery. All the cases underwent the operation using the simple burr hole drainage technique, without a drain, and with subsequent saline irrigation. Statistical analysis was performed with the Wilcoxon signed rank test (p<0.05). There was a pronounced decrease in the preoperative MCA blood flow in the hemisphere in which the hemorrhage had occurred (p=0.008). An increased PI value on the side of the hemorrhage also drew our attention (p=0.005). Postoperative MCA blood flow measurements showed a statistically significant improvement (p=0.005). Furthermore, the PI value showed normalization (p<0.05). The paresis and the level of consciousness improved in all cases. The simple burr hole drainage technique is sufficient for the improvement of cerebral blood flow and clinical recovery in patients with chronic subdural hemorrhage.
NASA Technical Reports Server (NTRS)
Levy, G.; Brown, R. A.
1986-01-01
A simple economical objective analysis scheme is devised and tested on real scatterometer data. It is designed to treat dense data such as those of the Seasat A Satellite Scatterometer (SASS) for individual or multiple passes, and preserves subsynoptic scale features. Errors are evaluated with the aid of sampling ('bootstrap') statistical methods. In addition, sensitivity tests have been performed which establish qualitative confidence in calculated fields of divergence and vorticity. The SASS wind algorithm could be improved; however, the data at this point are limited by instrument errors rather than analysis errors. The analysis error is typically negligible in comparison with the instrument error, but amounts to 30 percent of the instrument error in areas of strong wind shear. The scheme is very economical, and thus suitable for large volumes of dense data such as SASS data.
A Simple Illustration for the Need of Multiple Comparison Procedures
ERIC Educational Resources Information Center
Carter, Rickey E.
2010-01-01
Statistical adjustments to accommodate multiple comparisons are routinely covered in introductory statistical courses. The fundamental rationale for such adjustments, however, may not be readily understood. This article presents a simple illustration to help remedy this.
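The rationale for such adjustments can be reproduced in a short simulation: when m independent tests are each run at level alpha under a true null, the chance of at least one false positive is 1 - (1 - alpha)^m, which Bonferroni's alpha/m correction pulls back to roughly alpha. A sketch, not the article's own illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
m, alpha, n_sim = 10, 0.05, 2000

hits = 0
for _ in range(n_sim):
    # m independent two-sample t-tests per "experiment", all nulls true
    p = [stats.ttest_ind(rng.standard_normal(20), rng.standard_normal(20)).pvalue
         for _ in range(m)]
    hits += min(p) < alpha

print("empirical family-wise error rate:", hits / n_sim)
print("theoretical 1 - (1 - alpha)**m  :", 1 - (1 - alpha) ** m)      # about 0.40
print("after Bonferroni (alpha / m)    :", 1 - (1 - alpha / m) ** m)  # about 0.049
```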
Does daily nurse staffing match ward workload variability? Three hospitals' experiences.
Gabbay, Uri; Bukchin, Michael
2009-01-01
Nurse shortage and rising healthcare resource burdens mean that appropriate workforce use is imperative. This paper aims to evaluate whether daily nurse staffing meets ward workload needs. Nurse attendance and daily nurses' workload capacity in three hospitals were evaluated. Statistical process control was used to evaluate intra-ward nurse workload capacity and day-to-day variations. Statistical process control is a statistics-based method for process monitoring that uses charts with a predefined target measure and control limits. Standardization was performed for inter-ward analysis by converting ward-specific crude measures to ward-specific relative measures, dividing observed by expected values. Two charts, for acceptable and tolerable daily nurse workload intensity, were defined. Appropriate staffing indicators were defined as those exceeding predefined rates within acceptable and tolerable limits (50 percent and 80 percent, respectively). A total of 42 percent of the overall days fell within acceptable control limits and 71 percent within tolerable control limits. Appropriate staffing indicators were met in only 33 percent of wards regarding acceptable nurse workload intensity and in only 45 percent of wards regarding tolerable workloads. The study did not differentiate crude nurse attendance, and it did not take into account patient severity, since crude bed occupancy was used. Double statistical process control charts and particular staffing indicators were used, which is open to debate. Wards that met appropriate staffing indicators demonstrate the method's feasibility. Wards that did not meet appropriate staffing indicators demonstrate the importance of, and the need for, process evaluation and monitoring. The methods presented for monitoring daily staffing appropriateness are simple to implement, either for intra-ward day-to-day variation, using nurse workload capacity statistical process control charts, or for inter-ward evaluation, using a standardized measure of nurse workload intensity. The real challenge will be to develop planning systems and implement corrective interventions such as dynamic and flexible daily staffing, which will face difficulties and barriers. The paper fulfils the need for workforce utilization evaluation. A simple method using available data for evaluating daily staffing appropriateness, which is easy to implement and operate, is presented. The statistical process control method enables intra-ward evaluation, while standardization, by converting crude into relative measures, enables inter-ward analysis. The staffing indicator definitions enable performance evaluation. This original study uses statistical process control to develop simple standardization methods and applies straightforward statistical tools. The method is not limited to crude measures; rather, it can use weighted workload measures such as nursing acuity or weighted nurse level (i.e. grade/band).
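The paper's exact chart definitions (acceptable versus tolerable workload intensity) are not given in the abstract, so the sketch below only illustrates the general recipe: standardize each day as observed/expected workload and monitor it with an individuals-type control chart. All data, limits, and names are invented.

```python
import numpy as np

def staffing_spc(observed, expected, sigma_limit=3.0):
    """Individuals-type control chart for daily relative nurse workload
    (observed/expected): centre line, control limits, and the fraction of
    days falling within them."""
    ratio = np.asarray(observed, float) / np.asarray(expected, float)
    centre = ratio.mean()
    mr = np.abs(np.diff(ratio)).mean()      # moving-range estimate of variation
    sigma = mr / 1.128                      # d2 constant for moving ranges of size 2
    lcl, ucl = centre - sigma_limit * sigma, centre + sigma_limit * sigma
    in_control = np.mean((ratio >= lcl) & (ratio <= ucl))
    return centre, (lcl, ucl), in_control

# Invented month of daily data for one ward
rng = np.random.default_rng(3)
expected = np.full(30, 40.0)                # expected workload (nurse-hours)
observed = 40 + 6 * rng.standard_normal(30) # actual workload
print(staffing_spc(observed, expected))
```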
Object Classification Based on Analysis of Spectral Characteristics of Seismic Signal Envelopes
NASA Astrophysics Data System (ADS)
Morozov, Yu. V.; Spektor, A. A.
2017-11-01
A method for classifying moving objects having a seismic effect on the ground surface is proposed which is based on statistical analysis of the envelopes of received signals. The values of the components of the amplitude spectrum of the envelopes obtained applying Hilbert and Fourier transforms are used as classification criteria. Examples illustrating the statistical properties of spectra and the operation of the seismic classifier are given for an ensemble of objects of four classes (person, group of people, large animal, vehicle). It is shown that the computational procedures for processing seismic signals are quite simple and can therefore be used in real-time systems with modest requirements for computational resources.
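A minimal sketch of the envelope-spectrum features described above: take the signal envelope from the analytic signal (Hilbert transform) and use its low-frequency amplitude-spectrum components as classifier inputs. The toy "footstep" signal, sampling rate, and feature count are invented.

```python
import numpy as np
from scipy.signal import hilbert

def envelope_spectrum(x, fs, n_features=30):
    """Low-frequency amplitude spectrum of the signal envelope."""
    env = np.abs(hilbert(x))                  # envelope via the analytic signal
    env -= env.mean()
    amp = np.abs(np.fft.rfft(env)) / len(env)
    freqs = np.fft.rfftfreq(len(env), d=1.0 / fs)
    return freqs[1:n_features + 1], amp[1:n_features + 1]

# Toy seismic-like record: short bursts ("footsteps") at ~2 Hz in background noise
fs = 500
t = np.arange(0, 10, 1.0 / fs)
rng = np.random.default_rng(0)
bursts = (np.sin(2 * np.pi * 2 * t) > 0.95).astype(float)
x = bursts * np.sin(2 * np.pi * 60 * t) + 0.1 * rng.standard_normal(len(t))
freqs, feats = envelope_spectrum(x, fs)
print(freqs[np.argmax(feats)])                # dominant envelope frequency, near 2 Hz
```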
NASA Astrophysics Data System (ADS)
Pradeep, Krishna; Poiroux, Thierry; Scheer, Patrick; Juge, André; Gouget, Gilles; Ghibaudo, Gérard
2018-07-01
This work details the analysis of wafer level global process variability in 28 nm FD-SOI using split C-V measurements. The proposed approach initially evaluates the native on wafer process variability using efficient extraction methods on split C-V measurements. The on-wafer threshold voltage (VT) variability is first studied and modeled using a simple analytical model. Then, a statistical model based on the Leti-UTSOI compact model is proposed to describe the total C-V variability in different bias conditions. This statistical model is finally used to study the contribution of each process parameter to the total C-V variability.
Modeling and replicating statistical topology and evidence for CMB nonhomogeneity
Agami, Sarit
2017-01-01
Under the banner of “big data,” the detection and classification of structure in extremely large, high-dimensional, data sets are two of the central statistical challenges of our times. Among the most intriguing new approaches to this challenge is “TDA,” or “topological data analysis,” one of the primary aims of which is providing nonmetric, but topologically informative, preanalyses of data which make later, more quantitative, analyses feasible. While TDA rests on strong mathematical foundations from topology, in applications, it has faced challenges due to difficulties in handling issues of statistical reliability and robustness, often leading to an inability to make scientific claims with verifiable levels of statistical confidence. We propose a methodology for the parametric representation, estimation, and replication of persistence diagrams, the main diagnostic tool of TDA. The power of the methodology lies in the fact that even if only one persistence diagram is available for analysis—the typical case for big data applications—the replications permit conventional statistical hypothesis testing. The methodology is conceptually simple and computationally practical, and provides a broadly effective statistical framework for persistence diagram TDA analysis. We demonstrate the basic ideas on a toy example, and the power of the parametric approach to TDA modeling in an analysis of cosmic microwave background (CMB) nonhomogeneity. PMID:29078301
A simple and fast representation space for classifying complex time series
NASA Astrophysics Data System (ADS)
Zunino, Luciano; Olivares, Felipe; Bariviera, Aurelio F.; Rosso, Osvaldo A.
2017-03-01
In the context of time series analysis considerable effort has been directed towards the implementation of efficient discriminating statistical quantifiers. Very recently, a simple and fast representation space has been introduced, namely the number of turning points versus the Abbe value. It is able to separate time series from stationary and non-stationary processes with long-range dependences. In this work we show that this bidimensional approach is useful for distinguishing complex time series: different sets of financial and physiological data are efficiently discriminated. Additionally, a multiscale generalization that takes into account the multiple time scales often involved in complex systems has been also proposed. This multiscale analysis is essential to reach a higher discriminative power between physiological time series in health and disease.
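The two coordinates of this representation space are easy to compute. A sketch follows; the series are simulated and the normalization of the turning-point count is an assumption.

```python
import numpy as np

def turning_points_and_abbe(x):
    """Fraction of turning points and the Abbe value of a time series.
    For white noise the expected turning-point fraction is 2/3 and the
    Abbe value is close to 1; correlated or non-stationary series deviate."""
    x = np.asarray(x, float)
    interior = x[1:-1]
    peaks = (interior > x[:-2]) & (interior > x[2:])
    troughs = (interior < x[:-2]) & (interior < x[2:])
    tp_fraction = np.mean(peaks | troughs)
    abbe = 0.5 * np.mean(np.diff(x) ** 2) / np.var(x)
    return tp_fraction, abbe

rng = np.random.default_rng(0)
white = rng.standard_normal(5000)
walk = np.cumsum(white)                      # non-stationary random walk
print(turning_points_and_abbe(white))        # roughly (0.667, 1.0)
print(turning_points_and_abbe(walk))         # fewer turning points, Abbe value near 0
```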
Statistical analysis of strait time index and a simple model for trend and trend reversal
NASA Astrophysics Data System (ADS)
Chen, Kan; Jayaprakash, C.
2003-06-01
We analyze the daily closing prices of the Strait Time Index (STI) as well as the individual stocks traded in Singapore's stock market from 1988 to 2001. We find that the Hurst exponent is approximately 0.6 for both the STI and individual stocks, while the normal correlation functions show the random walk exponent of 0.5. We also investigate the conditional average of the price change in an interval of length T given the price change in the previous interval. We find strong correlations for price changes larger than a threshold value proportional to T; this indicates that there is no uniform crossover to Gaussian behavior. A simple model based on short-time trend and trend reversal is constructed. We show that the model exhibits statistical properties and market swings similar to those of the real market.
[Health for All-Italia: an indicator system on health].
Burgio, Alessandra; Crialesi, Roberta; Loghi, Marzia
2003-01-01
The Health for All - Italia information system collects health data from several sources. It is intended to be a cornerstone for achieving an overview of health in Italy. Health is analyzed at different levels, ranging from health services to health needs, lifestyles, and demographic, social, economic and environmental contexts. The software associated with the database allows statistical data to be turned into graphs and tables and simple statistical analyses to be carried out. It is therefore possible to view the indicators' time series, make simple projections, and compare the various indicators over the years for each territorial unit. This is possible by means of tables, graphs (histograms, line graphs, frequencies, linear regression with calculation of correlation coefficients, etc.) and maps. These charts can be exported to other programs (i.e. Word, Excel, PowerPoint), or they can be printed directly in color or black and white.
Universality classes of fluctuation dynamics in hierarchical complex systems
NASA Astrophysics Data System (ADS)
Macêdo, A. M. S.; González, Iván R. Roa; Salazar, D. S. P.; Vasconcelos, G. L.
2017-03-01
A unified approach is proposed to describe the statistics of the short-time dynamics of multiscale complex systems. The probability density function of the relevant time series (signal) is represented as a statistical superposition of a large time-scale distribution weighted by the distribution of certain internal variables that characterize the slowly changing background. The dynamics of the background is formulated as a hierarchical stochastic model whose form is derived from simple physical constraints, which in turn restrict the dynamics to only two possible classes. The probability distributions of both the signal and the background have simple representations in terms of Meijer G functions. The two universality classes for the background dynamics manifest themselves in the signal distribution as two types of tails: power law and stretched exponential, respectively. A detailed analysis of empirical data from classical turbulence and financial markets shows excellent agreement with the theory.
Atlas of the Light Curves and Phase Plane Portraits of Selected Long-Period Variables
NASA Astrophysics Data System (ADS)
Kudashkina, L. S.; Andronov, I. L.
2017-12-01
For a group of Mira-type stars, semi-regular variables, and some RV Tau-type stars, the limit cycles were computed and plotted using phase plane diagrams. As generalized coordinates x and x', we used the brightness of the star as a function of phase φ and its phase derivative. We used mean phase light curves based on observations by various authors from the AAVSO, AFOEV, VSOLJ, and ASAS databases, approximated by a trigonometric polynomial of statistically optimal degree. For a simple sine-like light curve, the limit cycle is a simple ellipse. For a more complicated light curve, in which harmonics are statistically significant, the limit cycle deviates from an ellipse. In addition to the classical analysis, we use the error estimates of the smoothing function and its derivative to constrain an "error corridor" in the phase plane.
A simple program to measure and analyse tree rings using Excel, R and SigmaScan
Hietz, Peter
2011-01-01
I present a new software that links a program for image analysis (SigmaScan), one for spreadsheets (Excel) and one for statistical analysis (R) for applications of tree-ring analysis. The first macro measures ring width marked by the user on scanned images, stores raw and detrended data in Excel and calculates the distance to the pith and inter-series correlations. A second macro measures darkness along a defined path to identify latewood–earlywood transition in conifers, and a third shows the potential for automatic detection of boundaries. Written in Visual Basic for Applications, the code makes use of the advantages of existing programs and is consequently very economic and relatively simple to adjust to the requirements of specific projects or to expand making use of already available code. PMID:26109835
NASA Astrophysics Data System (ADS)
Roberts, Michael J.; Braun, Noah O.; Sinclair, Thomas R.; Lobell, David B.; Schlenker, Wolfram
2017-09-01
We compare predictions of a simple process-based crop model (Soltani and Sinclair 2012), a simple statistical model (Schlenker and Roberts 2009), and a combination of both models to actual maize yields on a large, representative sample of farmer-managed fields in the Corn Belt region of the United States. After statistical post-model calibration, the process model (Simple Simulation Model, or SSM) predicts actual outcomes slightly better than the statistical model, but the combined model performs significantly better than either model. The SSM, statistical model and combined model all show similar relationships with precipitation, while the SSM better accounts for temporal patterns of precipitation, vapor pressure deficit and solar radiation. The statistical and combined models show a more negative impact associated with extreme heat for which the process model does not account. Due to the extreme heat effect, predicted impacts under uniform climate change scenarios are considerably more severe for the statistical and combined models than for the process-based model.
Maheshkumar, K; Dilara, K; Maruthy, K N; Sundareswaren, L
2016-07-01
Heart rate variability (HRV) analysis is a simple and noninvasive technique capable of assessing autonomic nervous system modulation of heart rate (HR) in healthy as well as disease conditions. The aim of the present study was to compare (validate) HRV computed from temporal series of electrocardiograms (ECG) obtained by a simple analog amplifier with a PC-based sound card (Audacity) and by a Biopac MP36 module. Based on the inclusion criteria, 120 healthy participants, including 72 males and 48 females, participated in the present study. Following the standard protocol, a 5-min ECG was recorded after 10 min of supine rest, simultaneously by the portable simple analog amplifier with PC-based sound card and by the Biopac module, with surface electrodes in the lead II position. All the ECG data were visually screened and found to be free of ectopic beats and noise. RR intervals from both ECG recordings were analyzed separately in the Kubios software. Short-term HRV indices in both the time and frequency domains were used. The unpaired Student's t-test and the Pearson correlation coefficient test were used for the analysis, using the R statistical software. No statistically significant differences were observed when comparing the HRV values analyzed by means of the two devices. Correlation analysis revealed a near-perfect positive correlation (r = 0.99, P < 0.001) between the time- and frequency-domain values obtained by the two devices. On the basis of the results of the present study, we suggest that the calculation of HRV values in the time and frequency domains from RR series obtained by the PC-based sound card is probably as reliable as that obtained by the gold standard Biopac MP36.
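A minimal sketch of the kind of time-domain indices compared in such device-agreement studies (SDNN, RMSSD, mean HR) and of a Pearson correlation between the two recordings; the RR series below are simulated, not the study's data, and the function name is an assumption.

```python
import numpy as np
from scipy import stats

def time_domain_hrv(rr_ms):
    """Common time-domain HRV indices from a series of RR intervals in ms."""
    rr = np.asarray(rr_ms, float)
    return {
        "SDNN": rr.std(ddof=1),                       # overall variability
        "RMSSD": np.sqrt(np.mean(np.diff(rr) ** 2)),  # beat-to-beat variability
        "mean_HR": 60000.0 / rr.mean(),               # beats per minute
    }

# Simulated near-identical recordings from two devices
rng = np.random.default_rng(0)
rr_a = 800.0 + 50.0 * rng.standard_normal(300)
rr_b = rr_a + 2.0 * rng.standard_normal(300)
r, p = stats.pearsonr(rr_a, rr_b)
print(time_domain_hrv(rr_a), time_domain_hrv(rr_b), round(r, 3))
```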
Statistical issues in quality control of proteomic analyses: good experimental design and planning.
Cairns, David A
2011-03-01
Quality control is becoming increasingly important in proteomic investigations as experiments become more multivariate and quantitative. Quality control applies to all stages of an investigation and statistics can play a key role. In this review, the role of statistical ideas in the design and planning of an investigation is described. This involves the design of unbiased experiments using key concepts from statistical experimental design, the understanding of the biological and analytical variation in a system using variance components analysis and the determination of a required sample size to perform a statistically powerful investigation. These concepts are described through simple examples and an example data set from a 2-D DIGE pilot experiment. Each of these concepts can prove useful in producing better and more reproducible data. Copyright © 2011 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
The importance of proving the null.
Gallistel, C R
2009-04-01
Null hypotheses are simple, precise, and theoretically important. Conventional statistical analysis cannot support them; Bayesian analysis can. The challenge in a Bayesian analysis is to formulate a suitably vague alternative, because the vaguer the alternative is (the more it spreads out the unit mass of prior probability), the more the null is favored. A general solution is a sensitivity analysis: Compute the odds for or against the null as a function of the limit(s) on the vagueness of the alternative. If the odds on the null approach 1 from above as the hypothesized maximum size of the possible effect approaches 0, then the data favor the null over any vaguer alternative to it. The simple computations and the intuitive graphic representation of the analysis are illustrated by the analysis of diverse examples from the current literature. They pose 3 common experimental questions: (a) Are 2 means the same? (b) Is performance at chance? (c) Are factors additive? © 2009 APA, all rights reserved.
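A minimal sketch of the sensitivity analysis described above, assuming a normal likelihood for the observed effect and a uniform alternative prior on [-L, L]; the observed effect, its standard error, and the prior form are assumptions for illustration, not taken from the article.

```python
# Odds in favour of a point null (effect = 0) versus a vague alternative that
# spreads prior mass uniformly over [-L, L], as a function of L.
import numpy as np
from scipy.stats import norm

x, se = 0.3, 1.0          # observed effect and standard error (hypothetical)

def bf01(L):
    """Bayes factor for H0 (effect = 0) vs H1 (effect ~ Uniform[-L, L])."""
    like_null = norm.pdf(x, loc=0.0, scale=se)
    # Marginal likelihood under H1: average the likelihood over the prior.
    like_alt = (norm.cdf((L - x) / se) - norm.cdf((-L - x) / se)) / (2 * L)
    return like_null / like_alt

for L in (0.5, 1, 2, 5, 10):
    print(f"max effect {L:5.1f}: odds for the null = {bf01(L):5.2f}")
# As L grows (vaguer alternative) the odds increasingly favour the null;
# as L -> 0 the odds approach 1, the behaviour the article uses as a criterion.
```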
A Simple and Robust Method for Partially Matched Samples Using the P-Values Pooling Approach
Kuan, Pei Fen; Huang, Bo
2013-01-01
This paper focuses on statistical analyses in scenarios where some samples from the matched pairs design are missing, resulting in partially matched samples. Motivated by the idea of meta-analysis, we recast the partially matched samples as coming from two experimental designs, and propose a simple yet robust approach based on the weighted Z-test to integrate the p-values computed from these two designs. We show that the proposed approach achieves better operating characteristics in simulations and a case study, compared to existing methods for partially matched samples. PMID:23417968
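To illustrate the pooling idea, here is a hedged sketch of a weighted Z-test that combines a paired test on the complete pairs with a two-sample test on the unmatched remainders; the data and the square-root-of-sample-size weights are assumptions, not the authors' reference implementation.

```python
# p1 comes from a paired test on the complete pairs, p2 from a two-sample
# test on the unmatched remainders; weights follow a common sample-size rule.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
pre = rng.normal(0.0, 1.0, 30)
post = pre + rng.normal(0.3, 1.0, 30)         # 30 complete pairs
extra_pre = rng.normal(0.0, 1.0, 12)          # unmatched pre-only samples
extra_post = rng.normal(0.3, 1.0, 15)         # unmatched post-only samples

p1 = stats.ttest_rel(post, pre, alternative="greater").pvalue
p2 = stats.ttest_ind(extra_post, extra_pre, alternative="greater").pvalue
w1, w2 = np.sqrt(2 * 30), np.sqrt(12 + 15)    # one common weighting choice

z = (w1 * stats.norm.isf(p1) + w2 * stats.norm.isf(p2)) / np.hypot(w1, w2)
p_combined = stats.norm.sf(z)
print(f"paired p = {p1:.4f}, unpaired p = {p2:.4f}, pooled p = {p_combined:.4f}")
```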
Yue Xu, Selene; Nelson, Sandahl; Kerr, Jacqueline; Godbole, Suneeta; Patterson, Ruth; Merchant, Gina; Abramson, Ian; Staudenmayer, John; Natarajan, Loki
2018-04-01
Physical inactivity is a recognized risk factor for many chronic diseases. Accelerometers are increasingly used as an objective means to measure daily physical activity. One challenge in using these devices is missing data due to device nonwear. We used a well-characterized cohort of 333 overweight postmenopausal breast cancer survivors to examine missing data patterns of accelerometer outputs over the day. Based on these observed missingness patterns, we created pseudo-simulated datasets with realistic missing data patterns. We developed statistical methods to design imputation and variance weighting algorithms to account for missing data effects when fitting regression models. Bias and precision of each method were evaluated and compared. Our results indicated that not accounting for missing data in the analysis yielded unstable estimates in the regression analysis. Incorporating variance weights and/or subject-level imputation improved precision by >50%, compared to ignoring missing data. We recommend that these simple, easy-to-implement statistical tools be used to improve analysis of accelerometer data.
Statistical analysis of the determinations of the Sun's Galactocentric distance
NASA Astrophysics Data System (ADS)
Malkin, Zinovy
2013-02-01
Based on several tens of R0 measurements made during the past two decades, several studies have been performed to derive the best estimate of R0. Some used just simple averaging to derive a result, whereas others provided comprehensive analyses of possible errors in published results. In either case, detailed statistical analyses of data used were not performed. However, a computation of the best estimates of the Galactic rotation constants is not only an astronomical but also a metrological task. Here we perform an analysis of 53 R0 measurements (published in the past 20 years) to assess the consistency of the data. Our analysis shows that they are internally consistent. It is also shown that any trend in the R0 estimates from the last 20 years is statistically negligible, which renders the presence of a bandwagon effect doubtful. On the other hand, the formal errors in the published R0 estimates improve significantly with time.
Cost analysis and outcomes of simple elbow dislocations
Panteli, Michalis; Pountos, Ippokratis; Kanakaris, Nikolaos K; Tosounidis, Theodoros H; Giannoudis, Peter V
2015-01-01
AIM: To evaluate the management, clinical outcome and cost implications of three different treatment regimes for simple elbow dislocations. METHODS: Following institutional board approval, we performed a retrospective review of all consecutive patients treated for simple elbow dislocations in a Level I trauma centre between January 2008 and December 2010. Based on the length of elbow immobilisation (LOI), patients were divided in three groups (Group I, < 2 wk; Group II, 2-3 wk; and Group III, > 3 wk). Outcome was considered satisfactory when a patient could achieve a pain-free range of motion ≥ 100° (from 30° to 130°). The associated direct medical costs for the treatment of each patient were then calculated and analysed. RESULTS: We identified 80 patients who met the inclusion criteria. Due to loss to follow up, 13 patients were excluded from further analysis, leaving 67 patients for the final analysis. The mean LOI was 14 d (median 15 d; range 3-43 d) with a mean duration of hospital engagement of 67 d (median 57 d; range 10-351 d). Group III (prolonged immobilisation) had a statistically significant worse outcome in comparison to Group I and II (P = 0.04 and P = 0.01 respectively); however, there was no significant difference in the outcome between groups I and II (P = 0.30). No statistically significant difference in the direct medical costs between the groups was identified. CONCLUSION: The length of elbow immobilization doesn’t influence the medical cost; however immobilisation longer than three weeks is associated with persistent stiffness and a less satisfactory clinical outcome. PMID:26301180
Definitive or conservative surgery for perforated gastric ulcer?--An unresolved problem.
Sarath Chandra, Sistla; Kumar, S Siva
2009-04-01
Gastric ulcer perforation has not been the focus of many studies. In addition there is a need to analyze the results of gastric perforation separately and not along with duodenal perforations, to identify the factors influencing the outcome and to develop strategies for its management. Retrospective analysis of 54 patients presenting with gastric perforation. Mean age of the patients was 44.5 years with male preponderance. Morbidity following Closure of the perforation, acid reduction surgery and resection was not significantly different. Overall mortality was 16.6% with highest mortality 24.1% following simple closure. Mortality following simple closure and definitive surgery was not significantly different. Univariate analysis revealed preoperative shock, associated medical illness and surgical delay to be significant factors for mortality whereas on multivariate analysis, preoperative shock was the only independent predictor of mortality. Mortality increased with increasing Boey score but the association between the type of surgery and probability of survival was not statistically significant. Boey risk score is useful in predicting the outcome of surgical treatment for gastric perforation. Definitive surgery is not associated with greater morbidity or mortality compared to simple closure.
Riley, Richard D.
2017-01-01
An important question for clinicians appraising a meta‐analysis is: are the findings likely to be valid in their own practice—does the reported effect accurately represent the effect that would occur in their own clinical population? To this end we advance the concept of statistical validity—where the parameter being estimated equals the corresponding parameter for a new independent study. Using a simple (‘leave‐one‐out’) cross‐validation technique, we demonstrate how we may test meta‐analysis estimates for statistical validity using a new validation statistic, Vn, and derive its distribution. We compare this with the usual approach of investigating heterogeneity in meta‐analyses and demonstrate the link between statistical validity and homogeneity. Using a simulation study, the properties of Vn and the Q statistic are compared for univariate random effects meta‐analysis and a tailored meta‐regression model, where information from the setting (included as model covariates) is used to calibrate the summary estimate to the setting of application. Their properties are found to be similar when there are 50 studies or more, but for fewer studies Vn has greater power but a higher type 1 error rate than Q. The power and type 1 error rate of Vn are also shown to depend on the within‐study variance, between‐study variance, study sample size, and the number of studies in the meta‐analysis. Finally, we apply Vn to two published meta‐analyses and conclude that it usefully augments standard methods when deciding upon the likely validity of summary meta‐analysis estimates in clinical practice. © 2017 The Authors. Statistics in Medicine published by John Wiley & Sons Ltd. PMID:28620945
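A rough illustration of the leave-one-out idea behind the validation statistic described above; the discrepancy measure below is a simplified stand-in, not the paper's Vn or its derived distribution, and the study estimates are invented.

```python
# Leave-one-out check of a meta-analysis summary against each omitted study.
import numpy as np

effects = np.array([0.42, 0.31, 0.55, 0.18, 0.47, 0.36])    # study estimates
variances = np.array([0.02, 0.03, 0.05, 0.04, 0.02, 0.06])  # within-study vars

def fixed_effect(e, v):
    w = 1.0 / v
    return np.sum(w * e) / np.sum(w), 1.0 / np.sum(w)

z_loo = []
for i in range(len(effects)):
    mask = np.arange(len(effects)) != i
    est, var_est = fixed_effect(effects[mask], variances[mask])
    # Standardised discrepancy between the left-out study and the summary
    # built without it; large values question how well the summary would
    # transfer to a "new" study from the same population.
    z_loo.append((effects[i] - est) / np.sqrt(variances[i] + var_est))

print(np.round(z_loo, 2))
print("mean squared discrepancy:", round(float(np.mean(np.square(z_loo))), 2))
```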
ERIC Educational Resources Information Center
Zhang, Jinming
2004-01-01
It is common to assume during statistical analysis of a multiscale assessment that the assessment has simple structure or that it is composed of several unidimensional subtests. Under this assumption, both the unidimensional and multidimensional approaches can be used to estimate item parameters. This paper theoretically demonstrates that these…
Helping Students Assess the Relative Importance of Different Intermolecular Interactions
ERIC Educational Resources Information Center
Jasien, Paul G.
2008-01-01
A semi-quantitative model has been developed to estimate the relative effects of dispersion, dipole-dipole interactions, and H-bonding on the normal boiling points ("T[subscript b]") for a subset of simple organic systems. The model is based upon a statistical analysis using multiple linear regression on a series of straight-chain organic…
A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.
Lin, Johnny; Bentler, Peter M
2012-01-01
Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.
The use of analysis of variance procedures in biological studies
Williams, B.K.
1987-01-01
The analysis of variance (ANOVA) is widely used in biological studies, yet there remains considerable confusion among researchers about the interpretation of hypotheses being tested. Ambiguities arise when statistical designs are unbalanced, and in particular when not all combinations of design factors are represented in the data. This paper clarifies the relationship among hypothesis testing, statistical modelling and computing procedures in ANOVA for unbalanced data. A simple two-factor fixed effects design is used to illustrate three common parametrizations for ANOVA models, and some associations among these parametrizations are developed. Biologically meaningful hypotheses for main effects and interactions are given in terms of each parametrization, and procedures for testing the hypotheses are described. The standard statistical computing procedures in ANOVA are given along with their corresponding hypotheses. Throughout the development unbalanced designs are assumed and attention is given to problems that arise with missing cells.
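The point about unbalanced designs is easy to demonstrate: the sketch below fits the same two-factor model to unbalanced synthetic data and prints Type I, II, and III ANOVA tables, which in general disagree. It uses pandas/statsmodels and is not taken from the paper.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

rng = np.random.default_rng(2)
rows = []
for a in "AB":
    for b in "XYZ":
        n = rng.integers(2, 8)                 # unequal cell sizes
        effect = {"A": 0.0, "B": 1.0}[a] + {"X": 0.0, "Y": 0.5, "Z": -0.5}[b]
        for _ in range(n):
            rows.append((a, b, effect + rng.normal()))
df = pd.DataFrame(rows, columns=["factor_a", "factor_b", "y"])

# Type I (sequential) and Type II sums of squares from the same fit:
fit = smf.ols("y ~ C(factor_a) * C(factor_b)", data=df).fit()
print(anova_lm(fit, typ=1))
print(anova_lm(fit, typ=2))

# Type III is only meaningful with a sum-to-zero parametrization:
fit3 = smf.ols("y ~ C(factor_a, Sum) * C(factor_b, Sum)", data=df).fit()
print(anova_lm(fit3, typ=3))
```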
Cervical Vertebral Body's Volume as a New Parameter for Predicting the Skeletal Maturation Stages.
Choi, Youn-Kyung; Kim, Jinmi; Yamaguchi, Tetsutaro; Maki, Koutaro; Ko, Ching-Chang; Kim, Yong-Il
2016-01-01
This study aimed to determine the correlation between the volumetric parameters derived from the images of the second, third, and fourth cervical vertebrae by using cone beam computed tomography with skeletal maturation stages and to propose a new formula for predicting skeletal maturation by using regression analysis. We obtained the estimation of skeletal maturation levels from hand-wrist radiographs and volume parameters derived from the second, third, and fourth cervical vertebrae bodies from 102 Japanese patients (54 women and 48 men, 5-18 years of age). We performed Pearson's correlation coefficient analysis and simple regression analysis. All volume parameters derived from the second, third, and fourth cervical vertebrae exhibited statistically significant correlations (P < 0.05). The simple regression model with the greatest R-square indicated the fourth-cervical-vertebra volume as an independent variable with a variance inflation factor less than ten. The explanation power was 81.76%. Volumetric parameters of cervical vertebrae using cone beam computed tomography are useful in regression models. The derived regression model has the potential for clinical application as it enables a simple and quantitative analysis to evaluate skeletal maturation level.
Comparison and correlation of Simple Sequence Repeats distribution in genomes of Brucella species
Kiran, Jangampalli Adi Pradeep; Chakravarthi, Veeraraghavulu Praveen; Kumar, Yellapu Nanda; Rekha, Somesula Swapna; Kruti, Srinivasan Shanthi; Bhaskar, Matcha
2011-01-01
Computational genomics is one of the important tools to understand the distribution of closely related genomes including simple sequence repeats (SSRs) in an organism, which gives valuable information regarding genetic variations. The central objective of the present study was to screen the SSRs distributed in coding and non-coding regions among different human Brucella species which are involved in a range of pathological disorders. Computational analysis of the SSRs in the Brucella indicates few deviations from expected random models. Statistical analysis also reveals that tri-nucleotide SSRs are overrepresented and tetranucleotide SSRs underrepresented in Brucella genomes. From the data, it can be suggested that over expressed tri-nucleotide SSRs in genomic and coding regions might be responsible in the generation of functional variation of proteins expressed which in turn may lead to different pathogenicity, virulence determinants, stress response genes, transcription regulators and host adaptation proteins of Brucella genomes. Abbreviations SSRs - Simple Sequence Repeats, ORFs - Open Reading Frames. PMID:21738309
NASA Astrophysics Data System (ADS)
Zhaunerchyk, V.; Frasinski, L. J.; Eland, J. H. D.; Feifel, R.
2014-05-01
Multidimensional covariance analysis and its validity for correlation of processes leading to multiple products are investigated from a theoretical point of view. The need to correct for false correlations induced by experimental parameters which fluctuate from shot to shot, such as the intensity of self-amplified spontaneous emission x-ray free-electron laser pulses, is emphasized. Threefold covariance analysis based on simple extension of the two-variable formulation is shown to be valid for variables exhibiting Poisson statistics. In this case, false correlations arising from fluctuations in an unstable experimental parameter that scale linearly with signals can be eliminated by threefold partial covariance analysis, as defined here. Fourfold covariance based on the same simple extension is found to be invalid in general. Where fluctuations in an unstable parameter induce nonlinear signal variations, a technique of contingent covariance analysis is proposed here to suppress false correlations. In this paper we also show a method to eliminate false correlations associated with fluctuations of several unstable experimental parameters.
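For the two-variable case, partial covariance with respect to a fluctuating parameter has a simple closed form, sketched below on simulated shot-to-shot data; the threefold and contingent extensions discussed in the paper are not reproduced here.

```python
# Two-variable partial covariance correcting for an unstable parameter
# (e.g. shot-to-shot pulse intensity). Data are simulated.
import numpy as np

rng = np.random.default_rng(3)
shots = 20000
intensity = rng.gamma(shape=5.0, scale=1.0, size=shots)   # unstable parameter
# Two product yields that both scale linearly with intensity but are otherwise
# uncorrelated, so any apparent X-Y correlation is false.
x = rng.poisson(0.4 * intensity)
y = rng.poisson(0.7 * intensity)

def cov(a, b):
    return np.mean(a * b) - np.mean(a) * np.mean(b)

plain = cov(x, y)
partial = cov(x, y) - cov(x, intensity) * cov(intensity, y) / cov(intensity, intensity)
print(f"covariance          : {plain:7.4f}   (dominated by false correlation)")
print(f"partial covariance  : {partial:7.4f}   (close to zero, as it should be)")
```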
Meta-analysis of gene-level associations for rare variants based on single-variant statistics.
Hu, Yi-Juan; Berndt, Sonja I; Gustafsson, Stefan; Ganna, Andrea; Hirschhorn, Joel; North, Kari E; Ingelsson, Erik; Lin, Dan-Yu
2013-08-08
Meta-analysis of genome-wide association studies (GWASs) has led to the discoveries of many common variants associated with complex human diseases. There is a growing recognition that identifying "causal" rare variants also requires large-scale meta-analysis. The fact that association tests with rare variants are performed at the gene level rather than at the variant level poses unprecedented challenges in the meta-analysis. First, different studies may adopt different gene-level tests, so the results are not compatible. Second, gene-level tests require multivariate statistics (i.e., components of the test statistic and their covariance matrix), which are difficult to obtain. To overcome these challenges, we propose to perform gene-level tests for rare variants by combining the results of single-variant analysis (i.e., p values of association tests and effect estimates) from participating studies. This simple strategy is possible because of an insight that multivariate statistics can be recovered from single-variant statistics, together with the correlation matrix of the single-variant test statistics, which can be estimated from one of the participating studies or from a publicly available database. We show both theoretically and numerically that the proposed meta-analysis approach provides accurate control of the type I error and is as powerful as joint analysis of individual participant data. This approach accommodates any disease phenotype and any study design and produces all commonly used gene-level tests. An application to the GWAS summary results of the Genetic Investigation of ANthropometric Traits (GIANT) consortium reveals rare and low-frequency variants associated with human height. The relevant software is freely available. Copyright © 2013 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.
GWAMA: software for genome-wide association meta-analysis.
Mägi, Reedik; Morris, Andrew P
2010-05-28
Despite the recent success of genome-wide association studies in identifying novel loci contributing effects to complex human traits, such as type 2 diabetes and obesity, much of the genetic component of variation in these phenotypes remains unexplained. One way of improving power to detect further novel loci is through meta-analysis of studies from the same population, increasing the sample size over any individual study. Although statistical software analysis packages incorporate routines for meta-analysis, they are ill equipped to meet the challenges of the scale and complexity of data generated in genome-wide association studies. We have developed flexible, open-source software for the meta-analysis of genome-wide association studies. The software incorporates a variety of error trapping facilities, and provides a range of meta-analysis summary statistics. The software is distributed with scripts that allow simple formatting of files containing the results of each association study and generate graphical summaries of genome-wide meta-analysis results. The GWAMA (Genome-Wide Association Meta-Analysis) software has been developed to perform meta-analysis of summary statistics generated from genome-wide association studies of dichotomous phenotypes or quantitative traits. Software with source files, documentation and example data files are freely available online at http://www.well.ox.ac.uk/GWAMA.
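Not GWAMA itself, but a minimal sketch of the core computation such software automates: fixed-effect, inverse-variance-weighted meta-analysis of per-SNP summary statistics, with Cochran's Q as a heterogeneity check. The betas and standard errors are hypothetical.

```python
import numpy as np
from scipy.stats import norm, chi2

# Hypothetical summary statistics for one SNP from three studies.
beta = np.array([0.12, 0.08, 0.15])
se = np.array([0.05, 0.04, 0.07])

w = 1.0 / se**2
beta_meta = np.sum(w * beta) / np.sum(w)
se_meta = np.sqrt(1.0 / np.sum(w))
z = beta_meta / se_meta
p = 2 * norm.sf(abs(z))

# Cochran's Q as a simple between-study heterogeneity check.
q = float(np.sum(w * (beta - beta_meta) ** 2))
p_het = chi2.sf(q, df=len(beta) - 1)

print(f"beta = {beta_meta:.4f}  se = {se_meta:.4f}  p = {p:.2e}  Q p = {p_het:.3f}")
```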
Evaluation of Skylab IB sensitivity to on-pad winds with turbulence
NASA Technical Reports Server (NTRS)
Coffin, T.
1972-01-01
Computer simulation was performed to estimate displacements and bending moments experienced by the SKYLAB 1B vehicle on the launch pad due to atmospheric winds. The vehicle was assumed to be a beam-like structure represented by a finite number of generalized coordinates. Wind flow across the vehicle was treated as a nonhomogeneous, stationary random process. Response computations were performed by the assumption of simple strip theory and application of generalized harmonic analysis. Displacement and bending moment statistics were obtained for six vehicle propellant loading conditions and four representative reference wind profile and turbulence levels. Means, variances and probability distributions are presented graphically for each case. A separate analysis was performed to indicate the influence of wind gradient variations on vehicle response statistics.
El Sharabasy, Sherif F; Soliman, Khaled A
2017-01-01
The date palm is an ancient domesticated plant with great diversity and has been cultivated in the Middle East and North Africa for at least 5000 years. Date palm cultivars are classified based on the fruit moisture content, as dry, semidry, and soft dates. There are a number of biochemical and molecular techniques available for characterization of the date palm variation. This chapter focuses on the DNA-based markers random amplified polymorphic DNA (RAPD) and inter-simple sequence repeats (ISSR) techniques, in addition to biochemical markers based on isozyme analysis. These techniques, coupled with appropriate statistical tools, proved useful for determining phylogenetic relationships among date palm cultivars and provide information resources for date palm gene banks.
Popa, Laurentiu S.; Streng, Martha L.
2017-01-01
Abstract Most hypotheses of cerebellar function emphasize a role in real-time control of movements. However, the cerebellum’s use of current information to adjust future movements and its involvement in sequencing, working memory, and attention argues for predicting and maintaining information over extended time windows. The present study examines the time course of Purkinje cell discharge modulation in the monkey (Macaca mulatta) during manual, pseudo-random tracking. Analysis of the simple spike firing from 183 Purkinje cells during tracking reveals modulation up to 2 s before and after kinematics and position error. Modulation significance was assessed against trial shuffled firing, which decoupled simple spike activity from behavior and abolished long-range encoding while preserving data statistics. Position, velocity, and position errors have the most frequent and strongest long-range feedforward and feedback modulations, with less common, weaker long-term correlations for speed and radial error. Position, velocity, and position errors can be decoded from the population simple spike firing with considerable accuracy for even the longest predictive (-2000 to -1500 ms) and feedback (1500 to 2000 ms) epochs. Separate analysis of the simple spike firing in the initial hold period preceding tracking shows similar long-range feedforward encoding of the upcoming movement and in the final hold period feedback encoding of the just completed movement, respectively. Complex spike analysis reveals little long-term modulation with behavior. We conclude that Purkinje cell simple spike discharge includes short- and long-range representations of both upcoming and preceding behavior that could underlie cerebellar involvement in error correction, working memory, and sequencing. PMID:28413823
Statistical complexity without explicit reference to underlying probabilities
NASA Astrophysics Data System (ADS)
Pennini, F.; Plastino, A.
2018-06-01
We show that extremely simple systems of a not too large number of particles can be simultaneously thermally stable and complex. To such an end, we extend the statistical complexity's notion to simple configurations of non-interacting particles, without appeal to probabilities, and discuss configurational properties.
Rosen, G D
2006-06-01
Meta-analysis is a vague descriptor used to encompass very diverse methods of data collection analysis, ranging from simple averages to more complex statistical methods. Holo-analysis is a fully comprehensive statistical analysis of all available data and all available variables in a specified topic, with results expressed in a holistic factual empirical model. The objectives and applications of holo-analysis include software production for prediction of responses with confidence limits, translation of research conditions to praxis (field) circumstances, exposure of key missing variables, discovery of theoretically unpredictable variables and interactions, and planning future research. Holo-analyses are cited as examples of the effects on broiler feed intake and live weight gain of exogenous phytases, which account for 70% of variation in responses in terms of 20 highly significant chronological, dietary, environmental, genetic, managemental, and nutrient variables. Even better future accountancy of variation will be facilitated if and when authors of papers routinely provide key data for currently neglected variables, such as temperatures, complete feed formulations, and mortalities.
NASA Astrophysics Data System (ADS)
He, Honghui; Dong, Yang; Zhou, Jialing; Ma, Hui
2017-03-01
As one of the salient features of light, polarization contains abundant structural and optical information about media. Recently, as a comprehensive description of polarization properties, Mueller matrix polarimetry has been applied to various biomedical studies such as the detection of cancerous tissues. In previous works, it has been found that the structural information encoded in the 2D Mueller matrix images can be presented by other transformed parameters with a more explicit relationship to certain microstructural features. In this paper, we present a statistical analysis method to transform the 2D Mueller matrix images into frequency distribution histograms (FDHs) and their central moments to reveal the dominant structural features of samples quantitatively. The experimental results of porcine heart, intestine, stomach, and liver tissues demonstrate that the transformation parameters and central moments based on the statistical analysis of Mueller matrix elements have simple relationships to the dominant microstructural properties of biomedical samples, including the density and orientation of fibrous structures, the depolarization power, diattenuation and absorption abilities. It is shown in this paper that the statistical analysis of 2D images of Mueller matrix elements may provide quantitative or semi-quantitative criteria for biomedical diagnosis.
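A minimal sketch of the transformation described above, applied to a synthetic image standing in for one Mueller matrix element: build the frequency distribution histogram and summarize it by its central moments.

```python
import numpy as np

rng = np.random.default_rng(4)
m22 = np.clip(rng.normal(0.6, 0.15, size=(256, 256)), -1, 1)  # fake M22 image

values = m22.ravel()
hist, edges = np.histogram(values, bins=100, range=(-1, 1), density=True)
mode_centre = 0.5 * (edges[np.argmax(hist)] + edges[np.argmax(hist) + 1])

mean = values.mean()
central = {k: float(np.mean((values - mean) ** k)) for k in (2, 3, 4)}
skewness = central[3] / central[2] ** 1.5
kurtosis = central[4] / central[2] ** 2

print(f"FDH mode ~ {mode_centre:.2f}  mean={mean:.3f}  var={central[2]:.4f}  "
      f"skew={skewness:.3f}  kurtosis={kurtosis:.3f}")
# In the paper such moments of the FDHs are related to fibrous-structure
# density/orientation, depolarization and diattenuation of the samples.
```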
Variation in reaction norms: Statistical considerations and biological interpretation.
Morrissey, Michael B; Liefting, Maartje
2016-09-01
Analysis of reaction norms, the functions by which the phenotype produced by a given genotype depends on the environment, is critical to studying many aspects of phenotypic evolution. Different techniques are available for quantifying different aspects of reaction norm variation. We examine what biological inferences can be drawn from some of the more readily applicable analyses for studying reaction norms. We adopt a strongly biologically motivated view, but draw on statistical theory to highlight strengths and drawbacks of different techniques. In particular, consideration of some formal statistical theory leads to revision of some recently, and forcefully, advocated opinions on reaction norm analysis. We clarify what simple analysis of the slope between mean phenotype in two environments can tell us about reaction norms, explore the conditions under which polynomial regression can provide robust inferences about reaction norm shape, and explore how different existing approaches may be used to draw inferences about variation in reaction norm shape. We show how mixed model-based approaches can provide more robust inferences than more commonly used multistep statistical approaches, and derive new metrics of the relative importance of variation in reaction norm intercepts, slopes, and curvatures. © 2016 The Author(s). Evolution © 2016 The Society for the Study of Evolution.
K. R. Sherrill; M. A. Lefsky; J. B. Bradford; M. G. Ryan
2008-01-01
This study evaluates the relative ability of simple light detection and ranging (lidar) indices (i.e., mean and maximum heights) and statistically derived canonical correlation analysis (CCA) variables attained from discrete-return lidar to estimate forest structure and forest biomass variables for three temperate subalpine forest sites. Both lidar and CCA explanatory...
Information-Decay Pursuit of Dynamic Parameters in Student Models
1994-04-01
Nature of Driving Force for Protein Folding: A Result From Analyzing the Statistical Potential
NASA Astrophysics Data System (ADS)
Li, Hao; Tang, Chao; Wingreen, Ned S.
1997-07-01
In a statistical approach to protein structure analysis, Miyazawa and Jernigan derived a 20×20 matrix of inter-residue contact energies between different types of amino acids. Using the method of eigenvalue decomposition, we find that the Miyazawa-Jernigan matrix can be accurately reconstructed from its first two principal component vectors as M_ij = C_0 + C_1 (q_i + q_j) + C_2 q_i q_j, with constant C's, and 20 q values associated with the 20 amino acids. This regularity is due to hydrophobic interactions and a force of demixing, the latter obeying Hildebrand's solubility theory of simple liquids.
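The quoted two-component form implies the matrix is (nearly) rank two, which is easy to verify numerically; the q values and constants in the sketch below are made up, not the Miyazawa-Jernigan parameters.

```python
# A symmetric contact-energy-like matrix M_ij = C0 + C1*(q_i + q_j) + C2*q_i*q_j
# lies in the span of two eigencomponents, so keeping the two leading terms of
# an eigenvalue decomposition reconstructs it almost exactly.
import numpy as np

rng = np.random.default_rng(5)
q = rng.normal(0.0, 1.0, 20)                        # one value per amino acid
C0, C1, C2 = -2.0, 1.3, 0.8
M = C0 + C1 * (q[:, None] + q[None, :]) + C2 * np.outer(q, q)
M += rng.normal(0.0, 0.02, M.shape)                 # small measurement noise
M = (M + M.T) / 2

vals, vecs = np.linalg.eigh(M)
order = np.argsort(np.abs(vals))[::-1]              # largest |eigenvalue| first
approx = sum(vals[i] * np.outer(vecs[:, i], vecs[:, i]) for i in order[:2])

rel_err = np.linalg.norm(M - approx) / np.linalg.norm(M)
print(f"relative error of the rank-2 reconstruction: {rel_err:.3%}")
```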
Advanced Statistics for Exotic Animal Practitioners.
Hodsoll, John; Hellier, Jennifer M; Ryan, Elizabeth G
2017-09-01
Correlation and regression assess the association between 2 or more variables. This article reviews the core knowledge needed to understand these analyses, moving from visual analysis in scatter plots through correlation, simple and multiple linear regression, and logistic regression. Correlation estimates the strength and direction of a relationship between 2 variables. Regression can be considered more general and quantifies the numerical relationships between an outcome and 1 or multiple variables in terms of a best-fit line, allowing predictions to be made. Each technique is discussed with examples and the statistical assumptions underlying their correct application. Copyright © 2017 Elsevier Inc. All rights reserved.
Six Guidelines for Interesting Research.
Gray, Kurt; Wegner, Daniel M
2013-09-01
There are many guides on proper psychology, but far fewer on interesting psychology. This article presents six guidelines for interesting research. The first three-Phenomena First, Be Surprising, and Grandmothers, Not Scientists-suggest how to choose your research question; the last three-Be The Participant, Simple Statistics, and Powerful Beginnings-suggest how to answer your research question and offer perspectives on experimental design, statistical analysis, and effective communication. These guidelines serve as reminders that replicability is necessary but not sufficient for compelling psychological science. Interesting research considers subjective experience; it listens to the music of the human condition. © The Author(s) 2013.
Are V1 Simple Cells Optimized for Visual Occlusions? A Comparative Study
Bornschein, Jörg; Henniges, Marc; Lücke, Jörg
2013-01-01
Simple cells in primary visual cortex were famously found to respond to low-level image components such as edges. Sparse coding and independent component analysis (ICA) emerged as the standard computational models for simple cell coding because they linked their receptive fields to the statistics of visual stimuli. However, a salient feature of image statistics, occlusions of image components, is not considered by these models. Here we ask if occlusions have an effect on the predicted shapes of simple cell receptive fields. We use a comparative approach to answer this question and investigate two models for simple cells: a standard linear model and an occlusive model. For both models we simultaneously estimate optimal receptive fields, sparsity and stimulus noise. The two models are identical except for their component superposition assumption. We find the image encoding and receptive fields predicted by the models to differ significantly. While both models predict many Gabor-like fields, the occlusive model predicts a much sparser encoding and high percentages of ‘globular’ receptive fields. This relatively new center-surround type of simple cell response is observed since reverse correlation is used in experimental studies. While high percentages of ‘globular’ fields can be obtained using specific choices of sparsity and overcompleteness in linear sparse coding, no or only low proportions are reported in the vast majority of studies on linear models (including all ICA models). Likewise, for the here investigated linear model and optimal sparsity, only low proportions of ‘globular’ fields are observed. In comparison, the occlusive model robustly infers high proportions and can match the experimentally observed high proportions of ‘globular’ fields well. Our computational study, therefore, suggests that ‘globular’ fields may be evidence for an optimal encoding of visual occlusions in primary visual cortex. PMID:23754938
A practical approach to language complexity: a Wikipedia case study.
Yasseri, Taha; Kornai, András; Kertész, János
2012-01-01
In this paper we present a statistical analysis of English texts from Wikipedia. We try to address the issue of language complexity empirically by comparing the simple English Wikipedia (Simple) to comparable samples of the main English Wikipedia (Main). Simple is supposed to use a more simplified language with a limited vocabulary, and editors are explicitly requested to follow this guideline, yet in practice the vocabulary richness of both samples is at the same level. Detailed analysis of longer units (n-grams of words and part of speech tags) shows that the language of Simple is less complex than that of Main primarily due to the use of shorter sentences, as opposed to drastically simplified syntax or vocabulary. Comparing the two language varieties by the Gunning readability index supports this conclusion. We also report on the topical dependence of language complexity, that is, that the language is more advanced in conceptual articles compared to person-based (biographical) and object-based articles. Finally, we investigate the relation between conflict and language complexity by analyzing the content of the talk pages associated with controversial and peacefully developing articles, concluding that controversy has the effect of reducing language complexity.
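Since the comparison relies on the Gunning readability index, a small sketch of that formula may be useful; the syllable counter is a crude vowel-group heuristic and the sample sentences are invented.

```python
# Gunning fog index: 0.4 * (average sentence length in words
#                           + 100 * proportion of words with >= 3 syllables).
import re

def syllables(word):
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text):
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    complex_words = [w for w in words if syllables(w) >= 3]
    asl = len(words) / max(1, len(sentences))          # average sentence length
    pcw = 100.0 * len(complex_words) / max(1, len(words))
    return 0.4 * (asl + pcw)

simple_text = "The cat sat on the mat. It was warm. The sun was out."
main_text = ("Detailed grammatical analysis demonstrates considerable "
             "syntactic complexity throughout the encyclopedia article.")
print("Simple-style text :", round(gunning_fog(simple_text), 1))
print("Main-style text   :", round(gunning_fog(main_text), 1))
```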
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gilbert, Richard O.
The application of statistics to environmental pollution monitoring studies requires a knowledge of statistical analysis methods particularly well suited to pollution data. This book fills that need by providing sampling plans, statistical tests, parameter estimation techniques, and references to pertinent publications. Most of the statistical techniques are relatively simple, and examples, exercises, and case studies are provided to illustrate procedures. The book is logically divided into three parts. Chapters 1, 2, and 3 are introductory chapters. Chapters 4 through 10 discuss field sampling designs and Chapters 11 through 18 deal with a broad range of statistical analysis procedures. Some statistical techniques given here are not commonly seen in statistics books. For example, see methods for handling correlated data (Sections 4.5 and 11.12), for detecting hot spots (Chapter 10), and for estimating a confidence interval for the mean of a lognormal distribution (Section 13.2). Also, Appendix B lists a computer code that estimates and tests for trends over time at one or more monitoring stations using nonparametric methods (Chapters 16 and 17). Unfortunately, some important topics could not be included because of their complexity and the need to limit the length of the book. For example, only brief mention could be made of time series analysis using Box-Jenkins methods and of kriging techniques for estimating spatial and spatial-time patterns of pollution, although multiple references on these topics are provided. Also, no discussion of methods for assessing risks from environmental pollution could be included.
Magezi, David A
2015-01-01
Linear mixed-effects models (LMMs) are increasingly being used for data analysis in cognitive neuroscience and experimental psychology, where within-participant designs are common. The current article provides an introductory review of the use of LMMs for within-participant data analysis and describes a free, simple, graphical user interface (LMMgui). LMMgui uses the package lme4 (Bates et al., 2014a,b) in the statistical environment R (R Core Team).
A simple hydrodynamic model of tornado-like vortices
NASA Astrophysics Data System (ADS)
Kurgansky, M. V.
2015-05-01
Based on similarity arguments, a simple fluid dynamic model of tornado-like vortices is offered that, with account for "vortex breakdown" at a certain height above the ground, relates the maximal azimuthal velocity in the vortex, reachable near the ground surface, to the convective available potential energy (CAPE) stored in the environmental atmosphere under pre-tornado conditions. The relative proportion of the helicity (kinetic energy) destruction (dissipation) in the "vortex breakdown" zone and, accordingly, within the surface boundary layer beneath the vortex is evaluated. These considerations form the basis of the dynamic-statistical analysis of the relationship between the tornado intensity and the CAPE budget in the surrounding atmosphere.
NASA Astrophysics Data System (ADS)
Skorobogatiy, Maksim; Sadasivan, Jayesh; Guerboukha, Hichem
2018-05-01
In this paper, we first discuss the main types of noise in a typical pump-probe system, and then focus specifically on terahertz time domain spectroscopy (THz-TDS) setups. We then introduce four statistical models for the noisy pulses obtained in such systems, and detail rigorous mathematical algorithms to de-noise such traces, find the proper averages and characterise various types of experimental noise. Finally, we perform a comparative analysis of the performance, advantages and limitations of the algorithms by testing them on the experimental data collected using a particular THz-TDS system available in our laboratories. We conclude that using advanced statistical models for trace averaging results in the fitting errors that are significantly smaller than those obtained when only a simple statistical average is used.
MAI statistics estimation and analysis in a DS-CDMA system
NASA Astrophysics Data System (ADS)
Alami Hassani, A.; Zouak, M.; Mrabti, M.; Abdi, F.
2018-05-01
A primary limitation of Direct Sequence Code Division Multiple Access DS-CDMA link performance and system capacity is multiple access interference (MAI). To examine the performance of CDMA systems in the presence of MAI, i.e., in a multiuser environment, several works assumed that the interference can be approximated by a Gaussian random variable. In this paper, we first develop a new and simple approach to characterize the MAI in a multiuser system. In addition to statistically quantifying the MAI power, the paper also proposes a statistical model for both variance and mean of the MAI for synchronous and asynchronous CDMA transmission. We show that the MAI probability density function (PDF) is Gaussian for the equal-received-energy case and validate it by computer simulations.
Scaling laws and fluctuations in the statistics of word frequencies
NASA Astrophysics Data System (ADS)
Gerlach, Martin; Altmann, Eduardo G.
2014-11-01
In this paper, we combine statistical analysis of written texts and simple stochastic models to explain the appearance of scaling laws in the statistics of word frequencies. The average vocabulary of an ensemble of fixed-length texts is known to scale sublinearly with the total number of words (Heaps’ law). Analyzing the fluctuations around this average in three large databases (Google-ngram, English Wikipedia, and a collection of scientific articles), we find that the standard deviation scales linearly with the average (Taylor's law), in contrast to the prediction of decaying fluctuations obtained using simple sampling arguments. We explain both scaling laws (Heaps’ and Taylor's) by modeling the usage of words using a Poisson process with a fat-tailed distribution of word frequencies (Zipf's law) and topic-dependent frequencies of individual words (as in topic models). Considering topical variations leads to quenched averages, turns the vocabulary size into a non-self-averaging quantity, and explains the empirical observations. For the numerous practical applications relying on estimations of vocabulary size, our results show that uncertainties remain large even for long texts. We show how to account for these uncertainties in measurements of lexical richness of texts with different lengths.
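A minimal simulation of the mechanism described above, using a Zipfian lexicon and a crude log-normal stand-in for topic-dependent frequencies; parameters are illustrative, not fitted to the paper's corpora.

```python
# With fixed Zipfian frequencies, multinomial (~ Poisson) sampling gives a
# sublinearly growing vocabulary (Heaps' law) but relative fluctuations that
# decay with text length; making the frequencies topic dependent (different
# for each text) keeps the relative fluctuations from decaying, in the spirit
# of Taylor's law.
import numpy as np

rng = np.random.default_rng(6)
W = 50_000
base = 1.0 / np.arange(1, W + 1)            # Zipf's law, exponent ~ 1
base /= base.sum()

def vocab_stats(n_tokens, topical, n_texts=40, sigma=1.0):
    sizes = []
    for _ in range(n_texts):
        f = base
        if topical:                          # crude stand-in for topic models
            f = base * rng.lognormal(0.0, sigma, W)
            f /= f.sum()
        counts = rng.multinomial(n_tokens, f)
        sizes.append(np.count_nonzero(counts))
    sizes = np.array(sizes, dtype=float)
    return sizes.mean(), sizes.std()

for n in (20_000, 80_000):
    for topical in (False, True):
        m, s = vocab_stats(n, topical)
        label = "topic-dependent" if topical else "fixed frequencies"
        print(f"N={n:6d} {label:18s} mean V={m:8.1f} std/mean={s/m:.3f}")
```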
Sun, Jianguo; Feng, Yanqin; Zhao, Hui
2015-01-01
Interval-censored failure time data occur in many fields including epidemiological and medical studies as well as financial and sociological studies, and many authors have investigated their analysis (Sun, The statistical analysis of interval-censored failure time data, 2006; Zhang, Stat Modeling 9:321-343, 2009). In particular, a number of procedures have been developed for regression analysis of interval-censored data arising from the proportional hazards model (Finkelstein, Biometrics 42:845-854, 1986; Huang, Ann Stat 24:540-568, 1996; Pan, Biometrics 56:199-203, 2000). For most of these procedures, however, one drawback is that they involve estimation of both regression parameters and baseline cumulative hazard function. In this paper, we propose two simple estimation approaches that do not need estimation of the baseline cumulative hazard function. The asymptotic properties of the resulting estimates are given, and an extensive simulation study is conducted and indicates that they work well for practical situations.
How does the past of a soccer match influence its future? Concepts and statistical analysis.
Heuer, Andreas; Rubner, Oliver
2012-01-01
Scoring goals in a soccer match can be interpreted as a stochastic process. In the most simple description of a soccer match one assumes that scoring goals follows from independent rate processes of both teams. This would imply simple Poissonian and Markovian behavior. Deviations from this behavior would imply that the previous course of the match has an impact on the present match behavior. Here a general framework for the identification of deviations from this behavior is presented. For this endeavor it is essential to formulate an a priori estimate of the expected number of goals per team in a specific match. This can be done based on our previous work on the estimation of team strengths. Furthermore, the well-known general increase of the number of the goals in the course of a soccer match has to be removed by appropriate normalization. In general, three different types of deviations from a simple rate process can exist. First, the goal rate may depend on the exact time of the previous goals. Second, it may be influenced by the time passed since the previous goal and, third, it may reflect the present score. We show that the Poissonian scenario is fulfilled quite well for the German Bundesliga. However, a detailed analysis reveals significant deviations for the second and third aspect. Dramatic effects are observed if the away team leads by one or two goals in the final part of the match. This analysis allows one to identify generic features about soccer matches and to learn about the hidden complexities behind scoring goals. Among others the reason for the fact that the number of draws is larger than statistically expected can be identified.
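The independent-rate baseline is easy to write down: below is a sketch of match outcome probabilities under two independent Poisson goal counts, using made-up expected-goal values rather than fitted team strengths.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    return lam**k * exp(-lam) / factorial(k)

def outcome_probs(lam_home, lam_away, max_goals=12):
    """Home win / draw / away win probabilities for independent Poisson goals."""
    p_home = p_draw = p_away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, lam_home) * poisson_pmf(a, lam_away)
            if h > a:
                p_home += p
            elif h == a:
                p_draw += p
            else:
                p_away += p
    return p_home, p_draw, p_away

home, draw, away = outcome_probs(1.5, 1.1)
print(f"P(home win)={home:.3f}  P(draw)={draw:.3f}  P(away win)={away:.3f}")
# Comparing such baseline draw probabilities with observed league tables is one
# way to expose the deviations from pure Poisson behaviour discussed above.
```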
The Statistical Consulting Center for Astronomy (SCCA)
NASA Technical Reports Server (NTRS)
Akritas, Michael
2001-01-01
The process by which raw astronomical data acquisition is transformed into scientifically meaningful results and interpretation typically involves many statistical steps. Traditional astronomy limits itself to a narrow range of old and familiar statistical methods: means and standard deviations; least-squares methods like chi(sup 2) minimization; and simple nonparametric procedures such as the Kolmogorov-Smirnov tests. These tools are often inadequate for the complex problems and datasets under investigations, and recent years have witnessed an increased usage of maximum-likelihood, survival analysis, multivariate analysis, wavelet and advanced time-series methods. The Statistical Consulting Center for Astronomy (SCCA) assisted astronomers with the use of sophisticated tools, and to match these tools with specific problems. The SCCA operated with two professors of statistics and a professor of astronomy working together. Questions were received by e-mail, and were discussed in detail with the questioner. Summaries of those questions and answers leading to new approaches were posted on the Web (www.state.psu.edu/ mga/SCCA). In addition to serving individual astronomers, the SCCA established a Web site for general use that provides hypertext links to selected on-line public-domain statistical software and services. The StatCodes site (www.astro.psu.edu/statcodes) provides over 200 links in the areas of: Bayesian statistics; censored and truncated data; correlation and regression, density estimation and smoothing, general statistics packages and information; image analysis; interactive Web tools; multivariate analysis; multivariate clustering and classification; nonparametric analysis; software written by astronomers; spatial statistics; statistical distributions; time series analysis; and visualization tools. StatCodes has received a remarkable high and constant hit rate of 250 hits/week (over 10,000/year) since its inception in mid-1997. It is of interest to scientists both within and outside of astronomy. The most popular sections are multivariate techniques, image analysis, and time series analysis. Hundreds of copies of the ASURV, SLOPES and CENS-TAU codes developed by SCCA scientists were also downloaded from the StatCodes site. In addition to formal SCCA duties, SCCA scientists continued a variety of related activities in astrostatistics, including refereeing of statistically oriented papers submitted to the Astrophysical Journal, talks in meetings including Feigelson's talk to science journalists entitled "The reemergence of astrostatistics" at the American Association for the Advancement of Science meeting, and published papers of astrostatistical content.
A simple rain attenuation model for earth-space radio links operating at 10-35 GHz
NASA Technical Reports Server (NTRS)
Stutzman, W. L.; Yon, K. M.
1986-01-01
The simple attenuation model has been improved from an earlier version and now includes the effect of wave polarization. The model is for the prediction of rain attenuation statistics on earth-space communication links operating in the 10-35 GHz band. Simple calculations produce attenuation values as a function of average rain rate. These together with rain rate statistics (either measured or predicted) can be used to predict annual rain attenuation statistics. In this paper model predictions are compared to measured data from a data base of 62 experiments performed in the U.S., Europe, and Japan. Comparisons are also made to predictions from other models.
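A hedged sketch of the kind of calculation such models perform: a power-law specific attenuation scaled by an effective path length through rain. The coefficients, path length, and path-reduction factor below are placeholders for illustration, not the model's published parameters or any ITU coefficients.

```python
def rain_attenuation_db(rain_rate_mm_h, k=0.075, alpha=1.1,
                        slant_path_km=8.0, path_reduction=0.7):
    """Power-law rain attenuation: gamma = k * R**alpha (dB/km) times an
    effective path length (placeholder values only)."""
    gamma = k * rain_rate_mm_h**alpha        # specific attenuation, dB/km
    return gamma * slant_path_km * path_reduction

for rate in (5, 12.5, 25, 50):               # mm/h exceeded at various % of time
    print(f"R = {rate:5.1f} mm/h  ->  A = {rain_attenuation_db(rate):5.1f} dB")
# Feeding a measured or modelled rain-rate exceedance curve through such a
# function yields the annual attenuation statistics discussed above.
```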
Zhai, Hong Lin; Zhai, Yue Yuan; Li, Pei Zhen; Tian, Yue Li
2013-01-21
A very simple approach to quantitative analysis is proposed based on the technology of digital image processing using three-dimensional (3D) spectra obtained by high-performance liquid chromatography coupled with a diode array detector (HPLC-DAD). As region-based shape features of a grayscale image, Zernike moments, with their inherent invariance properties, were employed to establish the linear quantitative models. This approach was applied to the quantitative analysis of three compounds in mixed samples using 3D HPLC-DAD spectra, and three linear models were obtained, respectively. The correlation coefficients (R²) for training and test sets were more than 0.999, and the statistical parameters and strict validation supported the reliability of the established models. The analytical results suggest that the Zernike moments selected by stepwise regression can be used in the quantitative analysis of target compounds. Our study provides a new idea for quantitative analysis using 3D spectra, which can be extended to the analysis of other 3D spectra obtained by different methods or instruments.
Yang, Xiao-Hui; Feng, Shi-Ya; Yu, Yang; Liang, Zhou
2018-01-01
This study aims to explore the relationship between the methylation of the matrix metalloproteinase (MMP)-9 gene promoter region and diabetic nephropathy (DN) through the detection of the methylation level of the MMP-9 gene promoter region in the peripheral blood of patients with DN in different periods and the serum MMP-9 concentration. The methylation level of the MMP-9 gene promoter region was detected by methylation-specific polymerase chain reaction (MSP), and the content of MMP-9 in serum was determined by enzyme-linked immunosorbent assay (ELISA). Results of the statistical analysis revealed that serum MMP-9 protein expression levels gradually increased in patients in the simple diabetic group, early diabetic nephropathy group and clinical diabetic nephropathy group, compared with the control group, and the difference was statistically significant (P < 0.05). Compared with the control group, the methylation levels of the MMP-9 gene promoter region gradually decreased in patients in the simple diabetic group, early diabetic nephropathy group, and clinical diabetic nephropathy group, and the difference was statistically significant (P < 0.05). Furthermore, correlation analysis indicated that the demethylation levels of the MMP-9 gene promoter region were positively correlated with serum protein levels, urinary albumin to creatinine ratio (UACR), urea and creatinine, and were negatively correlated with GFR. The demethylation of the MMP-9 gene promoter region may be involved in the occurrence and development of diabetic nephropathy by regulating the expression of MMP-9 protein in serum.
FY2017 Report on NISC Measurements and Detector Simulations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Andrews, Madison Theresa; Meierbachtol, Krista Cruse; Jordan, Tyler Alexander
FY17 work focused on automation, both of the measurement analysis and of the comparison with simulations. The experimental apparatus was relocated, and weeks of continuous measurements of the spontaneous fission source 252Cf were performed. Programs were developed to automate the conversion of measurements into ROOT data framework files with a simple terminal input. The complete analysis of the measurement (which includes energy calibration and the identification of correlated counts) can now be completed with a documented process that likewise involves a single execution line. Finally, the hurdle of slow MCNP simulations resulting in low simulation statistics has been overcome with the generation of multi-run suites which make use of the high-performance computing resources at LANL. Preliminary comparisons of measurements and simulations have been performed and will be the focus of FY18 work.
[Development of an Excel spreadsheet for meta-analysis of indirect and mixed treatment comparisons].
Tobías, Aurelio; Catalá-López, Ferrán; Roqué, Marta
2014-01-01
Meta-analyses in clinical research usually aim to evaluate treatment efficacy and safety in direct comparison with a unique comparator. Indirect comparisons, using Bucher's method, can summarize primary data when information from direct comparisons is limited or nonexistent. Mixed comparisons allow combining estimates from direct and indirect comparisons, increasing statistical power. There is a need for simple applications for meta-analysis of indirect and mixed comparisons. These can easily be conducted using a Microsoft Office Excel spreadsheet. We developed a user-friendly spreadsheet for indirect and mixed comparisons aimed at clinical researchers who are interested in systematic reviews but not familiar with more advanced statistical packages. The proposed Excel spreadsheet for indirect and mixed comparisons can be of great use in clinical epidemiology to extend the knowledge provided by traditional meta-analysis when evidence from direct comparisons is limited or nonexistent.
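As a hedged illustration of the arithmetic behind Bucher's method (not of the authors' spreadsheet itself), the sketch below combines two direct log odds ratios that share a common comparator; all numerical inputs are hypothetical.

```python
import math

def bucher_indirect(log_or_ab, se_ab, log_or_cb, se_cb):
    """Indirect comparison of A vs C via a common comparator B (Bucher's method).

    log_or_ab, se_ab: log odds ratio and SE of A vs B from one set of trials.
    log_or_cb, se_cb: log odds ratio and SE of C vs B from another set of trials.
    Returns the indirect log OR of A vs C, its SE, and a 95% CI.
    """
    log_or_ac = log_or_ab - log_or_cb          # difference of the two direct estimates
    se_ac = math.sqrt(se_ab**2 + se_cb**2)     # variances add for independent estimates
    ci = (log_or_ac - 1.96 * se_ac, log_or_ac + 1.96 * se_ac)
    return log_or_ac, se_ac, ci

# Hypothetical direct estimates: A vs B and C vs B.
log_or_ac, se_ac, ci = bucher_indirect(-0.40, 0.15, -0.10, 0.20)
print(f"indirect log OR (A vs C) = {log_or_ac:.2f}, SE = {se_ac:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

A mixed estimate can then be formed by inverse-variance weighting the direct and indirect estimates, which is the kind of calculation a spreadsheet of this sort automates.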
Multi-Scale Surface Descriptors
Cipriano, Gregory; Phillips, George N.; Gleicher, Michael
2010-01-01
Local shape descriptors compactly characterize regions of a surface, and have been applied to tasks in visualization, shape matching, and analysis. Classically, curvature has been used as a shape descriptor; however, this differential property characterizes only an infinitesimal neighborhood. In this paper, we provide shape descriptors for surface meshes designed to be multi-scale, that is, capable of characterizing regions of varying size. These descriptors capture statistically the shape of a neighborhood around a central point by fitting a quadratic surface. They therefore mimic differential curvature, are efficient to compute, and encode anisotropy. We show how simple variants of mesh operations can be used to compute the descriptors without resorting to expensive parameterizations, and additionally provide a statistical approximation for reduced computational cost. We show how these descriptors apply to a number of uses in visualization, analysis, and matching of surfaces, particularly to tasks in protein surface analysis. PMID:19834190
Gunter, M.E.; Singleton, E.; Bandli, B.R.; Lowers, H.A.; Meeker, G.P.
2005-01-01
Major-, minor-, and trace-element compositions, as determined by X-ray fluorescence (XRF) analysis, were obtained on 34 samples of vermiculite to ascertain whether chemical differences exist to the extent of determining the source of commercial products. The sample set included ores from four deposits, seven commercially available garden products, and insulation from four attics. The trace-element distributions of Ba, Cr, and V can be used to distinguish the Libby vermiculite samples from the garden products. In general, the overall composition of the Libby and South Carolina deposits appeared similar, but differed from the South Africa and China deposits based on simple statistical methods. Cluster analysis provided a good distinction of the four ore types, grouped the four attic samples with the Libby ore, and, with less certainty, grouped the garden samples with the South Africa ore.
Atmospheric Tracer Inverse Modeling Using Markov Chain Monte Carlo (MCMC)
NASA Astrophysics Data System (ADS)
Kasibhatla, P.
2004-12-01
In recent years, there has been an increasing emphasis on the use of Bayesian statistical estimation techniques to characterize the temporal and spatial variability of atmospheric trace gas sources and sinks. The applications have been varied in terms of the particular species of interest, as well as in terms of the spatial and temporal resolution of the estimated fluxes. However, one common characteristic has been the use of relatively simple statistical models for describing the measurement and chemical transport model error statistics and prior source statistics. For example, multivariate normal probability distribution functions (pdfs) are commonly used to model these quantities, and inverse source estimates are derived for fixed values of pdf parameters. While the advantage of this approach is that closed-form analytical solutions for the a posteriori pdfs of interest are available, it is worth exploring Bayesian analysis approaches which allow for a more general treatment of error and prior source statistics. Here, we present an application of the Markov Chain Monte Carlo (MCMC) methodology to an atmospheric tracer inversion problem to demonstrate how more general statistical models for errors can be incorporated into the analysis in a relatively straightforward manner. The MCMC approach to Bayesian analysis, which has found wide application in a variety of fields, is a statistical simulation approach that involves computing moments of interest of the a posteriori pdf by efficiently sampling this pdf. The specific inverse problem that we focus on is the annual mean CO2 source/sink estimation problem considered by the TransCom3 project. TransCom3 was a collaborative effort involving various modeling groups and followed a common modeling and analysis protocol. As such, this problem provides a convenient case study to demonstrate the applicability of the MCMC methodology to atmospheric tracer source/sink estimation problems.
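To make the MCMC idea concrete, here is a minimal random-walk Metropolis sampler for a toy one-parameter source-strength posterior; the Gaussian likelihood and prior are stand-ins for illustration and are not the TransCom3 setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: observed concentrations y = H * s + noise, scalar source strength s.
H, s_true, sigma_obs = 2.0, 3.0, 0.5
y = H * s_true + rng.normal(0.0, sigma_obs, size=20)

def log_post(s, prior_mean=0.0, prior_sd=10.0):
    """Gaussian likelihood plus Gaussian prior (both assumptions for this sketch)."""
    log_lik = -0.5 * np.sum((y - H * s) ** 2) / sigma_obs**2
    log_prior = -0.5 * (s - prior_mean) ** 2 / prior_sd**2
    return log_lik + log_prior

# Random-walk Metropolis sampling of the posterior p(s | y).
samples, s, step = [], 0.0, 0.2
for _ in range(20000):
    prop = s + rng.normal(0.0, step)
    if np.log(rng.uniform()) < log_post(prop) - log_post(s):
        s = prop                      # accept the proposal
    samples.append(s)

burned = np.array(samples[5000:])     # discard burn-in
print(f"posterior mean {burned.mean():.3f}, sd {burned.std():.3f}")
```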
MAGMA: analysis of two-channel microarrays made easy.
Rehrauer, Hubert; Zoller, Stefan; Schlapbach, Ralph
2007-07-01
The web application MAGMA provides a simple and intuitive interface to identify differentially expressed genes from two-channel microarray data. While the underlying algorithms are not superior to those of similar web applications, MAGMA is particularly user friendly and can be used without prior training. The user interface guides the novice user through the most typical microarray analysis workflow consisting of data upload, annotation, normalization and statistical analysis. It automatically generates R-scripts that document MAGMA's entire data processing steps, thereby allowing the user to regenerate all results in his local R installation. The implementation of MAGMA follows the model-view-controller design pattern that strictly separates the R-based statistical data processing, the web-representation and the application logic. This modular design makes the application flexible and easily extendible by experts in one of the fields: statistical microarray analysis, web design or software development. State-of-the-art Java Server Faces technology was used to generate the web interface and to perform user input processing. MAGMA's object-oriented modular framework makes it easily extendible and applicable to other fields and demonstrates that modern Java technology is also suitable for rather small and concise academic projects. MAGMA is freely available at www.magma-fgcz.uzh.ch.
Køppe, Simo; Dammeyer, Jesper
2014-09-01
The evolution of developmental psychology has been characterized by the use of different quantitative and qualitative methods and procedures. But how does the use of methods and procedures change over time? This study explores the change and development of statistical methods used in articles published in Child Development from 1930 to 2010. The methods used in every article in the first issue of every volume were categorized into four categories. Until 1980, relatively simple statistical methods were used. During the last 30 years there has been an explosive increase in the use of more advanced statistical methods. The absence of statistical methods, or the use of only simple methods, has been eliminated.
On some stochastic formulations and related statistical moments of pharmacokinetic models.
Matis, J H; Wehrly, T E; Metzler, C M
1983-02-01
This paper presents the deterministic and stochastic model for a linear compartment system with constant coefficients, and it develops expressions for the mean residence times (MRT) and the variances of the residence times (VRT) for the stochastic model. The expressions are relatively simple computationally, involving primarily matrix inversion, and they are elegant mathematically, in avoiding eigenvalue analysis and the complex domain. The MRT and VRT provide a set of new meaningful response measures for pharmacokinetic analysis and they give added insight into the system kinetics. The new analysis is illustrated with an example involving the cholesterol turnover in rats.
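As a sketch of the "primarily matrix inversion" computation described above, assuming a linear compartment system dx/dt = A x with compartmental rate matrix A, the mean residence time matrix can be obtained as -A^{-1}; the two-compartment rate constants below are made up, and the variance expressions (VRT) are not shown.

```python
import numpy as np

# Hypothetical 2-compartment rate matrix A (units 1/h): compartment 1 loses
# material at rate k10 + k12, compartment 2 at rate k21, with exchange.
k10, k12, k21 = 0.3, 0.2, 0.1
A = np.array([[-(k10 + k12), k21],
              [k12,          -k21]])

# Mean residence time matrix: T[i, j] = mean time spent in compartment i by
# material introduced into compartment j (standard result T = -A^{-1}).
T = -np.linalg.inv(A)
print("mean residence times (h):\n", T)
```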
A study of two statistical methods as applied to shuttle solid rocket booster expenditures
NASA Technical Reports Server (NTRS)
Perlmutter, M.; Huang, Y.; Graves, M.
1974-01-01
The state probability technique and the Monte Carlo technique are applied to finding shuttle solid rocket booster expenditure statistics. For a given attrition rate per launch, the probable number of boosters needed for a given mission of 440 launches is calculated. Several cases are considered, including the elimination of the booster after a maximum of 20 consecutive launches. Also considered is the case where the booster is composed of replaceable components with independent attrition rates. A simple cost analysis is carried out to indicate the number of boosters to build initially, depending on booster costs. Two statistical methods were applied in the analysis: (1) state probability method which consists of defining an appropriate state space for the outcome of the random trials, and (2) model simulation method or the Monte Carlo technique. It was found that the model simulation method was easier to formulate while the state probability method required less computing time and was more accurate.
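A minimal Monte Carlo sketch of the kind of expenditure calculation described, assuming a fixed attrition probability per launch and the made-up numbers shown; it is an illustration of the simulation method, not a reconstruction of the original state-probability analysis.

```python
import numpy as np

rng = np.random.default_rng(1)

def boosters_needed(n_launches=440, p_loss=0.02, max_uses=20):
    """Count boosters expended in one simulated mission campaign."""
    boosters, uses = 1, 0
    for _ in range(n_launches):
        uses += 1
        lost = rng.uniform() < p_loss
        if lost or uses >= max_uses:      # replace after a loss or after max_uses flights
            boosters += 1
            uses = 0
    return boosters

runs = np.array([boosters_needed() for _ in range(5000)])
print(f"mean boosters: {runs.mean():.1f}, 95th percentile: {np.percentile(runs, 95):.0f}")
```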
An Adaptive Buddy Check for Observational Quality Control
NASA Technical Reports Server (NTRS)
Dee, Dick P.; Rukhovets, Leonid; Todling, Ricardo; DaSilva, Arlindo M.; Larson, Jay W.; Einaudi, Franco (Technical Monitor)
2000-01-01
An adaptive buddy check algorithm is presented that adjusts tolerances for outlier observations based on the variability of surrounding data. The algorithm derives from a statistical hypothesis test combined with maximum-likelihood covariance estimation. Its stability is shown to depend on the initial identification of outliers by a simple background check. The adaptive feature ensures that the final quality control decisions are not very sensitive to the prescribed statistics of first-guess and observation errors, nor to other approximations introduced into the algorithm. The implementation of the algorithm in a global atmospheric data assimilation system is described. Its performance is contrasted with that of a non-adaptive buddy check for the surface analysis of an extreme storm that took place in Europe on 27 December 1999. The adaptive algorithm allowed the inclusion of many important observations that differed greatly from the first guess and that would have been excluded on the basis of prescribed statistics. The analysis of the storm development was much improved as a result of these additional observations.
Westfall, Jacob; Kenny, David A; Judd, Charles M
2014-10-01
Researchers designing experiments in which a sample of participants responds to a sample of stimuli are faced with difficult questions about optimal study design. The conventional procedures of statistical power analysis fail to provide appropriate answers to these questions because they are based on statistical models in which stimuli are not assumed to be a source of random variation in the data, models that are inappropriate for experiments involving crossed random factors of participants and stimuli. In this article, we present new methods of power analysis for designs with crossed random factors, and we give detailed, practical guidance to psychology researchers planning experiments in which a sample of participants responds to a sample of stimuli. We extensively examine 5 commonly used experimental designs, describe how to estimate statistical power in each, and provide power analysis results based on a reasonable set of default parameter values. We then develop general conclusions and formulate rules of thumb concerning the optimal design of experiments in which a sample of participants responds to a sample of stimuli. We show that in crossed designs, statistical power typically does not approach unity as the number of participants goes to infinity but instead approaches a maximum attainable power value that is possibly small, depending on the stimulus sample. We also consider the statistical merits of designs involving multiple stimulus blocks. Finally, we provide a simple and flexible Web-based power application to aid researchers in planning studies with samples of stimuli.
NASA Technical Reports Server (NTRS)
Zimmerman, G. A.; Olsen, E. T.
1992-01-01
Noise power estimation in the High-Resolution Microwave Survey (HRMS) sky survey element is considered as an example of a constant false alarm rate (CFAR) signal detection problem. Order-statistic-based noise power estimators for CFAR detection are considered in terms of required estimator accuracy and estimator dynamic range. By limiting the dynamic range of the value to be estimated, the performance of an order-statistic estimator can be achieved by simpler techniques requiring only a single pass of the data. Simple threshold-and-count techniques are examined, and it is shown how several parallel threshold-and-count estimation devices can be used to expand the dynamic range to meet HRMS system requirements with minimal hardware complexity. An input/output (I/O) efficient limited-precision order-statistic estimator with wide but limited dynamic range is also examined.
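A hedged sketch of a single-pass threshold-and-count noise power estimate, assuming exponentially distributed power samples (as for squared magnitudes of complex Gaussian noise); the threshold and sample values are illustrative and are not HRMS parameters.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated power samples: exponential with true mean (noise power) of 2.0.
true_power = 2.0
x = rng.exponential(true_power, size=100_000)

# Single-pass threshold-and-count: for exponential samples,
# P(X > T) = exp(-T / power), so power ≈ -T / ln(fraction exceeding T).
T = 3.0
frac = np.count_nonzero(x > T) / x.size
power_hat = -T / np.log(frac)

print(f"threshold-and-count estimate: {power_hat:.3f} (true {true_power})")
# Order-statistic comparison: the exponential median equals power * ln(2).
print(f"median-based order-statistic estimate: {np.median(x) / np.log(2):.3f}")
```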
Bayesian models based on test statistics for multiple hypothesis testing problems.
Ji, Yuan; Lu, Yiling; Mills, Gordon B
2008-04-01
We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.
Holmium laser enucleation versus laparoscopic simple prostatectomy for large adenomas.
Juaneda, R; Thanigasalam, R; Rizk, J; Perrot, E; Theveniaud, P E; Baumert, H
2016-01-01
The aim of this study is to compare Holmium laser enucleation of the prostate with another minimally invasive technique, the laparoscopic simple prostatectomy. We compared outcomes of a series of 40 patients who underwent laparoscopic simple prostatectomy (n=20) with laser enucleation of the prostate (n=20) for large adenomas (>100 grams) at our institution. Study variables included operative time and catheterization time, hospital stay, pre- and post-operative International Prostate Symptom Score and maximum urinary flow rate, complications and economic evaluation. Statistical analyses were performed using the Student t test and Fisher test. There were no significant differences in patient age, preoperative prostatic size, operating time or specimen weight between the 2 groups. Duration of catheterization (P=.0008) and hospital stay (P<.0001) were significantly less in the laser group. Both groups showed a statistically significant improvement in functional variables at 3 months post operatively. The cost utility analysis for Holmium per case was 2589 euros versus 4706 per laparoscopic case. In the laser arm, 4 patients (20%) experienced complications according to the modified Clavien classification system versus 5 (25%) in the laparoscopic group (P>.99). Holmium enucleation of the prostate has similar short term functional results and complication rates compared to laparoscopic simple prostatectomy performed in large glands with the advantage of less catheterization time, lower economic costs and a reduced hospital stay. Copyright © 2015 AEU. Publicado por Elsevier España, S.L.U. All rights reserved.
Zumrutdal, Emin; Karateke, Faruk; Eser, Pınar Eylem; Turan, Umit; Ozyazici, Sefa; Sozutek, Alper; Gulkaya, Mustafa; Kunt, Mevlut
2016-12-01
We aimed to determine the biochemical and histopathologic effects on rat livers of supplying oxygen directly to the preservation fluid of a static cold storage system using a simple method. Sixteen rats were randomly divided into 2 groups: the control group, in which Ringer's lactate served as the preservation fluid; and the oxygen group, in which the preservation fluid contained oxygen and Ringer's lactate. Each liver was placed in a bag containing 50 mL Ringer's lactate and placed in ice-filled storage containers. One hundred percent oxygen was supplied to the livers in the oxygen group via a simple, inexpensive system created in our laboratory. We obtained samples for histopathologic evaluation at the twelfth hour. In addition, 3 mL of preservation fluid was subjected to biochemical analysis at the 0, sixth, and twelfth hours. Aspartate aminotransferase, alanine aminotransferase, lactate dehydrogenase, and pH levels were measured from the preservation fluid. In the oxygen-supplemented group, the rate of increase in alanine aminotransferase and lactate dehydrogenase levels at the sixth hour, and in lactate dehydrogenase and alanine aminotransferase levels at the twelfth hour, was statistically significantly reduced. On histopathologic examination, all parameters except ballooning were statistically significantly better in the oxygen-supplemented group. This simple system for oxygenation of liver tissues during static cold storage was shown to be effective, with good results in biochemical and histopathologic assessments. Because this is a simple, inexpensive, and easily available method, larger studies are warranted to evaluate its effects (especially in humans).
Nature of Driving Force for Protein Folding-- A Result From Analyzing the Statistical Potential
NASA Astrophysics Data System (ADS)
Li, Hao; Tang, Chao; Wingreen, Ned S.
1998-03-01
In a statistical approach to protein structure analysis, Miyazawa and Jernigan (MJ) derived a 20 × 20 matrix of inter-residue contact energies between different types of amino acids. Using the method of eigenvalue decomposition, we find that the MJ matrix can be accurately reconstructed from its first two principal component vectors as M_ij = C_0 + C_1 (q_i + q_j) + C_2 q_i q_j, with constants C_0, C_1, C_2 and 20 values q_i associated with the 20 amino acids. This regularity is due to hydrophobic interactions and a force of demixing, the latter obeying Hildebrand's solubility theory of simple liquids.
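To illustrate the kind of eigenvalue-decomposition argument used, the sketch below builds a synthetic symmetric 20 × 20 "contact energy" matrix with the additive-plus-multiplicative structure above, reconstructs it from its two leading eigen-components, and reports the reconstruction error; the numbers are made up and are not the actual MJ matrix.

```python
import numpy as np

rng = np.random.default_rng(3)

# Synthetic 20x20 matrix mimicking M_ij = C0 + C1*(q_i + q_j) + C2*q_i*q_j,
# plus a small symmetric noise term.
q = rng.normal(size=20)
M = -2.0 + 0.8 * (q[:, None] + q[None, :]) + 0.5 * np.outer(q, q)
noise = 0.05 * rng.normal(size=(20, 20))
M += 0.5 * (noise + noise.T)             # keep the matrix symmetric

# Eigen-decomposition and rank-2 reconstruction from the two leading components.
w, V = np.linalg.eigh(M)
idx = np.argsort(np.abs(w))[::-1][:2]    # two largest-magnitude eigenvalues
M2 = (V[:, idx] * w[idx]) @ V[:, idx].T

rel_err = np.linalg.norm(M - M2) / np.linalg.norm(M)
print(f"relative Frobenius error of rank-2 reconstruction: {rel_err:.3f}")
```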
Statistical mechanics of broadcast channels using low-density parity-check codes.
Nakamura, Kazutaka; Kabashima, Yoshiyuki; Morelos-Zaragoza, Robert; Saad, David
2003-03-01
We investigate the use of Gallager's low-density parity-check (LDPC) codes in a degraded broadcast channel, one of the fundamental models in network information theory. Combining linear codes is a standard technique in practical network communication schemes and is known to provide better performance than simple time sharing methods when algebraic codes are used. The statistical physics based analysis shows that the practical performance of the suggested method, achieved by employing the belief propagation algorithm, is superior to that of LDPC based time sharing codes while the best performance, when received transmissions are optimally decoded, is bounded by the time sharing limit.
Entropy in sound and vibration: towards a new paradigm.
Le Bot, A
2017-01-01
This paper describes a discussion on the method and the status of a statistical theory of sound and vibration, called statistical energy analysis (SEA). SEA is a simple theory of sound and vibration in elastic structures that applies when the vibrational energy is diffusely distributed. We show that SEA is a thermodynamical theory of sound and vibration, based on a law of exchange of energy analogous to the Clausius principle. We further investigate the notion of entropy in this context and discuss its meaning. We show that entropy is a measure of the information lost in the passage from the classical theory of sound and vibration to SEA, its thermodynamical counterpart.
Haranas, Ioannis; Gkigkitzis, Ioannis; Kotsireas, Ilias; Austerlitz, Carlos
2017-01-01
Understanding how the brain encodes information and performs computation requires statistical and functional analysis. Given the complexity of the human brain, simple methods that facilitate the interpretation of statistical correlations among different brain regions can be very useful. In this report we introduce a numerical correlation measure that may serve the interpretation of correlational neuronal data and may assist in the evaluation of different brain states. The description of the dynamical brain system through a global numerical measure may indicate the presence of an action principle, which may facilitate an application of physics principles in the study of the human brain and cognition.
NASA Technical Reports Server (NTRS)
Abbey, Craig K.; Eckstein, Miguel P.
2002-01-01
We consider estimation and statistical hypothesis testing on classification images obtained from the two-alternative forced-choice experimental paradigm. We begin with a probabilistic model of task performance for simple forced-choice detection and discrimination tasks. Particular attention is paid to general linear filter models because these models lead to a direct interpretation of the classification image as an estimate of the filter weights. We then describe an estimation procedure for obtaining classification images from observer data. A number of statistical tests are presented for testing various hypotheses from classification images based on some more compact set of features derived from them. As an example of how the methods we describe can be used, we present a case study investigating detection of a Gaussian bump profile.
Statistical Analysis of CFD Solutions from the Drag Prediction Workshop
NASA Technical Reports Server (NTRS)
Hemsch, Michael J.
2002-01-01
A simple, graphical framework is presented for robust statistical evaluation of results obtained from N-version testing of a series of RANS CFD codes. The solutions were obtained by a variety of code developers and users for the June 2001 Drag Prediction Workshop sponsored by the AIAA Applied Aerodynamics Technical Committee. The aerodynamic configuration used for the computational tests is the DLR-F4 wing-body combination, previously tested in several European wind tunnels and for which a previous N-version test had been conducted. The statistical framework is used to evaluate code results for (1) a single cruise design point, (2) drag polars and (3) drag rise. The paper concludes with a discussion of the meaning of the results, especially with respect to predictability, validation, and reporting of solutions.
Dyess, Susan Mac Leod; Prestia, Angela S; Marquit, Doren-Elyse; Newman, David
2018-03-01
Acute care practice settings are stressful. Nurse leaders face stressful demands of numerous competing priorities. Some nurse leaders experience unmanageable stress, but success requires self-care. This article presents a repeated measures intervention design study using mixed methods to investigate a self-care simple meditation practice for nurse leaders. Themes and subthemes emerged in association with the three data collection points: at baseline (pretest), after 6 weeks, and after 12 weeks (posttest) from introduction of the self-care simple meditation practice. An analysis of variance yielded a statistically significant drop in perceived stress at 6 weeks and again at 12 weeks. Conducting future research is merited.
NASA Astrophysics Data System (ADS)
Andersson, C. David; Hillgren, J. Mikael; Lindgren, Cecilia; Qian, Weixing; Akfur, Christine; Berg, Lotta; Ekström, Fredrik; Linusson, Anna
2015-03-01
Scientific disciplines such as medicinal and environmental chemistry, pharmacology, and toxicology deal with questions related to the effects that small organic compounds exert on biological targets and the compounds' physicochemical properties responsible for these effects. A common strategy in this endeavor is to establish structure-activity relationships (SARs). The aim of this work was to illustrate the benefits of performing a statistical molecular design (SMD) and a proper statistical analysis of the molecules' properties before SAR and quantitative structure-activity relationship (QSAR) analysis. Our SMD followed by synthesis yielded a set of inhibitors of the enzyme acetylcholinesterase (AChE) that had very few inherent dependencies between the substructures in the molecules. If such dependencies exist, they cause severe errors in SAR interpretation and predictions by QSAR models, and leave a set of molecules less suitable for future decision-making. In our study, SAR and QSAR models could show which molecular substructures and physicochemical features were advantageous for AChE inhibition. Finally, the QSAR model was used for the prediction of the inhibition of AChE by an external prediction set of molecules. The accuracy of these predictions was assessed by statistical significance tests and by comparisons to simple but relevant reference models.
Indiana chronic disease management program risk stratification analysis.
Li, Jingjin; Holmes, Ann M; Rosenman, Marc B; Katz, Barry P; Downs, Stephen M; Murray, Michael D; Ackermann, Ronald T; Inui, Thomas S
2005-10-01
The objective of this study was to compare the ability of risk stratification models derived from administrative data to classify groups of patients for enrollment in a tailored chronic disease management program. This study included 19,548 Medicaid patients with chronic heart failure or diabetes in the Indiana Medicaid data warehouse during 2001 and 2002. To predict costs (total claims paid) in FY 2002, we considered candidate predictor variables available in FY 2001, including patient characteristics, the number and type of prescription medications, laboratory tests, pharmacy charges, and utilization of primary, specialty, inpatient, emergency department, nursing home, and home health care. We built prospective models to identify patients with different levels of expenditure. Model fit was assessed using R statistics, whereas discrimination was assessed using the weighted kappa statistic, predictive ratios, and the area under the receiver operating characteristic curve. We found that a simple least-squares regression model, in which logged total charges in FY 2002 were regressed on the log of total charges in FY 2001, the number of prescriptions filled in FY 2001, and the FY 2001 eligibility category, performed as well as more complex models. This simple 3-parameter model had an R of 0.30 and, in terms of classification efficiency, had a sensitivity of 0.57, a specificity of 0.90, an area under the receiver operating characteristic curve of 0.80, and a weighted kappa statistic of 0.51. This simple model based on readily available administrative data stratified Medicaid members according to predicted future utilization as well as more complicated models did.
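A hedged sketch of fitting the kind of 3-parameter model described (logged future charges on logged prior charges, prescription count, and eligibility category), using statsmodels on a made-up data frame; all column names and coefficients are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 2000

# Hypothetical administrative data for two fiscal years.
df = pd.DataFrame({
    "charges_fy01": rng.lognormal(mean=8.0, sigma=1.0, size=n),
    "n_rx_fy01": rng.poisson(12, size=n),
    "elig_cat": rng.choice(["aged", "disabled", "other"], size=n),
})
df["charges_fy02"] = np.exp(
    0.5 + 0.8 * np.log(df["charges_fy01"]) + 0.02 * df["n_rx_fy01"]
    + rng.normal(0.0, 1.2, size=n)
)

# Simple least-squares model: logged FY02 charges on logged FY01 charges,
# prescription count, and eligibility category.
model = smf.ols(
    "np.log(charges_fy02) ~ np.log(charges_fy01) + n_rx_fy01 + C(elig_cat)",
    data=df,
).fit()
print(model.summary().tables[1])
print(f"R-squared: {model.rsquared:.2f}")
```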
Wittenberg, Philipp; Gan, Fah Fatt; Knoth, Sven
2018-04-17
The variable life-adjusted display (VLAD) is the first risk-adjusted graphical procedure proposed in the literature for monitoring the performance of a surgeon. It displays the cumulative sum of expected minus observed deaths. It has since become highly popular because the statistic plotted is easy to understand. But it is also easy to misinterpret a surgeon's performance by utilizing the VLAD, potentially leading to grave consequences. The problem of misinterpretation is essentially caused by the variance of the VLAD's statistic that increases with sample size. In order for the VLAD to be truly useful, a simple signaling rule is desperately needed. Various forms of signaling rules have been developed, but they are usually quite complicated. Without signaling rules, making inferences using the VLAD alone is difficult if not misleading. In this paper, we establish an equivalence between a VLAD with V-mask and a risk-adjusted cumulative sum (RA-CUSUM) chart based on the difference between the estimated probability of death and surgical outcome. Average run length analysis based on simulation shows that this particular RA-CUSUM chart has similar performance as compared to the established RA-CUSUM chart based on the log-likelihood ratio statistic obtained by testing the odds ratio of death. We provide a simple design procedure for determining the V-mask parameters based on a resampling approach. Resampling from a real data set ensures that these parameters can be estimated appropriately. Finally, we illustrate the monitoring of a real surgeon's performance using VLAD with V-mask. Copyright © 2018 John Wiley & Sons, Ltd.
The Problem of Auto-Correlation in Parasitology
Pollitt, Laura C.; Reece, Sarah E.; Mideo, Nicole; Nussey, Daniel H.; Colegrave, Nick
2012-01-01
Explaining the contribution of host and pathogen factors in driving infection dynamics is a major ambition in parasitology. There is increasing recognition that analyses based on single summary measures of an infection (e.g., peak parasitaemia) do not adequately capture infection dynamics and so, the appropriate use of statistical techniques to analyse dynamics is necessary to understand infections and, ultimately, control parasites. However, the complexities of within-host environments mean that tracking and analysing pathogen dynamics within infections and among hosts poses considerable statistical challenges. Simple statistical models make assumptions that will rarely be satisfied in data collected on host and parasite parameters. In particular, model residuals (unexplained variance in the data) should not be correlated in time or space. Here we demonstrate how failure to account for such correlations can result in incorrect biological inference from statistical analysis. We then show how mixed effects models can be used as a powerful tool to analyse such repeated measures data in the hope that this will encourage better statistical practices in parasitology. PMID:22511865
D'Agostino, M F; Sanz, J; Martínez-Castro, I; Giuffrè, A M; Sicari, V; Soria, A C
2014-07-01
Statistical analysis has been used for the first time to evaluate the dispersion of quantitative data in the solid-phase microextraction (SPME) followed by gas chromatography-mass spectrometry (GC-MS) analysis of blackberry (Rubus ulmifolius Schott) volatiles with the aim of improving their precision. Experimental and randomly simulated data were compared using different statistical parameters (correlation coefficients, Principal Component Analysis loadings and eigenvalues). Non-random factors were shown to significantly contribute to total dispersion; groups of volatile compounds could be associated with these factors. A significant improvement of precision was achieved when considering percent concentration ratios, rather than percent values, among those blackberry volatiles with a similar dispersion behavior. As novelty over previous references, and to complement this main objective, the presence of non-random dispersion trends in data from simple blackberry model systems was evidenced. Although the influence of the type of matrix on data precision was proved, the possibility of a better understanding of the dispersion patterns in real samples was not possible from model systems. The approach here used was validated for the first time through the multicomponent characterization of Italian blackberries from different harvest years. Copyright © 2014 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Hacker, Joshua; Vandenberghe, Francois; Jung, Byoung-Jo; Snyder, Chris
2017-04-01
Effective assimilation of cloud-affected radiance observations from space-borne imagers, with the aim of improving cloud analysis and forecasting, has proven to be difficult. Large observation biases, nonlinear observation operators, and non-Gaussian innovation statistics present many challenges. Ensemble-variational data assimilation (EnVar) systems offer the benefits of flow-dependent background error statistics from an ensemble, and the ability of variational minimization to handle nonlinearity. The specific benefits of ensemble statistics, relative to the static background errors more commonly used in variational systems, have not been quantified for the problem of assimilating cloudy radiances. A simple experiment framework is constructed with a regional NWP model and an operational variational data assimilation system to provide a basis for understanding the importance of ensemble statistics in cloudy radiance assimilation. Restricting the observations to those corresponding to clouds in the background forecast leads to innovations that are more Gaussian. The number of large innovations is reduced compared to the more general case of all observations, but not eliminated. The Huber norm is investigated to handle the fat tails of the distributions and to allow more observations to be assimilated without the need for strict background checks that would eliminate them. Comparing assimilation using only ensemble background error statistics with assimilation using only static background error statistics elucidates the importance of the ensemble statistics. Although the cost functions in both experiments converge to similar values after sufficient outer-loop iterations, the resulting cloud water, ice, and snow content are greater in the ensemble-based analysis. The subsequent forecasts from the ensemble-based analysis also retain more condensed water species, indicating that the local environment is more supportive of clouds. In this presentation we provide details that explain the apparent benefit of using ensembles for cloudy radiance assimilation in an EnVar context.
Chan, Y; Walmsley, R P
1997-12-01
When several treatment methods are available for the same problem, many clinicians are faced with the task of deciding which treatment to use. Many clinicians may have conducted informal "mini-experiments" on their own to determine which treatment is best suited for the problem. These results are usually not documented or reported in a formal manner because many clinicians feel that they are "statistically challenged." Another reason may be because clinicians do not feel they have controlled enough test conditions to warrant analysis. In this update, a statistic is described that does not involve complicated statistical assumptions, making it a simple and easy-to-use statistical method. This update examines the use of two statistics and does not deal with other issues that could affect clinical research such as issues affecting credibility. For readers who want a more in-depth examination of this topic, references have been provided. The Kruskal-Wallis one-way analysis-of-variance-by-ranks test (or H test) is used to determine whether three or more independent groups are the same or different on some variable of interest when an ordinal level of data or an interval or ratio level of data is available. A hypothetical example will be presented to explain when and how to use this statistic, how to interpret results using the statistic, the advantages and disadvantages of the statistic, and what to look for in a written report. This hypothetical example will involve the use of ratio data to demonstrate how to choose between using the nonparametric H test and the more powerful parametric F test.
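As a hedged companion to the description above, scipy's implementation of the Kruskal-Wallis H test can be applied to three independent treatment groups as follows; the group data are entirely hypothetical.

```python
from scipy.stats import kruskal

# Hypothetical range-of-motion gains (degrees) under three treatments.
treatment_a = [12, 15, 11, 18, 14, 16]
treatment_b = [9, 10, 13, 8, 11, 12]
treatment_c = [20, 17, 22, 19, 21, 18]

h_stat, p_value = kruskal(treatment_a, treatment_b, treatment_c)
print(f"H = {h_stat:.2f}, p = {p_value:.4f}")
# A small p-value suggests at least one group differs; post hoc pairwise
# comparisons would be needed to identify which.
```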
General Blending Models for Data From Mixture Experiments
Brown, L.; Donev, A. N.; Bissett, A. C.
2015-01-01
We propose a new class of models providing a powerful unification and extension of existing statistical methodology for analysis of data obtained in mixture experiments. These models, which integrate models proposed by Scheffé and Becker, extend considerably the range of mixture component effects that may be described. They become complex when the studied phenomenon requires it, but remain simple whenever possible. This article has supplementary material online. PMID:26681812
Magnetorotational dynamo chimeras. The missing link to turbulent accretion disk dynamo models?
NASA Astrophysics Data System (ADS)
Riols, A.; Rincon, F.; Cossu, C.; Lesur, G.; Ogilvie, G. I.; Longaretti, P.-Y.
2017-02-01
In Keplerian accretion disks, turbulence and magnetic fields may be jointly excited through a subcritical dynamo mechanism involving the magnetorotational instability (MRI). This dynamo may notably contribute to explaining the time-variability of various accreting systems, as high-resolution simulations of MRI dynamo turbulence exhibit statistical self-organization into large-scale cyclic dynamics. However, understanding the physics underlying these statistical states and assessing their exact astrophysical relevance is theoretically challenging. The study of simple periodic nonlinear MRI dynamo solutions has recently proven useful in this respect, and has highlighted the role of turbulent magnetic diffusion in the seeming impossibility of a dynamo at low magnetic Prandtl number (Pm), a common regime in disks. Arguably though, these simple laminar structures may not be fully representative of the complex, statistically self-organized states expected in astrophysical regimes. Here, we aim at closing this seeming discrepancy by reporting the numerical discovery of exactly periodic, yet semi-statistical "chimeral MRI dynamo states" which are the organized outcome of a succession of MRI-unstable, non-axisymmetric dynamical stages of different forms and amplitudes. Interestingly, these states, while reminiscent of the statistical complexity of turbulent simulations, involve the same physical principles as simpler laminar cycles, and their analysis further confirms the theory that subcritical turbulent magnetic diffusion impedes the sustainment of an MRI dynamo at low Pm. Overall, chimera dynamo cycles therefore offer an unprecedented dual physical and statistical perspective on dynamos in rotating shear flows, which may prove useful in devising more accurate, yet intuitive mean-field models of time-dependent turbulent disk dynamos. Movies associated with Fig. 1 are available at http://www.aanda.org
WASP (Write a Scientific Paper) using Excel - 1: Data entry and validation.
Grech, Victor
2018-02-01
Data collection for the purposes of analysis, after the planning and execution of a research study, commences with data input and validation. The process of data entry and analysis may appear daunting to the uninitiated, but as pointed out in the 1970s in a series of papers by British Medical Journal Deputy Editor TDV Swinscow, modern hardware and software (he was then referring to the availability of hand calculators) permit the performance of statistical testing outside a computer laboratory. In this day and age, modern software, such as the ubiquitous and almost universally familiar Microsoft Excel™, greatly facilitates this process. This paper is the first of a collection of papers that will emulate Swinscow's series, in his own words, "addressed to readers who want to start at the beginning, not to those who are already skilled statisticians." These papers will place less focus on the actual arithmetic, and more emphasis on how to actually implement simple statistics, step by step, using Excel, thereby constituting the equivalent of Swinscow's papers in the personal computer age. Data entry can be facilitated by several underutilised features in Excel. This paper explains Excel's little-known form function, data validation at the input stage, simple coding tips, and data cleaning tools. Copyright © 2018 Elsevier B.V. All rights reserved.
Nomogram for sample size calculation on a straightforward basis for the kappa statistic.
Hong, Hyunsook; Choi, Yunhee; Hahn, Seokyung; Park, Sue Kyung; Park, Byung-Joo
2014-09-01
Kappa is a widely used measure of agreement. However, it may not be straightforward in some situations, such as sample size calculation, due to the kappa paradox: high agreement but low kappa. Hence, it seems reasonable in sample size calculation to consider the level of agreement under a certain marginal prevalence in terms of a simple proportion of agreement rather than a kappa value. Therefore, sample size formulae and nomograms using a simple proportion of agreement rather than a kappa under certain marginal prevalences are proposed. A sample size formula was derived using the kappa statistic under the common correlation model and a goodness-of-fit statistic. The nomogram for the sample size formula was developed using SAS 9.3. Sample size formulae using a simple proportion of agreement instead of a kappa statistic, together with nomograms that eliminate the inconvenience of using a mathematical formula, were produced. A nomogram for sample size calculation with a simple proportion of agreement should be useful in the planning stages when the focus of interest is on testing the hypothesis of interobserver agreement involving two raters and nominal outcome measures. Copyright © 2014 Elsevier Inc. All rights reserved.
Van Bockstaele, Femke; Janssens, Ann; Piette, Anne; Callewaert, Filip; Pede, Valerie; Offner, Fritz; Verhasselt, Bruno; Philippé, Jan
2006-07-15
ZAP-70 has been proposed as a surrogate marker for immunoglobulin heavy-chain variable region (IgV(H)) mutation status, which is known as a prognostic marker in B-cell chronic lymphocytic leukemia (CLL). The flow cytometric analysis of ZAP-70 suffers from difficulties in standardization and interpretation. We applied the Kolmogorov-Smirnov (KS) statistical test to make analysis more straightforward. We examined ZAP-70 expression by flow cytometry in 53 patients with CLL. Analysis was performed as initially described by Crespo et al. (New England J Med 2003; 348:1764-1775) and alternatively by application of the KS statistical test comparing T cells with B cells. Receiver-operating-characteristics (ROC)-curve analyses were performed to determine the optimal cut-off values for ZAP-70 measured by the two approaches. ZAP-70 protein expression was compared with ZAP-70 mRNA expression measured by a quantitative PCR (qPCR) and with the IgV(H) mutation status. Both flow cytometric analyses correlated well with the molecular technique and proved to be of equal value in predicting the IgV(H) mutation status. Applying the KS test is reproducible, simple, straightforward, and overcomes a number of difficulties encountered in the Crespo-method. The KS statistical test is an essential part of the software delivered with modern routine analytical flow cytometers and is well suited for analysis of ZAP-70 expression in CLL. (c) 2006 International Society for Analytical Cytology.
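A hedged sketch of the flow-cytometry comparison described, using scipy's two-sample Kolmogorov-Smirnov test on made-up ZAP-70 fluorescence intensities for T cells and B cells; the distributions and cutoffs are illustrative only.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(6)

# Hypothetical ZAP-70 fluorescence intensities (arbitrary units).
t_cells = rng.lognormal(mean=3.0, sigma=0.4, size=5000)   # internal positive reference
b_cells = rng.lognormal(mean=2.6, sigma=0.5, size=8000)   # CLL B cells

# KS D statistic: maximum distance between the two empirical distributions.
result = ks_2samp(t_cells, b_cells)
print(f"D = {result.statistic:.3f}, p = {result.pvalue:.3g}")
# A small D (B cells resembling the ZAP-70-expressing T cells) would point
# toward ZAP-70-positive CLL; a large D toward ZAP-70-negative CLL.
```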
Aggregative Learning Method and Its Application for Communication Quality Evaluation
NASA Astrophysics Data System (ADS)
Akhmetov, Dauren F.; Kotaki, Minoru
2007-12-01
In this paper, the so-called Aggregative Learning Method (ALM) is proposed to improve and simplify the learning and classification abilities of different data processing systems. It provides a universal basis for the design and analysis of mathematical models of a wide class. A procedure was elaborated for time series model reconstruction and analysis for linear and nonlinear cases. Data approximation accuracy (during the learning phase) and data classification quality (during the recall phase) are estimated from the introduced statistical parameters. The validity and efficiency of the proposed approach have been demonstrated through its application to monitoring of wireless communication quality, namely, for a Fixed Wireless Access (FWA) system. Low memory and computation resources were shown to be needed for the realization of the procedure, especially for the data classification (recall) stage. Characterized by high computational efficiency and a simple decision-making procedure, the derived approaches can be useful for simple and reliable real-time surveillance and control system design.
Bojić, Mirza; Simon Haas, Vicente; Maleš, Željan
2013-01-01
Raw materials, different food formulations, and dietary supplements of mate demand control of the content of bioactive substances, for which high performance thin layer chromatography (TLC), described here, presents a simple and rapid approach to detection as well as quantification. Using TLC densitometry, the following bioactive compounds were identified and quantified: chlorogenic acid (2.1 mg/g), caffeic acid (1.5 mg/g), rutin (5.2 mg/g), quercetin (2.2 mg/g), and kaempferol (4.5 mg/g). The results obtained with TLC densitometry for caffeine (5.4 mg/g) and theobromine (2.7 mg/g) show no statistical difference from the content of total xanthines (7.6 mg/g) obtained by UV-Vis spectrophotometry. Thus, TLC remains a technique of choice for simple and rapid analysis of a great number of samples as well as a primary screening technique in plant analysis. PMID:23841023
Automatic Generation of Algorithms for the Statistical Analysis of Planetary Nebulae Images
NASA Technical Reports Server (NTRS)
Fischer, Bernd
2004-01-01
Analyzing data sets collected in experiments or by observations is a core scientific activity. Typically, experimental and observational data are fraught with uncertainty, and the analysis is based on a statistical model of the conjectured underlying processes. The large data volumes collected by modern instruments make computer support indispensable for this. Consequently, scientists spend significant amounts of their time on the development and refinement of data analysis programs. AutoBayes [GF+02, FS03] is a fully automatic synthesis system for generating statistical data analysis programs. Externally, it looks like a compiler: it takes an abstract problem specification and translates it into executable code. Its input is a concise description of a data analysis problem in the form of a statistical model as shown in Figure 1; its output is optimized and fully documented C/C++ code which can be linked dynamically into the Matlab and Octave environments. Internally, however, it is quite different: AutoBayes derives a customized algorithm implementing the given model using a schema-based process, and then further refines and optimizes the algorithm into code. A schema is a parameterized code template with associated semantic constraints which define and restrict the template's applicability. The schema parameters are instantiated in a problem-specific way during synthesis as AutoBayes checks the constraints against the original model or, recursively, against emerging sub-problems. The AutoBayes schema library contains problem decomposition operators (which are justified by theorems in a formal logic in the domain of Bayesian networks) as well as machine learning algorithms (e.g., EM, k-Means) and numeric optimization methods (e.g., Nelder-Mead simplex, conjugate gradient). AutoBayes augments this schema-based approach by symbolic computation to derive closed-form solutions whenever possible. This is a major advantage over other statistical data analysis systems which use numerical approximations even in cases where closed-form solutions exist. AutoBayes is implemented in Prolog and comprises approximately 75,000 lines of code. In this paper, we take one typical scientific data analysis problem, analyzing planetary nebulae images taken by the Hubble Space Telescope, and show how AutoBayes can be used to automate the implementation of the necessary analysis programs. We initially follow the analysis described by Knuth and Hajian [KH02] and use AutoBayes to derive code for the published models. We show the details of the code derivation process, including the symbolic computations and automatic integration of library procedures, and compare the results of the automatically generated and manually implemented code. We then go beyond the original analysis and use AutoBayes to derive code for a simple image segmentation procedure based on a mixture model which can be used to automate a manual preprocessing step. Finally, we combine the original approach with the simple segmentation, which yields a more detailed analysis. This also demonstrates that AutoBayes makes it easy to combine different aspects of data analysis.
NASA Technical Reports Server (NTRS)
Brown, Andrew M.; Ferri, Aldo A.
1995-01-01
Standard methods of structural dynamic analysis assume that the structural characteristics are deterministic. Recognizing that these characteristics are actually statistical in nature, researchers have recently developed a variety of methods that use this information to determine probabilities of a desired response characteristic, such as natural frequency, without using expensive Monte Carlo simulations. One of the problems in these methods is correctly identifying the statistical properties of primitive variables such as geometry, stiffness, and mass. This paper presents a method where the measured dynamic properties of substructures are used instead as the random variables. The residual flexibility method of component mode synthesis is combined with the probabilistic methods to determine the cumulative distribution function of the system eigenvalues. A simple cantilever beam test problem is presented that illustrates the theory.
NASA Astrophysics Data System (ADS)
Alsing, Justin; Wandelt, Benjamin; Feeney, Stephen
2018-07-01
Many statistical models in cosmology can be simulated forwards but have intractable likelihood functions. Likelihood-free inference methods allow us to perform Bayesian inference from these models using only forward simulations, free from any likelihood assumptions or approximations. Likelihood-free inference generically involves simulating mock data and comparing to the observed data; this comparison in data space suffers from the curse of dimensionality and requires compression of the data to a small number of summary statistics to be tractable. In this paper, we use massive asymptotically optimal data compression to reduce the dimensionality of the data space to just one number per parameter, providing a natural and optimal framework for summary statistic choice for likelihood-free inference. Secondly, we present the first cosmological application of Density Estimation Likelihood-Free Inference (DELFI), which learns a parametrized model for the joint distribution of data and parameters, yielding both the parameter posterior and the model evidence. This approach is conceptually simple, requires less tuning than traditional Approximate Bayesian Computation approaches to likelihood-free inference and can give high-fidelity posteriors from orders of magnitude fewer forward simulations. As an additional bonus, it enables parameter inference and Bayesian model comparison simultaneously. We demonstrate DELFI with massive data compression on an analysis of the joint light-curve analysis supernova data, as a simple validation case study. We show that high-fidelity posterior inference is possible for full-scale cosmological data analyses with as few as ~10^4 simulations, with substantial scope for further improvement, demonstrating the scalability of likelihood-free inference to large and complex cosmological data sets.
Ma, Junshui; Wang, Shubing; Raubertas, Richard; Svetnik, Vladimir
2010-07-15
With the increasing popularity of using electroencephalography (EEG) to reveal the treatment effect in drug development clinical trials, the vast volume and complex nature of EEG data compose an intriguing, but challenging, topic. In this paper the statistical analysis methods recommended by the EEG community, along with methods frequently used in the published literature, are first reviewed. A straightforward adjustment of the existing methods to handle multichannel EEG data is then introduced. In addition, based on the spatial smoothness property of EEG data, a new category of statistical methods is proposed. The new methods use a linear combination of low-degree spherical harmonic (SPHARM) basis functions to represent a spatially smoothed version of the EEG data on the scalp, which is close to a sphere in shape. In total, seven statistical methods, including both the existing and the newly proposed methods, are applied to two clinical datasets to compare their power to detect a drug effect. Contrary to the EEG community's recommendation, our results suggest that (1) the nonparametric method does not outperform its parametric counterpart; and (2) including baseline data in the analysis does not always improve the statistical power. In addition, our results recommend that (3) simple paired statistical tests should be avoided due to their poor power; and (4) the proposed spatially smoothed methods perform better than their unsmoothed versions. Copyright 2010 Elsevier B.V. All rights reserved.
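To illustrate the idea of representing scalp topography with a low-degree spherical harmonic basis, here is a minimal least-squares sketch using scipy.special.sph_harm; the electrode positions and EEG values are random stand-ins, and real channel coordinates would be needed in practice.

```python
import numpy as np
from scipy.special import sph_harm

rng = np.random.default_rng(7)

# Hypothetical electrode positions on the upper hemisphere (theta: azimuth,
# phi: polar angle from the vertex) and one topography of EEG power values.
n_chan = 64
theta = rng.uniform(0.0, 2.0 * np.pi, n_chan)
phi = rng.uniform(0.0, 0.5 * np.pi, n_chan)
values = rng.normal(size=n_chan)

def spharm_design(theta, phi, L=3):
    """Real-valued spherical harmonic design matrix up to degree L."""
    cols = []
    for l in range(L + 1):
        for m in range(-l, l + 1):
            Y = sph_harm(m, l, theta, phi)          # complex Y_l^m
            cols.append(Y.real if m >= 0 else Y.imag)
    return np.column_stack(cols)

X = spharm_design(theta, phi, L=3)                  # 64 channels x 16 basis functions
coef, *_ = np.linalg.lstsq(X, values, rcond=None)
smoothed = X @ coef                                  # spatially smoothed topography

print("number of basis functions:", X.shape[1])
print("residual RMS after smoothing:", round(float(np.sqrt(np.mean((values - smoothed) ** 2))), 3))
```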
Markov Logic Networks in the Analysis of Genetic Data
Sakhanenko, Nikita A.
2010-01-01
Complex, non-additive genetic interactions are common and can be critical in determining phenotypes. Genome-wide association studies (GWAS) and similar statistical studies of linkage data, however, assume additive models of gene interactions in looking for genotype-phenotype associations. These statistical methods view the compound effects of multiple genes on a phenotype as a sum of influences of each gene and often miss a substantial part of the heritable effect. Such methods do not use any biological knowledge about underlying mechanisms. Modeling approaches from the artificial intelligence (AI) field that incorporate deterministic knowledge into models to perform statistical analysis can be applied to include prior knowledge in genetic analysis. We chose to use the most general such approach, Markov Logic Networks (MLNs), for combining deterministic knowledge with statistical analysis. Using simple, logistic regression-type MLNs we can replicate the results of traditional statistical methods, but we also show that we are able to go beyond finding independent markers linked to a phenotype by using joint inference without an independence assumption. The method is applied to genetic data on yeast sporulation, a complex phenotype with gene interactions. In addition to detecting all of the previously identified loci associated with sporulation, our method identifies four loci with smaller effects. Since their effect on sporulation is small, these four loci were not detected with methods that do not account for dependence between markers due to gene interactions. We show how gene interactions can be detected using more complex models, which can be used as a general framework for incorporating systems biology with genetics. PMID:20958249
Role of diversity in ICA and IVA: theory and applications
NASA Astrophysics Data System (ADS)
Adalı, Tülay
2016-05-01
Independent component analysis (ICA) has been the most popular approach for solving the blind source separation problem. Starting from a simple linear mixing model and the assumption of statistical independence, ICA can recover a set of linearly-mixed sources to within a scaling and permutation ambiguity. It has been successfully applied to numerous data analysis problems in areas as diverse as biomedicine, communications, finance, geophysics, and remote sensing. ICA can be achieved using different types of diversity—statistical property—and can be posed to simultaneously account for multiple types of diversity such as higher-order statistics, sample dependence, non-circularity, and nonstationarity. A recent generalization of ICA, independent vector analysis (IVA), generalizes ICA to multiple data sets and adds the use of one more type of diversity, statistical dependence across the data sets, for jointly achieving independent decomposition of multiple data sets. With the addition of each new diversity type, identification of a broader class of signals becomes possible, and in the case of IVA, this includes sources that are independent and identically distributed Gaussians. We review the fundamentals and properties of ICA and IVA when multiple types of diversity are taken into account, and then ask the question whether diversity plays an important role in practical applications as well. Examples from various domains are presented to demonstrate that in many scenarios it might be worthwhile to jointly account for multiple statistical properties. This paper is submitted in conjunction with the talk delivered for the "Unsupervised Learning and ICA Pioneer Award" at the 2016 SPIE Conference on Sensing and Analysis Technologies for Biomedical and Cognitive Applications.
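A minimal FastICA example with scikit-learn showing the basic ICA recovery of linearly mixed sources, up to scaling and permutation; the mixing matrix and sources are synthetic, and the multi-dataset IVA extension is not illustrated here.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(8)
t = np.linspace(0, 8, 2000)

# Two synthetic, statistically independent non-Gaussian sources.
s1 = np.sign(np.sin(3 * t))                 # square wave
s2 = rng.laplace(size=t.size)               # heavy-tailed noise
S = np.column_stack([s1, s2])

# Linear mixing with an arbitrary mixing matrix.
A = np.array([[1.0, 0.5],
              [0.4, 1.2]])
X = S @ A.T

# ICA recovers the sources up to scaling and permutation.
ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)
corr = np.abs(np.corrcoef(S.T, S_hat.T))[:2, 2:]
print("abs. correlation between true and recovered sources:\n", corr.round(2))
```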
LADES: a software for constructing and analyzing longitudinal designs in biomedical research.
Vázquez-Alcocer, Alan; Garzón-Cortes, Daniel Ladislao; Sánchez-Casas, Rosa María
2014-01-01
One of the most important steps in biomedical longitudinal studies is choosing a good experimental design that can provide high accuracy in the analysis of results with a minimum sample size. Several methods for constructing efficient longitudinal designs have been developed based on power analysis and the statistical model used for analyzing the final results. However, this technology has not been made available to practitioners through user-friendly software. In this paper we introduce LADES (Longitudinal Analysis and Design of Experiments Software) as an alternative and easy-to-use tool for conducting longitudinal analysis and constructing efficient longitudinal designs. LADES incorporates methods for creating cost-efficient longitudinal designs, unequal longitudinal designs, and simple longitudinal designs. In addition, LADES includes different methods for analyzing longitudinal data, such as linear mixed models and generalized estimating equations, among others. A study of European eels is reanalyzed in order to show the capabilities of LADES. Three treatments contained in three aquariums with five eels each were analyzed. Data were collected from 0 up to the 12th week post treatment for all the eels (complete design). The response under evaluation is sperm volume. A linear mixed model was fitted to the results using LADES. The complete design had a power of 88.7% using 15 eels. With LADES we propose the use of an unequal design with only 14 eels and 89.5% efficiency. LADES was developed as a powerful and simple tool to promote the use of statistical methods for analyzing and creating longitudinal experiments in biomedical research.
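As a rough illustration of one analysis method named above, generalized estimating equations, the sketch below fits a GEE to long-format repeated measurements with statsmodels; the file name and columns (eel_id, week, treatment, sperm_volume) are hypothetical, and this is generic statsmodels usage rather than LADES itself.

import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# one row per eel per week (assumed layout)
df = pd.read_csv("eel_longitudinal.csv")

model = smf.gee(
    "sperm_volume ~ week * C(treatment)",   # time-by-treatment effect
    groups="eel_id",                        # repeated measures within each eel
    data=df,
    cov_struct=sm.cov_struct.Exchangeable(),
    family=sm.families.Gaussian(),
)
print(model.fit().summary())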
Valid statistical approaches for analyzing sholl data: Mixed effects versus simple linear models.
Wilson, Machelle D; Sethi, Sunjay; Lein, Pamela J; Keil, Kimberly P
2017-03-01
The Sholl technique is widely used to quantify dendritic morphology. Data from such studies, which typically sample multiple neurons per animal, are often analyzed using simple linear models. However, simple linear models fail to account for intra-class correlation that occurs with clustered data, which can lead to faulty inferences. Mixed effects models account for intra-class correlation that occurs with clustered data; thus, these models more accurately estimate the standard deviation of the parameter estimate, which produces more accurate p-values. While mixed models are not new, their use in neuroscience has lagged behind their use in other disciplines. A review of the published literature illustrates common mistakes in analyses of Sholl data. Analysis of Sholl data collected from Golgi-stained pyramidal neurons in the hippocampus of male and female mice using both simple linear and mixed effects models demonstrates that the p-values and standard deviations obtained using the simple linear models are biased downwards and lead to erroneous rejection of the null hypothesis in some analyses. The mixed effects approach more accurately models the true variability in the data set, which leads to correct inference. Mixed effects models avoid faulty inference in Sholl analysis of data sampled from multiple neurons per animal by accounting for intra-class correlation. Given the widespread practice in neuroscience of obtaining multiple measurements per subject, there is a critical need to apply mixed effects models more widely. Copyright © 2017 Elsevier B.V. All rights reserved.
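A minimal sketch of the comparison described above, assuming a long-format data set with hypothetical columns (intersections, radius, sex, animal_id), fits the naive simple linear model and a mixed effects model with a random intercept per animal in statsmodels; the mixed model's larger, more honest standard errors are the point of the abstract.

import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("sholl_data.csv")   # one row per neuron per Sholl radius (assumed)

# simple linear model: ignores clustering of neurons within animals
ols_fit = smf.ols("intersections ~ radius * sex", data=df).fit()

# mixed effects model: random intercept per animal absorbs intra-class correlation
lmm_fit = smf.mixedlm("intersections ~ radius * sex",
                      data=df, groups=df["animal_id"]).fit()

print(ols_fit.bse["radius"], lmm_fit.bse["radius"])   # compare standard errors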
Jager, Tjalling
2013-02-05
The individuals of a species are not equal. These differences frustrate experimental biologists and ecotoxicologists who wish to study the response of a species (in general) to a treatment. In the analysis of data, differences between model predictions and observations on individual animals are usually treated as random measurement error around the true response. These deviations, however, are mainly caused by real differences between the individuals (e.g., differences in physiology and in initial conditions). Understanding these intraspecies differences, and accounting for them in the data analysis, will improve our understanding of the response to the treatment we are investigating and allow for a more powerful, less biased, statistical analysis. Here, I explore a basic scheme for statistical inference to estimate parameters governing stress that allows individuals to differ in their basic physiology. This scheme is illustrated using a simple toxicokinetic-toxicodynamic model and a data set for growth of the springtail Folsomia candida exposed to cadmium in food. This article should be seen as proof of concept; a first step in bringing more realism into the statistical inference for process-based models in ecotoxicology.
Applications of Ergodic Theory to Coverage Analysis
NASA Technical Reports Server (NTRS)
Lo, Martin W.
2003-01-01
The study of differential equations, or dynamical systems in general, has two fundamentally different approaches. We are most familiar with the construction of solutions to differential equations. Another approach is to study the statistical behavior of the solutions. Ergodic Theory is one of the most developed methods to study the statistical behavior of the solutions of differential equations. In the theory of satellite orbits, the statistical behavior of the orbits is used to produce 'Coverage Analysis', or how often a spacecraft is in view of a site on the ground. In this paper, we consider the use of Ergodic Theory for Coverage Analysis. This allows us to greatly simplify the computation of quantities such as the total time for which a ground station can see a satellite without ever integrating the trajectory (see Lo 1, 2). Moreover, for any quantity which is an integrable function of the ground track, its average may be computed similarly without integrating the trajectory. For example, the data rate for a simple telecom system is a function of the distance between the satellite and the ground station. We show that such a function may be averaged using the Ergodic Theorem.
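The core idea, replacing a time average along a trajectory with an average over an invariant measure, can be demonstrated with the simplest ergodic system, an irrational rotation of the circle; this toy sketch is only an analogue of the ground-track averaging above, not the coverage computation of the paper.

import numpy as np

# x -> x + alpha (mod 1) with irrational alpha is ergodic for Lebesgue measure,
# so the time average of f along one orbit equals the space average of f.
alpha = np.sqrt(2) - 1
f = lambda x: np.cos(2 * np.pi * x) ** 2

orbit = (np.arange(200_000) * alpha) % 1.0          # x_k = k * alpha mod 1
time_average = f(orbit).mean()
space_average = f(np.linspace(0, 1, 100_001)).mean()  # integral over the circle

print(time_average, space_average)                  # both are close to 0.5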
Statistical primer: how to deal with missing data in scientific research?
Papageorgiou, Grigorios; Grant, Stuart W; Takkenberg, Johanna J M; Mokhles, Mostafa M
2018-05-10
Missing data are a common challenge in research that can compromise the results of statistical inference when not handled appropriately. This paper aims to introduce basic concepts of missing data to a non-statistical audience, list and compare some of the most popular approaches for handling missing data in practice, and provide guidelines and recommendations for dealing with and reporting missing data in scientific research. Complete case analysis and single imputation are simple approaches for handling missing data and are popular in practice; however, in most cases they are not guaranteed to provide valid inferences. Multiple imputation is a robust and general alternative which is appropriate for data missing at random, overcoming the disadvantages of the simpler approaches, but should always be conducted with care. The aforementioned approaches are illustrated and compared in an example application using Cox regression.
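A minimal sketch of multiple imputation for data assumed missing at random, using scikit-learn's IterativeImputer as a MICE-style imputer; the file and column names are hypothetical, and pooling of the analyses (Rubin's rules) is only indicated in the final comment.

import pandas as pd
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer

df = pd.read_csv("cohort.csv")                      # assumed data set
X = df[["age", "bmi", "creatinine", "ef"]]          # numeric covariates with NaNs

imputed_sets = []
for seed in range(5):                               # m = 5 imputations
    imp = IterativeImputer(sample_posterior=True, random_state=seed)
    imputed_sets.append(pd.DataFrame(imp.fit_transform(X), columns=X.columns))

# each completed data set would then be analysed (e.g. by Cox regression)
# and the five sets of estimates pooled with Rubin's rules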
NASA Astrophysics Data System (ADS)
Most, Sebastian; Nowak, Wolfgang; Bijeljic, Branko
2015-04-01
Fickian transport in groundwater flow is the exception rather than the rule. Transport in porous media is frequently simulated via particle methods (i.e. particle tracking random walk (PTRW) or continuous time random walk (CTRW)). These methods formulate transport as a stochastic process of particle position increments. At the pore scale, geometry and micro-heterogeneities prohibit the commonly made assumption of independent and normally distributed increments to represent dispersion. Many recent particle methods seek to loosen this assumption. Hence, it is important to get a better understanding of the processes at the pore scale. For our analysis we track the positions of 10,000 particles migrating through the pore space over time. The data we use come from micro CT scans of a homogeneous sandstone and encompass about 10 grain sizes. Based on those images we discretize the pore structure and simulate flow at the pore scale based on the Navier-Stokes equation. This flow field realistically describes flow inside the pore space and we do not need to add artificial dispersion during the transport simulation. Next, we use particle tracking random walk and simulate pore-scale transport. Finally, we use the obtained particle trajectories to do a multivariate statistical analysis of the particle motion at the pore scale. Our analysis is based on copulas. Every multivariate joint distribution is a combination of its univariate marginal distributions. The copula represents the dependence structure of those univariate marginals and is therefore useful to observe correlation and non-Gaussian interactions (i.e. non-Fickian transport). The first goal of this analysis is to better understand the validity regions of commonly made assumptions. We are investigating three different transport distances: 1) The distance where the statistical dependence between particle increments can be modelled as an order-one Markov process. This would be the Markovian distance for the process, where the validity of yet-unexplored non-Gaussian-but-Markovian random walks starts. 2) The distance where bivariate statistical dependence simplifies to a multi-Gaussian dependence based on simple linear correlation (validity of correlated PTRW/CTRW). 3) The distance of complete statistical independence (validity of classical PTRW/CTRW). The second objective is to reveal the characteristic dependencies influencing transport the most. Those dependencies can be very complex. Copulas are highly capable of representing linear dependence as well as non-linear dependence. With that tool we are able to detect persistent characteristics dominating transport even across different scales. The results derived from our experimental data set suggest that there are many more non-Fickian aspects of pore-scale transport than are captured by the univariate statistics of longitudinal displacements. Non-Fickianity can also be found in transverse displacements, and in the relations between increments at different time steps. Also, the dependence we find is non-linear (i.e. beyond simple correlation) and persists over long distances. Thus, our results strongly support the further refinement of techniques like correlated PTRW or correlated CTRW towards non-linear statistical relations.
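The copula step can be sketched as follows: rank-transform consecutive displacement increments to uniform margins (pseudo-observations) and compare their rank dependence and a crude tail-dependence measure with the ordinary linear correlation; the array name and file are hypothetical, and the 5% tail threshold is an arbitrary illustrative choice.

import numpy as np
from scipy.stats import rankdata, spearmanr, pearsonr

dx = np.load("increments.npy")            # shape (n_particles, n_steps), assumed
n = dx.shape[0]
u1 = rankdata(dx[:, 0]) / (n + 1)         # pseudo-observations of step k
u2 = rankdata(dx[:, 1]) / (n + 1)         # pseudo-observations of step k+1

rho_rank, _ = spearmanr(u1, u2)           # copula-level (rank) dependence
r_linear, _ = pearsonr(dx[:, 0], dx[:, 1])

# joint probability of both increments falling in their slowest 5%,
# relative to the value 0.05 expected under independence
lower_tail = np.mean((u1 < 0.05) & (u2 < 0.05)) / 0.05
print(rho_rank, r_linear, lower_tail)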
Wu, Baolin
2006-02-15
Differential gene expression detection and sample classification using microarray data have received much research interest recently. Owing to the large number of genes p and small number of samples n (p >> n), microarray data analysis poses big challenges for statistical analysis. An obvious problem owing to the 'large p small n' is over-fitting. Just by chance, we are likely to find some non-differentially expressed genes that can classify the samples very well. The idea of shrinkage is to regularize the model parameters to reduce the effects of noise and produce reliable inferences. Shrinkage has been successfully applied in microarray data analysis. The SAM statistics proposed by Tusher et al. and the 'nearest shrunken centroid' proposed by Tibshirani et al. are ad hoc shrinkage methods. Both methods are simple, intuitive and prove to be useful in empirical studies. Recently Wu proposed the penalized t/F-statistics with shrinkage by formally using L1-penalized linear regression models for two-class microarray data, showing good performance. In this paper we systematically discuss the use of penalized regression models for analyzing microarray data. We generalize the two-class penalized t/F-statistics proposed by Wu to multi-class microarray data. We formally derive the ad hoc shrunken centroid used by Tibshirani et al. using L1-penalized regression models. Finally, we show that the penalized linear regression models provide a rigorous and unified statistical framework for sample classification and differential gene expression detection.
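The shrinkage idea can be illustrated with an L1-penalized classifier in the same spirit; this is a generic scikit-learn sketch with hypothetical arrays, not the penalized t/F statistics derived in the paper.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

X = np.load("expression.npy")    # n samples x p genes, p >> n (assumed)
y = np.load("labels.npy")        # two-class sample labels (assumed)

Xs = StandardScaler().fit_transform(X)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)  # strong shrinkage
clf.fit(Xs, y)

selected = np.flatnonzero(clf.coef_[0])   # genes whose coefficients survive shrinkage
print(len(selected), "genes retained out of", X.shape[1])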
NASA Astrophysics Data System (ADS)
Halbrügge, Marc
2010-12-01
This paper describes the creation of a cognitive model submitted to the ‘Dynamic Stocks and Flows’ (DSF) modeling challenge. This challenge aims at comparing computational cognitive models for human behavior during an open ended control task. Participants in the modeling competition were provided with a simulation environment and training data for benchmarking their models while the actual specification of the competition task was withheld. To meet this challenge, the cognitive model described here was designed and optimized for generalizability. Only two simple assumptions about human problem solving were used to explain the empirical findings of the training data. In-depth analysis of the data set prior to the development of the model led to the dismissal of correlations or other parametric statistics as goodness-of-fit indicators. A new statistical measurement based on rank orders and sequence matching techniques is being proposed instead. This measurement, when being applied to the human sample, also identifies clusters of subjects that use different strategies for the task. The acceptability of the fits achieved by the model is verified using permutation tests.
[Practical aspects regarding sample size in clinical research].
Vega Ramos, B; Peraza Yanes, O; Herrera Correa, G; Saldívar Toraya, S
1996-01-01
Knowing the right sample size lets us judge whether the results published in medical papers come from a suitable design and support their conclusions according to the statistical analysis. To estimate the sample size we must consider the type I error, the type II error, the variance, the size of the effect, and the significance and power of the test. To decide which mathematical formula to use, we must define what kind of study we have, that is, whether it is a prevalence study, a study of mean values, or a comparative study. In this paper we explain some basic statistical topics and describe four simple examples of sample size estimation.
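For a comparative study of two means, the textbook formula is n = 2 (z_{1-alpha/2} + z_{1-beta})^2 sigma^2 / delta^2 per group; the sketch below implements it with illustrative numbers (a 5 mmHg difference with standard deviation 12 mmHg is an invented example).

from scipy.stats import norm

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided comparison of two means."""
    z_a = norm.ppf(1 - alpha / 2)
    z_b = norm.ppf(power)
    return 2 * ((z_a + z_b) * sigma / delta) ** 2

print(n_per_group(delta=5, sigma=12))   # roughly 90 subjects per group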
Entropy in sound and vibration: towards a new paradigm
2017-01-01
This paper discusses the method and status of a statistical theory of sound and vibration, called statistical energy analysis (SEA). SEA is a simple theory of sound and vibration in elastic structures that applies when the vibrational energy is diffusely distributed. We show that SEA is a thermodynamical theory of sound and vibration, based on a law of exchange of energy analogous to the Clausius principle. We further investigate the notion of entropy in this context and discuss its meaning. We show that entropy is a measure of the information lost in the passage from the classical theory of sound and vibration to SEA, its thermodynamical counterpart. PMID:28265190
Data series embedding and scale invariant statistics.
Michieli, I; Medved, B; Ristov, S
2010-06-01
Data sequences acquired from bio-systems such as human gait data, heart rate interbeat data, or DNA sequences exhibit complex dynamics that is frequently described by a long-memory or power-law decay of the autocorrelation function. One way of characterizing that dynamics is through scale invariant statistics or "fractal-like" behavior. Several methods have been proposed for quantifying scale invariant parameters of physiological signals. Among them the most common are detrended fluctuation analysis, sample mean variance analyses, power spectral density analysis, R/S analysis, and, recently in the realm of the multifractal approach, wavelet analysis. In this paper it is demonstrated that embedding the time series data in a high-dimensional pseudo-phase space reveals scale invariant statistics in a simple fashion. The procedure is applied to different stride-interval data sets from human gait measurement time series (PhysioBank data library). Results show that the introduced mapping adequately separates long-memory from random behavior. Smaller gait data sets were analyzed and scale-free trends over limited scale intervals were successfully detected. The method was verified on artificially produced time series with known scaling behavior and varying noise content. The possibility of the method falsely detecting long-range dependence in artificially generated short-range-dependence series was also investigated. (c) 2009 Elsevier B.V. All rights reserved.
TableViewer for Herschel Data Processing
NASA Astrophysics Data System (ADS)
Zhang, L.; Schulz, B.
2006-07-01
The TableViewer utility is a GUI tool written in Java to support interactive data processing and analysis for the Herschel Space Observatory (Pilbratt et al. 2001). The idea was inherited from a prototype written in IDL (Schulz et al. 2005). It allows the user to graphically view and analyze tabular data organized in columns with equal numbers of rows. It can be run either as a standalone application, where data access is restricted to FITS (FITS 1999) files only, or from the Quick Look Analysis (QLA) or Interactive Analysis (IA) command line, from where objects are also accessible. The graphic display is very versatile, allowing plots in either linear or log scales. Zooming, panning, and changing data columns are performed rapidly using a group of navigation buttons. Selecting and de-selecting fields of data points controls the input to simple analysis tasks such as building a statistics table or generating power spectra. The binary data stored in a TableDataset, a Product, or in FITS files can also be displayed as tabular data, where values in individual cells can be modified. TableViewer provides several processing utilities which, besides calculating statistics either for all or for selected channels and computing power spectra, allow the user to convert or repair datasets by changing the unit name of data columns and by modifying data values in columns with a simple calculator tool. Interactively selected data can be separated out, and modified data sets can be saved to FITS files. The tool will be very helpful especially in the early phases of Herschel data analysis, when quick access to the contents of data products is important. TableDataset and Product are Java classes defined in herschel.ia.dataset.
Biometrical issues in the analysis of adverse events within the benefit assessment of drugs.
Bender, Ralf; Beckmann, Lars; Lange, Stefan
2016-07-01
The analysis of adverse events plays an important role in the benefit assessment of drugs. Consequently, results on adverse events are an integral part of reimbursement dossiers submitted by pharmaceutical companies to health policy decision-makers. Methods applied in the analysis of adverse events commonly include simple standard methods for contingency tables. However, the results produced may be misleading if observations are censored at the time of discontinuation due to treatment switching or noncompliance, resulting in unequal follow-up periods. In this paper, we present examples to show that the application of inadequate methods for the analysis of adverse events in the reimbursement dossier can lead to a downgrading of the evidence on a drug's benefit in the subsequent assessment, as greater harm from the drug cannot be excluded with sufficient certainty. Legal regulations on the benefit assessment of drugs in Germany are presented, in particular, with regard to the analysis of adverse events. Differences in safety considerations between the drug approval process and the benefit assessment are discussed. We show that the naive application of simple proportions in reimbursement dossiers frequently leads to uninterpretable results if observations are censored and the average follow-up periods differ between treatment groups. Likewise, the application of incidence rates may be misleading in the case of recurrent events and unequal follow-up periods. To allow for an appropriate benefit assessment of drugs, adequate survival time methods accounting for time dependencies and duration of follow-up are required, not only for time-to-event efficacy endpoints but also for adverse events. © 2016 The Authors. Pharmaceutical Statistics published by John Wiley & Sons Ltd.
Frøslie, Kathrine Frey; Røislien, Jo; Qvigstad, Elisabeth; Godang, Kristin; Bollerslev, Jens; Voldner, Nanna; Henriksen, Tore; Veierød, Marit B
2013-01-17
Plasma glucose levels are important measures in medical care and research, and are often obtained from oral glucose tolerance tests (OGTT) with repeated measurements over 2-3 hours. It is common practice to use simple summary measures of OGTT curves. However, different OGTT curves can yield similar summary measures, and information of physiological or clinical interest may be lost. Our main aim was to extract information inherent in the shape of OGTT glucose curves, compare it with the information from simple summary measures, and explore the clinical usefulness of such information. OGTTs with five glucose measurements over two hours were recorded for 974 healthy pregnant women in their first trimester. For each woman, the five measurements were transformed into smooth OGTT glucose curves by functional data analysis (FDA), a collection of statistical methods developed specifically to analyse curve data. The essential modes of temporal variation between OGTT glucose curves were extracted by functional principal component analysis. The resultant functional principal component (FPC) scores were compared with commonly used simple summary measures: fasting and two-hour (2-h) values, area under the curve (AUC) and simple shape index (2-h minus 90-min values, or 90-min minus 60-min values). Clinical usefulness of FDA was explored by regression analyses of glucose tolerance later in pregnancy. Over 99% of the variation between individually fitted curves was expressed in the first three FPCs, interpreted physiologically as "general level" (FPC1), "time to peak" (FPC2) and "oscillations" (FPC3). FPC1 scores correlated strongly with AUC (r=0.999), but less with the other simple summary measures (-0.42≤r≤0.79). FPC2 scores gave shape information not captured by simple summary measures (-0.12≤r≤0.40). FPC2 scores, but not FPC1 nor the simple summary measures, discriminated between women who did and did not develop gestational diabetes later in pregnancy. FDA of OGTT glucose curves in early pregnancy extracted shape information that was not identified by commonly used simple summary measures. This information discriminated between women with and without gestational diabetes later in pregnancy.
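Since the paper's FPCA operates on smoothed curves, the discretized analogue below (ordinary principal component analysis of the centered matrix of sampled glucose values) is only a sketch of the computation behind the FPC scores; the array and its layout are assumed.

import numpy as np

G = np.load("ogtt_glucose.npy")   # rows: women, columns: glucose at 0, 30, 60, 90, 120 min (assumed)

Gc = G - G.mean(axis=0)           # centre each time point
U, s, Vt = np.linalg.svd(Gc, full_matrices=False)

explained = s**2 / np.sum(s**2)   # variance share of each component
scores = Gc @ Vt.T                # per-woman FPC1, FPC2, ... scores

print(np.round(explained[:3], 3)) # first three components dominate, as in the study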
The Zombie Plot: A Simple Graphic Method for Visualizing the Efficacy of a Diagnostic Test.
Richardson, Michael L
2016-08-09
One of the most important jobs of a radiologist is to pick the most appropriate imaging test for a particular clinical situation. Making a proper selection sometimes requires statistical analysis. The objective of this article is to introduce a simple graphic technique, an ROC plot that has been divided into zones of mostly bad imaging efficacy (ZOMBIE, hereafter referred to as the "zombie plot"), that transforms information about imaging efficacy from the numeric domain into the visual domain. The numeric rationale for the use of zombie plots is given, as are several examples of the clinical use of these plots. Two online calculators are described that simplify the process of producing a zombie plot.
Skop-Lewandowska, Agata; Zając, Joanna; Kolarzyk, Emilia
2017-12-23
Overweight and obesity are among the alarming and constantly increasing problems of the 21st century in all age groups. One of the major factors contributing to these problems is simple carbohydrates, commonly found in popular sweet drinks. The aim of the study was to assess the nutritional patterns of elderly people with diagnosed cardiovascular diseases and to analyse the relationship between consumption of simple carbohydrates and the prevalence of overweight and obesity. From 233 individuals hospitalized in the Clinic of Cardiology and Hypertension in Krakow, Poland, a group of 128 elderly people was selected (66 women and 62 men). Actual food consumption for each individual was assessed using a 24-hour nutrition recall, and BMI was calculated to assess nutritional status. Statistical analysis was performed on two groups: one with BMI <25 kg/m2 and the other with BMI ≥25 kg/m2. Overweight was found in 33.8% of women and 50% of men, and obesity in 27.7% of women and 17.7% of men. Results indicated that consumption of products rich in sucrose was associated with overweight and obesity: overweight and obese participants ate significantly more sweet products than those with proper weight (46.2 g vs 33.8 g). Given the growing world-wide epidemic of overweight and obesity, changing eating patterns remains one of the main priorities of preventive medicine. Overweight or obesity was found in about 60% of the examined elderly people, and each additional spoon of sugar (5 g) consumed daily was associated with an increase of about 14% in the risk of being overweight or obese.
Impact of distributions on the archetypes and prototypes in heterogeneous nanoparticle ensembles.
Fernandez, Michael; Wilson, Hugh F; Barnard, Amanda S
2017-01-05
The magnitude and complexity of the structural and functional data available on nanomaterials require data analytics, statistical analysis and information technology to drive discovery. We demonstrate that multivariate statistical analysis can recognise the sets of truly significant nanostructures and their most relevant properties in heterogeneous ensembles with different probability distributions. The prototypical and archetypal nanostructures of five virtual ensembles of Si quantum dots (SiQDs) with Boltzmann, frequency, normal, Poisson and random distributions are identified using clustering and archetypal analysis, where we find that their diversity is defined by size and shape, regardless of the type of distribution. At the convex hull of the SiQD ensembles, simple configuration archetypes can efficiently describe a large number of SiQDs, whereas more complex shapes are needed to represent the average ordering of the ensembles. This approach provides a route towards the characterisation of computationally intractable virtual nanomaterial spaces, which can convert big data into smart data, and significantly reduce the workload to simulate experimentally relevant virtual samples.
Statistical sensitivity analysis of a simple nuclear waste repository model
NASA Astrophysics Data System (ADS)
Ronen, Y.; Lucius, J. L.; Blow, E. M.
1980-06-01
This work is a preliminary step in a comprehensive sensitivity analysis of the modeling of a nuclear waste repository. The purpose of the complete analysis is to determine which modeling parameters and physical data are most important in determining key design performance criteria and then to obtain the uncertainty in the design for safety considerations. The theory for a statistical screening design methodology is developed for later use in the overall program. The theory was applied to the test case of determining the relative importance of the sensitivity of the near-field temperature distribution in a single-level salt repository to modeling parameters. The exact values of the sensitivities to these physical and modeling parameters were then obtained using direct methods of recalculation. The sensitivity coefficients found to be important for the sample problem were the thermal loading, the distance between the spent fuel canisters, and the canister radius. Other important parameters were those related to salt properties at a point of interest in the repository.
[Effect of somatostatin-14 in simple mechanical obstruction of the small intestine].
Jimenez-Garcia, A; Ahmad Araji, O; Balongo Garcia, R; Nogales Munoz, A; Salguero Villadiego, M; Cantillana Martinez, J
1994-02-01
In order to investigate the properties of somatostatin-14 we studied an experimental model of simple mechanical and closed loop occlusion. Forty-eight New Zealand rabbits were assigned randomly to three groups of 16: group C (controls) was operated and treated with saline solution (4 cc/Kg/h); group A was operated and initially treated with saline solution and an equal dose of somatostatin-14 (3.5 micrograms/Kg/h); and group B was operated and treated in the same manner as group A, but later, 8 hours after the laparotomy. The animals were sacrificed 24 hours later; intestinal secretion was quantified, blood and intestinal fluid chemistries were performed and specimens of the intestine were prepared for histological examination. Descriptive statistical analysis of the results was performed with the ANOVA, a semi-quantitative test and the covariance test. Somatostatin-14 produced an improvement in the volume of intestinal secretion in the treated groups compared with the control group. The results were statistically significant in group B, treated after an 8-hour delay: closed loop (ml): 6.40 +/- 1.12, 2.50 +/- 0.94, 1.85 +/- 0.83 and simple mechanical occlusion (ml): 175 +/- 33.05, 89.50 +/- 9.27, 57.18 +/- 21.23, p < 0.01, for groups C, A and B, respectively. Net secretion of Cl and Na ions was also improved, p < 0.01. (ABSTRACT TRUNCATED AT 250 WORDS)
NASA Astrophysics Data System (ADS)
Pollard, D.; Chang, W.; Haran, M.; Applegate, P.; DeConto, R.
2015-11-01
A 3-D hybrid ice-sheet model is applied to the last deglacial retreat of the West Antarctic Ice Sheet over the last ~ 20 000 years. A large ensemble of 625 model runs is used to calibrate the model to modern and geologic data, including reconstructed grounding lines, relative sea-level records, elevation-age data and uplift rates, with an aggregate score computed for each run that measures overall model-data misfit. Two types of statistical methods are used to analyze the large-ensemble results: simple averaging weighted by the aggregate score, and more advanced Bayesian techniques involving Gaussian process-based emulation and calibration, and Markov chain Monte Carlo. Results for best-fit parameter ranges and envelopes of equivalent sea-level rise with the simple averaging method agree quite well with the more advanced techniques, but only for a large ensemble with full factorial parameter sampling. Best-fit parameter ranges confirm earlier values expected from prior model tuning, including large basal sliding coefficients on modern ocean beds. Each run is extended 5000 years into the "future" with idealized ramped climate warming. In the majority of runs with reasonable scores, this produces grounding-line retreat deep into the West Antarctic interior, and the analysis provides sea-level-rise envelopes with well defined parametric uncertainty bounds.
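The simple score-weighted averaging can be sketched as follows; the conversion from aggregate misfit to weight (an exponential of the scaled misfit) is an assumption for illustration, since the paper's exact scoring rule is not reproduced here.

import numpy as np

misfit = np.load("misfit.npy")            # aggregate model-data misfit per run (assumed)
slr = np.load("sea_level_rise.npy")       # equivalent sea-level rise per run (assumed)

w = np.exp(-0.5 * misfit / misfit.min())  # assumed likelihood-like weighting
w /= w.sum()

mean_slr = np.sum(w * slr)

# weighted 5-95% envelope from the weighted empirical CDF
order = np.argsort(slr)
cdf = np.cumsum(w[order])
lo, hi = np.interp([0.05, 0.95], cdf, slr[order])
print(mean_slr, (lo, hi))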
In defence of model-based inference in phylogeography
Beaumont, Mark A.; Nielsen, Rasmus; Robert, Christian; Hey, Jody; Gaggiotti, Oscar; Knowles, Lacey; Estoup, Arnaud; Panchal, Mahesh; Corander, Jukka; Hickerson, Mike; Sisson, Scott A.; Fagundes, Nelson; Chikhi, Lounès; Beerli, Peter; Vitalis, Renaud; Cornuet, Jean-Marie; Huelsenbeck, John; Foll, Matthieu; Yang, Ziheng; Rousset, Francois; Balding, David; Excoffier, Laurent
2017-01-01
Recent papers have promoted the view that model-based methods in general, and those based on Approximate Bayesian Computation (ABC) in particular, are flawed in a number of ways, and are therefore inappropriate for the analysis of phylogeographic data. These papers further argue that Nested Clade Phylogeographic Analysis (NCPA) offers the best approach in statistical phylogeography. In order to remove the confusion and misconceptions introduced by these papers, we justify and explain the reasoning behind model-based inference. We argue that ABC is a statistically valid approach, alongside other computational statistical techniques that have been successfully used to infer parameters and compare models in population genetics. We also examine the NCPA method and highlight numerous deficiencies, whether it is used with single or multiple loci. We further show that the ages of clades are carelessly used to infer ages of demographic events, that these ages are estimated under a simple model of panmixia and population stationarity but are then used under different and unspecified models to test hypotheses, a usage that invalidates these testing procedures. We conclude by encouraging researchers to study and use model-based inference in population genetics. PMID:29284924
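The logic of ABC that the authors defend can be shown with a toy rejection sampler: draw parameters from the prior, simulate data, and keep draws whose summary statistic lands within a tolerance of the observed one. The Poisson "model" and the tolerance below are placeholders for a coalescent simulator and problem-specific summaries.

import numpy as np

rng = np.random.default_rng(1)

def simulate(theta, n=100):
    # stand-in for a population-genetic simulator
    return rng.poisson(theta, size=n)

observed = simulate(4.0)              # pretend field data
s_obs = observed.mean()               # summary statistic

prior_draws = rng.uniform(0.0, 10.0, size=50_000)
accepted = [t for t in prior_draws
            if abs(simulate(t).mean() - s_obs) < 0.2]   # rejection step

posterior = np.array(accepted)
print(posterior.mean(), np.percentile(posterior, [2.5, 97.5]))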
Eutrophication risk assessment in coastal embayments using simple statistical models.
Arhonditsis, G; Eleftheriadou, M; Karydis, M; Tsirtsis, G
2003-09-01
A statistical methodology is proposed for assessing the risk of eutrophication in marine coastal embayments. The procedure followed was the development of regression models relating the levels of chlorophyll a (Chl) with the concentration of the limiting nutrient--usually nitrogen--and the renewal rate of the systems. The method was applied in the Gulf of Gera, Island of Lesvos, Aegean Sea and a surrogate for renewal rate was created using the Canberra metric as a measure of the resemblance between the Gulf and the oligotrophic waters of the open sea in terms of their physical, chemical and biological properties. The Chl-total dissolved nitrogen-renewal rate regression model was the most significant, accounting for 60% of the variation observed in Chl. Predicted distributions of Chl for various combinations of the independent variables, based on Bayesian analysis of the models, enabled comparison of the outcomes of specific scenarios of interest as well as further analysis of the system dynamics. The present statistical approach can be used as a methodological tool for testing the resilience of coastal ecosystems under alternative managerial schemes and levels of exogenous nutrient loading.
Applied statistics in ecology: common pitfalls and simple solutions
E. Ashley Steel; Maureen C. Kennedy; Patrick G. Cunningham; John S. Stanovick
2013-01-01
The most common statistical pitfalls in ecological research are those associated with data exploration, the logic of sampling and design, and the interpretation of statistical results. Although one can find published errors in calculations, the majority of statistical pitfalls result from incorrect logic or interpretation despite correct numerical calculations. There...
Information categorization approach to literary authorship disputes
NASA Astrophysics Data System (ADS)
Yang, Albert C.-C.; Peng, C.-K.; Yien, H.-W.; Goldberger, Ary L.
2003-11-01
Scientific analysis of the linguistic styles of different authors has generated considerable interest. We present a generic approach to measuring the similarity of two symbolic sequences that requires minimal background knowledge about a given human language. Our analysis is based on word rank order-frequency statistics and phylogenetic tree construction. We demonstrate the applicability of this method to historic authorship questions related to the classic Chinese novel “The Dream of the Red Chamber,” to the plays of William Shakespeare, and to the Federalist papers. This method may also provide a simple approach to other large databases based on their information content.
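The word rank order-frequency statistics at the heart of the method can be computed in a few lines; the distance used here (one minus the Spearman correlation of the ranks of shared words) is an illustrative choice and the file names are hypothetical, so this is not the authors' exact metric or the phylogenetic-tree step.

import re
from collections import Counter
from scipy.stats import spearmanr

def rank_map(text):
    # map each word to its frequency rank (1 = most frequent)
    words = re.findall(r"[a-z']+", text.lower())
    return {w: r for r, (w, _) in enumerate(Counter(words).most_common(), start=1)}

def rank_distance(text_a, text_b):
    ra, rb = rank_map(text_a), rank_map(text_b)
    shared = sorted(set(ra) & set(rb))
    rho, _ = spearmanr([ra[w] for w in shared], [rb[w] for w in shared])
    return 1.0 - rho                  # smaller means more similar rank ordering

print(rank_distance(open("author_A.txt").read(), open("author_B.txt").read()))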
The Statistics of wood assays for preservative retention
Patricia K. Lebow; Scott W. Conklin
2011-01-01
This paper covers general statistical concepts that apply to interpreting wood assay retention values. In particular, since wood assays are typically obtained from a single composited sample, the statistical aspects, including advantages and disadvantages, of simple compositing are covered.
A Methodology to Separate and Analyze a Seismic Wide Angle Profile
NASA Astrophysics Data System (ADS)
Weinzierl, Wolfgang; Kopp, Heidrun
2010-05-01
General solutions of inverse problems can often be obtained through the introduction of probability distributions to sample the model space. We present a simple approach to defining an a priori space in a tomographic study and retrieve the velocity-depth posterior distribution by a Monte Carlo method. Utilizing a fitting routine designed for very low statistics to set up and analyze the obtained tomography results, it is possible to statistically separate the velocity-depth model space derived from the inversion of seismic refraction data. An example of a profile acquired in the Lesser Antilles subduction zone reveals the effectiveness of this approach. The resolution analysis of the structural heterogeneity includes a divergence analysis which proves to be capable of dissecting long wide-angle profiles for deep crust and upper mantle studies. The complete information of any parameterised physical system is contained in the a posteriori distribution. Methods for analyzing and displaying key properties of the a posteriori distributions of highly nonlinear inverse problems are therefore essential in the scope of any interpretation. From this study we infer several conclusions concerning the interpretation of the tomographic approach. By calculating global as well as singular misfits of velocities, we are able to map different geological units along a profile. Comparing velocity distributions with the result of a tomographic inversion along the profile, we can mimic the subsurface structures in their extent and composition. The possibility of gaining a priori information for seismic refraction analysis by a simple solution to an inverse problem, with subsequent resolution of structural heterogeneities through a divergence analysis, is a new and simple way of defining the a priori space and estimating the a posteriori mean and covariance in singular and general form. The major advantage of a Monte Carlo based approach in our case study is the obtained knowledge of velocity-depth distributions. Certainly the decision of where to extract velocity information on the profile for setting up a Monte Carlo ensemble limits the a priori space. However, the general conclusion of analyzing the velocity field according to distinct reference distributions gives us the possibility to define the covariance according to any geological unit if we have a priori information on the velocity-depth distributions. Using the wide angle data recorded across the Lesser Antilles arc, we are able to resolve a shallow feature like the backstop by a robust and simple divergence analysis. We demonstrate the effectiveness of the new methodology to extract some key features and properties from the inversion results by including information concerning the confidence level of the results.
Osterberg, T; Norinder, U
2001-01-01
A method of modelling and predicting biopharmaceutical properties using simple theoretically computed molecular descriptors and multivariate statistics has been investigated for several data sets related to solubility, IAM chromatography, permeability across Caco-2 cell monolayers, human intestinal perfusion, brain-blood partitioning, and P-glycoprotein ATPase activity. The molecular descriptors (e.g. molar refractivity, molar volume, index of refraction, surface tension and density) and logP were computed with ACD/ChemSketch and ACD/logP, respectively. Good statistical models were derived that permit simple computational prediction of biopharmaceutical properties. All final models derived had R² values ranging from 0.73 to 0.95 and Q² values ranging from 0.69 to 0.86. The RMSEP values for the external test sets ranged from 0.24 to 0.85 (log scale).
Statistical Analysis of Warfare: Identification of Winning Factors with a Focus on Irregular Warfare
2015-09-01
[Fragmented excerpt: list-of-acronyms entries (HERO, Historical Evaluation and Research Organization; IDPs, Internally Displaced Persons; IFR, Initial Force Ratio; INITA, Relative Imitative ...) and a passage noting that armies increasingly combat non-state, widely dispersed groups, a change that appears simple but has had a deep impact on military operations; the remainder is footnote residue.]
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chertkov, Michael; Turitsyn, Konstantin; Sulc, Petr
The anticipated increase in the number of plug-in electric vehicles (EV) will put additional strain on electrical distribution circuits. Many control schemes have been proposed to control EV charging. Here, we develop control algorithms based on randomized EV charging start times and simple one-way broadcast communication allowing for a time delay between communication events. Using arguments from queuing theory and statistical analysis, we seek to maximize the utilization of excess distribution circuit capacity while keeping the probability of a circuit overload negligible.
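A Monte Carlo sketch of the basic question, how likely randomized start times are to overload the circuit, is given below; the fleet size, circuit capacity, charging duration and overnight window are invented parameters, and the queueing-theoretic analysis of the paper is not reproduced.

import numpy as np

rng = np.random.default_rng(0)

n_ev = 80            # EVs sharing the circuit (assumed)
capacity = 60        # simultaneous sessions the circuit tolerates (assumed)
charge_hours = 4.0   # charging duration per EV (assumed)
window = 10.0        # overnight window over which starts are randomized (assumed)

def overload_probability(trials=5_000):
    overloads = 0
    for _ in range(trials):
        starts = rng.uniform(0.0, window - charge_hours, size=n_ev)
        # concurrency can only increase at a start time, so checking the
        # number of active sessions at each start captures the peak load
        active = ((starts[:, None] <= starts[None, :]) &
                  (starts[None, :] < starts[:, None] + charge_hours)).sum(axis=0)
        overloads += active.max() > capacity
    return overloads / trials

print(overload_probability())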
2009-01-01
[Fragmented excerpt: describes reducing an object representation to a simple curve in 3D via the Whitney embedding theorem, combining two processing phases, an elimination principle exploiting the designed parametrization, and a discrimination step based on packing numbers and principal curves; the remainder is reference-list residue (IEEE Transactions on Pattern Analysis and Machine Intelligence 22(3):281-297, 2000; M. H. Yang).]
Computerized EEG analysis for studying the effect of drugs on the central nervous system.
Rosadini, G; Cavazza, B; Rodriguez, G; Sannita, W G; Siccardi, A
1977-11-01
Samples of our experience in quantitative pharmaco-EEG are reviewed to discuss and define its applicability and limits. Simple processing systems, such as the computation of Hjorth's descriptors, are useful for on-line monitoring of drug-induced EEG modifications which are also evident on visual analysis. Power spectral analysis is suitable to identify and quantify EEG effects not evident on visual inspection. It demonstrated how the EEG effects of compounds in a long-acting formulation vary according to the sampling time and the explored cerebral area. EEG modifications not detected by power spectral analysis can be defined by comparing statistically (F test) the spectral values of the EEG from a single lead at the different samples (longitudinal comparison), or the spectral values from different leads at any sample (intrahemispheric comparison). The presently available procedures of quantitative pharmaco-EEG are effective when applied to the study of multilead EEG recordings in a statistically adequate population sample. They do not seem reliable for monitoring or directing neuropsychiatric therapies in single patients, due to individual variability of drug effects.
Neurophysiological correlates of depressive symptoms in young adults: A quantitative EEG study.
Lee, Poh Foong; Kan, Donica Pei Xin; Croarkin, Paul; Phang, Cheng Kar; Doruk, Deniz
2018-01-01
There is an unmet need for practical and reliable biomarkers for mood disorders in young adults. Identifying the brain activity associated with the early signs of depressive disorders could have important diagnostic and therapeutic implications. In this study we sought to investigate the EEG characteristics of young adults with newly identified depressive symptoms. Based on the initial screening, a total of 100 participants (n = 50 euthymic, n = 50 depressive) underwent 32-channel EEG acquisition. Simple logistic regression and the C-statistic were used to explore whether EEG power could discriminate between the groups, and the strongest EEG predictors of mood were then assessed using multivariate logistic regression models. Simple logistic regression analysis with subsequent C-statistics revealed that only high-alpha and beta power originating from the left central cortex (C3) have a reliable discriminative value (area under the ROC curve >0.7) for differentiating the depressive group from the euthymic group. Multivariate regression analysis showed that the single most significant predictor of group (depressive vs. euthymic) is the high-alpha power over C3 (p = 0.03). The present findings suggest that EEG is a useful tool in the identification of neurophysiological correlates of depressive symptoms in young adults with no previous psychiatric history. Our results could guide future studies investigating the early neurophysiological changes and surrogate outcomes in depression. Copyright © 2017 Elsevier Ltd. All rights reserved.
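The screening described, simple logistic regression per predictor followed by the C-statistic (area under the ROC curve), can be sketched with scikit-learn; the file and band-power column names are hypothetical.

import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

df = pd.read_csv("eeg_band_power.csv")     # one row per participant (assumed)
y = df["depressive"]                       # 1 = depressive, 0 = euthymic (assumed)

# univariate screening: C-statistic for each candidate band-power predictor
for col in ["high_alpha_C3", "beta_C3", "theta_Fz"]:
    p = LogisticRegression().fit(df[[col]], y).predict_proba(df[[col]])[:, 1]
    print(col, "C-statistic:", round(roc_auc_score(y, p), 2))

# multivariate model over the predictors retained by the screening
multi = LogisticRegression().fit(df[["high_alpha_C3", "beta_C3"]], y)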
Evaluating surrogate endpoints, prognostic markers, and predictive markers: Some simple themes.
Baker, Stuart G; Kramer, Barnett S
2015-08-01
A surrogate endpoint is an endpoint observed earlier than the true endpoint (a health outcome) that is used to draw conclusions about the effect of treatment on the unobserved true endpoint. A prognostic marker is a marker for predicting the risk of an event given a control treatment; it informs treatment decisions when there is information on anticipated benefits and harms of a new treatment applied to persons at high risk. A predictive marker is a marker for predicting the effect of treatment on outcome in a subgroup of patients or study participants; it provides more rigorous information for treatment selection than a prognostic marker when it is based on estimated treatment effects in a randomized trial. We organized our discussion around a different theme for each topic. "Fundamentally an extrapolation" refers to the non-statistical considerations and assumptions needed when using surrogate endpoints to evaluate a new treatment. "Decision analysis to the rescue" refers to the use of decision analysis to evaluate an additional prognostic marker because it is not possible to choose between purely statistical measures of marker performance. "The appeal of simplicity" refers to a straightforward and efficient use of a single randomized trial to evaluate overall treatment effect and treatment effect within subgroups using predictive markers. The simple themes provide a general guideline for evaluation of surrogate endpoints, prognostic markers, and predictive markers. © The Author(s) 2014.
The significance of organ prolapse in gastroschisis.
Koehler, Shannon M; Szabo, Aniko; Loichinger, Matt; Peterson, Erika; Christensen, Melissa; Wagner, Amy J
2017-12-01
The aim of this study was to evaluate the incidence and importance of organ prolapse (stomach, bladder, reproductive organs) in gastroschisis. This is a retrospective review of gastroschisis patients from 2000 to 2014 at a single tertiary institution. Statistical analysis was performed using a chi-square test, Student's t test, log-rank test, or Cox regression analysis models. All tests were conducted as two-tailed tests, and p-values <0.05 were considered statistically significant. One hundred seventy-one gastroschisis patients were identified. Sixty-nine (40.6%) had at least one prolapsed organ besides bowel. The most commonly prolapsed organs were stomach (n=45, 26.3%), reproductive organs (n=34, 19.9%), and bladder (n=15, 8.8%). Patients with prolapsed organs were more likely to have simple gastroschisis with significant decreases in the rate of atresia and necrosis/perforation. They progressed to earlier enteral feeds, discontinuation of parenteral nutrition, and discharge. Likewise, these patients were less likely to have complications such as central line infections, sepsis, and short gut syndrome. Gastroschisis is typically described as isolated bowel herniation, but a large portion have prolapse of other organs. Prolapsed organs are associated with simple gastroschisis, and improved outcomes most likely due to a larger fascial defect. This may be useful for prenatal and postnatal counseling of families. Case Control/Retrospective Comparative Study. Level III. Copyright © 2017 Elsevier Inc. All rights reserved.
Statistical tools for transgene copy number estimation based on real-time PCR.
Yuan, Joshua S; Burris, Jason; Stewart, Nathan R; Mentewab, Ayalew; Stewart, C Neal
2007-11-01
As compared with traditional transgene copy number detection technologies such as Southern blot analysis, real-time PCR provides a fast, inexpensive and high-throughput alternative. However, real-time PCR based transgene copy number estimation tends to be ambiguous and subjective, stemming from the lack of proper statistical analysis and data quality control to render a reliable estimation of copy number with a prediction value. Despite recent progress in the statistical analysis of real-time PCR, few publications have integrated these advancements in real-time PCR based transgene copy number determination. Three experimental designs and four data quality control integrated statistical models are presented. For the first method, external calibration curves are established for the transgene based on serially-diluted templates. The Ct number from a control transgenic event and a putative transgenic event are compared to derive the transgene copy number or zygosity estimation. Simple linear regression and two-group T-test procedures were combined to model the data from this design. For the second experimental design, standard curves were generated for both an internal reference gene and the transgene, and the copy number of the transgene was compared with that of the internal reference gene. Multiple regression models and ANOVA models can be employed to analyze the data and perform quality control for this approach. In the third experimental design, transgene copy number is compared with the reference gene without a standard curve, but rather directly on the basis of fluorescence data. Two different multiple regression models were proposed to analyze the data based on two different approaches of amplification efficiency integration. Our results highlight the importance of proper statistical treatment and quality control integration in real-time PCR-based transgene copy number determination. These statistical methods allow real-time PCR-based transgene copy number estimation to be more reliable and precise. Proper confidence intervals are necessary for unambiguous prediction of transgene copy number. The four different statistical methods are compared for their advantages and disadvantages. Moreover, the statistical methods can also be applied to other real-time PCR-based quantification assays, including transfection efficiency analysis and pathogen quantification.
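The first design, an external calibration curve with comparison of Ct values between a control event and a putative event, reduces to a linear regression of Ct on log10 template amount; the dilution series and Ct values below are invented for illustration, and the confidence-interval step discussed in the abstract is omitted.

import numpy as np
from scipy import stats

log10_copies = np.array([3.0, 4.0, 5.0, 6.0, 7.0])      # serial dilution (assumed)
ct_standard = np.array([33.1, 29.6, 26.2, 22.9, 19.5])  # measured Ct values (assumed)

fit = stats.linregress(log10_copies, ct_standard)
efficiency = 10 ** (-1.0 / fit.slope) - 1               # amplification efficiency
print("slope", round(fit.slope, 2), "efficiency", round(efficiency, 2))

def copies_from_ct(ct):
    # interpolate the copy number of an unknown sample from the standard curve
    return 10 ** ((ct - fit.intercept) / fit.slope)

# copy number of a putative event relative to a single-copy control event
ratio = copies_from_ct(24.8) / copies_from_ct(25.9)
print("copy number relative to control:", round(ratio, 1))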
Ghandeharioun, H; Rezaeitalab, F; Lotfi, R
2016-01-01
This study carefully evaluates the association of different respiration-related events with each other and with simple nocturnal features in obstructive sleep apnea-hypopnea syndrome (OSAS). The events include apneas, hypopneas, respiratory event-related arousals and snores. We conducted a statistical study on 158 adults who underwent polysomnography between July 2012 and May 2014. To assess relevance, the non-linear method of mutual information is applied alongside linear statistical strategies such as analysis of variance and bootstrapping of the correlation coefficient standard error, in order to clarify ambiguous results of the linear techniques. Based on normalized mutual information weights (NMIW), indices of apnea are 1.3 times more relevant to AHI values than those of hypopnea. NMIW for the number of blood oxygen desaturations below 95% is considerable (0.531). The next most relevant feature is the "respiratory arousals index", with an NMIW of 0.501. Snore indices (0.314) and BMI (0.203) follow. Based on NMIW values, snoring events are nearly one-third (29.9%) more dependent on hypopneas than on RERAs. 1. The more severe the OSAS, the more frequently apneic events happen. 2. An association of snoring with hypopneas/RERAs is revealed that is routinely ignored in regression-based OSAS modeling. 3. The statistical dependencies of oximetry features can potentially lead to home-based screening of OSAS. 4. Poor ESS-AHI relevance in the database under study indicates its unsuitability for OSA diagnosis compared with oximetry. 5. Based on poor RERA-snore/ESS relevance, a detailed history of the symptoms plus polysomnography is suggested for accurate diagnosis of RERAs. Copyright © 2015 Sociedade Portuguesa de Pneumologia. Published by Elsevier España, S.L.U. All rights reserved.
Statistical assessment of the learning curves of health technologies.
Ramsay, C R; Grant, A M; Wallace, S A; Garthwaite, P H; Monk, A F; Russell, I T
2001-01-01
(1) To describe systematically studies that directly assessed the learning curve effect of health technologies. (2) Systematically to identify 'novel' statistical techniques applied to learning curve data in other fields, such as psychology and manufacturing. (3) To test these statistical techniques in data sets from studies of varying designs to assess health technologies in which learning curve effects are known to exist. METHODS - STUDY SELECTION (HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW): For a study to be included, it had to include a formal analysis of the learning curve of a health technology using a graphical, tabular or statistical technique. METHODS - STUDY SELECTION (NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH): For a study to be included, it had to include a formal assessment of a learning curve using a statistical technique that had not been identified in the previous search. METHODS - DATA SOURCES: Six clinical and 16 non-clinical biomedical databases were searched. A limited amount of handsearching and scanning of reference lists was also undertaken. METHODS - DATA EXTRACTION (HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW): A number of study characteristics were abstracted from the papers such as study design, study size, number of operators and the statistical method used. METHODS - DATA EXTRACTION (NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH): The new statistical techniques identified were categorised into four subgroups of increasing complexity: exploratory data analysis; simple series data analysis; complex data structure analysis, generic techniques. METHODS - TESTING OF STATISTICAL METHODS: Some of the statistical methods identified in the systematic searches for single (simple) operator series data and for multiple (complex) operator series data were illustrated and explored using three data sets. The first was a case series of 190 consecutive laparoscopic fundoplication procedures performed by a single surgeon; the second was a case series of consecutive laparoscopic cholecystectomy procedures performed by ten surgeons; the third was randomised trial data derived from the laparoscopic procedure arm of a multicentre trial of groin hernia repair, supplemented by data from non-randomised operations performed during the trial. RESULTS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: Of 4571 abstracts identified, 272 (6%) were later included in the study after review of the full paper. Some 51% of studies assessed a surgical minimal access technique and 95% were case series. The statistical method used most often (60%) was splitting the data into consecutive parts (such as halves or thirds), with only 14% attempting a more formal statistical analysis. The reporting of the studies was poor, with 31% giving no details of data collection methods. RESULTS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: Of 9431 abstracts assessed, 115 (1%) were deemed appropriate for further investigation and, of these, 18 were included in the study. All of the methods for complex data sets were identified in the non-clinical literature. These were discriminant analysis, two-stage estimation of learning rates, generalised estimating equations, multilevel models, latent curve models, time series models and stochastic parameter models. In addition, eight new shapes of learning curves were identified. RESULTS - TESTING OF STATISTICAL METHODS: No one particular shape of learning curve performed significantly better than another. 
The performance of 'operation time' as a proxy for learning differed between the three procedures. Multilevel modelling using the laparoscopic cholecystectomy data demonstrated and measured surgeon-specific and confounding effects. The inclusion of non-randomised cases, despite the possible limitations of the method, enhanced the interpretation of learning effects. CONCLUSIONS - HEALTH TECHNOLOGY ASSESSMENT LITERATURE REVIEW: The statistical methods used for assessing learning effects in health technology assessment have been crude and the reporting of studies poor. CONCLUSIONS - NON-HEALTH TECHNOLOGY ASSESSMENT LITERATURE SEARCH: A number of statistical methods for assessing learning effects were identified that had not hitherto been used in health technology assessment. There was a hierarchy of methods for the identification and measurement of learning, and the more sophisticated methods for both have had little if any use in health technology assessment. This demonstrated the value of considering fields outside clinical research when addressing methodological issues in health technology assessment. CONCLUSIONS - TESTING OF STATISTICAL METHODS: It has been demonstrated that the portfolio of techniques identified can enhance investigations of learning curve effects. (ABSTRACT TRUNCATED)
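The multilevel (random-effects) approach mentioned above can be illustrated with a small sketch. The code below is not the report's own analysis; it fits a mixed-effects model of log operation time against log case order with a random intercept per surgeon, using a simulated data set with hypothetical columns surgeon, case_order and log_time.

```python
# Minimal sketch of a multilevel learning-curve model (not the report's own code).
# Assumes a pandas DataFrame `df` with hypothetical columns:
#   surgeon     - operator identifier
#   case_order  - sequence number of the case within that surgeon's series
#   log_time    - log of operation time, used here as a proxy for learning
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
surgeons = np.repeat(np.arange(10), 40)                # 10 surgeons, 40 cases each
case_order = np.tile(np.arange(1, 41), 10)
baseline = rng.normal(4.6, 0.15, 10)[surgeons]         # surgeon-specific intercepts
log_time = baseline - 0.08 * np.log(case_order) + rng.normal(0, 0.1, surgeons.size)
df = pd.DataFrame({"surgeon": surgeons, "case_order": case_order,
                   "log_time": log_time, "log_case": np.log(case_order)})

# Random intercept for each surgeon; the fixed slope on log(case order)
# estimates the average learning effect across operators.
model = smf.mixedlm("log_time ~ log_case", df, groups=df["surgeon"])
result = model.fit()
print(result.summary())
```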
Robust Strategy for Rocket Engine Health Monitoring
NASA Technical Reports Server (NTRS)
Santi, L. Michael
2001-01-01
Monitoring the health of rocket engine systems is essentially a two-phase process. The acquisition phase involves sensing physical conditions at selected locations, converting physical inputs to electrical signals, conditioning the signals as appropriate to establish scale or filter interference, and recording results in a form that is easy to interpret. The inference phase involves analysis of results from the acquisition phase, comparison of analysis results to established health measures, and assessment of health indications. A variety of analytical tools may be employed in the inference phase of health monitoring. These tools can be separated into three broad categories: statistical, rule based, and model based. Statistical methods can provide excellent comparative measures of engine operating health. They require well-characterized data from an ensemble of "typical" engines, or "golden" data from a specific test assumed to define the operating norm in order to establish reliable comparative measures. Statistical methods are generally suitable for real-time health monitoring because they do not deal with the physical complexities of engine operation. The utility of statistical methods in rocket engine health monitoring is hindered by practical limits on the quantity and quality of available data. This is due to the difficulty and high cost of data acquisition, the limited number of available test engines, and the problem of simulating flight conditions in ground test facilities. In addition, statistical methods incur a penalty for disregarding flow complexity and are therefore limited in their ability to define performance shift causality. Rule based methods infer the health state of the engine system based on comparison of individual measurements or combinations of measurements with defined health norms or rules. This does not mean that rule based methods are necessarily simple. Although binary yes-no health assessment can sometimes be established by relatively simple rules, the causality assignment needed for refined health monitoring often requires an exceptionally complex rule base involving complicated logical maps. Structuring the rule system to be clear and unambiguous can be difficult, and the expert input required to maintain a large logic network and associated rule base can be prohibitive.
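As a hedged illustration of the statistical category described above (not an actual engine-monitoring implementation), the sketch below scores a new sensor reading against an ensemble of "typical" engine data using a simple z-score and flags readings outside an arbitrary band.

```python
# Illustrative sketch only: a comparative health measure built from an ensemble of
# "typical" engine runs. The readings and the 3-sigma threshold are hypothetical.
import numpy as np

rng = np.random.default_rng(1)
ensemble = rng.normal(loc=3200.0, scale=25.0, size=50)   # e.g. turbine temps from 50 nominal runs
mu, sigma = ensemble.mean(), ensemble.std(ddof=1)

def health_score(reading, mu=mu, sigma=sigma):
    """Return the z-score of a reading relative to the nominal ensemble."""
    return (reading - mu) / sigma

for reading in (3210.0, 3150.0, 3325.0):
    z = health_score(reading)
    status = "nominal" if abs(z) < 3.0 else "flag for review"
    print(f"reading={reading:7.1f}  z={z:+5.2f}  -> {status}")
```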
A computational visual saliency model based on statistics and machine learning.
Lin, Ru-Je; Lin, Wei-Song
2014-08-01
Identifying the type of stimuli that attracts human visual attention has been an appealing topic for scientists for many years. In particular, marking the salient regions in images is useful for both psychologists and many computer vision applications. In this paper, we propose a computational approach for producing saliency maps using statistics and machine learning methods. Based on four assumptions, three properties (Feature-Prior, Position-Prior, and Feature-Distribution) can be derived and combined by a simple intersection operation to obtain a saliency map. These properties are implemented by a similarity computation, support vector regression (SVR) technique, statistical analysis of training samples, and information theory using low-level features. This technique is able to learn the preferences of human visual behavior while simultaneously considering feature uniqueness. Experimental results show that our approach performs better in predicting human visual attention regions than 12 other models in two test databases. © 2014 ARVO.
NASA Technical Reports Server (NTRS)
Butler, C. M.; Hogge, J. E.
1978-01-01
Air quality sampling was conducted. Data for air quality parameters, recorded on written forms, punched cards or magnetic tape, are available for 1972 through 1975. Computer software was developed to (1) calculate several daily statistical measures of location, (2) plot time histories of data or the calculated daily statistics, (3) calculate simple correlation coefficients, and (4) plot scatter diagrams. Computer software was developed for processing air quality data to include time series analysis and goodness-of-fit tests. Computer software was also developed to (1) calculate a larger number of daily statistical measures of location, and a number of daily, monthly and yearly measures of location, dispersion, skewness and kurtosis, (2) decompose the extended time series model and (3) perform some goodness-of-fit tests. The computer program is described, documented and illustrated by examples. Recommendations are made for continued development of research on processing air quality data.
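A modern equivalent of the daily summary statistics described above could be sketched with pandas, assuming an hourly time-indexed pollutant series; the column names and values below are hypothetical.

```python
# Sketch of daily location/dispersion summaries and a simple correlation,
# assuming hourly air-quality readings; column names and data are hypothetical.
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
idx = pd.date_range("1974-01-01", periods=24 * 90, freq="h")
df = pd.DataFrame({
    "ozone": 30 + 10 * np.sin(np.arange(idx.size) * 2 * np.pi / 24) + rng.normal(0, 3, idx.size),
    "temperature": 15 + 8 * np.sin(np.arange(idx.size) * 2 * np.pi / 24) + rng.normal(0, 2, idx.size),
}, index=idx)

daily = df["ozone"].resample("D").agg(["mean", "median", "std", "max"])
print(daily.head())

# Simple correlation coefficient between the two series
print("r =", round(df["ozone"].corr(df["temperature"]), 3))
```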
A statistical framework for evaluating neural networks to predict recurrent events in breast cancer
NASA Astrophysics Data System (ADS)
Gorunescu, Florin; Gorunescu, Marina; El-Darzi, Elia; Gorunescu, Smaranda
2010-07-01
Breast cancer is the second leading cause of cancer deaths in women today. Sometimes, breast cancer can return after primary treatment. A medical diagnosis of recurrent cancer is often a more challenging task than the initial one. In this paper, we investigate the potential contribution of neural networks (NNs) to support health professionals in diagnosing such events. The NN algorithms are tested and applied to two different datasets. An extensive statistical analysis has been performed to verify our experiments. The results show that a simple network structure for both the multi-layer perceptron and radial basis function can produce equally good results, not all attributes are needed to train these algorithms and, finally, the classification performances of all algorithms are statistically robust. Moreover, we have shown that the best performing algorithm will strongly depend on the features of the datasets, and hence, there is not necessarily a single best classifier.
Sample size considerations for clinical research studies in nuclear cardiology.
Chiuzan, Cody; West, Erin A; Duong, Jimmy; Cheung, Ken Y K; Einstein, Andrew J
2015-12-01
Sample size calculation is an important element of research design that investigators need to consider in the planning stage of the study. Funding agencies and research review panels request a power analysis, for example, to determine the minimum number of subjects needed for an experiment to be informative. Calculating the right sample size is crucial to gaining accurate information and ensures that research resources are used efficiently and ethically. The simple question "How many subjects do I need?" does not always have a simple answer. Before calculating the sample size requirements, a researcher must address several aspects, such as purpose of the research (descriptive or comparative), type of samples (one or more groups), and data being collected (continuous or categorical). In this article, we describe some of the most frequent methods for calculating the sample size with examples from nuclear cardiology research, including for t tests, analysis of variance (ANOVA), non-parametric tests, correlation, Chi-squared tests, and survival analysis. For the ease of implementation, several examples are also illustrated via user-friendly free statistical software.
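For instance, the two-sample t test case mentioned above can be sketched with the power routines in statsmodels; the effect size, alpha and power below are illustrative choices, not values taken from the article.

```python
# Sketch: minimum sample size per group for a two-sample t test.
# Effect size (Cohen's d), alpha and power are illustrative assumptions.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5,   # medium standardized difference
                                    alpha=0.05,
                                    power=0.80,
                                    ratio=1.0,
                                    alternative="two-sided")
print(f"Approximately {n_per_group:.0f} subjects per group are needed.")
```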
Wang, Xiaolong; Li, Lin; Zhao, Jiaxin; Li, Fangliang; Guo, Wei; Chen, Xia
2017-04-01
To evaluate the effects of different preservation methods (storage in a -20°C ice chest, preservation in liquid nitrogen and drying in silica gel) on inter simple sequence repeat (ISSR) or random amplified polymorphic DNA (RAPD) analyses in various botanical specimens (including broad-leaved plants, needle-leaved plants and succulent plants) for different times (three weeks and three years), we used a statistical analysis based on the number of bands, genetic index and cluster analysis. The results demonstrate that the methods used to preserve samples can provide sufficient amounts of genomic DNA for ISSR and RAPD analyses; however, the effect of different preservation methods on these analyses varies significantly, and the preservation time has little effect on these analyses. Our results provide a reference for researchers to select the most suitable preservation method depending on their study subject for the analysis of molecular markers based on genomic DNA. Copyright © 2017 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.
Anderson, Carl A; McRae, Allan F; Visscher, Peter M
2006-07-01
Standard quantitative trait loci (QTL) mapping techniques commonly assume that the trait is both fully observed and normally distributed. When considering survival or age-at-onset traits these assumptions are often incorrect. Methods have been developed to map QTL for survival traits; however, they are both computationally intensive and not available in standard genome analysis software packages. We propose a grouped linear regression method for the analysis of continuous survival data. Using simulation we compare this method to both the Cox and Weibull proportional hazards models and a standard linear regression method that ignores censoring. The grouped linear regression method is of equivalent power to both the Cox and Weibull proportional hazards methods and is significantly better than the standard linear regression method when censored observations are present. The method is also robust to the proportion of censored individuals and the underlying distribution of the trait. On the basis of linear regression methodology, the grouped linear regression model is computationally simple and fast and can be implemented readily in freely available statistical software.
A Role for Chunk Formation in Statistical Learning of Second Language Syntax
ERIC Educational Resources Information Center
Hamrick, Phillip
2014-01-01
Humans are remarkably sensitive to the statistical structure of language. However, different mechanisms have been proposed to account for such statistical sensitivities. The present study compared adult learning of syntax and the ability of two models of statistical learning to simulate human performance: Simple Recurrent Networks, which learn by…
No-Reference Video Quality Assessment Based on Statistical Analysis in 3D-DCT Domain.
Li, Xuelong; Guo, Qun; Lu, Xiaoqiang
2016-05-13
It is an important task to design models for universal no-reference video quality assessment (NR-VQA) in multiple video processing and computer vision applications. However, most existing NR-VQA metrics are designed for specific distortion types, which are often not known in practical applications. A further deficiency is that the spatial and temporal information of videos is hardly considered simultaneously. In this paper, we propose a new NR-VQA metric based on the spatiotemporal natural video statistics (NVS) in the 3D discrete cosine transform (3D-DCT) domain. In the proposed method, a set of features is first extracted based on the statistical analysis of 3D-DCT coefficients to characterize the spatiotemporal statistics of videos in different views. These features are then used to predict the perceived video quality via the efficient linear support vector regression (SVR) model. The contributions of this paper are: 1) we explore the spatiotemporal statistics of videos in the 3D-DCT domain, which has an inherent spatiotemporal encoding advantage over other widely used 2D transformations; 2) we extract a small set of simple but effective statistical features for video visual quality prediction; 3) the proposed method is universal for multiple types of distortions and robust to different databases. The proposed method is tested on four widely used video databases. Extensive experimental results demonstrate that the proposed method is competitive with the state-of-the-art NR-VQA metrics and the top-performing FR-VQA and RR-VQA metrics.
Statistical testing of association between menstruation and migraine.
Barra, Mathias; Dahl, Fredrik A; Vetvik, Kjersti G
2015-02-01
To repair and refine a previously proposed method for statistical analysis of association between migraine and menstruation. Menstrually related migraine (MRM) affects about 20% of female migraineurs in the general population. The exact pathophysiological link from menstruation to migraine is hypothesized to be through fluctuations in female reproductive hormones, but the exact mechanisms remain unknown. Therefore, the main diagnostic criterion today is concurrency of migraine attacks with menstruation. Methods aiming to exclude spurious associations are wanted, so that further research into these mechanisms can be performed on a population with a true association. The statistical method is based on a simple two-parameter null model of MRM (which allows for simulation modeling), and Fisher's exact test (with mid-p correction) applied to standard 2 × 2 contingency tables derived from the patients' headache diaries. Our method is a corrected version of a previously published flawed framework. To our best knowledge, no other published methods for establishing a menstruation-migraine association by statistical means exist today. The probabilistic methodology shows good performance when subjected to receiver operator characteristic curve analysis. Quick reference cutoff values for the clinical setting were tabulated for assessing association given a patient's headache history. In this paper, we correct a proposed method for establishing association between menstruation and migraine by statistical methods. We conclude that the proposed standard of 3-cycle observations prior to setting an MRM diagnosis should be extended with at least one perimenstrual window to obtain sufficient information for statistical processing. © 2014 American Headache Society.
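The core ingredient described above, Fisher's exact test with a mid-p correction on a 2 × 2 diary table, can be sketched as follows. The counts are hypothetical and the sketch shows only a one-sided mid-p value, not the authors' full diagnostic procedure.

```python
# Sketch: one-sided mid-p value for a 2x2 table of (migraine day?, perimenstrual day?).
# Counts are hypothetical; this is not the published method, only its basic ingredient.
from scipy.stats import fisher_exact, hypergeom

#                 perimenstrual   non-perimenstrual
# migraine day          a                 b
# no migraine           c                 d
a, b, c, d = 9, 12, 21, 158

table = [[a, b], [c, d]]
_, p_exact = fisher_exact(table, alternative="greater")

M = a + b + c + d          # total days observed
n = a + b                  # migraine days
N = a + c                  # perimenstrual days
# mid-p: P(X > a) + 0.5 * P(X = a) under the hypergeometric null
p_mid = hypergeom.sf(a, M, n, N) + 0.5 * hypergeom.pmf(a, M, n, N)

print(f"one-sided exact p = {p_exact:.4f}, mid-p = {p_mid:.4f}")
```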
SOCR Analyses - an Instructional Java Web-based Statistical Analysis Toolkit.
Chu, Annie; Cui, Jenny; Dinov, Ivo D
2009-03-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models.
Multivariate analysis techniques
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bendavid, Josh; Fisher, Wade C.; Junk, Thomas R.
2016-01-01
The end products of experimental data analysis are designed to be simple and easy to understand: hypothesis tests and measurements of parameters. But, the experimental data themselves are voluminous and complex. Furthermore, in modern collider experiments, many petabytes of data must be processed in search of rare new processes which occur together with much more copious background processes that are of less interest to the task at hand. The systematic uncertainties on the background may be larger than the expected signal in many cases. The statistical power of an analysis and its sensitivity to systematic uncertainty can therefore usually both be improved by separating signal events from background events with higher efficiency and purity.
A consistent framework for Horton regression statistics that leads to a modified Hack's law
Furey, P.R.; Troutman, B.M.
2008-01-01
A statistical framework is introduced that resolves important problems with the interpretation and use of traditional Horton regression statistics. The framework is based on a univariate regression model that leads to an alternative expression for Horton ratio, connects Horton regression statistics to distributional simple scaling, and improves the accuracy in estimating Horton plot parameters. The model is used to examine data for drainage area A and mainstream length L from two groups of basins located in different physiographic settings. Results show that confidence intervals for the Horton plot regression statistics are quite wide. Nonetheless, an analysis of covariance shows that regression intercepts, but not regression slopes, can be used to distinguish between basin groups. The univariate model is generalized to include n > 1 dependent variables. For the case where the dependent variables represent ln A and ln L, the generalized model performs somewhat better at distinguishing between basin groups than two separate univariate models. The generalized model leads to a modification of Hack's law where L depends on both A and Strahler order ω. Data show that ω plays a statistically significant role in the modified Hack's law expression. © 2008 Elsevier B.V.
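As a small illustration of the regression setting above (not the authors' framework), the classical Hack's law L = c·A^h can be fitted by ordinary least squares on log-transformed data; the basin areas and lengths below are simulated.

```python
# Sketch: estimating the Hack exponent h in L = c * A**h by log-log regression.
# The drainage areas and mainstream lengths below are simulated, not basin data.
import numpy as np

rng = np.random.default_rng(3)
A = 10 ** rng.uniform(0, 4, 60)                       # drainage areas, km^2
h_true, c_true = 0.57, 1.4
L = c_true * A ** h_true * np.exp(rng.normal(0, 0.1, A.size))

slope, intercept = np.polyfit(np.log(A), np.log(L), 1)
print(f"estimated h = {slope:.3f}, estimated c = {np.exp(intercept):.3f}")
```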
A perceptual space of local image statistics.
Victor, Jonathan D; Thengone, Daniel J; Rizvi, Syed M; Conte, Mary M
2015-12-01
Local image statistics are important for visual analysis of textures, surfaces, and form. There are many kinds of local statistics, including those that capture luminance distributions, spatial contrast, oriented segments, and corners. While sensitivity to each of these kinds of statistics has been well studied, much less is known about visual processing when multiple kinds of statistics are relevant, in large part because the dimensionality of the problem is high and different kinds of statistics interact. To approach this problem, we focused on binary images on a square lattice - a reduced set of stimuli which nevertheless taps many kinds of local statistics. In this 10-parameter space, we determined psychophysical thresholds to each kind of statistic (16 observers) and all of their pairwise combinations (4 observers). Sensitivities and isodiscrimination contours were consistent across observers. Isodiscrimination contours were elliptical, implying a quadratic interaction rule, which in turn determined ellipsoidal isodiscrimination surfaces in the full 10-dimensional space, and made predictions for sensitivities to complex combinations of statistics. These predictions, including the prediction of a combination of statistics that was metameric to random, were verified experimentally. Finally, check size had only a mild effect on sensitivities over the range from 2.8 to 14 min, but sensitivities to second- and higher-order statistics were substantially lower at 1.4 min. In sum, local image statistics form a perceptual space that is highly stereotyped across observers, in which different kinds of statistics interact according to simple rules. Copyright © 2015 Elsevier Ltd. All rights reserved.
Theory of Financial Risk and Derivative Pricing
NASA Astrophysics Data System (ADS)
Bouchaud, Jean-Philippe; Potters, Marc
2009-01-01
Foreword; Preface; 1. Probability theory: basic notions; 2. Maximum and addition of random variables; 3. Continuous time limit, Ito calculus and path integrals; 4. Analysis of empirical data; 5. Financial products and financial markets; 6. Statistics of real prices: basic results; 7. Non-linear correlations and volatility fluctuations; 8. Skewness and price-volatility correlations; 9. Cross-correlations; 10. Risk measures; 11. Extreme correlations and variety; 12. Optimal portfolios; 13. Futures and options: fundamental concepts; 14. Options: hedging and residual risk; 15. Options: the role of drift and correlations; 16. Options: the Black and Scholes model; 17. Options: some more specific problems; 18. Options: minimum variance Monte-Carlo; 19. The yield curve; 20. Simple mechanisms for anomalous price statistics; Index of most important symbols; Index.
Theory of Financial Risk and Derivative Pricing - 2nd Edition
NASA Astrophysics Data System (ADS)
Bouchaud, Jean-Philippe; Potters, Marc
2003-12-01
Foreword; Preface; 1. Probability theory: basic notions; 2. Maximum and addition of random variables; 3. Continuous time limit, Ito calculus and path integrals; 4. Analysis of empirical data; 5. Financial products and financial markets; 6. Statistics of real prices: basic results; 7. Non-linear correlations and volatility fluctuations; 8. Skewness and price-volatility correlations; 9. Cross-correlations; 10. Risk measures; 11. Extreme correlations and variety; 12. Optimal portfolios; 13. Futures and options: fundamental concepts; 14. Options: hedging and residual risk; 15. Options: the role of drift and correlations; 16. Options: the Black and Scholes model; 17. Options: some more specific problems; 18. Options: minimum variance Monte-Carlo; 19. The yield curve; 20. Simple mechanisms for anomalous price statistics; Index of most important symbols; Index.
Renormalization Group Tutorial
NASA Technical Reports Server (NTRS)
Bell, Thomas L.
2004-01-01
Complex physical systems sometimes have statistical behavior characterized by power- law dependence on the parameters of the system and spatial variability with no particular characteristic scale as the parameters approach critical values. The renormalization group (RG) approach was developed in the fields of statistical mechanics and quantum field theory to derive quantitative predictions of such behavior in cases where conventional methods of analysis fail. Techniques based on these ideas have since been extended to treat problems in many different fields, and in particular, the behavior of turbulent fluids. This lecture will describe a relatively simple but nontrivial example of the RG approach applied to the diffusion of photons out of a stellar medium when the photons have wavelengths near that of an emission line of atoms in the medium.
Things fall apart: biological species form unconnected parsimony networks.
Hart, Michael W; Sunday, Jennifer
2007-10-22
The generality of operational species definitions is limited by problematic definitions of between-species divergence. A recent phylogenetic species concept based on a simple objective measure of statistically significant genetic differentiation uses between-species application of statistical parsimony networks that are typically used for population genetic analysis within species. Here we review recent phylogeographic studies and reanalyse several mtDNA barcoding studies using this method. We found that (i) alignments of DNA sequences typically fall apart into a separate subnetwork for each Linnean species (but with a higher rate of true positives for mtDNA data) and (ii) DNA sequences from single species typically stick together in a single haplotype network. Departures from these patterns are usually consistent with hybridization or cryptic species diversity.
Probabilistic Component Mode Synthesis of Nondeterministic Substructures
NASA Technical Reports Server (NTRS)
Brown, Andrew M.; Ferri, Aldo A.
1996-01-01
Standard methods of structural dynamic analysis assume that the structural characteristics are deterministic. Recognizing that these characteristics are actually statistical in nature, researchers have recently developed a variety of methods that use this information to determine probabilities of a desired response characteristic, such as natural frequency, without using expensive Monte Carlo simulations. One of the problems in these methods is correctly identifying the statistical properties of primitive variables such as geometry, stiffness, and mass. We present a method where the measured dynamic properties of substructures are used instead as the random variables. The residual flexibility method of component mode synthesis is combined with the probabilistic methods to determine the cumulative distribution function of the system eigenvalues. A simple cantilever beam test problem is presented that illustrates the theory.
Robust Combining of Disparate Classifiers Through Order Statistics
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Ghosh, Joydeep
2001-01-01
Integrating the outputs of multiple classifiers via combiners or meta-learners has led to substantial improvements in several difficult pattern recognition problems. In this article we investigate a family of combiners based on order statistics, for robust handling of situations where there are large discrepancies in performance of individual classifiers. Based on a mathematical modeling of how the decision boundaries are affected by order statistic combiners, we derive expressions for the reductions in error expected when simple output combination methods based on the median, the maximum, and, in general, the ith order statistic are used. Furthermore, we analyze the trim and spread combiners, both based on linear combinations of the ordered classifier outputs, and show that in the presence of uneven classifier performance, they often provide substantial gains over both linear and simple order statistics combiners. Experimental results on both real world data and standard public domain data sets corroborate these findings.
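A minimal sketch of the combiners discussed above (mean, median, maximum, and a trimmed linear combination of the ordered outputs) might look like this; the classifier outputs are simulated scores, not results from the article.

```python
# Sketch: order-statistic combiners over the outputs of several classifiers.
# `scores` holds simulated posterior estimates for one class,
# shape (n_samples, n_classifiers); one classifier is deliberately noisy.
import numpy as np

rng = np.random.default_rng(4)
truth = rng.uniform(0, 1, 500)
scores = truth[:, None] + rng.normal(0, 0.05, (500, 4))
scores[:, 3] += rng.normal(0, 0.4, 500)               # a poorly performing classifier

ordered = np.sort(scores, axis=1)
combined = {
    "mean":    scores.mean(axis=1),
    "median":  np.median(scores, axis=1),
    "maximum": ordered[:, -1],
    "trimmed": ordered[:, 1:-1].mean(axis=1),          # drop lowest and highest output
}
for name, est in combined.items():
    print(f"{name:8s} RMSE = {np.sqrt(np.mean((est - truth) ** 2)):.3f}")
```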
Counting statistics for genetic switches based on effective interaction approximation
NASA Astrophysics Data System (ADS)
Ohkubo, Jun
2012-09-01
Applicability of counting statistics for a system with an infinite number of states is investigated. Counting statistics has been studied extensively for systems with a finite number of states. While it is possible in principle to use the scheme to count specific transitions in a system with an infinite number of states, the resulting equations are in general not closed. A simple genetic switch can be described by a master equation with an infinite number of states, and we use the counting statistics in order to count the number of transitions from inactive to active states in the gene. To avoid the non-closed equations, an effective interaction approximation is employed. As a result, it is shown that the switching problem can be treated approximately as a simple two-state model, which immediately indicates that the switching obeys non-Poisson statistics.
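The two-state picture that the approximation leads to can be illustrated by directly simulating a telegraph-type switch and counting inactive-to-active transitions. The rates below are arbitrary and the sketch is only a toy simulation, not the paper's effective-interaction calculation.

```python
# Sketch: count inactive->active transitions of a two-state switch over a window T
# and compare the Fano factor to the Poisson value of 1. Rates are arbitrary.
import numpy as np

rng = np.random.default_rng(5)
k_on, k_off, T, n_runs = 1.0, 1.0, 200.0, 2000

counts = np.empty(n_runs, dtype=int)
for r in range(n_runs):
    t, state, n_activations = 0.0, 0, 0
    while True:
        rate = k_on if state == 0 else k_off
        t += rng.exponential(1.0 / rate)
        if t > T:
            break
        if state == 0:
            n_activations += 1
        state = 1 - state
    counts[r] = n_activations

fano = counts.var() / counts.mean()
print(f"mean count = {counts.mean():.1f}, Fano factor = {fano:.2f} (Poisson would be 1)")
```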
Lin, Kao; Li, Haipeng; Schlötterer, Christian; Futschik, Andreas
2011-01-01
Summary statistics are widely used in population genetics, but they suffer from the drawback that no simple sufficient summary statistic exists which captures all information required to distinguish different evolutionary hypotheses. Here, we apply boosting, a recent statistical method that combines simple classification rules to maximize their joint predictive performance. We show that our implementation of boosting has a high power to detect selective sweeps. Demographic events, such as bottlenecks, do not result in a large excess of false positives. A comparison shows that our boosting implementation performs well relative to other neutrality tests. Furthermore, we evaluated the relative contribution of different summary statistics to the identification of selection and found that for recent sweeps integrated haplotype homozygosity is very informative, whereas older sweeps are better detected by Tajima's π. Overall, Watterson's θ was found to contribute the most information for distinguishing between bottlenecks and selection. PMID:21041556
Patil, Prasad; Peng, Roger D; Leek, Jeffrey T
2016-07-01
A recent study of the replicability of key psychological findings is a major contribution toward understanding the human side of the scientific process. Despite the careful and nuanced analysis reported, the simple narrative disseminated by the mass, social, and scientific media was that in only 36% of the studies were the original results replicated. In the current study, however, we showed that 77% of the replication effect sizes reported were within a 95% prediction interval calculated using the original effect size. Our analysis suggests two critical issues in understanding replication of psychological studies. First, researchers' intuitive expectations for what a replication should show do not always match with statistical estimates of replication. Second, when the results of original studies are very imprecise, they create wide prediction intervals, and a broad range of replication effects that are consistent with the original estimates. This may lead to effects that replicate successfully, in that replication results are consistent with statistical expectations, but do not provide much information about the size (or existence) of the true effect. In this light, the results of the Reproducibility Project: Psychology can be viewed as statistically consistent with what one might expect when performing a large-scale replication experiment. © The Author(s) 2016.
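A hedged sketch of the prediction-interval idea follows; it uses one common formulation on a scale where the effect estimates are approximately normal, and the numbers are made up rather than drawn from the Reproducibility Project.

```python
# Sketch: does a replication estimate fall inside a 95% prediction interval
# built from the original estimate? Effects and standard errors are made up
# and assumed to live on an approximately normal scale (e.g. Fisher-z).
import math

def prediction_interval(effect_orig, se_orig, se_rep, z=1.96):
    half_width = z * math.sqrt(se_orig ** 2 + se_rep ** 2)
    return effect_orig - half_width, effect_orig + half_width

effect_orig, se_orig = 0.45, 0.15      # imprecise original study
effect_rep, se_rep = 0.12, 0.08        # smaller replication estimate

lo, hi = prediction_interval(effect_orig, se_orig, se_rep)
print(f"95% PI for the replication: ({lo:.2f}, {hi:.2f})")
print("replication consistent with original?", lo <= effect_rep <= hi)
```

Note that an imprecise original study produces a wide interval, which is exactly the second issue raised in the abstract.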
Repeatability Modeling for Wind-Tunnel Measurements: Results for Three Langley Facilities
NASA Technical Reports Server (NTRS)
Hemsch, Michael J.; Houlden, Heather P.
2014-01-01
Data from extensive check standard tests of seven measurement processes in three NASA Langley Research Center wind tunnels are statistically analyzed to test a simple model previously presented in 2000 for characterizing short-term, within-test and across-test repeatability. The analysis is intended to support process improvement and development of uncertainty models for the measurements. The analysis suggests that the repeatability can be estimated adequately as a function of only the test section dynamic pressure over a two-orders-of-magnitude dynamic pressure range. As expected for low instrument loading, short-term coefficient repeatability is determined by the resolution of the instrument alone (air off). However, as previously pointed out, for the highest dynamic pressure range the coefficient repeatability appears to be independent of dynamic pressure, thus presenting a lower floor for the standard deviation for all three time frames. The simple repeatability model is shown to be adequate for all of the cases presented and for all three time frames.
Analyzing Hidden Semantics in Social Bookmarking of Open Educational Resources
NASA Astrophysics Data System (ADS)
Minguillón, Julià
Web 2.0 services such as social bookmarking allow users to manage and share the links they find interesting, adding their own tags for describing them. This is especially interesting in the field of open educational resources, as delicious is a simple way to bridge the institutional point of view (i.e. learning object repositories) with the individual one (i.e. personal collections), thus promoting the discovering and sharing of such resources by other users. In this paper we propose a methodology for analyzing such tags in order to discover hidden semantics (i.e. taxonomies and vocabularies) that can be used to improve descriptions of learning objects and make learning object repositories more visible and discoverable. We propose the use of a simple statistical analysis tool such as principal component analysis to discover which tags create clusters that can be semantically interpreted. We will compare the obtained results with a collection of resources related to open educational resources, in order to better understand the real needs of people searching for open educational resources.
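A minimal sketch of the proposed analysis, under the assumption that the bookmarking data have already been reduced to a resources-by-tags co-occurrence matrix (the tag vocabulary and counts below are invented):

```python
# Sketch: principal component analysis of a resources-by-tags matrix to look for
# tag clusters. The tag vocabulary and the matrix are invented for illustration.
import numpy as np
from sklearn.decomposition import PCA

tags = ["opencourseware", "physics", "video", "assessment", "creativecommons", "repository"]
rng = np.random.default_rng(6)
# 200 hypothetical resources; correlated blocks of tags stand in for latent topics
topic = rng.integers(0, 2, 200)
X = rng.binomial(1, 0.1, (200, len(tags))).astype(float)
X[topic == 0, 0:3] = rng.binomial(1, 0.7, (np.sum(topic == 0), 3))
X[topic == 1, 3:6] = rng.binomial(1, 0.7, (np.sum(topic == 1), 3))

pca = PCA(n_components=2)
pca.fit(X - X.mean(axis=0))
print("explained variance ratio:", np.round(pca.explained_variance_ratio_, 2))
for i, component in enumerate(pca.components_):
    top = [tags[j] for j in np.argsort(np.abs(component))[::-1][:3]]
    print(f"component {i + 1}: strongest tag loadings -> {top}")
```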
NASA Astrophysics Data System (ADS)
Clerc, F.; Njiki-Menga, G.-H.; Witschger, O.
2013-04-01
Most of the measurement strategies that are suggested at the international level to assess workplace exposure to nanomaterials rely on devices measuring, in real time, airborne particle concentrations (according to different metrics). Since none of the instruments used to measure aerosols can distinguish a particle of interest from the background aerosol, the statistical analysis of time resolved data requires special attention. So far, very few approaches have been used for statistical analysis in the literature. These range from simple qualitative analysis of graphs to the implementation of more complex statistical models. To date, there is still no consensus on a particular approach, and an appropriate and robust method is still being sought. In this context, this exploratory study investigates a statistical method to analyse time resolved data based on a Bayesian probabilistic approach. To investigate and illustrate the use of this statistical method, particle number concentration data from a workplace study that investigated the potential for exposure via inhalation from cleanout operations by sandpapering of a reactor producing nanocomposite thin films have been used. In this workplace study, the background issue has been addressed through the near-field and far-field approaches and several size integrated and time resolved devices have been used. The analysis of the results presented here focuses only on data obtained with two handheld condensation particle counters. While one was measuring at the source of the released particles, the other one was measuring in parallel far-field. The Bayesian probabilistic approach allows a probabilistic modelling of data series, and the observed task is modelled in the form of probability distributions. The probability distributions issuing from time resolved data obtained at the source can be compared with the probability distributions issuing from the time resolved data obtained far-field, leading to a quantitative estimate of the airborne particles released at the source when the task is performed. Beyond the results obtained, this exploratory study indicates that the analysis of the results requires specific experience in statistics.
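To make the comparison of source and far-field distributions concrete, the hedged sketch below uses a simple vague-prior normal model on log concentrations and draws the posterior of their difference. It is only an illustration of the general idea, with simulated counter data, and not the study's actual Bayesian model.

```python
# Sketch: posterior for the difference of mean log particle-number concentration
# between near-field (source) and far-field counters. Data are simulated and the
# vague-prior normal model is an illustrative stand-in for the study's approach.
import numpy as np

rng = np.random.default_rng(7)
near = rng.lognormal(mean=9.2, sigma=0.35, size=120)    # #/cm3, simulated
far = rng.lognormal(mean=8.6, sigma=0.30, size=120)

def posterior_mean_draws(x, n_draws=20000, rng=rng):
    """Approximate posterior draws of the mean of log(x) under a vague normal model."""
    logs = np.log(x)
    se = logs.std(ddof=1) / np.sqrt(logs.size)
    return rng.normal(logs.mean(), se, n_draws)

delta = posterior_mean_draws(near) - posterior_mean_draws(far)
lo, hi = np.percentile(delta, [2.5, 97.5])
print(f"posterior P(source > far-field) = {np.mean(delta > 0):.3f}")
print(f"95% credible interval for the log-ratio: ({lo:.2f}, {hi:.2f})")
```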
ERIC Educational Resources Information Center
Lu, Yonggang; Henning, Kevin S. S.
2013-01-01
Spurred by recent writings regarding statistical pragmatism, we propose a simple, practical approach to introducing students to a new style of statistical thinking that models nature through the lens of data-generating processes, not populations. (Contains 5 figures.)
The probability of transportation accidents
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brobst, W.A.
1972-11-10
We examined the relative safety of different modes of transportation on a statistical basis, rather than an emotional one. As we were collecting data and evaluating its applicability, we found that our own emotions came into play in judging which data would be useful and which data we should discard. We developed a methodology of simple data analysis that would lend itself to similar evaluations of related questions. The author described that methodology and demonstrated its application to shipments of radioactive materials. 31 refs., 7 tabs.
Predicting the Ability of Marine Mammal Populations to Compensate for Behavioral Disturbances
2015-09-30
approaches, including simple theoretical models as well as statistical analysis of data rich conditions. Building on models developed for PCoD [2,3], we...conditions is population trajectory most likely to be affected (the central aim of PCoD ). For the revised model presented here, we include a population...averaged condition individuals (here used as a proxy for individual health as defined in PCoD ), and E is the quality of the environment in which the
Unlikely Fluctuations and Non-Equilibrium Work Theorems-A Simple Example.
Muzikar, Paul
2016-06-30
An exciting development in statistical mechanics has been the elucidation of a series of surprising equalities involving the work done during a nonequilibrium process. Astumian has presented an elegant example of such an equality, involving a colloidal particle undergoing Brownian motion in the presence of gravity. We analyze this example; its simplicity, and its link to geometric Brownian motion, allows us to clarify the inner workings of the equality. Our analysis explicitly shows the important role played by large, unlikely fluctuations.
Impact resistance of fiber composites - Energy-absorbing mechanisms and environmental effects
NASA Technical Reports Server (NTRS)
Chamis, C. C.; Sinclair, J. H.
1985-01-01
Energy absorbing mechanisms were identified by several approaches. The energy absorbing mechanisms considered are those in unidirectional composite beams subjected to impact. The approaches used include: mechanic models, statistical models, transient finite element analysis, and simple beam theory. Predicted results are correlated with experimental data from Charpy impact tests. The environmental effects on impact resistance are evaluated. Working definitions for energy absorbing and energy releasing mechanisms are proposed and a dynamic fracture progression is outlined. Possible generalizations to angle-plied laminates are described.
Impact resistance of fiber composites: Energy absorbing mechanisms and environmental effects
NASA Technical Reports Server (NTRS)
Chamis, C. C.; Sinclair, J. H.
1983-01-01
Energy absorbing mechanisms were identified by several approaches. The energy absorbing mechanisms considered are those in unidirectional composite beams subjected to impact. The approaches used include: mechanic models, statistical models, transient finite element analysis, and simple beam theory. Predicted results are correlated with experimental data from Charpy impact tests. The environmental effects on impact resistance are evaluated. Working definitions for energy absorbing and energy releasing mechanisms are proposed and a dynamic fracture progression is outlined. Possible generalizations to angle-plied laminates are described.
Metrics for comparing neuronal tree shapes based on persistent homology.
Li, Yanjie; Wang, Dingkang; Ascoli, Giorgio A; Mitra, Partha; Wang, Yusu
2017-01-01
As more and more neuroanatomical data are made available through efforts such as NeuroMorpho.Org and FlyCircuit.org, the need to develop computational tools to facilitate automatic knowledge discovery from such large datasets becomes more urgent. One fundamental question is how best to compare neuron structures, for instance to organize and classify large collections of neurons. We aim to develop a flexible yet powerful framework to support comparison and classification of large collections of neuron structures efficiently. Specifically we propose to use a topological persistence-based feature vectorization framework. Existing methods to vectorize a neuron (i.e., convert a neuron to a feature vector so as to support efficient comparison and/or searching) typically rely on statistics or summaries of morphometric information, such as the average or maximum local torque angle or partition asymmetry. These simple summaries have limited power in encoding global tree structures. Based on the concept of topological persistence recently developed in the field of computational topology, we vectorize each neuron structure into a simple yet informative summary. In particular, each type of information of interest can be represented as a descriptor function defined on the neuron tree, which is then mapped to a simple persistence-signature. Our framework can encode both local and global tree structure, as well as other information of interest (electrophysiological or dynamical measures), by considering multiple descriptor functions on the neuron. The resulting persistence-based signature is potentially more informative than simple statistical summaries (such as average/mean/max) of morphometric quantities. Indeed, we show that using a certain descriptor function will give a persistence-based signature containing strictly more information than the classical Sholl analysis. At the same time, our framework retains the efficiency associated with treating neurons as points in a simple Euclidean feature space, which would be important for constructing efficient searching or indexing structures over them. We present preliminary experimental results to demonstrate the effectiveness of our persistence-based neuronal feature vectorization framework.
Using a Five-Step Procedure for Inferential Statistical Analyses
ERIC Educational Resources Information Center
Kamin, Lawrence F.
2010-01-01
Many statistics texts pose inferential statistical problems in a disjointed way. By using a simple five-step procedure as a template for statistical inference problems, the student can solve problems in an organized fashion. The problem and its solution will thus be a stand-by-itself organic whole and a single unit of thought and effort. The…
Systematic analysis of coding and noncoding DNA sequences using methods of statistical linguistics
NASA Technical Reports Server (NTRS)
Mantegna, R. N.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Peng, C. K.; Simons, M.; Stanley, H. E.
1995-01-01
We compare the statistical properties of coding and noncoding regions in eukaryotic and viral DNA sequences by adapting two tests developed for the analysis of natural languages and symbolic sequences. The data set comprises all 30 sequences of length above 50 000 base pairs in GenBank Release No. 81.0, as well as the recently published sequences of C. elegans chromosome III (2.2 Mbp) and yeast chromosome XI (661 Kbp). We find that for the three chromosomes we studied the statistical properties of noncoding regions appear to be closer to those observed in natural languages than those of coding regions. In particular, (i) an n-tuple Zipf analysis of noncoding regions reveals a regime close to power-law behavior, whereas the coding regions show logarithmic behavior over a wide interval, and (ii) an n-gram entropy measurement shows that the noncoding regions have a lower n-gram entropy (and hence a larger "n-gram redundancy") than the coding regions. In contrast to the three chromosomes, we find that for vertebrates such as primates and rodents and for viral DNA, the difference between the statistical properties of coding and noncoding regions is not pronounced and therefore the results of the analyses of the investigated sequences are less conclusive. After noting the intrinsic limitations of the n-gram redundancy analysis, we also briefly discuss the failure of the zeroth- and first-order Markovian models or simple nucleotide repeats to account fully for these "linguistic" features of DNA. Finally, we emphasize that our results by no means prove the existence of a "language" in noncoding DNA.
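Both measurements are straightforward to reproduce on any sequence. The sketch below computes an n-tuple rank-frequency (Zipf) table and the n-gram Shannon entropy for a short random DNA string, which only stands in for real GenBank data.

```python
# Sketch: n-tuple Zipf rank-frequency list and n-gram entropy of a DNA string.
# The sequence is random and only stands in for a real coding/noncoding region.
import math
import random
from collections import Counter

random.seed(8)
seq = "".join(random.choice("ACGT") for _ in range(50_000))
n = 3

counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
total = sum(counts.values())

# Zipf analysis: frequency versus rank (a log-log slope near -1 would be language-like)
ranked = counts.most_common()
for rank, (tup, c) in enumerate(ranked[:5], start=1):
    print(f"rank {rank}: {tup}  freq = {c / total:.4f}")

# n-gram Shannon entropy in bits; the maximum for 4-letter n-grams is 2n bits
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
print(f"{n}-gram entropy = {entropy:.3f} bits (max {2 * n} bits); "
      f"redundancy = {1 - entropy / (2 * n):.3%}")
```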
Development of a funding, cost, and spending model for satellite projects
NASA Technical Reports Server (NTRS)
Johnson, Jesse P.
1989-01-01
The need for a predictive budget/funding model is obvious. The current models used by the Resource Analysis Office (RAO) are used to predict the total costs of satellite projects. An effort to extend the modeling capabilities from total budget analysis to total budget and budget outlays over time analysis was conducted. A statistically based, data-driven methodology was used to derive and develop the model. The budget data for the last 18 GSFC-sponsored satellite projects were analyzed and used to build a funding model which would describe the historical spending patterns. This raw data consisted of dollars spent in that specific year and their 1989 dollar equivalent. This data was converted to the standard format used by the RAO group and placed in a database. A simple statistical analysis was performed to calculate the gross statistics associated with project length and project cost and the conditional statistics on project length and project cost. The modeling approach used is derived from the theory of embedded statistics, which states that properly analyzed data will produce the underlying generating function. The process of funding large scale projects over extended periods of time is described by Life Cycle Cost Models (LCCM). The data was analyzed to find a model in the generic form of an LCCM. The model developed is based on a Weibull function whose parameters are found by both nonlinear optimization and nonlinear regression. In order to use this model it is necessary to transform the problem from a dollar/time space to a percentage of total budget/time space. This transformation is equivalent to moving to a probability space. By using the basic rules of probability, the validity of both the optimization and the regression steps is ensured. This statistically significant model is then integrated and inverted. The resulting output represents a project schedule which relates the amount of money spent to the percentage of project completion.
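A hedged sketch of the central fitting step follows, assuming spending has already been transformed to cumulative fraction of total budget versus fraction of schedule. The data points below are invented, and the two-parameter Weibull form is only one plausible reading of the model described.

```python
# Sketch: fit a Weibull-shaped cumulative spending curve
#   F(t) = 1 - exp(-(t / lam) ** k)
# to cumulative budget fractions. The observations are invented.
import numpy as np
from scipy.optimize import curve_fit

def weibull_cdf(t, k, lam):
    return 1.0 - np.exp(-(t / lam) ** k)

t = np.linspace(0.1, 1.0, 10)                      # fraction of project schedule elapsed
spent = np.array([0.02, 0.07, 0.16, 0.28, 0.42,    # fraction of total budget spent
                  0.57, 0.71, 0.83, 0.92, 0.97])

(k, lam), _ = curve_fit(weibull_cdf, t, spent, p0=[2.0, 0.5])
print(f"shape k = {k:.2f}, scale lambda = {lam:.2f}")

# Invert the fitted curve to ask when, say, half of the budget should be spent
half_point = lam * (-np.log(1 - 0.5)) ** (1 / k)
print(f"50% of the budget is predicted to be spent {half_point:.0%} of the way through")
```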
Statistics Using Just One Formula
ERIC Educational Resources Information Center
Rosenthal, Jeffrey S.
2018-01-01
This article advocates that introductory statistics be taught by basing all calculations on a single simple margin-of-error formula and deriving all of the standard introductory statistical concepts (confidence intervals, significance tests, comparisons of means and proportions, etc) from that one formula. It is argued that this approach will…
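As a hedged illustration of that style of teaching, the sketch below reuses the familiar 1.96·s/√n margin-of-error formula, which may differ in detail from the article's single formula, to build both a confidence interval and a two-group comparison.

```python
# Sketch: one margin-of-error formula reused for an interval and a comparison.
# The samples are simulated; the 1.96*s/sqrt(n) form is the familiar textbook one
# and is only assumed to match the article's single formula.
import numpy as np

rng = np.random.default_rng(9)
a = rng.normal(72, 10, 40)
b = rng.normal(76, 10, 45)

def margin_of_error(x, z=1.96):
    return z * x.std(ddof=1) / np.sqrt(x.size)

print(f"group A mean = {a.mean():.1f} +/- {margin_of_error(a):.1f}")
print(f"group B mean = {b.mean():.1f} +/- {margin_of_error(b):.1f}")

# Comparison of means: margins of error combine in quadrature
diff = b.mean() - a.mean()
moe_diff = np.sqrt(margin_of_error(a) ** 2 + margin_of_error(b) ** 2)
print(f"difference = {diff:.1f} +/- {moe_diff:.1f}; "
      f"significant at ~5%? {abs(diff) > moe_diff}")
```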
Simple taper: Taper equations for the field forester
David R. Larsen
2017-01-01
"Simple taper" is set of linear equations that are based on stem taper rates; the intent is to provide taper equation functionality to field foresters. The equation parameters are two taper rates based on differences in diameter outside bark at two points on a tree. The simple taper equations are statistically equivalent to more complex equations. The linear...
ERIC Educational Resources Information Center
Nelson, Dean
2009-01-01
Following the Guidelines for Assessment and Instruction in Statistics Education (GAISE) recommendation to use real data, an example is presented in which simple linear regression is used to evaluate the effect of the Montreal Protocol on atmospheric concentration of chlorofluorocarbons. This simple set of data, obtained from a public archive, can…
Statistical process control: separating signal from noise in emergency department operations.
Pimentel, Laura; Barrueto, Fermin
2015-05-01
Statistical process control (SPC) is a visually appealing and statistically rigorous methodology very suitable to the analysis of emergency department (ED) operations. We demonstrate that the control chart is the primary tool of SPC; it is constructed by plotting data measuring the key quality indicators of operational processes in rationally ordered subgroups such as units of time. Control limits are calculated using formulas reflecting the variation in the data points from one another and from the mean. SPC allows managers to determine whether operational processes are controlled and predictable. We review why the moving range chart is most appropriate for use in the complex ED milieu, how to apply SPC to ED operations, and how to determine when performance improvement is needed. SPC is an excellent tool for operational analysis and quality improvement for these reasons: 1) control charts make large data sets intuitively coherent by integrating statistical and visual descriptions; 2) SPC provides analysis of process stability and capability rather than simple comparison with a benchmark; 3) SPC allows distinction between special cause variation (signal), indicating an unstable process requiring action, and common cause variation (noise), reflecting a stable process; and 4) SPC keeps the focus of quality improvement on process rather than individual performance. Because data have no meaning apart from their context, and every process generates information that can be used to improve it, we contend that SPC should be seriously considered for driving quality improvement in emergency medicine. Copyright © 2015 Elsevier Inc. All rights reserved.
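The individuals and moving-range (XmR) calculation described above reduces to a few lines; the daily door-to-provider times below are simulated, and 2.66 is the standard XmR chart constant.

```python
# Sketch: individuals / moving-range (XmR) control limits for a daily ED metric.
# The daily median door-to-provider times are simulated.
import numpy as np

rng = np.random.default_rng(10)
minutes = rng.normal(38, 4, 30)          # 30 days of a stable process
minutes[27] += 18                        # one special-cause day

moving_range = np.abs(np.diff(minutes))
center = minutes.mean()
mr_bar = moving_range.mean()
ucl = center + 2.66 * mr_bar             # standard XmR chart constant
lcl = center - 2.66 * mr_bar

signals = np.where((minutes > ucl) | (minutes < lcl))[0]
print(f"center = {center:.1f} min, limits = ({lcl:.1f}, {ucl:.1f})")
print("special-cause signals on days:", signals + 1)
```

Points outside the limits indicate special cause variation (signal); points inside reflect common cause variation (noise) and call for no individual reaction.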
NASA Astrophysics Data System (ADS)
Alves, Gelio; Wang, Guanghui; Ogurtsov, Aleksey Y.; Drake, Steven K.; Gucek, Marjan; Suffredini, Anthony F.; Sacks, David B.; Yu, Yi-Kuo
2016-02-01
Correct and rapid identification of microorganisms is the key to the success of many important applications in health and safety, including, but not limited to, infection treatment, food safety, and biodefense. With the advance of mass spectrometry (MS) technology, the speed of identification can be greatly improved. However, the increasing number of microbes sequenced is challenging correct microbial identification because of the large number of choices present. To properly disentangle candidate microbes, one needs to go beyond apparent morphology or simple `fingerprinting'; to correctly prioritize the candidate microbes, one needs to have accurate statistical significance in microbial identification. We meet these challenges by using peptidome profiles of microbes to better separate them and by designing an analysis method that yields accurate statistical significance. Here, we present an analysis pipeline that uses tandem MS (MS/MS) spectra for microbial identification or classification. We have demonstrated, using MS/MS data of 81 samples, each composed of a single known microorganism, that the proposed pipeline can correctly identify microorganisms at least at the genus and species levels. We have also shown that the proposed pipeline computes accurate statistical significances, i.e., E-values for identified peptides and unified E-values for identified microorganisms. The proposed analysis pipeline has been implemented in MiCId, a freely available software for Microorganism Classification and Identification. MiCId is available for download at http://www.ncbi.nlm.nih.gov/CBBresearch/Yu/downloads.html.
Holder, J P; Benedetti, L R; Bradley, D K
2016-11-01
Single hit pulse height analysis is applied to National Ignition Facility x-ray framing cameras to quantify gain and gain variation in a single micro-channel plate-based instrument. This method allows the separation of gain from detectability in these photon-detecting devices. While pulse heights measured by standard-DC calibration methods follow the expected exponential distribution at the limit of a compound-Poisson process, gain-gated pulse heights follow a more complex distribution that may be approximated as a weighted sum of a few exponentials. We can reproduce this behavior with a simple statistical-sampling model.
van der Pas, Stéphanie L; Nelissen, Rob G H H; Fiocco, Marta
2017-08-02
In arthroplasty data, patients with staged bilateral total joint arthroplasty (TJA) pose a problem in statistical analysis. Subgroup analysis, in which patients with unilateral and bilateral TJA are studied separately, is sometimes considered an appropriate solution to the problem; we aim to show that this is not true because of immortal time bias. We reviewed patients who underwent staged (at any time) bilateral TJA. The logical fallacy leading to immortal time bias is explained through a simple artificial data example. The cumulative incidences of revision and death are computed by subgroup analysis and by landmark analysis based on hip replacement data from the Dutch Arthroplasty Register and on simulated data sets. For patients who underwent unilateral TJA, subgroup analysis can lead to an overestimate of the cumulative incidence of death and an underestimate of the cumulative incidence of revision. The reverse conclusion holds for patients who underwent staged bilateral TJA. Analysis of these patients can lead to an underestimate of the cumulative incidence of death and an overestimate of the cumulative incidence of revision. Immortal time bias can be prevented by using landmark analysis. When examining arthroplasty registry data, patients who underwent staged bilateral TJA should be analyzed with caution. An appropriate statistical method to address the research question should be selected.
Asymptotically Optimal and Private Statistical Estimation
NASA Astrophysics Data System (ADS)
Smith, Adam
Differential privacy is a definition of "privacy" for statistical databases. The definition is simple, yet it implies strong semantics even in the presence of an adversary with arbitrary auxiliary information about the database.
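As a concrete illustration of the definition, the Laplace mechanism releases a counting query with noise whose scale is the query's sensitivity divided by epsilon. The sketch below is a minimal, generic example and is not drawn from the talk itself; the toy database, the predicate, and the epsilon value are all invented.

```python
import numpy as np

def laplace_count(data, predicate, epsilon, rng=None):
    """Release a counting query under epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one record changes
    the count by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    rng = rng or np.random.default_rng()
    true_count = sum(1 for x in data if predicate(x))
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

ages = [34, 51, 29, 62, 45, 38]                       # toy database
noisy = laplace_count(ages, lambda a: a >= 40, epsilon=0.5)
print(f"noisy count of records with age >= 40: {noisy:.2f}")
```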
Extraction of the proton radius from electron-proton scattering data
Lee, Gabriel; Arrington, John R.; Hill, Richard J.
2015-07-27
We perform a new analysis of electron-proton scattering data to determine the proton electric and magnetic radii, enforcing model-independent constraints from form factor analyticity. A wide-ranging study of possible systematic effects is performed. An improved analysis is developed that rebins data taken at identical kinematic settings and avoids a scaling assumption of systematic errors with statistical errors. Employing standard models for radiative corrections, our improved analysis of the 2010 Mainz A1 Collaboration data yields a proton electric radius r_E = 0.895(20) fm and magnetic radius r_M = 0.776(38) fm. A similar analysis applied to world data (excluding Mainz data) implies r_E = 0.916(24) fm and r_M = 0.914(35) fm. The Mainz and world values of the charge radius are consistent, and a simple combination yields a value r_E = 0.904(15) fm that is 4σ larger than the CREMA Collaboration muonic hydrogen determination. The Mainz and world values of the magnetic radius differ by 2.7σ, and a simple average yields r_M = 0.851(26) fm. Finally, the circumstances under which published muonic hydrogen and electron scattering data could be reconciled are discussed, including a possible deficiency in the standard radiative correction model that requires further analysis.
Eksborg, Staffan
2013-01-01
Pharmacokinetic studies are important for optimizing drug dosing, but they require proper validation of the pharmacokinetic procedures used. However, simple and reliable statistical methods suitable for evaluating the predictive performance of pharmacokinetic analysis are essentially lacking. The aim of the present study was to construct and evaluate a graphical procedure for quantifying the predictive performance of individual and population pharmacokinetic compartment analysis. Original data from previously published pharmacokinetic compartment analyses after intravenous, oral, and epidural administration, and digitized data obtained from published scatter plots of observed vs predicted drug concentrations from population pharmacokinetic studies using the NPEM algorithm, the NONMEM computer program, and Bayesian forecasting procedures, were used to estimate predictive performance according to the proposed graphical method and by the method of Sheiner and Beal. The graphical plot proposed in the present paper proved to be a useful tool for evaluating the predictive performance of both individual and population compartment pharmacokinetic analysis. The proposed method is simple to use and gives valuable information concerning time- and concentration-dependent inaccuracies that might occur in individual and population pharmacokinetic compartment analysis. Predictive performance can be quantified by the fraction of concentration ratios within arbitrarily specified ranges, e.g. within the range 0.8-1.2.
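The last criterion quoted above, the fraction of observed/predicted concentration ratios falling inside a specified range, is straightforward to compute. The sketch below uses invented concentrations and the 0.8-1.2 range mentioned in the abstract; it illustrates only that summary statistic, not the full graphical method.

```python
import numpy as np

def fraction_within(observed, predicted, lower=0.8, upper=1.2):
    """Fraction of observed/predicted concentration ratios inside [lower, upper]."""
    ratios = np.asarray(observed, dtype=float) / np.asarray(predicted, dtype=float)
    return float(np.mean((ratios >= lower) & (ratios <= upper))), ratios

observed  = [12.1, 8.4, 5.9, 3.3, 1.8]   # hypothetical measured concentrations
predicted = [11.0, 9.0, 5.0, 3.5, 2.5]   # hypothetical model predictions
frac, ratios = fraction_within(observed, predicted)
print("ratios:", np.round(ratios, 2))
print("fraction within 0.8-1.2:", frac)
```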
Using Statistical Process Control to Make Data-Based Clinical Decisions.
ERIC Educational Resources Information Center
Pfadt, Al; Wheeler, Donald J.
1995-01-01
Statistical process control (SPC), which employs simple statistical tools and problem-solving techniques such as histograms, control charts, flow charts, and Pareto charts to implement continual product improvement procedures, can be incorporated into human service organizations. Examples illustrate use of SPC procedures to analyze behavioral data…
Self-organization of cosmic radiation pressure instability. II - One-dimensional simulations
NASA Technical Reports Server (NTRS)
Hogan, Craig J.; Woods, Jorden
1992-01-01
The clustering of statistically uniform discrete absorbing particles moving solely under the influence of radiation pressure from uniformly distributed emitters is studied in a simple one-dimensional model. Radiation pressure tends to amplify statistical clustering in the absorbers; the absorbing material is swept into empty bubbles, the biggest bubbles grow bigger almost as they would in a uniform medium, and the smaller ones get crushed and disappear. Numerical simulations of a one-dimensional system are used to support the conjecture that the system is self-organizing. Simple statistics indicate that a wide range of initial conditions produce structure approaching the same self-similar statistical distribution, whose scaling properties follow those of the attractor solution for an isolated bubble. The importance of the process for large-scale structuring of the interstellar medium is briefly discussed.
Length bias correction in gene ontology enrichment analysis using logistic regression.
Mi, Gu; Di, Yanming; Emerson, Sarah; Cumbie, Jason S; Chang, Jeff H
2012-01-01
When assessing differential gene expression from RNA sequencing data, commonly used statistical tests tend to have greater power to detect differential expression of genes encoding longer transcripts. This phenomenon, called "length bias", will influence subsequent analyses such as Gene Ontology enrichment analysis. In the presence of length bias, Gene Ontology categories that include longer genes are more likely to be identified as enriched. These categories, however, are not necessarily biologically more relevant. We show that one can effectively adjust for length bias in Gene Ontology analysis by including transcript length as a covariate in a logistic regression model. The logistic regression model makes the statistical issue underlying length bias more transparent: transcript length becomes a confounding factor when it correlates with both the Gene Ontology membership and the significance of the differential expression test. The inclusion of the transcript length as a covariate allows one to investigate the direct correlation between the Gene Ontology membership and the significance of testing differential expression, conditional on the transcript length. We present both real and simulated data examples to show that the logistic regression approach is simple, effective, and flexible.
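A minimal sketch of the idea, assuming a per-gene table with a 0/1 indicator of differential-expression significance, a 0/1 indicator of membership in one Gene Ontology category, and the transcript length; the simulated data, the column names, and the use of log length as the covariate are illustrative choices rather than the authors' exact parameterization. Length is made to drive both GO membership and significance, so the unadjusted coefficient for membership is inflated while the length-adjusted one is close to its true value of zero.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_genes = 5000
log_len = np.log(rng.lognormal(mean=7.5, sigma=0.8, size=n_genes))   # log transcript length
# Longer genes are more likely to belong to the GO category AND to be called significant.
in_go = rng.binomial(1, 1 / (1 + np.exp(-(log_len - 7.5))))
sig = rng.binomial(1, 1 / (1 + np.exp(-(-2.0 + 0.8 * (log_len - 7.5)))))
df = pd.DataFrame({"sig": sig, "in_go": in_go, "log_len": log_len})

unadj = smf.logit("sig ~ in_go", data=df).fit(disp=0)                 # ignores length
adj = smf.logit("sig ~ in_go + log_len", data=df).fit(disp=0)         # length as a covariate
print("unadjusted in_go coefficient:", round(unadj.params["in_go"], 3))
print("adjusted   in_go coefficient:", round(adj.params["in_go"], 3))
```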
S-SPatt: simple statistics for patterns on Markov chains.
Nuel, Grégory
2005-07-01
S-SPatt allows the counting of pattern occurrences in text files and, assuming these texts are generated from a random Markovian source, the computation of the P-value of a given observation using a simple binomial approximation.
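The binomial approximation can be reproduced in a few lines: if a pattern has per-position occurrence probability p under the background model and the text offers n possible starting positions, the P-value of observing at least k occurrences is a binomial tail probability. The sketch below is not S-SPatt itself; it uses an i.i.d. (order-0) background model and invented letter frequencies purely to illustrate the approximation.

```python
from math import prod
from scipy.stats import binom

def pattern_pvalue(text, pattern, base_probs):
    """P-value of observing >= k (possibly overlapping) occurrences of `pattern`
    under an i.i.d. background model, via the binomial approximation."""
    n_positions = len(text) - len(pattern) + 1
    k = sum(text[i:i + len(pattern)] == pattern for i in range(n_positions))
    p = prod(base_probs[c] for c in pattern)          # per-position occurrence probability
    return k, binom.sf(k - 1, n_positions, p)         # P(X >= k)

probs = {"a": 0.3, "c": 0.2, "g": 0.2, "t": 0.3}      # hypothetical letter frequencies
k, pval = pattern_pvalue("acgtacgtaacgtgacgt", "acgt", probs)
print(f"observed {k} occurrences, binomial P-value = {pval:.3g}")
```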
Hauber, A Brett; González, Juan Marcos; Groothuis-Oudshoorn, Catharina G M; Prior, Thomas; Marshall, Deborah A; Cunningham, Charles; IJzerman, Maarten J; Bridges, John F P
2016-06-01
Conjoint analysis is a stated-preference survey method that can be used to elicit responses that reveal preferences, priorities, and the relative importance of individual features associated with health care interventions or services. Conjoint analysis methods, particularly discrete choice experiments (DCEs), have been increasingly used to quantify preferences of patients, caregivers, physicians, and other stakeholders. Recent consensus-based guidance on good research practices, including two recent task force reports from the International Society for Pharmacoeconomics and Outcomes Research, has aided in improving the quality of conjoint analyses and DCEs in outcomes research. Nevertheless, uncertainty regarding good research practices for the statistical analysis of data from DCEs persists. There are multiple methods for analyzing DCE data. Understanding the characteristics and appropriate use of different analysis methods is critical to conducting a well-designed DCE study. This report will assist researchers in evaluating and selecting among alternative approaches to conducting statistical analysis of DCE data. We first present a simplistic DCE example and a simple method for using the resulting data. We then present a pedagogical example of a DCE and one of the most common approaches to analyzing data from such a question format, conditional logit. We then describe some common alternative methods for analyzing these data and the strengths and weaknesses of each alternative. We present the ESTIMATE checklist, which includes a list of questions to consider when justifying the choice of analysis method, describing the analysis, and interpreting the results. Copyright © 2016 International Society for Pharmacoeconomics and Outcomes Research (ISPOR). Published by Elsevier Inc. All rights reserved.
[A comparison of convenience sampling and purposive sampling].
Suen, Lee-Jen Wu; Huang, Hui-Man; Lee, Hao-Hsien
2014-06-01
Convenience sampling and purposive sampling are two different sampling methods. This article first explains sampling terms such as target population, accessible population, simple random sampling, intended sample, actual sample, and statistical power analysis. These terms are then used to explain the difference between "convenience sampling" and "purposive sampling." Convenience sampling is a non-probabilistic sampling technique applicable to qualitative or quantitative studies, although it is most frequently used in quantitative studies. In convenience samples, subjects more readily accessible to the researcher are more likely to be included. Thus, in quantitative studies, opportunity to participate is not equal for all qualified individuals in the target population and study results are not necessarily generalizable to this population. As in all quantitative studies, increasing the sample size increases the statistical power of the convenience sample. In contrast, purposive sampling is typically used in qualitative studies. Researchers who use this technique carefully select subjects based on study purpose with the expectation that each participant will provide unique and rich information of value to the study. As a result, members of the accessible population are not interchangeable and sample size is determined by data saturation, not by statistical power analysis.
Spatial Statistics for Tumor Cell Counting and Classification
NASA Astrophysics Data System (ADS)
Wirjadi, Oliver; Kim, Yoo-Jin; Breuel, Thomas
Counting and classifying cells in histological sections is a standard task in histology. One example is the grading of meningiomas, benign tumors of the meninges, which requires assessing the fraction of proliferating cells in an image. As this process is very time consuming when performed manually, automation is required. To address such problems, we propose a novel application of Markov point process methods in computer vision, leading to algorithms for computing the locations of circular objects in images. In contrast to previous algorithms using such spatial statistics methods in image analysis, the present one is fully trainable. This is achieved by combining point process methods with statistical classifiers. Using simulated data, the method proposed in this paper is shown to be more accurate and more robust to noise than standard image processing methods. On the publicly available SIMCEP benchmark for cell image analysis algorithms, the cell count performance of the present method is significantly more accurate than results published elsewhere, especially when cells form dense clusters. Furthermore, the proposed system performs as well as a state-of-the-art algorithm for the computer-aided histological grading of meningiomas when combined with a simple k-nearest neighbor classifier for identifying proliferating cells.
Chair-side detection of Prevotella Intermedia in mature dental plaque by its fluorescence.
Nomura, Yoshiaki; Takeuchi, Hiroaki; Okamoto, Masaaki; Sogabe, Kaoru; Okada, Ayako; Hanada, Nobuhiro
2017-06-01
Prevotella intermedia/nigrescens is one of the well-known pathogens causing periodontal diseases, and the red fluorescence excited by visible blue light, caused by protoporphyrin IX in the bacterial cells, could be useful for chair-side detection. The aim of this study was to evaluate levels of periodontal pathogens, especially P. intermedia, in clinical samples of red fluorescent dental plaque. Thirty-two supragingival plaque samples from six individuals were measured for fluorescence at a 640 nm wavelength excited at 409 nm. Periodontopathic bacteria were counted by the Invader PLUS PCR assay. Correlations between fluorescence intensity and bacterial counts were analyzed by Pearson's correlation coefficient and by simple and multiple regression analysis. Positive and negative predictive values of the fluorescence intensities for the presence or absence of P. intermedia in supragingival plaque were calculated. When relative fluorescence units (RFU) were logarithmically transformed, statistically significant linear relations between RFU and bacterial counts were obtained for P. intermedia, Porphyromonas gingivalis and Tannerella forsythia. By multiple regression analysis, only P. intermedia had a statistically significant correlation with fluorescence intensity. All of the fluorescent dental plaque samples contained P. intermedia; in contrast, 28% of non-fluorescent plaques contained P. intermedia. Checking for fluorescent dental plaque in the oral cavity could serve as a simple chair-side screening for mature dental plaque before examining periodontal pathogens, especially P. intermedia, by the PCR method. Copyright © 2017 Elsevier B.V. All rights reserved.
Evaluating surrogate endpoints, prognostic markers, and predictive markers — some simple themes
Baker, Stuart G.; Kramer, Barnett S.
2014-01-01
Background A surrogate endpoint is an endpoint observed earlier than the true endpoint (a health outcome) that is used to draw conclusions about the effect of treatment on the unobserved true endpoint. A prognostic marker is a marker for predicting the risk of an event given a control treatment; it informs treatment decisions when there is information on anticipated benefits and harms of a new treatment applied to persons at high risk. A predictive marker is a marker for predicting the effect of treatment on outcome in a subgroup of patients or study participants; it provides more rigorous information for treatment selection than a prognostic marker when it is based on estimated treatment effects in a randomized trial. Methods We organized our discussion around a different theme for each topic. Results “Fundamentally an extrapolation” refers to the non-statistical considerations and assumptions needed when using surrogate endpoints to evaluate a new treatment. “Decision analysis to the rescue” refers to the use of decision analysis to evaluate an additional prognostic marker, because it is not possible to choose between purely statistical measures of marker performance. “The appeal of simplicity” refers to a straightforward and efficient use of a single randomized trial to evaluate overall treatment effect and treatment effect within subgroups using predictive markers. Conclusion The simple themes provide a general guideline for evaluation of surrogate endpoints, prognostic markers, and predictive markers. PMID:25385934
Xiao, Yong; Gu, Xiaomin; Yin, Shiyang; Shao, Jingli; Cui, Yali; Zhang, Qiulan; Niu, Yong
2016-01-01
Based on geo-statistical theory and the ArcGIS geo-statistical module, data from 30 groundwater level observation wells were used to estimate the decline of the groundwater level in the Beijing piedmont. Seven different interpolation methods (inverse distance weighted interpolation, global polynomial interpolation, local polynomial interpolation, tension spline interpolation, ordinary Kriging interpolation, simple Kriging interpolation and universal Kriging interpolation) were used to interpolate the groundwater level between 2001 and 2013. Cross-validation, absolute error and the coefficient of determination (R²) were applied to evaluate the accuracy of the different methods. The results show that the simple Kriging method gave the best fit. The analysis of spatial and temporal variability suggests that the nugget effects from 2001 to 2013 were increasing, which means the spatial correlation weakened gradually under the influence of human activities. The spatial variability in the middle areas of the alluvial-proluvial fan is relatively higher than in the top and bottom areas. Because of changes in land use, the groundwater level also shows temporal variation: the average decline rate of the groundwater level between 2007 and 2013 increased compared with 2001-2006. Urban development and population growth cause over-exploitation in residential and industrial areas. The decline rate of the groundwater level in residential, industrial and river areas is relatively high, while the decrease in farmland area and the development of water-saving irrigation have reduced the quantity of water used by agriculture, so the decline rate of the groundwater level in agricultural areas is not significant.
Immature germ cells in semen - correlation with total sperm count and sperm motility.
Patil, Priya S; Humbarwadi, Rajendra S; Patil, Ashalata D; Gune, Anita R
2013-07-01
Current data regarding infertility suggest that male factors contribute up to 30% of all cases of infertility. Semen analysis reveals the presence of spermatozoa as well as a number of non-sperm cells, presently reported in routine semen reports as "round cells" without further differentiation into leucocytes or immature germ cells. The aim of this work was to study a simple, cost-effective, and convenient method for differentiating the round cells in semen into immature germ cells and leucocytes and correlating them with total sperm counts and motility. Semen samples from 120 males, who had presented for infertility investigation, were collected, semen parameters were recorded, and stained smears were studied for the different round cells. Statistical analysis of the data was done to correlate total sperm counts and sperm motility with the occurrence of immature germ cells and leucocytes. The average shedding of immature germ cells in groups with normal and low sperm counts was compared. The clinical significance of "round cells" in semen and their differentiation into leucocytes and immature germ cells are discussed. Round cells in semen can be differentiated into immature germ cells and leucocytes using simple staining methods. The differential counts mentioned in a semen report give valuable and clinically relevant information. In this study, we observed a negative correlation between total count and immature germ cells, as well as between sperm motility and shedding of immature germ cells. The latter was statistically significant, with a P value of 0.000.
Missing data imputation: focusing on single imputation.
Zhang, Zhongheng
2016-01-01
Complete case analysis is widely used for handling missing data, and it is the default method in many statistical packages. However, this method may introduce bias, and some useful information will be omitted from the analysis. Therefore, many imputation methods have been developed to fill the gap. The present article focuses on single imputation. Imputation with the mean, median or mode is simple but, like complete case analysis, can introduce bias in the estimated means and standard deviations. Furthermore, these methods ignore the relationships with other variables. Regression imputation can preserve the relationship between the imputed variable and other variables. Many sophisticated methods exist to handle missing values in longitudinal data. This article focuses primarily on how to implement R code to perform single imputation, while avoiding complex mathematical calculations.
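The article demonstrates these techniques with R code; the sketch below is an analogous Python illustration of mean imputation versus regression imputation on simulated data (variable names and numbers are invented). It also shows the variance shrinkage that mean imputation introduces.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(2)
age = rng.normal(60, 10, 200)
sbp = 100 + 0.6 * age + rng.normal(0, 8, 200)                # systolic blood pressure (hypothetical)
df = pd.DataFrame({"age": age, "sbp": sbp})
df.loc[rng.choice(200, 40, replace=False), "sbp"] = np.nan   # make 20% of sbp missing

# Mean imputation: simple, but shrinks the variance of the imputed variable.
mean_imp = df["sbp"].fillna(df["sbp"].mean())

# Regression imputation: predict missing values from the other variable.
obs = df.dropna()
slope, intercept = np.polyfit(obs["age"], obs["sbp"], deg=1)
reg_imp = df["sbp"].copy()
missing = reg_imp.isna()
reg_imp[missing] = intercept + slope * df.loc[missing, "age"]

print("sd of complete cases:          ", round(obs["sbp"].std(), 2))
print("sd after mean imputation:      ", round(mean_imp.std(), 2))
print("sd after regression imputation:", round(reg_imp.std(), 2))
```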
Analysis of variances of quasirapidities in collisions of gold nuclei with track-emulsion nuclei
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gulamov, K. G.; Zhokhova, S. I.; Lugovoi, V. V., E-mail: lugovoi@uzsci.net
2012-08-15
A new method of analysis of variances was developed for studying n-particle correlations of quasirapidities in nucleus-nucleus collisions for a large constant number n of particles. Formulas that generalize the results of the respective analysis to various values of n were derived. Calculations on the basis of simple models indicate that the method is applicable at least for n ≥ 100. Quasirapidity correlations statistically significant at a level of 36 standard deviations were discovered in collisions between gold nuclei and track-emulsion nuclei at an energy of 10.6 GeV per nucleon. The experimental data obtained in our present study are contrasted against the theory of nucleus-nucleus collisions.
Testing averaged cosmology with type Ia supernovae and BAO data
DOE Office of Scientific and Technical Information (OSTI.GOV)
Santos, B.; Alcaniz, J.S.; Coley, A.A.
An important problem in precision cosmology is the determination of the effects of averaging and backreaction on observational predictions, particularly in view of the wealth of new observational data and improved statistical techniques. In this paper, we discuss the observational viability of a class of averaged cosmologies which consist of a simple parametrized phenomenological two-scale backreaction model with decoupled spatial curvature parameters. We perform a Bayesian model selection analysis and find that this class of averaged phenomenological cosmological models is favored with respect to the standard ΛCDM cosmological scenario when a joint analysis of current SNe Ia and BAO data is performed. In particular, the analysis provides observational evidence for non-trivial spatial curvature.
Reply to "Comment on `Third law of thermodynamics as a key test of generalized entropies' "
NASA Astrophysics Data System (ADS)
Bento, E. P.; Viswanathan, G. M.; da Luz, M. G. E.; Silva, R.
2015-07-01
In Bento et al. [Phys. Rev. E 91, 039901 (2015), 10.1103/PhysRevE.91.039901] we develop a method to verify whether an arbitrary generalized statistics does or does not obey the third law of thermodynamics. As examples, we address two important formulations, Kaniadakis and Tsallis. In their Comment on the paper, Bagci and Oikonomou suggest that our examination of the Tsallis statistics is valid only for q ≥ 1, arguing, for instance, that there is no distribution maximizing the Tsallis entropy for the interval q < 0 (in which the third law is not verified) compatible with the problem's energy expression. In this Reply, we first (and most importantly) show that the Comment misses the point. In our original work we have considered the now already standard construction of the Tsallis statistics. So, if indeed such statistics lacks a maximization principle (a fact irrelevant in our protocol), this is an inherent feature of the statistics itself and not a problem with our analysis. Second, some arguments used by Bagci and Oikonomou (for 0…
Conceptual and statistical problems associated with the use of diversity indices in ecology.
Barrantes, Gilbert; Sandoval, Luis
2009-09-01
Diversity indices, particularly the Shannon-Wiener index, have been used extensively in analyzing patterns of diversity at different geographic and ecological scales. These indices have serious conceptual and statistical problems which make comparisons of species richness or species abundances across communities nearly impossible. There is often no single statistical method that retains all the information needed to answer even a simple question. However, multivariate analyses, such as cluster analyses or multiple regressions, could be used instead of diversity indices. More complex multivariate analyses, such as Canonical Correspondence Analysis, provide very valuable information on the environmental variables associated with the presence and abundance of the species in a community. In addition, particular hypotheses associated with changes in species richness across localities, or changes in the abundance of one or a group of species, can be tested using univariate, bivariate, and/or rarefaction statistical tests. The rarefaction method has proved to be robust for standardizing all samples to a common size. Even the simplest approach, reporting the number of species per taxonomic category, possibly provides more information than a diversity index value.
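For concreteness, both the Shannon-Wiener index and a simple rarefaction (the expected species richness in a random subsample of fixed size) can be computed from a vector of species abundances. The sketch below uses invented counts for two hypothetical sites and is meant only to make the quantities discussed above tangible.

```python
import numpy as np
from math import comb

def shannon(counts):
    """Shannon-Wiener index H' = -sum p_i ln p_i."""
    p = np.asarray(counts, dtype=float)
    p = p[p > 0] / p.sum()
    return -np.sum(p * np.log(p))

def rarefied_richness(counts, n):
    """Expected number of species in a random subsample of n individuals."""
    N = sum(counts)
    return sum(1 - comb(N - Ni, n) / comb(N, n) for Ni in counts)

site_a = [50, 30, 10, 5, 3, 2]                      # hypothetical abundance vectors
site_b = [200, 150, 90, 40, 10, 6, 3, 1]
for name, counts in [("A", site_a), ("B", site_b)]:
    print(f"site {name}: H' = {shannon(counts):.3f}, "
          f"E[species | n=50] = {rarefied_richness(counts, 50):.2f}")
```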
An investigation into the causes of stratospheric ozone loss in the southern Australasian region
NASA Astrophysics Data System (ADS)
Lehmann, P.; Karoly, D. J.; Newmann, P. A.; Clarkson, T. S.; Matthews, W. A.
1992-07-01
Measurements of total ozone at Macquarie Island (55 deg S, 159 deg E) reveal statistically significant reductions of approximately twelve percent during July to September when comparing the mean levels for 1987-90 with those in the seventies. In order to investigate the possibility that these ozone changes may not be a result of dynamic variability of the stratosphere, a simple linear model of ozone was created from statistical analysis of tropopause height and isentropic transient eddy heat flux, which were assumed representative of the dominant dynamic influences. Comparison of measured and modeled ozone indicates that the recent downward trend in ozone at Macquarie Island is not related to stratospheric dynamic variability and therefore suggests another mechanism, possibly changes in photochemical destruction of ozone.
Amirian, Mohammad-Elyas; Fazilat-Pour, Masoud
2016-08-01
The present study examined simple and multivariate relationships of spiritual intelligence with general health and happiness. The method employed was descriptive and correlational. King's Spiritual Quotient scale, the GHQ-28 and the Oxford Happiness Inventory were filled out by a sample of 384 students, selected by stratified random sampling from the students of Shahid Bahonar University of Kerman. Data were subjected to descriptive and inferential statistics, including correlations and multivariate regressions. Bivariate correlations support a positive and significant predictive value of spiritual intelligence for general health and happiness. Further analysis showed that among the spiritual intelligence subscales, existential critical thinking predicted general health and happiness inversely. In addition, happiness was positively predicted by generation of personal meaning and transcendental awareness. The findings are discussed in line with previous studies and the relevant theoretical background.
Scaling of drizzle virga depth with cloud thickness for marine stratocumulus clouds
Yang, Fan; Luke, Edward P.; Kollias, Pavlos; ...
2018-04-20
Drizzle plays a crucial role in cloud lifetime and the radiation properties of marine stratocumulus clouds. Understanding where drizzle exists in the sub-cloud layer, which depends on drizzle virga depth, can help us better understand where below-cloud scavenging and evaporative cooling and moistening occur. In this study, we examine the statistical properties of drizzle frequency and virga depth of marine stratocumulus based on unique ground-based remote sensing data. Results show that marine stratocumulus clouds are drizzling nearly all the time. In addition, we derive a simple scaling analysis between drizzle virga thickness and cloud thickness. Our analytical expression agrees with the observational data reasonably well, which suggests that our formula provides a simple parameterization for drizzle virga of stratocumulus clouds suitable for use in other models.
Matrix population models from 20 studies of perennial plant populations
Ellis, Martha M.; Williams, Jennifer L.; Lesica, Peter; Bell, Timothy J.; Bierzychudek, Paulette; Bowles, Marlin; Crone, Elizabeth E.; Doak, Daniel F.; Ehrlen, Johan; Ellis-Adam, Albertine; McEachern, Kathryn; Ganesan, Rengaian; Latham, Penelope; Luijten, Sheila; Kaye, Thomas N.; Knight, Tiffany M.; Menges, Eric S.; Morris, William F.; den Nijs, Hans; Oostermeijer, Gerard; Quintana-Ascencio, Pedro F.; Shelly, J. Stephen; Stanley, Amanda; Thorpe, Andrea; Tamara, Ticktin; Valverde, Teresa; Weekley, Carl W.
2012-01-01
Demographic transition matrices are one of the most commonly applied population models for both basic and applied ecological research. The relatively simple framework of these models and simple, easily interpretable summary statistics they produce have prompted the wide use of these models across an exceptionally broad range of taxa. Here, we provide annual transition matrices and observed stage structures/population sizes for 20 perennial plant species which have been the focal species for long-term demographic monitoring. These data were assembled as part of the "Testing Matrix Models" working group through the National Center for Ecological Analysis and Synthesis (NCEAS). In sum, these data represent 82 populations with >460 total population-years of data. It is our hope that making these data available will help promote and improve our ability to monitor and understand plant population dynamics.
The non-equilibrium statistical mechanics of a simple geophysical fluid dynamics model
NASA Astrophysics Data System (ADS)
Verkley, Wim; Severijns, Camiel
2014-05-01
Lorenz [1] has devised a dynamical system that has proved to be very useful as a benchmark system in geophysical fluid dynamics. The system in its simplest form consists of a periodic array of variables that can be associated with an atmospheric field on a latitude circle. The system is driven by a constant forcing, is damped by linear friction and has a simple advection term that causes the model to behave chaotically if the forcing is large enough. Our aim is to predict the statistics of Lorenz' model on the basis of a given average value of its total energy - obtained from a numerical integration - and the assumption of statistical stationarity. Our method is the principle of maximum entropy [2] which in this case reads: the information entropy of the system's probability density function shall be maximal under the constraints of normalization, a given value of the average total energy and statistical stationarity. Statistical stationarity is incorporated approximately by using `stationarity constraints', i.e., by requiring that the average first and possibly higher-order time-derivatives of the energy are zero in the maximization of entropy. The analysis [3] reveals that, if the first stationarity constraint is used, the resulting probability density function rather accurately reproduces the statistics of the individual variables. If the second stationarity constraint is used as well, the correlations between the variables are also reproduced quite adequately. The method can be generalized straightforwardly and holds the promise of a viable non-equilibrium statistical mechanics of the forced-dissipative systems of geophysical fluid dynamics. [1] E.N. Lorenz, 1996: Predictability - A problem partly solved, in Proc. Seminar on Predictability (ECMWF, Reading, Berkshire, UK), Vol. 1, pp. 1-18. [2] E.T. Jaynes, 2003: Probability Theory - The Logic of Science (Cambridge University Press, Cambridge). [3] W.T.M. Verkley and C.A. Severijns, 2014: The maximum entropy principle applied to a dynamical system proposed by Lorenz, Eur. Phys. J. B, 87:7, http://dx.doi.org/10.1140/epjb/e2013-40681-2 (open access).
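For reference, the Lorenz (1996) model described above is commonly written as dx_j/dt = (x_{j+1} - x_{j-2}) x_{j-1} - x_j + F with periodic indices. The sketch below integrates it and estimates the time-averaged total energy (one half the sum of squares) that serves as the constraint in the maximum-entropy argument; the forcing, time step, and run length are arbitrary illustrative choices, and the entropy maximization itself is not implemented here.

```python
import numpy as np

def lorenz96_rhs(x, forcing):
    """Right-hand side of the Lorenz-96 model (periodic boundary via np.roll)."""
    return (np.roll(x, -1) - np.roll(x, 2)) * np.roll(x, 1) - x + forcing

def integrate(n_vars=40, forcing=8.0, dt=0.01, n_steps=20_000, seed=3):
    """Fourth-order Runge-Kutta integration; returns the trajectory after spin-up."""
    rng = np.random.default_rng(seed)
    x = forcing + 0.01 * rng.standard_normal(n_vars)
    traj = np.empty((n_steps, n_vars))
    for i in range(n_steps):
        k1 = lorenz96_rhs(x, forcing)
        k2 = lorenz96_rhs(x + 0.5 * dt * k1, forcing)
        k3 = lorenz96_rhs(x + 0.5 * dt * k2, forcing)
        k4 = lorenz96_rhs(x + dt * k3, forcing)
        x = x + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        traj[i] = x
    return traj[n_steps // 10:]                      # discard spin-up

traj = integrate()
energy = 0.5 * np.sum(traj**2, axis=1)               # total energy at each time step
print("time-averaged total energy:", round(float(energy.mean()), 2))
print("per-variable mean and variance:", round(float(traj.mean()), 3), round(float(traj.var()), 3))
```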
A Monte Carlo–Based Bayesian Approach for Measuring Agreement in a Qualitative Scale
Pérez Sánchez, Carlos Javier
2014-01-01
Agreement analysis has been an active research area whose techniques have been widely applied in psychology and other fields. However, statistical agreement among raters has been mainly considered from a classical statistics point of view. Bayesian methodology is a viable alternative that allows the inclusion of subjective initial information coming from expert opinions, personal judgments, or historical data. A Bayesian approach is proposed by providing a unified Monte Carlo–based framework to estimate all types of measures of agreement in a qualitative scale of response. The approach is conceptually simple and it has a low computational cost. Both informative and non-informative scenarios are considered. In case no initial information is available, the results are in line with the classical methodology, but providing more information on the measures of agreement. For the informative case, some guidelines are presented to elicitate the prior distribution. The approach has been applied to two applications related to schizophrenia diagnosis and sensory analysis. PMID:29881002
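As a minimal illustration of the Monte Carlo idea (not the authors' exact implementation), Cohen's kappa for two raters on a qualitative scale can be given a posterior by placing a Dirichlet prior on the table of joint classification probabilities and drawing from the Dirichlet posterior; the 3 x 3 table of counts and the uniform prior below are invented.

```python
import numpy as np

def kappa(p):
    """Cohen's kappa from a matrix of joint classification probabilities."""
    observed = np.trace(p)
    expected = p.sum(axis=1) @ p.sum(axis=0)
    return (observed - expected) / (1 - expected)

# Hypothetical 3x3 table of counts for two raters on a three-category scale.
counts = np.array([[30,  4,  1],
                   [ 5, 25,  3],
                   [ 2,  6, 24]])

rng = np.random.default_rng(4)
prior = np.ones_like(counts)                          # uniform (non-informative) Dirichlet prior
draws = rng.dirichlet((counts + prior).ravel(), size=20_000)
kappas = np.array([kappa(d.reshape(counts.shape)) for d in draws])
print("posterior mean kappa:", round(float(kappas.mean()), 3))
print("95% credible interval:", np.quantile(kappas, [0.025, 0.975]).round(3))
```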
Distribution of the two-sample t-test statistic following blinded sample size re-estimation.
Lu, Kaifeng
2016-05-01
We consider blinded sample size re-estimation based on the simple one-sample variance estimator at an interim analysis. We characterize the exact distribution of the standard two-sample t-test statistic at the final analysis. We describe a simulation algorithm for evaluating the probability of rejecting the null hypothesis at a given treatment effect. We compare the blinded sample size re-estimation method with two unblinded methods with respect to the empirical type I error, the empirical power, and the empirical distribution of the standard deviation estimator and final sample size. We characterize the type I error inflation across the range of standardized non-inferiority margins for non-inferiority trials, and derive the adjusted significance level to ensure type I error control for a given sample size of the internal pilot study. We show that the adjusted significance level increases as the sample size of the internal pilot study increases. Copyright © 2016 John Wiley & Sons, Ltd.
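A minimal simulation in the spirit of the procedure described above, not the authors' exact algorithm: the variance is estimated at an interim analysis from the pooled (blinded) data, the per-arm sample size is re-estimated for a planned effect size, and the standard two-sample t-test is applied at the final analysis. All design numbers (pilot size, planned effect, alpha, target power) are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

def one_trial(delta, sigma, n_pilot=30, alpha=0.05, power=0.9, planned_effect=1.0, rng=None):
    rng = rng or np.random.default_rng()
    # Internal pilot: n_pilot per arm; variance estimated blinded (group labels ignored).
    pilot_a = rng.normal(0.0, sigma, n_pilot)
    pilot_b = rng.normal(delta, sigma, n_pilot)
    s2_blinded = np.concatenate([pilot_a, pilot_b]).var(ddof=1)
    # Re-estimate the per-arm sample size from the blinded variance estimate.
    z = stats.norm.ppf(1 - alpha / 2) + stats.norm.ppf(power)
    n_final = max(n_pilot, int(np.ceil(2 * s2_blinded * z**2 / planned_effect**2)))
    # Complete both arms and run the usual two-sample t-test at the final analysis.
    a = np.concatenate([pilot_a, rng.normal(0.0, sigma, n_final - n_pilot)])
    b = np.concatenate([pilot_b, rng.normal(delta, sigma, n_final - n_pilot)])
    return stats.ttest_ind(a, b).pvalue < alpha

rng = np.random.default_rng(5)
reps = 2000
print("empirical type I error:", np.mean([one_trial(0.0, 2.0, rng=rng) for _ in range(reps)]))
print("empirical power:       ", np.mean([one_trial(1.0, 2.0, rng=rng) for _ in range(reps)]))
```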
NASA Technical Reports Server (NTRS)
Matney, Mark
2011-01-01
A number of statistical tools have been developed over the years for assessing the risk of reentering objects to human populations. These tools make use of the characteristics (e.g., mass, material, shape, size) of debris that are predicted by aerothermal models to survive reentry. The statistical tools use this information to compute the probability that one or more of the surviving debris might hit a person on the ground and cause one or more casualties. The statistical portion of the analysis relies on a number of assumptions about how the debris footprint and the human population are distributed in latitude and longitude, and how to use that information to arrive at realistic risk numbers. Because this information is used in making policy and engineering decisions, it is important that these assumptions be tested using empirical data. This study uses the latest database of known uncontrolled reentry locations measured by the United States Department of Defense. The predicted ground footprint distributions of these objects are based on the theory that their orbits behave basically like simple Kepler orbits. However, there are a number of factors in the final stages of reentry - including the effects of gravitational harmonics, the effects of the Earth's equatorial bulge on the atmosphere, and the rotation of the Earth and atmosphere - that could cause them to diverge from simple Kepler orbit behavior and possibly change the probability of reentering over a given location. In this paper, the measured latitude and longitude distributions of these objects are directly compared with the predicted distributions, providing a fundamental empirical test of the model assumptions.
NASA Astrophysics Data System (ADS)
Goodman, Joseph W.
2000-07-01
The Wiley Classics Library consists of selected books that have become recognized classics in their respective fields. With these new unabridged and inexpensive editions, Wiley hopes to extend the life of these important works by making them available to future generations of mathematicians and scientists. Currently available in the Series: T. W. Anderson, The Statistical Analysis of Time Series; T. S. Arthanari & Yadolah Dodge, Mathematical Programming in Statistics; Emil Artin, Geometric Algebra; Norman T. J. Bailey, The Elements of Stochastic Processes with Applications to the Natural Sciences; Robert G. Bartle, The Elements of Integration and Lebesgue Measure; George E. P. Box & Norman R. Draper, Evolutionary Operation: A Statistical Method for Process Improvement; George E. P. Box & George C. Tiao, Bayesian Inference in Statistical Analysis; R. W. Carter, Finite Groups of Lie Type: Conjugacy Classes and Complex Characters; R. W. Carter, Simple Groups of Lie Type; William G. Cochran & Gertrude M. Cox, Experimental Designs, Second Edition; Richard Courant, Differential and Integral Calculus, Volume I; Richard Courant, Differential and Integral Calculus, Volume II; Richard Courant & D. Hilbert, Methods of Mathematical Physics, Volume I; Richard Courant & D. Hilbert, Methods of Mathematical Physics, Volume II; D. R. Cox, Planning of Experiments; Harold S. M. Coxeter, Introduction to Geometry, Second Edition; Charles W. Curtis & Irving Reiner, Representation Theory of Finite Groups and Associative Algebras; Charles W. Curtis & Irving Reiner, Methods of Representation Theory with Applications to Finite Groups and Orders, Volume I; Charles W. Curtis & Irving Reiner, Methods of Representation Theory with Applications to Finite Groups and Orders, Volume II; Cuthbert Daniel, Fitting Equations to Data: Computer Analysis of Multifactor Data, Second Edition; Bruno de Finetti, Theory of Probability, Volume I; Bruno de Finetti, Theory of Probability, Volume 2; W. Edwards Deming, Sample Design in Business Research…
Oostenveld, Robert; Fries, Pascal; Maris, Eric; Schoffelen, Jan-Mathijs
2011-01-01
This paper describes FieldTrip, an open source software package that we developed for the analysis of MEG, EEG, and other electrophysiological data. The software is implemented as a MATLAB toolbox and includes a complete set of consistent and user-friendly high-level functions that allow experimental neuroscientists to analyze experimental data. It includes algorithms for simple and advanced analysis, such as time-frequency analysis using multitapers, source reconstruction using dipoles, distributed sources and beamformers, connectivity analysis, and nonparametric statistical permutation tests at the channel and source level. The implementation as a toolbox allows the user to perform elaborate and structured analyses of large data sets using the MATLAB command line and batch scripting. Furthermore, users and developers can easily extend the functionality and implement new algorithms. The modular design facilitates reuse in other software packages.
Using Data from Climate Science to Teach Introductory Statistics
ERIC Educational Resources Information Center
Witt, Gary
2013-01-01
This paper shows how the application of simple statistical methods can reveal to students important insights from climate data. While the popular press is filled with contradictory opinions about climate science, teachers can encourage students to use introductory-level statistics to analyze data for themselves on this important issue in public…
Using R in Introductory Statistics Courses with the pmg Graphical User Interface
ERIC Educational Resources Information Center
Verzani, John
2008-01-01
The pmg add-on package for the open source statistics software R is described. This package provides a simple to use graphical user interface (GUI) that allows introductory statistics students, without advanced computing skills, to quickly create the graphical and numeric summaries expected of them. (Contains 9 figures.)
A study of the impact of a simple stimulus on a receiver's imagination in mediated communication.
Ambe, Mioko; Kamada, Mikio; Ono, Masumi; Shibata, Toyohiko
2005-10-01
Mediated communication involves a form of intimate partnership where, as in the case of face-to-face intimate relationships, parties have a strong desire to exchange emotion and ensure a connection by way of receiving and responding to personal messages. So, in mediated communication, although partners have an effective means of conveying a connection, they still are in need of an equally effective means of conveying emotional state; they need a so-called "emotion-related channel." A desire to develop an efficient means of conveying emotion in mediated communication has driven this study. A study was carried out to determine the effects of a simple stimulus on one's imagination when subjects considered the simple stimulus to be a message from an intimate partner. Twenty-one subjects were first subjected to a simple pattern stimulus. They then experienced the same stimulus as a message received from an intimate partner in mediated communication. They subsequently answered questionnaires on their impressions of the stimulus, the emotional states of their imagined partners, and their own emotional states. A statistical analysis was then carried out. From a close examination of the findings, some interesting points were discovered. One important finding is that from the simple stimulus, subjects were able to imagine not only an intimate partner but also the emotional state of that partner. This and other findings lend support to the notion that a simple stimulus could serve as an emotion-related channel, in mediated communication.
An automated approach to the design of decision tree classifiers
NASA Technical Reports Server (NTRS)
Argentiero, P.; Chin, R.; Beaudet, P.
1982-01-01
An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.
Li, Jie; Li, Rui; You, Leiming; Xu, Anlong; Fu, Yonggui; Huang, Shengfeng
2015-01-01
Switching between different alternative polyadenylation (APA) sites plays an important role in the fine tuning of gene expression. New technologies for the execution of 3’-end enriched RNA-seq allow genome-wide detection of the genes that exhibit significant APA site switching between different samples. Here, we show that the independence test gives better results than the linear trend test in detecting APA site-switching events. Further examination suggests that the discrepancy between these two statistical methods arises from complex APA site-switching events that cannot be represented by a simple change of average 3’-UTR length. In theory, the linear trend test is only effective in detecting these simple changes. We classify the switching events into four switching patterns: two simple patterns (3’-UTR shortening and lengthening) and two complex patterns. By comparing the results of the two statistical methods, we show that complex patterns account for 1/4 of all observed switching events that happen between normal and cancerous human breast cell lines. Because simple and complex switching patterns may convey different biological meanings, they merit separate study. We therefore propose to combine both the independence test and the linear trend test in practice. First, the independence test should be used to detect APA site switching; second, the linear trend test should be invoked to identify simple switching events; and third, those complex switching events that pass independence testing but fail linear trend testing can be identified. PMID:25875641
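A minimal sketch of the two tests on a 2 x K table of read counts (rows: the two samples; columns: APA sites ordered from proximal to distal). The counts below are invented and deliberately show a "complex" switch in which the middle site gains usage while both flanks lose it: the independence (chi-square) test flags it, whereas the linear trend test, which only looks for a shift in average 3'-UTR length, does not. The trend test here is the standard linear-by-linear association statistic, which may differ in detail from the authors' implementation.

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

def linear_trend_test(table, scores=None):
    """Linear-by-linear association (trend) test for a 2 x K table: M^2 = (N-1) r^2, 1 df."""
    table = np.asarray(table, dtype=float)
    col_scores = np.arange(table.shape[1]) if scores is None else np.asarray(scores, float)
    row_scores = np.array([0.0, 1.0])
    n = table.sum()
    x = np.repeat(row_scores, table.sum(axis=1).astype(int))
    y = np.concatenate([np.repeat(col_scores, row.astype(int)) for row in table])
    r = np.corrcoef(x, y)[0, 1]
    m2 = (n - 1) * r**2
    return m2, chi2.sf(m2, df=1)

# Hypothetical read counts at three APA sites (proximal, middle, distal) in two samples.
table = [[120,  60, 120],
         [ 60, 180,  60]]
chi2_stat, p_indep, _, _ = chi2_contingency(table)
m2, p_trend = linear_trend_test(table)
print(f"independence test: chi2 = {chi2_stat:.1f}, p = {p_indep:.2g}")
print(f"linear trend test: M2 = {m2:.2f}, p = {p_trend:.2g}")
```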
Techniques for estimating selected streamflow characteristics of rural unregulated streams in Ohio
Koltun, G.F.; Whitehead, Matthew T.
2002-01-01
This report provides equations for estimating mean annual streamflow, mean monthly streamflows, harmonic mean streamflow, and streamflow quartiles (the 25th-, 50th-, and 75th-percentile streamflows) as a function of selected basin characteristics for rural, unregulated streams in Ohio. The equations were developed from streamflow statistics and basin-characteristics data for as many as 219 active or discontinued streamflow-gaging stations on rural, unregulated streams in Ohio with 10 or more years of homogenous daily streamflow record. Streamflow statistics and basin-characteristics data for the 219 stations are presented in this report. Simple equations (based on drainage area only) and best-fit equations (based on drainage area and at least two other basin characteristics) were developed by means of ordinary least-squares regression techniques. Application of the best-fit equations generally involves quantification of basin characteristics that require or are facilitated by use of a geographic information system. In contrast, the simple equations can be used with information that can be obtained without use of a geographic information system; however, the simple equations have larger prediction errors than the best-fit equations and exhibit geographic biases for most streamflow statistics. The best-fit equations should be used instead of the simple equations whenever possible.
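Drainage-area-only equations of this kind are typically fitted as power laws, i.e., linear regressions in log-log space; the sketch below shows such a fit by ordinary least squares on invented gage data (the power-law form and all numbers are assumptions for illustration, not the report's actual equations).

```python
import numpy as np

# Hypothetical gage data: drainage area (square miles) and mean annual streamflow (cubic feet per second).
area = np.array([12.0, 45.0, 88.0, 150.0, 320.0, 610.0, 1200.0])
flow = np.array([14.0, 52.0, 95.0, 170.0, 330.0, 640.0, 1300.0])

# Fit log10(Q) = b0 + b1 * log10(A) by ordinary least squares.
b1, b0 = np.polyfit(np.log10(area), np.log10(flow), deg=1)
print(f"Q ~= {10**b0:.3f} * A^{b1:.3f}")

# Rough analogue of a prediction error: mean absolute percentage error on the fitted data.
pred = 10**(b0 + b1 * np.log10(area))
print("mean absolute % error:", round(float(np.mean(np.abs(pred - flow) / flow)) * 100, 1))
```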
Partl, Richard; Fastner, Gerd; Kaiser, Julia; Kronhuber, Elisabeth; Cetin-Strohmer, Klaudia; Steffal, Claudia; Böhmer-Breitfelder, Barbara; Mayer, Johannes; Avian, Alexander; Berghold, Andrea
2016-02-01
A low Karnofsky performance status (KPS) and elevated lactate dehydrogenase (LDH), a surrogate marker for tumor load and cell turnover, may identify patients with a very short life expectancy. To validate this finding and compare it to other indices, namely the recursive partitioning analysis (RPA) and the diagnosis-specific graded prognostic assessment (DS-GPA), a multicenter analysis was undertaken. A retrospective analysis of 234 metastatic melanoma patients uniformly treated with palliative whole brain radiotherapy (WBRT) was done. Univariate and multivariate analyses were used to determine the impact of patient-, tumor-, and treatment-related parameters on overall survival (OS). KPS and LDH emerged as independent factors predicting OS. By combining KPS and LDH values (KPS/LDH index), groups of patients with statistically significant differences in median OS (days; 95 % CI) after onset of WBRT were identified: group 1 (KPS ≥ 70/normal LDH) 234 (96-372), group 2 (KPS ≥ 70/elevated LDH) 112 (69-155), group 3 (KPS <70/normal LDH) 43 (12-74), and group 4 (KPS <70/elevated LDH) 29 (17-41). Statistically significant differences were observed between all four groups. The RPA and DS-GPA indices failed to distinguish significantly between good and moderate prognosis and were inferior in predicting a very unfavorable prognosis. The parameters KPS and LDH independently impacted OS. The combination of both (KPS/LDH index) identified patients with a very short life expectancy, who might be better served by recommending best supportive care instead of WBRT. The KPS/LDH index is simple and effective in terms of time and cost as compared to other prognostic indices.
Quantitative analysis of tympanic membrane perforation: a simple and reliable method.
Ibekwe, T S; Adeosun, A A; Nwaorgu, O G
2009-01-01
Accurate assessment of the features of tympanic membrane perforation, especially size, site, duration and aetiology, is important, as it enables optimum management. To describe a simple, cheap and effective method of quantitatively analysing tympanic membrane perforations. The system described comprises a video-otoscope (capable of generating still and video images of the tympanic membrane), adapted via a universal serial bus box to a computer screen, with images analysed using the Image J geometrical analysis software package. The reproducibility of results and their correlation with conventional otoscopic methods of estimation were tested statistically with the paired t-test and correlational tests, using the Statistical Package for the Social Sciences version 11 software. The following equation was generated: (P/T) × 100% = percentage perforation, where P is the area (in pixels²) of the tympanic membrane perforation and T is the total area (in pixels²) of the entire tympanic membrane (including the perforation). Illustrations are shown. Comparison of blinded data on tympanic membrane perforation area, obtained independently from assessments by two trained otologists of comparable years of experience using the video-otoscopy system described, showed similar findings, with strong correlations devoid of inter-observer error (p = 0.000, r = 1). Comparison with conventional otoscopic assessment also indicated significant correlation between the results of the two trained otologists, but some inter-observer variation was present (p = 0.000, r = 0.896). Correlation between the two methods for each of the otologists was also highly significant (p = 0.000). A computer-adapted video-otoscope, with images analysed by Image J software, represents a cheap, reliable, technology-driven, clinical method for quantitative analysis of tympanic membrane perforations and injuries.
Kidney function estimating equations in patients with chronic kidney disease.
Hojs, R; Bevc, S; Ekart, R; Gorenjak, M; Puklavec, L
2011-04-01
The current guidelines emphasise the need to assess kidney function using predictive equations rather than just serum creatinine. The present study compares serum cystatin C-based equations and serum creatinine-based equations in patients with chronic kidney disease (CKD). Seven hundred and sixty-four adult patients with CKD were enrolled. In each patient serum creatinine and serum cystatin C were determined. Their glomerular filtration rate (GFR) was estimated using three serum creatinine-based equations [Cockcroft-Gault (C&G), modification of diet in renal disease (MDRD) and the Chronic Kidney Disease Epidemiology Collaboration equation (CKD-EPI)] and two serum cystatin C-based equations [our own cystatin C formula (GFR = 90.63 × cystatin C^(-1.192)) and the simple cystatin C formula (GFR = 100/cystatin C)]. The GFR was measured using ⁵¹Cr-EDTA clearance. A statistically significant correlation of ⁵¹Cr-EDTA clearance with serum creatinine, serum cystatin C and all observed formulas was found. The receiver operating characteristic curve analysis (cut-off for GFR 60 ml/min/1.73 m²) showed that serum cystatin C and both cystatin C formulas had a higher diagnostic accuracy than the C&G formula. Bland and Altman analysis for the same cut-off value showed that all formulas except the simple cystatin C formula underestimated the measured GFR. The accuracy within 30% of estimated ⁵¹Cr-EDTA clearance values differs according to the stage of CKD. Analysis of the ability to correctly predict a patient's GFR below or above 60 ml/min/1.73 m² showed a statistically significantly higher ability for both cystatin C formulas compared to the MDRD formula. Our results indicate that serum cystatin C-based equations are reliable markers of GFR, comparable with creatinine-based formulas. © 2011 Blackwell Publishing Ltd.
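The two cystatin C-based equations quoted above are simple enough to apply directly; a minimal sketch follows, with an invented serum cystatin C value (in mg/L) purely for illustration.

```python
def gfr_cystatin_own(cys_c):
    """Authors' cystatin C equation: GFR = 90.63 * cystatin C^(-1.192), in ml/min/1.73 m^2."""
    return 90.63 * cys_c ** (-1.192)

def gfr_cystatin_simple(cys_c):
    """Simple cystatin C equation: GFR = 100 / cystatin C, in ml/min/1.73 m^2."""
    return 100.0 / cys_c

cys_c = 1.4   # hypothetical serum cystatin C concentration (mg/L)
print(f"own formula:    {gfr_cystatin_own(cys_c):.1f} ml/min/1.73 m^2")
print(f"simple formula: {gfr_cystatin_simple(cys_c):.1f} ml/min/1.73 m^2")
print("estimated GFR below 60 (own formula)?", gfr_cystatin_own(cys_c) < 60)
```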
Moore, Jason H; Amos, Ryan; Kiralis, Jeff; Andrews, Peter C
2015-01-01
Simulation plays an essential role in the development of new computational and statistical methods for the genetic analysis of complex traits. Most simulations start with a statistical model using methods such as linear or logistic regression that specify the relationship between genotype and phenotype. This is appealing due to its simplicity and because these statistical methods are commonly used in genetic analysis. It is our working hypothesis that simulations need to move beyond simple statistical models to more realistically represent the biological complexity of genetic architecture. The goal of the present study was to develop a prototype genotype–phenotype simulation method and software that are capable of simulating complex genetic effects within the context of a hierarchical biology-based framework. Specifically, our goal is to simulate multilocus epistasis or gene–gene interaction where the genetic variants are organized within the framework of one or more genes, their regulatory regions and other regulatory loci. We introduce here the Heuristic Identification of Biological Architectures for simulating Complex Hierarchical Interactions (HIBACHI) method and prototype software for simulating data in this manner. This approach combines a biological hierarchy, a flexible mathematical framework, a liability threshold model for defining disease endpoints, and a heuristic search strategy for identifying high-order epistatic models of disease susceptibility. We provide several simulation examples using genetic models exhibiting independent main effects and three-way epistatic effects. PMID:25395175
Cox, Tony; Popken, Douglas; Ricci, Paolo F
2013-01-01
Exposures to fine particulate matter (PM2.5) in air (C) have been suspected of contributing causally to increased acute (e.g., same-day or next-day) human mortality rates (R). We tested this causal hypothesis in 100 United States cities using the publicly available NMMAPS database. Although a significant, approximately linear, statistical C-R association exists in simple statistical models, closer analysis suggests that it is not causal. Surprisingly, conditioning on other variables that have been extensively considered in previous analyses (usually using splines or other smoothers to approximate their effects), such as month of the year and mean daily temperature, suggests that they create strong, nonlinear confounding that explains the statistical association between PM2.5 and mortality rates in this data set. As this finding disagrees with conventional wisdom, we apply several different techniques to examine it. Conditional independence tests for potential causation, non-parametric classification tree analysis, Bayesian Model Averaging (BMA), and Granger-Sims causality testing, show no evidence that PM2.5 concentrations have any causal impact on increasing mortality rates. This apparent absence of a causal C-R relation, despite their statistical association, has potentially important implications for managing and communicating the uncertain health risks associated with, but not necessarily caused by, PM2.5 exposures. PMID:23983662
Guidelines for the design and statistical analysis of experiments in papers submitted to ATLA.
Festing, M F
2001-01-01
In vitro experiments need to be well designed and correctly analysed if they are to achieve their full potential to replace the use of animals in research. An "experiment" is a procedure for collecting scientific data in order to answer a hypothesis, or to provide material for generating new hypotheses, and differs from a survey because the scientist has control over the treatments that can be applied. Most experiments can be classified into one of a few formal designs, the most common being completely randomised, and randomised block designs. These are quite common with in vitro experiments, which are often replicated in time. Some experiments involve a single independent (treatment) variable, while other "factorial" designs simultaneously vary two or more independent variables, such as drug treatment and cell line. Factorial designs often provide additional information at little extra cost. Experiments need to be carefully planned to avoid bias, be powerful yet simple, provide for a valid statistical analysis and, in some cases, have a wide range of applicability. Virtually all experiments need some sort of statistical analysis in order to take account of biological variation among the experimental subjects. Parametric methods using the t test or analysis of variance are usually more powerful than non-parametric methods, provided the underlying assumptions of normality of the residuals and equal variances are approximately valid. The statistical analyses of data from a completely randomised design, and from a randomised-block design are demonstrated in Appendices 1 and 2, and methods of determining sample size are discussed in Appendix 3. Appendix 4 gives a checklist for authors submitting papers to ATLA.
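A randomised block analysis of the kind demonstrated in the paper's appendices might look as follows in Python, with treatment and the time replicate (block) entering an ANOVA as factors; the response values are fabricated for illustration and the design is only a sketch.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(2)
treatments = ["control", "low", "high"]
blocks = ["run1", "run2", "run3", "run4"]  # experiment replicated in time

# Hypothetical in vitro responses: treatment effect + block (run-to-run) effect + noise.
rows = []
for bi, block in enumerate(blocks):
    for ti, trt in enumerate(treatments):
        y = 10 + 2 * ti + 0.5 * bi + rng.normal(0, 1)
        rows.append({"block": block, "treatment": trt, "response": y})
df = pd.DataFrame(rows)

# Randomised block analysis: the block enters as an additive factor.
model = ols("response ~ C(treatment) + C(block)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))
```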
Simple stochastic model for El Niño with westerly wind bursts
Thual, Sulian; Majda, Andrew J.; Chen, Nan; Stechmann, Samuel N.
2016-01-01
Atmospheric wind bursts in the tropics play a key role in the dynamics of the El Niño Southern Oscillation (ENSO). A simple modeling framework is proposed that summarizes this relationship and captures major features of the observational record while remaining physically consistent and amenable to detailed analysis. Within this simple framework, wind burst activity evolves according to a stochastic two-state Markov switching–diffusion process that depends on the strength of the western Pacific warm pool, and is coupled to simple ocean–atmosphere processes that are otherwise deterministic, stable, and linear. A simple model with this parameterization and no additional nonlinearities reproduces a realistic ENSO cycle with intermittent El Niño and La Niña events of varying intensity and strength as well as realistic buildup and shutdown of wind burst activity in the western Pacific. The wind burst activity has a direct causal effect on the ENSO variability: in particular, it intermittently triggers regular El Niño or La Niña events, super El Niño events, or no events at all, which enables the model to capture observed ENSO statistics such as the probability density function and power spectrum of eastern Pacific sea surface temperatures. The present framework provides further theoretical and practical insight on the relationship between wind burst activity and the ENSO. PMID:27573821
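A toy version of a two-state Markov switching-diffusion (not the authors' coupled ocean-atmosphere model) can be written in a few lines: a damped, noise-driven amplitude whose noise level switches between a quiescent and an active regime according to a two-state Markov chain. All rates and amplitudes below are assumed.

```python
import numpy as np

rng = np.random.default_rng(3)
dt = 0.1            # time step (arbitrary units)
n_steps = 5_000
switch_rate = 0.02  # assumed probability per step of switching regime

# State 0 = quiescent, state 1 = active wind burst regime.
sigma = {0: 0.05, 1: 0.5}   # assumed noise amplitudes per state
damping = 0.1               # linear damping of the wind burst amplitude

state = 0
a = 0.0                      # wind burst amplitude
amplitudes, states = [], []
for _ in range(n_steps):
    if rng.random() < switch_rate:          # Markov switching
        state = 1 - state
    # Damped diffusion with state-dependent noise (Euler-Maruyama step).
    a += -damping * a * dt + sigma[state] * np.sqrt(dt) * rng.standard_normal()
    amplitudes.append(a)
    states.append(state)

print("fraction of time in active state:", np.mean(states))
print("std of amplitude:", np.std(amplitudes))
```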
Score tests for independence in semiparametric competing risks models.
Saïd, Mériem; Ghazzali, Nadia; Rivest, Louis-Paul
2009-12-01
A popular model for competing risks postulates the existence of a latent unobserved failure time for each risk. Assuming that these underlying failure times are independent is attractive since it allows standard statistical tools for right-censored lifetime data to be used in the analysis. This paper proposes simple independence score tests for the validity of this assumption when the individual risks are modeled using semiparametric proportional hazards regressions. It assumes that covariates are available, making the model identifiable. The score tests are derived for alternatives that specify that copulas are responsible for a possible dependency between the competing risks. The test statistics are constructed by adding to the partial likelihoods for the individual risks an explanatory variable for the dependency between the risks. A variance estimator is derived by writing the score function and the Fisher information matrix for the marginal models as stochastic integrals. Pitman efficiencies are used to compare test statistics. A simulation study and a numerical example illustrate the methodology proposed in this paper.
Patch-Based Generative Shape Model and MDL Model Selection for Statistical Analysis of Archipelagos
NASA Astrophysics Data System (ADS)
Ganz, Melanie; Nielsen, Mads; Brandt, Sami
We propose a statistical generative shape model for archipelago-like structures. These kinds of structures occur, for instance, in medical images, where our intention is to model the appearance and shapes of calcifications in x-ray radiographs. The generative model is constructed by (1) learning a patch-based dictionary for possible shapes, (2) building up a time-homogeneous Markov model to model the neighbourhood correlations between the patches, and (3) automatic selection of the model complexity by the minimum description length principle. The generative shape model is proposed as a probability distribution of a binary image, where the model is intended to facilitate sequential simulation. Our results show that a relatively simple model is able to generate structures visually similar to calcifications. Furthermore, we used the shape model as a shape prior in the statistical segmentation of calcifications, where the area overlap with the ground truth shapes improved significantly compared to the case where the prior was not used.
Data-adaptive test statistics for microarray data.
Mukherjee, Sach; Roberts, Stephen J; van der Laan, Mark J
2005-09-01
An important task in microarray data analysis is the selection of genes that are differentially expressed between different tissue samples, such as healthy and diseased. However, microarray data contain an enormous number of dimensions (genes) and very few samples (arrays), a mismatch which poses fundamental statistical problems for the selection process that have defied easy resolution. In this paper, we present a novel approach to the selection of differentially expressed genes in which test statistics are learned from data using a simple notion of reproducibility in selection results as the learning criterion. Reproducibility, as we define it, can be computed without any knowledge of the 'ground-truth', but takes advantage of certain properties of microarray data to provide an asymptotically valid guide to expected loss under the true data-generating distribution. We are therefore able to indirectly minimize expected loss, and obtain results substantially more robust than conventional methods. We apply our method to simulated and oligonucleotide array data. By request to the corresponding author.
A Statistical Analysis of Reviewer Agreement and Bias in Evaluating Medical Abstracts
Cicchetti, Domenic V.; Conn, Harold O.
1976-01-01
Observer variability affects virtually all aspects of clinical medicine and investigation. One important aspect, not previously examined, is the selection of abstracts for presentation at national medical meetings. In the present study, 109 abstracts, submitted to the American Association for the Study of Liver Disease, were evaluated by three “blind” reviewers for originality, design-execution, importance, and overall scientific merit. Of the 77 abstracts rated for all parameters by all observers, interobserver agreement ranged between 81 and 88%. However, corresponding intraclass correlations varied between 0.16 (approaching statistical significance) and 0.37 (p < 0.01). Specific tests of systematic differences in scoring revealed statistically significant levels of observer bias on most of the abstract components. Moreover, the mean differences in interobserver ratings were quite small compared to the standard deviations of these differences. These results emphasize the importance of evaluating the simple percentage of rater agreement within the broader context of observer variability and systematic bias. PMID:997596
Determination of Urine Albumin by New Simple High-Performance Liquid Chromatography Method.
Klapkova, Eva; Fortova, Magdalena; Prusa, Richard; Moravcova, Libuse; Kotaska, Karel
2016-11-01
A simple high-performance liquid chromatography (HPLC) method was developed for the determination of albumin in patients' urine samples without coeluting proteins and was compared with the immunoturbidimetric determination of albumin. Urine albumin is an important biomarker in diabetic patients, but part of it is immuno-nonreactive. Albumin was determined by HPLC with UV detection at 280 nm on a Zorbax 300SB-C3 column. Immunoturbidimetric analysis was performed using a commercial kit on the automatic biochemistry analyzer COBAS INTEGRA® 400, Roche Diagnostics GmbH, Mannheim, Germany. The HPLC method was fully validated. No significant interference with other proteins (transferrin, α-1-acid glycoprotein, α-1-antichymotrypsin, antitrypsin, hemopexin) was found. The results from 301 urine samples were compared with the immunochemical determination. We found a statistically significant difference between these methods (P = 0.0001, Mann-Whitney test). A new simple HPLC method was thus developed for the determination of urine albumin without coeluting proteins. Our data indicate that the HPLC method is highly specific and more sensitive than immunoturbidimetry. © 2016 Wiley Periodicals, Inc.
Simple and rapid quantification of brominated vegetable oil in commercial soft drinks by LC–MS
Chitranshi, Priyanka; da Costa, Gonçalo Gamboa
2016-01-01
We report here a simple and rapid method for the quantification of brominated vegetable oil (BVO) in soft drinks based upon liquid chromatography–electrospray ionization mass spectrometry. Unlike previously reported methods, this novel method does not require hydrolysis, extraction or derivatization steps, but rather a simple “dilute and shoot” sample preparation. The quantification is conducted by mass spectrometry in selected ion recording mode and a single point standard addition procedure. The method was validated in the range of 5–25 μg/mL BVO, encompassing the legal limit of 15 μg/mL established by the US FDA for fruit-flavored beverages in the US market. The method was characterized by excellent intra- and inter-assay accuracy (97.3–103.4%) and very low imprecision [0.5–3.6% (RSD)]. The direct nature of the quantification, simplicity, and excellent statistical performance of this methodology constitute clear advantages in relation to previously published methods for the analysis of BVO in soft drinks. PMID:27451219
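Single-point standard addition, as used for the quantification step, reduces to a one-line calculation if the response is assumed linear through zero; the peak areas and spike level in the sketch below are hypothetical, not the paper's data.

```python
def standard_addition_single_point(signal_sample: float,
                                   signal_spiked: float,
                                   added_conc: float) -> float:
    """Single-point standard addition assuming a linear response through zero:
    S_x = k*C_x and S_spiked = k*(C_x + C_add)  =>  C_x = C_add*S_x/(S_spiked - S_x).
    """
    return added_conc * signal_sample / (signal_spiked - signal_sample)

# Hypothetical selected-ion-recording peak areas (arbitrary units) and spike level.
s_sample = 1.8e5
s_spiked = 3.9e5
c_added = 10.0  # µg/mL of BVO added to the spiked aliquot

c_bvo = standard_addition_single_point(s_sample, s_spiked, c_added)
print(f"estimated BVO concentration: {c_bvo:.1f} µg/mL "
      f"(FDA limit for fruit-flavored beverages: 15 µg/mL)")
```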
Non-extensivity and complexity in the earthquake activity at the West Corinth rift (Greece)
NASA Astrophysics Data System (ADS)
Michas, Georgios; Vallianatos, Filippos; Sammonds, Peter
2013-04-01
Earthquakes exhibit complex phenomenology that is revealed by the fractal structure in space, time and magnitude. For that reason, tools other than simple Poissonian statistics seem more appropriate to describe the statistical properties of the phenomenon. Here we use Non-Extensive Statistical Physics [NESP] to investigate the inter-event time distribution of the earthquake activity at the west Corinth rift (central Greece). This area is one of the most seismotectonically active areas in Europe, with an important continental N-S extension and high seismicity rates. The NESP concept refers to the non-additive Tsallis entropy Sq, which includes the Boltzmann-Gibbs entropy as a particular case. This concept has been successfully used for the analysis of a variety of complex dynamic systems, including earthquakes, where fractality and long-range interactions are important. The analysis indicates that the cumulative inter-event time distribution can be successfully described with NESP, implying the complexity that characterizes the temporal occurrence of earthquakes. Furthermore, we use the Tsallis entropy (Sq) and the Fisher Information Measure (FIM) to investigate the complexity that characterizes the inter-event time distribution through different time windows along the evolution of the seismic activity at the West Corinth rift. The results of this analysis reveal different levels of organization and clustering of the seismic activity in time. Acknowledgments. GM wishes to acknowledge the partial support of the Greek State Scholarships Foundation (IKY).
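NESP analyses of inter-event times commonly fit a q-exponential survival function; the sketch below fits q and a characteristic time τ to heavy-tailed synthetic data with SciPy. It is a generic illustration under that assumption, not the Corinth rift catalogue or the authors' procedure.

```python
import numpy as np
from scipy.optimize import curve_fit

def q_exponential(t, q, tau):
    """Survival function P(>t) modelled by a q-exponential (Tsallis statistics)."""
    base = np.maximum(1.0 - (1.0 - q) * t / tau, 1e-12)  # clip to keep the power real
    return np.power(base, 1.0 / (1.0 - q))

# Synthetic heavy-tailed inter-event times standing in for an earthquake catalogue.
rng = np.random.default_rng(4)
times = rng.pareto(2.5, size=2000) * 10.0

# Empirical survival function of the inter-event times.
t_sorted = np.sort(times)
survival = 1.0 - np.arange(1, len(t_sorted) + 1) / len(t_sorted)

# Fit q and tau; the initial guesses are arbitrary.
(q_fit, tau_fit), _ = curve_fit(q_exponential, t_sorted[:-1], survival[:-1],
                                p0=(1.3, 10.0), maxfev=10_000)
print(f"fitted q = {q_fit:.2f}, tau = {tau_fit:.1f}")
```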
DOE Office of Scientific and Technical Information (OSTI.GOV)
Emanuel, A.E.
1991-03-01
This article presents a preliminary analysis of the effect of randomly varying harmonic voltages on the temperature rise of squirrel-cage motors. The stochastic process of random variations of harmonic voltages is defined by means of simple statistics (mean, standard deviation, type of distribution). Computational models based on a first-order approximation of the motor losses and on the Monte Carlo method yield results which show that equipment with a large thermal time constant is capable of withstanding, for a short period of time, distortions larger than THD = 5%.
Upsets in Erased Floating Gate Cells With High-Energy Protons
Gerardin, S.; Bagatin, M.; Paccagnella, A.; ...
2017-01-01
We discuss upsets in erased floating gate cells, due to large threshold voltage shifts, using statistical distributions collected on a large number of memory cells. The spread in the neutral threshold voltage appears to be too low to quantitatively explain the experimental observations in terms of simple charge loss, at least in SLC devices. The possibility that memories exposed to high energy protons and heavy ions exhibit negative charge transfer between programmed and erased cells is investigated, although the analysis does not provide conclusive support to this hypothesis.
A powerful approach for association analysis incorporating imprinting effects
Xia, Fan; Zhou, Ji-Yuan; Fung, Wing Kam
2011-01-01
Motivation: For a diallelic marker locus, the transmission disequilibrium test (TDT) is a simple and powerful design for genetic studies. The TDT was originally proposed for use in families with both parents available (complete nuclear families) and has further been extended to 1-TDT for use in families with only one of the parents available (incomplete nuclear families). Currently, the increasing interest of the influence of parental imprinting on heritability indicates the importance of incorporating imprinting effects into the mapping of association variants. Results: In this article, we extend the TDT-type statistics to incorporate imprinting effects and develop a series of new test statistics in a general two-stage framework for association studies. Our test statistics enjoy the nature of family-based designs that need no assumption of Hardy–Weinberg equilibrium. Also, the proposed methods accommodate complete and incomplete nuclear families with one or more affected children. In the simulation study, we verify the validity of the proposed test statistics under various scenarios, and compare the powers of the proposed statistics with some existing test statistics. It is shown that our methods greatly improve the power for detecting association in the presence of imprinting effects. We further demonstrate the advantage of our methods by the application of the proposed test statistics to a rheumatoid arthritis dataset. Contact: wingfung@hku.hk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21798962
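For reference, the classical TDT on which these extensions build is a McNemar-type chi-square on transmission counts from heterozygous parents; the counts in the sketch are hypothetical and the imprinting-aware statistics of the paper are not reproduced here.

```python
from scipy.stats import chi2

def tdt_statistic(transmitted: int, untransmitted: int) -> tuple[float, float]:
    """Classical TDT: McNemar-type chi-square on counts of heterozygous parents
    transmitting (b) vs not transmitting (c) the candidate allele."""
    b, c = transmitted, untransmitted
    stat = (b - c) ** 2 / (b + c)
    pval = chi2.sf(stat, df=1)
    return stat, pval

# Hypothetical transmission counts from heterozygous parents.
stat, p = tdt_statistic(transmitted=120, untransmitted=80)
print(f"TDT chi-square = {stat:.2f}, p = {p:.4f}")
```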
A powerful approach for association analysis incorporating imprinting effects.
Xia, Fan; Zhou, Ji-Yuan; Fung, Wing Kam
2011-09-15
For a diallelic marker locus, the transmission disequilibrium test (TDT) is a simple and powerful design for genetic studies. The TDT was originally proposed for use in families with both parents available (complete nuclear families) and has further been extended to 1-TDT for use in families with only one of the parents available (incomplete nuclear families). Currently, the increasing interest of the influence of parental imprinting on heritability indicates the importance of incorporating imprinting effects into the mapping of association variants. In this article, we extend the TDT-type statistics to incorporate imprinting effects and develop a series of new test statistics in a general two-stage framework for association studies. Our test statistics enjoy the nature of family-based designs that need no assumption of Hardy-Weinberg equilibrium. Also, the proposed methods accommodate complete and incomplete nuclear families with one or more affected children. In the simulation study, we verify the validity of the proposed test statistics under various scenarios, and compare the powers of the proposed statistics with some existing test statistics. It is shown that our methods greatly improve the power for detecting association in the presence of imprinting effects. We further demonstrate the advantage of our methods by the application of the proposed test statistics to a rheumatoid arthritis dataset. wingfung@hku.hk Supplementary data are available at Bioinformatics online.
A simple method for processing data with least square method
NASA Astrophysics Data System (ADS)
Wang, Chunyan; Qi, Liqun; Chen, Yongxiang; Pang, Guangning
2017-08-01
The least squares method is widely used in data processing and error estimation. This mathematical method has become an essential technique for parameter estimation, data processing, regression analysis and experimental data fitting, and has become a criterion tool for statistical inference. In measurement data analysis, complicated data relationships are usually handled on the basis of the least squares principle, i.e., matrices are used to solve for the final estimate and to improve its accuracy. In this paper, a new approach to the least squares solution is presented which is based on algebraic computation and is relatively straightforward and easy to understand. The practicability of this method is demonstrated with a concrete example.
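The standard matrix route that the paper simplifies, solving the normal equations (AᵀA)β = Aᵀy for a straight-line fit, can be sketched as follows on synthetic measurement data; the paper's own algebraic shortcut is not shown.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic measurement data: y = 2 + 3x with noise.
x = np.linspace(0, 10, 25)
y = 2.0 + 3.0 * x + rng.normal(0, 0.5, x.size)

# Design matrix for the straight-line model y = b0 + b1*x.
A = np.column_stack([np.ones_like(x), x])

# Least squares estimate via the normal equations (A^T A) beta = A^T y.
beta = np.linalg.solve(A.T @ A, A.T @ y)

# Residual standard deviation as a simple accuracy measure.
residuals = y - A @ beta
print("estimated intercept and slope:", beta)
print("residual std:", residuals.std(ddof=2))
```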
Seasonality in twin birth rates, Denmark, 1936-84.
Bonnelykke, B; Søgaard, J; Nielsen, J
1987-12-01
A study was made of seasonality in twin birth rate in Denmark between 1977 and 1984. We studied all twin births (N = 45,550) in all deliveries (N = 3,679,932) during that period. Statistical analysis using a simple harmonic sinusoidal model provided no evidence for seasonality. However, sequential polynomial analysis disclosed a significant fit to a fifth order polynomial curve with peaks in twin birth rates in May-June and December, along with troughs in February and September. A falling trend in twinning rate broke off in Denmark around 1970, and from 1970 to 1984 an increasing trend was found. The results are discussed in terms of possible environmental influences on twinning.
Refraction of coastal ocean waves
NASA Technical Reports Server (NTRS)
Shuchman, R. A.; Kasischke, E. S.
1981-01-01
Refraction of gravity waves in the coastal area off Cape Hatteras, NC as documented by synthetic aperture radar (SAR) imagery from Seasat orbit 974 (collected on September 3, 1978) is discussed. An analysis of optical Fourier transforms (OFTs) from more than 70 geographical positions yields estimates of wavelength and wave direction for each position. In addition, independent estimates of the same two quantities are calculated using two simple theoretical wave-refraction models. The OFT results are then compared with the theoretical results. A statistical analysis shows a significant degree of linear correlation between the data sets. This is considered to indicate that the Seasat SAR produces imagery whose clarity is sufficient to show the refraction of gravity waves in shallow water.
Multiple scaling behaviour and nonlinear traits in music scores
Larralde, Hernán; Martínez-Mekler, Gustavo; Müller, Markus
2017-01-01
We present a statistical analysis of music scores from different composers using detrended fluctuation analysis (DFA). We find different fluctuation profiles that correspond to distinct autocorrelation structures of the musical pieces. Further, we reveal evidence for the presence of nonlinear autocorrelations by estimating the DFA of the magnitude series, a result validated by a corresponding study of appropriate surrogate data. The amount and the character of nonlinear correlations vary from one composer to another. Finally, we performed a simple experiment in order to evaluate the pleasantness of the musical surrogate pieces in comparison with the original music and find that nonlinear correlations could play an important role in the aesthetic perception of a musical piece. PMID:29308256
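A minimal DFA-1 implementation conveys the core of the method: integrate the mean-subtracted series, detrend it piecewise, and track the RMS fluctuation against window size. The sketch below runs on white noise (expected exponent ≈ 0.5) rather than on music scores, and is not the authors' pipeline.

```python
import numpy as np

def dfa(signal: np.ndarray, scales: np.ndarray, order: int = 1) -> np.ndarray:
    """Detrended fluctuation analysis: RMS fluctuation F(s) of the integrated,
    piecewise-detrended profile for each window size s."""
    profile = np.cumsum(signal - np.mean(signal))
    fluctuations = []
    for s in scales:
        n_windows = len(profile) // s
        rms = []
        for i in range(n_windows):
            segment = profile[i * s:(i + 1) * s]
            t = np.arange(s)
            coeffs = np.polyfit(t, segment, order)   # local polynomial trend
            rms.append(np.sqrt(np.mean((segment - np.polyval(coeffs, t)) ** 2)))
        fluctuations.append(np.mean(rms))
    return np.array(fluctuations)

# Toy series: white noise should give a DFA exponent near 0.5.
rng = np.random.default_rng(6)
series = rng.standard_normal(4096)
scales = np.array([16, 32, 64, 128, 256, 512])
F = dfa(series, scales)
alpha = np.polyfit(np.log(scales), np.log(F), 1)[0]
print(f"estimated DFA exponent: {alpha:.2f}")
```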
Wolff, Hans-Georg; Preising, Katja
2005-02-01
To ease the interpretation of higher order factor analysis, the direct relationships between variables and higher order factors may be calculated by the Schmid-Leiman solution (SLS; Schmid & Leiman, 1957). This simple transformation of higher order factor analysis orthogonalizes first-order and higher order factors and thereby allows the interpretation of the relative impact of factor levels on variables. The Schmid-Leiman solution may also be used to facilitate theorizing and scale development. The rationale for the procedure is presented, supplemented by syntax codes for SPSS and SAS, since the transformation is not part of most statistical programs. Syntax codes may also be downloaded from www.psychonomic.org/archive/.
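The heart of the transformation can be sketched in a few lines: multiply the first-order pattern loadings by the higher-order loadings to obtain direct loadings on the general factor, and residualize the first-order loadings for the orthogonal group factors. The loading matrices below are invented, and the sketch is not the SPSS/SAS syntax supplied by the authors.

```python
import numpy as np

# Hypothetical first-order pattern loadings: 6 variables on 2 first-order factors.
lambda_1 = np.array([
    [0.7, 0.0],
    [0.6, 0.1],
    [0.8, 0.0],
    [0.0, 0.7],
    [0.1, 0.6],
    [0.0, 0.8],
])

# Hypothetical loadings of the 2 first-order factors on 1 second-order factor.
lambda_2 = np.array([[0.6],
                     [0.5]])

# Schmid-Leiman: direct loadings of the variables on the higher-order factor.
direct_higher = lambda_1 @ lambda_2

# Residualized first-order loadings (orthogonal group factors):
# scale each first-order column by sqrt(1 - squared higher-order loading).
residual_scale = np.sqrt(1.0 - np.sum(lambda_2 ** 2, axis=1))
direct_group = lambda_1 * residual_scale

print("variable loadings on the general (higher-order) factor:\n", direct_higher)
print("residualized loadings on the group factors:\n", direct_group)
```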
Multiple scaling behaviour and nonlinear traits in music scores
NASA Astrophysics Data System (ADS)
González-Espinoza, Alfredo; Larralde, Hernán; Martínez-Mekler, Gustavo; Müller, Markus
2017-12-01
We present a statistical analysis of music scores from different composers using detrended fluctuation analysis (DFA). We find different fluctuation profiles that correspond to distinct autocorrelation structures of the musical pieces. Further, we reveal evidence for the presence of nonlinear autocorrelations by estimating the DFA of the magnitude series, a result validated by a corresponding study of appropriate surrogate data. The amount and the character of nonlinear correlations vary from one composer to another. Finally, we performed a simple experiment in order to evaluate the pleasantness of the musical surrogate pieces in comparison with the original music and find that nonlinear correlations could play an important role in the aesthetic perception of a musical piece.
Kumar, Keshav; Mishra, Ashok Kumar
2015-07-01
The fluorescence characteristics of 8-anilinonaphthalene-1-sulfonic acid (ANS) in an ethanol-water mixture, in combination with partial least squares (PLS) analysis, were used to propose a simple and sensitive analytical procedure for monitoring the adulteration of ethanol by water. The proposed analytical procedure was found to be capable of detecting even small levels of adulteration of ethanol by water. The robustness of the procedure is evident from the statistical parameters, such as the squared correlation coefficient (R²), the root mean square error of calibration (RMSEC) and the root mean square error of prediction (RMSEP), which were found to be well within the acceptable limits.
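A PLS calibration of this kind can be sketched with scikit-learn; the "spectra" below are synthetic curves whose shape shifts with the water fraction, standing in for ANS fluorescence data, and the RMSEC/RMSEP values are only illustrative.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(7)

# Synthetic "emission spectra" (200 channels) whose peak shifts with water fraction;
# purely illustrative, not real ANS fluorescence data.
n_samples, n_channels = 60, 200
water_fraction = rng.uniform(0, 0.3, n_samples)           # response (v/v water)
wavelengths = np.linspace(0, 1, n_channels)
spectra = (np.exp(-((wavelengths[None, :] - 0.4 - 0.3 * water_fraction[:, None]) / 0.1) ** 2)
           + rng.normal(0, 0.01, (n_samples, n_channels)))

X_cal, X_val, y_cal, y_val = train_test_split(spectra, water_fraction,
                                              test_size=0.3, random_state=0)
pls = PLSRegression(n_components=3)
pls.fit(X_cal, y_cal)

rmsec = np.sqrt(mean_squared_error(y_cal, pls.predict(X_cal)))
rmsep = np.sqrt(mean_squared_error(y_val, pls.predict(X_val)))
print(f"R^2 (calibration) = {r2_score(y_cal, pls.predict(X_cal)):.3f}, "
      f"RMSEC = {rmsec:.4f}, RMSEP = {rmsep:.4f}")
```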
Wide-Field Imaging of Single-Nanoparticle Extinction with Sub-nm² Sensitivity
NASA Astrophysics Data System (ADS)
Payne, Lukas M.; Langbein, Wolfgang; Borri, Paola
2018-03-01
We report on a highly sensitive wide-field imaging technique for quantitative measurement of the optical extinction cross section σ_ext of single nanoparticles. The technique is simple and high speed, and it enables the simultaneous acquisition of hundreds of nanoparticles for statistical analysis. Using rapid referencing, fast acquisition, and a deconvolution analysis, a shot-noise-limited sensitivity down to 0.4 nm² is achieved. Measurements on a set of individual gold nanoparticles of 5 nm diameter using this method yield σ_ext = (10.0 ± 3.1) nm², which is consistent with theoretical expectations and well above the background fluctuations of 0.9 nm².
Morse Code, Scrabble, and the Alphabet
ERIC Educational Resources Information Center
Richardson, Mary; Gabrosek, John; Reischman, Diann; Curtiss, Phyliss
2004-01-01
In this paper we describe an interactive activity that illustrates simple linear regression. Students collect data and analyze it using simple linear regression techniques taught in an introductory applied statistics course. The activity is extended to illustrate checks for regression assumptions and regression diagnostics taught in an…
ERIC Educational Resources Information Center
Mirman, Daniel; Estes, Katharine Graf; Magnuson, James S.
2010-01-01
Statistical learning mechanisms play an important role in theories of language acquisition and processing. Recurrent neural network models have provided important insights into how these mechanisms might operate. We examined whether such networks capture two key findings in human statistical learning. In Simulation 1, a simple recurrent network…
ERIC Educational Resources Information Center
Kravchuk, Olena; Elliott, Antony; Bhandari, Bhesh
2005-01-01
A simple laboratory experiment, based on the Maillard reaction, served as a project in Introductory Statistics for undergraduates in Food Science and Technology. By using the principles of randomization and replication and reflecting on the sources of variation in the experimental data, students reinforced the statistical concepts and techniques…
Fantuzzo, J. A.; Mirabella, V. R.; Zahn, J. D.
2017-01-01
Abstract Synapse formation analyses can be performed by imaging and quantifying fluorescent signals of synaptic markers. Traditionally, these analyses are done using simple or multiple thresholding and segmentation approaches or by labor-intensive manual analysis by a human observer. Here, we describe Intellicount, a high-throughput, fully-automated synapse quantification program which applies a novel machine learning (ML)-based image processing algorithm to systematically improve region of interest (ROI) identification over simple thresholding techniques. Through processing large datasets from both human and mouse neurons, we demonstrate that this approach allows image processing to proceed independently of carefully set thresholds, thus reducing the need for human intervention. As a result, this method can efficiently and accurately process large image datasets with minimal interaction by the experimenter, making it less prone to bias and less liable to human error. Furthermore, Intellicount is integrated into an intuitive graphical user interface (GUI) that provides a set of valuable features, including automated and multifunctional figure generation, routine statistical analyses, and the ability to run full datasets through nested folders, greatly expediting the data analysis process. PMID:29218324
SOCR Analyses – an Instructional Java Web-based Statistical Analysis Toolkit
Chu, Annie; Cui, Jenny; Dinov, Ivo D.
2011-01-01
The Statistical Online Computational Resource (SOCR) designs web-based tools for educational use in a variety of undergraduate courses (Dinov 2006). Several studies have demonstrated that these resources significantly improve students' motivation and learning experiences (Dinov et al. 2008). SOCR Analyses is a new component that concentrates on data modeling and analysis using parametric and non-parametric techniques supported with graphical model diagnostics. Currently implemented analyses include commonly used models in undergraduate statistics courses like linear models (Simple Linear Regression, Multiple Linear Regression, One-Way and Two-Way ANOVA). In addition, we implemented tests for sample comparisons, such as t-test in the parametric category; and Wilcoxon rank sum test, Kruskal-Wallis test, Friedman's test, in the non-parametric category. SOCR Analyses also include several hypothesis test models, such as Contingency tables, Friedman's test and Fisher's exact test. The code itself is open source (http://socr.googlecode.com/), hoping to contribute to the efforts of the statistical computing community. The code includes functionality for each specific analysis model and it has general utilities that can be applied in various statistical computing tasks. For example, concrete methods with API (Application Programming Interface) have been implemented in statistical summary, least square solutions of general linear models, rank calculations, etc. HTML interfaces, tutorials, source code, activities, and data are freely available via the web (www.SOCR.ucla.edu). Code examples for developers and demos for educators are provided on the SOCR Wiki website. In this article, the pedagogical utilization of the SOCR Analyses is discussed, as well as the underlying design framework. As the SOCR project is on-going and more functions and tools are being added to it, these resources are constantly improved. The reader is strongly encouraged to check the SOCR site for most updated information and newly added models. PMID:21546994
iSeq: Web-Based RNA-seq Data Analysis and Visualization.
Zhang, Chao; Fan, Caoqi; Gan, Jingbo; Zhu, Ping; Kong, Lei; Li, Cheng
2018-01-01
Transcriptome sequencing (RNA-seq) is becoming a standard experimental methodology for genome-wide characterization and quantification of transcripts at single base-pair resolution. However, downstream analysis of massive amount of sequencing data can be prohibitively technical for wet-lab researchers. A functionally integrated and user-friendly platform is required to meet this demand. Here, we present iSeq, an R-based Web server, for RNA-seq data analysis and visualization. iSeq is a streamlined Web-based R application under the Shiny framework, featuring a simple user interface and multiple data analysis modules. Users without programming and statistical skills can analyze their RNA-seq data and construct publication-level graphs through a standardized yet customizable analytical pipeline. iSeq is accessible via Web browsers on any operating system at http://iseq.cbi.pku.edu.cn .
Time averaging, ageing and delay analysis of financial time series
NASA Astrophysics Data System (ADS)
Cherstvy, Andrey G.; Vinod, Deepak; Aghion, Erez; Chechkin, Aleksei V.; Metzler, Ralf
2017-06-01
We introduce three strategies for the analysis of financial time series based on time averaged observables. These comprise the time averaged mean squared displacement (MSD) as well as the ageing and delay time methods for varying fractions of the financial time series. We explore these concepts via statistical analysis of historic time series for several Dow Jones Industrial indices for the period from the 1960s to 2015. Remarkably, we discover a simple universal law for the delay time averaged MSD. The observed features of the financial time series dynamics agree well with our analytical results for the time averaged measurables for geometric Brownian motion, underlying the famed Black-Scholes-Merton model. The concepts we promote here are shown to be useful for financial data analysis and enable one to unveil new universal features of stock market dynamics.
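The time averaged MSD itself is simple to compute along a single trajectory; the sketch below applies it to a simulated geometric Brownian motion path (the Black-Scholes-Merton benchmark mentioned above) rather than to the Dow Jones data, and the drift and volatility values are arbitrary.

```python
import numpy as np

def time_averaged_msd(x: np.ndarray, lags: np.ndarray) -> np.ndarray:
    """Time averaged mean squared displacement along a single trajectory:
    mean over t of [x(t + lag) - x(t)]^2 for each lag."""
    return np.array([np.mean((x[lag:] - x[:-lag]) ** 2) for lag in lags])

# Simulated geometric Brownian motion standing in for a price index.
rng = np.random.default_rng(8)
n, dt, mu, sigma = 10_000, 1.0, 1e-5, 0.01
log_price = np.cumsum((mu - 0.5 * sigma**2) * dt
                      + sigma * np.sqrt(dt) * rng.standard_normal(n))
price = 100 * np.exp(log_price)

lags = np.array([1, 2, 4, 8, 16, 32, 64])
tamsd = time_averaged_msd(np.log(price), lags)
slope = np.polyfit(np.log(lags), np.log(tamsd), 1)[0]
print("scaling exponent of the time averaged MSD of log-prices:", round(slope, 2))
```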
NASA Astrophysics Data System (ADS)
Prasanna, V.
2018-01-01
This study makes use of temperature and precipitation from CMIP5 climate model output for climate change application studies over the Indian region during the summer monsoon season (JJAS). Bias correction of temperature and precipitation from CMIP5 GCM simulation results with respect to observations is discussed in detail. Non-linear statistical bias correction is a suitable bias correction method for climate change data because it is simple and does not add artificial uncertainties to the impact assessment of climate change scenarios for climate change application studies (agricultural production changes) in the future. The simple statistical bias correction uses observational constraints on the GCM baseline, and the projected results are scaled with respect to the changing magnitude in future scenarios, varying from one model to the other. Two types of bias correction techniques are shown here: (1) a simple bias correction using a percentile-based quantile-mapping algorithm and (2) a simple but improved bias correction method, a cumulative distribution function (CDF; Weibull distribution function)-based quantile-mapping algorithm. This study shows that the percentile-based quantile-mapping method gives results similar to the CDF (Weibull)-based quantile-mapping method; the two methods are comparable. The bias correction is applied to temperature and precipitation variables for present climate and future projected data, which are then used in a simple statistical model to understand the future changes in crop production over the Indian region during the summer monsoon season. In total, 12 CMIP5 models are used for the Historical (1901-2005), RCP4.5 (2005-2100), and RCP8.5 (2005-2100) scenarios. The climate index from each CMIP5 model and the observed agricultural yield index over the Indian region are used in a regression model to project the changes in agricultural yield over India under the RCP4.5 and RCP8.5 scenarios. The results revealed a better convergence of model projections in the bias corrected data compared to the uncorrected data. The study can be extended to localized regional domains aimed at understanding future changes in agricultural productivity with an agro-economic or a simple statistical model. The statistical model indicated that the total food grain yield is going to increase over the Indian region in the future: the increase is approximately 50 kg/ha for the RCP4.5 scenario from 2001 until the end of 2100, and approximately 90 kg/ha for the RCP8.5 scenario over the same period. Many studies use bias correction techniques, but this study applies the bias correction technique to future climate scenario data from CMIP5 models and applies it to crop statistics to find future crop yield changes over the Indian region.
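The percentile-based quantile-mapping variant can be sketched as an empirical CDF match between model and observations; the precipitation-like data below are synthetic and the sketch is not the study's processing chain.

```python
import numpy as np

def quantile_map(model_hist: np.ndarray,
                 obs_hist: np.ndarray,
                 model_future: np.ndarray) -> np.ndarray:
    """Empirical (percentile-based) quantile mapping: map each future model value
    to the observed value at the same percentile of the historical model CDF."""
    # Percentile of each future value within the historical model distribution.
    percentiles = np.array([np.mean(model_hist <= v) for v in model_future]) * 100.0
    # The corresponding observed quantiles become the bias-corrected values.
    return np.percentile(obs_hist, percentiles)

# Synthetic monsoon-season precipitation (mm/day): the model is biased wet.
rng = np.random.default_rng(9)
obs_hist = rng.gamma(shape=2.0, scale=4.0, size=3000)
model_hist = rng.gamma(shape=2.0, scale=5.5, size=3000)
model_future = rng.gamma(shape=2.0, scale=6.0, size=3000)   # projected scenario

corrected = quantile_map(model_hist, obs_hist, model_future)
print("raw future mean:      ", round(model_future.mean(), 2))
print("corrected future mean:", round(corrected.mean(), 2))
print("observed hist mean:   ", round(obs_hist.mean(), 2))
```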
Non-Asbestos Insulation Testing Using a Plasma Torch
NASA Technical Reports Server (NTRS)
Morgan, R. E.; Prince, A. S.; Selvidge, S. A.; Phelps, J.; Martin, C. L.; Lawrence, T. W.
2000-01-01
Insulation obsolescence issues are a major concern for the Reusable Solid Rocket Motor (RSRM). As old sources of raw materials disappear, new sources must be found and qualified. No simple, inexpensive test presently exists for predicting the erosion performance of a candidate insulation in the full-scale motor. Large motor tests cost millions of dollars and therefore can only be used on a few very select candidates. There is a need for a simple, low cost method of screening insulation performance that can simulate some of the different erosion environments found in the RSRM. This paper describes a series of erosion tests on two different non-asbestos insulation formulations, a KEVLAR® fiber-filled and a carbon fiber-filled insulation containing Ethylene-Propylene-Diene Monomer (EPDM) rubber as the binder. The test instrument was a plasma torch device. The two main variables investigated were heat flux and alumina particle impingement concentration. Statistical analysis revealed that the two formulations had very different responses to the main variables. The results of this work indicate that there may be fundamental differences in how these insulation formulations perform in the motor operating environment. The plasma torch appears to offer a low-cost means of obtaining a fundamental understanding of insulation response to critical factors in a series of statistically designed experiments.
Robust LOD scores for variance component-based linkage analysis.
Blangero, J; Williams, J T; Almasy, L
2000-01-01
The variance component method is now widely used for linkage analysis of quantitative traits. Although this approach offers many advantages, the importance of the underlying assumption of multivariate normality of the trait distribution within pedigrees has not been studied extensively. Simulation studies have shown that traits with leptokurtic distributions yield linkage test statistics that exhibit excessive Type I error when analyzed naively. We derive analytical formulae relating the deviation from the expected asymptotic distribution of the lod score to the kurtosis and total heritability of the quantitative trait. A simple correction constant yields a robust lod score for any deviation from normality and for any pedigree structure, and effectively eliminates the problem of inflated Type I error due to misspecification of the underlying probability model in variance component-based linkage analysis.
NASA Astrophysics Data System (ADS)
Guilhem, Yoann; Basseville, Stéphanie; Curtit, François; Stéphan, Jean-Michel; Cailletaud, Georges
2018-06-01
This paper is dedicated to the study of the influence of surface roughness on local stress and strain fields in polycrystalline aggregates. Finite element computations are performed with a crystal plasticity model on a 316L stainless steel polycrystalline material element with different roughness states on its free surface. The subsequent analysis of the plastic strain localization patterns shows that surface roughness strongly affects the plastic strain localization induced by crystallography. Nevertheless, this effect mainly takes place at the surface and vanishes under the first layer of grains, which implies the existence of a critical perturbed depth. A statistical analysis based on the plastic strain distribution obtained for different roughness levels provides a simple rule to define the size of the affected zone depending on the rough surface parameters.
Perel, Pablo; Edwards, Phil; Shakur, Haleema; Roberts, Ian
2008-11-06
Traumatic brain injury (TBI) is an important cause of acquired disability. In evaluating the effectiveness of clinical interventions for TBI it is important to measure disability accurately. The Glasgow Outcome Scale (GOS) is the most widely used outcome measure in randomised controlled trials (RCTs) in TBI patients. However, the GOS is generally measured at 6 months after discharge, by which time loss to follow-up may have occurred. The objectives of this study were to evaluate the association and predictive validity between a simple disability scale at hospital discharge, the Oxford Handicap Scale (OHS), and the GOS at 6 months among TBI patients. The study was a secondary analysis of a randomised clinical trial among TBI patients (MRC CRASH Trial). A Spearman correlation was estimated to evaluate the association between the OHS and GOS. The validity of different dichotomies of the OHS for predicting GOS at 6 months was assessed by calculating sensitivity, specificity and the C statistic. Uni- and multivariate logistic regression models were fitted including the OHS as an explanatory variable. For each model we analysed its discrimination and calibration. We found that the OHS is highly correlated with GOS at 6 months (Spearman correlation 0.75), with evidence of a linear relationship between the two scales. The OHS dichotomy that separates patients with severe dependency or death showed the greatest discrimination (C statistic: 84.3). Among survivors at hospital discharge the OHS showed very good discrimination (C statistic 0.78) and excellent calibration when used to predict GOS outcome at 6 months. We have shown that the OHS, a simple disability scale available at hospital discharge, can predict disability accurately, according to the GOS, at 6 months. The OHS could be used to improve the design and analysis of clinical trials in TBI patients and may also provide a valuable clinical tool for physicians to improve communication with patients and relatives when assessing a patient's prognosis at hospital discharge.
Universal Power Law Governing Pedestrian Interactions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Karamouzas, Ioannis; Skinner, Brian; Guy, Stephen J.
2014-12-01
Human crowds often bear a striking resemblance to interacting particle systems, and this has prompted many researchers to describe pedestrian dynamics in terms of interaction forces and potential energies. The correct quantitative form of this interaction, however, has remained an open question. Here, we introduce a novel statistical-mechanical approach to directly measure the interaction energy between pedestrians. This analysis, when applied to a large collection of human motion data, reveals a simple power-law interaction that is based not on the physical separation between pedestrians but on their projected time to a potential future collision, and is therefore fundamentally anticipatory in nature. Remarkably, this simple law is able to describe human interactions across a wide variety of situations, speeds, and densities. We further show, through simulations, that the interaction law we identify is sufficient to reproduce many known crowd phenomena.
Heterogeneous distribution of metabolites across plant species
NASA Astrophysics Data System (ADS)
Takemoto, Kazuhiro; Arita, Masanori
2009-07-01
We investigate the distribution of flavonoids, a major category of plant secondary metabolites, across species. Flavonoids are known to show high species specificity, and were once considered as chemical markers for understanding adaptive evolution and characterization of living organisms. We investigate the distribution among species using bipartite networks, and find that two heterogeneous distributions are conserved among several families: the power-law distributions of the number of flavonoids in a species and the number of shared species of a particular flavonoid. In order to explain the possible origin of the heterogeneity, we propose a simple model with, essentially, a single parameter. As a result, we show that two respective power-law statistics emerge from simple evolutionary mechanisms based on a multiplicative process. These findings provide insights into the evolution of metabolite diversity and characterization of living organisms that defy genome sequence analysis for different reasons.
Huh, Yeamin; Smith, David E.; Feng, Meihau Rose
2014-01-01
Human clearance prediction for small- and macro-molecule drugs was evaluated and compared using various scaling methods and statistical analysis. Human clearance is generally well predicted using single or multiple species simple allometry for macro- and small-molecule drugs excreted renally. The prediction error is higher for hepatically eliminated small molecules using single or multiple species simple allometry scaling, and it appears that the prediction error is mainly associated with drugs with a low hepatic extraction ratio (Eh). The error in human clearance prediction for hepatically eliminated small molecules was reduced using scaling methods with a correction for maximum life span (MLP) or brain weight (BRW). Human clearance of both small- and macro-molecule drugs is well predicted using the monkey liver blood flow method. Predictions using liver blood flow from other species did not work as well, especially for the small-molecule drugs. PMID:21892879
Simple method for quick estimation of aquifer hydrogeological parameters
NASA Astrophysics Data System (ADS)
Ma, C.; Li, Y. Y.
2017-08-01
The development of simple and accurate methods to determine aquifer hydrogeological parameters is of importance for groundwater resources assessment and management. To address the problem of estimating aquifer parameters from unsteady pumping test data, a fitting function for the Theis well function was proposed using a fitting optimization method, and a univariate linear regression equation was then established. The aquifer parameters can be obtained by solving for the coefficients of the regression equation. The application of the proposed method is illustrated using two published data sets. Error statistics and analysis of the pumping drawdown showed that the method proposed in this paper yields quick and accurate estimates of the aquifer parameters. The proposed method can reliably identify the aquifer parameters from long-distance observed drawdowns and from early drawdowns. It is hoped that the proposed method will be helpful for practicing hydrogeologists and hydrologists.
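For comparison, the conventional approach that such simplifications replace is a nonlinear fit of the Theis solution s = Q/(4πT)·W(u), u = r²S/(4Tt), to observed drawdowns; the sketch below does this with SciPy on synthetic data, with the pumping rate, observation distance and the "true" parameters all assumed.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import exp1   # Theis well function W(u) = E1(u)

Q = 0.01    # pumping rate [m^3/s], assumed
r = 50.0    # distance to the observation well [m], assumed

def theis_drawdown(t, T, S):
    """Theis solution: s = Q/(4*pi*T) * W(u), with u = r^2 * S / (4*T*t)."""
    u = r**2 * S / (4.0 * T * t)
    return Q / (4.0 * np.pi * T) * exp1(u)

# Synthetic pumping-test drawdowns generated with "true" T and S plus noise.
rng = np.random.default_rng(10)
t_obs = np.logspace(1, 5, 30)                      # seconds since pumping started
s_obs = theis_drawdown(t_obs, 1e-3, 2e-4) + rng.normal(0, 0.002, t_obs.size)

(T_fit, S_fit), _ = curve_fit(theis_drawdown, t_obs, s_obs, p0=(1e-3, 1e-4))
print(f"fitted transmissivity T = {T_fit:.2e} m^2/s, storativity S = {S_fit:.2e}")
```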
Lee, Kian Mun; Hamid, Sharifah Bee Abd
2015-01-19
The performance of advanced photocatalytic degradation of 4-chlorophenoxyacetic acid (4-CPA) strongly depends on photocatalyst dosage, initial concentration and initial pH. In the present study, a simple response surface methodology (RSM) was applied to investigate the interaction between these three independent factors. Thus, the photocatalytic degradation of 4-CPA in aqueous medium assisted by an ultraviolet-active ZnO photocatalyst was systematically investigated. This study aims to determine the optimum processing parameters to maximize 4-CPA degradation. Based on the results obtained, it was found that a maximum of 91% of 4-CPA was successfully degraded under optimal conditions (0.02 g ZnO dosage, 20.00 mg/L of 4-CPA and pH 7.71). All the experimental data showed good agreement with the predicted results obtained from statistical analysis.
A weighted generalized score statistic for comparison of predictive values of diagnostic tests.
Kosinski, Andrzej S
2013-03-15
Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations that are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we presented, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic that incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, always reduces to the score statistic in the independent samples situation, and preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe that the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the WGS test statistic in a general GEE setting. Copyright © 2012 John Wiley & Sons, Ltd.
A weighted generalized score statistic for comparison of predictive values of diagnostic tests
Kosinski, Andrzej S.
2013-01-01
Positive and negative predictive values are important measures of a medical diagnostic test performance. We consider testing equality of two positive or two negative predictive values within a paired design in which all patients receive two diagnostic tests. The existing statistical tests for testing equality of predictive values are either Wald tests based on the multinomial distribution or the empirical Wald and generalized score tests within the generalized estimating equations (GEE) framework. As presented in the literature, these test statistics have considerably complex formulas without clear intuitive insight. We propose their re-formulations which are mathematically equivalent but algebraically simple and intuitive. As is clearly seen with a new re-formulation we present, the generalized score statistic does not always reduce to the commonly used score statistic in the independent samples case. To alleviate this, we introduce a weighted generalized score (WGS) test statistic which incorporates empirical covariance matrix with newly proposed weights. This statistic is simple to compute, it always reduces to the score statistic in the independent samples situation, and it preserves type I error better than the other statistics as demonstrated by simulations. Thus, we believe the proposed WGS statistic is the preferred statistic for testing equality of two predictive values and for corresponding sample size computations. The new formulas of the Wald statistics may be useful for easy computation of confidence intervals for difference of predictive values. The introduced concepts have potential to lead to development of the weighted generalized score test statistic in a general GEE setting. PMID:22912343
An asymptotic analysis of the logrank test.
Strawderman, R L
1997-01-01
Asymptotic expansions for the null distribution of the logrank statistic and its distribution under local proportional hazards alternatives are developed in the case of iid observations. The results, which are derived from the work of Gu (1992) and Taniguchi (1992), are easy to interpret, and provide some theoretical justification for many behavioral characteristics of the logrank test that have been previously observed in simulation studies. We focus primarily upon (i) the inadequacy of the usual normal approximation under treatment group imbalance; and, (ii) the effects of treatment group imbalance on power and sample size calculations. A simple transformation of the logrank statistic is also derived based on results in Konishi (1991) and is found to substantially improve the standard normal approximation to its distribution under the null hypothesis of no survival difference when there is treatment group imbalance.
NASA Astrophysics Data System (ADS)
Jiang, Zhi-Qiang; Zhou, Wei-Xing; Tan, Qun-Zhao
2009-11-01
Massive multiplayer online role-playing games (MMORPGs) are very popular in China, which provides a potential platform for scientific research. We study the online-offline activities of avatars in an MMORPG to understand their game-playing behavior. The statistical analysis unveils that the active avatars can be classified into three types. The avatars of the first type are owned by game cheaters who go online and offline in preset time intervals with the online duration distributions dominated by pulses. The second type of avatars is characterized by a Weibull distribution in the online durations, which is confirmed by statistical tests. The distributions of online durations of the remaining individual avatars differ from the above two types and cannot be described by a simple form. These findings have potential applications in the game industry.
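Fitting and testing a Weibull model for online durations, as done for the second avatar type, can be sketched with SciPy; the durations below are synthetic, not the MMORPG logs, and the Kolmogorov-Smirnov check is only one possible goodness-of-fit test.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)

# Synthetic online durations (minutes) standing in for one avatar type.
durations = stats.weibull_min.rvs(c=0.9, scale=45.0, size=2000, random_state=rng)

# Fit a Weibull distribution (location fixed at zero) and test the fit.
shape, loc, scale = stats.weibull_min.fit(durations, floc=0)
ks_stat, p_value = stats.kstest(durations, "weibull_min", args=(shape, loc, scale))

print(f"fitted Weibull shape = {shape:.2f}, scale = {scale:.1f} min")
print(f"KS statistic = {ks_stat:.3f}, p = {p_value:.3f}")
```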
Oostenveld, Robert; Fries, Pascal; Maris, Eric; Schoffelen, Jan-Mathijs
2011-01-01
This paper describes FieldTrip, an open source software package that we developed for the analysis of MEG, EEG, and other electrophysiological data. The software is implemented as a MATLAB toolbox and includes a complete set of consistent and user-friendly high-level functions that allow experimental neuroscientists to analyze experimental data. It includes algorithms for simple and advanced analysis, such as time-frequency analysis using multitapers, source reconstruction using dipoles, distributed sources and beamformers, connectivity analysis, and nonparametric statistical permutation tests at the channel and source level. The implementation as toolbox allows the user to perform elaborate and structured analyses of large data sets using the MATLAB command line and batch scripting. Furthermore, users and developers can easily extend the functionality and implement new algorithms. The modular design facilitates the reuse in other software packages. PMID:21253357
Szilágyi, N; Kovács, R; Kenyeres, I; Csikor, Zs
2013-01-01
Biofilm development in a fixed-bed biofilm reactor system performing municipal wastewater treatment was monitored with the aim of accumulating colonization and maximum biofilm mass data usable in engineering practice for process design purposes. Initially, a 6 month experimental period was selected for the investigations, during which biofilm formation and the performance of the reactors were monitored. The results were analyzed by two methods: for simple, steady-state process design purposes, the maximum biofilm mass on carriers versus influent load and a time constant of the biofilm growth were determined, whereas for design approaches using dynamic models a simple biofilm mass prediction model including attachment and detachment mechanisms was selected and fitted to the experimental data. According to a detailed statistical analysis, the collected data did not allow us to determine both the time constant of biofilm growth and the maximum biofilm mass on carriers at the same time. The observed maximum biofilm mass could be determined with a reasonable error and ranged between 438 gTS/m² of carrier surface and 843 gTS/m², depending on influent load and hydrodynamic conditions. The parallel analysis of the attachment-detachment model showed that the experimental data set allowed us to determine the attachment rate coefficient, which was in the range of 0.05-0.4 m d⁻¹ depending on influent load and hydrodynamic conditions.
Spatial Differentiation of Landscape Values in the Murray River Region of Victoria, Australia
NASA Astrophysics Data System (ADS)
Zhu, Xuan; Pfueller, Sharron; Whitelaw, Paul; Winter, Caroline
2010-05-01
This research advances the understanding of the location of perceived landscape values through a statistically based approach to spatial analysis of value densities. Survey data were obtained from a sample of people living in and using the Murray River region, Australia, where declining environmental quality prompted a reevaluation of its conservation status. When densities of 12 perceived landscape values were mapped using geographic information systems (GIS), valued places clustered along the entire river bank and in associated National/State Parks and reserves. While simple density mapping revealed high value densities in various locations, it did not indicate what density of a landscape value could be regarded as a statistically significant hotspot or distinguish whether overlapping areas of high density for different values indicate identical or adjacent locations. A spatial statistic Getis-Ord Gi* was used to indicate statistically significant spatial clusters of high value densities or “hotspots”. Of 251 hotspots, 40% were for single non-use values, primarily spiritual, therapeutic or intrinsic. Four hotspots had 11 landscape values. Two, lacking economic value, were located in ecologically important river red gum forests and two, lacking wilderness value, were near the major towns of Echuca-Moama and Albury-Wodonga. Hotspots for eight values showed statistically significant associations with another value. There were high associations between learning and heritage values while economic and biological diversity values showed moderate associations with several other direct and indirect use values. This approach may improve confidence in the interpretation of spatial analysis of landscape values by enhancing understanding of value relationships.
Building gene expression profile classifiers with a simple and efficient rejection option in R.
Benso, Alfredo; Di Carlo, Stefano; Politano, Gianfranco; Savino, Alessandro; Hafeezurrehman, Hafeez
2011-01-01
The collection of gene expression profiles from DNA microarrays and their analysis with pattern recognition algorithms is a powerful technology applied to several biological problems. Common pattern recognition systems classify samples by assigning them to a set of known classes. However, in a clinical diagnostics setup, novel and unknown classes (new pathologies) may appear and one must be able to reject those samples that do not fit the trained model. The problem of implementing a rejection option in a multi-class classifier has not been widely addressed in the statistical literature. Gene expression profiles represent a critical case study since they suffer from the curse of dimensionality, which negatively affects the reliability of both traditional rejection models and more recent approaches such as one-class classifiers. This paper presents a set of empirical decision rules that can be used to implement a rejection option in a set of multi-class classifiers widely used for the analysis of gene expression profiles. In particular, we focus on the classifiers implemented in the R Language and Environment for Statistical Computing (R for short in the remainder of this paper). The main contribution of the proposed rules is their simplicity, which enables an easy integration with available data analysis environments. Since tuning the parameters involved in defining a rejection model is often a complex and delicate task, we exploit an evolutionary strategy to automate this process. This allows the final user to maximize the rejection accuracy with minimum manual intervention. This paper shows how simple decision rules can be used to support the use of complex machine learning algorithms in real experimental setups. The proposed approach is almost completely automated and is therefore a good candidate for being integrated into data analysis flows in labs where the machine learning expertise required to tune traditional classifiers might not be available.
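A minimal analogue of a rejection option (in Python rather than R, and not the paper's empirical rules) is to withhold a prediction whenever the classifier's top posterior probability falls below a threshold; the data are synthetic and the 0.7 cut-off is arbitrary, standing in for the tuning that the paper automates with an evolutionary strategy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic high-dimensional "expression profiles" with 3 known classes.
X, y = make_classification(n_samples=300, n_features=200, n_informative=20,
                           n_classes=3, n_clusters_per_class=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)

# Reject test samples whose maximum posterior probability is below a threshold;
# the 0.7 cut-off is arbitrary and would normally be tuned.
posterior = clf.predict_proba(X_test)
reject = posterior.max(axis=1) < 0.7
accepted_pred = clf.predict(X_test)[~reject]

print(f"rejected {reject.sum()} of {len(X_test)} samples")
print("accuracy on accepted samples:",
      round((accepted_pred == y_test[~reject]).mean(), 3))
```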
Huang, Yuan-sheng; Yang, Zhi-rong; Zhan, Si-yan
2015-06-18
To investigate the use of simple pooling and the bivariate model in meta-analyses of diagnostic test accuracy (DTA) published in Chinese journals (January to November, 2014), compare the differences in results from these two models, and explore the impact of between-study variability of sensitivity and specificity on those differences. DTA meta-analyses were searched through the Chinese Biomedical Literature Database (January to November, 2014). Details of the models and the fourfold-table data were extracted. Descriptive analysis was conducted to investigate the prevalence of the simple pooling method and the bivariate model in the included literature. Data were re-analyzed with the two models respectively. Differences in the results were examined by the Wilcoxon signed rank test. How the differences in results were affected by between-study variability of sensitivity and specificity, expressed as I2, was explored. A total of 55 systematic reviews, containing 58 DTA meta-analyses, were included, and 25 DTA meta-analyses were eligible for re-analysis. Simple pooling was used in 50 (90.9%) systematic reviews and the bivariate model in 1 (1.8%). The remaining 4 (7.3%) articles used other models for pooling sensitivity and specificity or pooled neither of them. Of the reviews simply pooling sensitivity and specificity, 41 (82.0%) were at risk of using Meta-DiSc software incorrectly. The differences in medians of sensitivity and specificity between the two models were both 0.011 (P<0.001, P=0.031 respectively). Greater differences were found as the I2 of sensitivity or specificity became larger, especially when I2>75%. Most DTA meta-analyses published in Chinese journals (January to November, 2014) combine sensitivity and specificity by simple pooling. Meta-DiSc software can pool sensitivity and specificity only through a fixed-effect model, but a high proportion of authors believe it can implement a random-effect model. Simple pooling tends to underestimate the results compared with the bivariate model. The greater the between-study variance, the larger the deviation of simple pooling tends to be. It is necessary to increase the knowledge level of statistical methods and software for meta-analyses of DTA data.
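For orientation, the sketch below contrasts one naive form of "simple pooling" (collapsing the studies' fourfold tables and recomputing sensitivity and specificity) with a comment on what the bivariate model would do instead; the counts are hypothetical and this is not the re-analysis pipeline used in the paper.

```python
import numpy as np

# Hypothetical 2x2 counts per study: columns are TP, FP, FN, TN.
studies = np.array([
    [40,  5, 10, 60],
    [25,  8,  9, 70],
    [55, 12, 20, 90],
])

tp, fp, fn, tn = studies.T
sens = tp / (tp + fn)           # per-study sensitivity
spec = tn / (tn + fp)           # per-study specificity

# "Simple pooling": collapse all counts into one table and recompute.
pooled_sens = tp.sum() / (tp.sum() + fn.sum())
pooled_spec = tn.sum() / (tn.sum() + fp.sum())
print("per-study:", sens.round(2), spec.round(2))
print("simple pooled:", round(pooled_sens, 3), round(pooled_spec, 3))

# A bivariate random-effects model would instead jointly model
# (logit(sens_i), logit(spec_i)) with between-study variances and a correlation,
# which is what the paper's re-analysis compares simple pooling against.
```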
Entropy Is Simple, Qualitatively.
ERIC Educational Resources Information Center
Lambert, Frank L.
2002-01-01
Suggests that qualitatively, entropy is simple. Entropy increase from a macro viewpoint is a measure of the dispersal of energy from localized to spread out at a temperature T. Fundamentally based on statistical and quantum mechanics, this approach is superior to the non-fundamental "disorder" as a descriptor of entropy change. (MM)
Vajargah, Kianoush Fathi; Sadeghi-Bazargani, Homayoun; Mehdizadeh-Esfanjani, Robab; Savadi-Oskouei, Daryoush; Farhoudi, Mehdi
2012-01-01
The objective of the present study was to assess the comparative applicability of the orthogonal projections to latent structures (OPLS) statistical model versus traditional linear regression for investigating the role of transcranial Doppler (TCD) sonography in predicting ischemic stroke prognosis. The study was conducted on 116 ischemic stroke patients admitted to a specialty neurology ward. The Unified Neurological Stroke Scale was used once for clinical evaluation during the first week of admission and again six months later. All data were first analyzed using simple linear regression and then considered for multivariate analysis using PLS/OPLS models in the SIMCA P+12 statistical software package. The linear regression results used to identify TCD predictors of stroke prognosis were confirmed by the OPLS modeling technique. Moreover, compared with linear regression, the OPLS model appeared to have higher sensitivity in detecting predictors of ischemic stroke prognosis and detected several more predictors. Applying the OPLS model made it possible to use both single TCD measures/indicators and arbitrarily dichotomized measures of TCD single-vessel involvement, as well as the overall TCD result. In conclusion, the authors recommend PLS/OPLS methods as complementary rather than alternative to the available classical regression models such as linear regression.
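OPLS itself is not available in common Python libraries, so the sketch below uses ordinary PLS regression from scikit-learn as a loosely related, hedged stand-in to illustrate comparing a latent-structure model against ordinary least squares; the predictors, outcome and component count are hypothetical, and this is not the SIMCA P+12 workflow.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, p = 116, 12                       # e.g. 12 TCD-derived indicators (hypothetical)
X = rng.normal(size=(n, p))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(scale=2.0, size=n)   # outcome score

pls = PLSRegression(n_components=2).fit(X, y)
ols = LinearRegression().fit(X, y)

print("PLS R^2:", round(pls.score(X, y), 3))
print("OLS R^2:", round(ols.score(X, y), 3))
# In practice, components and predictors would be assessed with cross-validation
# and variable-importance measures rather than in-sample R^2.
```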
Wang, Hui; Liu, Tao; Qiu, Quan; Ding, Peng; He, Yan-Hui; Chen, Wei-Qing
2015-01-23
This study aimed to develop and validate a simple risk score for detecting individuals with impaired fasting glucose (IFG) among the Southern Chinese population. A sample of participants aged ≥20 years and without known diabetes from the 2006-2007 Guangzhou diabetes cross-sectional survey was used to develop separate risk scores for men and women. The participants completed a self-administered structured questionnaire and underwent simple clinical measurements. The risk scores were developed by multiple logistic regression analysis. External validation was performed based on three other studies: the 2007 Zhuhai rural population-based study, the 2008-2010 Guangzhou diabetes cross-sectional study and the 2007 Tibet population-based study. Performance of the scores was measured with the Hosmer-Lemeshow goodness-of-fit test and ROC c-statistic. Age, waist circumference, body mass index and family history of diabetes were included in the risk score for both men and women, with the additional factor of hypertension for men. The ROC c-statistic was 0.70 for both men and women in the derivation samples. Risk scores of ≥28 for men and ≥18 for women showed respective sensitivity, specificity, positive predictive value and negative predictive value of 56.6%, 71.7%, 13.0% and 96.0% for men and 68.7%, 60.2%, 11% and 96.0% for women in the derivation population. The scores performed comparably with the Zhuhai rural sample and the 2008-2010 Guangzhou urban samples but poorly in the Tibet sample. The performance of pre-existing USA, Shanghai, and Chengdu risk scores was poorer in our population than in their original study populations. The results suggest that the developed simple IFG risk scores can be generalized in Guangzhou city and nearby rural regions and may help primary health care workers to identify individuals with IFG in their practice.
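The sketch below shows one common way such a questionnaire-based score can be built from a fitted logistic model: scale the coefficients so the smallest effect is worth one point and round to integers. The coefficients and categories are hypothetical and are not the published Guangzhou score.

```python
# Hypothetical logistic coefficients (log odds ratios); NOT the published score.
coef = {"age_40_59": 0.60, "age_60plus": 1.10, "central_obesity": 0.80,
        "bmi_25plus": 0.45, "family_history": 0.55, "hypertension": 0.40}

base = min(coef.values())                     # smallest effect = 1 point (a common convention)
points = {k: int(round(v / base)) for k, v in coef.items()}
print(points)

# Score a hypothetical 52-year-old man with central obesity and hypertension.
profile = {"age_40_59": 1, "age_60plus": 0, "central_obesity": 1,
           "bmi_25plus": 0, "family_history": 0, "hypertension": 1}
total = sum(points[k] * profile[k] for k in coef)
print("risk score:", total)
```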
Functional brain networks for learning predictive statistics.
Giorgio, Joseph; Karlaftis, Vasilis M; Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew; Kourtzi, Zoe
2017-08-18
Making predictions about future events relies on interpreting streams of information that may initially appear incomprehensible. This skill depends on extracting regular patterns in space and time by mere exposure to the environment (i.e., without explicit feedback). Yet, we know little about the functional brain networks that mediate this type of statistical learning. Here, we test whether changes in the processing and connectivity of functional brain networks due to training relate to our ability to learn temporal regularities. By combining behavioral training and functional brain connectivity analysis, we demonstrate that individuals adapt to the environment's statistics as they change over time from simple repetition to probabilistic combinations. Further, we show that individual learning of temporal structures relates to decision strategy. Our fMRI results demonstrate that learning-dependent changes in fMRI activation within and functional connectivity between brain networks relate to individual variability in strategy. In particular, extracting the exact sequence statistics (i.e., matching) relates to changes in brain networks known to be involved in memory and stimulus-response associations, while selecting the most probable outcomes in a given context (i.e., maximizing) relates to changes in frontal and striatal networks. Thus, our findings provide evidence that dissociable brain networks mediate individual ability in learning behaviorally-relevant statistics. Copyright © 2017 The Authors. Published by Elsevier Ltd. All rights reserved.
Generalized statistical mechanics approaches to earthquakes and tectonics.
Vallianatos, Filippos; Papadakis, Giorgos; Michas, Georgios
2016-12-01
Despite the extreme complexity that characterizes the mechanism of the earthquake generation process, simple empirical scaling relations apply to the collective properties of earthquakes and faults in a variety of tectonic environments and scales. The physical characterization of those properties and the scaling relations that describe them attract a wide scientific interest and are incorporated in the probabilistic forecasting of seismicity in local, regional and planetary scales. Considerable progress has been made in the analysis of the statistical mechanics of earthquakes, which, based on the principle of entropy, can provide a physical rationale to the macroscopic properties frequently observed. The scale-invariant properties, the (multi) fractal structures and the long-range interactions that have been found to characterize fault and earthquake populations have recently led to the consideration of non-extensive statistical mechanics (NESM) as a consistent statistical mechanics framework for the description of seismicity. The consistency between NESM and observations has been demonstrated in a series of publications on seismicity, faulting, rock physics and other fields of geosciences. The aim of this review is to present in a concise manner the fundamental macroscopic properties of earthquakes and faulting and how these can be derived by using the notions of statistical mechanics and NESM, providing further insights into earthquake physics and fault growth processes.
NASA Astrophysics Data System (ADS)
Mugnes, J.-M.; Robert, C.
2015-11-01
Spectral analysis is a powerful tool to investigate stellar properties and it has been widely used for decades now. However, the methods considered to perform this kind of analysis are mostly based on iteration among a few diagnostic lines to determine the stellar parameters. While these methods are often simple and fast, they can lead to errors and large uncertainties due to the required assumptions. Here, we present a method based on Bayesian statistics to find simultaneously the best combination of effective temperature, surface gravity, projected rotational velocity, and microturbulence velocity, using all the available spectral lines. Different tests are discussed to demonstrate the strength of our method, which we apply to 54 mid-resolution spectra of field and cluster B stars obtained at the Observatoire du Mont-Mégantic. We compare our results with those found in the literature. Differences are seen which are well explained by the different methods used. We conclude that the B-star microturbulence velocities are often underestimated. We also confirm the trend that B stars in clusters are on average faster rotators than field B stars.
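A toy sketch of the general approach, assuming a grid of model spectra and a Gaussian (chi-square) likelihood with flat priors: the posterior over effective temperature and surface gravity is evaluated on the grid and the maximum a posteriori combination reported. The "model spectrum" here is a made-up one-line toy, not a real synthetic-spectrum grid, and only two of the four stellar parameters are searched.

```python
import numpy as np

rng = np.random.default_rng(1)
wave = np.linspace(4000, 4500, 200)               # wavelength grid (Angstrom)

def model_spectrum(teff, logg):
    """Toy 'synthetic spectrum': one line whose depth and width track Teff and log g."""
    depth = 0.8 * np.exp(-(teff - 20000) ** 2 / 2e7)
    width = 1.0 + 0.5 * logg
    return 1.0 - depth * np.exp(-((wave - 4200) / width) ** 2 / 2)

truth = model_spectrum(21000, 3.8)
obs = truth + rng.normal(scale=0.02, size=wave.size)   # noisy "observation"

teff_grid = np.arange(15000, 27001, 500)
logg_grid = np.arange(2.5, 4.51, 0.1)
logpost = np.empty((teff_grid.size, logg_grid.size))
for i, t in enumerate(teff_grid):
    for j, g in enumerate(logg_grid):
        chi2 = np.sum((obs - model_spectrum(t, g)) ** 2 / 0.02 ** 2)
        logpost[i, j] = -0.5 * chi2                    # flat priors

post = np.exp(logpost - logpost.max())
post /= post.sum()
i, j = np.unravel_index(post.argmax(), post.shape)
print("MAP estimate: Teff =", teff_grid[i], "K, log g =", round(logg_grid[j], 1))
```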
Summary and Statistical Analysis of the First AIAA Sonic Boom Prediction Workshop
NASA Technical Reports Server (NTRS)
Park, Michael A.; Morgenstern, John M.
2014-01-01
A summary is provided for the First AIAA Sonic Boom Workshop held 11 January 2014 in conjunction with AIAA SciTech 2014. Near-field pressure signatures extracted from computational fluid dynamics solutions are gathered from nineteen participants representing three countries for the two required cases, an axisymmetric body and a simple delta wing body. Structured multiblock, unstructured mixed-element, unstructured tetrahedral, overset, and Cartesian cut-cell methods are used by the participants. Participants provided signatures computed on participant-generated and solution-adapted grids. Signatures are also provided for a series of uniformly refined workshop-provided grids. These submissions are propagated to the ground and loudness measures are computed. This allows the grid convergence of a loudness measure and a validation metric (the difference norm between computed and wind-tunnel-measured near-field signatures) to be studied for the first time. Statistical analysis is also presented for these measures. An optional configuration includes fuselage, wing, tail, flow-through nacelles, and blade sting. This full configuration exhibits more variation in eleven submissions than the sixty submissions provided for each required case. Recommendations are provided for potential improvements to the analysis methods and a possible subsequent workshop.
Variation Principles and Applications in the Study of Cell Structure and Aging
NASA Technical Reports Server (NTRS)
Economos, Angelos C.; Miquel, Jaime; Ballard, Ralph C.; Johnson, John E., Jr.
1981-01-01
In this report we have attempted to show that "some reality lies concealed in biological variation". This "reality" has its principles, laws, mechanisms, and rules, only a few of which we have sketched. A related idea we pursued was that important information may be lost in the process of ignoring frequency distributions of physiological variables (as is customary in experimental physiology and gerontology). We suggested that it may be advantageous to expand one's "statistical field of vision" beyond simple averages +/- standard deviations. Indeed, frequency distribution analysis may make visible some hidden information not evident from a simple qualitative analysis, particularly when the effect of some external factor or condition (e.g., aging, dietary chemicals) is being investigated. This was clearly illustrated by the application of distribution analysis in the study of variation in mouse liver cellular and fine structure, and may be true of fine structural studies in general. In living systems, structure and function interact in a dynamic way; they are "inseparable," unlike in technological systems or machines. Changes in fine structure therefore reflect changes in function. If such changes do not exceed a certain physiologic range, a quantitative analysis of structure will provide valuable information on quantitative changes in function that may not be possible or easy to measure directly. Because there is a large inherent variation in fine structure of cells in a given organ of an individual and among individuals, changes in fine structure can be analyzed only by studying frequency distribution curves of various structural characteristics (dimensions). Simple averages +/- S.D. do not in general reveal all information on the effect of a certain factor, because often this effect is not uniform; on the contrary, this will be apparent from distribution analysis because the form of the curves will be affected. We have also attempted to show in this chapter that similar general statistical principles and mechanisms may be operative in biological and technological systems. Despite the common belief that most biological and technological characteristics of interest have a symmetric bell-shaped (normal or Gaussian) distribution, we have shown that more often than not, distributions tend to be asymmetric and often resemble a so-called log-normal distribution. We saw that at least three general mechanisms may be operative, i.e., nonadditivity of influencing factors, competition among individuals for a common resource, and existence of an "optimum" value for a studied characteristic; more such mechanisms could exist.
Kakourou, Alexia; Vach, Werner; Nicolardi, Simone; van der Burgt, Yuri; Mertens, Bart
2016-10-01
Mass spectrometry based clinical proteomics has emerged as a powerful tool for high-throughput protein profiling and biomarker discovery. Recent improvements in mass spectrometry technology have boosted the potential of proteomic studies in biomedical research. However, the complexity of the proteomic expression introduces new statistical challenges in summarizing and analyzing the acquired data. Statistical methods for optimally processing proteomic data are currently a growing field of research. In this paper we present simple, yet appropriate methods to preprocess, summarize and analyze high-throughput MALDI-FTICR mass spectrometry data, collected in a case-control fashion, while dealing with the statistical challenges that accompany such data. The known statistical properties of the isotopic distribution of the peptide molecules are used to preprocess the spectra and translate the proteomic expression into a condensed data set. Information on either the intensity level or the shape of the identified isotopic clusters is used to derive summary measures on which diagnostic rules for disease status allocation will be based. Results indicate that both the shape of the identified isotopic clusters and the overall intensity level carry information on the class outcome and can be used to predict the presence or absence of the disease.
Metikaridis, T Damianos; Hadjipavlou, Alexander; Artemiadis, Artemios; Chrousos, George; Darviri, Christina
2016-05-20
Studies have shown that stress is implicated in the cause of neck pain (NP). The purpose of this study was to examine the effect of a simple, zero-cost stress management program on patients suffering from NP. This was a parallel-group randomized clinical study. People suffering from chronic non-specific NP were randomly assigned to an eight-week stress management program (N= 28) (including diaphragmatic breathing and progressive muscle relaxation) or to a no-intervention control condition (N= 25). Self-report measures were used to evaluate various variables at the beginning and at the end of the eight-week monitoring period. Descriptive and inferential statistical methods were used for the analysis. At the end of the monitoring period, the intervention group showed a statistically significant reduction in stress and anxiety (p= 0.03, p= 0.01), reported stress-related symptoms (p= 0.003), percentage of disability due to NP (p< 0.001) and NP intensity (p= 0.002). At the same time, daily routine satisfaction levels were elevated (p= 0.019). No statistically significant difference was observed in cortisol measurements. Stress management has positive effects on NP patients.
Statistics of Optical Coherence Tomography Data From Human Retina
de Juan, Joaquín; Ferrone, Claudia; Giannini, Daniela; Huang, David; Koch, Giorgio; Russo, Valentina; Tan, Ou; Bruni, Carlo
2010-01-01
Optical coherence tomography (OCT) has recently become one of the primary methods for noninvasive probing of the human retina. The pseudoimage formed by OCT (the so-called B-scan) varies probabilistically across pixels due to complexities in the measurement technique. Hence, sensitive automatic procedures of diagnosis using OCT may exploit statistical analysis of the spatial distribution of reflectance. In this paper, we perform a statistical study of retinal OCT data. We find that the stretched exponential probability density function can model well the distribution of intensities in OCT pseudoimages. Moreover, we show a small but significant correlation between neighboring pixels when measuring OCT intensities with pixels of about 5 µm. We then develop a simple joint probability model for the OCT data consistent with known retinal features. This model fits well the stretched exponential distribution of intensities and their spatial correlation. In normal retinas, fit parameters of this model are relatively constant along retinal layers, but vary across layers. However, in retinas with diabetic retinopathy, large spikes of parameter modulation interrupt the constancy within layers, exactly where pathologies are visible. We argue that these results give hope for improvement in statistical pathology-detection methods even when the disease is in its early stages.
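As a hedged illustration of the distributional claim, the sketch below fits a stretched-exponential density f(x) = exp(-(x/a)^beta) / (a*Gamma(1+1/beta)) to positive intensity-like data by maximum likelihood with scipy; the sample is synthetic and this parameterization is only one plausible reading of "stretched exponential", not necessarily the authors' model.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import gammaln

def neg_loglik(params, x):
    """Negative log-likelihood of f(x) = exp(-(x/a)**b) / (a * Gamma(1 + 1/b))."""
    log_a, log_b = params                      # optimize on the log scale for positivity
    a, b = np.exp(log_a), np.exp(log_b)
    log_norm = np.log(a) + gammaln(1.0 + 1.0 / b)
    return np.sum((x / a) ** b) + x.size * log_norm

rng = np.random.default_rng(2)
# Synthetic positive "OCT intensity" sample (a stand-in for real pixel data).
x = rng.weibull(0.8, size=5000) * 40.0

res = minimize(neg_loglik, x0=[np.log(x.mean()), 0.0], args=(x,), method="Nelder-Mead")
a_hat, b_hat = np.exp(res.x)
print("scale a =", round(a_hat, 2), " stretching exponent beta =", round(b_hat, 2))
```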
Simple Statistics: - Summarized!
ERIC Educational Resources Information Center
Blai, Boris, Jr.
Statistics are an essential tool for making proper judgement decisions. It is concerned with probability distribution models, testing of hypotheses, significance tests and other means of determining the correctness of deductions and the most likely outcome of decisions. Measures of central tendency include the mean, median and mode. A second…
Superordinate Shape Classification Using Natural Shape Statistics
ERIC Educational Resources Information Center
Wilder, John; Feldman, Jacob; Singh, Manish
2011-01-01
This paper investigates the classification of shapes into broad natural categories such as "animal" or "leaf". We asked whether such coarse classifications can be achieved by a simple statistical classification of the shape skeleton. We surveyed databases of natural shapes, extracting shape skeletons and tabulating their…
Rohrmeier, Martin A; Cross, Ian
2014-07-01
Humans rapidly learn complex structures in various domains. Findings of above-chance performance of some untrained control groups in artificial grammar learning studies raise questions about the extent to which learning can occur in an untrained, unsupervised testing situation with both correct and incorrect structures. The plausibility of unsupervised online-learning effects was modelled with n-gram, chunking and simple recurrent network models. A novel evaluation framework was applied, which alternates forced binary grammaticality judgments and subsequent learning of the same stimulus. Our results indicate a strong online learning effect for n-gram and chunking models and a weaker effect for simple recurrent network models. Such findings suggest that online learning is a plausible effect of statistical chunk learning that is possible when ungrammatical sequences contain a large proportion of grammatical chunks. Such common effects of continuous statistical learning may underlie statistical and implicit learning paradigms and raise implications for study design and testing methodologies. Copyright © 2014 Elsevier Inc. All rights reserved.
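A minimal sketch of the kind of unsupervised online-learning loop described, assuming a toy bigram (2-gram) model: each test string is judged by its mean smoothed bigram log-probability and then immediately used to update the counts, so grammatical chunks can accumulate without feedback. The strings, threshold and smoothing constants are hypothetical, not the study's grammar or models.

```python
from collections import defaultdict
from math import log

class Bigram:
    def __init__(self, alpha=0.5, vocab=6):
        self.counts = defaultdict(lambda: defaultdict(float))  # counts[a][b]
        self.alpha, self.vocab = alpha, vocab                   # add-alpha smoothing

    def logprob(self, seq):
        """Mean smoothed bigram log-probability, used as a graded grammaticality score."""
        lp = 0.0
        for a, b in zip(seq, seq[1:]):
            row = self.counts[a]
            lp += log((row[b] + self.alpha) / (sum(row.values()) + self.alpha * self.vocab))
        return lp / (len(seq) - 1)

    def update(self, seq):
        """Unsupervised online learning: absorb the string that was just judged."""
        for a, b in zip(seq, seq[1:]):
            self.counts[a][b] += 1.0

model = Bigram()
for item in ["ABCAB", "ABACB", "CABAB", "BBBCA"]:     # hypothetical test strings
    score = model.logprob(item)
    print(item, round(score, 2), "grammatical" if score > -1.6 else "ungrammatical")
    model.update(item)                                # learning without feedback
```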
Contingency and statistical laws in replicate microbial closed ecosystems.
Hekstra, Doeke R; Leibler, Stanislas
2012-05-25
Contingency, the persistent influence of past random events, pervades biology. To what extent, then, is each course of ecological or evolutionary dynamics unique, and to what extent are these dynamics subject to a common statistical structure? Addressing this question requires replicate measurements to search for emergent statistical laws. We establish a readily replicated microbial closed ecosystem (CES), sustaining its three species for years. We precisely measure the local population density of each species in many CES replicates, started from the same initial conditions and kept under constant light and temperature. The covariation among replicates of the three species densities acquires a stable structure, which could be decomposed into discrete eigenvectors, or "ecomodes." The largest ecomode dominates population density fluctuations around the replicate-average dynamics. These fluctuations follow simple power laws consistent with a geometric random walk. Thus, variability in ecological dynamics can be studied with CES replicates and described by simple statistical laws. Copyright © 2012 Elsevier Inc. All rights reserved.
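The "ecomode" decomposition described is, in effect, an eigen-decomposition of the across-replicate covariance of the species densities. The sketch below shows that step on synthetic replicate deviations; the data and the dominant-mode structure are invented for illustration and are not the CES measurements.

```python
import numpy as np

rng = np.random.default_rng(3)
n_reps, n_species = 60, 3
# Synthetic replicate-to-replicate deviations of log densities at one time point,
# with a dominant shared fluctuation mode plus independent noise.
shared = rng.normal(size=(n_reps, 1)) * np.array([[1.0, 0.7, -0.4]])
dev = shared + 0.3 * rng.normal(size=(n_reps, n_species))

cov = np.cov(dev, rowvar=False)                 # 3 x 3 covariance across replicates
eigvals, eigvecs = np.linalg.eigh(cov)          # returned in ascending order
order = eigvals.argsort()[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

print("variance explained:", np.round(eigvals / eigvals.sum(), 2))
print("leading ecomode (species loadings):", np.round(eigvecs[:, 0], 2))
```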
Synthetic data sets for the identification of key ingredients for RNA-seq differential analysis.
Rigaill, Guillem; Balzergue, Sandrine; Brunaud, Véronique; Blondet, Eddy; Rau, Andrea; Rogier, Odile; Caius, José; Maugis-Rabusseau, Cathy; Soubigou-Taconnat, Ludivine; Aubourg, Sébastien; Lurin, Claire; Martin-Magniette, Marie-Laure; Delannoy, Etienne
2018-01-01
Numerous statistical pipelines are now available for the differential analysis of gene expression measured with RNA-sequencing technology. Most of them are based on similar statistical frameworks after normalization, differing primarily in the choice of data distribution, mean and variance estimation strategy and data filtering. We propose an evaluation of the impact of these choices when few biological replicates are available through the use of synthetic data sets. This framework is based on real data sets and allows the exploration of various scenarios differing in the proportion of non-differentially expressed genes. Hence, it provides an evaluation of the key ingredients of the differential analysis, free of the biases associated with the simulation of data using parametric models. Our results show the relevance of a proper modeling of the mean by using linear or generalized linear modeling. Once the mean is properly modeled, the impact of the other parameters on the performance of the test is much less important. Finally, we propose to use the simple visualization of the raw P-value histogram as a practical evaluation criterion of the performance of differential analysis methods on real data sets. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
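The practical criterion proposed at the end, inspecting the raw P-value histogram, is easy to reproduce; a minimal sketch with synthetic P-values (mostly uniform nulls plus a spike of small values) follows.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(4)
# Synthetic raw P-values: ~90% null (uniform) genes plus 10% differentially expressed.
pvals = np.concatenate([rng.uniform(size=18000), rng.beta(0.2, 8.0, size=2000)])

plt.hist(pvals, bins=40, edgecolor="black")
plt.xlabel("raw P-value")
plt.ylabel("number of genes")
plt.title("Flat right tail plus a spike near 0 suggests a well-behaved test")
plt.show()
```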
Laparoscopic repair of perforated peptic ulcer: simple closure versus omentopexy.
Lin, Being-Chuan; Liao, Chien-Hung; Wang, Shang-Yu; Hwang, Tsann-Long
2017-12-01
This report presents our experience with laparoscopic repair performed in 118 consecutive patients diagnosed with a perforated peptic ulcer (PPU). We compared the surgical outcome of simple closure with modified Cellan-Jones omentopexy and report the safety and benefit of simple closure. From January 2010 to December 2014, 118 patients with PPU underwent laparoscopic repair with simple closure (n = 27) or omentopexy (n = 91). Charts were retrospectively reviewed for demographic characteristics and outcome. The data were compared by Fisher's exact test, Mann-Whitney U test, Pearson's chi-square test, and the Kruskal-Wallis test. The results were considered statistically significant if P < 0.05. No patients died, although three developed leakage. After matching, the simple closure and omentopexy groups were similar in sex, systolic blood pressure, pulse rate, respiratory rate, Boey score, Charlson comorbidity index, Mannheim peritonitis index, and leakage. There were statistically significant differences in age, length of hospital stay, perforation size, and operating time. Comparison of the operating time in the ≤4.0 mm and 5.0-12 mm groups revealed that simple closure took less time than omentopexy in both groups (≤4.0 mm, 76 versus 133 minutes, P < 0.0001; 5.0-12 mm, 97 versus 139.5 minutes; P = 0.006). Compared with omentopexy, laparoscopic simple closure is a safe procedure and shortens the operating time. Copyright © 2017 The Author(s). Published by Elsevier Inc. All rights reserved.
Kelley, George A.; Kelley, Kristi S.; Kohrt, Wendy M.
2013-01-01
Objective. Examine the effects of exercise on femoral neck (FN) and lumbar spine (LS) bone mineral density (BMD) in premenopausal women. Methods. Meta-analysis of randomized controlled exercise trials ≥24 weeks in premenopausal women. Standardized effect sizes (g) were calculated for each result and pooled using random-effects models, Z score alpha values, 95% confidence intervals (CIs), and number needed to treat (NNT). Heterogeneity was examined using Q and I2. Moderator and predictor analyses using mixed-effects ANOVA and simple metaregression were conducted. Statistical significance was set at P ≤ 0.05. Results. Statistically significant improvements were found for both FN (7 g's, 466 participants, g = 0.342, 95% CI = 0.132, 0.553, P = 0.001, Q = 10.8, P = 0.22, I2 = 25.7%, NNT = 5) and LS (6 g's, 402 participants, g = 0.201, 95% CI = 0.009, 0.394, P = 0.04, Q = 3.3, P = 0.65, I2 = 0%, NNT = 9) BMD. A trend for greater benefits in FN BMD was observed for studies published in countries other than the United States and for those who participated in home versus facility-based exercise. Statistically significant, or a trend for statistically significant, associations were observed for 7 different moderators and predictors, 6 for FN BMD and 1 for LS BMD. Conclusions. Exercise benefits FN and LS BMD in premenopausal women. The observed moderators and predictors deserve further investigation in well-designed randomized controlled trials.
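For readers unfamiliar with the machinery behind such pooled estimates, the sketch below shows a standard DerSimonian-Laird random-effects pooling of standardized effect sizes with Q, I2 and a 95% CI; the effect sizes are hypothetical and are not the trial data meta-analyzed here.

```python
import numpy as np

# Hypothetical standardized effect sizes (g) and their variances from k trials.
g = np.array([0.41, 0.22, 0.55, 0.10, 0.38, 0.30, 0.47])
v = np.array([0.04, 0.05, 0.06, 0.03, 0.05, 0.04, 0.06])

w = 1 / v
fixed = np.sum(w * g) / np.sum(w)
Q = np.sum(w * (g - fixed) ** 2)
df = g.size - 1
I2 = max(0.0, (Q - df) / Q) * 100
C = np.sum(w) - np.sum(w ** 2) / np.sum(w)
tau2 = max(0.0, (Q - df) / C)                    # DerSimonian-Laird estimator

w_star = 1 / (v + tau2)
pooled = np.sum(w_star * g) / np.sum(w_star)
se = np.sqrt(1 / np.sum(w_star))
print(f"pooled g = {pooled:.3f} (95% CI {pooled - 1.96*se:.3f} to {pooled + 1.96*se:.3f})")
print(f"Q = {Q:.2f}, I2 = {I2:.1f}%, tau^2 = {tau2:.4f}")
```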
Hood of the truck statistics for food animal practitioners.
Slenning, Barrett D
2006-03-01
This article offers some tips on working with statistics and develops four relatively simple procedures to deal with most kinds of data with which veterinarians work. The criterion for a procedure to be a "Hood of the Truck Statistics" (HOT Stats) technique is that it must be simple enough to be done with pencil, paper, and a calculator. The goal of HOT Stats is to have the tools available to run quick analyses in only a few minutes so that decisions can be made in a timely fashion. The discipline allows us to move away from the all-too-common guess work about effects and differences we perceive following a change in treatment or management. The techniques allow us to move toward making more defensible, credible, and more quantifiably "risk-aware" real-time recommendations to our clients.
Proceedings, Seminar on Probabilistic Methods in Geotechnical Engineering
NASA Astrophysics Data System (ADS)
Hynes-Griffin, M. E.; Buege, L. L.
1983-09-01
Contents: Applications of Probabilistic Methods in Geotechnical Engineering; Probabilistic Seismic and Geotechnical Evaluation at a Dam Site; Probabilistic Slope Stability Methodology; Probability of Liquefaction in a 3-D Soil Deposit; Probabilistic Design of Flood Levees; Probabilistic and Statistical Methods for Determining Rock Mass Deformability Beneath Foundations: An Overview; Simple Statistical Methodology for Evaluating Rock Mechanics Exploration Data; New Developments in Statistical Techniques for Analyzing Rock Slope Stability.
Larson, Nicholas B; McDonnell, Shannon; Cannon Albright, Lisa; Teerlink, Craig; Stanford, Janet; Ostrander, Elaine A; Isaacs, William B; Xu, Jianfeng; Cooney, Kathleen A; Lange, Ethan; Schleutker, Johanna; Carpten, John D; Powell, Isaac; Bailey-Wilson, Joan E; Cussenot, Olivier; Cancel-Tassin, Geraldine; Giles, Graham G; MacInnis, Robert J; Maier, Christiane; Whittemore, Alice S; Hsieh, Chih-Lin; Wiklund, Fredrik; Catalona, William J; Foulkes, William; Mandal, Diptasri; Eeles, Rosalind; Kote-Jarai, Zsofia; Ackerman, Michael J; Olson, Timothy M; Klein, Christopher J; Thibodeau, Stephen N; Schaid, Daniel J
2017-05-01
Next-generation sequencing technologies have afforded unprecedented characterization of low-frequency and rare genetic variation. Due to low power for single-variant testing, aggregative methods are commonly used to combine observed rare variation within a single gene. Causal variation may also aggregate across multiple genes within relevant biomolecular pathways. Kernel-machine regression and adaptive testing methods for aggregative rare-variant association testing have been demonstrated to be powerful approaches for pathway-level analysis, although these methods tend to be computationally intensive at high-variant dimensionality and require access to complete data. An additional analytical issue in scans of large pathway definition sets is multiple testing correction. Gene set definitions may exhibit substantial genic overlap, and the impact of the resultant correlation in test statistics on Type I error rate control for large agnostic gene set scans has not been fully explored. Herein, we first outline a statistical strategy for aggregative rare-variant analysis using component gene-level linear kernel score test summary statistics as well as derive simple estimators of the effective number of tests for family-wise error rate control. We then conduct extensive simulation studies to characterize the behavior of our approach relative to direct application of kernel and adaptive methods under a variety of conditions. We also apply our method to two case-control studies, respectively, evaluating rare variation in hereditary prostate cancer and schizophrenia. Finally, we provide open-source R code for public use to facilitate easy application of our methods to existing rare-variant analysis results. © 2017 WILEY PERIODICALS, INC.
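The sketch below illustrates one simple eigenvalue-based estimate of the effective number of tests (a Galwey-type estimator computed from the correlation matrix of gene-set statistics) followed by a Bonferroni-style adjustment; it is offered as a generic example of the idea, not the estimator derived in the paper, and the statistics are simulated.

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic matrix of gene-set score statistics across many replicates/permutations,
# with correlation induced by shared underlying signals (e.g. overlapping genes).
n_rep, n_sets = 500, 40
base = rng.normal(size=(n_rep, 10))
stats = base @ rng.normal(size=(10, n_sets)) + rng.normal(size=(n_rep, n_sets))

R = np.corrcoef(stats, rowvar=False)
lam = np.clip(np.linalg.eigvalsh(R), 0, None)

# Galwey-type estimator of the effective number of independent tests.
m_eff = np.sum(np.sqrt(lam)) ** 2 / np.sum(lam)
alpha_adj = 0.05 / m_eff                        # Bonferroni at the effective count
print(f"nominal tests = {n_sets}, effective tests ~ {m_eff:.1f}, adjusted alpha ~ {alpha_adj:.4f}")
```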
LOD significance thresholds for QTL analysis in experimental populations of diploid species
Van Ooijen JW
1999-11-01
Linkage analysis with molecular genetic markers is a very powerful tool in the biological research of quantitative traits. However, without an easy way to determine which areas of the genome can be designated as statistically significant for containing a gene affecting the quantitative trait of interest, the rate of false positives is difficult to predict. In this paper four tables, obtained by large-scale simulations, are presented that can be used with a simple formula to obtain the false-positive rate for analyses of the standard types of experimental populations of diploid species with any genome size. A new definition of the term 'suggestive linkage' is proposed that allows a more objective comparison of results across species.
Market inefficiency identified by both single and multiple currency trends
NASA Astrophysics Data System (ADS)
Tokár, T.; Horváth, D.
2012-11-01
Many studies have shown that there are good reasons to claim very low predictability of currency returns; nevertheless, deviations from true randomness exist which have potential predictive and prognostic power [J. James, Simple trend-following strategies in currency trading, Quantitative Finance 3 (2003) C75-C77]. We analyze the local trends that are the main focus of technical analysis. In this article we introduce various statistical quantities that examine the role of a single temporally discretized trend, or of a multitude of grouped trends corresponding to different time delays. Our analysis, based predominantly on Euro-dollar currency pair data at the one-minute frequency, suggests the importance of the cumulative nonrandom effect of trends on potential forecasting performance.
Data survey on the effect of product features on competitive advantage of selected firms in Nigeria.
Olokundun, Maxwell; Iyiola, Oladele; Ibidunni, Stephen; Falola, Hezekiah; Salau, Odunayo; Amaihian, Augusta; Peter, Fred; Borishade, Taiye
2018-06-01
The main objective of this study was to present a data article that investigates the effect of product features on a firm's competitive advantage. Few studies have examined how the features of a product can help drive the competitive advantage of a firm. A descriptive research method was used. The Statistical Package for the Social Sciences (SPSS 22) was used to analyse one hundred and fifty (150) valid questionnaires completed by small business owners registered with the Small and Medium Enterprises Development Agency of Nigeria (SMEDAN). Stratified and simple random sampling techniques were employed; reliability and validity procedures were also confirmed. The field data set is made publicly available to enable critical or extended analysis.
Acoustic environmental accuracy requirements for response determination
NASA Technical Reports Server (NTRS)
Pettitt, M. R.
1983-01-01
A general purpose computer program was developed for the prediction of vehicle interior noise. This program, named VIN, has both modal and statistical energy analysis capabilities for structural/acoustic interaction analysis. The analytic models and their computer implementation were verified through simple test cases with well-defined experimental results. The model was also applied in a space shuttle payload bay launch acoustics prediction study. The computer program processes large and small problems with equal efficiency because all arrays are dynamically sized by program input variables at run time. A data base is built and easily accessed for design studies. The data base significantly reduces the computational costs of such studies by allowing the reuse of the still-valid calculated parameters of previous iterations.
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-01-01
Aims: A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R2), using R2 as the primary metric of assay agreement. However, the use of R2 alone does not adequately quantify constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. Methods: We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing (NGS) assays. NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Results: Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. The Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. Conclusions: The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of performance characteristics of quantitative molecular assays, prior to implementation in the clinical molecular laboratory.
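A minimal numpy sketch of the two analyses named in the abstract, Bland-Altman limits of agreement and a Deming regression assuming equal error variances in the two methods, applied to synthetic paired measurements standing in for NGS-derived quantitative values.

```python
import numpy as np

rng = np.random.default_rng(6)
truth = rng.uniform(5, 50, size=60)                        # e.g. variant allele fractions (%)
x = truth + rng.normal(scale=1.5, size=60)                 # previously validated method
y = 1.03 * truth + 0.8 + rng.normal(scale=1.5, size=60)    # method under validation

# Bland-Altman: bias and 95% limits of agreement.
diff = y - x
bias, sd = diff.mean(), diff.std(ddof=1)
print(f"bias = {bias:.2f}, limits of agreement = {bias - 1.96*sd:.2f} to {bias + 1.96*sd:.2f}")

# Deming regression assuming equal error variance in both methods.
sxx, syy = np.var(x, ddof=1), np.var(y, ddof=1)
sxy = np.cov(x, y, ddof=1)[0, 1]
slope = (syy - sxx + np.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
intercept = y.mean() - slope * x.mean()
print(f"Deming fit: y = {intercept:.2f} + {slope:.3f} x "
      "(intercept ~ constant error, slope != 1 ~ proportional error)")
```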
NASA Astrophysics Data System (ADS)
Wang, Audrey; Price, David T.
2007-03-01
A simple integrated algorithm was developed to relate global climatology to distributions of tree plant functional types (PFT). Multivariate cluster analysis was performed to analyze the statistical homogeneity of the climate space occupied by individual tree PFTs. Forested regions identified from the satellite-based GLC2000 classification were separated into tropical, temperate, and boreal sub-PFTs for use in the Canadian Terrestrial Ecosystem Model (CTEM). Global data sets of monthly minimum temperature, growing degree days, an index of climatic moisture, and estimated PFT cover fractions were then used as variables in the cluster analysis. The statistical results for individual PFT clusters were found consistent with other global-scale classifications of dominant vegetation. As an improvement of the quantification of the climatic limitations on PFT distributions, the results also demonstrated overlapping of PFT cluster boundaries that reflected vegetation transitions, for example, between tropical and temperate biomes. The resulting global database should provide a better basis for simulating the interaction of climate change and terrestrial ecosystem dynamics using global vegetation models.
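A hedged toy version of the clustering step, assuming three climate variables per forested grid cell (minimum temperature, growing degree days, a moisture index): standardize, run k-means with scikit-learn, and inspect the climate centroids of the resulting clusters. The data and variable scales are invented, not the GLC2000/CTEM inputs.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(7)
# Synthetic climate space for forested grid cells: [Tmin (deg C), GDD, moisture index].
climate = np.vstack([
    rng.normal([ 18, 6000, 0.9], [3, 500, 0.10], size=(300, 3)),   # tropical-like
    rng.normal([  2, 3000, 0.6], [4, 600, 0.15], size=(300, 3)),   # temperate-like
    rng.normal([-15, 1200, 0.5], [5, 400, 0.15], size=(300, 3)),   # boreal-like
])

Xz = StandardScaler().fit_transform(climate)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(Xz)
for k in range(3):
    centre = climate[labels == k].mean(axis=0)
    print(f"cluster {k}: Tmin={centre[0]:.1f}, GDD={centre[1]:.0f}, moisture={centre[2]:.2f}")
```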
Xu, Chet C; Chan, Roger W; Sun, Han; Zhan, Xiaowei
2017-11-01
A mixed-effects model approach was introduced in this study for the statistical analysis of rheological data of vocal fold tissues, in order to account for the data correlation caused by multiple measurements of each tissue sample across the test frequency range. Such data correlation had often been overlooked in previous studies in the past decades. The viscoelastic shear properties of the vocal fold lamina propria of two commonly used laryngeal research animal species (i.e. rabbit, porcine) were measured by a linear, controlled-strain simple-shear rheometer. Along with published canine and human rheological data, the vocal fold viscoelastic shear moduli of these animal species were compared to those of human over a frequency range of 1-250Hz using the mixed-effects models. Our results indicated that tissues of the rabbit, canine and porcine vocal fold lamina propria were significantly stiffer and more viscous than those of human. Mixed-effects models were shown to be able to more accurately analyze rheological data generated from repeated measurements. Copyright © 2017 Elsevier Ltd. All rights reserved.
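A sketch of the modelling idea, assuming a statsmodels mixed-effects model with a random intercept per tissue sample to absorb the correlation among repeated measurements across the frequency sweep; the data, variable names and species offsets are simulated, not the study's rheometry.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(8)
rows = []
for species, offset in [("human", 0.0), ("rabbit", 0.6), ("porcine", 0.8)]:
    for sample in range(8):
        u = rng.normal(scale=0.3)                       # random intercept per tissue sample
        for f in np.geomspace(1, 250, 12):              # test frequencies (Hz)
            log_g = 1.0 + offset + 0.8 * np.log10(f) + u + rng.normal(scale=0.1)
            rows.append({"species": species, "sample": f"{species}{sample}",
                         "log_freq": np.log10(f), "log_modulus": log_g})
data = pd.DataFrame(rows)

# The random intercept for each sample handles the within-sample correlation
# created by measuring the same specimen across the whole frequency range.
model = smf.mixedlm("log_modulus ~ log_freq + C(species)", data, groups=data["sample"])
result = model.fit()
print(result.summary())
```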
Success rates of a skeletal anchorage system in orthodontics: A retrospective analysis.
Lam, Raymond; Goonewardene, Mithran S; Allan, Brent P; Sugawara, Junji
2018-01-01
To evaluate the premise that skeletal anchorage with SAS miniplates is highly successful and predictable for a range of complex orthodontic movements. This retrospective cross-sectional analysis consisted of 421 bone plates placed by one clinician in 163 patients (95 female, 68 male; mean age 29.4 ± 12.02 years). Simple descriptive statistics were computed for a wide range of malocclusions and desired movements to obtain success, complication, and failure rates. The success rate of skeletal anchorage system miniplates was 98.6%, and approximately 40% of cases experienced mild complications. The most common complication was soft tissue inflammation, which was amenable to focused oral hygiene and antiseptic rinses. Infection occurred in approximately 15% of patients and showed a statistically significant correlation with poor oral hygiene. The most common movements were distalization and intrusion of teeth. More than a third of the cases involved complex movements in more than one plane of space. The success rate of skeletal anchorage system miniplates is high and predictable for a wide range of complex orthodontic movements.
NASA Astrophysics Data System (ADS)
Kim, Hyun-Sil; Kim, Jae-Seung; Lee, Seong-Hyun; Seo, Yun-Ho
2014-12-01
Insertion loss prediction of large acoustical enclosures using Statistical Energy Analysis (SEA) method is presented. The SEA model consists of three elements: sound field inside the enclosure, vibration energy of the enclosure panel, and sound field outside the enclosure. It is assumed that the space surrounding the enclosure is sufficiently large so that there is no energy flow from the outside to the wall panel or to air cavity inside the enclosure. The comparison of the predicted insertion loss to the measured data for typical large acoustical enclosures shows good agreements. It is found that if the critical frequency of the wall panel falls above the frequency region of interest, insertion loss is dominated by the sound transmission loss of the wall panel and averaged sound absorption coefficient inside the enclosure. However, if the critical frequency of the wall panel falls into the frequency region of interest, acoustic power from the sound radiation by the wall panel must be added to the acoustic power from transmission through the panel.
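For context, a generic steady-state SEA power balance for the three subsystems named (cavity inside the enclosure, enclosure panel, exterior field) can be written as omega*L*E = P_in and solved for the subsystem energies; the loss factors, coupling factors and input power below are made-up illustrative numbers, not the paper's model, and the exterior is treated as a heavily damped sink so no energy flows back in.

```python
import numpy as np

omega = 2 * np.pi * 1000.0            # analysis band centre frequency (rad/s)

# Hypothetical loss factors for: 0 = air cavity inside the enclosure,
# 1 = enclosure wall panel, 2 = exterior sound field (a strongly damped sink).
eta = np.array([0.05, 0.01, 1.0])                    # damping loss factors
eta_c = np.array([[0.0,   0.002, 0.001],             # eta_c[i, j] = coupling i -> j
                  [0.001, 0.0,   0.004],
                  [0.0,   0.0,   0.0]])              # no energy returns from outside

# Steady-state SEA power balance: omega * L @ E = P_in.
L = np.diag(eta + eta_c.sum(axis=1)) - eta_c.T
P_in = np.array([1.0, 0.0, 0.0])                     # 1 W injected into the cavity

E = np.linalg.solve(omega * L, P_in)                 # subsystem energies (J)
P_out = omega * eta[2] * E[2]                        # power ending up "outside"
print("subsystem energies:", np.round(E, 6))
print("input-to-radiated power ratio (dB):", round(10 * np.log10(P_in[0] / P_out), 1))
```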
Predictor of increase in caregiver burden for disabled elderly at home.
Okamoto, Kazushi; Harasawa, Yuko
2009-01-01
To identify early those caregivers at high risk of an increase in burden, linear discriminant analysis was performed to obtain an effective discriminant model for differentiating the presence or absence of an increase in caregiver burden. Data obtained by self-administered questionnaire from 193 caregivers of frail elderly persons between January and February 2005 were used. The discriminant analysis yielded a statistically significant function explaining 35.0% of the variance (Rc=0.59; d.f.=6; p=0.0001). The psychological predictors that made statistically significant contributions to the differentiation between no increase and an increase in caregiver burden were high perceived stress (1.47), high caregiver burden at baseline (1.28), emotional control (0.75), effort to achieve (-0.28), symptomatic depression (0.20) and "ikigai" (purpose in life) (0.18). The discriminant function showed a sensitivity of 86% and specificity of 81%, and correctly classified 83% of the caregivers. The baseline function is a simple and useful method for screening for an increase in caregiver burden among caregivers of the frail elderly at home.
Urban Land Cover Mapping Accuracy Assessment - A Cost-benefit Analysis Approach
NASA Astrophysics Data System (ADS)
Xiao, T.
2012-12-01
One of the most important components in urban land cover mapping is mapping accuracy assessment. Many statistical models have been developed to help design simple sampling schemes based on both accuracy and confidence levels. It is intuitive that an increased number of samples increases the accuracy as well as the cost of an assessment. Understanding cost and sample size is therefore crucial for implementing efficient and effective field data collection. Few studies have included a cost calculation component as part of the assessment. In this study, a cost-benefit sampling analysis model was created by combining sample size design and sampling cost calculation. The sampling cost included transportation cost, field data collection cost, and laboratory data analysis cost. Simple Random Sampling (SRS) and Modified Systematic Sampling (MSS) methods were used to design sample locations and to extract land cover data in ArcGIS. High resolution land cover data layers of Denver, CO and Sacramento, CA, street networks, and parcel GIS data layers were used in this study to test and verify the model. The relationship between cost and accuracy was used to determine the effectiveness of each sampling method. The results of this study can be applied to other environmental studies that require spatial sampling.
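A toy version of the cost-benefit idea, assuming a standard sample-size formula for estimating overall map accuracy within a margin of error and a simple per-sample cost made of transport, field collection and laboratory components; all unit costs and the assumed accuracy are hypothetical.

```python
import math
from scipy.stats import norm

def sample_size(p=0.85, margin=0.05, confidence=0.95):
    """Samples needed to estimate overall map accuracy p within +/- margin."""
    z = norm.ppf(1 - (1 - confidence) / 2)
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def sampling_cost(n, transport=12.0, field=8.0, lab=5.0):
    """Hypothetical per-sample unit costs (transport, field collection, lab analysis)."""
    return n * (transport + field + lab)

for margin in (0.10, 0.05, 0.025):
    n = sample_size(margin=margin)
    print(f"margin +/-{margin:.3f}: n = {n:4d}, cost ~ ${sampling_cost(n):,.0f}")
```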
Drew, Mark S.
2016-01-01
Cutaneous melanoma is the most life-threatening form of skin cancer. Although advanced melanoma is often considered as incurable, if detected and excised early, the prognosis is promising. Today, clinicians use computer vision in an increasing number of applications to aid early detection of melanoma through dermatological image analysis (dermoscopy images, in particular). Colour assessment is essential for the clinical diagnosis of skin cancers. Due to this diagnostic importance, many studies have either focused on or employed colour features as a constituent part of their skin lesion analysis systems. These studies range from using low-level colour features, such as simple statistical measures of colours occurring in the lesion, to availing themselves of high-level semantic features such as the presence of blue-white veil, globules, or colour variegation in the lesion. This paper provides a retrospective survey and critical analysis of contributions in this research direction.
Quantitation & Case-Study-Driven Inquiry to Enhance Yeast Fermentation Studies
ERIC Educational Resources Information Center
Grammer, Robert T.
2012-01-01
We propose a procedure for the assay of fermentation in yeast in microcentrifuge tubes that is simple and rapid, permitting assay replicates, descriptive statistics, and the preparation of line graphs that indicate reproducibility. Using regression and simple derivatives to determine initial velocities, we suggest methods to compare the effects of…
A Simple Statistical Thermodynamics Experiment
ERIC Educational Resources Information Center
LoPresto, Michael C.
2010-01-01
Comparing the predicted and actual rolls of combinations of both two and three dice can help to introduce many of the basic concepts of statistical thermodynamics, including multiplicity, probability, microstates, and macrostates, and demonstrate that entropy is indeed a measure of randomness, that disordered states (those of higher entropy) are…
ERIC Educational Resources Information Center
Harris, Ronald M.
1978-01-01
Presents material dealing with an application of statistical thermodynamics to the diatomic solid I2(s). The objective is to enhance the student's appreciation of the power of the statistical formulation of thermodynamics. The Simple Einstein Model is used. (Author/MA)
Entropy for Mechanically Vibrating Systems
NASA Astrophysics Data System (ADS)
Tufano, Dante
The research contained within this thesis deals with the subject of entropy as defined for and applied to mechanically vibrating systems. This work begins with an overview of entropy as it is understood in the fields of classical thermodynamics, information theory, statistical mechanics, and statistical vibroacoustics. Khinchin's definition of entropy, which is the primary definition used for the work contained in this thesis, is introduced in the context of vibroacoustic systems. The main goal of this research is to establish a mathematical framework for the application of Khinchin's entropy in the field of statistical vibroacoustics by examining the entropy context of mechanically vibrating systems. The introduction of this thesis provides an overview of statistical energy analysis (SEA), a modeling approach to vibroacoustics that motivates this work on entropy. The objective of this thesis is given, followed by a discussion of the intellectual merit of this work as well as a literature review of relevant material. Following the introduction, an entropy analysis of systems of coupled oscillators is performed utilizing Khinchin's definition of entropy. This analysis builds upon the mathematical theory relating to mixing entropy, which is generated by the coupling of vibroacoustic systems. The mixing entropy is shown to provide insight into the qualitative behavior of such systems. Additionally, it is shown that the entropy inequality property of Khinchin's entropy can be reduced to an equality using the mixing entropy concept. This equality can be interpreted as a facet of the second law of thermodynamics for vibroacoustic systems. Following this analysis, an investigation of continuous systems is performed using Khinchin's entropy. It is shown that entropy analyses using Khinchin's entropy are valid for continuous systems that can be decomposed into a finite number of modes. The results are shown to be analogous to those obtained for simple oscillators, which demonstrates the applicability of entropy-based approaches to real-world systems. Three systems are considered to demonstrate these findings: 1) a rod end-coupled to a simple oscillator, 2) two end-coupled rods, and 3) two end-coupled beams. The aforementioned work utilizes the weak coupling assumption to determine the entropy of composite systems. Following this discussion, a direct method of finding entropy is developed which does not rely on this limiting assumption. The resulting entropy provides a useful benchmark for evaluating the accuracy of the weak coupling approach, and is validated using systems of coupled oscillators. The later chapters of this work discuss Khinchin's entropy as applied to nonlinear and nonconservative systems, respectively. The discussion of entropy for nonlinear systems is motivated by the desire to expand the applicability of SEA techniques beyond the linear regime. The discussion of nonconservative systems is also crucial, since real-world systems interact with their environment, and it is necessary to confirm the validity of an entropy approach for systems that are relevant in the context of SEA. Having developed a mathematical framework for determining entropy under a number of previously unexplored cases, the relationship between thermodynamics and statistical vibroacoustics can be better understood. Specifically, vibroacoustic temperatures can be obtained for systems that are not necessarily linear or weakly coupled.
In this way, entropy provides insight into how the power flow proportionality of statistical energy analysis (SEA) can be applied to a broader class of vibroacoustic systems. As such, entropy is a useful tool for both justifying and expanding the foundational results of SEA.
Metaplot: a novel Stata graph for assessing heterogeneity at a glance.
Poorolajal, J; Mahmoodi, M; Majdzadeh, R; Fotouhi, A
2010-01-01
Heterogeneity is usually a major concern in meta-analysis. Although there are some statistical approaches for assessing variability across studies, here we present a new approach to heterogeneity using "MetaPlot", which investigates the influence of a single study on the overall heterogeneity. MetaPlot is a two-way (x, y) graph that can be considered a complementary graphical approach for testing heterogeneity. This method shows graphically as well as numerically the results of an influence analysis, in which Higgins' I2 statistic with 95% confidence interval (CI) is computed omitting one study in each turn and is then plotted against the reciprocal of the standard error (1/SE), or "precision". In this graph, 1/SE lies on the x-axis and the I2 results lie on the y-axis. From a first glance at MetaPlot, one can predict to what extent omission of a single study may influence the overall heterogeneity. The precision on the x-axis enables us to distinguish the size of each trial. The graph describes the I2 statistic with 95% CI graphically as well as numerically in one view for prompt comparison. It is possible to implement MetaPlot for meta-analyses of different types of outcome data and summary measures. This method presents a simple graphical approach to identify an outlier and its effect on overall heterogeneity at a glance. We wish to suggest MetaPlot to Stata experts to prepare its module for the software.
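A minimal sketch of the plot described, assuming hypothetical study effects and standard errors: Higgins' I2 is recomputed with each study omitted in turn and plotted against the omitted study's precision (1/SE). For brevity the 95% CIs around each I2, which the full MetaPlot displays, are left out.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical study effects (log odds ratios) and standard errors.
effect = np.array([0.30, 0.25, 0.41, 0.90, 0.28, 0.35, 0.22])
se = np.array([0.12, 0.15, 0.10, 0.11, 0.20, 0.14, 0.18])

def i_squared(y, s):
    w = 1 / s ** 2
    fixed = np.sum(w * y) / np.sum(w)
    Q = np.sum(w * (y - fixed) ** 2)
    return max(0.0, (Q - (y.size - 1)) / Q) * 100

i2_loo = np.array([i_squared(np.delete(effect, k), np.delete(se, k))
                   for k in range(effect.size)])

plt.scatter(1 / se, i2_loo)
plt.axhline(i_squared(effect, se), linestyle="--", label="I2, all studies")
plt.xlabel("precision (1/SE) of the omitted study")
plt.ylabel("I2 (%) with that study omitted")
plt.legend()
plt.show()   # the 95% CIs around each I2 would be added in the full MetaPlot
```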
Allele-sharing models: LOD scores and accurate linkage tests.
Kong, A; Cox, N J
1997-11-01
Starting with a test statistic for linkage analysis based on allele sharing, we propose an associated one-parameter model. Under general missing-data patterns, this model allows exact calculation of likelihood ratios and LOD scores and has been implemented by a simple modification of existing software. Most important, accurate linkage tests can be performed. Using an example, we show that some previously suggested approaches to handling less than perfectly informative data can be unacceptably conservative. Situations in which this model may not perform well are discussed, and an alternative model that requires additional computations is suggested.
Continuous distribution of emission states from single CdSe/ZnS quantum dots.
Zhang, Kai; Chang, Hauyee; Fu, Aihua; Alivisatos, A Paul; Yang, Haw
2006-04-01
The photoluminescence dynamics of colloidal CdSe/ZnS/streptavidin quantum dots were studied using time-resolved single-molecule spectroscopy. Statistical tests of the photon-counting data suggested that the simple "on/off" discrete state model is inconsistent with experimental results. Instead, a continuous emission state distribution model was found to be more appropriate. Autocorrelation analysis of lifetime and intensity fluctuations showed a nonlinear correlation between them. These results were consistent with the model that charged quantum dots were also emissive, and that time-dependent charge migration gave rise to the observed photoluminescence dynamics.
Invariant approach to the character classification
NASA Astrophysics Data System (ADS)
Šariri, Kristina; Demoli, Nazif
2008-04-01
Image moment analysis is a very useful tool which allows image description invariant to translation, rotation, scale change, and some types of image distortion. The aim of this work was the development of a simple method for fast and reliable classification of characters using Hu's and affine moment invariants. The Euclidean distance was used as the discrimination feature, with its statistical parameters estimated. The method was tested on the classification of Times New Roman font letters as well as sets of handwritten characters. It is shown that using all Hu's invariants and three affine invariants as the discrimination set improves the recognition rate by 30%.
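As a rough illustration of the classification step (not the authors' implementation), the sketch below computes log-scaled Hu moment invariants with OpenCV for synthetic binary shapes standing in for character bitmaps and assigns a query shape to the nearest template by Euclidean distance.

```python
# Illustrative sketch only: Hu moment invariants plus nearest-neighbour
# classification by Euclidean distance. Shapes are synthetic stand-ins.
import numpy as np
import cv2

def hu_features(img):
    """Log-scaled Hu moment invariants of a binary image."""
    hu = cv2.HuMoments(cv2.moments(img, binaryImage=True)).flatten()
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)

def make_shape(kind, shift=(0, 0)):
    img = np.zeros((64, 64), dtype=np.uint8)
    r, c = shift
    if kind == "bar":            # a thin vertical bar, roughly an 'I'
        img[10 + r:54 + r, 28 + c:36 + c] = 1
    else:                        # a square blob, roughly an 'O'
        img[18 + r:46 + r, 18 + c:46 + c] = 1
    return img

templates = {k: hu_features(make_shape(k)) for k in ("bar", "blob")}
query = hu_features(make_shape("bar", shift=(4, -6)))   # translated 'bar'

label = min(templates, key=lambda k: np.linalg.norm(templates[k] - query))
print("classified as:", label)
```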
Noise properties in the ideal Kirchhoff-Law-Johnson-Noise secure communication system.
Gingl, Zoltan; Mingesz, Robert
2014-01-01
In this paper we determine the noise properties needed for unconditional security in the ideal Kirchhoff-Law-Johnson-Noise (KLJN) secure key distribution system using simple statistical analysis. It has already been shown using physical laws that resistors and Johnson-like noise sources provide unconditional security. However, real implementations use artificial noise generators, so it is a question whether other kinds of noise sources and resistor values could be used as well. We answer this question and at the same time provide a theoretical basis for analyzing real systems as well.
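The elementary statistics underlying such an analysis can be illustrated with a toy calculation: Johnson-noise voltage samples have variance 4kTRB, so a simple variance estimate recovers the resistance. The sketch below shows only that statistical step, not the paper's security analysis; the bandwidth, temperature, and resistor values are assumptions.

```python
# Rough illustration: estimate a resistance from sampled Johnson noise via
# its variance, var(V) = 4*k*T*R*B. All parameter values are invented.
import numpy as np

k_B = 1.380649e-23   # Boltzmann constant, J/K
T = 300.0            # temperature, K
B = 1.0e4            # assumed measurement bandwidth, Hz
rng = np.random.default_rng(1)

def johnson_samples(R, n=200_000):
    sigma = np.sqrt(4 * k_B * T * R * B)
    return rng.normal(0.0, sigma, n)

for R_true in (1e3, 1e5):                      # two illustrative resistor values
    v = johnson_samples(R_true)
    R_est = np.var(v) / (4 * k_B * T * B)      # invert the Johnson formula
    print(f"R_true = {R_true:.0e}  R_est = {R_est:.3e}")
```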
Operating a Geiger Müller tube using a PC sound card
NASA Astrophysics Data System (ADS)
Azooz, A. A.
2009-01-01
In this paper, a simple MATLAB-based PC program that enables the computer to function as a replacement for the electronic scaler-counter system associated with a Geiger-Müller (GM) tube is described. The program utilizes the ability of MATLAB to acquire data directly from the computer sound card. The signal from the GM tube is applied to the computer sound card via the line-in port. All standard GM experiments, including pulse-shape and statistical-analysis experiments, can be carried out using this system. A new visual demonstration of dead-time effects is also presented.
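A minimal Python stand-in for the pulse-counting and statistical-analysis part is sketched below (the original program is in MATLAB and reads the sound card; here the waveform is synthesized): pulses are detected by threshold crossings and the counts per second are checked for Poisson behavior, i.e. mean approximately equal to variance.

```python
# Sketch, not the described program: threshold-detect pulses in a sampled
# waveform and check the Poisson character of the counts. Synthetic data.
import numpy as np

rng = np.random.default_rng(0)
fs = 44_100                      # samples per second (typical sound card)
duration = 60.0                  # seconds of "recording"
rate = 25.0                      # assumed mean count rate, s^-1

# Synthesise a waveform: baseline noise plus short pulses at Poisson times.
n = int(fs * duration)
signal = rng.normal(0.0, 0.02, n)
pulse_times = np.cumsum(rng.exponential(1.0 / rate, int(rate * duration * 2)))
pulse_idx = (pulse_times[pulse_times < duration] * fs).astype(int)
signal[pulse_idx] += 1.0

# Count rising edges above the threshold.
thr = 0.5
above = signal > thr
edges = np.flatnonzero(above[1:] & ~above[:-1]) + 1

# Statistical check: counts in 1-second bins should be ~Poisson.
counts, _ = np.histogram(edges / fs, bins=int(duration), range=(0, duration))
print("mean counts/s:", counts.mean(), " variance:", counts.var(ddof=1))
```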
Statistical issues in the design and planning of proteomic profiling experiments.
Cairns, David A
2015-01-01
The statistical design of a clinical proteomics experiment is a critical part of a well-undertaken investigation. Standard concepts from experimental design such as randomization, replication and blocking should be applied in all experiments, and this is possible when the experimental conditions are well understood by the investigator. The large number of proteins simultaneously considered in proteomic discovery experiments means that determining the number of required replicates to perform a powerful experiment is more complicated than in simple experiments. However, by using information about the nature of an experiment and making simple assumptions this is achievable for a variety of experiments useful for biomarker discovery and initial validation.
A simple statistical model for geomagnetic reversals
NASA Technical Reports Server (NTRS)
Constable, Catherine
1990-01-01
The diversity of paleomagnetic records of geomagnetic reversals now available indicates that the field configuration during transitions cannot be adequately described by simple zonal or standing field models. A new model described here is based on statistical properties inferred from the present field and is capable of simulating field transitions like those observed. Some insight is obtained into what one can hope to learn from paleomagnetic records. In particular, it is crucial that the effects of smoothing in the remanence acquisition process be separated from true geomagnetic field behavior. This might enable us to determine the time constants associated with the dominant field configuration during a reversal.
Franc, Jeffrey Michael; Ingrassia, Pier Luigi; Verde, Manuela; Colombo, Davide; Della Corte, Francesco
2015-02-01
Surge capacity, or the ability to manage an extraordinary volume of patients, is fundamental for hospital management of mass-casualty incidents. However, quantification of surge capacity is difficult and no universal standard for its measurement has emerged, nor has a standardized statistical method been advocated. As mass-casualty incidents are rare, simulation may represent a viable alternative to measure surge capacity. Hypothesis/Problem: The objective of the current study was to develop a statistical method for the quantification of surge capacity using a combination of computer simulation and simple process-control statistical tools. Length-of-stay (LOS) and patient volume (PV) were used as metrics. The use of this method was then demonstrated on a subsequent computer simulation of an emergency department (ED) response to a mass-casualty incident. In the derivation phase, 357 participants in five countries performed 62 computer simulations of an ED response to a mass-casualty incident. Benchmarks for ED response were derived from these simulations, including LOS and PV metrics for triage, bed assignment, physician assessment, and disposition. In the application phase, 13 students of the European Master in Disaster Medicine (EMDM) program completed the same simulation scenario, and the results were compared to the standards obtained in the derivation phase. Patient-volume metrics included number of patients to be triaged, assigned to rooms, assessed by a physician, and disposed. Length-of-stay metrics included median time to triage, room assignment, physician assessment, and disposition. Simple graphical methods were used to compare the application phase group to the derived benchmarks using process-control statistical tools. The group in the application phase failed to meet the indicated standard for LOS from admission to disposition decision. This study demonstrates how simulation software can be used to derive values for objective benchmarks of ED surge capacity using PV and LOS metrics. These objective metrics can then be applied to other simulation groups using simple graphical process-control tools to provide a numeric measure of surge capacity. Repeated use in simulations of actual EDs may represent a potential means of objectively quantifying disaster management surge capacity. It is hoped that the described statistical method, which is simple and reusable, will be useful for investigators in this field to apply to their own research.
Wang, Tong; Wu, Hai-Long; Xie, Li-Xia; Zhu, Li; Liu, Zhi; Sun, Xiao-Dong; Xiao, Rong; Yu, Ru-Qin
2017-04-01
In this work, a smart chemometrics-enhanced strategy, high-performance liquid chromatography, and diode array detection coupled with second-order calibration method based on alternating trilinear decomposition algorithm was proposed to simultaneously quantify 12 polyphenols in different kinds of apple peel and pulp samples. The proposed strategy proved to be a powerful tool to solve the problems of coelution, unknown interferences, and chromatographic shifts in the process of high-performance liquid chromatography analysis, making it possible for the determination of 12 polyphenols in complex apple matrices within 10 min under simple conditions of elution. The average recoveries with standard deviations, and figures of merit including sensitivity, selectivity, limit of detection, and limit of quantitation were calculated to validate the accuracy of the proposed method. Compared to the quantitative analysis results from the classic high-performance liquid chromatography method, the statistical and graphical analysis showed that our proposed strategy obtained more reliable results. All results indicated that our proposed method used in the quantitative analysis of apple polyphenols was an accurate, fast, universal, simple, and green one, and it was expected to be developed as an attractive alternative method for simultaneous determination of multitargeted analytes in complex matrices. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Regression: The Apple Does Not Fall Far From the Tree.
Vetter, Thomas R; Schober, Patrick
2018-05-15
Researchers and clinicians are frequently interested in either: (1) assessing whether there is a relationship or association between 2 or more variables and quantifying this association; or (2) determining whether 1 or more variables can predict another variable. The strength of such an association is mainly described by the correlation. However, regression analysis and regression models can be used not only to identify whether there is a significant relationship or association between variables but also to generate estimations of such a predictive relationship between variables. This basic statistical tutorial discusses the fundamental concepts and techniques related to the most common types of regression analysis and modeling, including simple linear regression, multiple regression, logistic regression, ordinal regression, and Poisson regression, as well as the common yet often underrecognized phenomenon of regression toward the mean. The various types of regression analysis are powerful statistical techniques, which when appropriately applied, can allow for the valid interpretation of complex, multifactorial data. Regression analysis and models can assess whether there is a relationship or association between 2 or more observed variables and estimate the strength of this association, as well as determine whether 1 or more variables can predict another variable. Regression is thus being applied more commonly in anesthesia, perioperative, critical care, and pain research. However, it is crucial to note that regression can identify plausible risk factors; it does not prove causation (a definitive cause and effect relationship). The results of a regression analysis instead identify independent (predictor) variable(s) associated with the dependent (outcome) variable. As with other statistical methods, applying regression requires that certain assumptions be met, which can be tested with specific diagnostics.
Information Visualization Techniques for Effective Cross-Discipline Communication
NASA Astrophysics Data System (ADS)
Fisher, Ward
2013-04-01
Collaboration between research groups in different fields is a common occurrence, but it can often be frustrating due to the absence of a common vocabulary. This lack of a shared context can make expressing important concepts and discussing results difficult. This problem may be further exacerbated when communicating to an audience of laypeople. Without a clear frame of reference, simple concepts are often rendered difficult-to-understand at best, and unintelligible at worst. An easy way to alleviate this confusion is with the use of clear, well-designed visualizations to illustrate an idea, process or conclusion. There exist a number of well-described machine-learning and statistical techniques which can be used to illuminate the information present within complex high-dimensional datasets. Once the information has been separated from the data, clear communication becomes a matter of selecting an appropriate visualization. Ideally, the visualization is information-rich but data-scarce. Anything from a simple bar chart, to a line chart with confidence intervals, to an animated set of 3D point-clouds can be used to render a complex idea as an easily understood image. Several case studies will be presented in this work. In the first study, we will examine how a complex statistical analysis was applied to a high-dimensional dataset, and how the results were succinctly communicated to an audience of microbiologists and chemical engineers. Next, we will examine a technique used to illustrate the concept of the singular value decomposition, as used in the field of computer vision, to a lay audience of undergraduate students from mixed majors. We will then examine a case where a simple animated line plot was used to communicate an approach to signal decomposition, and will finish with a discussion of the tools available to create these visualizations.
HPTLC Determination of Artemisinin and Its Derivatives in Bulk and Pharmaceutical Dosage
NASA Astrophysics Data System (ADS)
Agarwal, Suraj P.; Ahuja, Shipra
A simple, selective, accurate, and precise high-performance thin-layer chromatographic (HPTLC) method has been established and validated for the analysis of artemisinin and its derivatives (artesunate, artemether, and arteether) in bulk drugs and formulations. Artemisinin, artesunate, artemether, and arteether were separated on aluminum-backed silica gel 60 F254 plates with toluene:ethyl acetate (10:1), toluene:ethyl acetate:acetic acid (2:8:0.2), toluene:butanol (10:1), and toluene:dichloromethane (0.5:10) mobile phases, respectively. The detector response for concentrations between 100 and 600 ng/spot showed a good linear relationship, with r values of 0.9967, 0.9989, 0.9981, and 0.9989 for artemisinin, artesunate, artemether, and arteether, respectively. Statistical analysis showed that the method is precise, accurate, and reproducible and hence can be employed for routine analysis.
[The application of stereology in radiology imaging and cell biology fields].
Hu, Na; Wang, Yan; Feng, Yuanming; Lin, Wang
2012-08-01
Stereology is an interdisciplinary method for 3D morphological study developed from mathematics and morphology. Because of its unbiased, simple, fast, reliable and non-invasive characteristics, stereology has been widely used for quantitative analysis and statistics in biomedical areas such as histology, pathology and medical imaging. Because stereological parameters show distinct differences across different pathologies, in recent years many scholars have used stereological methods for quantitative analysis in their studies, for example of the condition of cancer cells, tumor grade, disease development and the patient's prognosis. This paper describes the stereological concept and estimation methods, illustrates the applications of stereology in the fields of CT imaging, MRI imaging and cell biology, and finally discusses the universality, superiority and reliability of stereology.
Vu, Trung N; Valkenborg, Dirk; Smets, Koen; Verwaest, Kim A; Dommisse, Roger; Lemière, Filip; Verschoren, Alain; Goethals, Bart; Laukens, Kris
2011-10-20
Nuclear magnetic resonance spectroscopy (NMR) is a powerful technique to reveal and compare quantitative metabolic profiles of biological tissues. However, chemical and physical sample variations make the analysis of the data challenging, and typically require the application of a number of preprocessing steps prior to data interpretation. For example, noise reduction, normalization, baseline correction, peak picking, spectrum alignment and statistical analysis are indispensable components in any NMR analysis pipeline. We introduce a novel suite of informatics tools for the quantitative analysis of NMR metabolomic profile data. The core of the processing cascade is a novel peak alignment algorithm, called hierarchical Cluster-based Peak Alignment (CluPA). The algorithm aligns a target spectrum to the reference spectrum in a top-down fashion by building a hierarchical cluster tree from peak lists of reference and target spectra and then dividing the spectra into smaller segments based on the most distant clusters of the tree. To reduce the computational time to estimate the spectral misalignment, the method makes use of Fast Fourier Transformation (FFT) cross-correlation. Since the method returns a high-quality alignment, we can propose a simple methodology to study the variability of the NMR spectra. For each aligned NMR data point the ratio of the between-group and within-group sum of squares (BW-ratio) is calculated to quantify the difference in variability between and within predefined groups of NMR spectra. This differential analysis is related to the calculation of the F-statistic or a one-way ANOVA, but without distributional assumptions. Statistical inference based on the BW-ratio is achieved by bootstrapping the null distribution from the experimental data. The workflow performance was evaluated using a previously published dataset. Correlation maps, spectral and grey scale plots show clear improvements in comparison to other methods, and the down-to-earth quantitative analysis works well for the CluPA-aligned spectra. The whole workflow is embedded into a modular and statistically sound framework that is implemented as an R package called "speaq" ("spectrum alignment and quantitation"), which is freely available from http://code.google.com/p/speaq/.
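The BW-ratio step is easy to illustrate outside the speaq package. The sketch below is a simplified stand-in, not the package code: it computes, for each spectral point of simulated spectra, the ratio of between-group to within-group sums of squares, and uses label permutation as a rough surrogate for the paper's bootstrap-based null distribution.

```python
# Hedged sketch of the BW-ratio idea (not the speaq implementation).
import numpy as np

rng = np.random.default_rng(42)
n_points = 500
groups = np.array([0] * 10 + [1] * 10)          # two groups of 10 spectra
spectra = rng.normal(0.0, 1.0, (20, n_points))
spectra[groups == 1, 240:260] += 2.0            # a differential "peak"

def bw_ratio(X, g):
    """Between-group over within-group sum of squares, per spectral point."""
    grand = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(g):
        Xg = X[g == label]
        mg = Xg.mean(axis=0)
        between += len(Xg) * (mg - grand) ** 2
        within += ((Xg - mg) ** 2).sum(axis=0)
    return between / within

ratio = bw_ratio(spectra, groups)
print("top differential points:", np.argsort(ratio)[-5:])

# Null distribution by permuting group labels (a stand-in for the bootstrap).
null_max = [bw_ratio(spectra, rng.permutation(groups)).max() for _ in range(200)]
print("95th percentile of null max BW-ratio:", np.percentile(null_max, 95))
```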
An On-Demand Optical Quantum Random Number Generator with In-Future Action and Ultra-Fast Response
Stipčević, Mario; Ursin, Rupert
2015-01-01
Random numbers are essential for our modern information-based society, e.g. in cryptography. Unlike frequently used pseudo-random generators, physical random number generators do not depend on complex algorithms but rather on a physical process to provide true randomness. Quantum random number generators (QRNG) rely on a process which, even in principle, can be described only by a probabilistic theory. Here we present a conceptually simple implementation, which offers 100% efficiency of producing a random bit upon request and simultaneously exhibits an ultra-low latency. A careful technical and statistical analysis demonstrates its robustness against imperfections of the actually implemented technology and enables one to quickly estimate the randomness of very long sequences. Generated random numbers pass standard statistical tests without any post-processing. The setup described, as well as the theory presented here, demonstrate the maturity and overall understanding of the technology. PMID:26057576
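Two of the simplest checks in that spirit, far weaker than the standard test suites, are a monobit frequency test and a lag-1 serial-correlation estimate; the sketch below applies them to pseudo-random bits standing in for QRNG output.

```python
# Elementary randomness checks on a bit stream (illustrative only).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(7)
bits = rng.integers(0, 2, size=1_000_000)   # stand-in for QRNG output

# Monobit test: z-statistic for the proportion of ones.
n = bits.size
z = (bits.sum() - 0.5 * n) / np.sqrt(0.25 * n)
p_value = 2 * (1 - norm.cdf(abs(z)))
print(f"monobit z = {z:.3f}, p = {p_value:.3f}")

# Lag-1 serial correlation: should be near zero for independent bits.
x = bits - bits.mean()
serial = (x[:-1] * x[1:]).mean() / x.var()
print(f"lag-1 autocorrelation = {serial:.5f}")
```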
THE DISTRIBUTION OF COOK’S D STATISTIC
Muller, Keith E.; Mok, Mario Chen
2013-01-01
Cook (1977) proposed a diagnostic to quantify the impact of deleting an observation on the estimated regression coefficients of a General Linear Univariate Model (GLUM). Simulations of models with Gaussian response and predictors demonstrate that his suggestion of comparing the diagnostic to the median of the F for overall regression captures an erratically varying proportion of the values. We describe the exact distribution of Cook’s statistic for a GLUM with Gaussian predictors and response. We also present computational forms, simple approximations, and asymptotic results. A simulation supports the accuracy of the results. The methods allow accurate evaluation of a single value or the maximum value from a regression analysis. The approximations work well for a single value, but less well for the maximum. In contrast, the cut-point suggested by Cook provides widely varying tail probabilities. As with all diagnostics, the data analyst must use scientific judgment in deciding how to treat highlighted observations. PMID:24363487
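For readers who want the diagnostic itself, the sketch below computes Cook's D from first principles for simulated Gaussian data and compares it with the F-median cut-point that the paper shows to behave erratically; it is an illustration, not the authors' exact-distribution method.

```python
# Cook's D for a Gaussian linear model, with Cook's suggested F-median cut-point.
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(3)
n, p = 100, 3                                   # observations, predictors
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta = np.array([1.0, 0.5, -0.3, 0.2])
y = X @ beta + rng.normal(scale=1.0, size=n)

H = X @ np.linalg.solve(X.T @ X, X.T)           # hat matrix
h = np.diag(H)
resid = y - H @ y
k = X.shape[1]                                  # number of coefficients
s2 = resid @ resid / (n - k)
cooks_d = (resid ** 2 / (k * s2)) * h / (1 - h) ** 2

cutoff = f_dist.median(k, n - k)                # median of F for overall regression
print("max Cook's D:", cooks_d.max(), " F-median cut-point:", cutoff)
print("flagged observations:", np.flatnonzero(cooks_d > cutoff))
```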
Investigating Student Understanding for a Statistical Analysis of Two Thermally Interacting Solids
NASA Astrophysics Data System (ADS)
Loverude, Michael E.
2010-10-01
As part of an ongoing research and curriculum development project for upper-division courses in thermal physics, we have developed a sequence of tutorials in which students apply statistical methods to examine the behavior of two interacting Einstein solids. In the sequence, students begin with simple results from probability and develop a means for counting the states in a single Einstein solid. The students then consider the thermal interaction of two solids, and observe that the classical equilibrium state corresponds to the most probable distribution of energy between the two solids. As part of the development of the tutorial sequence, we have developed several assessment questions to probe student understanding of various aspects of this system. In this paper, we describe the strengths and weaknesses of student reasoning, both qualitative and quantitative, to assess the readiness of students for one tutorial in the sequence.
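A minimal numerical companion to the tutorial idea (not the curriculum materials themselves) is sketched below: the multiplicity Omega(N, q) = C(q + N - 1, q) of an Einstein solid, and the distribution of energy between two interacting solids, which peaks at the classical equilibrium.

```python
# Counting microstates of two thermally interacting Einstein solids.
from math import comb

def omega(N, q):
    """Number of microstates of N oscillators sharing q energy units."""
    return comb(q + N - 1, q)

N_A, N_B, q_total = 300, 200, 100
joint = [omega(N_A, qA) * omega(N_B, q_total - qA) for qA in range(q_total + 1)]
total = sum(joint)

q_star = max(range(q_total + 1), key=lambda qA: joint[qA])
print("most probable q_A:", q_star,
      " expected from equal energy per oscillator:",
      round(q_total * N_A / (N_A + N_B)))
print("probability of the most probable macrostate:", joint[q_star] / total)
```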
Sun, J
1995-09-01
In this paper we discuss the non-parametric estimation of a distribution function based on incomplete data for which the measurement origin of a survival time or the date of enrollment in a study is known only to belong to an interval. Also, the survival time of interest itself is observed from a truncated distribution and is known only to lie in an interval. To estimate the distribution function, a simple self-consistency algorithm, a generalization of Turnbull's (1976, Journal of the Royal Statistical Society, Series B 38, 290-295) self-consistency algorithm, is proposed. This method is then used to analyze two AIDS cohort studies, for which direct use of the EM algorithm (Dempster, Laird and Rubin, 1977, Journal of the Royal Statistical Society, Series B 39, 1-38), which is computationally complicated, has previously been the usual method of analysis.
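A stripped-down version of a Turnbull-type self-consistency (EM) iteration for interval-censored observations is sketched below; it omits the truncation extension that is the paper's contribution, and the intervals and support points are illustrative.

```python
# Minimal self-consistency (EM) iteration for interval-censored data (L, R].
import numpy as np

# Each observation: the event time is only known to lie in (L, R].
intervals = np.array([(0, 2), (1, 3), (2, 5), (0, 1), (3, 6), (2, 4)], float)

support = np.unique(intervals[:, 1])            # candidate mass points
alpha = ((support > intervals[:, [0]]) &
         (support <= intervals[:, [1]])).astype(float)   # n x m indicators

p = np.full(support.size, 1.0 / support.size)   # initial mass function
for _ in range(500):
    # E-step: split each observation over the support points it covers.
    weights = alpha * p
    weights /= weights.sum(axis=1, keepdims=True)
    # Self-consistency (M-step): average the expected contributions.
    p_new = weights.mean(axis=0)
    if np.max(np.abs(p_new - p)) < 1e-10:
        p = p_new
        break
    p = p_new

print("support points:", support)
print("estimated probability masses:", np.round(p, 4))
print("estimated CDF:", np.round(np.cumsum(p), 4))
```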
Optimizing Integrated Terminal Airspace Operations Under Uncertainty
NASA Technical Reports Server (NTRS)
Bosson, Christabelle; Xue, Min; Zelinski, Shannon
2014-01-01
In the terminal airspace, integrated departures and arrivals have the potential to increase operations efficiency. Recent research has developed genetic-algorithm-based schedulers for integrated arrival and departure operations under uncertainty. This paper presents an alternate method using a machine job-shop scheduling formulation to model the integrated airspace operations. A multistage stochastic programming approach is chosen to formulate the problem and candidate solutions are obtained by solving sample average approximation problems with finite sample size. Because approximate solutions are computed, the proposed algorithm incorporates the computation of statistical bounds to estimate the optimality of the candidate solutions. A proof-of-concept study is conducted on a baseline implementation of a simple problem considering a fleet mix of 14 aircraft evolving in a model of the Los Angeles terminal airspace. A more thorough statistical analysis is also performed to evaluate the impact of the number of scenarios considered in the sampled problem. To handle extensive sampling computations, a multithreading technique is introduced.
DOE Office of Scientific and Technical Information (OSTI.GOV)
House, L.L.; Querfeld, C.W.; Rees, D.E.
1982-04-15
Coronal magnetic fields influence the intensity and linear polarization of light scattered by coronal Fe XIV ions. To interpret polarization measurements of Fe XIV 5303 A coronal emission requires a detailed understanding of the dependence of the emitted Stokes vector on coronal magnetic field direction, electron density, and temperature and on height of origin. The required dependence is included in the solutions of statistical equilibrium for the ion, which are solved explicitly for 34 magnetic sublevels in both the ground and four excited terms. The full solutions are reduced to equivalent simple analytic forms which clearly show the required dependence on coronal conditions. The analytic forms of the reduced solutions are suitable for routine analysis of 5303 green line polarimetric data obtained at Pic du Midi and from the Solar Maximum Mission Coronagraph/Polarimeter.
How cells explore shape space: a quantitative statistical perspective of cellular morphogenesis.
Yin, Zheng; Sailem, Heba; Sero, Julia; Ardy, Rico; Wong, Stephen T C; Bakal, Chris
2014-12-01
Through statistical analysis of datasets describing single cell shape following systematic gene depletion, we have found that the morphological landscapes explored by cells are composed of a small number of attractor states. We propose that the topology of these landscapes is in large part determined by cell-intrinsic factors, such as biophysical constraints on cytoskeletal organization, and reflects different stable signaling and/or transcriptional states. Cell-extrinsic factors act to determine how cells explore these landscapes, and the topology of the landscapes themselves. Informational stimuli primarily drive transitions between stable states by engaging signaling networks, while mechanical stimuli tune, or even radically alter, the topology of these landscapes. As environments fluctuate, the topology of morphological landscapes explored by cells dynamically adapts to these fluctuations. Finally we hypothesize how complex cellular and tissue morphologies can be generated from a limited number of simple cell shapes. © 2014 WILEY Periodicals, Inc.
The ASC/SIL ratio for cytopathologists as a quality control measure: a follow-up study.
Nascimento, Alessandra F; Cibas, Edmund S
2007-10-01
Monitoring the relative frequency of the interpretations of atypical squamous cells (ASC) and squamous intraepithelial lesions (SIL) has been proposed as a quality control measure. To assess its value, an ASC/SIL ratio was calculated every 6 months for 3.5 years, and confidential feedback was provided to 10 cytopathologists (CPs). By using simple regression analysis, we analyzed the initial and final ASC/SIL ratios for individual CPs and for the entire group. The ratio was below the upper benchmark of 3:1 for all but 1 CP during every 6-month period. The ratio for all CPs combined showed a downward trend (from 2.05 to 1.73). The ratio for 6 CPs decreased, and for two of them the decrease was statistically significant. One CP showed a statistically significant increase in the ASC/SIL ratio. The decrease for some CPs likely reflects the salutary effect of confidential feedback and counseling.
Manufacturing Squares: An Integrative Statistical Process Control Exercise
ERIC Educational Resources Information Center
Coy, Steven P.
2016-01-01
In the exercise, students in a junior-level operations management class are asked to manufacture a simple product. Given product specifications, they must design a production process, create roles and design jobs for each team member, and develop a statistical process control plan that efficiently and effectively controls quality during…
NASA Astrophysics Data System (ADS)
Zhang, Weijia; Fuller, Robert G.
1998-05-01
A demographic database for the 139 Nobel prize winners in physics from 1901 to 1990 has been created from a variety of sources. The results of our statistical study are discussed in the light of the implications for physics teaching.
Teaching Statistics with Minitab II.
ERIC Educational Resources Information Center
Ryan, T. A., Jr.; And Others
Minitab is a statistical computing system which uses simple language, produces clear output, and keeps track of bookkeeping automatically. Error checking with English diagnostics and inclusion of several default options help to facilitate use of the system by students. Minitab II is an improved and expanded version of the original Minitab which…
Applying Descriptive Statistics to Teaching the Regional Classification of Climate.
ERIC Educational Resources Information Center
Lindquist, Peter S.; Hammel, Daniel J.
1998-01-01
Describes an exercise for college and high school students that relates descriptive statistics to the regional climatic classification. The exercise introduces students to simple calculations of central tendency and dispersion, the construction and interpretation of scatterplots, and the definition of climatic regions. Forces students to engage…
An Experimental Approach to Teaching and Learning Elementary Statistical Mechanics
ERIC Educational Resources Information Center
Ellis, Frank B.; Ellis, David C.
2008-01-01
Introductory statistical mechanics is studied for a simple two-state system using an inexpensive and easily built apparatus. A large variety of demonstrations, suitable for students in high school and introductory university chemistry courses, are possible. This article details demonstrations for exothermic and endothermic reactions, the dynamic…
Origin of the correlations between exit times in pedestrian flows through a bottleneck
NASA Astrophysics Data System (ADS)
Nicolas, Alexandre; Touloupas, Ioannis
2018-01-01
Robust statistical features have emerged from the microscopic analysis of dense pedestrian flows through a bottleneck, notably with respect to the time gaps between successive passages. We pinpoint the mechanisms at the origin of these features thanks to simple models that we develop and analyse quantitatively. We disprove the idea that anticorrelations between successive time gaps (i.e. an alternation between shorter ones and longer ones) are a hallmark of a zipper-like intercalation of pedestrian lines and show that they simply result from the possibility that pedestrians from distinct ‘lines’ or directions cross the bottleneck within a short time interval. A second feature concerns the bursts of escapes, i.e. egresses that come in fast succession. Despite the ubiquity of exponential distributions of burst sizes, entailed by a Poisson process, we argue that anomalous (power-law) statistics arise if the bottleneck is nearly congested, albeit only in a tiny portion of parameter space. The generality of the proposed mechanisms implies that similar statistical features should also be observed for other types of particulate flows.
Bellenguez, Céline; Strange, Amy; Freeman, Colin; Donnelly, Peter; Spencer, Chris C A
2012-01-01
High-throughput genotyping arrays provide an efficient way to survey single nucleotide polymorphisms (SNPs) across the genome in large numbers of individuals. Downstream analysis of the data, for example in genome-wide association studies (GWAS), often involves statistical models of genotype frequencies across individuals. The complexities of the sample collection process and the potential for errors in the experimental assay can lead to biases and artefacts in an individual's inferred genotypes. Rather than attempting to model these complications, it has become a standard practice to remove individuals whose genome-wide data differ from the sample at large. Here we describe a simple, but robust, statistical algorithm to identify samples with atypical summaries of genome-wide variation. Its use as a semi-automated quality control tool is demonstrated using several summary statistics, selected to identify different potential problems, and it is applied to two different genotyping platforms and sample collections. The algorithm is written in R and is freely available at www.well.ox.ac.uk/chris-spencer. Contact: chris.spencer@well.ox.ac.uk. Supplementary data are available at Bioinformatics online.
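The flavor of the screening can be conveyed with a toy version (not the authors' R code): compute robust z-scores of per-sample summary statistics using the median and MAD, and flag samples beyond a cut-off. The two summaries used here, heterozygosity and missingness, are assumed examples of what one might feed in.

```python
# Toy robust outlier screen on per-sample summary statistics (illustrative).
import numpy as np

rng = np.random.default_rng(11)
n_samples = 2000
het = rng.normal(0.32, 0.01, n_samples)     # per-sample heterozygosity
miss = rng.beta(2, 200, n_samples)          # per-sample missing-call rate
het[:5] += 0.08                             # a few "contaminated" samples
miss[5:8] += 0.05                           # a few poorly genotyped samples

def robust_z(x):
    med = np.median(x)
    mad = np.median(np.abs(x - med)) * 1.4826   # consistent with SD for normal data
    return (x - med) / mad

summaries = np.column_stack([robust_z(het), robust_z(miss)])
outliers = np.flatnonzero((np.abs(summaries) > 5).any(axis=1))
print("flagged samples:", outliers)
```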
Statistical fluctuations in pedestrian evacuation times and the effect of social contagion
NASA Astrophysics Data System (ADS)
Nicolas, Alexandre; Bouzat, Sebastián; Kuperman, Marcelo N.
2016-08-01
Mathematical models of pedestrian evacuation and the associated simulation software have become essential tools for the assessment of the safety of public facilities and buildings. While a variety of models is now available, their calibration and test against empirical data are generally restricted to global averaged quantities; the statistics compiled from the time series of individual escapes ("microscopic" statistics) measured in recent experiments are thus overlooked. In the same spirit, much research has primarily focused on the average global evacuation time, whereas the whole distribution of evacuation times over some set of realizations should matter. In the present paper we propose and discuss the validity of a simple relation between this distribution and the microscopic statistics, which is theoretically valid in the absence of correlations. To this purpose, we develop a minimal cellular automaton, with features that afford a semiquantitative reproduction of the experimental microscopic statistics. We then introduce a process of social contagion of impatient behavior in the model and show that the simple relation under test may dramatically fail at high contagion strengths, the latter being responsible for the emergence of strong correlations in the system. We conclude with comments on the potential practical relevance for safety science of calculations based on microscopic statistics.
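The relation under test is that, without correlations, the distribution of the total evacuation time is the N-fold convolution of the time-gap distribution, i.e. the total time is a sum of independent gaps. The toy sketch below generates independent gaps, so the relation holds by construction, simply to show what the comparison looks like; the lognormal gap model and its parameters are invented.

```python
# Toy comparison of evacuation-time statistics with uncorrelated time gaps.
import numpy as np

rng = np.random.default_rng(2024)
N = 60                      # pedestrians per realization
realizations = 5000

# "Microscopic" statistics: lognormal time gaps between successive escapes.
gaps = rng.lognormal(mean=-0.5, sigma=0.6, size=(realizations, N))
total_time = gaps.sum(axis=1)           # evacuation time of each realization

# Prediction from uncorrelated gaps: means and variances simply add.
g = gaps.ravel()
print("mean:", total_time.mean(), " vs N*mean(gap):", N * g.mean())
print("var :", total_time.var(),  " vs N*var(gap): ", N * g.var())
```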
Learning predictive statistics from temporal sequences: Dynamics and strategies.
Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe
2017-10-01
Human behavior is guided by our expectations about the future. Often, we make predictions by monitoring how event sequences unfold, even though such sequences may appear incomprehensible. Event structures in the natural environment typically vary in complexity, from simple repetition to complex probabilistic combinations. How do we learn these structures? Here we investigate the dynamics of structure learning by tracking human responses to temporal sequences that change in structure unbeknownst to the participants. Participants were asked to predict the upcoming item following a probabilistic sequence of symbols. Using a Markov process, we created a family of sequences, from simple frequency statistics (e.g., some symbols are more probable than others) to context-based statistics (e.g., symbol probability is contingent on preceding symbols). We demonstrate the dynamics with which individuals adapt to changes in the environment's statistics-that is, they extract the behaviorally relevant structures to make predictions about upcoming events. Further, we show that this structure learning relates to individual decision strategy; faster learning of complex structures relates to selection of the most probable outcome in a given context (maximizing) rather than matching of the exact sequence statistics. Our findings provide evidence for alternate routes to learning of behaviorally relevant statistics that facilitate our ability to predict future events in variable environments.
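A toy version of the sequence families and the two decision strategies is sketched below (the symbols, transition matrix, and probabilities are invented): a zeroth-order frequency-based sequence, a first-order context-based sequence, and "maximizing" versus "matching" predictions from the learned transition probabilities.

```python
# Illustrative sketch of frequency- vs context-based symbol sequences.
import numpy as np

rng = np.random.default_rng(5)
symbols = np.array(["A", "B", "C", "D"])

# Zeroth-order statistics: some symbols are simply more frequent than others.
freq = np.array([0.5, 0.25, 0.15, 0.10])
seq0 = rng.choice(symbols, size=200, p=freq)

# First-order (context-based) statistics: the next symbol depends on the current one.
P = np.array([[0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1],
              [0.1, 0.1, 0.1, 0.7],
              [0.7, 0.1, 0.1, 0.1]])
state = 0
seq1 = []
for _ in range(200):
    state = rng.choice(4, p=P[state])
    seq1.append(symbols[state])

print("frequency-based sequence:", "".join(seq0[:20]))
print("context-based sequence:  ", "".join(seq1[:20]))

# Prediction strategies given the current context 'state':
# maximizing -> always pick the most probable next symbol;
# matching   -> sample predictions in proportion to the learned probabilities.
maximizing = symbols[np.argmax(P[state])]
matching = rng.choice(symbols, p=P[state])
print("context:", symbols[state], " maximizing:", maximizing, " matching:", matching)
```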
A simple blind placement of the left-sided double-lumen tubes.
Zong, Zhi Jun; Shen, Qi Ying; Lu, Yao; Li, Yuan Hai
2016-11-01
One-lung ventilation (OLV) has commonly been provided using a double-lumen tube (DLT). Previous reports have indicated a high incidence of inappropriate DLT positioning with conventional maneuvers. After obtaining approval from the medical ethics committee of the First Affiliated Hospital of Anhui Medical University and written consent from patients, 88 adult patients of American Society of Anesthesiologists (ASA) physical status grade I or II, undergoing elective thoracic surgery requiring a left-sided DLT for OLV, were enrolled in this prospective, single-blind, randomized controlled study. Patients were randomly allocated to 1 of 2 groups: the simple maneuver group or the conventional maneuver group. The simple maneuver relies on partially inflating the bronchial balloon and recreating the effect of a carinal hook on the DLT to give an indication of orientation and depth. After the induction of anesthesia the patients were intubated with a left-sided Robertshaw DLT using one of the 2 intubation techniques. After intubation of each DLT, an anesthesiologist used flexible bronchoscopy to evaluate the patient while the patient lay in a supine position. The number of optimal positions and the time required to place the DLT in the correct position were recorded. Intubation of the DLT took 100 ± 16.2 seconds (mean ± SD) in the simple maneuver group and 95.1 ± 20.8 seconds in the conventional maneuver group; the difference was not statistically significant (P = 0.221). Fiberoptic bronchoscope (FOB) evaluation took 22 ± 4.8 seconds in the simple maneuver group, statistically faster than in the conventional maneuver group (43.6 ± 23.7 seconds, P < 0.001). Nearly 98% of the 44 intubations in the simple maneuver group were considered to be in the optimal position, while only 52% of the 44 intubations in the conventional maneuver group were in the optimal position; the difference was statistically significant (P < 0.001). This simple maneuver is more rapid and more accurate for positioning left-sided DLTs, and it may be substituted for FOB during positioning of a left-sided DLT when FOB is unavailable or inapplicable.
NASA Technical Reports Server (NTRS)
Hakkinen, Raimo J; Richardson, A S , Jr
1957-01-01
Sinusoidally oscillating downwash and lift produced on a simple rigid airfoil were measured and compared with calculated values. Statistically stationary random downwash and the corresponding lift on a simple rigid airfoil were also measured and the transfer functions between their power spectra determined. The random experimental values are compared with theoretically approximated values. Limitations of the experimental technique and the need for more extensive experimental data are discussed.
NASA Astrophysics Data System (ADS)
Kang, Pilsang; Koo, Changhoi; Roh, Hokyu
2017-11-01
Since simple linear regression theory was established at the beginning of the 1900s, it has been used in a variety of fields. Unfortunately, it cannot be used directly for calibration. In practical calibrations, the observed measurements (the inputs) are subject to errors, and hence they vary, thus violating the assumption that the inputs are fixed. Therefore, in the case of calibration, the regression line fitted using the method of least squares is not consistent with the statistical properties of simple linear regression as already established based on this assumption. To resolve this problem, "classical regression" and "inverse regression" have been proposed. However, they do not completely resolve the problem. As a fundamental solution, we introduce "reversed inverse regression" along with a new methodology for deriving its statistical properties. In this study, the statistical properties of this regression are derived using the "error propagation rule" and the "method of simultaneous error equations" and are compared with those of the existing regression approaches. The accuracy of the statistical properties thus derived is investigated in a simulation study. We conclude that the newly proposed regression and methodology constitute the complete regression approach for univariate linear calibrations.
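For orientation, the sketch below implements the two existing approaches the paper contrasts, classical and inverse calibration, on simulated data; it does not implement the proposed reversed inverse regression or its error-propagation machinery.

```python
# Classical vs inverse calibration on simulated data (illustrative only).
import numpy as np

rng = np.random.default_rng(9)
x = np.linspace(0.0, 10.0, 30)                    # reference standards
y = 2.0 + 0.8 * x + rng.normal(0.0, 0.2, x.size)  # instrument readings

# Classical calibration: fit y = a + b*x, then invert for a new reading y0.
b, a = np.polyfit(x, y, 1)
y0 = 6.5
x_classical = (y0 - a) / b

# Inverse calibration: fit x = c + d*y and evaluate at y0 directly.
d, c = np.polyfit(y, x, 1)
x_inverse = c + d * y0

print(f"classical estimate: {x_classical:.3f}   inverse estimate: {x_inverse:.3f}")
```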
On-line estimation of error covariance parameters for atmospheric data assimilation
NASA Technical Reports Server (NTRS)
Dee, Dick P.
1995-01-01
A simple scheme is presented for on-line estimation of covariance parameters in statistical data assimilation systems. The scheme is based on a maximum-likelihood approach in which estimates are produced on the basis of a single batch of simultaneous observations. Single-sample covariance estimation is reasonable as long as the number of available observations exceeds the number of tunable parameters by two or three orders of magnitude. Not much is known at present about model error associated with actual forecast systems. Our scheme can be used to estimate some important statistical model error parameters such as regionally averaged variances or characteristic correlation length scales. The advantage of the single-sample approach is that it does not rely on any assumptions about the temporal behavior of the covariance parameters: time-dependent parameter estimates can be continuously adjusted on the basis of current observations. This is of practical importance since it is likely to be the case that both model error and observation error strongly depend on the actual state of the atmosphere. The single-sample estimation scheme can be incorporated into any four-dimensional statistical data assimilation system that involves explicit calculation of forecast error covariances, including optimal interpolation (OI) and the simplified Kalman filter (SKF). The computational cost of the scheme is high but not prohibitive; on-line estimation of one or two covariance parameters in each analysis box of an operational boxed-OI system is currently feasible. A number of numerical experiments performed with an adaptive SKF and an adaptive version of OI, using a linear two-dimensional shallow-water model and artificially generated model error, are described. The performance of the nonadaptive versions of these methods turns out to depend rather strongly on correct specification of model error parameters. These parameters are estimated under a variety of conditions, including uniformly distributed model error and time-dependent model error statistics.
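A greatly simplified illustration of single-batch estimation is sketched below: if the innovation (observation-minus-forecast) variance is modeled as b + r with the background part b assumed known, the maximum-likelihood estimate of the observation-error variance r has a closed form. This is a toy stand-in for the scheme, not the paper's multi-parameter estimation; all numbers are invented.

```python
# Single-batch estimation of one error-covariance parameter (toy model).
import numpy as np

rng = np.random.default_rng(17)
b_true, r_true = 1.5, 0.8
innovations = rng.normal(0.0, np.sqrt(b_true + r_true), size=500)  # one batch

b_assumed = 1.5                                  # background-error variance, assumed known
r_hat = max(0.0, innovations.var(ddof=0) - b_assumed)   # ML estimate under the model
print(f"estimated observation-error variance: {r_hat:.3f} (true {r_true})")
```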
Weighing Evidence "Steampunk" Style via the Meta-Analyser.
Bowden, Jack; Jackson, Chris
2016-10-01
The funnel plot is a graphical visualization of summary data estimates from a meta-analysis, and is a useful tool for detecting departures from the standard modeling assumptions. Although perhaps not widely appreciated, a simple extension of the funnel plot can help to facilitate an intuitive interpretation of the mathematics underlying a meta-analysis at a more fundamental level, by equating it to determining the center of mass of a physical system. We used this analogy to explain the concepts of weighing evidence and of biased evidence to a young audience at the Cambridge Science Festival, without recourse to precise definitions or statistical formulas and with a little help from Sherlock Holmes! Following on from the science fair, we have developed an interactive web-application (named the Meta-Analyser) to bring these ideas to a wider audience. We envisage that our application will be a useful tool for researchers when interpreting their data. First, to facilitate a simple understanding of fixed and random effects modeling approaches; second, to assess the importance of outliers; and third, to show the impact of adjusting for small study bias. This final aim is realized by introducing a novel graphical interpretation of the well-known method of Egger regression.
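The center-of-mass analogy amounts to the fixed-effect pooled estimate being the balance point of the study estimates weighted by 1/SE^2. The sketch below (invented numbers, not the Meta-Analyser app) computes that balance point and its standard error.

```python
# Fixed-effect pooling as a centre of mass: weights are 1/SE^2.
import numpy as np

estimates = np.array([0.12, 0.30, -0.05, 0.22, 0.18])   # hypothetical study effects
se = np.array([0.10, 0.05, 0.20, 0.08, 0.15])

weights = 1.0 / se**2                 # heavier "masses" for more precise studies
pooled = np.sum(weights * estimates) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))
print(f"fixed-effect pooled estimate: {pooled:.3f} (SE {pooled_se:.3f})")
```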
NASA Technical Reports Server (NTRS)
Farassat, Fereidoun; Casper, Jay H.
2012-01-01
We show that a simple modification of Formulation 1 of Farassat results in a new analytic expression that is highly suitable for broadband noise prediction when extensive turbulence simulation is available. This result satisfies all the stringent requirements, such as permitting the use of the exact geometry and kinematics of the moving body, that we have set as our goal in the derivation of useful acoustic formulas for the prediction of rotating blade and airframe noise. We also derive a simple analytic expression for the autocorrelation of the acoustic pressure that is valid in the near and far fields. Our analysis is based on the time integral of the acoustic pressure that can easily be obtained at any resolution for any observer time interval and digitally analyzed for broadband noise prediction. We have named this result as Formulation 2B of Farassat. One significant consequence of Formulation 2B is the derivation of the acoustic velocity potential for the thickness and loading terms of the Ffowcs Williams-Hawkings (FW-H) equation. This will greatly enhance the usefulness of the Fast Scattering Code (FSC) by providing a high fidelity boundary condition input for scattering predictions.
Grave prognosis on spontaneous intracerebral haemorrhage: GP on STAGE score.
Poungvarin, Niphon; Suwanwela, Nijasri C; Venketasubramanian, Narayanaswamy; Wong, Lawrence K S; Navarro, Jose C; Bitanga, Ester; Yoon, Byung Woo; Chang, Hui M; Alam, Sardar M
2006-11-01
Spontaneous intracerebral haemorrhage (ICH) is more common in Asia than in western countries, and has a high mortality rate. A simple prognostic score for predicting grave prognosis of ICH is lacking. Our objective was to develop a simple and reliable score usable by most physicians. ICH patients from seven Asian countries were enrolled between May 2000 and April 2002 for a prospective study. Clinical features such as headache and vomiting, vascular risk factors, Glasgow coma scale (GCS), body temperature (BT), blood pressure on arrival, location and size of haematoma, intraventricular haemorrhage (IVH), hydrocephalus, need for surgical treatment, medical treatment, length of hospital stay and other complications were analyzed to determine the outcome using a modified Rankin scale (MRS). Grave prognosis (defined as MRS of 5-6) was judged on the discharge date. 995 patients, mean age 59.5 ± 14.3 years, were analyzed after exclusion of incomplete data in 87 patients. 402 patients (40.4%) were in the grave prognosis group (MRS 5-6). Univariable and then multivariable analysis showed only four statistically significant predictors of grave outcome of ICH: fever (BT ≥ 37.8 °C), low GCS, large haematoma, and IVH. The grave prognosis on spontaneous intracerebral haemorrhage (GP on STAGE) score was derived from these four factors using a multiple logistic model. A simple and pragmatic prognostic score for ICH outcome has been developed with high sensitivity (82%) and specificity (82%). Furthermore, it can be administered by most general practitioners. Validation in other populations is now required.
Aerosol Complexity and Implications for Predictability and Short-Term Forecasting
NASA Technical Reports Server (NTRS)
Colarco, Peter
2016-01-01
There are clear NWP and climate impacts from including aerosol radiative and cloud interactions. Changes in dynamics and cloud fields affect aerosol lifecycle, plume height, long-range transport, overall forcing of the climate system, etc. Inclusion of aerosols in NWP systems has benefit for surface field biases (e.g., T2m, U10m). Including aerosol effects has an impact on analysis increments and can have statistically significant impacts on, e.g., tropical cyclogenesis. The above points are made especially with respect to aerosol radiative interactions, but aerosol-cloud interaction is a bigger signal on the global system. Many of these impacts are realized even in models with relatively simple (bulk) aerosol schemes (approx. 10-20 tracers). Simple schemes, though, imply a simple representation of aerosol absorption and, importantly for aerosol-cloud interaction, of the particle-size distribution. Even so, more complex schemes exhibit a lot of diversity between different models, with issues such as size selection both for emitted particles and for modes. There are prospects for complex sectional schemes to tune modal (and even bulk) schemes toward better selection of size representation. I think this is a ripe topic for more research:
- Systematic documentation of the benefits of no vs. climatological vs. interactive (direct and then direct+indirect) aerosols; documentation of the aerosol impact on analysis increments and of inclusion in the NWP data assimilation operator.
- Further refinement of baseline assumptions in model design (e.g., absorption, particle-size distribution).
Not covered here: model resolution and the interplay of other physical processes with aerosols (e.g., moist physics, obviously important), and chemistry.
Abreu, P C; Greenberg, D A; Hodge, S E
1999-09-01
Several methods have been proposed for linkage analysis of complex traits with unknown mode of inheritance. These methods include the LOD score maximized over disease models (MMLS) and the "nonparametric" linkage (NPL) statistic. In previous work, we evaluated the increase of type I error when maximizing over two or more genetic models, and we compared the power of MMLS to detect linkage, in a number of complex modes of inheritance, with analysis assuming the true model. In the present study, we compare MMLS and NPL directly. We simulated 100 data sets with 20 families each, using 26 generating models: (1) 4 intermediate models (penetrance of heterozygote between that of the two homozygotes); (2) 6 two-locus additive models; and (3) 16 two-locus heterogeneity models (admixture alpha = 1.0, .7, .5, and .3; alpha = 1.0 replicates simple Mendelian models). For LOD scores, we assumed dominant and recessive inheritance with 50% penetrance. We took the higher of the two maximum LOD scores and subtracted 0.3 to correct for multiple tests (MMLS-C). We compared expected maximum LOD scores and power, using MMLS-C and NPL as well as the true model. Since NPL uses only the affected family members, we also performed an affecteds-only analysis using MMLS-C. The MMLS-C was both uniformly more powerful than NPL for most cases we examined, except when linkage information was low, and close to the results for the true model under locus heterogeneity. We still found better power for the MMLS-C compared with NPL in affecteds-only analysis. The results show that use of two simple modes of inheritance at a fixed penetrance can have more power than NPL when the trait mode of inheritance is complex and when there is heterogeneity in the data set.
Advanced statistics: linear regression, part II: multiple linear regression.
Marill, Keith A
2004-01-01
The applications of simple linear regression in medical research are limited, because in most situations, there are multiple relevant predictor variables. Univariate statistical techniques such as simple linear regression use a single predictor variable, and they often may be mathematically correct but clinically misleading. Multiple linear regression is a mathematical technique used to model the relationship between multiple independent predictor variables and a single dependent outcome variable. It is used in medical research to model observational data, as well as in diagnostic and therapeutic studies in which the outcome is dependent on more than one factor. Although the technique generally is limited to data that can be expressed with a linear function, it benefits from a well-developed mathematical framework that yields unique solutions and exact confidence intervals for regression coefficients. Building on Part I of this series, this article acquaints the reader with some of the important concepts in multiple regression analysis. These include multicollinearity, interaction effects, and an expansion of the discussion of inference testing, leverage, and variable transformations to multivariate models. Examples from the first article in this series are expanded on using a primarily graphic, rather than mathematical, approach. The importance of the relationships among the predictor variables and the dependence of the multivariate model coefficients on the choice of these variables are stressed. Finally, concepts in regression model building are discussed.
[Prevalence of postmenopausal simple ovarian cyst diagnosed by ultrasound].
Luján Irastorza, Jesús E; Hernández Marín, Imelda; Figueroa Preciado, Gudelia; Ayala, Aquiles R
2006-10-01
High-resolution ultrasound has led to the discovery of small ovarian cysts in asymptomatic postmenopausal women that would otherwise not have been detected; these cysts frequently disappear spontaneously and rarely develop into cancer, yet they are often treated aggressively. Our aim was to determine the prevalence, evolution and treatment of simple ovarian cysts in postmenopausal women in our department, since in our country there are no studies that have analyzed these data. We performed a retrospective and descriptive study in the Service of Biology of Human Reproduction of the Hospital Juarez de Mexico over a four-year period (2000-2003) that included 1,010 postmenopausal women. The statistical analysis was performed with the SPSS software, from which we obtained descriptive measures of central tendency and dispersion together with a graphical analysis. We found a simple cyst prevalence of 8.2% (n = 83); the mean age at the time of diagnosis was 50.76 years with a standard deviation of 5.55; the cyst diameter ranged from 0.614 to 12.883 cm with a mean and standard deviation of 2.542 and 1.91 cm, respectively; in 27.71% of the cases (n = 23) the cysts disappeared spontaneously during a follow-up of 3 to 36 months (mean of 14.1). Surgery was indicated in 16.46% (n = 13): because of an increase in the size of the cyst in 9 patients (11.64%) and because of a change in morphology from simple to complex in 4 (4.82%). Tumor markers were measured in only 37 patients (44.57%) and were within normal ranges; no carcinoma was found in this group. The prevalence of simple ovarian cysts was similar to that reported in the literature. The risk of cancer in these cysts is extremely low when a suitable evaluation is made, which is why conservative treatment is suggested for simple cysts smaller than 5 cm with Ca-125 levels within normal ranges. We recommend follow-up every 3-6 months with color Doppler ultrasound and tumor markers for five years.
Association factor analysis between osteoporosis with cerebral artery disease: The STROBE study.
Jin, Eun-Sun; Jeong, Je Hoon; Lee, Bora; Im, Soo Bin
2017-03-01
The purpose of this study was to determine the clinical factors associating osteoporosis with cerebral artery disease in a Korean population. Two hundred nineteen postmenopausal women and men undergoing cerebral computed tomography angiography were enrolled in this cross-sectional study to evaluate cerebral artery disease. Cerebral artery disease was diagnosed if there was narrowing of 50% or more of the diameter in one or more cerebral arteries or the presence of vascular calcification. History of osteoporotic fracture was assessed using medical records and radiographic data such as simple radiography, MRI, and bone scans. Bone mineral density was measured by dual-energy x-ray absorptiometry. We reviewed the clinical characteristics of all patients and also performed retrospective subgroup analyses for the total and the extracranial/intracranial cerebral artery disease groups. Statistical analysis was performed by means of the chi-square test or Fisher's exact test for categorical variables and Student's t-test or Wilcoxon's rank sum test for continuous variables. Univariate and multivariate logistic regression analyses were also conducted to assess the factors associated with the prevalence of cerebral artery disease. A two-tailed p-value of less than 0.05 was considered statistically significant. All statistical analyses were performed using R (version 3.1.3; The R Foundation for Statistical Computing, Vienna, Austria) and SPSS (version 14.0; SPSS, Inc, Chicago, Ill, USA). Of the 219 patients, 142 had cerebral artery disease. Vertebral fractures were observed in 29 (13.24%) patients. There was a significant difference in hip fracture according to the presence or absence of cerebral artery disease. In logistic regression analysis, osteoporotic hip fracture was significantly associated with extracranial cerebral artery disease after adjusting for multiple risk factors. Females with osteoporotic hip fracture were associated with total calcified cerebral artery disease. Clinical factors such as age, hypertension, osteoporotic hip fracture, smoking history, and anti-osteoporosis drug use were associated with cerebral artery disease.
Williams, Mobolaji
2018-01-01
The field of disordered systems in statistical physics provides many simple models in which the competing influences of thermal and nonthermal disorder lead to new phases and nontrivial thermal behavior of order parameters. In this paper, we add a model to the subject by considering a disordered system where the state space consists of various orderings of a list. As in spin glasses, the disorder of such "permutation glasses" arises from a parameter in the Hamiltonian being drawn from a distribution of possible values, thus allowing nominally "incorrect orderings" to have lower energies than "correct orderings" in the space of permutations. We analyze a Gaussian, uniform, and symmetric Bernoulli distribution of energy costs, and, by employing Jensen's inequality, derive a simple condition requiring the permutation glass to always transition to the correctly ordered state at a temperature lower than that of the nondisordered system, provided that this correctly ordered state is accessible. We in turn find that in order for the correctly ordered state to be accessible, the probability that an incorrectly ordered component is energetically favored must be less than the inverse of the number of components in the system. We show that all of these results are consistent with a replica symmetric ansatz of the system. We conclude by arguing that there is no distinct permutation glass phase for the simplest model considered here and by discussing how to extend the analysis to more complex Hamiltonians capable of novel phase behavior and replica symmetry breaking. Finally, we outline an apparent correspondence between the presented system and a discrete-energy-level fermion gas. In all, the investigation introduces a class of exactly soluble models into statistical mechanics and provides a fertile ground to investigate statistical models of disorder.
Di Donato, Violante; Kontopantelis, Evangelos; Aletti, Giovanni; Casorelli, Assunta; Piacenti, Ilaria; Bogani, Giorgio; Lecce, Francesca; Benedetti Panici, Pierluigi
2017-06-01
Primary cytoreductive surgery (PDS) followed by platinum-based chemotherapy is the cornerstone of treatment, and the absence of residual tumor after PDS is universally considered the most important prognostic factor. The aim of the present analysis was to evaluate trends and predictors of 30-day mortality in patients undergoing primary cytoreduction for ovarian cancer. The literature was searched for records reporting 30-day mortality after PDS. All cohorts were rated for quality. Simple and multiple Poisson regression models were used to quantify the association between 30-day mortality and the following: overall or severe complications, proportion of patients with stage IV disease, median age, year of publication, and weighted surgical complexity index. Using the multiple regression model, we calculated the risk of perioperative mortality at different levels of the statistically significant covariates of interest. Simple regression identified median age and proportion of patients with stage IV disease as statistically significant predictors of 30-day mortality. When included in the multiple Poisson regression model, both remained statistically significant, with an incidence rate ratio of 1.087 for median age and 1.017 for stage IV disease. Disease stage was a strong predictor, with the risk estimated to increase from 2.8% (95% confidence interval 2.02-3.66) for stage III to 16.1% (95% confidence interval 6.18-25.93) for stage IV, for a cohort with a median age of 65 years. Metaregression demonstrated that increased age and advanced clinical stage were independently associated with an increased risk of mortality, and the combined effects of both factors greatly increased the risk.
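A Poisson regression of event counts with the cohort size as exposure yields incidence rate ratios of the kind reported above (exponentiated coefficients). The sketch below is a generic illustration with made-up cohort-level data and variable names; it is not the authors' extracted dataset or exact model specification.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical cohort-level data: 30-day deaths, cohort size, median age,
# and percentage of stage IV patients for each published cohort
data = pd.DataFrame({
    "deaths":     [3, 7, 2, 12, 5, 9],
    "n":          [150, 320, 110, 400, 260, 380],
    "median_age": [58, 66, 61, 70, 63, 68],
    "stage_iv":   [10, 25, 5, 40, 15, 30],   # percent with stage IV disease
})

X = sm.add_constant(data[["median_age", "stage_iv"]])
model = sm.GLM(data["deaths"], X,
               family=sm.families.Poisson(),
               exposure=data["n"])            # log(cohort size) enters as an offset
fit = model.fit()
print(np.exp(fit.params))                     # incidence rate ratios per unit increase
print(np.exp(fit.conf_int()))                 # 95% CIs on the IRR scale
```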
Statistical evaluation of stability data: criteria for change-over-time and data variability.
Bar, Raphael
2003-01-01
In the recently issued ICH Q1E guidance on evaluation of stability data of drug substances and products, the need to perform a statistical extrapolation of the shelf life of a drug product or the retest period for a drug substance is based heavily on whether the data exhibit change-over-time and/or variability. However, the document suggests neither measures nor acceptance criteria for these two parameters. This paper demonstrates a useful application of simple statistical parameters for determining whether sets of stability data from either accelerated or long-term storage programs exhibit change-over-time and/or variability. These parameters are all derived from a simple linear regression analysis first performed on the stability data. The p-value of the slope of the regression line is taken as a measure of change-over-time, and a value of 0.25 is suggested as a limit for insignificant change in the quantitative stability attributes monitored. The minimal process capability index, Cpk, calculated from the standard deviation of the regression line, is suggested as a measure of variability, with a value of 2.5 as a limit for insignificant variability. The usefulness of these two parameters, the p-value and Cpk, was demonstrated on stability data of a refrigerated drug product and on pooled data of three batches of a drug substance. In both cases, the determined parameters allowed characterization of the data in terms of change-over-time and variability. Consequently, complete evaluation of the stability data could be pursued according to the ICH guidance. It is believed that the application of these two parameters with their acceptance criteria will allow a more unified evaluation of stability data.
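The two screening quantities described above, the p-value of the regression slope and a capability index built from the residual standard deviation, are easy to derive from a simple linear fit. The sketch below assumes a single stability attribute with a lower specification limit; the numbers, the specification limit, and the exact one-sided form of the Cpk calculation are illustrative choices, not values taken from the paper.

```python
import numpy as np
from scipy import stats

# Hypothetical stability data: assay (% label claim) vs. time (months)
months = np.array([0, 3, 6, 9, 12, 18, 24])
assay  = np.array([100.1, 99.8, 99.9, 99.6, 99.7, 99.5, 99.4])

fit = stats.linregress(months, assay)
print("slope p-value:", fit.pvalue)        # > 0.25 -> treat change-over-time as insignificant

# Residual standard deviation of the regression line
resid = assay - (fit.intercept + fit.slope * months)
s = np.sqrt(np.sum(resid**2) / (len(months) - 2))

# One-sided capability index against an assumed lower specification limit
LSL = 95.0
cpk = (np.mean(assay) - LSL) / (3 * s)
print("Cpk:", cpk)                         # > 2.5 -> treat variability as insignificant
```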
Misyura, Maksym; Sukhai, Mahadeo A; Kulasignam, Vathany; Zhang, Tong; Kamel-Reid, Suzanne; Stockley, Tracy L
2018-02-01
A standard approach in test evaluation is to compare results of the assay in validation to results from previously validated methods. For quantitative molecular diagnostic assays, comparison of test values is often performed using simple linear regression and the coefficient of determination (R²), using R² as the primary metric of assay agreement. However, the use of R² alone does not adequately quantify the constant or proportional errors required for optimal test evaluation. More extensive statistical approaches, such as Bland-Altman analysis and expanded interpretation of linear regression methods, can be used to more thoroughly compare data from quantitative molecular assays. We present the application of Bland-Altman and linear regression statistical methods to evaluate quantitative outputs from next-generation sequencing (NGS) assays. NGS-derived data sets from assay validation experiments were used to demonstrate the utility of the statistical methods. Both Bland-Altman and linear regression were able to detect the presence and magnitude of constant and proportional error in quantitative values of NGS data. Deming linear regression was used in the context of assay comparison studies, while simple linear regression was used to analyse serial dilution data. The Bland-Altman statistical approach was also adapted to quantify assay accuracy, including constant and proportional errors, and precision where theoretical and empirical values were known. The complementary application of the statistical methods described in this manuscript enables more extensive evaluation of the performance characteristics of quantitative molecular assays prior to implementation in the clinical molecular laboratory.
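A minimal Bland-Altman computation, mean bias and 95% limits of agreement, alongside a simple regression of the new method on the comparator, is sketched below for two hypothetical sets of paired measurements; the Deming regression step described in the abstract is not shown.

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements (e.g., variant allele frequencies) from
# a reference method and the NGS assay under validation
ref = np.array([0.05, 0.10, 0.20, 0.35, 0.50, 0.65, 0.80])
new = np.array([0.06, 0.11, 0.19, 0.37, 0.52, 0.63, 0.82])

# Bland-Altman: bias and 95% limits of agreement
diff = new - ref
bias = diff.mean()
loa = bias + np.array([-1.96, 1.96]) * diff.std(ddof=1)
print("bias:", bias, "limits of agreement:", loa)

# Simple linear regression: intercept ~ constant error, slope - 1 ~ proportional error
fit = stats.linregress(ref, new)
print("intercept:", fit.intercept, "slope:", fit.slope, "R^2:", fit.rvalue**2)
```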
Model for neural signaling leap statistics
NASA Astrophysics Data System (ADS)
Chevrollier, Martine; Oriá, Marcos
2011-03-01
We present a simple model for neural signaling leaps in the brain considering only the thermodynamic (Nernst) potential in neuron cells and brain temperature. We numerically simulated connections between arbitrarily localized neurons and analyzed the frequency distribution of the distances reached. We observed a qualitative change between normal statistics (T = 37.5°C, awake regime) and Lévy statistics (T = 35.5°C, sleeping period), the latter characterized by rare events of long-range connections.
Seasonal ENSO forecasting: Where does a simple model stand amongst other operational ENSO models?
NASA Astrophysics Data System (ADS)
Halide, Halmar
2017-01-01
We apply a simple linear multiple regression model called IndOzy to predict ENSO at up to 7 seasonal lead times. The model uses five predictors, past seasonal Niño 3.4 ENSO indices with lags chosen following chaos theory, and is rolling-validated to give one-step-ahead forecasts. Model skill was evaluated against data from the May-June-July (MJJ) 2003 season to November-December-January (NDJ) 2015/2016. Three skill measures, Pearson correlation, RMSE, and Euclidean distance, were used for forecast verification. The skill of this simple model was then compared to those of the combined statistical and dynamical models compiled at the IRI (International Research Institute) website. We found that the simple model produces useful ENSO predictions only up to 3 seasonal leads, while the IRI statistical and dynamical models remain useful up to 4 and 6 seasonal leads, respectively. Even with its short-range seasonal prediction skill, however, the simple model still has the potential to provide ENSO-derived tailored products such as probabilistic measures of precipitation and air temperature, both of which affect the presence of wild-land fire hot-spots in Sumatera and Kalimantan. It is suggested that, to improve its long-range skill, the simple IndOzy model should incorporate a nonlinear approach such as an artificial neural network technique.
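A rolling one-step-ahead forecast from a linear regression on lagged index values, verified with the three skill measures mentioned above, can be sketched as follows. The synthetic index, the consecutive-lag structure, and the training window are placeholders; the operational IndOzy configuration is not reproduced here.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Synthetic seasonal Nino 3.4 index standing in for the observed series
t = np.arange(240)
nino34 = np.sin(2 * np.pi * t / 16) + 0.3 * rng.normal(size=t.size)

n_lags, lead, train_len = 5, 1, 60
# Row r of X holds the 5 values preceding the target y[r], which is `lead` steps later
X = np.column_stack([nino34[i : i + len(nino34) - n_lags - lead + 1] for i in range(n_lags)])
y = nino34[n_lags + lead - 1 :]

preds, obs = [], []
for i in range(train_len, len(y)):                        # rolling re-fit and forecast
    A = np.column_stack([np.ones(train_len), X[i - train_len : i]])
    coef, *_ = np.linalg.lstsq(A, y[i - train_len : i], rcond=None)
    preds.append(np.concatenate([[1.0], X[i]]) @ coef)
    obs.append(y[i])
preds, obs = np.array(preds), np.array(obs)

# Skill measures used for verification
r, _ = stats.pearsonr(obs, preds)
rmse = np.sqrt(np.mean((obs - preds) ** 2))
euclid = np.linalg.norm(obs - preds)
print(f"r = {r:.2f}, RMSE = {rmse:.2f}, Euclidean distance = {euclid:.2f}")
```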
NASA Astrophysics Data System (ADS)
Tanaka, H. L.
2003-06-01
In this study, a numerical simulation of the Arctic Oscillation (AO) is conducted using a simple barotropic model that treats the barotropic-baroclinic interactions as external forcing. The model is referred to as a barotropic S model since the external forcing is obtained statistically from long-term historical data by solving an inverse problem. The barotropic S model has been integrated for 51 years under a perpetual January condition and the dominant empirical orthogonal function (EOF) modes in the model have been analyzed. The results are compared with the EOF analysis of the barotropic component of the real atmosphere based on the daily NCEP-NCAR reanalysis for the 50 years from 1950 to 1999. According to the results, the first EOF of the model atmosphere appears to be an AO similar to the observation. The annular structure of the AO and the two centers of action over the Pacific and Atlantic are simulated nicely by the barotropic S model. The atmospheric low-frequency variabilities are therefore captured satisfactorily even by this simple barotropic model. The EOF analysis is further applied to the external forcing of the barotropic S model. The structure of the dominant forcing shows the characteristics of synoptic-scale disturbances of zonal wavenumber 6 along the Pacific storm track. The forcing is induced by the barotropic-baroclinic interactions associated with baroclinic instability. The result suggests that the AO can be understood as the natural variability of the barotropic component of the atmosphere induced by the inherent barotropic dynamics, which is forced by the barotropic-baroclinic interactions. The fluctuating upscale energy cascade from planetary waves and synoptic disturbances to the zonal motion plays the key role in the excitation of the AO.
Wadley, Leven M; Keating, Kevin S; Duarte, Carlos M; Pyle, Anna Marie
2007-09-28
Quantitatively describing RNA structure and conformational elements remains a formidable problem. Seven standard torsion angles and the sugar pucker are necessary to characterize the conformation of an RNA nucleotide completely. Progress has been made toward understanding the discrete nature of RNA structure, but classifying simple and ubiquitous structural elements such as helices and motifs remains a difficult task. One approach for describing RNA structure in a simple, mathematically consistent, and computationally accessible manner involves the invocation of two pseudotorsions, eta (C4'(n-1), P(n), C4'(n), P(n+1)) and theta (P(n), C4'(n), P(n+1), C4'(n+1)), which can be used to describe RNA conformation in much the same way that phi and psi are used to describe the backbone configuration of proteins. Here, we conduct an exploration and statistical evaluation of pseudotorsional space and of the Ramachandran-like eta-theta plot. We show that, through the rigorous quantitative analysis of the eta-theta plot, the pseudotorsional descriptors eta and theta, together with sugar pucker, are sufficient to describe RNA backbone conformation fully in most cases. These descriptors are also shown to contain considerable information about nucleotide base conformation, revealing a previously uncharacterized interplay between backbone and base orientation. A window function analysis is used to discern statistically relevant regions of density in the eta-theta scatter plot, and then nucleotides in colocalized clusters in the eta-theta plane are shown to have similar 3-D structures through RMSD analysis of the RNA structural constituents. We find that major clusters in the eta-theta plot are few, underscoring the discrete nature of RNA backbone conformation. Like the Ramachandran plot, the eta-theta plot is a valuable system for conceptualizing biomolecular conformation; it is a useful tool for analyzing RNA tertiary structures, and it is a vital component of new approaches for solving the 3-D structures of large RNA molecules and RNA assemblies.
NASA Astrophysics Data System (ADS)
Most, S.; Nowak, W.; Bijeljic, B.
2014-12-01
Transport processes in porous media are frequently simulated as particle movement, which can be formulated as a stochastic process of particle position increments. At the pore scale, the geometry and micro-heterogeneities prohibit the commonly made assumption of independent and normally distributed increments to represent dispersion. Many recent particle methods seek to loosen this assumption. Recent experimental data suggest that we have not yet reached the end of the need to generalize, because particle increments show statistical dependence beyond linear correlation and over many time steps. The goal of this work is to better understand the validity regions of commonly made assumptions. We investigate after what transport distances we can observe: (1) a statistical dependence between increments that can be modelled as an order-k Markov process reducing to order 1, which would mark the Markovian distance for the process, where the validity of yet-unexplored non-Gaussian-but-Markovian random walks would start; (2) a bivariate statistical dependence that simplifies to a multi-Gaussian dependence based on simple linear correlation (validity of correlated PTRW); and (3) complete absence of statistical dependence (validity of classical PTRW/CTRW). The approach is to derive a statistical model for pore-scale transport from a powerful experimental data set via copula analysis. The model is formulated as a non-Gaussian, mutually dependent Markov process of higher order, which allows us to investigate the validity ranges of simpler models.
Measuring Student and School Progress with the California API. CSE Technical Report.
ERIC Educational Resources Information Center
Thum, Yeow Meng
This paper focuses on interpreting the major conceptual features of California's Academic Performance Index (API) as a coherent set of statistical procedures. To facilitate a characterization of its statistical properties, the paper casts the index as a simple weighted average of the subjective worth of students' normative performance and presents…
A Critique of One-Tailed Hypothesis Test Procedures in Business and Economics Statistics Textbooks.
ERIC Educational Resources Information Center
Liu, Tung; Stone, Courtenay C.
1999-01-01
Surveys introductory business and economics statistics textbooks and finds that they differ over the best way to explain one-tailed hypothesis tests: the simple null-hypothesis approach or the composite null-hypothesis approach. Argues that the composite null-hypothesis approach contains methodological shortcomings that make it more difficult for…
Simple Data Sets for Distinct Basic Summary Statistics
ERIC Educational Resources Information Center
Lesser, Lawrence M.
2011-01-01
It is important to avoid ambiguity with numbers because unfortunate choices of numbers can inadvertently make it possible for students to form misconceptions or make it difficult for teachers to tell if students obtained the right answer for the right reason. Therefore, it is important to make sure when introducing basic summary statistics that…
NASA Technical Reports Server (NTRS)
Staubert, R.
1985-01-01
Methods for calculating the statistical significance of excess events and the interpretation of the formally derived values are discussed. It is argued that a simple formula for a conservative estimate should generally be used in order to provide a common understanding of quoted values.
Re-Thinking Statistics Education for Social Science Majors
ERIC Educational Resources Information Center
Reid, Howard M.; Mason, Susan E.
2008-01-01
Many college students majoring in the social sciences find the required statistics course to be dull as well as difficult. Further, they often do not retain much of the material, which limits their success in subsequent courses. We describe a few simple changes, the incorporation of which may enhance student learning. (Contains 1 table.)
Flipped Statistics Class Results: Better Performance than Lecture over One Year Later
ERIC Educational Resources Information Center
Winquist, Jennifer R.; Carlson, Keith A.
2014-01-01
In this paper, we compare an introductory statistics course taught using a flipped classroom approach to the same course taught using a traditional lecture based approach. In the lecture course, students listened to lecture, took notes, and completed homework assignments. In the flipped course, students read relatively simple chapters and answered…
Assistive Technologies for Second-Year Statistics Students Who Are Blind
ERIC Educational Resources Information Center
Erhardt, Robert J.; Shuman, Michael P.
2015-01-01
At Wake Forest University, a student who is blind enrolled in a second course in statistics. The course covered simple and multiple regression, model diagnostics, model selection, data visualization, and elementary logistic regression. These topics required that the student both interpret and produce three sets of materials: mathematical writing,…
Statistical and linguistic features of DNA sequences
NASA Technical Reports Server (NTRS)
Havlin, S.; Buldyrev, S. V.; Goldberger, A. L.; Mantegna, R. N.; Peng, C. K.; Simons, M.; Stanley, H. E.
1995-01-01
We present evidence supporting the idea that the DNA sequence in genes containing noncoding regions is correlated, and that the correlation is remarkably long range--indeed, base pairs thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationary" feature of the sequence of base pairs by applying a new algorithm called Detrended Fluctuation Analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and noncoding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to all eukaryotic DNA sequences (33 301 coding and 29 453 noncoding) in the entire GenBank database. We describe a simple model to account for the presence of long-range power-law correlations which is based upon a generalization of the classic Levy walk. Finally, we describe briefly some recent work showing that the noncoding sequences have certain statistical features in common with natural languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts, and the Shannon approach to quantifying the "redundancy" of a linguistic text in terms of a measurable entropy function. We suggest that noncoding regions in plants and invertebrates may display a smaller entropy and larger redundancy than coding regions, further supporting the possibility that noncoding regions of DNA may carry biological information.
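Detrended Fluctuation Analysis, the algorithm named above, can be sketched in a few lines: integrate the zero-mean series, fit and remove a local linear trend in windows of increasing size, and read the scaling exponent from the log-log slope of fluctuation versus window size. The mapping of a DNA sequence to a numeric walk (e.g., purine = +1, pyrimidine = -1) and the window sizes below are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np

def dfa(x, scales=(4, 8, 16, 32, 64, 128)):
    """Detrended Fluctuation Analysis; returns the scaling exponent alpha."""
    y = np.cumsum(x - np.mean(x))          # integrated profile
    flucts = []
    for s in scales:
        n_win = len(y) // s
        f2 = []
        for w in range(n_win):
            seg = y[w * s:(w + 1) * s]
            t = np.arange(s)
            a, b = np.polyfit(t, seg, 1)   # local linear trend
            f2.append(np.mean((seg - (a * t + b)) ** 2))
        flucts.append(np.sqrt(np.mean(f2)))
    alpha, _ = np.polyfit(np.log(scales), np.log(flucts), 1)
    return alpha

# Example: an uncorrelated purine/pyrimidine walk gives alpha near 0.5;
# long-range correlated sequences give alpha > 0.5
rng = np.random.default_rng(2)
walk_steps = rng.choice([-1, 1], size=10000)
print(dfa(walk_steps))
```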
Using statistical process control to make data-based clinical decisions.
Pfadt, A; Wheeler, D J
1995-01-01
Applied behavior analysis is based on an investigation of variability due to interrelationships among antecedents, behavior, and consequences. This permits testable hypotheses about the causes of behavior, as well as the course of treatment, to be evaluated empirically. Such information provides corrective feedback for making data-based clinical decisions. This paper considers how a different approach to the analysis of variability, based on the writings of Walter Shewhart and W. Edwards Deming in the area of industrial quality control, helps to achieve similar objectives. Statistical process control (SPC) was developed to implement a process of continual product improvement while achieving compliance with production standards and other requirements for promoting customer satisfaction. SPC involves the use of simple statistical tools, such as histograms and control charts, as well as problem-solving techniques, such as flow charts, cause-and-effect diagrams, and Pareto charts, to implement Deming's management philosophy. These data-analytic procedures can be incorporated into a human service organization to help achieve its stated objectives in a manner that leads to continuous improvement in the functioning of the clients who are its customers. Examples are provided to illustrate how SPC procedures can be used to analyze behavioral data. Issues related to the application of these tools for making data-based clinical decisions and for creating an organizational climate that promotes their routine use in applied settings are also considered.
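One of the simple tools mentioned above, an individuals-and-moving-range (XmR) chart in the Shewhart/Wheeler tradition, is sketched below on hypothetical session counts; the 2.66 scaling constant is the standard conversion from the average moving range to three-sigma limits, and the data are invented for illustration.

```python
import numpy as np

# Hypothetical behavioral counts per session (e.g., episodes of a target behavior)
counts = np.array([12, 9, 11, 14, 10, 8, 13, 9, 15, 7, 10, 11, 22, 9, 8])

center = counts.mean()
moving_range = np.abs(np.diff(counts))
mr_bar = moving_range.mean()

# Individuals (X) chart limits: mean +/- 2.66 * average moving range
ucl = center + 2.66 * mr_bar
lcl = max(center - 2.66 * mr_bar, 0)       # counts cannot be negative

out_of_control = np.where((counts > ucl) | (counts < lcl))[0]
print(f"center={center:.1f}, LCL={lcl:.1f}, UCL={ucl:.1f}")
print("sessions signaling special-cause variation:", out_of_control)
```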
NASA Astrophysics Data System (ADS)
Iskandar, I.
2018-03-01
The exponential distribution is the most widely used distribution in reliability analysis. It is well suited to representing the lengths of life in many settings and has a simple statistical form; its defining characteristic is a constant hazard rate, and it is the special case of the Weibull family with shape parameter equal to one. In this paper we introduce the basic notions that constitute an exponential competing risks model in reliability analysis using a Bayesian approach and present the corresponding analytic methods. The cases are limited to models with independent causes of failure. A non-informative prior distribution is used in our analysis. We describe the likelihood function, followed by the posterior distribution and the point and interval estimates, the hazard function, and the reliability. The net probability of failure if only one specific risk is present, the crude probability of failure due to a specific risk in the presence of other causes, and partial crude probabilities are also included.
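Under independent exponential causes with a non-informative (Jeffreys-type) prior, the cause-specific rates have Gamma posteriors, and the crude probability of failing from a given cause by time t follows directly. The sketch below illustrates this with made-up failure counts and exposure time; it is a generic Bayesian exponential competing-risks calculation, not the paper's worked example.

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical data: failures attributed to two independent causes
n_fail = np.array([8, 3])        # failures from cause 1 and cause 2
total_time = 1200.0              # total time on test over all units (hours)

# Prior ~ 1/lambda  =>  posterior lambda_j ~ Gamma(shape=n_j, scale=1/T)
n_draws = 20000
lam = np.column_stack([rng.gamma(shape=n, scale=1.0 / total_time, size=n_draws)
                       for n in n_fail])
lam_tot = lam.sum(axis=1)

t = 100.0                                         # horizon of interest (hours)
reliability = np.exp(-lam_tot * t)                # P(no failure of any kind by t)
# Crude probability of failure from cause j by t, with all causes acting
crude = (lam / lam_tot[:, None]) * (1 - np.exp(-lam_tot * t))[:, None]

print("posterior mean reliability at t:", reliability.mean())
print("posterior mean crude failure probabilities:", crude.mean(axis=0))
print("95% interval for lambda_1:", np.percentile(lam[:, 0], [2.5, 97.5]))
```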
NASA Astrophysics Data System (ADS)
Lototzis, M.; Papadopoulos, G. K.; Droulia, F.; Tseliou, A.; Tsiros, I. X.
2018-04-01
There are several cases where a circular variable is associated with a linear one. A typical example is wind direction, which is often associated with linear quantities such as air temperature and air humidity. A statistical relationship of this kind can be tested by both parametric and non-parametric methods, each of which has its own advantages and drawbacks. This work deals with correlation analysis using both the parametric and the non-parametric procedure on a small set of meteorological data of air temperature and wind direction during a summer period in a Mediterranean climate. Correlations were examined between hourly, daily and maximum-prevailing values, under typical and non-typical meteorological conditions. Both tests indicated a strong correlation between mean hourly wind direction and mean hourly air temperature, whereas mean daily wind direction and mean daily air temperature do not appear to be correlated. In some cases, however, the two procedures gave quite dissimilar levels of significance for the rejection or not of the null hypothesis of no correlation. The simple statistical analysis presented in this study, appropriately extended to large sets of meteorological data, may be a useful tool for estimating the effects of wind in local climate studies.
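The parametric circular-linear correlation used in analyses of this kind can be computed from the correlations of the linear variable with the sine and cosine of the angle (Mardia's coefficient); a sketch on synthetic wind-direction/temperature data follows. The permutation p-value stands in for a distribution-free check and is not the specific non-parametric procedure used in the paper.

```python
import numpy as np
from scipy import stats

def circ_lin_corr(theta, x):
    """Mardia's circular-linear correlation between angles theta (radians) and x."""
    rxc = stats.pearsonr(x, np.cos(theta))[0]
    rxs = stats.pearsonr(x, np.sin(theta))[0]
    rcs = stats.pearsonr(np.cos(theta), np.sin(theta))[0]
    return np.sqrt((rxc**2 + rxs**2 - 2 * rxc * rxs * rcs) / (1 - rcs**2))

rng = np.random.default_rng(4)
# Synthetic hourly data: temperature loosely tied to wind direction
theta = rng.uniform(0, 2 * np.pi, 300)            # wind direction
temp = 25 + 3 * np.cos(theta - np.pi / 4) + rng.normal(0, 1.5, 300)

r_obs = circ_lin_corr(theta, temp)
# Simple permutation test on the correlation coefficient
perm = np.array([circ_lin_corr(theta, rng.permutation(temp)) for _ in range(2000)])
p_val = np.mean(perm >= r_obs)
print(f"r = {r_obs:.2f}, permutation p = {p_val:.3f}")
```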
Takeuchi, Takeaki; Nakao, Mutsuhiro; Nishikitani, Mariko; Yano, Eiji
2004-07-01
This study aims to clarify the effects of stress perception and related social indicators on three major musculoskeletal symptoms, low back, shoulder, and joint pain, in a Japanese population. Twenty health-related variables (stress perception and 19 social indicators) and the three symptoms were obtained from the following Japanese national surveys: the Comprehensive Survey of Living Condition of the People on Health and Welfare, the System of Social and Demographic Statistics of Japan, and the Statistical Report on Health Administration Services. The results were compared among 46 Japanese prefectures in 1995 and 2001. By factor analysis, the 19 indicators were classified into three factors: urbanization, aging and life-regularity, and individualization. The prevalence of stress perception was significantly correlated with the 8 indicators of the urbanization factor. Although simple correlation analysis revealed a significant relationship of stress perception only to shoulder pain (in both years) and low back pain (in 2001), multiple regression analysis showed that stress perception and some urbanization factors were significantly associated with all three symptoms in both years, except for joint pain in 1995. Taking the effects of urbanization into consideration, stress perception appears to be closely related to complaints of musculoskeletal symptoms in Japan.
Goodness-of-fit tests and model diagnostics for negative binomial regression of RNA sequencing data.
Mi, Gu; Di, Yanming; Schafer, Daniel W
2015-01-01
This work is about assessing model adequacy for negative binomial (NB) regression, particularly (1) assessing the adequacy of the NB assumption, and (2) assessing the appropriateness of models for NB dispersion parameters. Tools for the first are appropriate for NB regression generally; those for the second are primarily intended for RNA sequencing (RNA-Seq) data analysis. The typically small number of biological samples and large number of genes in RNA-Seq analysis motivate us to address the trade-offs between robustness and statistical power using NB regression models. One widely-used power-saving strategy, for example, is to assume some commonalities of NB dispersion parameters across genes via simple models relating them to mean expression rates, and many such models have been proposed. As RNA-Seq analysis is becoming ever more popular, it is appropriate to make more thorough investigations into power and robustness of the resulting methods, and into practical tools for model assessment. In this article, we propose simulation-based statistical tests and diagnostic graphics to address model adequacy. We provide simulated and real data examples to illustrate that our proposed methods are effective for detecting the misspecification of the NB mean-variance relationship as well as judging the adequacy of fit of several NB dispersion models.
NASA Astrophysics Data System (ADS)
Guillen, George; Rainey, Gail; Morin, Michelle
2004-04-01
Currently, the Minerals Management Service uses the Oil Spill Risk Analysis model (OSRAM) to predict the movement of potential oil spills greater than 1000 bbl originating from offshore oil and gas facilities. OSRAM generates oil spill trajectories using meteorological and hydrological data input from either actual physical measurements or estimates generated by other hydrological models. OSRAM and many other models produce output matrices of average, maximum and minimum contact probabilities from oil spills at specific points (rows) to specific landfall or target segments (columns). Analysts and managers are often interested in identifying geographic areas or groups of facilities that pose similar risks to specific targets or groups of targets if a spill occurred. Unfortunately, because of the potentially large matrices generated by many spill models, this question is difficult to answer without the use of data reduction and visualization methods. In our study we used a multivariate statistical method, cluster analysis, to group areas of similar risk based on the distribution of landfall target trajectory probabilities. We also used ArcView™ GIS to display the spill launch point groupings. The combination of GIS and multivariate statistical techniques in the post-processing of trajectory model output is a powerful tool for identifying and delineating areas of similar risk from multiple spill sources. We strongly encourage modelers and statistical and GIS software programmers to collaborate closely to produce a more seamless integration of these technologies and approaches to analyzing data. They are complementary methods that strengthen the overall assessment of spill risks.
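Grouping launch points by the similarity of their landfall-probability profiles amounts to clustering the rows of the contact-probability matrix. A hierarchical-clustering sketch on a small made-up matrix is shown below; the linkage method, the number of groups, and the GIS display step are all illustrative choices rather than the study's configuration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical contact-probability matrix: rows = spill launch points,
# columns = landfall/target segments
P = np.array([
    [0.30, 0.10, 0.05, 0.00],
    [0.28, 0.12, 0.04, 0.01],
    [0.02, 0.05, 0.40, 0.20],
    [0.01, 0.06, 0.38, 0.22],
    [0.10, 0.30, 0.10, 0.05],
])

# Ward linkage on Euclidean distances between probability profiles
Z = linkage(P, method="ward")
groups = fcluster(Z, t=3, criterion="maxclust")   # cut the tree into 3 risk groups
for g in np.unique(groups):
    print(f"risk group {g}: launch points {np.where(groups == g)[0]}")
```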
Brief communication: Skeletal biology past and present: Are we moving in the right direction?
Hens, Samantha M; Godde, Kanya
2008-10-01
In 1982, Spencer's edited volume A History of American Physical Anthropology: 1930-1980 allowed numerous authors to document the state of our science, including a critical examination of skeletal biology. Some authors argued that the first 50 years of skeletal biology were characterized by the descriptive-historical approach with little regard for processual problems and that technological and statistical analyses were not rooted in theory. In an effort to determine whether Spencer's landmark volume impacted the field of skeletal biology, a content analysis was carried out for the American Journal of Physical Anthropology from 1980 to 2004. The percentage of skeletal biology articles is similar to that of previous decades. Analytical articles averaged only 32% and are defined by three criteria: statistical analysis, hypothesis testing, and broader explanatory context. However, when these criteria were scored individually, nearly 80% of papers attempted a broader theoretical explanation, 44% tested hypotheses, and 67% used advanced statistics, suggesting that the skeletal biology papers in the journal have an analytical emphasis. Considerable fluctuation exists between subfields; trends toward a more analytical approach are witnessed in the subfields of age/sex/stature/demography, skeletal maturation, anatomy, and nonhuman primate studies, which also increased in frequency, while paleontology and pathology were largely descriptive. Comparisons to the International Journal of Osteoarchaeology indicate that there are statistically significant differences between the two journals in terms of analytical criteria. These data indicate a positive shift in theoretical thinking, i.e., an attempt by most to explain processes rather than present a simple description of events.
Malacarne, Mario; Nardin, Tiziana; Bertoldi, Daniela; Nicolini, Giorgio; Larcher, Roberto
2016-09-01
Commercial tannins from several botanical sources and with different chemical and technological characteristics are used in the food and winemaking industries. Different ways to check their botanical authenticity have been studied in recent years through investigation of different analytical parameters. This work proposes a new, effective approach based on the quantification of 6 carbohydrates, 7 polyalcohols, and 55 phenols. 87 tannins from 12 different botanical sources were analysed following a very simple sample preparation procedure. Using Forward Stepwise Discriminant Analysis, 3 statistical models were created based on sugar content, phenol concentration, and the combination of the two classes of compounds for the 8 most abundant categories (i.e. oak, grape seed, grape skin, gall, chestnut, quebracho, tea and acacia). The combined approach provided good results in attributing tannins to the correct botanical origin. Validation, repeated 3 times on subsets of 10% of samples, confirmed the reliability of this model.
Lee, Ying Li; Chien, Tsai Feng; Kuo, Ming Chuan; Chang, Polun
2014-01-01
This study aims to understand the relationship between participating nurses' motivation, achievement and satisfaction before and after they learned to program in Excel Visual Basic for Applications (Excel VBA). We held a workshop to train nurses in developing simple Excel VBA information systems to support their clinical or administrative practices. Before and after the workshop, the participants were evaluated on their knowledge of Excel VBA, and a questionnaire was given to survey their learning motivation and satisfaction. The statistical software packages Winsteps and SPSS were used for data analysis. Results show that the participants were more knowledgeable about VBA as well as more motivated to learn VBA after the workshop. Participants were highly satisfied with the overall arrangement of the workshop and the instructors, but did not feel confident enough to promote the application of Excel VBA on their own. In addition, we were unable to predict the participants' achievement from their demographic characteristics or pre-test motivation level.
NASA Astrophysics Data System (ADS)
Suhir, E.
2014-05-01
The well-known and widely used experimental reliability "passport" of a mass-manufactured electronic or photonic product, the bathtub curve, reflects the combined contribution of statistics-related and reliability-physics (physics-of-failure)-related processes. As time progresses, the first process results in a decreasing failure rate, while the second process, associated with material aging and degradation, leads to an increasing failure rate. An attempt has been made in this analysis to assess the level of the reliability-physics-related aging process from the available bathtub curve (diagram). It is assumed that the products of interest underwent burn-in testing and therefore the obtained bathtub curve does not contain the infant-mortality portion. It has also been assumed that the two random processes in question are statistically independent, and that the failure rate of the physical process can be obtained by deducting the theoretically assessed statistical failure rate from the bathtub curve ordinates. In the numerical example carried out, the Rayleigh distribution was used for the statistical failure rate, for the sake of a relatively simple illustration. The developed methodology can be used in reliability-physics evaluations when there is a need to better understand the roles of the statistics-related and reliability-physics-related irreversible random processes in reliability evaluations. Future work should include investigations of how powerful and flexible methods and approaches of statistical mechanics can be effectively employed, in addition to reliability-physics techniques, to model the operational reliability of electronic and photonic products.
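The decomposition described above, subtracting an assumed statistics-related failure rate from the observed bathtub ordinates to expose the aging-related component, reduces to simple arithmetic once a form for the statistical part is chosen. In the sketch below both the bathtub ordinates and the Rayleigh-based term are placeholders chosen for illustration; the paper's exact construction of the statistical failure rate is not reproduced.

```python
import numpy as np

# Hypothetical post-burn-in bathtub curve ordinates (failures per 1000 h)
t = np.linspace(0.5, 10, 20)                     # operation time, kh
bathtub = 0.8 + 0.02 * t**2                      # observed combined failure rate

# Assumed statistics-related component: Rayleigh-shaped term that decays at long times
sigma = 2.0
statistical = (t / sigma**2) * np.exp(-t**2 / (2 * sigma**2))

# Physics-of-failure (aging) component obtained by subtraction
physical = np.clip(bathtub - statistical, 0, None)
for ti, bi, pi in zip(t, bathtub, physical):
    print(f"t={ti:4.1f}  bathtub={bi:5.2f}  aging-related={pi:5.2f}")
```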
Improvement of a picking algorithm real-time P-wave detection by kurtosis
NASA Astrophysics Data System (ADS)
Ishida, H.; Yamada, M.
2016-12-01
Earthquake early warning (EEW) requires fast and accurate P-wave detection. The current EEW system in Japan uses the STA/LTA algorithm (Allen, 1978) to detect P-wave arrivals. However, some stations did not trigger during the 2011 Great Tohoku Earthquake because of the emergent onset. In addition, the accuracy of P-wave detection is very important: on August 1, 2016, the EEW system issued a false alarm of M9 in the Tokyo region because of thunder noise. To solve these problems, we use a P-wave detection method based on kurtosis statistics, which detects changes in the statistical distribution of the waveform amplitude. This method was developed relatively recently (Saragiotis et al., 2002) and has been used for off-line analysis such as building seismic catalogs. To apply this method to EEW, we need to remove an acausal calculation and enable real-time processing. Here, we propose a real-time P-wave detection method using kurtosis statistics with a noise filter. To avoid false triggering by noise, we incorporated a simple filter to classify seismic signal and noise. Following Kong et al. (2016), we used the interquartile range and the zero-cross rate for the classification. The interquartile range is an amplitude measure equal to the span of the middle 50% of amplitudes in a certain time window. The zero-cross rate is a simple frequency measure that counts the number of times the signal crosses the zero baseline. A discriminant function including these measures was constructed by linear discriminant analysis. To test the kurtosis method, we used strong-motion records from 62 earthquakes between April 2005 and July 2015 with seismic intensity greater than or equal to 6-lower on the JMA intensity scale. Records with hypocentral distance < 200 km were used for the analysis. An attached figure shows the error in P-wave detection time for the STA/LTA and kurtosis methods against manual picks: the median error is 0.13 s for STA/LTA and 0.035 s for the kurtosis method. The kurtosis method tends to be more sensitive to small changes in amplitude. Our approach will contribute to improving the accuracy of earthquake source location determination and the shaking intensity estimation for earthquake early warning.
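A causal, sliding-window version of the kurtosis characteristic function, together with the two noise-screening features named above (interquartile range and zero-crossing rate), can be sketched as follows. The window length, the synthetic trace, and the simple "largest kurtosis jump" picking rule are illustrative assumptions; the discriminant-function step is omitted.

```python
import numpy as np
from scipy.stats import kurtosis

def rolling_kurtosis(x, win):
    """Causal sliding-window kurtosis: each value uses only past samples."""
    out = np.full(len(x), np.nan)
    for i in range(win, len(x)):
        out[i] = kurtosis(x[i - win:i])
    return out

def noise_features(x):
    """Features used to screen seismic signal from noise before picking."""
    iqr = np.percentile(x, 75) - np.percentile(x, 25)
    zcr = np.mean(np.diff(np.sign(x)) != 0)
    return iqr, zcr

# Synthetic trace: low-amplitude noise followed by a higher-amplitude arrival
rng = np.random.default_rng(5)
fs = 100                                          # sampling rate (Hz)
trace = rng.normal(0, 0.1, 10 * fs)
trace[5 * fs:] += rng.normal(0, 1.0, 5 * fs)

k = rolling_kurtosis(trace, win=fs)               # 1-s window
pick = np.nanargmax(np.diff(k))                   # largest jump in kurtosis
print("picked sample:", pick, "  time (s):", pick / fs)
print("IQR, zero-cross rate:", noise_features(trace))
```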
Application of statistical mechanical methods to the modeling of social networks
NASA Astrophysics Data System (ADS)
Strathman, Anthony Robert
With the recent availability of large-scale social data sets, social networks have become open to quantitative analysis via the methods of statistical physics. We examine the statistical properties of a real large-scale social network, generated from cellular phone call-trace logs. We find this network, like many other social networks, to be assortative (r = 0.31) and clustered (i.e., strongly transitive, C = 0.21). We measure fluctuation scaling to identify the presence of internal structure in the network and find that structural inhomogeneity effectively disappears at the scale of a few hundred nodes, though there is no sharp cutoff. We introduce an agent-based model of social behavior, designed to model the formation and dissolution of social ties. The model is a modified Metropolis algorithm containing agents operating under the basic sociological constraints of reciprocity, communication need and transitivity, and it introduces the concept of a social temperature. We go on to show that this simple model reproduces the global statistical network features (including assortativity, connected fraction, mean degree, clustering, and mean shortest path length) of the real network data and undergoes two phase transitions as a function of this social temperature: one from a "gas" to a "liquid" state and the second from a liquid to a glassy state.
Exact goodness-of-fit tests for Markov chains.
Besag, J; Mondal, D
2013-06-01
Goodness-of-fit tests are useful in assessing whether a statistical model is consistent with available data. However, the usual χ² asymptotics often fail, either because of the paucity of the data or because a nonstandard test statistic is of interest. In this article, we describe exact goodness-of-fit tests for first- and higher order Markov chains, with particular attention given to time-reversible ones. The tests are obtained by conditioning on the sufficient statistics for the transition probabilities and are implemented by simple Monte Carlo sampling or by Markov chain Monte Carlo. They apply both to single and to multiple sequences and allow a free choice of test statistic. Three examples are given. The first concerns multiple sequences of dry and wet January days for the years 1948-1983 at Snoqualmie Falls, Washington State, and suggests that standard analysis may be misleading. The second one is for a four-state DNA sequence and lends support to the original conclusion that a second-order Markov chain provides an adequate fit to the data. The last one is six-state atomistic data arising in molecular conformational dynamics simulation of solvated alanine dipeptide and points to strong evidence against a first-order reversible Markov chain at 6 picosecond time steps. © 2013, The International Biometric Society.
NASA Astrophysics Data System (ADS)
Pollard, David; Chang, Won; Haran, Murali; Applegate, Patrick; DeConto, Robert
2016-05-01
A 3-D hybrid ice-sheet model is applied to the last deglacial retreat of the West Antarctic Ice Sheet over the last ~20,000 years. A large ensemble of 625 model runs is used to calibrate the model to modern and geologic data, including reconstructed grounding lines, relative sea-level records, elevation-age data and uplift rates, with an aggregate score computed for each run that measures overall model-data misfit. Two types of statistical methods are used to analyze the large-ensemble results: simple averaging weighted by the aggregate score, and more advanced Bayesian techniques involving Gaussian process-based emulation and calibration, and Markov chain Monte Carlo. The analyses provide sea-level-rise envelopes with well-defined parametric uncertainty bounds, but the simple averaging method only provides robust results with full-factorial parameter sampling in the large ensemble. Results for best-fit parameter ranges and envelopes of equivalent sea-level rise with the simple averaging method agree well with the more advanced techniques. Best-fit parameter ranges confirm earlier values expected from prior model tuning, including large basal sliding coefficients on modern ocean beds.
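The simpler of the two analysis routes, averaging ensemble outputs with weights derived from each run's aggregate model-data fit score, is easy to state in code. The score-to-weight conversion below (weights proportional to the score itself) and the synthetic trajectories are illustrative conventions; the paper's exact scoring and weighting are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical large ensemble: one equivalent sea-level-rise trajectory per run
n_runs, n_times = 625, 50
esl = np.cumsum(rng.normal(0.1, 0.05, size=(n_runs, n_times)), axis=1)  # metres
score = rng.random(n_runs)        # aggregate fit score per run (higher = better fit)

# Score-weighted ensemble mean and a simple weighted spread envelope
w = score / score.sum()
mean_esl = w @ esl
var_esl = w @ (esl - mean_esl) ** 2
envelope = np.vstack([mean_esl - 2 * np.sqrt(var_esl),
                      mean_esl + 2 * np.sqrt(var_esl)])
print("weighted mean ESL at final time:", mean_esl[-1])
print("approximate 2-sigma envelope at final time:", envelope[:, -1])
```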
Demonstrating microbial co-occurrence pattern analyses within and between ecosystems
Williams, Ryan J.; Howe, Adina; Hofmockel, Kirsten S.
2014-01-01
Co-occurrence patterns are used in ecology to explore interactions between organisms and environmental effects on coexistence within biological communities. Analysis of co-occurrence patterns among microbial communities has ranged from simple pairwise comparisons between all community members to direct hypothesis testing between focal species. However, co-occurrence patterns are rarely studied across multiple ecosystems or multiple scales of biological organization within the same study. Here we outline an approach to produce co-occurrence analyses that are focused at three different scales: co-occurrence patterns between ecosystems at the community scale, modules of co-occurring microorganisms within communities, and co-occurring pairs within modules that are nested within microbial communities. To demonstrate our co-occurrence analysis approach, we gathered publicly available 16S rRNA amplicon datasets to compare and contrast microbial co-occurrence at different taxonomic levels across different ecosystems. We found differences in community composition and co-occurrence that reflect environmental filtering at the community scale and consistent pairwise occurrences that may be used to infer ecological traits about poorly understood microbial taxa. However, we also found that conclusions derived from applying network statistics to microbial relationships can vary depending on the taxonomic level chosen and criteria used to build co-occurrence networks. We present our statistical analysis and code for public use in analysis of co-occurrence patterns across microbial communities. PMID:25101065
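A minimal pairwise co-occurrence analysis of the kind described, correlate taxon abundance profiles across samples, keep strong and significant pairs, and read off modules as connected components, is sketched below with synthetic abundances. The correlation measure, thresholds, and module definition are illustrative choices; real workflows add multiple-testing correction and compositionality-aware measures.

```python
import numpy as np
from scipy import stats
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

rng = np.random.default_rng(7)
n_taxa, n_samples = 20, 30
# Synthetic abundance table (taxa x samples); a few taxa made to covary
abund = rng.poisson(5, size=(n_taxa, n_samples)).astype(float)
abund[1] = abund[0] + rng.poisson(1, n_samples)       # co-occurring pair
abund[3] = abund[2] + rng.poisson(1, n_samples)

# Pairwise Spearman correlations; keep strong, significant positive pairs
adj = np.zeros((n_taxa, n_taxa), dtype=int)
for i in range(n_taxa):
    for j in range(i + 1, n_taxa):
        rho, p = stats.spearmanr(abund[i], abund[j])
        if rho > 0.6 and p < 0.01:
            adj[i, j] = adj[j, i] = 1

# Modules = connected components of the co-occurrence graph
n_comp, labels = connected_components(csr_matrix(adj), directed=False)
for c in range(n_comp):
    members = np.where(labels == c)[0]
    if len(members) > 1:
        print("module:", members)
```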
Statistical strategies for averaging EC50 from multiple dose-response experiments.
Jiang, Xiaoqi; Kopp-Schneider, Annette
2015-11-01
In most dose-response studies, repeated experiments are conducted to determine the EC50 value for a chemical, requiring averaging of EC50 estimates from a series of experiments. Two statistical strategies, mixed-effects modeling and the meta-analysis approach, can be applied to estimate the average behavior of EC50 values over all experiments by considering the variability within and among experiments. We investigated these two strategies in two common cases of multiple dose-response experiments: (a) complete and explicit dose-response relationships are observed in all experiments, and (b) they are observed only in a subset of experiments. In case (a), the meta-analysis strategy is a simple and robust method for averaging EC50 estimates. In case (b), all experimental data sets can first be screened using the dose-response screening plot, which allows visualization and comparison of multiple dose-response experimental results. As long as more than three experiments provide information about complete dose-response relationships, the experiments that cover incomplete relationships can be excluded from the meta-analysis strategy of averaging EC50 estimates. If only two experiments contain complete dose-response information, the mixed-effects model approach is suggested. We also provide a web application for non-statisticians to implement the proposed meta-analysis strategy of averaging EC50 estimates from multiple dose-response experiments.
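The meta-analysis route for case (a) reduces to an inverse-variance weighted average of the per-experiment estimates, typically on the log-EC50 scale. The fixed-effect sketch below uses hypothetical estimates and standard errors; the paper's own pooling may include random-effects components not shown here.

```python
import numpy as np

# Hypothetical per-experiment estimates: log10(EC50) and their standard errors
log_ec50 = np.array([-6.10, -6.25, -5.95, -6.18])
se       = np.array([0.08, 0.12, 0.10, 0.09])

# Fixed-effect inverse-variance weighting
w = 1.0 / se**2
pooled = np.sum(w * log_ec50) / np.sum(w)
pooled_se = np.sqrt(1.0 / np.sum(w))

ci = pooled + np.array([-1.96, 1.96]) * pooled_se
print(f"pooled log10(EC50) = {pooled:.3f}  (95% CI {ci[0]:.3f} to {ci[1]:.3f})")
print(f"pooled EC50 = {10**pooled:.3e} M")
```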
Docking and multivariate methods to explore HIV-1 drug-resistance: a comparative analysis
NASA Astrophysics Data System (ADS)
Almerico, Anna Maria; Tutone, Marco; Lauria, Antonino
2008-05-01
In this paper we describe a comparative analysis of multivariate and docking methods in the study of drug resistance to reverse transcriptase and protease inhibitors. In our earlier papers we developed a simple but efficient method to evaluate the features of compounds that are less likely to trigger resistance or are effective against mutant HIV strains, using the multivariate statistical procedures PCA and DA. In an attempt to create a more solid background for the prediction of susceptibility or resistance, we carried out a comparative analysis between our previous multivariate approach and a molecular docking study. The intent of this paper is not only to find further support for the results obtained by the combined use of PCA and DA, but also to highlight the structural features, in terms of molecular descriptors, similarity, and energetic contributions derived from docking, which can account for the emergence of drug resistance in mutant strains.
GPS baseline configuration design based on robustness analysis
NASA Astrophysics Data System (ADS)
Yetkin, M.; Berber, M.
2012-11-01
The robustness analysis results obtained from a Global Positioning System (GPS) network are dramatically influenced by the configuration
Protecting Privacy of Shared Epidemiologic Data without Compromising Analysis Potential
Cologne, John; Grant, Eric J.; Nakashima, Eiji; Chen, Yun; Funamoto, Sachiyo; Katayama, Hiroaki
2012-01-01
Objective. Ensuring privacy of research subjects when epidemiologic data are shared with outside collaborators involves masking (modifying) the data, but overmasking can compromise utility (analysis potential). Methods of statistical disclosure control for protecting privacy may be impractical for individual researchers involved in small-scale collaborations. Methods. We investigated a simple approach based on measures of disclosure risk and analytical utility that are straightforward for epidemiologic researchers to derive. The method is illustrated using data from the Japanese Atomic-bomb Survivor population. Results. Masking by modest rounding did not adequately enhance security but rounding to remove several digits of relative accuracy effectively reduced the risk of identification without substantially reducing utility. Grouping or adding random noise led to noticeable bias. Conclusions. When sharing epidemiologic data, it is recommended that masking be performed using rounding. Specific treatment should be determined separately in individual situations after consideration of the disclosure risks and analysis needs. PMID:22505949
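The trade-off examined in this work, masking by rounding to a limited number of significant digits versus the resulting loss of analytical precision, can be illustrated in a few lines. The exposure variable, the number of retained digits, and the "uniqueness" proxy for disclosure risk below are all illustrative choices, not the measures derived in the paper.

```python
import numpy as np

def round_sig(x, digits):
    """Round each value to a fixed number of significant digits."""
    x = np.asarray(x, dtype=float)
    mag = np.where(x == 0, 1.0, 10.0 ** np.floor(np.log10(np.abs(x))))
    return np.round(x / mag, digits - 1) * mag

rng = np.random.default_rng(8)
dose = rng.lognormal(mean=0.0, sigma=1.0, size=5000)   # hypothetical exposure values

for digits in (6, 3, 2, 1):
    masked = round_sig(dose, digits)
    # Crude disclosure-risk proxy: fraction of records left with a unique masked value
    _, counts = np.unique(masked, return_counts=True)
    risk = np.sum(counts == 1) / dose.size
    # Crude utility proxy: relative bias in the mean after masking
    bias = (masked.mean() - dose.mean()) / dose.mean()
    print(f"{digits} sig. digits: unique fraction = {risk:.3f}, relative bias = {bias:.2e}")
```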
Statistical Mechanics of the US Supreme Court
NASA Astrophysics Data System (ADS)
Lee, Edward D.; Broedersz, Chase P.; Bialek, William
2015-07-01
We build simple models for the distribution of voting patterns in a group, using the Supreme Court of the United States as an example. The maximum entropy model consistent with the observed pairwise correlations among justices' votes, an Ising spin glass, agrees quantitatively with the data. While all correlations (perhaps surprisingly) are positive, the effective pairwise interactions in the spin glass model have both signs, recovering the intuition that ideologically opposite justices negatively influence one another. Despite the competing interactions, a strong tendency toward unanimity emerges from the model, organizing the voting patterns in a relatively simple "energy landscape." Besides unanimity, other energy minima in this landscape, or maxima in probability, correspond to prototypical voting states, such as the ideological split or a tightly correlated, conservative core. The model correctly predicts the correlation of justices with the majority and gives us a measure of their influence on the majority decision. These results suggest that simple models, grounded in statistical physics, can capture essential features of collective decision making quantitatively, even in a complex political context.
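For nine justices, a pairwise maximum entropy (Ising) model is small enough to enumerate exactly (2^9 = 512 voting states), so probabilities of unanimity or of particular splits follow directly once fields and couplings are given. The sketch below evaluates such a model with random placeholder parameters; fitting the parameters to the observed correlations, the harder step, is not shown.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(9)
n = 9                                     # number of justices
h = rng.normal(0, 0.2, n)                 # fields (placeholder values, not fitted)
J = np.triu(rng.normal(0, 0.3, (n, n)), 1)  # couplings, upper triangle only

states = np.array(list(product([-1, 1], repeat=n)))     # all 512 voting patterns
energy = -states @ h - np.einsum('si,ij,sj->s', states, J, states)
p = np.exp(-energy)
p /= p.sum()                              # Boltzmann distribution at unit temperature

# Model predictions: mean votes, probability of unanimity, most probable pattern
mean_vote = p @ states
unanimity = p[np.abs(states.sum(axis=1)) == n].sum()
print("predicted mean votes:", np.round(mean_vote, 2))
print("P(unanimous):", unanimity)
print("most probable pattern:", states[np.argmax(p)])
```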
Missing CD4+ cell response in randomized clinical trials of maraviroc and dolutegravir.
Cuffe, Robert; Barnett, Carly; Granier, Catherine; Machida, Mitsuaki; Wang, Cunshan; Roger, James
2015-10-01
Missing data can compromise inferences from clinical trials, yet the topic has received little attention in the clinical trial community. Shortcomings of commonly used methods for analyzing studies with missing data (complete case analysis, last- or baseline-observation carried forward) have been highlighted in a recent Food and Drug Administration-sponsored report, which recommends how to mitigate the issues associated with missing data. We present an example of the proposed concepts using data from recent clinical trials. CD4+ cell count data from the previously reported SINGLE and MOTIVATE studies of dolutegravir and maraviroc were analyzed using a variety of statistical methods to explore the impact of missing data. Four methodologies were used: complete case analysis, simple imputation, mixed models for repeated measures, and multiple imputation. We compared the sensitivity of conclusions to the volume of missing data and to the assumptions underpinning each method. Rates of missing data were greater in the MOTIVATE studies (35%-68% premature withdrawal) than in SINGLE (12%-20%). The sensitivity of results to assumptions about missing data was related to the volume of missing data. Estimates of treatment differences by the various analysis methods ranged across a 61 cells/mm³ window in MOTIVATE and a 22 cells/mm³ window in SINGLE. Where missing data are anticipated, analyses require robust statistical and clinical debate of the necessary but unverifiable underlying statistical assumptions. Multiple imputation makes these assumptions transparent, can accommodate a broad range of scenarios, and is a natural analysis for clinical trials in HIV with missing data.
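The pooling step of multiple imputation (Rubin's rules) is simple to write down: analyze each completed data set, then combine the estimates using the within- and between-imputation variances. The sketch below imputes a missing continuous outcome from a normal regression on observed covariates and pools a treatment-effect estimate; the variable names and simulated data are placeholders, and a full MI procedure would also draw the imputation-model parameters rather than fixing them.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 400
treat = rng.integers(0, 2, n)
baseline = rng.normal(350, 100, n)
cd4 = 50 + 0.8 * baseline + 40 * treat + rng.normal(0, 60, n)
cd4[rng.random(n) < 0.3] = np.nan                 # ~30% missing outcomes

m = 20
estimates, variances = [], []
obs = ~np.isnan(cd4)
X_full = sm.add_constant(np.column_stack([treat, baseline]))
fit_imp = sm.OLS(cd4[obs], X_full[obs]).fit()     # imputation model on observed cases

for _ in range(m):
    # Draw imputations from the fitted predictive distribution (parameter
    # uncertainty ignored here for brevity)
    cd4_i = cd4.copy()
    cd4_i[~obs] = X_full[~obs] @ fit_imp.params + rng.normal(
        0, np.sqrt(fit_imp.scale), (~obs).sum())
    # Analysis model on the completed data; index 1 is the treatment coefficient
    fit = sm.OLS(cd4_i, X_full).fit()
    estimates.append(fit.params[1])
    variances.append(fit.bse[1] ** 2)

q = np.mean(estimates)                            # Rubin's rules
w = np.mean(variances)                            # within-imputation variance
b = np.var(estimates, ddof=1)                     # between-imputation variance
total_se = np.sqrt(w + (1 + 1 / m) * b)
print(f"pooled treatment effect = {q:.1f} (SE {total_se:.1f})")
```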