Sample records for statistical normal distribution

  1. Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: a Monte Carlo study.

    PubMed

    Chou, C P; Bentler, P M; Satorra, A

    1991-11-01

    Research studying robustness of maximum likelihood (ML) statistics in covariance structure analysis has concluded that test statistics and standard errors are biased under severe non-normality. An estimation procedure known as asymptotic distribution free (ADF), making no distributional assumption, has been suggested to avoid these biases. Corrections to the normal theory statistics to yield more adequate performance have also been proposed. This study compares the performance of a scaled test statistic and robust standard errors for two models under several non-normal conditions and also compares these with the results from ML and ADF methods. Both ML and ADF test statistics performed rather well in one model and considerably worse in the other. In general, the scaled test statistic seemed to behave better than the ML test statistic and the ADF statistic performed the worst. The robust and ADF standard errors yielded more appropriate estimates of sampling variability than the ML standard errors, which were usually downward biased, in both models under most of the non-normal conditions. ML test statistics and standard errors were found to be quite robust to the violation of the normality assumption when data had either symmetric and platykurtic distributions, or non-symmetric and zero kurtotic distributions.

  2. Log Normal Distribution of Cellular Uptake of Radioactivity: Statistical Analysis of Alpha Particle Track Autoradiography

    PubMed Central

    Neti, Prasad V.S.V.; Howell, Roger W.

    2008-01-01

    Recently, the distribution of radioactivity among a population of cells labeled with 210Po was shown to be well described by a log normal distribution function (J Nucl Med 47, 6 (2006) 1049-1058) with the aid of an autoradiographic approach. To ascertain the influence of Poisson statistics on the interpretation of the autoradiographic data, the present work reports on a detailed statistical analysis of these data. Methods: The measured distributions of alpha particle tracks per cell were subjected to statistical tests with Poisson (P), log normal (LN), and Poisson-log normal (P-LN) models. Results: The LN distribution function best describes the distribution of radioactivity among cell populations exposed to 0.52 and 3.8 kBq/mL 210Po-citrate. When cells were exposed to 67 kBq/mL, the P-LN distribution function gave a better fit; however, the underlying activity distribution remained log normal. Conclusions: The present analysis generally provides further support for the use of LN distributions to describe the cellular uptake of radioactivity. Care should be exercised when analyzing autoradiographic data on activity distributions to ensure that Poisson processes do not distort the underlying LN distribution. PMID:16741316
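
    The point above about Poisson counting statistics blurring an underlying log-normal activity distribution can be illustrated with a short Monte Carlo sketch; all parameter values below are assumptions chosen for illustration, not the study's data.

```python
# Monte Carlo sketch: how Poisson counting statistics blur an underlying
# log-normal activity distribution (assumed parameters, illustration only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_cells = 10_000
# Hypothetical log-normal cellular activity (mean track yield per cell).
activity = rng.lognormal(mean=1.0, sigma=0.8, size=n_cells)
# Observed alpha-particle tracks per cell are Poisson-distributed around the activity.
tracks = rng.poisson(activity)

# Compare the spread of the underlying activity with the observed counts.
print("activity: mean %.2f, CV %.2f" % (activity.mean(), activity.std() / activity.mean()))
print("tracks:   mean %.2f, CV %.2f" % (tracks.mean(), tracks.std() / tracks.mean()))

# A log-normal fit to the positive counts recovers the underlying shape only
# approximately; the Poisson component inflates the apparent dispersion.
shape, loc, scale = stats.lognorm.fit(tracks[tracks > 0], floc=0)
print("log-normal fit to counts: sigma = %.2f, median = %.2f" % (shape, scale))
```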

  3. Interpreting the concordance statistic of a logistic regression model: relation to the variance and odds ratio of a continuous explanatory variable.

    PubMed

    Austin, Peter C; Steyerberg, Ewout W

    2012-06-20

    When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in the combined sample of those with and without the condition. Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, and uniform in the entire sample of those with and without the condition. The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population.
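
    A minimal simulation sketch of the relationship described above, comparing an empirical c-statistic with the equal-variance binormal prediction c ≈ Φ(βσ/√2); the model coefficients and sample size are assumed for illustration, and the agreement is only approximate.

```python
# Sketch: empirical c-statistic vs. the binormal equal-variance prediction.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, sigma, beta, alpha = 200_000, 1.5, 0.8, -1.0   # assumed settings

x = rng.normal(0.0, sigma, size=n)                # continuous explanatory variable
p = 1.0 / (1.0 + np.exp(-(alpha + beta * x)))     # logistic model for the outcome
y = rng.binomial(1, p)

# Empirical c-statistic via its rank (Mann-Whitney) representation.
ranks = stats.rankdata(x)
n1, n0 = int(y.sum()), int((1 - y).sum())
c_empirical = (ranks[y == 1].sum() - n1 * (n1 + 1) / 2) / (n1 * n0)

# Binormal, equal-variance prediction discussed in the abstract.
c_predicted = stats.norm.cdf(beta * sigma / np.sqrt(2.0))
print(f"empirical c = {c_empirical:.3f}, predicted c = {c_predicted:.3f}")
```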

  4. Approximations to the distribution of a test statistic in covariance structure analysis: A comprehensive study.

    PubMed

    Wu, Hao

    2018-05-01

    In structural equation modelling (SEM), a robust adjustment to the test statistic or to its reference distribution is needed when its null distribution deviates from a χ² distribution, which usually arises when data do not follow a multivariate normal distribution. Unfortunately, existing studies on this issue typically focus on only a few methods and neglect the majority of alternative methods in statistics. Existing simulation studies typically consider only non-normal distributions of data that either satisfy asymptotic robustness or lead to an asymptotic scaled χ² distribution. In this work we conduct a comprehensive study that involves both typical methods in SEM and less well-known methods from the statistics literature. We also propose the use of several novel non-normal data distributions that are qualitatively different from the non-normal distributions widely used in existing studies. We found that several under-studied methods give the best performance under specific conditions, but the Satorra-Bentler method remains the most viable method for most situations. © 2017 The British Psychological Society.

  5. Application of a truncated normal failure distribution in reliability testing

    NASA Technical Reports Server (NTRS)

    Groves, C., Jr.

    1968-01-01

    Statistical truncated normal distribution function is applied as a time-to-failure distribution function in equipment reliability estimations. Age-dependent characteristics of the truncated function provide a basis for formulating a system of high-reliability testing that effectively merges statistical, engineering, and cost considerations.

  6. Understanding a Normal Distribution of Data.

    PubMed

    Maltenfort, Mitchell G

    2015-12-01

    Assuming data follow a normal distribution is essential for many common statistical tests. However, what are normal data and when can we assume that a data set follows this distribution? What can be done to analyze non-normal data?
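
    A minimal sketch of the kind of normality check the article motivates, using a Shapiro-Wilk test on deliberately skewed simulated data and a log transform as one possible remedy; the data and significance threshold are illustrative assumptions.

```python
# Sketch: check a sample for normality before choosing a parametric test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
sample = rng.lognormal(mean=0.0, sigma=0.5, size=80)   # deliberately right-skewed

w, p = stats.shapiro(sample)
print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.4f}")
if p < 0.05:
    # One common remedy for right-skewed, positive data is a log transform.
    w_log, p_log = stats.shapiro(np.log(sample))
    print(f"after log transform: W = {w_log:.3f}, p = {p_log:.4f}")
```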

  7. Tools for Basic Statistical Analysis

    NASA Technical Reports Server (NTRS)

    Luz, Paul L.

    2005-01-01

    Statistical Analysis Toolset is a collection of eight Microsoft Excel spreadsheet programs, each of which performs calculations pertaining to an aspect of statistical analysis. These programs present input and output data in user-friendly, menu-driven formats, with automatic execution. The following types of calculations are performed: Descriptive statistics are computed for a set of data x(i) (i = 1, 2, 3, ...) entered by the user. Normal Distribution Estimates will calculate the statistical value that corresponds to cumulative probability values, given a sample mean and standard deviation of the normal distribution. Normal Distribution from two Data Points will extend and generate a cumulative normal distribution for the user, given two data points and their associated probability values. Two programs perform two-way analysis of variance (ANOVA) with no replication or generalized ANOVA for two factors with four levels and three repetitions. Linear Regression-ANOVA will curve-fit data to the linear equation y=f(x) and will do an ANOVA to check its significance.
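
    A short sketch of the "Normal Distribution Estimates" calculation described above, done here with SciPy rather than Excel; the mean, standard deviation, and probability levels are assumed for illustration.

```python
# Sketch: value corresponding to a cumulative probability, given mean and SD.
from scipy import stats

mean, sd = 50.0, 8.0                       # assumed sample mean and standard deviation
for cum_prob in (0.05, 0.50, 0.95):
    x = stats.norm.ppf(cum_prob, loc=mean, scale=sd)
    print(f"P(X <= {x:6.2f}) = {cum_prob:.2f}")
```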

  8. Interpreting the concordance statistic of a logistic regression model: relation to the variance and odds ratio of a continuous explanatory variable

    PubMed Central

    2012-01-01

    Background: When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. Methods: An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in the combined sample of those with and without the condition. Results: Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, and uniform in the entire sample of those with and without the condition. Conclusions: The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population. PMID:22716998

  9. EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.

    PubMed

    Tong, Xiaoxiao; Bentler, Peter M

    2013-01-01

    Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ² test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.

  10. Descriptive Statistics for Modern Test Score Distributions: Skewness, Kurtosis, Discreteness, and Ceiling Effects

    ERIC Educational Resources Information Center

    Ho, Andrew D.; Yu, Carol C.

    2015-01-01

    Many statistical analyses benefit from the assumption that unconditional or conditional distributions are continuous and normal. More than 50 years ago in this journal, Lord and Cook chronicled departures from normality in educational tests, and Micceri similarly showed that the normality assumption is met rarely in educational and psychological…

  11. CAN'T MISS--conquer any number task by making important statistics simple. Part 2. Probability, populations, samples, and normal distributions.

    PubMed

    Hansen, John P

    2003-01-01

    Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 2, describes probability, populations, and samples. The uses of descriptive and inferential statistics are outlined. The article also discusses the properties and probability of normal distributions, including the standard normal distribution.
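
    As a small worked example of the inferential ideas outlined above, the sketch below computes a normal-theory (t-based) confidence interval for a sample mean; the simulated data and the 95% level are assumptions for illustration.

```python
# Sketch: a normal-theory confidence interval for a sample mean.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
sample = rng.normal(loc=120.0, scale=15.0, size=40)   # simulated measurements

mean = sample.mean()
sem = sample.std(ddof=1) / np.sqrt(sample.size)       # standard error of the mean
lo, hi = stats.t.interval(0.95, sample.size - 1, loc=mean, scale=sem)
print(f"mean = {mean:.1f}, 95% CI = ({lo:.1f}, {hi:.1f})")
```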

  12. Normal versus Noncentral Chi-Square Asymptotics of Misspecified Models

    ERIC Educational Resources Information Center

    Chun, So Yeon; Shapiro, Alexander

    2009-01-01

    The noncentral chi-square approximation of the distribution of the likelihood ratio (LR) test statistic is a critical part of the methodology in structural equation modeling. Recently, it was argued by some authors that in certain situations normal distributions may give a better approximation of the distribution of the LR test statistic. The main…

  13. Descriptive Statistics for Modern Test Score Distributions: Skewness, Kurtosis, Discreteness, and Ceiling Effects.

    PubMed

    Ho, Andrew D; Yu, Carol C

    2015-06-01

    Many statistical analyses benefit from the assumption that unconditional or conditional distributions are continuous and normal. More than 50 years ago in this journal, Lord and Cook chronicled departures from normality in educational tests, and Micceri similarly showed that the normality assumption is met rarely in educational and psychological practice. In this article, the authors extend these previous analyses to state-level educational test score distributions that are an increasingly common target of high-stakes analysis and interpretation. Among 504 scale-score and raw-score distributions from state testing programs from recent years, nonnormal distributions are common and are often associated with particular state programs. The authors explain how scaling procedures from item response theory lead to nonnormal distributions as well as unusual patterns of discreteness. The authors recommend that distributional descriptive statistics be calculated routinely to inform model selection for large-scale test score data, and they illustrate consequences of nonnormality using sensitivity studies that compare baseline results to those from normalized score scales.
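
    A brief sketch of the routine distributional diagnostics the authors recommend, computed here on simulated scores with an artificial ceiling effect; all values are assumptions, not the state testing data.

```python
# Sketch: skewness, kurtosis, and a normality test on ceiling-affected scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Simulated scale scores with a ceiling effect (values capped at 100).
scores = np.minimum(rng.normal(85, 12, size=5000), 100)

print("skewness:", stats.skew(scores))
print("excess kurtosis:", stats.kurtosis(scores))          # 0 for a normal distribution
print("normality (D'Agostino K^2) p-value:", stats.normaltest(scores).pvalue)
```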

  14. Normal Approximations to the Distributions of the Wilcoxon Statistics: Accurate to What "N"? Graphical Insights

    ERIC Educational Resources Information Center

    Bellera, Carine A.; Julien, Marilyse; Hanley, James A.

    2010-01-01

    The Wilcoxon statistics are usually taught as nonparametric alternatives for the 1- and 2-sample Student-"t" statistics in situations where the data appear to arise from non-normal distributions, or where sample sizes are so small that we cannot check whether they do. In the past, critical values, based on exact tail areas, were…
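
    The normal approximation discussed above can be compared directly with an exact computation; the sketch below does this for the Mann-Whitney U form of the Wilcoxon rank-sum statistic with small simulated samples (sample sizes and effect are assumed).

```python
# Sketch: exact vs. normal-approximation p-values for the rank-sum statistic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
x = rng.normal(0.0, 1.0, size=8)
y = rng.normal(0.5, 1.0, size=9)

res = stats.mannwhitneyu(x, y, alternative="two-sided", method="exact")
u, n1, n2 = res.statistic, len(x), len(y)

# Normal approximation: U has mean n1*n2/2 and variance n1*n2*(n1+n2+1)/12.
mu = n1 * n2 / 2.0
sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
z = (u - mu) / sigma
p_normal = 2 * stats.norm.sf(abs(z))
print(f"exact p = {res.pvalue:.4f}, normal-approximation p = {p_normal:.4f}")
```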

  15. Determination of statistics for any rotation of axes of a bivariate normal elliptical distribution. [of wind vector components

    NASA Technical Reports Server (NTRS)

    Falls, L. W.; Crutcher, H. L.

    1976-01-01

    Transformation of statistics from a dimensional set to another dimensional set involves linear functions of the original set of statistics. Similarly, linear functions will transform statistics within a dimensional set such that the new statistics are relevant to a new set of coordinate axes. A restricted case of the latter is the rotation of axes in a coordinate system involving any two correlated random variables. A special case is the transformation for horizontal wind distributions. Wind statistics are usually provided in terms of wind speed and direction (measured clockwise from north) or in east-west and north-south components. A direct application of this technique allows the determination of appropriate wind statistics parallel and normal to any preselected flight path of a space vehicle. Among the constraints for launching space vehicles are critical values selected from the distribution of the expected winds parallel to and normal to the flight path. These procedures are applied to space vehicle launches at Cape Kennedy, Florida.
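
    A minimal sketch of the axis-rotation idea: the mean vector and covariance matrix of a bivariate normal transform linearly under a rotation, so wind statistics can be re-expressed parallel and normal to a chosen azimuth. The wind statistics and angle below are illustrative assumptions.

```python
# Sketch: rotating the statistics of a bivariate normal wind distribution.
import numpy as np

# Assumed east-west / north-south wind statistics (illustrative values, m/s).
mu = np.array([3.0, -1.5])
cov = np.array([[9.0, 2.0],
                [2.0, 4.0]])

theta = np.deg2rad(30.0)                      # rotation to the new axes
R = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

mu_rot = R @ mu                               # means along the rotated axes
cov_rot = R @ cov @ R.T                       # covariance in the rotated frame
print("rotated means:", mu_rot)
print("rotated covariance:\n", cov_rot)
```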

  16. A spatial scan statistic for survival data based on Weibull distribution.

    PubMed

    Bhatt, Vijaya; Tiwari, Neeraj

    2014-05-20

    The spatial scan statistic has been developed as a geographical cluster detection analysis tool for different types of data sets such as Bernoulli, Poisson, ordinal, normal and exponential. We propose a scan statistic for survival data based on Weibull distribution. It may also be used for other survival distributions, such as exponential, gamma, and log normal. The proposed method is applied on the survival data of tuberculosis patients for the years 2004-2005 in Nainital district of Uttarakhand, India. Simulation studies reveal that the proposed method performs well for different survival distribution functions. Copyright © 2013 John Wiley & Sons, Ltd.

  17. Asymptotic Distribution of the Likelihood Ratio Test Statistic for Sphericity of Complex Multivariate Normal Distribution.

    DTIC Science & Technology

    1981-08-01

    RATIO TEST STATISTIC FOR SPHERICITY OF COMPLEX MULTIVARIATE NORMAL DISTRIBUTION* C. Fang, P. R. Krishnaiah, B. N. Nagarsenker** August 1981 Technical... and their applications in time series, the reader is referred to Krishnaiah (1976). Motivated by the applications in the area of inference on multiple... for practical purposes. Here, we note that Krishnaiah, Lee and Chang (1976) approximated the null distribution of certain power of the likeli…

  18. Statistical Power Analysis with Microsoft Excel: Normal Tests for One or Two Means as a Prelude to Using Non-Central Distributions to Calculate Power

    ERIC Educational Resources Information Center

    Texeira, Antonio; Rosa, Alvaro; Calapez, Teresa

    2009-01-01

    This article presents statistical power analysis (SPA) based on the normal distribution using Excel, adopting textbook and SPA approaches. The objective is to present the latter in a comparative way within a framework that is familiar to textbook level readers, as a first step to understand SPA with other distributions. The analysis focuses on the…
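
    A small sketch of the normal-theory power calculation the article builds in spreadsheet form, written here with SciPy for a two-sided one-sample z-test; all numeric settings are assumptions for illustration.

```python
# Sketch: power of a two-sided one-sample z-test under a shifted alternative.
import numpy as np
from scipy import stats

mu0, mu1, sigma, n, alpha = 100.0, 105.0, 15.0, 50, 0.05   # assumed settings

z_crit = stats.norm.ppf(1 - alpha / 2)           # two-sided critical value
shift = (mu1 - mu0) / (sigma / np.sqrt(n))       # standardized shift under H1
power = stats.norm.sf(z_crit - shift) + stats.norm.cdf(-z_crit - shift)
print(f"power = {power:.3f}")
```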

  19. Statistical Characterization of the Mechanical Parameters of Intact Rock Under Triaxial Compression: An Experimental Proof of the Jinping Marble

    NASA Astrophysics Data System (ADS)

    Jiang, Quan; Zhong, Shan; Cui, Jie; Feng, Xia-Ting; Song, Leibo

    2016-12-01

    We investigated the statistical characteristics and probability distribution of the mechanical parameters of natural rock using triaxial compression tests. Twenty cores of Jinping marble were tested at each of five different levels of confining stress (5, 10, 20, 30, and 40 MPa). From these full stress-strain data, we summarized the numerical characteristics and determined the probability distribution form of several important mechanical parameters, including deformational parameters, characteristic strength, characteristic strains, and failure angle. The statistical proofs relating to the mechanical parameters of rock presented new information about the marble's probabilistic distribution characteristics. The normal and log-normal distributions were appropriate for describing random strengths of rock; the coefficients of variation of the peak strengths had no relationship to the confining stress; the only acceptable random distribution for both Young's elastic modulus and Poisson's ratio was the log-normal function; and the cohesive strength had a different probability distribution pattern than the frictional angle. The triaxial tests and statistical analysis also provided experimental evidence for deciding the minimum reliable number of experimental samples and for picking appropriate parameter distributions to use in reliability calculations for rock engineering.

  20. Robustness of S1 statistic with Hodges-Lehmann for skewed distributions

    NASA Astrophysics Data System (ADS)

    Ahad, Nor Aishah; Yahaya, Sharipah Soaad Syed; Yin, Lee Ping

    2016-10-01

    Analysis of variance (ANOVA) is a commonly used parametric method for testing differences in means across more than two groups when the populations are normally distributed. ANOVA is highly inefficient under non-normal and heteroscedastic settings. When the assumptions are violated, researchers look for alternatives such as the nonparametric Kruskal-Wallis test or robust methods. This study focused on a flexible method, the S1 statistic, for comparing groups using the median as the location estimator. The S1 statistic was modified by substituting the median with the Hodges-Lehmann estimator and the default scale estimator with the variance of Hodges-Lehmann and MADn, to produce two different test statistics for comparing groups. The bootstrap method was used for testing the hypotheses since the sampling distributions of these modified S1 statistics are unknown. The performance of the proposed statistics in terms of Type I error was measured and compared against the original S1 statistic, ANOVA and Kruskal-Wallis. The proposed procedures show improvement over the original statistic, especially under extremely skewed distributions.

  1. Plasma Electrolyte Distributions in Humans - Normal or Skewed?

    PubMed

    Feldman, Mark; Dickson, Beverly

    2017-11-01

    It is widely believed that plasma electrolyte levels are normally distributed. Statistical tests and calculations using plasma electrolyte data are often reported based on this assumption of normality. Examples include t tests, analysis of variance, correlations and confidence intervals. The purpose of our study was to determine whether plasma sodium (Na+), potassium (K+), chloride (Cl-) and bicarbonate (HCO3-) distributions are indeed normally distributed. We analyzed plasma electrolyte data from 237 consecutive adults (137 women and 100 men) who had normal results on a standard basic metabolic panel which included plasma electrolyte measurements. The skewness of each distribution (as a measure of its asymmetry) was compared to the zero skewness of a normal (Gaussian) distribution. The plasma Na+ distribution was skewed slightly to the right, but the skew was not significantly different from zero skew. The plasma Cl- distribution was skewed slightly to the left, but again the skew was not significantly different from zero skew. In contrast, both the plasma K+ and HCO3- distributions were significantly skewed to the right (P < 0.01 vs. zero skew). There was also a suggestion from examining frequency distribution curves that the K+ and HCO3- distributions were bimodal. In adults with a normal basic metabolic panel, plasma potassium and bicarbonate levels are not normally distributed and may be bimodal. Thus, statistical methods to evaluate these 2 plasma electrolytes should be nonparametric tests and not parametric ones that require a normal distribution. Copyright © 2017 Southern Society for Clinical Investigation. Published by Elsevier Inc. All rights reserved.
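
    A minimal sketch of the skewness comparison reported above, testing whether a sample's skewness differs from the zero skew of a normal distribution; the simulated "potassium-like" values are an assumption, not the study data.

```python
# Sketch: test a sample's skewness against the zero skew of a normal distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
# Right-skewed, assumed "potassium-like" values in mmol/L (illustrative only).
k_plasma = 4.0 + rng.gamma(shape=4.0, scale=0.1, size=237)

print("sample skewness:", stats.skew(k_plasma))
stat, p = stats.skewtest(k_plasma)            # D'Agostino test of zero skew
print(f"skewness z = {stat:.2f}, p = {p:.4f}")
```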

  2. Notes on power of normality tests of error terms in regression models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Střelec, Luboš

    2015-03-10

    Normality is one of the basic assumptions in applying statistical procedures. For example, in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results from the usual statistical inference techniques such as the t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary for inferences that are not misleading, which explains the necessity and importance of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.

  3. Statistical analysis of the 70 meter antenna surface distortions

    NASA Technical Reports Server (NTRS)

    Kiedron, K.; Chian, C. T.; Chuang, K. L.

    1987-01-01

    Statistical analysis of surface distortions of the 70 meter NASA/JPL antenna, located at Goldstone, was performed. The purpose of this analysis is to verify whether deviations due to gravity loading can be treated as quasi-random variables with normal distribution. Histograms of the RF pathlength error distribution for several antenna elevation positions were generated. The results indicate that the deviations from the ideal antenna surface are not normally distributed. The observed density distribution for all antenna elevation angles is taller and narrower than the normal density, which results in large positive values of kurtosis and a significant amount of skewness. The skewness of the distribution changes from positive to negative as the antenna elevation changes from zenith to horizon.

  4. Smooth quantile normalization.

    PubMed

    Hicks, Stephanie C; Okrah, Kwame; Paulson, Joseph N; Quackenbush, John; Irizarry, Rafael A; Bravo, Héctor Corrada

    2018-04-01

    Between-sample normalization is a critical step in genomic data analysis to remove systematic bias and unwanted technical variation in high-throughput data. Global normalization methods are based on the assumption that observed variability in global properties is due to technical reasons and are unrelated to the biology of interest. For example, some methods correct for differences in sequencing read counts by scaling features to have similar median values across samples, but these fail to reduce other forms of unwanted technical variation. Methods such as quantile normalization transform the statistical distributions across samples to be the same and assume global differences in the distribution are induced by only technical variation. However, it remains unclear how to proceed with normalization if these assumptions are violated, for example, if there are global differences in the statistical distributions between biological conditions or groups, and external information, such as negative or control features, is not available. Here, we introduce a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which is based on the assumption that the statistical distribution of each sample should be the same (or have the same distributional shape) within biological groups or conditions, but allowing that they may differ between groups. We illustrate the advantages of our method on several high-throughput datasets with global differences in distributions corresponding to different biological conditions. We also perform a Monte Carlo simulation study to illustrate the bias-variance tradeoff and root mean squared error of qsmooth compared to other global normalization methods. A software implementation is available from https://github.com/stephaniehicks/qsmooth.
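
    For orientation, the sketch below implements ordinary quantile normalization, the baseline that qsmooth generalizes; it is not the qsmooth algorithm itself, and the simulated data are illustrative.

```python
# Sketch: ordinary quantile normalization (the baseline that qsmooth relaxes).
# Each sample's sorted values are replaced by the row-wise mean of the sorted
# data, forcing identical distributions across samples.
import numpy as np

rng = np.random.default_rng(7)
data = rng.lognormal(mean=2.0, sigma=1.0, size=(1000, 4))   # features x samples

order = np.argsort(data, axis=0)                 # sort order within each sample
ranks = np.argsort(order, axis=0)                # rank of each value in its sample
reference = np.sort(data, axis=0).mean(axis=1)   # mean quantile across samples

normalized = reference[ranks]                    # map every value to its quantile mean
print(normalized.mean(axis=0))                   # column summaries are now identical
```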

  5. Statistical analysis of multivariate atmospheric variables. [cloud cover

    NASA Technical Reports Server (NTRS)

    Tubbs, J. D.

    1979-01-01

    Topics covered include: (1) estimation in discrete multivariate distributions; (2) a procedure to predict cloud cover frequencies in the bivariate case; (3) a program to compute conditional bivariate normal parameters; (4) the transformation of nonnormal multivariate to near-normal; (5) test of fit for the extreme value distribution based upon the generalized minimum chi-square; (6) test of fit for continuous distributions based upon the generalized minimum chi-square; (7) effect of correlated observations on confidence sets based upon chi-square statistics; and (8) generation of random variates from specified distributions.

  6. Analysis of vector wind change with respect to time for Cape Kennedy, Florida: Wind aloft profile change vs. time, phase 1

    NASA Technical Reports Server (NTRS)

    Adelfang, S. I.

    1977-01-01

    Wind vector change with respect to time at Cape Kennedy, Florida, is examined according to the theory of multivariate normality. The joint distribution of the four variables represented by the components of the wind vector at an initial time and after a specified elapsed time is hypothesized to be quadravariate normal; the fourteen statistics of this distribution, calculated from fifteen years of twice daily Rawinsonde data, are presented by monthly reference periods for each month from 0 to 27 km. The hypotheses that the wind component change with respect to time is univariate normal, that the joint distribution of wind component changes is bivariate normal, and that the modulus of vector wind change is Rayleigh distributed have been tested by comparison with observed distributions. Statistics of the conditional bivariate normal distributions of vector wind at a future time given the vector wind at an initial time are derived. Wind changes over time periods from one to five hours, calculated from Jimsphere data, are presented.

  7. Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures?

    PubMed

    Snell, Kym Ie; Ensor, Joie; Debray, Thomas Pa; Moons, Karel Gm; Riley, Richard D

    2017-01-01

    If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution in the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and therefore we recommend these scales to be used for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.
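
    A minimal sketch of the recommended transformation step: meta-analyse the C-statistic on the logit scale and back-transform the pooled value. The per-study values are assumptions, and a simple mean stands in for a proper random-effects pooling.

```python
# Sketch: logit-scale pooling of C-statistics before back-transformation.
import numpy as np

c_stats = np.array([0.72, 0.78, 0.69, 0.81, 0.75])   # per-study C-statistics (assumed)

logit_c = np.log(c_stats / (1 - c_stats))            # to the logit scale
pooled_logit = logit_c.mean()                        # placeholder for a random-effects pool
pooled_c = 1 / (1 + np.exp(-pooled_logit))           # back-transform to the C scale
print(f"pooled C-statistic (pooled on the logit scale) = {pooled_c:.3f}")
```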

  8. Statistical characteristics of the spatial distribution of territorial contamination by radionuclides from the Chernobyl accident

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arutyunyan, R.V.; Bol`shov, L.A.; Vasil`ev, S.K.

    1994-06-01

    The objective of this study was to clarify a number of issues related to the spatial distribution of contaminants from the Chernobyl accident. The effects of local statistics were addressed by collecting and analyzing (for Cesium 137) soil samples from a number of regions, and it was found that sample activity differed by a factor of 3-5. The effect of local non-uniformity was estimated by modeling the distribution of the average activity of a set of five samples for each of the regions, with the spread in the activities for a ±2 range being equal to 25%. The statistical characteristics of the distribution of contamination were then analyzed and found to follow a log-normal distribution with the standard deviation being a function of test area. All data for the Bryanskaya Oblast area were analyzed statistically and were adequately described by a log-normal function.

  9. Modeling Error Distributions of Growth Curve Models through Bayesian Methods

    ERIC Educational Resources Information Center

    Zhang, Zhiyong

    2016-01-01

    Growth curve models are widely used in social and behavioral sciences. However, typical growth curve models often assume that the errors are normally distributed although non-normal data may be even more common than normal data. In order to avoid possible statistical inference problems in blindly assuming normality, a general Bayesian framework is…

  10. Computation of distribution of minimum resolution for log-normal distribution of chromatographic peak heights.

    PubMed

    Davis, Joe M

    2011-10-28

    General equations are derived for the distribution of minimum resolution between two chromatographic peaks, when peak heights in a multi-component chromatogram follow a continuous statistical distribution. The derivation draws on published theory by relating the area under the distribution of minimum resolution to the area under the distribution of the ratio of peak heights, which in turn is derived from the peak-height distribution. Two procedures are proposed for the equations' numerical solution. The procedures are applied to the log-normal distribution, which recently was reported to describe the distribution of component concentrations in three complex natural mixtures. For published statistical parameters of these mixtures, the distribution of minimum resolution is similar to that for the commonly assumed exponential distribution of peak heights used in statistical-overlap theory. However, these two distributions of minimum resolution can differ markedly, depending on the scale parameter of the log-normal distribution. Theory for the computation of the distribution of minimum resolution is extended to other cases of interest. With the log-normal distribution of peak heights as an example, the distribution of minimum resolution is computed when small peaks are lost due to noise or detection limits, and when the height of at least one peak is less than an upper limit. The distribution of minimum resolution shifts slightly to lower resolution values in the first case and to markedly larger resolution values in the second one. The theory and numerical procedure are confirmed by Monte Carlo simulation. Copyright © 2011 Elsevier B.V. All rights reserved.

  11. Comparing Simulated and Theoretical Sampling Distributions of the U3 Person-Fit Statistic.

    ERIC Educational Resources Information Center

    Emons, Wilco H. M.; Meijer, Rob R.; Sijtsma, Klaas

    2002-01-01

    Studied whether the theoretical sampling distribution of the U3 person-fit statistic is in agreement with the simulated sampling distribution under different item response theory models and varying item and test characteristics. Simulation results suggest that the use of standard normal deviates for the standardized version of the U3 statistic may…

  12. Is Middle-Upper Arm Circumference "normally" distributed? Secondary data analysis of 852 nutrition surveys.

    PubMed

    Frison, Severine; Checchi, Francesco; Kerac, Marko; Nicholas, Jennifer

    2016-01-01

    Wasting is a major public health issue throughout the developing world. Out of the 6.9 million estimated deaths among children under five annually, over 800,000 deaths (11.6 %) are attributed to wasting. Wasting is quantified as low Weight-For-Height (WFH) and/or low Mid-Upper Arm Circumference (MUAC) (since 2005). Many statistical procedures are based on the assumption that the data used are normally distributed. Analyses have been conducted on the distribution of WFH but there are no equivalent studies on the distribution of MUAC. This secondary data analysis assesses the normality of the MUAC distributions of 852 nutrition cross-sectional survey datasets of children from 6 to 59 months old and examines different approaches to normalise "non-normal" distributions. The distribution of MUAC showed no departure from a normal distribution in 319 (37.7 %) distributions using the Shapiro-Wilk test. Out of the 533 surveys showing departure from a normal distribution, 183 (34.3 %) were skewed (D'Agostino test) and 196 (36.8 %) had a kurtosis different to the one observed in the normal distribution (Anscombe-Glynn test). Testing for normality can be sensitive to data quality, design effect and sample size. Out of the 533 surveys showing departure from a normal distribution, 294 (55.2 %) showed high digit preference, 164 (30.8 %) had a large design effect, and 204 (38.3 %) a large sample size. Spline and LOESS smoothing techniques were explored and both techniques work well. After Spline smoothing, 56.7 % of the MUAC distributions showing departure from normality were "normalised" and 59.7 % after LOESS. Box-Cox power transformation had similar results on distributions showing departure from normality with 57 % of distributions approximating "normal" after transformation. Applying Box-Cox transformation after Spline or Loess smoothing techniques increased that proportion to 82.4 and 82.7 % respectively. This suggests that statistical approaches relying on the normal distribution assumption can be successfully applied to MUAC. In light of this promising finding, further research is ongoing to evaluate the performance of a normal distribution based approach to estimating the prevalence of wasting using MUAC.
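
    A short sketch of the normalisation steps examined above, testing a skewed sample for normality and re-testing after a Box-Cox power transformation; the simulated MUAC-like values and their parameters are assumptions.

```python
# Sketch: normality test, Box-Cox transformation, and re-test on skewed data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
# Simulated right-skewed, positive "MUAC-like" values (illustrative only).
muac = rng.gamma(shape=8.0, scale=16.0, size=600)

print("before: Shapiro-Wilk p =", stats.shapiro(muac).pvalue)
transformed, lam = stats.boxcox(muac)          # estimate lambda and transform
print(f"Box-Cox lambda = {lam:.2f}")
print("after:  Shapiro-Wilk p =", stats.shapiro(transformed).pvalue)
```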

  13. DISFIT: A PROGRAM FOR FITTING DISTRIBUTIONS IN DATA

    EPA Science Inventory

    Although distribution fitting methods abound in the statistical literature, very few of these methods are found in the major statistical packages. In particular, SPSS (1975), BMD-P (1981) and SAS (1979) only give some overall tests for normality. There are a few specialized distribut…

  14. Statistical analysis of content of Cs-137 in soils in Bansko-Razlog region

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kobilarov, R. G., E-mail: rkobi@tu-sofia.bg

    Statistical analysis of the data set consisting of the activity concentrations of 137Cs in soils in the Bansko–Razlog region is carried out in order to establish the dependence of the deposition and the migration of 137Cs on the soil type. The descriptive statistics and the test of normality show that the data set does not have a normal distribution. A positively skewed distribution and possible outlying values of the activity of 137Cs in soils were observed. After reduction of the effects of outliers, the data set is divided into two parts, depending on the soil type. Tests of normality of the two new data sets show that they have a normal distribution. An ordinary kriging technique is used to characterize the spatial distribution of the activity of 137Cs over an area covering 40 km² (the whole Razlog valley). The result (a map of the spatial distribution of the activity concentration of 137Cs) can be used as a reference point for future studies on the assessment of radiological risk to the population and the erosion of soils in the study area.

  15. Noncentral Chi-Square versus Normal Distributions in Describing the Likelihood Ratio Statistic: The Univariate Case and Its Multivariate Implication

    ERIC Educational Resources Information Center

    Yuan, Ke-Hai

    2008-01-01

    In the literature of mean and covariance structure analysis, noncentral chi-square distribution is commonly used to describe the behavior of the likelihood ratio (LR) statistic under alternative hypothesis. Due to the inaccessibility of the rather technical literature for the distribution of the LR statistic, it is widely believed that the…

  16. Lognormal Distribution of Cellular Uptake of Radioactivity: Statistical Analysis of α-Particle Track Autoradiography

    PubMed Central

    Neti, Prasad V.S.V.; Howell, Roger W.

    2010-01-01

    Recently, the distribution of radioactivity among a population of cells labeled with 210Po was shown to be well described by a log-normal (LN) distribution function (J Nucl Med. 2006;47:1049–1058) with the aid of autoradiography. To ascertain the influence of Poisson statistics on the interpretation of the autoradiographic data, the present work reports on a detailed statistical analysis of these earlier data. Methods: The measured distributions of α-particle tracks per cell were subjected to statistical tests with Poisson, LN, and Poisson-lognormal (P-LN) models. Results: The LN distribution function best describes the distribution of radioactivity among cell populations exposed to 0.52 and 3.8 kBq/mL of 210Po-citrate. When cells were exposed to 67 kBq/mL, the P-LN distribution function gave a better fit; however, the underlying activity distribution remained log-normal. Conclusion: The present analysis generally provides further support for the use of LN distributions to describe the cellular uptake of radioactivity. Care should be exercised when analyzing autoradiographic data on activity distributions to ensure that Poisson processes do not distort the underlying LN distribution. PMID:18483086

  17. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.

    PubMed

    Lin, Johnny; Bentler, Peter M

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.

  18. A closer look at the effect of preliminary goodness-of-fit testing for normality for the one-sample t-test.

    PubMed

    Rochon, Justine; Kieser, Meinhard

    2011-11-01

    Student's one-sample t-test is a commonly used method when inference about the population mean is made. As advocated in textbooks and articles, the assumption of normality is often checked by a preliminary goodness-of-fit (GOF) test. In a paper recently published by Schucany and Ng it was shown that, for the uniform distribution, screening of samples by a pretest for normality leads to a more conservative conditional Type I error rate than application of the one-sample t-test without preliminary GOF test. In contrast, for the exponential distribution, the conditional level is even more elevated than the Type I error rate of the t-test without pretest. We examine the reasons behind these characteristics. In a simulation study, samples drawn from the exponential, lognormal, uniform, Student's t-distribution with 2 degrees of freedom (t(2) ) and the standard normal distribution that had passed normality screening, as well as the ingredients of the test statistics calculated from these samples, are investigated. For non-normal distributions, we found that preliminary testing for normality may change the distribution of means and standard deviations of the selected samples as well as the correlation between them (if the underlying distribution is non-symmetric), thus leading to altered distributions of the resulting test statistics. It is shown that for skewed distributions the excess in Type I error rate may be even more pronounced when testing one-sided hypotheses. ©2010 The British Psychological Society.
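
    The two-stage procedure studied above can be reproduced with a small Monte Carlo sketch: screen exponential samples with a Shapiro-Wilk pretest, run the one-sample t-test only on samples that pass, and record the conditional Type I error rate. The sample size, simulation count, and significance levels are assumed.

```python
# Monte Carlo sketch: conditional Type I error of the t-test after a normality pretest.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n, n_sims, alpha = 20, 10_000, 0.05
true_mean = 1.0                                   # null hypothesis set at the true mean

rejections, passed = 0, 0
for _ in range(n_sims):
    sample = rng.exponential(scale=true_mean, size=n)
    if stats.shapiro(sample).pvalue > alpha:      # pretest: sample looks normal enough
        passed += 1
        if stats.ttest_1samp(sample, popmean=true_mean).pvalue < alpha:
            rejections += 1

print(f"samples passing the pretest: {passed / n_sims:.2%}")
print(f"conditional Type I error rate: {rejections / max(passed, 1):.3f}")
```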

  19. Gradually truncated log-normal in USA publicly traded firm size distribution

    NASA Astrophysics Data System (ADS)

    Gupta, Hari M.; Campanha, José R.; de Aguiar, Daniela R.; Queiroz, Gabriel A.; Raheja, Charu G.

    2007-03-01

    We study the statistical distribution of firm size for USA and Brazilian publicly traded firms through the Zipf plot technique. Sale size is used to measure firm size. The Brazilian firm size distribution is given by a log-normal distribution without any adjustable parameter. However, we also need to consider different parameters of log-normal distribution for the largest firms in the distribution, which are mostly foreign firms. The log-normal distribution has to be gradually truncated after a certain critical value for USA firms. Therefore, the original hypothesis of proportional effect proposed by Gibrat is valid with some modification for very large firms. We also consider the possible mechanisms behind this distribution.

  20. The stochastic distribution of available coefficient of friction for human locomotion of five different floor surfaces.

    PubMed

    Chang, Wen-Ruey; Matz, Simon; Chang, Chien-Chi

    2014-05-01

    The maximum coefficient of friction that can be supported at the shoe and floor interface without a slip is usually called the available coefficient of friction (ACOF) for human locomotion. The probability of a slip could be estimated using a statistical model by comparing the ACOF with the required coefficient of friction (RCOF), assuming that both coefficients have stochastic distributions. An investigation of the stochastic distributions of the ACOF of five different floor surfaces under dry, water and glycerol conditions is presented in this paper. One hundred friction measurements were performed on each floor surface under each surface condition. The Kolmogorov-Smirnov goodness-of-fit test was used to determine if the distribution of the ACOF was a good fit with the normal, log-normal and Weibull distributions. The results indicated that the ACOF distributions had a slightly better match with the normal and log-normal distributions than with the Weibull in only three out of 15 cases with a statistical significance. The results are far more complex than what had heretofore been published and different scenarios could emerge. Since the ACOF is compared with the RCOF for the estimate of slip probability, the distribution of the ACOF in seven cases could be considered a constant for this purpose when the ACOF is much lower or higher than the RCOF. A few cases could be represented by a normal distribution for practical reasons based on their skewness and kurtosis values without a statistical significance. No representation could be found in three cases out of 15. Copyright © 2013 Elsevier Ltd and The Ergonomics Society. All rights reserved.

  1. A nonparametric spatial scan statistic for continuous data.

    PubMed

    Jung, Inkyung; Cho, Ho Jin

    2015-10-20

    Spatial scan statistics are widely used for spatial cluster detection, and several parametric models exist. For continuous data, a normal-based scan statistic can be used. However, the performance of the model has not been fully evaluated for non-normal data. We propose a nonparametric spatial scan statistic based on the Wilcoxon rank-sum test statistic and compared the performance of the method with parametric models via a simulation study under various scenarios. The nonparametric method outperforms the normal-based scan statistic in terms of power and accuracy in almost all cases under consideration in the simulation study. The proposed nonparametric spatial scan statistic is therefore an excellent alternative to the normal model for continuous data and is especially useful for data following skewed or heavy-tailed distributions.

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kelekis, Alexios, E-mail: akelekis@med.uoa.gr; Filippiadis, Dimitrios K., E-mail: dfilippiadis@yahoo.gr; Vergadis, Chrysovalantis, E-mail: valvergadis@yahoo.gr

    Purpose: Through a prospective comparison of patients with vertebral fractures and a normal population, we illustrate the effect of percutaneous vertebroplasty (PV) on the projection of load distribution changes. Methods: The vertebroplasty group (36 symptomatic patients with osteoporotic vertebral fractures) was evaluated on an electronic baropodometer registering the projection of weight-bearing areas on the feet. Load distribution between the right and left foot (including the rear and front of the same foot) during standing and walking was recorded and compared before (group V1) and the day after (group V2) PV. The control group (30 healthy asymptomatic volunteers with no surgery record) was evaluated on the same baropodometer. Results: The mean load distribution difference between the rear and front of the same foot was 9.45 ± 6.79 % (54.72–45.28 %) upon standing and 14.76 ± 7.09 % (57.38–42.62 %) upon walking in the control group. The respective load distribution values before PV were 16.52 ± 11.23 and 30.91 ± 19.26 % and after PV were 10.08 ± 6.26 and 14.25 ± 7.68 % upon standing and walking, respectively. The mean load distribution variation between the two feet was 6.36 and 14.6 % before and 4.62 and 10.4 % after PV upon standing and walking, respectively. The comparison of load distribution variation (group V1 vs. V2, group V1 vs. control group) is statistically significant. The comparison of load distribution variation (group V2 vs. control group) is not statistically significant. The comparison of load distribution variation between the two feet is statistically significant during walking but not during standing. Conclusions: There is a statistically significant difference when comparing load distribution variation prior to vertebroplasty with that of the normal population. After vertebroplasty, this difference normalizes in a statistically significant way. PV is also effective in improving equilibrium and load distribution.

  3. Computer program determines exact two-sided tolerance limits for normal distributions

    NASA Technical Reports Server (NTRS)

    Friedman, H. A.; Webb, S. R.

    1968-01-01

    Computer program determines by numerical integration the exact statistical two-sided tolerance limits, when the proportion between the limits is at least a specified number. The program is limited to situations in which the underlying probability distribution for the population sampled is the normal distribution with unknown mean and variance.
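
    The program described above computes exact limits by numerical integration; as a rough stand-in, the sketch below uses Howe's closed-form approximation for a normal two-sided tolerance interval. The coverage proportion, confidence level, and data are assumptions.

```python
# Sketch: two-sided normal tolerance interval via Howe's approximation
# (an approximation, not the exact numerical integration of the program above).
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
sample = rng.normal(loc=10.0, scale=2.0, size=30)
p, conf = 0.90, 0.95                    # proportion to be covered, confidence level

n = sample.size
z = stats.norm.ppf((1 + p) / 2)
chi2 = stats.chi2.ppf(1 - conf, df=n - 1)          # lower chi-square quantile
k = z * np.sqrt((n - 1) * (1 + 1 / n) / chi2)      # Howe's tolerance factor

mean, sd = sample.mean(), sample.std(ddof=1)
print(f"tolerance limits: ({mean - k * sd:.2f}, {mean + k * sd:.2f})")
```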

  4. Empirical analysis on the runners' velocity distribution in city marathons

    NASA Astrophysics Data System (ADS)

    Lin, Zhenquan; Meng, Fan

    2018-01-01

    In recent decades, much research has been performed on human temporal activity and mobility patterns, while few investigations have been made to examine the features of the velocity distributions of human mobility patterns. In this paper, we investigated empirically the velocity distributions of finishers in the New York City marathon, the Chicago marathon, the Berlin marathon and the London marathon. Through statistical analyses of the finish-time records, we captured some statistical features of human behaviors in marathons: (1) The velocity distributions of all finishers and of partial finishers in the fastest age group both follow a log-normal distribution; (2) In the New York City marathon, the velocity distribution of all male runners in eight 5-kilometer internal timing courses undergoes two transitions: from a log-normal distribution at the initial stage (several initial courses) to the Gaussian distribution at the middle stage (several middle courses), and back to a log-normal distribution at the last stage (several last courses); (3) The intensity of the competition, which is described by the root-mean-square value of the rank changes of all runners, weakens from the initial stage to the middle stage, corresponding to the transition of the velocity distribution from log-normal to Gaussian, and when the competition grows stronger in the last course of the middle stage, a transition from the Gaussian distribution back to a log-normal one occurs at the last stage. This study may enrich the researches on human mobility patterns and attract attention to the velocity features of human mobility.

  5. Tips and Tricks for Successful Application of Statistical Methods to Biological Data.

    PubMed

    Schlenker, Evelyn

    2016-01-01

    This chapter discusses experimental design and use of statistics to describe characteristics of data (descriptive statistics) and inferential statistics that test the hypothesis posed by the investigator. Inferential statistics, based on probability distributions, depend upon the type and distribution of the data. For data that are continuous, randomly and independently selected, as well as normally distributed, more powerful parametric tests such as Student's t test and analysis of variance (ANOVA) can be used. For non-normally distributed or skewed data, transformation of the data (using logarithms) may normalize the data, allowing use of parametric tests. Alternatively, with skewed data nonparametric tests can be utilized, some of which rely on data that are ranked prior to statistical analysis. Experimental designs and analyses need to balance the risks of committing type 1 errors (false positives) against type 2 errors (false negatives). For a variety of clinical studies that determine risk or benefit, relative risk ratios (randomized clinical trials and cohort studies) or odds ratios (case-control studies) are utilized. Although both use 2 × 2 tables, their premise and calculations differ. Finally, special statistical methods are applied to microarray and proteomics data, since the large number of genes or proteins evaluated increases the likelihood of false discoveries. Additional studies in separate samples are used to verify microarray and proteomic data. Examples in this chapter and references are available to help continued investigation of experimental designs and appropriate data analysis.

  6. Detection of Person Misfit in Computerized Adaptive Tests with Polytomous Items.

    ERIC Educational Resources Information Center

    van Krimpen-Stoop, Edith M. L. A.; Meijer, Rob R.

    2002-01-01

    Compared the nominal and empirical null distributions of the standardized log-likelihood statistic for polytomous items for paper-and-pencil (P&P) and computerized adaptive tests (CATs). Results show that the empirical distribution of the statistic differed from the assumed standard normal distribution for both P&P tests and CATs. Also…

  7. Distribution of water quality parameters in Dhemaji district, Assam (India).

    PubMed

    Buragohain, Mridul; Bhuyan, Bhabajit; Sarma, H P

    2010-07-01

    The primary objective of this study is to present a statistically significant water quality database of Dhemaji district, Assam (India) with special reference to pH, fluoride, nitrate, arsenic, iron, sodium and potassium. Twenty-five water samples collected from different locations of five development blocks in Dhemaji district have been studied separately. The implications presented are based on statistical analyses of the raw data. Normal distribution statistics and reliability analysis (correlation and covariance matrix) have been employed to find out the distribution pattern, localisation of data, and other related information. Statistical observations show that all the parameters under investigation exhibit a non-uniform distribution with a long asymmetric tail on either the right or left side of the median. The width of the third quartile was consistently found to be greater than that of the second quartile for each parameter. Differences among the mean, mode and median, together with significant skewness and kurtosis values, indicate that the distribution of various water quality parameters in the study area is widely off normal. Thus, the intrinsic water quality is not encouraging due to the unsymmetrical distribution of various water quality parameters in the study area.

  8. Robustness of location estimators under t-distributions: a literature review

    NASA Astrophysics Data System (ADS)

    Sumarni, C.; Sadik, K.; Notodiputro, K. A.; Sartono, B.

    2017-03-01

    The assumption of normality is commonly used in the estimation of parameters in statistical modelling, but this assumption is very sensitive to outliers. The t-distribution is more robust than the normal distribution since t-distributions have heavier tails. The robustness measures of location estimators under t-distributions are reviewed and discussed in this paper. For the purpose of illustration we use onion yield data, which include outliers, as a case study and show that the t model produces a better fit than the normal model.
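    The robustness argument can be illustrated with a small comparison of maximum likelihood fits; the sketch below uses simulated outlier-contaminated data (not the onion yield data) and standard SciPy fitting routines.

```python
# Minimal sketch: compare normal vs. Student-t fits on data with outliers.
# Simulated data, not the onion yield data discussed in the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(10.0, 1.0, 95), [25.0, 28.0, 30.0, -5.0, 40.0]])

mu_n, sd_n = stats.norm.fit(x)              # ML fit of a normal distribution
df_t, mu_t, sc_t = stats.t.fit(x)           # ML fit of a Student t distribution
ll_norm = stats.norm.logpdf(x, mu_n, sd_n).sum()
ll_t = stats.t.logpdf(x, df_t, mu_t, sc_t).sum()

print(f"normal location {mu_n:.2f} (logL {ll_norm:.1f})")
print(f"t location      {mu_t:.2f} (logL {ll_t:.1f}, df {df_t:.1f})")
# The t location is pulled far less by the outliers (more robust).
```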

  9. An Evaluation of Normal versus Lognormal Distribution in Data Description and Empirical Analysis

    ERIC Educational Resources Information Center

    Diwakar, Rekha

    2017-01-01

    Many existing methods of statistical inference and analysis rely heavily on the assumption that the data are normally distributed. However, the normality assumption is not fulfilled when dealing with data that do not contain negative values or are otherwise skewed--a common occurrence in diverse disciplines such as finance, economics, political…

  10. An Empirical Bayes Approach to Mantel-Haenszel DIF Analysis.

    ERIC Educational Resources Information Center

    Zwick, Rebecca; Thayer, Dorothy T.; Lewis, Charles

    1999-01-01

    Developed an empirical Bayes enhancement to Mantel-Haenszel (MH) analysis of differential item functioning (DIF) in which it is assumed that the MH statistics are normally distributed and that the prior distribution of underlying DIF parameters is also normal. (Author/SLD)

  11. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis

    PubMed Central

    Lin, Johnny; Bentler, Peter M.

    2012-01-01

    Goodness-of-fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square, but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and the Satorra-Bentler mean-scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds a new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of the Satorra-Bentler statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness in small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic. PMID:23144511

  12. Statistical distributions of ultra-low dose CT sinograms and their fundamental limits

    NASA Astrophysics Data System (ADS)

    Lee, Tzu-Cheng; Zhang, Ruoqiao; Alessio, Adam M.; Fu, Lin; De Man, Bruno; Kinahan, Paul E.

    2017-03-01

    Low dose CT imaging is typically constrained to be diagnostic. However, there are applications for even lower-dose CT imaging, including image registration across multi-frame CT images and attenuation correction for PET/CT imaging. We define this as the ultra-low-dose (ULD) CT regime, where the exposure level is a factor of 10 lower than current low-dose CT technique levels. In the ULD regime it is possible to use statistically principled image reconstruction methods that make full use of the raw data information. Since most statistically based iterative reconstruction methods assume that the post-log noise distribution is close to Poisson or Gaussian, our goal is to understand the statistical distribution of ULD CT data with different non-positivity correction methods, and to understand when iterative reconstruction methods may be effective in producing images that are useful for image registration or attenuation correction in PET/CT imaging. We first used phantom measurements and calibrated simulation to reveal how the noise distribution deviates from the normal assumption in the ULD CT flux environment. In summary, our results indicate that there are three general regimes: (1) diagnostic CT, where post-log data are well modeled by a normal distribution; (2) low-dose CT, where the normal distribution remains a reasonable approximation and statistically principled (post-log) methods that assume a normal distribution have an advantage; and (3) a ULD regime that is photon-starved, where the quadratic approximation is no longer effective. For instance, a total integral density of 4.8 (ideal pi for 24 cm of water) for a 120 kVp, 0.5 mAs radiation source is the maximum pi value at which a definitive maximum likelihood value could be found. This leads to fundamental limits in the estimation of ULD CT data when using a standard data processing stream.

  13. A novel generalized normal distribution for human longevity and other negatively skewed data.

    PubMed

    Robertson, Henry T; Allison, David B

    2012-01-01

    Negatively skewed data arise occasionally in statistical practice; perhaps the most familiar example is the distribution of human longevity. Although other generalizations of the normal distribution exist, we demonstrate a new alternative that apparently fits human longevity data better. We propose an alternative approach of a normal distribution whose scale parameter is conditioned on attained age. This approach is consistent with previous findings that longevity conditioned on survival to the modal age behaves like a normal distribution. We derive such a distribution and demonstrate its accuracy in modeling human longevity data from life tables. The new distribution is characterized by 1. An intuitively straightforward genesis; 2. Closed forms for the pdf, cdf, mode, quantile, and hazard functions; and 3. Accessibility to non-statisticians, based on its close relationship to the normal distribution.

  14. A Novel Generalized Normal Distribution for Human Longevity and other Negatively Skewed Data

    PubMed Central

    Robertson, Henry T.; Allison, David B.

    2012-01-01

    Negatively skewed data arise occasionally in statistical practice; perhaps the most familiar example is the distribution of human longevity. Although other generalizations of the normal distribution exist, we demonstrate a new alternative that apparently fits human longevity data better. We propose an alternative approach of a normal distribution whose scale parameter is conditioned on attained age. This approach is consistent with previous findings that longevity conditioned on survival to the modal age behaves like a normal distribution. We derive such a distribution and demonstrate its accuracy in modeling human longevity data from life tables. The new distribution is characterized by 1. An intuitively straightforward genesis; 2. Closed forms for the pdf, cdf, mode, quantile, and hazard functions; and 3. Accessibility to non-statisticians, based on its close relationship to the normal distribution. PMID:22623974

  15. Apparent Transition in the Human Height Distribution Caused by Age-Dependent Variation during Puberty Period

    NASA Astrophysics Data System (ADS)

    Iwata, Takaki; Yamazaki, Yoshihiro; Kuninaka, Hiroto

    2013-08-01

    In this study, we examine the validity of the transition of the human height distribution from the log-normal distribution to the normal distribution during puberty, as suggested in an earlier study [Kuninaka et al.: J. Phys. Soc. Jpn. 78 (2009) 125001]. Our data analysis reveals that, in late puberty, the variation in height decreases as children grow. Thus, the classification of a height dataset by age at this stage leads us to analyze a mixture of distributions with larger means and smaller variations. This mixture distribution has a negative skewness and is consequently closer to the normal distribution than to the log-normal distribution. The opposite case occurs in early puberty and the mixture distribution is positively skewed, which resembles the log-normal distribution rather than the normal distribution. Thus, this scenario mimics the transition during puberty. Additionally, our scenario is realized through a numerical simulation based on a statistical model. The present study does not support the transition suggested by the earlier study.

  16. Non-normal Distributions Commonly Used in Health, Education, and Social Sciences: A Systematic Review

    PubMed Central

    Bono, Roser; Blanca, María J.; Arnau, Jaume; Gómez-Benito, Juana

    2017-01-01

    Statistical analysis is crucial for research and the choice of analytical technique should take into account the specific distribution of data. Although the data obtained from health, educational, and social sciences research are often not normally distributed, there are very few studies detailing which distributions are most likely to represent data in these disciplines. The aim of this systematic review was to determine the frequency of appearance of the most common non-normal distributions in the health, educational, and social sciences. The search was carried out in the Web of Science database, from which we retrieved the abstracts of papers published between 2010 and 2015. The selection was made on the basis of the title and the abstract, and was performed independently by two reviewers. The inter-rater reliability for article selection was high (Cohen’s kappa = 0.84), and agreement regarding the type of distribution reached 96.5%. A total of 262 abstracts were included in the final review. The distribution of the response variable was reported in 231 of these abstracts, while in the remaining 31 it was merely stated that the distribution was non-normal. In terms of their frequency of appearance, the most-common non-normal distributions can be ranked in descending order as follows: gamma, negative binomial, multinomial, binomial, lognormal, and exponential. In addition to identifying the distributions most commonly used in empirical studies these results will help researchers to decide which distributions should be included in simulation studies examining statistical procedures. PMID:28959227

  17. Sketching Curves for Normal Distributions--Geometric Connections

    ERIC Educational Resources Information Center

    Bosse, Michael J.

    2006-01-01

    Within statistics instruction, students are often requested to sketch the curve representing a normal distribution with a given mean and standard deviation. Unfortunately, these sketches are often notoriously imprecise. Poor sketches are usually the result of missing mathematical knowledge. This paper considers relationships which exist among…

  18. How log-normal is your country? An analysis of the statistical distribution of the exported volumes of products

    NASA Astrophysics Data System (ADS)

    Annunziata, Mario Alberto; Petri, Alberto; Pontuale, Giorgio; Zaccaria, Andrea

    2016-10-01

    We have considered the statistical distributions of the volumes of 1131 products exported by 148 countries. We have found that the form of these distributions is not unique but heavily depends on the level of development of the nation, as expressed by macroeconomic indicators like GDP, GDP per capita, total export and a recently introduced measure for countries' economic complexity called fitness. We have identified three major classes: a) an incomplete log-normal shape, truncated on the left side, for the less developed countries, b) a complete log-normal, with a wider range of volumes, for nations characterized by intermediate economy, and c) a strongly asymmetric shape for countries with a high degree of development. Finally, the log-normality hypothesis has been checked for the distributions of all the 148 countries through different tests, Kolmogorov-Smirnov and Cramér-Von Mises, confirming that it cannot be rejected only for the countries of intermediate economy.

  19. Multiplicative Modeling of Children's Growth and Its Statistical Properties

    NASA Astrophysics Data System (ADS)

    Kuninaka, Hiroto; Matsushita, Mitsugu

    2014-03-01

    We develop a numerical growth model that can predict the statistical properties of the height distribution of Japanese children. Our previous studies have clarified that the height distribution of schoolchildren shows a transition from the lognormal distribution to the normal distribution during puberty. In this study, we demonstrate by simulation that the transition occurs owing to the variability of the onset of puberty.

  20. Modeling error distributions of growth curve models through Bayesian methods.

    PubMed

    Zhang, Zhiyong

    2016-06-01

    Growth curve models are widely used in the social and behavioral sciences. However, typical growth curve models often assume that the errors are normally distributed, although non-normal data may be even more common than normal data. In order to avoid possible statistical inference problems from blindly assuming normality, a general Bayesian framework is proposed to flexibly model normal and non-normal data through explicit specification of the error distributions. A simulation study shows that when the distribution of the error is correctly specified, one can avoid the loss in the efficiency of standard error estimates. A real example on the analysis of mathematical ability growth data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99, is used to show the application of the proposed methods. Instructions and code on how to conduct growth curve analysis with both normal and non-normal error distributions using the MCMC procedure of SAS are provided.

  1. Optimal transformations leading to normal distributions of positron emission tomography standardized uptake values.

    PubMed

    Scarpelli, Matthew; Eickhoff, Jens; Cuna, Enrique; Perlman, Scott; Jeraj, Robert

    2018-01-30

    The statistical analysis of positron emission tomography (PET) standardized uptake value (SUV) measurements is challenging due to the skewed nature of SUV distributions. This limits utilization of powerful parametric statistical models for analyzing SUV measurements. An ad hoc approach, which is frequently used in practice, is to blindly use a log transformation, which may or may not result in normal SUV distributions. This study sought to identify optimal transformations leading to normally distributed PET SUVs extracted from tumors and to assess the effects of therapy on the optimal transformations. The optimal transformation for producing normal distributions of tumor SUVs was identified by iterating the Box-Cox transformation parameter (λ) and selecting the parameter that maximized the Shapiro-Wilk P-value. Optimal transformations were identified for tumor SUVmax distributions at both pre and post treatment. This study included 57 patients who underwent 18F-fluorodeoxyglucose (18F-FDG) PET scans (publicly available dataset). In addition, to test the generality of our transformation methodology, we included analysis of 27 patients who underwent 18F-fluorothymidine (18F-FLT) PET scans at our institution. After applying the optimal Box-Cox transformations, neither the pre nor the post treatment 18F-FDG SUV distributions deviated significantly from normality (P > 0.10). Similar results were found for 18F-FLT PET SUV distributions (P > 0.10). For both 18F-FDG and 18F-FLT SUV distributions, the skewness and kurtosis increased from pre to post treatment, leading to a decrease in the optimal Box-Cox transformation parameter from pre to post treatment. There were types of distributions encountered for both 18F-FDG and 18F-FLT for which a log transformation was not optimal for producing normal SUV distributions. Optimization of the Box-Cox transformation offers a solution for identifying normalizing SUV transformations when the log transformation is insufficient. The log transformation is not always the appropriate transformation for producing normally distributed PET SUVs.
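    The transformation search described above, sweeping the Box-Cox parameter λ and keeping the value that maximizes the Shapiro-Wilk P-value, can be sketched in a few lines; the SUV values below are simulated stand-ins, not the study data.

```python
# Minimal sketch: pick the Box-Cox lambda that maximizes the Shapiro-Wilk p-value.
# Simulated skewed "SUV" values; not the FDG/FLT datasets analyzed in the paper.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
suv = rng.lognormal(mean=1.2, sigma=0.6, size=57)   # positive, right-skewed

lambdas = np.linspace(-2.0, 2.0, 81)
pvals = [stats.shapiro(stats.boxcox(suv, lmbda=lam)).pvalue for lam in lambdas]
best = lambdas[int(np.argmax(pvals))]

print(f"optimal lambda ~ {best:.2f}, Shapiro-Wilk p = {max(pvals):.3f}")
# lambda near 0 corresponds to a log transformation; values far from 0 indicate
# that a simple log transform would not be optimal for these data.
```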

  2. Optimal transformations leading to normal distributions of positron emission tomography standardized uptake values

    NASA Astrophysics Data System (ADS)

    Scarpelli, Matthew; Eickhoff, Jens; Cuna, Enrique; Perlman, Scott; Jeraj, Robert

    2018-02-01

    The statistical analysis of positron emission tomography (PET) standardized uptake value (SUV) measurements is challenging due to the skewed nature of SUV distributions. This limits utilization of powerful parametric statistical models for analyzing SUV measurements. An ad hoc approach, which is frequently used in practice, is to blindly use a log transformation, which may or may not result in normal SUV distributions. This study sought to identify optimal transformations leading to normally distributed PET SUVs extracted from tumors and to assess the effects of therapy on the optimal transformations. Methods. The optimal transformation for producing normal distributions of tumor SUVs was identified by iterating the Box-Cox transformation parameter (λ) and selecting the parameter that maximized the Shapiro-Wilk P-value. Optimal transformations were identified for tumor SUVmax distributions at both pre and post treatment. This study included 57 patients who underwent 18F-fluorodeoxyglucose (18F-FDG) PET scans (publicly available dataset). In addition, to test the generality of our transformation methodology, we included analysis of 27 patients who underwent 18F-fluorothymidine (18F-FLT) PET scans at our institution. Results. After applying the optimal Box-Cox transformations, neither the pre nor the post treatment 18F-FDG SUV distributions deviated significantly from normality (P > 0.10). Similar results were found for 18F-FLT PET SUV distributions (P > 0.10). For both 18F-FDG and 18F-FLT SUV distributions, the skewness and kurtosis increased from pre to post treatment, leading to a decrease in the optimal Box-Cox transformation parameter from pre to post treatment. There were types of distributions encountered for both 18F-FDG and 18F-FLT for which a log transformation was not optimal for producing normal SUV distributions. Conclusion. Optimization of the Box-Cox transformation offers a solution for identifying normalizing SUV transformations when the log transformation is insufficient. The log transformation is not always the appropriate transformation for producing normally distributed PET SUVs.

  3. Box-Cox transformation of firm size data in statistical analysis

    NASA Astrophysics Data System (ADS)

    Chen, Ting Ting; Takaishi, Tetsuya

    2014-03-01

    Firm size data usually do not show the normality that is often assumed in statistical analyses such as regression analysis. In this study we focus on two firm size measures: the number of employees and sales. Those data deviate considerably from a normal distribution. To improve the normality of those data we transform them by the Box-Cox transformation with appropriate parameters. The Box-Cox transformation parameters are determined so that the transformed data best show the kurtosis of a normal distribution. It is found that the two firm size measures transformed by the Box-Cox transformation show strong linearity. This indicates that the number of employees and sales have similar properties as firm size indicators. The Box-Cox parameters obtained for the firm size data are found to be very close to zero, in which case the Box-Cox transformation is approximately a log transformation. This suggests that the firm size data we used are approximately log-normally distributed.

  4. Vector wind and vector wind shear models 0 to 27 km altitude for Cape Kennedy, Florida, and Vandenberg AFB, California

    NASA Technical Reports Server (NTRS)

    Smith, O. E.

    1976-01-01

    Techniques for deriving several statistical wind models are presented, based on the properties of the multivariate normal probability function. Assuming that the winds can be considered bivariate normally distributed, then (1) the wind components and conditional wind components are univariate normally distributed, (2) the wind speed is Rayleigh distributed, (3) the conditional distribution of wind speed given a wind direction is Rayleigh distributed, and (4) the frequency of wind direction can be derived. All of these distributions are derived from the five sample parameters of the bivariate normal distribution of wind. By further assuming that the winds at two altitudes are quadravariate normally distributed, the vector wind shear is bivariate normally distributed and the modulus of the vector wind shear is Rayleigh distributed. The conditional probability of wind component shears given a wind component is normally distributed. Examples of these and other properties of the multivariate normal probability distribution function as applied to Cape Kennedy, Florida, and Vandenberg AFB, California, wind data samples are given. A technique to develop a synthetic vector wind profile model of interest to aerospace vehicle applications is presented.
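    One of the stated properties, that zero-mean, equal-variance, independent normal wind components give a Rayleigh-distributed wind speed, can be checked with a quick Monte Carlo sketch under those assumptions (synthetic winds, not the Cape Kennedy or Vandenberg samples).

```python
# Minimal sketch: zero-mean, equal-variance, independent normal wind components
# give a Rayleigh-distributed wind speed.  Synthetic data, not the measured winds.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
sigma = 8.0                              # m/s, assumed common component std dev
u = rng.normal(0.0, sigma, 20000)        # east-west component
v = rng.normal(0.0, sigma, 20000)        # north-south component
speed = np.hypot(u, v)

# Kolmogorov-Smirnov test against a Rayleigh distribution with scale sigma
ks = stats.kstest(speed, "rayleigh", args=(0.0, sigma))
print(f"KS statistic {ks.statistic:.3f}, p-value {ks.pvalue:.2f}")
```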

  5. Statistical distribution of mechanical properties for three graphite-epoxy material systems

    NASA Technical Reports Server (NTRS)

    Reese, C.; Sorem, J., Jr.

    1981-01-01

    Graphite-epoxy composites are playing an increasing role as viable alternative materials in structural applications, necessitating thorough investigation into the predictability and reproducibility of their material strength properties. This investigation was concerned with tension, compression, and short beam shear coupon testing of large samples from three different material suppliers to determine their statistical strength behavior. Statistical results indicate that a two-parameter Weibull distribution model provides better overall characterization of material behavior for the graphite-epoxy systems tested than does the normal distribution model that is standard for most design work. While either a Weibull or normal distribution model provides adequate predictions for average strength values, the Weibull model provides better characterization in the lower tail region, where the predictions are of maximum design interest. The two sets of the same material were found to have essentially the same material properties, indicating that repeatability can be achieved.

  6. The Theoretical Distribution of Evoked Brainstem Activity in Preterm, High-Risk, and Healthy Infants.

    ERIC Educational Resources Information Center

    Salamy, A.

    1981-01-01

    Determines the frequency distribution of Brainstem Auditory Evoked Potential variables (BAEP) for premature babies at different stages of development--normal newborns, infants, young children, and adults. The author concludes that the assumption of normality underlying most "standard" statistical analyses can be met for many BAEP…

  7. Fundamentals of Research Data and Variables: The Devil Is in the Details.

    PubMed

    Vetter, Thomas R

    2017-10-01

    Designing, conducting, analyzing, reporting, and interpreting the findings of a research study require an understanding of the types and characteristics of data and variables. Descriptive statistics are typically used simply to calculate, describe, and summarize the collected research data in a logical, meaningful, and efficient way. Inferential statistics allow researchers to make a valid estimate of the association between an intervention and the treatment effect in a specific population, based upon their randomly collected, representative sample data. Categorical data can be either dichotomous or polytomous. Dichotomous data have only 2 categories, and thus are considered binary. Polytomous data have more than 2 categories. Unlike dichotomous and polytomous data, ordinal data are rank ordered, typically based on a numerical scale that comprises a small set of discrete classes or integers. Continuous data are measured on a continuum and can have any numeric value over this continuous range. Continuous data can be meaningfully divided into smaller and smaller or finer and finer increments, depending upon the precision of the measurement instrument. Interval data are a form of continuous data in which equal intervals represent equal differences in the property being measured. Ratio data are another form of continuous data, which have the same properties as interval data, plus a true definition of an absolute zero point, and the ratios of the values on the measurement scale make sense. The normal (Gaussian) distribution ("bell-shaped curve") is one of the most common statistical distributions. Many applied inferential statistical tests are predicated on the assumption that the analyzed data follow a normal distribution. The histogram and the Q-Q plot are 2 graphical methods to assess if a set of data have a normal distribution (display "normality"). The Shapiro-Wilk test and the Kolmogorov-Smirnov test are 2 well-known and historically widely applied quantitative methods to assess for data normality. Parametric statistical tests make certain assumptions about the characteristics and/or parameters of the underlying population distribution upon which the test is based, whereas nonparametric tests make fewer or less rigorous assumptions. If the normality test concludes that the study data deviate significantly from a Gaussian distribution, rather than applying a less robust nonparametric test, the problem can potentially be remedied by judiciously and openly: (1) performing a data transformation of all the data values; or (2) eliminating any obvious data outlier(s).

  8. Explorations in statistics: the log transformation.

    PubMed

    Curran-Everett, Douglas

    2018-06-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This thirteenth installment of Explorations in Statistics explores the log transformation, an established technique that rescales the actual observations from an experiment so that the assumptions of some statistical analysis are better met. A general assumption in statistics is that the variability of some response Y is homogeneous across groups or across some predictor variable X. If the variability (the standard deviation) varies in rough proportion to the mean value of Y, a log transformation can equalize the standard deviations. Moreover, if the actual observations from an experiment conform to a skewed distribution, then a log transformation can make the theoretical distribution of the sample mean more consistent with a normal distribution. This is important: the results of a one-sample t test are meaningful only if the theoretical distribution of the sample mean is roughly normal. If we log-transform our observations, then we want to confirm that the transformation was useful. We can do this if we use the Box-Cox method, if we bootstrap the sample mean and the statistic t itself, and if we assess the residual plots from the statistical model of the actual and transformed sample observations.
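    The bootstrap check mentioned at the end, confirming that a log transformation makes the sampling distribution of the mean more nearly normal, might look like the following sketch (simulated skewed observations; the log-normal generator is an assumption for illustration).

```python
# Minimal sketch: bootstrap the sample mean before and after a log transformation
# and compare how close each bootstrap distribution is to symmetric/normal.
# Simulated skewed observations; illustration only.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
y = rng.lognormal(mean=0.0, sigma=1.0, size=30)

def boot_means(data, reps=5000):
    idx = rng.integers(0, len(data), size=(reps, len(data)))
    return data[idx].mean(axis=1)

raw_means = boot_means(y)
log_means = boot_means(np.log(y))

print("skewness of bootstrap means (raw):", stats.skew(raw_means))
print("skewness of bootstrap means (log):", stats.skew(log_means))
# The log-scale bootstrap distribution is typically much closer to symmetric.
```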

  9. Statistical distribution of building lot frontage: application for Tokyo downtown districts

    NASA Astrophysics Data System (ADS)

    Usui, Hiroyuki

    2018-03-01

    The frontage of a building lot is the determinant factor of the residential environment. The statistical distribution of building lot frontages shows how the perimeters of urban blocks are shared by building lots for a given density of buildings and roads. For practitioners in urban planning, this is indispensable for identifying potential districts that comprise a high percentage of building lots with narrow frontage after subdivision and for reconsidering appropriate criteria for the density of buildings and roads as residential environment indices. In the literature, however, the statistical distribution of building lot frontages and the density of buildings and roads have not been fully researched. In this paper, based on an empirical study of the downtown districts of Tokyo, it is found that (1) a log-normal distribution fits the observed distribution of building lot frontages better than a gamma distribution, which is the model of the size distribution of Poisson Voronoi cells on closed curves; (2) the distribution of building lot frontages follows a log-normal distribution whose parameters are the gross building density, road density, average road width, the coefficient of variation of building lot frontage, and the ratio of the number of building lot frontages to the number of buildings; and (3) the values of the coefficient of variation of building lot frontages and of the ratio of the number of building lot frontages to the number of buildings are approximately equal to 0.60 and 1.19, respectively.

  10. Non-Gaussian Distributions Affect Identification of Expression Patterns, Functional Annotation, and Prospective Classification in Human Cancer Genomes

    PubMed Central

    Marko, Nicholas F.; Weil, Robert J.

    2012-01-01

    Introduction: Gene expression data is often assumed to be normally-distributed, but this assumption has not been tested rigorously. We investigate the distribution of expression data in human cancer genomes and study the implications of deviations from the normal distribution for translational molecular oncology research. Methods: We conducted a central moments analysis of five cancer genomes and performed empiric distribution fitting to examine the true distribution of expression data both on the complete-experiment and on the individual-gene levels. We used a variety of parametric and nonparametric methods to test the effects of deviations from normality on gene calling, functional annotation, and prospective molecular classification using a sixth cancer genome. Results: Central moments analyses reveal statistically-significant deviations from normality in all of the analyzed cancer genomes. We observe as much as 37% variability in gene calling, 39% variability in functional annotation, and 30% variability in prospective, molecular tumor subclassification associated with this effect. Conclusions: Cancer gene expression profiles are not normally-distributed, either on the complete-experiment or on the individual-gene level. Instead, they exhibit complex, heavy-tailed distributions characterized by statistically-significant skewness and kurtosis. The non-Gaussian distribution of this data affects identification of differentially-expressed genes, functional annotation, and prospective molecular classification. These effects may be reduced in some circumstances, although not completely eliminated, by using nonparametric analytics. This analysis highlights two unreliable assumptions of translational cancer gene expression analysis: that “small” departures from normality in the expression data distributions are analytically-insignificant and that “robust” gene-calling algorithms can fully compensate for these effects. PMID:23118863

  11. Characterizations of linear sufficient statistics

    NASA Technical Reports Server (NTRS)

    Peters, B. C., Jr.; Reoner, R.; Decell, H. P., Jr.

    1977-01-01

    A surjective bounded linear operator T from a Banach space X to a Banach space Y must be a sufficient statistic for a dominated family of probability measures defined on the Borel sets of X. These results were applied to characterize linear sufficient statistics for families of the exponential type, including as special cases the Wishart and multivariate normal distributions. The latter result was used to establish precisely which procedures for sampling from a normal population had the property that the sample mean was a sufficient statistic.

  12. Normality of raw data in general linear models: The most widespread myth in statistics

    USGS Publications Warehouse

    Kery, Marc; Hatfield, Jeff S.

    2003-01-01

    In years of statistical consulting for ecologists and wildlife biologists, by far the most common misconception we have come across has been the one about normality in general linear models. These comprise a very large part of the statistical models used in ecology and include t tests, simple and multiple linear regression, polynomial regression, and analysis of variance (ANOVA) and covariance (ANCOVA). There is a widely held belief that the normality assumption pertains to the raw data rather than to the model residuals. We suspect that this error may also occur in countless published studies, whenever the normality assumption is tested prior to analysis. This may lead to the use of nonparametric alternatives (if there are any), when parametric tests would indeed be appropriate, or to use of transformations of raw data, which may introduce hidden assumptions such as multiplicative effects on the natural scale in the case of log-transformed data. Our aim here is to dispel this myth. We very briefly describe relevant theory for two cases of general linear models to show that the residuals need to be normally distributed if tests requiring normality are to be used, such as t and F tests. We then give two examples demonstrating that the distribution of the response variable may be nonnormal, and yet the residuals are well behaved. We do not go into the issue of how to test normality; instead we display the distributions of response variables and residuals graphically.
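    A small simulation makes the residual-versus-raw-data point concrete: with a strong group effect, the pooled response is clearly non-normal (bimodal) while the residuals from the group-means model behave well. This is a generic illustration in Python, not one of the paper's examples.

```python
# Minimal sketch: the response can look non-normal while the residuals are normal.
# Generic simulated ANOVA-type data, not the paper's examples.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
group_a = rng.normal(0.0, 1.0, 200)   # group mean 0
group_b = rng.normal(6.0, 1.0, 200)   # group mean 6 -> pooled response is bimodal
y = np.concatenate([group_a, group_b])

# Residuals of the one-way model (deviations from each group mean)
residuals = np.concatenate([group_a - group_a.mean(), group_b - group_b.mean()])

print("Shapiro-Wilk p, raw response:", stats.shapiro(y).pvalue)         # very small
print("Shapiro-Wilk p, residuals:  ", stats.shapiro(residuals).pvalue)  # large
```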

  13. Analysis of vector wind change with respect to time for Cape Kennedy, Florida

    NASA Technical Reports Server (NTRS)

    Adelfang, S. I.

    1978-01-01

    Multivariate analysis was used to examine the joint distribution of the four variables represented by the components of the wind vector at an initial time and after a specified elapsed time, which is hypothesized to be quadravariate normal; the fourteen statistics of this distribution, calculated from 15 years of twice-daily rawinsonde data, are presented by monthly reference periods for each month from 0 to 27 km. The hypotheses that the wind component change with respect to time is univariate normal, that the joint distribution of wind component changes is bivariate normal, and that the modulus of the vector wind change is Rayleigh distributed are tested by comparison with observed distributions. Statistics of the conditional bivariate normal distributions of vector wind at a future time, given the vector wind at an initial time, are derived. Wind changes over time periods from 1 to 5 hours, calculated from Jimsphere data, are presented. Extension of the theoretical prediction (based on rawinsonde data) of the wind component change standard deviation to time periods of 1 to 5 hours falls (with a few exceptions) within the 95 percentile confidence band of the population estimate obtained from the Jimsphere sample data. The joint distributions of wind change components, conditional wind components, and 1 km vector wind shear change components are illustrated by probability ellipses at the 95 percentile level.

  14. Normal Distribution of CD8+ T-Cell-Derived ELISPOT Counts within Replicates Justifies the Reliance on Parametric Statistics for Identifying Positive Responses.

    PubMed

    Karulin, Alexey Y; Caspell, Richard; Dittrich, Marcus; Lehmann, Paul V

    2015-03-02

    Accurate assessment of positive ELISPOT responses for low frequencies of antigen-specific T-cells is controversial. In particular, it is still unknown whether ELISPOT counts within replicate wells follow a theoretical distribution function, and thus whether high power parametric statistics can be used to discriminate between positive and negative wells. We studied experimental distributions of spot counts for up to 120 replicate wells of IFN-γ production by CD8+ T-cells responding to EBV LMP2A (426-434) peptide in human PBMC. The cells were tested in serial dilutions covering a wide range of average spot counts per condition, from just a few to hundreds of spots per well. Statistical analysis of the data using diagnostic Q-Q plots and the Shapiro-Wilk normality test showed that, over the entire dynamic range, ELISPOT spot counts within replicate wells followed a normal distribution. This result implies that the Student t-test and ANOVA are suited to identify positive responses. We also show experimentally that borderline responses can be reliably detected by involving more replicate wells, plating higher numbers of PBMC, addition of IL-7, or a combination of these. Furthermore, we have experimentally verified that the number of replicates needed for detection of weak responses can be calculated using parametric statistics.

  15. Flame surface statistics of constant-pressure turbulent expanding premixed flames

    NASA Astrophysics Data System (ADS)

    Saha, Abhishek; Chaudhuri, Swetaprovo; Law, Chung K.

    2014-04-01

    In this paper we investigate the local flame surface statistics of constant-pressure turbulent expanding flames. First the statistics of the local length ratio are experimentally determined from high-speed planar Mie scattering images of spherically expanding flames, with the length ratio on the measurement plane, at predefined equiangular sectors, defined as the ratio of the actual flame length to the length of a circular arc of radius equal to the average radius of the flame. Assuming an isotropic distribution of such flame segments we then convolute suitable forms of the length-ratio probability distribution functions (pdfs) to arrive at the corresponding area-ratio pdfs. It is found that both the length-ratio and area-ratio pdfs are near log-normally distributed and show self-similar behavior with increasing radius. Near log-normality and the rather intermittent behavior of the flame-length ratio suggest similarity with dissipation rate quantities, which stimulates multifractal analysis.

  16. A product Pearson-type VII density distribution

    NASA Astrophysics Data System (ADS)

    Nadarajah, Saralees; Kotz, Samuel

    2008-01-01

    The Pearson-type VII distributions (containing the Student's t distributions) are becoming increasingly prominent and are being considered as competitors to the normal distribution. Motivated by real examples in decision sciences, Bayesian statistics, probability theory and physics, a new Pearson-type VII distribution is introduced by taking the product of two Pearson-type VII pdfs. Various structural properties of this distribution are derived, including its cdf, moments, mean deviation about the mean, mean deviation about the median, entropy, asymptotic distribution of the extreme order statistics, maximum likelihood estimates and the Fisher information matrix. Finally, an application to a Bayesian testing problem is illustrated.

  17. A new statistical method for design and analyses of component tolerance

    NASA Astrophysics Data System (ADS)

    Movahedi, Mohammad Mehdi; Khounsiavash, Mohsen; Otadi, Mahmood; Mosleh, Maryam

    2017-03-01

    Tolerancing conducted by design engineers to meet customers' needs is a prerequisite for producing high-quality products. Engineers use handbooks to conduct tolerancing. While the use of statistical methods for tolerancing is not new, engineers often rely on known distributions, including the normal distribution; yet, if the statistical distribution of the given variable is unknown, a new statistical method is needed to design tolerances. In this paper, we use the generalized lambda distribution for the design and analysis of component tolerances. We use the percentile method (PM) to estimate the distribution parameters. The findings indicate that, when the distribution of the component data is unknown, the proposed method can be used to expedite the design of component tolerances. Moreover, in the case of assembled sets, more extensive tolerance for each component with the same target performance can be utilized.

  18. Statistical studies of animal response data from USF toxicity screening test method

    NASA Technical Reports Server (NTRS)

    Hilado, C. J.; Machado, A. M.

    1978-01-01

    Statistical examination of animal response data obtained using Procedure B of the USF toxicity screening test method indicates that the data deviate only slightly from a normal or Gaussian distribution. This slight departure from normality is not expected to invalidate conclusions based on theoretical statistics. Comparison of times to staggering, convulsions, collapse, and death as endpoints shows that time to death appears to be the most reliable endpoint because it offers the lowest probability of missed observations and premature judgements.

  19. Selecting the right statistical model for analysis of insect count data by using information theoretic measures.

    PubMed

    Sileshi, G

    2006-10-01

    Researchers and regulatory agencies often make statistical inferences from insect count data using modelling approaches that assume homogeneous variance. Such models do not allow for formal appraisal of variability, which in its different forms is the subject of interest in ecology. Therefore, the objectives of this paper were to (i) compare models suitable for handling variance heterogeneity and (ii) select optimal models to ensure valid statistical inferences from insect count data. The log-normal, standard Poisson, Poisson corrected for overdispersion, zero-inflated Poisson, negative binomial and zero-inflated negative binomial models were compared using six count datasets on foliage-dwelling insects and five families of soil-dwelling insects. Akaike's and Schwarz's Bayesian information criteria were used for comparing the various models. Over 50% of the counts were zeros, even in locally abundant species such as Ootheca bennigseni Weise, Mesoplatys ochroptera Stål and Diaecoderus spp. The Poisson model corrected for overdispersion and the standard negative binomial model provided a better description of the probability distribution of seven out of the 11 insects than the log-normal, standard Poisson, zero-inflated Poisson or zero-inflated negative binomial models. It is concluded that excess zeros and variance heterogeneity are common phenomena in insect count data. If not properly modelled, these properties can invalidate normal-distribution assumptions, resulting in biased estimation of ecological effects and jeopardizing the integrity of the scientific inferences. Therefore, it is recommended that statistical models appropriate for handling these data properties be selected using objective criteria to ensure efficient statistical inference.
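    A bare-bones version of such a model comparison fits intercept-only Poisson and negative binomial models by maximum likelihood and compares AIC values; the counts below are simulated with many zeros and are not the insect datasets analyzed in the paper.

```python
# Minimal sketch: compare Poisson vs. negative binomial fits by AIC on
# zero-heavy count data.  Simulated counts, not the insect datasets.
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(11)
counts = rng.negative_binomial(n=0.5, p=0.3, size=200)   # overdispersed, many zeros

# Poisson: the MLE of the mean is the sample mean
lam = counts.mean()
ll_pois = stats.poisson.logpmf(counts, lam).sum()
aic_pois = 2 * 1 - 2 * ll_pois

# Negative binomial: maximize the log-likelihood over (n, p) numerically
def nb_negloglik(params):
    n, p = params
    return -stats.nbinom.logpmf(counts, n, p).sum()

res = optimize.minimize(nb_negloglik, x0=[1.0, 0.5],
                        bounds=[(1e-6, None), (1e-6, 1 - 1e-6)])
aic_nb = 2 * 2 + 2 * res.fun

print(f"AIC Poisson {aic_pois:.1f}  vs  AIC negative binomial {aic_nb:.1f}")
```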

  20. WE-H-207A-03: The Universality of the Lognormal Behavior of [F-18]FLT PET SUV Measurements

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Scarpelli, M; Eickhoff, J; Perlman, S

    Purpose: Log transforming [F-18]FDG PET standardized uptake values (SUVs) has been shown to lead to normal SUV distributions, which allows utilization of powerful parametric statistical models. This study identified the optimal transformation leading to normally distributed [F-18]FLT PET SUVs from solid tumors and offers an example of how normal distributions permit analysis of non-independent/correlated measurements. Methods: Forty patients with various metastatic diseases underwent up to six FLT PET/CT scans during treatment. Tumors were identified by a nuclear medicine physician and manually segmented. Average uptake was extracted for each patient, giving a global SUVmean (gSUVmean) for each scan. The Shapiro-Wilk test was used to test distribution normality. One-parameter Box-Cox transformations were applied to each of the six gSUVmean distributions and the optimal transformation was found by selecting the parameter that maximized the Shapiro-Wilk test statistic. The relationship between gSUVmean and a serum biomarker (VEGF) collected at imaging timepoints was determined using a linear mixed effects model (LMEM), which accounted for correlated/non-independent measurements from the same individual. Results: Untransformed gSUVmean distributions were found to be significantly non-normal (p<0.05). The optimal transformation parameter had a value of 0.3 (95%CI: −0.4 to 1.6). Given that the optimal parameter was close to zero (which corresponds to a log transformation), the data were subsequently log transformed. All log transformed gSUVmean distributions were normally distributed (p>0.10 for all timepoints). Log transformed data were incorporated into the LMEM. VEGF serum levels significantly correlated with gSUVmean (p<0.001), revealing a log-linear relationship between SUVs and the underlying biology. Conclusion: Failure to account for correlated/non-independent measurements can lead to invalid conclusions and motivated transformation to normally distributed SUVs. The log transformation was found to be close to optimal and sufficient for obtaining normally distributed FLT PET SUVs. These transformations allow utilization of powerful LMEMs when analyzing quantitative imaging metrics.

  1. Comparison of Sample Size by Bootstrap and by Formulas Based on Normal Distribution Assumption.

    PubMed

    Wang, Zuozhen

    2018-01-01

    The bootstrapping technique is distribution-independent, which provides an indirect way to estimate the sample size for a clinical trial based on a relatively small sample. In this paper, sample size estimation for comparing two parallel-design arms with continuous data by a bootstrap procedure is presented for various test types (inequality, non-inferiority, superiority, and equivalence). Meanwhile, sample size calculation by mathematical formulas (under the normal distribution assumption) is also carried out for the same data. The power difference between the two calculation methods is acceptably small for all the test types, which shows that the bootstrap procedure is a credible technique for sample size estimation. We then compared the powers determined by the two methods based on data that violate the normal distribution assumption. To accommodate this feature of the data, the nonparametric Wilcoxon test was used to compare the two groups during the bootstrap power estimation. As a result, the power estimated by the normal distribution-based formula is far larger than that estimated by the bootstrap for each specific sample size per group. Hence, for this type of data, it is preferable to apply the bootstrap method for sample size calculation from the outset, and to use the same statistical method planned for the subsequent analysis on each bootstrap sample during bootstrap sample size estimation, provided historical data are available that are well representative of the population to which the proposed trial is intended to extrapolate.
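    The bootstrap power calculation described above can be sketched as resampling historical data at a candidate per-group sample size and recording how often the planned nonparametric test rejects; the data and settings below are placeholders, not the trial data discussed in the paper.

```python
# Minimal sketch: bootstrap power estimate for a two-arm comparison at a candidate
# per-group sample size, using the rank-sum (Mann-Whitney/Wilcoxon) test that would
# also be used in the final analysis.  The "historical" data here are simulated.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2024)
hist_a = rng.lognormal(0.0, 0.8, 120)    # placeholder historical control data
hist_b = rng.lognormal(0.4, 0.8, 120)    # placeholder historical treatment data

def bootstrap_power(n_per_group, reps=2000, alpha=0.05):
    hits = 0
    for _ in range(reps):
        a = rng.choice(hist_a, size=n_per_group, replace=True)
        b = rng.choice(hist_b, size=n_per_group, replace=True)
        if stats.mannwhitneyu(a, b, alternative="two-sided").pvalue < alpha:
            hits += 1
    return hits / reps

for n in (30, 50, 80):
    print(f"n per group {n}: estimated power {bootstrap_power(n):.2f}")
```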

  2. Multivariate normality

    NASA Technical Reports Server (NTRS)

    Crutcher, H. L.; Falls, L. W.

    1976-01-01

    Sets of experimentally determined or routinely observed data provide information about the past, present and, hopefully, future sets of similarly produced data. An infinite set of statistical models exists which may be used to describe the data sets. The normal distribution is one model. If it serves at all, it serves well. If a data set, or a transformation of the set, representative of a larger population can be described by the normal distribution, then valid statistical inferences can be drawn. There are several tests which may be applied to a data set to determine whether the univariate normal model adequately describes the set. The chi-square test based on Pearson's work in the late nineteenth and early twentieth centuries is often used. Like all tests, it has some weaknesses which are discussed in elementary texts. Extension of the chi-square test to the multivariate normal model is provided. Tables and graphs permit easier application of the test in the higher dimensions. Several examples, using recorded data, illustrate the procedures. Tests of maximum absolute differences, mean sum of squares of residuals, runs and changes of sign are included in these tests. Dimensions one through five with selected sample sizes 11 to 101 are used to illustrate the statistical tests developed.

  3. The Universal Statistical Distributions of the Affinity, Equilibrium Constants, Kinetics and Specificity in Biomolecular Recognition

    PubMed Central

    Zheng, Xiliang; Wang, Jin

    2015-01-01

    We uncovered the universal statistical laws for the biomolecular recognition/binding process. We quantified the statistical energy landscapes for binding, from which we can characterize the distributions of the binding free energy (affinity), the equilibrium constants, the kinetics and the specificity by exploring the different ligands binding with a particular receptor. The results of the analytical studies are confirmed by the microscopic flexible docking simulations. The distribution of binding affinity is Gaussian around the mean and becomes exponential near the tail. The equilibrium constants of the binding follow a log-normal distribution around the mean and a power law distribution in the tail. The intrinsic specificity for biomolecular recognition measures the degree of discrimination of native versus non-native binding and the optimization of which becomes the maximization of the ratio of the free energy gap between the native state and the average of non-native states versus the roughness measured by the variance of the free energy landscape around its mean. The intrinsic specificity obeys a Gaussian distribution near the mean and an exponential distribution near the tail. Furthermore, the kinetics of binding follows a log-normal distribution near the mean and a power law distribution at the tail. Our study provides new insights into the statistical nature of thermodynamics, kinetics and function from different ligands binding with a specific receptor or equivalently specific ligand binding with different receptors. The elucidation of distributions of the kinetics and free energy has guiding roles in studying biomolecular recognition and function through small-molecule evolution and chemical genetics. PMID:25885453

  4. PV System Component Fault and Failure Compilation and Analysis.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Klise, Geoffrey Taylor; Lavrova, Olga; Gooding, Renee Lynne

    This report describes data collection and analysis of solar photovoltaic (PV) equipment events, which consist of faults and failures that occur during the normal operation of a distributed PV system or PV power plant. We present summary statistics from locations where maintenance data is being collected at various intervals, as well as reliability statistics gathered from that data, consisting of fault/failure distributions and repair distributions for a wide range of PV equipment types.

  5. Normal probabilities for Vandenberg AFB wind components - monthly reference periods for all flight azimuths, 0- to 70-km altitudes

    NASA Technical Reports Server (NTRS)

    Falls, L. W.

    1975-01-01

    Vandenberg Air Force Base (AFB), California, wind component statistics are presented to be used for aerospace engineering applications that require component wind probabilities for various flight azimuths and selected altitudes. The normal (Gaussian) distribution is presented as a statistical model to represent component winds at Vandenberg AFB. Head-, tail-, and crosswind components are tabulated for all flight azimuths for altitudes from 0 to 70 km by monthly reference periods. Wind components are given for 11 selected percentiles ranging from 0.135 percent to 99.865 percent for each month. The results of statistical goodness-of-fit tests are presented to verify the use of the Gaussian distribution as an adequate model to represent component winds at Vandenberg AFB.

  6. Normal probabilities for Cape Kennedy wind components: Monthly reference periods for all flight azimuths. Altitudes 0 to 70 kilometers

    NASA Technical Reports Server (NTRS)

    Falls, L. W.

    1973-01-01

    This document replaces Cape Kennedy empirical wind component statistics which are presently being used for aerospace engineering applications that require component wind probabilities for various flight azimuths and selected altitudes. The normal (Gaussian) distribution is presented as an adequate statistical model to represent component winds at Cape Kennedy. Head-, tail-, and crosswind components are tabulated for all flight azimuths for altitudes from 0 to 70 km by monthly reference periods. Wind components are given for 11 selected percentiles ranging from 0.135 percent to 99.865 percent for each month. Results of statistical goodness-of-fit tests are presented to verify the use of the Gaussian distribution as an adequate model to represent component winds at Cape Kennedy, Florida.

  7. Statistical aspects of genetic association testing in small samples, based on selective DNA pooling data in the arctic fox.

    PubMed

    Szyda, Joanna; Liu, Zengting; Zatoń-Dobrowolska, Magdalena; Wierzbicki, Heliodor; Rzasa, Anna

    2008-01-01

    We analysed data from a selective DNA pooling experiment with 130 individuals of the arctic fox (Alopex lagopus), which originated from 2 types differing in body size. The association between alleles of 6 selected unlinked molecular markers and body size was tested using univariate and multinomial logistic regression models, applying odds ratios and test statistics from the power divergence family. Due to the small sample size and the resulting sparseness of the data table, in hypothesis testing we could not rely on the asymptotic distributions of the tests. Instead, we tried to account for data sparseness by (i) modifying the confidence intervals of the odds ratio; (ii) using a normal approximation of the asymptotic distribution of the power divergence tests with different approaches for calculating the moments of the statistics; and (iii) assessing P values empirically, based on bootstrap samples. As a result, a significant association was observed for 3 markers. Furthermore, we used simulations to assess the validity of the normal approximation of the asymptotic distribution of the test statistics under the conditions of small and sparse samples.

  8. Parametric vs. non-parametric statistics of low resolution electromagnetic tomography (LORETA).

    PubMed

    Thatcher, R W; North, D; Biver, C

    2005-01-01

    This study compared the relative statistical sensitivity of non-parametric and parametric statistics of 3-dimensional current sources as estimated by the EEG inverse solution Low Resolution Electromagnetic Tomography (LORETA). One would expect approximately 5% false positives (classification of a normal as abnormal) at the P < .025 level of probability (two-tailed test) and approximately 1% false positives at the P < .005 level. EEG digital samples (2-second intervals sampled at 128 Hz, 1 to 2 minutes, eyes closed) from 43 normal adult subjects were imported into the Key Institute's LORETA program. We then used the Key Institute's cross-spectrum and the Key Institute's LORETA output files (*.lor) as the 2,394 gray matter pixel representation of 3-dimensional currents at different frequencies. The mean and standard deviation *.lor files were computed for each of the 2,394 gray matter pixels for each of the 43 subjects. Tests of Gaussianity and different transforms were computed in order to best approximate a normal distribution for each frequency and gray matter pixel. The relative sensitivity of parametric vs. non-parametric statistics was compared using a "leave-one-out" cross-validation method in which individual normal subjects were withdrawn and then statistically classified as being either normal or abnormal based on the remaining subjects. Log10 transforms approximated a Gaussian distribution with 95% to 99% accuracy. Parametric Z score tests at P < .05 cross-validation demonstrated an average misclassification rate of approximately 4.25%, with a range over the 2,394 gray matter pixels of 27.66% to 0.11%. At P < .01, parametric Z score cross-validation false positives averaged 0.26% and ranged from 6.65% to 0% false positives. The non-parametric Key Institute's t-max statistic at P < .05 had an average misclassification error rate of 7.64% and ranged from 43.37% to 0.04% false positives. The non-parametric t-max at P < .01 had an average misclassification rate of 6.67% and ranged from 41.34% to 0% false positives of the 2,394 gray matter pixels for any cross-validated normal subject. In conclusion, adequate approximation to a Gaussian distribution and high cross-validation accuracy can be achieved by the Key Institute's LORETA programs by using a log10 transform and parametric statistics, and parametric normative comparisons had lower false positive rates than the non-parametric tests.
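    The leave-one-out parametric classification described above, a log10 transform followed by flagging any subject whose value falls outside a Z threshold computed from the remaining subjects, reduces to a few lines per pixel and frequency; the sketch below uses one simulated variable and an assumed two-tailed threshold.

```python
# Minimal sketch: leave-one-out parametric Z-score classification of normal subjects
# after a log10 transform, as a per-pixel/per-frequency building block.
# Simulated values for one variable; not the LORETA current-source data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
values = rng.lognormal(mean=2.0, sigma=0.4, size=43)   # one pixel, 43 "normals"
x = np.log10(values)                                   # approximate Gaussianity

alpha = 0.05                                           # assumed two-tailed threshold
z_crit = stats.norm.ppf(1 - alpha / 2)

false_pos = 0
for i in range(len(x)):
    rest = np.delete(x, i)
    z = (x[i] - rest.mean()) / rest.std(ddof=1)
    false_pos += abs(z) > z_crit
print(f"false-positive rate: {false_pos / len(x):.3f}")   # expect roughly alpha
```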

  9. Log-Normality and Multifractal Analysis of Flame Surface Statistics

    NASA Astrophysics Data System (ADS)

    Saha, Abhishek; Chaudhuri, Swetaprovo; Law, Chung K.

    2013-11-01

    The turbulent flame surface is typically highly wrinkled and folded at a multitude of scales controlled by various flame properties. It is useful if the information contained in this complex geometry can be projected onto a simpler regular geometry for the use of spectral, wavelet or multifractal analyses. Here we investigate local flame surface statistics of a turbulent flame expanding under constant pressure. First, the statistics of the local length ratio are obtained experimentally from high-speed Mie scattering images. For a spherically expanding flame, the length ratio on the measurement plane, at predefined equiangular sectors, is defined as the ratio of the actual flame length to the length of a circular arc of radius equal to the average radius of the flame. Assuming an isotropic distribution of such flame segments, we convolute suitable forms of the length-ratio probability distribution functions (pdfs) to arrive at corresponding area-ratio pdfs. Both pdfs are found to be nearly log-normally distributed and show self-similar behavior with increasing radius. The near log-normality and rather intermittent behavior of the flame length ratio suggest similarity with dissipation rate quantities, which motivates multifractal analysis.

  10. Finding differentially expressed genes in high dimensional data: Rank based test statistic via a distance measure.

    PubMed

    Mathur, Sunil; Sadana, Ajit

    2015-12-01

    We present a rank-based test statistic for the identification of differentially expressed genes using a distance measure. The proposed test statistic is highly robust against extreme values and does not assume the distribution of the parent population. Simulation studies show that the proposed test is more powerful than some of the commonly used methods, such as the paired t-test, the Wilcoxon signed rank test, and significance analysis of microarrays (SAM), under certain non-normal distributions. The asymptotic distribution of the test statistic and the p-value function are discussed. The application of the proposed method is shown using a real-life data set. © The Author(s) 2011.

  11. Probability density cloud as a geometrical tool to describe statistics of scattered light.

    PubMed

    Yaitskova, Natalia

    2017-04-01

    First-order statistics of scattered light are described using the representation of the probability density cloud, which visualizes a two-dimensional distribution for complex amplitude. The geometric parameters of the cloud are studied in detail and are connected to the statistical properties of the phase. The moment-generating function for intensity is obtained in closed form through these parameters. An example of the exponentially modified normal distribution is provided to illustrate the functioning of this geometrical approach.

  12. Statistical analysis of hydrodynamic cavitation events

    NASA Astrophysics Data System (ADS)

    Gimenez, G.; Sommer, R.

    1980-10-01

    The frequency (number of events per unit time) of pressure pulses produced by hydrodynamic cavitation bubble collapses is investigated using statistical methods. The results indicate that this frequency is distributed according to a normal law whose parameters do not evolve in time.

  13. The Effect of General Statistical Fiber Misalignment on Predicted Damage Initiation in Composites

    NASA Technical Reports Server (NTRS)

    Bednarcyk, Brett A.; Aboudi, Jacob; Arnold, Steven M.

    2014-01-01

    A micromechanical method is employed for the prediction of unidirectional composites in which the fiber orientation can possess various statistical misalignment distributions. The method relies on the probability-weighted averaging of the appropriate concentration tensor, which is established by the micromechanical procedure. This approach provides access to the local field quantities throughout the constituents, from which initiation of damage in the composite can be predicted. In contrast, a typical macromechanical procedure can determine the effective composite elastic properties in the presence of statistical fiber misalignment, but cannot provide the local fields. Fully random fiber distribution is presented as a special case using the proposed micromechanical method. Results are given that illustrate the effects of various amounts of fiber misalignment in terms of the standard deviations of in-plane and out-of-plane misalignment angles, where normal distributions have been employed. Damage initiation envelopes, local fields, effective moduli, and strengths are predicted for polymer and ceramic matrix composites with given normal distributions of misalignment angles, as well as fully random fiber orientation.
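
    The probability-weighted averaging idea can be illustrated with a much simpler scalar analogue: averaging the classical off-axis lamina modulus over a normal distribution of in-plane misalignment angles. This Monte Carlo sketch is not the authors' concentration-tensor micromechanics, and the elastic constants are hypothetical carbon/epoxy-like values.

        import numpy as np

        def off_axis_modulus(theta, E1=140e9, E2=10e9, G12=5e9, nu12=0.3):
            """Classical off-axis Young's modulus of a unidirectional lamina misaligned by theta (rad)."""
            c2, s2 = np.cos(theta) ** 2, np.sin(theta) ** 2
            inv_E = c2 ** 2 / E1 + s2 ** 2 / E2 + (1.0 / G12 - 2.0 * nu12 / E1) * s2 * c2
            return 1.0 / inv_E

        def misaligned_modulus(sigma_deg, n=200_000, seed=0):
            """Probability-weighted (Monte Carlo) average of the off-axis modulus over a
            normal distribution of in-plane misalignment angles with standard deviation sigma_deg."""
            theta = np.radians(np.random.default_rng(seed).normal(0.0, sigma_deg, n))
            return off_axis_modulus(theta).mean()

        for sd in (0.0, 2.0, 5.0):
            print(sd, f"{misaligned_modulus(sd) / 1e9:.1f} GPa")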

  14. Modified Distribution-Free Goodness-of-Fit Test Statistic.

    PubMed

    Chun, So Yeon; Browne, Michael W; Shapiro, Alexander

    2018-03-01

    Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62-83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.

  15. Using Computer Graphics in Statistics.

    ERIC Educational Resources Information Center

    Kerley, Lyndell M.

    1990-01-01

    Described is software which allows a student to use simulation to produce analytical output as well as graphical results. The results include a frequency histogram of a selected population distribution, a frequency histogram of the distribution of the sample means, and tests of normality for the distribution of the sample means. (KR)

  16. Transformation of arbitrary distributions to the normal distribution with application to EEG test-retest reliability.

    PubMed

    van Albada, S J; Robinson, P A

    2007-04-15

    Many variables in the social, physical, and biosciences, including neuroscience, are non-normally distributed. To improve the statistical properties of such data, or to allow parametric testing, logarithmic or logit transformations are often used. Box-Cox transformations or ad hoc methods are sometimes used for parameters for which no transformation is known to approximate normality. However, these methods do not always give good agreement with the Gaussian. A transformation is discussed that maps probability distributions as closely as possible to the normal distribution, with exact agreement for continuous distributions. To illustrate, the transformation is applied to a theoretical distribution, and to quantitative electroencephalographic (qEEG) measures from repeat recordings of 32 subjects which are highly non-normal. Agreement with the Gaussian was better than using logarithmic, logit, or Box-Cox transformations. Since normal data have previously been shown to have better test-retest reliability than non-normal data under fairly general circumstances, the implications of our transformation for the test-retest reliability of parameters were investigated. Reliability was shown to improve with the transformation, where the improvement was comparable to that using Box-Cox. An advantage of the general transformation is that it does not require laborious optimization over a range of parameters or a case-specific choice of form.
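
    The exact transformation used by the authors is not reproduced here, but a rank-based (van der Waerden-type) inverse-normal transform gives the flavour of mapping an arbitrary continuous sample onto the Gaussian; the data and function name below are illustrative.

        import numpy as np
        from scipy.stats import norm, rankdata

        def to_normal(x):
            """Map a sample to approximate standard-normal scores by replacing each value
            with the normal quantile of its (shifted) empirical rank."""
            x = np.asarray(x, dtype=float)
            ranks = rankdata(x)                    # average ranks for ties
            u = (ranks - 0.5) / x.size             # empirical CDF values in (0, 1)
            return norm.ppf(u)

        # Strongly right-skewed, qEEG-like synthetic data.
        rng = np.random.default_rng(1)
        raw = rng.lognormal(mean=0.0, sigma=1.0, size=500)
        z = to_normal(raw)
        print(round(z.mean(), 3), round(z.std(), 3))   # close to 0 and 1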

  17. Comparison of low-altitude wind-shear statistics derived from measured and proposed standard wind profiles

    NASA Technical Reports Server (NTRS)

    Usry, J. W.

    1983-01-01

    Wind shear statistics were calculated for a simulated set of wind profiles based on a proposed standard wind field data base. Wind shears were grouped in altitude bands of 100 ft between 100 and 1400 ft and in wind shear increments of 0.025 knot/ft. Frequency distributions, means, and standard deviations for each altitude band and for the total sample were derived for both data sets. It was found that the frequency distributions in each altitude band for the simulated data set were more dispersed below 800 ft and less dispersed above 900 ft than those for the measured data set. The total sample frequency of occurrence for the two data sets was about equal for wind shear values between ±0.075 knot/ft, but the simulated data set had significantly larger values for all wind shears outside these boundaries. Neither data set was normally distributed; similar results are observed from the cumulative frequency distributions.

  18. Football goal distributions and extremal statistics

    NASA Astrophysics Data System (ADS)

    Greenhough, J.; Birch, P. C.; Chapman, S. C.; Rowlands, G.

    2002-12-01

    We analyse the distributions of the number of goals scored by home teams, away teams, and the total scored in the match, in domestic football games from 169 countries between 1999 and 2001. The probability density functions (PDFs) of goals scored are too heavy-tailed to be fitted over their entire ranges by Poisson or negative binomial distributions which would be expected for uncorrelated processes. Log-normal distributions cannot include zero scores and here we find that the PDFs are consistent with those arising from extremal statistics. In addition, we show that it is sufficient to model English top division and FA Cup matches in the seasons of 1970/71-2000/01 on Poisson or negative binomial distributions, as reported in analyses of earlier seasons, and that these are not consistent with extremal statistics.

  19. heterogeneous mixture distributions for multi-source extreme rainfall

    NASA Astrophysics Data System (ADS)

    Ouarda, T.; Shin, J.; Lee, T. S.

    2013-12-01

    Mixture distributions have been used to model hydro-meteorological variables showing mixture distributional characteristics, e.g. bimodality. Homogeneous mixture (HOM) distributions (e.g. Normal-Normal and Gumbel-Gumbel) have traditionally been applied to hydro-meteorological variables. However, there is no reason to restrict the mixture distribution to a combination of one identical type. It might be beneficial to characterize the statistical behavior of hydro-meteorological variables through heterogeneous mixture (HTM) distributions such as Normal-Gamma. In the present work, we focus on assessing the suitability of HTM distributions for the frequency analysis of hydro-meteorological variables. To estimate the parameters of HTM distributions, a meta-heuristic algorithm (a Genetic Algorithm) is employed to maximize the likelihood function. A number of distributions are compared, including the Gamma-Extreme value type-one (EV1) HTM distribution, the EV1-EV1 HOM distribution, and the EV1 distribution. The proposed distribution models are applied to annual maximum precipitation data in South Korea. The Akaike Information Criterion (AIC), the root mean squared error (RMSE) and the log-likelihood are used as measures of goodness-of-fit of the tested distributions. Results indicate that the HTM distribution (Gamma-EV1) presents the best fit. The HTM distribution shows significant improvement in the estimation of quantiles corresponding to the 20-year return period. Extreme rainfall in the coastal region of South Korea presents strong heterogeneous mixture distributional characteristics. Results indicate that HTM distributions are a good alternative for the frequency analysis of hydro-meteorological variables when disparate statistical characteristics are present.
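
    As an illustration of heterogeneous-mixture fitting, the sketch below maximizes the likelihood of a two-component Gamma-Gumbel (EV1) mixture; it substitutes SciPy's differential evolution (another population-based heuristic) for the genetic algorithm used in the study, and the data, bounds and parameter values are synthetic.

        import numpy as np
        from scipy.stats import gamma, gumbel_r
        from scipy.optimize import differential_evolution

        def neg_loglik(theta, x):
            """Negative log-likelihood of a two-component Gamma-Gumbel (EV1) mixture."""
            w, shape_g, scale_g, loc_e, scale_e = theta
            pdf = w * gamma.pdf(x, shape_g, scale=scale_g) \
                + (1 - w) * gumbel_r.pdf(x, loc=loc_e, scale=scale_e)
            return -np.sum(np.log(np.clip(pdf, 1e-300, None)))

        # Synthetic "annual maximum precipitation" sample drawn from a known mixture.
        rng = np.random.default_rng(2)
        x = np.concatenate([gamma.rvs(2.0, scale=20.0, size=300, random_state=rng),
                            gumbel_r.rvs(loc=120.0, scale=15.0, size=200, random_state=rng)])

        bounds = [(0.05, 0.95), (0.5, 10), (1, 100), (50, 200), (1, 50)]
        res = differential_evolution(neg_loglik, bounds, args=(x,), seed=0, maxiter=300)
        print(res.x)                                  # fitted weight and component parameters
        print(2 * len(bounds) + 2 * res.fun)          # AIC = 2k - 2 log-likelihood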

  20. Distribution of CD163-positive cell and MHC class II-positive cell in the normal equine uveal tract.

    PubMed

    Sano, Yuto; Matsuda, Kazuya; Okamoto, Minoru; Takehana, Kazushige; Hirayama, Kazuko; Taniyama, Hiroyuki

    2016-02-01

    Antigen-presenting cells (APCs) in the uveal tract participate in ocular immunity, including immune homeostasis and the pathogenesis of uveitis. In horses, although uveitis is the most common ocular disorder, little is known about ocular immunity, such as the distribution of APCs. In this study, we investigated the distribution of CD163-positive and MHC II-positive cells in the normal equine uveal tract using an immunofluorescence technique. Eleven eyes from 10 Thoroughbred horses aged 1 to 24 years were used. Indirect immunofluorescence was performed using primary antibodies against CD163, MHC class II (MHC II) and CD20. To identify the site of their greatest distribution, positive cells were manually counted in 3 different parts of the uveal tract (ciliary body, iris and choroid), and their average numbers were compared statistically. Pleomorphic CD163- and MHC II-expressing cells were detected throughout the equine uveal tract, but no CD20-expressing cells were detected. The statistical analysis demonstrated that the distribution of CD163- and MHC II-positive cells was concentrated in the ciliary body. These results demonstrate that the ciliary body is the largest site of their distribution in the normal equine uveal tract, and the ciliary body is considered to play important roles in uveal and/or ocular immune homeostasis. The data provided in this study will help further the understanding of equine ocular immunity in the normal state and might be beneficial for understanding the mechanisms of ocular disorders such as equine uveitis.

  1. An asymptotic analysis of the logrank test.

    PubMed

    Strawderman, R L

    1997-01-01

    Asymptotic expansions for the null distribution of the logrank statistic and its distribution under local proportional hazards alternatives are developed in the case of iid observations. The results, which are derived from the work of Gu (1992) and Taniguchi (1992), are easy to interpret, and provide some theoretical justification for many behavioral characteristics of the logrank test that have been previously observed in simulation studies. We focus primarily upon (i) the inadequacy of the usual normal approximation under treatment group imbalance; and, (ii) the effects of treatment group imbalance on power and sample size calculations. A simple transformation of the logrank statistic is also derived based on results in Konishi (1991) and is found to substantially improve the standard normal approximation to its distribution under the null hypothesis of no survival difference when there is treatment group imbalance.

  2. New spatial upscaling methods for multi-point measurements: From normal to p-normal

    NASA Astrophysics Data System (ADS)

    Liu, Feng; Li, Xin

    2017-12-01

    Careful attention must be given to determining whether the geophysical variables of interest are normally distributed, since the assumption of a normal distribution may not accurately reflect the probability distribution of some variables. As a generalization of the normal distribution, the p-normal distribution and its corresponding maximum likelihood estimation (the least power estimation, LPE) were introduced in upscaling methods for multi-point measurements. Six methods, including three normal-based methods, i.e., arithmetic average, least square estimation, block kriging, and three p-normal-based methods, i.e., LPE, geostatistics LPE and inverse distance weighted LPE are compared in two types of experiments: a synthetic experiment to evaluate the performance of the upscaling methods in terms of accuracy, stability and robustness, and a real-world experiment to produce real-world upscaling estimates using soil moisture data obtained from multi-scale observations. The results show that the p-normal-based methods produced lower mean absolute errors and outperformed the other techniques due to their universality and robustness. We conclude that introducing appropriate statistical parameters into an upscaling strategy can substantially improve the estimation, especially if the raw measurements are disorganized; however, further investigation is required to determine which parameter is the most effective among variance, spatial correlation information and parameter p.
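
    The least power estimation (LPE) of location can be sketched as the minimizer of the summed p-th power of absolute deviations; p = 2 recovers the arithmetic mean and p = 1 the median. The data and parameter choices below are illustrative, not those of the study.

        import numpy as np
        from scipy.optimize import minimize_scalar

        def lpe_location(x, p):
            """Least power estimate of location: argmin over mu of sum(|x_i - mu|**p).
            p = 2 gives the arithmetic mean and p = 1 the median."""
            x = np.asarray(x, dtype=float)
            objective = lambda mu: np.sum(np.abs(x - mu) ** p)
            return minimize_scalar(objective, bounds=(x.min(), x.max()), method="bounded").x

        # A 'disorganized' soil-moisture sample with one outlying point.
        rng = np.random.default_rng(3)
        soil_moisture = np.append(rng.normal(0.25, 0.03, 20), 0.60)
        for p in (1.0, 1.5, 2.0):
            print(p, round(lpe_location(soil_moisture, p), 4))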

  3. Statistical methods for estimating normal blood chemistry ranges and variance in rainbow trout (Salmo gairdneri), Shasta Strain

    USGS Publications Warehouse

    Wedemeyer, Gary A.; Nelson, Nancy C.

    1975-01-01

    Gaussian and nonparametric (percentile estimate and tolerance interval) statistical methods were used to estimate normal ranges for blood chemistry (bicarbonate, bilirubin, calcium, hematocrit, hemoglobin, magnesium, mean cell hemoglobin concentration, osmolality, inorganic phosphorus, and pH) for juvenile rainbow trout (Salmo gairdneri, Shasta strain) held under defined environmental conditions. The percentile estimate and Gaussian methods gave similar normal ranges, whereas the tolerance interval method gave consistently wider ranges for all blood variables except hemoglobin. If the underlying frequency distribution is unknown, the percentile estimate procedure would be the method of choice.
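
    A minimal sketch of the two simpler approaches, assuming a hypothetical hematocrit sample: a Gaussian mean ± z·SD range and a nonparametric percentile-estimate range; the tolerance-interval method is not shown.

        import numpy as np
        from scipy.stats import norm

        def normal_ranges(x, coverage=0.95):
            """Gaussian (mean +/- z*SD) and percentile-estimate normal ranges for one variable."""
            x = np.asarray(x, dtype=float)
            lo_q, hi_q = (1 - coverage) / 2, 1 - (1 - coverage) / 2
            z = norm.ppf(hi_q)
            gaussian = (x.mean() - z * x.std(ddof=1), x.mean() + z * x.std(ddof=1))
            percentile = (np.quantile(x, lo_q), np.quantile(x, hi_q))
            return gaussian, percentile

        # Hypothetical hematocrit values (% packed cell volume).
        rng = np.random.default_rng(4)
        hematocrit = rng.normal(32.0, 3.0, size=200)
        print(normal_ranges(hematocrit))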

  4. Statistics for nuclear engineers and scientists. Part 1. Basic statistical inference

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Beggs, W.J.

    1981-02-01

    This report is intended for the use of engineers and scientists working in the nuclear industry, especially at the Bettis Atomic Power Laboratory. It serves as the basis for several Bettis in-house statistics courses. The objectives of the report are to introduce the reader to the language and concepts of statistics and to provide a basic set of techniques to apply to problems of the collection and analysis of data. Part 1 covers subjects of basic inference. The subjects include: descriptive statistics; probability; simple inference for normally distributed populations, and for non-normal populations as well; comparison of two populations; the analysis of variance; quality control procedures; and linear regression analysis.

  5. Simulating Univariate and Multivariate Burr Type III and Type XII Distributions through the Method of L-Moments

    ERIC Educational Resources Information Center

    Pant, Mohan Dev

    2011-01-01

    The Burr families (Type III and Type XII) of distributions are traditionally used in the context of statistical modeling and for simulating non-normal distributions with moment-based parameters (e.g., Skew and Kurtosis). In educational and psychological studies, the Burr families of distributions can be used to simulate extremely asymmetrical and…

  6. Best Statistical Distribution of flood variables for Johor River in Malaysia

    NASA Astrophysics Data System (ADS)

    Salarpour Goodarzi, M.; Yusop, Z.; Yusof, F.

    2012-12-01

    A complex flood event is always characterized by a few characteristics such as flood peak, flood volume, and flood duration, which might be mutually correlated. This study explored the statistical distribution of peakflow, flood duration and flood volume at the Rantau Panjang gauging station on the Johor River in Malaysia. Hourly data were recorded for 45 years. The data were analysed based on the water year (July - June). Five distributions, namely Log Normal, Generalized Pareto, Log Pearson, Normal and Generalized Extreme Value (GEV), were used to model the distribution of all three variables. Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests were used to evaluate the best fit. Goodness-of-fit tests at the 5% level of significance indicate that all the models can be used to model the distribution of peakflow, flood duration and flood volume. However, the Generalized Pareto distribution is found to be the most suitable model when tested with the Anderson-Darling test, whereas the Kolmogorov-Smirnov test suggests that GEV is the best for peakflow. The results of this research can be used to improve flood frequency analysis. A comparison between the Generalized Extreme Value, Generalized Pareto and Log Pearson distributions in the cumulative distribution function of peakflow is also presented.

  7. A brief introduction to probability.

    PubMed

    Di Paola, Gioacchino; Bertani, Alessandro; De Monte, Lavinia; Tuzzolino, Fabio

    2018-02-01

    The theory of probability has been debated for centuries: back in the 1600s, French mathematicians used the rules of probability to place and win bets. Subsequently, the knowledge of probability has significantly evolved and is now an essential tool for statistics. In this paper, the basic theoretical principles of probability will be reviewed, with the aim of facilitating the comprehension of statistical inference. After a brief general introduction on probability, we will review the concept of the "probability distribution", that is, a function providing the probabilities of occurrence of the different possible outcomes of a categorical or continuous variable. Specific attention will be focused on the normal distribution, which is the most relevant distribution applied in statistical analysis.

  8. Multivariate meta-analysis: a robust approach based on the theory of U-statistic.

    PubMed

    Ma, Yan; Mazumdar, Madhu

    2011-10-30

    Meta-analysis is the methodology for combining findings from similar research studies asking the same question. When the question of interest involves multiple outcomes, multivariate meta-analysis is used to synthesize the outcomes simultaneously, taking into account the correlation between the outcomes. Likelihood-based approaches, in particular the restricted maximum likelihood (REML) method, are commonly utilized in this context. REML assumes a multivariate normal distribution for the random-effects model. This assumption is difficult to verify, especially for meta-analyses with a small number of component studies. The use of REML also requires iterative estimation between parameters, requiring moderately high computation time, especially when the dimension of outcomes is large. A multivariate method of moments (MMM) is available and is shown to perform equally well to REML. However, there is a lack of information on the performance of these two methods when the true data distribution is far from normality. In this paper, we propose a new nonparametric and non-iterative method for multivariate meta-analysis on the basis of the theory of U-statistics and compare the properties of these three procedures under both normal and skewed data through simulation studies. It is shown that the effect on estimates from REML because of non-normal data distribution is marginal and that the estimates from the MMM and U-statistic-based approaches are very similar. Therefore, we conclude that for performing multivariate meta-analysis, the U-statistic estimation procedure is a viable alternative to REML and MMM. Easy implementation of all three methods is illustrated by their application to data from two published meta-analyses from the fields of hip fracture and periodontal disease. We discuss ideas for future research based on U-statistics for testing the significance of between-study heterogeneity and for extending the work to the meta-regression setting. Copyright © 2011 John Wiley & Sons, Ltd.

  9. Identifying Epigenetic Biomarkers using Maximal Relevance and Minimal Redundancy Based Feature Selection for Multi-Omics Data.

    PubMed

    Mallik, Saurav; Bhadra, Tapas; Maulik, Ujjwal

    2017-01-01

    Epigenetic biomarker discovery is an important task in bioinformatics. In this article, we develop a new framework for identifying statistically significant epigenetic biomarkers using maximal-relevance and minimal-redundancy criterion based feature (gene) selection for multi-omics datasets. Firstly, we determine the genes that have both expression and methylation values and follow a normal distribution. Similarly, we identify the genes which have both expression and methylation values but do not follow a normal distribution. For each case, we utilize a gene-selection method that provides maximal-relevant, but variable-weighted minimum-redundant genes as top-ranked genes. For statistical validation, we apply a t-test on both the expression and methylation data consisting of only the normally distributed top-ranked genes to determine how many of them are both differentially expressed and methylated. Similarly, we utilize the Limma package to perform a non-parametric Empirical Bayes test on both the expression and methylation data comprising only the non-normally distributed top-ranked genes to identify how many of them are both differentially expressed and methylated. We finally report the top-ranking significant gene-markers with biological validation. Moreover, our framework improves the positive predictive rate and reduces the false positive rate in marker identification. In addition, we provide a comparative analysis of our gene-selection method as well as other methods based on the classification performances obtained using several well-known classifiers.

  10. Extreme Mean and Its Applications

    NASA Technical Reports Server (NTRS)

    Swaroop, R.; Brownlow, J. D.

    1979-01-01

    Extreme value statistics obtained from normally distributed data are considered. An extreme mean is defined as the mean of a p-th probability truncated normal distribution. An unbiased estimate of this extreme mean and its large-sample distribution are derived. The distribution of this estimate, even for very large samples, is found to be non-normal. Further, as the sample size increases, the variance of the unbiased estimate converges to the Cramer-Rao lower bound. The computer program used to obtain the density and distribution functions of the standardized unbiased estimate, and the confidence intervals of the extreme mean for any data, is included for ready application. An example is included to demonstrate the usefulness of extreme mean application.
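
    Assuming the extreme mean refers to the upper tail beyond the p-th quantile of a normal distribution, its closed form is E[X | X > x_p] = μ + σ·φ(z_p)/(1 − p); the sketch below checks this against a Monte Carlo estimate with illustrative parameters.

        import numpy as np
        from scipy.stats import norm

        def extreme_mean(mu, sigma, p):
            """Mean of a normal distribution truncated at its p-th quantile (upper tail):
            E[X | X > x_p] = mu + sigma * phi(z_p) / (1 - p)."""
            z_p = norm.ppf(p)
            return mu + sigma * norm.pdf(z_p) / (1 - p)

        # Closed form vs. a Monte Carlo estimate.
        mu, sigma, p = 10.0, 2.0, 0.95
        x = np.random.default_rng(5).normal(mu, sigma, 1_000_000)
        print(extreme_mean(mu, sigma, p), x[x > np.quantile(x, p)].mean())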

  11. Standard deviation and standard error of the mean.

    PubMed

    Lee, Dong Kyu; In, Junyong; Lee, Sangseok

    2015-06-01

    In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results.
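
    A small sketch of the distinction, with hypothetical blood-pressure-like values: the SD describes the spread of the observations themselves, while the SEM (SD/√n) estimates the spread of the sample mean across repeated samples.

        import numpy as np

        rng = np.random.default_rng(6)
        n = 25
        sample = rng.normal(loc=120.0, scale=15.0, size=n)   # one hypothetical sample

        sd = sample.std(ddof=1)            # dispersion of the observations themselves
        sem = sd / np.sqrt(n)              # estimated SD of the sampling distribution of the mean

        # Empirical check: the SD of many sample means approaches the SEM (true value 15/sqrt(25) = 3).
        means = rng.normal(120.0, 15.0, size=(5000, n)).mean(axis=1)
        print(round(sd, 2), round(sem, 2), round(means.std(ddof=1), 2))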

  12. Standard deviation and standard error of the mean

    PubMed Central

    In, Junyong; Lee, Sangseok

    2015-01-01

    In most clinical and experimental studies, the standard deviation (SD) and the estimated standard error of the mean (SEM) are used to present the characteristics of sample data and to explain statistical analysis results. However, some authors occasionally muddle the distinctive usage between the SD and SEM in medical literature. Because the process of calculating the SD and SEM includes different statistical inferences, each of them has its own meaning. SD is the dispersion of data in a normal distribution. In other words, SD indicates how accurately the mean represents sample data. However the meaning of SEM includes statistical inference based on the sampling distribution. SEM is the SD of the theoretical distribution of the sample means (the sampling distribution). While either SD or SEM can be applied to describe data and statistical results, one should be aware of reasonable methods with which to use SD and SEM. We aim to elucidate the distinctions between SD and SEM and to provide proper usage guidelines for both, which summarize data and describe statistical results. PMID:26045923

  13. Evaluation of the best fit distribution for partial duration series of daily rainfall in Madinah, western Saudi Arabia

    NASA Astrophysics Data System (ADS)

    Alahmadi, F.; Rahman, N. A.; Abdulrazzak, M.

    2014-09-01

    Rainfall frequency analysis is an essential tool for the design of water-related infrastructure. It can be used to predict future flood magnitudes for a given magnitude and frequency of extreme rainfall events. This study analyses the application of rainfall partial duration series (PDS) in the rapidly growing city of Madinah, located in the western part of Saudi Arabia. Different statistical distributions were applied (i.e. Normal, Log Normal, Extreme Value type I, Generalized Extreme Value, Pearson Type III, Log Pearson Type III) and their distribution parameters were estimated using L-moments methods. Also, different model selection criteria are applied, e.g. the Akaike Information Criterion (AIC), the Corrected Akaike Information Criterion (AICc), the Bayesian Information Criterion (BIC) and the Anderson-Darling Criterion (ADC). The analysis indicated the advantage of the Generalized Extreme Value distribution as the best-fit statistical distribution for the Madinah partial duration daily rainfall series. The outcome of such an evaluation can contribute toward better design criteria for flood management, especially flood protection measures.

  14. Analysis of field size distributions, LACIE test sites 5029, 5033, and 5039, Anhwei Province, People's Republic of China

    NASA Technical Reports Server (NTRS)

    Podwysocki, M. H.

    1976-01-01

    A study was made of the field size distributions for LACIE test sites 5029, 5033, and 5039, People's Republic of China. Field lengths and widths were measured from LANDSAT imagery, and field area was statistically modeled. Field size parameters have log-normal or Poisson frequency distributions. These were normalized to the Gaussian distribution and theoretical population curves were made. When compared to fields in other areas of the same country measured in the previous study, field lengths and widths in the three LACIE test sites were 2 to 3 times smaller and areas were smaller by an order of magnitude.

  15. Atrial Electrogram Fractionation Distribution before and after Pulmonary Vein Isolation in Human Persistent Atrial Fibrillation-A Retrospective Multivariate Statistical Analysis.

    PubMed

    Almeida, Tiago P; Chu, Gavin S; Li, Xin; Dastagir, Nawshin; Tuan, Jiun H; Stafford, Peter J; Schlindwein, Fernando S; Ng, G André

    2017-01-01

    Purpose: Complex fractionated atrial electrograms (CFAE)-guided ablation after pulmonary vein isolation (PVI) has been used for persistent atrial fibrillation (persAF) therapy. This strategy has shown suboptimal outcomes due to, among other factors, undetected changes in the atrial tissue following PVI. In the present work, we investigate CFAE distribution before and after PVI in patients with persAF using a multivariate statistical model. Methods: 207 pairs of atrial electrograms (AEGs) were collected before and after PVI, respectively, from corresponding LA regions in 18 persAF patients. Twelve attributes were measured from the AEGs before and after PVI. Statistical models based on multivariate analysis of variance (MANOVA) and linear discriminant analysis (LDA) were used to characterize the atrial regions and AEGs. Results: PVI significantly reduced CFAEs in the LA (70 vs. 40%; P < 0.0001). Four types of LA regions were identified, based on the AEG characteristics: (i) fractionated before PVI that remained fractionated after PVI (31% of the collected points); (ii) fractionated that converted to normal (39%); (iii) normal prior to PVI that became fractionated (9%); and (iv) normal that remained normal (21%). Individually, the attributes failed to distinguish these LA regions, but multivariate statistical models were effective in their discrimination (P < 0.0001). Conclusion: Our results have unveiled that there are LA regions resistant to PVI, while others are affected by it. Although traditional methods were unable to identify these different regions, the proposed multivariate statistical model discriminated LA regions resistant to PVI from those affected by it without prior ablation information.

  16. [Establishment of the mathematic model of total quantum statistical moment standard similarity for application to medical theoretical research].

    PubMed

    He, Fu-yuan; Deng, Kai-wen; Huang, Sheng; Liu, Wen-long; Shi, Ji-lian

    2013-09-01

    This paper aims to establish a new mathematical model, the total quantum statistical moment standard similarity (TQSMSS), on the basis of the original total quantum statistical moment model, and to illustrate its application to medical theoretical research. The model was constructed by combining the statistical moment principle with properties of the normal distribution probability density function, and was then validated and illustrated using the pharmacokinetics of three ingredients of Buyanghuanwu decoction analysed by three data-analytical methods, and using chromatographic fingerprints of extracts obtained with solvents of different solubility parameters dissolving the Buyanghuanwu decoction extract. The model consists of five main parameters: (1) the total quantum statistical moment similarity ST, the area of overlap between the two normal probability density curves obtained by converting the two sets of TQSM parameters; (2) the total variability DT, a confidence limit of standard normal cumulative probability equal to the absolute difference between the two normal cumulative probabilities within the integration bounded by the intersection of their curves; (3) the total variable probability 1-Ss, the standard normal probability within the interval DT; (4) the total variable probability (1-beta)alpha; and (5) the stable confidence probability beta(1-alpha), the probability of drawing correct positive and negative conclusions under confidence coefficient alpha. Using the model, the TQSMSS similarities of the pharmacokinetics of the three ingredients in Buyanghuanwu decoction across the three analytical methods ranged from 0.3852 to 0.9875, reflecting their different pharmacokinetic behaviours, and the similarities (ST) of the chromatographic fingerprints of the extracts obtained with solvents of different solubility parameters ranged from 0.6842 to 0.9992, showing different constituents across the solvent extracts. The TQSMSS characterizes sample similarity and allows the probability of correct positive and negative conclusions to be quantified, whether or not the samples come from the same population under confidence coefficient alpha, enabling analysis at both macroscopic and microscopic levels as a similarity method for medical theoretical research.

  17. Publication Bias in Meta-Analysis: Confidence Intervals for Rosenthal's Fail-Safe Number.

    PubMed

    Fragkos, Konstantinos C; Tsagris, Michail; Frangos, Christos C

    2014-01-01

    The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is highly used by researchers, its statistical properties are largely unexplored. First of all, we developed statistical theory which allowed us to produce confidence intervals for Rosenthal's fail-safe number. This was produced by discerning whether the number of studies analysed in a meta-analysis is fixed or random. Each case produces different variance estimators. For a given number of studies and a given distribution, we provided five variance estimators. Confidence intervals are examined with a normal approximation and a nonparametric bootstrap. The accuracy of the different confidence interval estimates was then tested by methods of simulation under different distributional assumptions. The half normal distribution variance estimator has the best probability coverage. Finally, we provide a table of lower confidence intervals for Rosenthal's estimator.
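
    For context, Rosenthal's fail-safe number itself is usually computed as N_fs = (ΣZ_i)²/z_α² − k for k studies with one-tailed Z-scores Z_i; the sketch below uses illustrative Z-scores and does not implement the confidence intervals proposed in the paper.

        import numpy as np
        from scipy.stats import norm

        def rosenthal_fail_safe_n(z_scores, alpha=0.05):
            """Rosenthal's fail-safe number: how many unpublished null studies would be needed
            to bring the combined one-tailed P value above alpha.
            N_fs = (sum of Z_i)**2 / z_alpha**2 - k, with z_alpha the one-tailed critical value."""
            z = np.asarray(z_scores, dtype=float)
            z_alpha = norm.ppf(1 - alpha)
            return (z.sum() ** 2) / (z_alpha ** 2) - z.size

        print(round(rosenthal_fail_safe_n([2.1, 1.8, 2.5, 1.6, 2.9]), 1))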

  18. Publication Bias in Meta-Analysis: Confidence Intervals for Rosenthal's Fail-Safe Number

    PubMed Central

    Fragkos, Konstantinos C.; Tsagris, Michail; Frangos, Christos C.

    2014-01-01

    The purpose of the present paper is to assess the efficacy of confidence intervals for Rosenthal's fail-safe number. Although Rosenthal's estimator is highly used by researchers, its statistical properties are largely unexplored. First of all, we developed statistical theory which allowed us to produce confidence intervals for Rosenthal's fail-safe number. This was produced by discerning whether the number of studies analysed in a meta-analysis is fixed or random. Each case produces different variance estimators. For a given number of studies and a given distribution, we provided five variance estimators. Confidence intervals are examined with a normal approximation and a nonparametric bootstrap. The accuracy of the different confidence interval estimates was then tested by methods of simulation under different distributional assumptions. The half normal distribution variance estimator has the best probability coverage. Finally, we provide a table of lower confidence intervals for Rosenthal's estimator. PMID:27437470

  19. Statistical analysis of sperm sorting

    NASA Astrophysics Data System (ADS)

    Koh, James; Marcos, Marcos

    2017-11-01

    The success rate of assisted reproduction depends on the proportion of morphologically normal sperm. It is possible to use an external field for manipulation and sorting. Depending on their morphology, the extent of the response varies. Because of the wide distribution in sperm morphology even among individuals, the resulting distribution of kinematic behaviour, and consequently the feasibility of sorting, should be analysed statistically. In this theoretical work, Resistive Force Theory and Slender Body Theory will be applied and compared.

  20. Statistical characterization of a large geochemical database and effect of sample size

    USGS Publications Warehouse

    Zhang, C.; Manheim, F.T.; Hinde, J.; Grossman, J.N.

    2005-01-01

    The authors investigated statistical distributions for concentrations of chemical elements from the National Geochemical Survey (NGS) database of the U.S. Geological Survey. At the time of this study, the NGS data set encompasses 48,544 stream sediment and soil samples from the conterminous United States analyzed by ICP-AES following a 4-acid near-total digestion. This report includes 27 elements: Al, Ca, Fe, K, Mg, Na, P, Ti, Ba, Ce, Co, Cr, Cu, Ga, La, Li, Mn, Nb, Nd, Ni, Pb, Sc, Sr, Th, V, Y and Zn. The goal and challenge for the statistical overview was to delineate chemical distributions in a complex, heterogeneous data set spanning a large geographic range (the conterminous United States), and many different geological provinces and rock types. After declustering to create a uniform spatial sample distribution with 16,511 samples, histograms and quantile-quantile (Q-Q) plots were employed to delineate subpopulations that have coherent chemical and mineral affinities. Probability groupings are discerned by changes in slope (kinks) on the plots. Major rock-forming elements, e.g., Al, Ca, K and Na, tend to display linear segments on normal Q-Q plots. These segments can commonly be linked to petrologic or mineralogical associations. For example, linear segments on K and Na plots reflect dilution of clay minerals by quartz sand (low in K and Na). Minor and trace element relationships are best displayed on lognormal Q-Q plots. These sensitively reflect discrete relationships in subpopulations within the wide range of the data. For example, small but distinctly log-linear subpopulations for Pb, Cu, Zn and Ag are interpreted to represent ore-grade enrichment of naturally occurring minerals such as sulfides. None of the 27 chemical elements could pass the test for either normal or lognormal distribution on the declustered data set. Part of the reason relates to the presence of mixtures of subpopulations and outliers. Random samples of the data set with successively smaller numbers of data points showed that few elements passed standard statistical tests for normality or log-normality until sample size decreased to a few hundred data points. Large sample size enhances the power of statistical tests, and leads to rejection of most statistical hypotheses for real data sets. For large sample sizes (e.g., n > 1000), graphical methods such as histogram, stem-and-leaf, and probability plots are recommended for rough judgement of the probability distribution if needed. © 2005 Elsevier Ltd. All rights reserved.
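
    The sample-size effect described above can be illustrated with synthetic, mildly contaminated log-concentrations: the same data that fail a normality test at the full declustered size tend to pass for subsamples of a few hundred points. The contamination level and distributions below are arbitrary.

        import numpy as np
        from scipy.stats import normaltest, lognorm

        rng = np.random.default_rng(10)
        # Roughly lognormal 'concentrations' with a mild, arbitrary contamination.
        log_conc = np.log(lognorm.rvs(s=0.6, scale=30.0, size=16511, random_state=rng))
        log_conc += 0.05 * rng.standard_normal(log_conc.size) ** 3

        # P values typically shrink as n grows, so the full data set rejects log-normality
        # even though small random subsamples often pass.
        for n in (16511, 2000, 300):
            sub = rng.choice(log_conc, size=n, replace=False)
            print(n, round(normaltest(sub).pvalue, 4))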

  1. Surface to 90 km winds for Kennedy Space Center, Florida, and Vandenberg AFB, California

    NASA Technical Reports Server (NTRS)

    Johnson, D. L.; Brown, S. C.

    1979-01-01

    Bivariate normal wind statistics for a 90 degree flight azimuth, from 0 through 90 km altitude, for Kennedy Space Center, Florida, and Vandenberg AFB, California are presented. Wind probability distributions and statistics for any rotation of axes can be computed from the five given parameters.
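
    Given the five bivariate-normal parameters (two means, two standard deviations and the correlation), the statistics in any rotated axis system follow from μ' = Rμ and Σ' = RΣRᵀ. The wind values below are hypothetical; only the rotation algebra is intended to be illustrative.

        import numpy as np

        def rotate_bivariate_normal(mu_u, mu_v, sd_u, sd_v, rho, theta_deg):
            """Return the five bivariate-normal parameters in axes rotated by theta_deg."""
            t = np.radians(theta_deg)
            R = np.array([[np.cos(t), np.sin(t)], [-np.sin(t), np.cos(t)]])
            mu = np.array([mu_u, mu_v])
            cov = np.array([[sd_u ** 2, rho * sd_u * sd_v],
                            [rho * sd_u * sd_v, sd_v ** 2]])
            mu_r = R @ mu
            cov_r = R @ cov @ R.T
            sd_ur, sd_vr = np.sqrt(np.diag(cov_r))
            return mu_r[0], mu_r[1], sd_ur, sd_vr, cov_r[0, 1] / (sd_ur * sd_vr)

        # Hypothetical component wind statistics rotated onto a 90-degree flight azimuth.
        print(rotate_bivariate_normal(5.0, -2.0, 8.0, 6.0, 0.3, 90.0))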

  2. A Nonparametric Geostatistical Method For Estimating Species Importance

    Treesearch

    Andrew J. Lister; Rachel Riemann; Michael Hoppus

    2001-01-01

    Parametric statistical methods are not always appropriate for conducting spatial analyses of forest inventory data. Parametric geostatistical methods such as variography and kriging are essentially averaging procedures, and thus can be affected by extreme values. Furthermore, non-normal distributions violate the assumptions of analyses in which test statistics are...

  3. Effect of ultrasound frequency on the Nakagami statistics of human liver tissues.

    PubMed

    Tsui, Po-Hsiang; Zhou, Zhuhuang; Lin, Ying-Hsiu; Hung, Chieh-Ming; Chung, Shih-Jou; Wan, Yung-Liang

    2017-01-01

    The analysis of the backscattered statistics using the Nakagami parameter is an emerging ultrasound technique for assessing hepatic steatosis and fibrosis. Previous studies indicated that the echo amplitude distribution of a normal liver follows the Rayleigh distribution (the Nakagami parameter m is close to 1). However, using different frequencies may change the backscattered statistics of normal livers. This study explored the frequency dependence of the backscattered statistics in human livers and then discussed the sources of ultrasound scattering in the liver. A total of 30 healthy participants were enrolled to undergo a standard care ultrasound examination on the liver, which is a natural model containing diffuse and coherent scatterers. The liver of each volunteer was scanned from the right intercostal view to obtain image raw data at different central frequencies ranging from 2 to 3.5 MHz. Phantoms with diffuse scatterers only were also made to perform ultrasound scanning using the same protocol for comparisons with clinical data. The Nakagami parameter-frequency correlation was evaluated using Pearson correlation analysis. The median and interquartile range of the Nakagami parameter obtained from livers was 1.00 (0.98-1.05) for 2 MHz, 0.93 (0.89-0.98) for 2.3 MHz, 0.87 (0.84-0.92) for 2.5 MHz, 0.82 (0.77-0.88) for 3.3 MHz, and 0.81 (0.76-0.88) for 3.5 MHz. The Nakagami parameter decreased with the increasing central frequency (r = -0.67, p < 0.0001). However, the effect of ultrasound frequency on the statistical distribution of the backscattered envelopes was not found in the phantom results (r = -0.147, p = 0.0727). The current results demonstrated that the backscattered statistics of normal livers is frequency-dependent. Moreover, the coherent scatterers may be the primary factor to dominate the frequency dependence of the backscattered statistics in a liver.
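
    For reference, the Nakagami m parameter is commonly estimated from envelope data by the inverse normalized variance, m = (E[R²])²/Var(R²), with m ≈ 1 corresponding to Rayleigh statistics; the sketch below uses simulated Rayleigh envelopes rather than liver backscatter.

        import numpy as np

        def nakagami_m(envelope):
            """Moment-based (inverse normalized variance) estimate of the Nakagami m parameter
            from backscattered envelope amplitudes: m = (E[R^2])^2 / Var(R^2)."""
            r2 = np.asarray(envelope, dtype=float) ** 2
            return r2.mean() ** 2 / r2.var()

        # Rayleigh envelopes (fully developed speckle from diffuse scatterers) give m close to 1.
        rng = np.random.default_rng(7)
        rayleigh_envelope = rng.rayleigh(scale=1.0, size=100_000)
        print(round(nakagami_m(rayleigh_envelope), 3))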

  4. On generalisations of the log-Normal distribution by means of a new product definition in the Kapteyn process

    NASA Astrophysics Data System (ADS)

    Duarte Queirós, Sílvio M.

    2012-07-01

    We discuss the modification of the Kapteyn multiplicative process using the q-product of Borges [E.P. Borges, A possible deformed algebra and calculus inspired in nonextensive thermostatistics, Physica A 340 (2004) 95]. Depending on the value of the index q a generalisation of the log-Normal distribution is yielded. Namely, the distribution increases the tail for small (when q<1) or large (when q>1) values of the variable upon analysis. The usual log-Normal distribution is retrieved when q=1, which corresponds to the traditional Kapteyn multiplicative process. The main statistical features of this distribution as well as related random number generators and tables of quantiles of the Kolmogorov-Smirnov distance are presented. Finally, we illustrate the validity of this scenario by describing a set of variables of biological and financial origin.

  5. ASYMPTOTIC DISTRIBUTION OF ΔAUC, NRIs, AND IDI BASED ON THEORY OF U-STATISTICS

    PubMed Central

    Demler, Olga V.; Pencina, Michael J.; Cook, Nancy R.; D’Agostino, Ralph B.

    2017-01-01

    The change in AUC (ΔAUC), the IDI, and NRI are commonly used measures of risk prediction model performance. Some authors have reported good validity of associated methods of estimating their standard errors (SE) and construction of confidence intervals, whereas others have questioned their performance. To address these issues we unite the ΔAUC, IDI, and three versions of the NRI under the umbrella of the U-statistics family. We rigorously show that the asymptotic behavior of ΔAUC, NRIs, and IDI fits the asymptotic distribution theory developed for U-statistics. We prove that the ΔAUC, NRIs, and IDI are asymptotically normal, unless they compare nested models under the null hypothesis. In the latter case, asymptotic normality and existing SE estimates cannot be applied to ΔAUC, NRIs, or IDI. In the former case SE formulas proposed in the literature are equivalent to SE formulas obtained from U-statistics theory if we ignore adjustment for estimated parameters. We use Sukhatme-Randles-deWet condition to determine when adjustment for estimated parameters is necessary. We show that adjustment is not necessary for SEs of the ΔAUC and two versions of the NRI when added predictor variables are significant and normally distributed. The SEs of the IDI and three-category NRI should always be adjusted for estimated parameters. These results allow us to define when existing formulas for SE estimates can be used and when resampling methods such as the bootstrap should be used instead when comparing nested models. We also use the U-statistic theory to develop a new SE estimate of ΔAUC. PMID:28627112

  6. Asymptotic distribution of ∆AUC, NRIs, and IDI based on theory of U-statistics.

    PubMed

    Demler, Olga V; Pencina, Michael J; Cook, Nancy R; D'Agostino, Ralph B

    2017-09-20

    The change in area under the curve (∆AUC), the integrated discrimination improvement (IDI), and net reclassification index (NRI) are commonly used measures of risk prediction model performance. Some authors have reported good validity of associated methods of estimating their standard errors (SE) and construction of confidence intervals, whereas others have questioned their performance. To address these issues, we unite the ∆AUC, IDI, and three versions of the NRI under the umbrella of the U-statistics family. We rigorously show that the asymptotic behavior of ∆AUC, NRIs, and IDI fits the asymptotic distribution theory developed for U-statistics. We prove that the ∆AUC, NRIs, and IDI are asymptotically normal, unless they compare nested models under the null hypothesis. In the latter case, asymptotic normality and existing SE estimates cannot be applied to ∆AUC, NRIs, or IDI. In the former case, SE formulas proposed in the literature are equivalent to SE formulas obtained from U-statistics theory if we ignore adjustment for estimated parameters. We use Sukhatme-Randles-deWet condition to determine when adjustment for estimated parameters is necessary. We show that adjustment is not necessary for SEs of the ∆AUC and two versions of the NRI when added predictor variables are significant and normally distributed. The SEs of the IDI and three-category NRI should always be adjusted for estimated parameters. These results allow us to define when existing formulas for SE estimates can be used and when resampling methods such as the bootstrap should be used instead when comparing nested models. We also use the U-statistic theory to develop a new SE estimate of ∆AUC. Copyright © 2017 John Wiley & Sons, Ltd.

  7. Stochastic modelling of non-stationary financial assets

    NASA Astrophysics Data System (ADS)

    Estevens, Joana; Rocha, Paulo; Boto, João P.; Lind, Pedro G.

    2017-11-01

    We model non-stationary volume-price distributions with a log-normal distribution and collect the time series of its two parameters. The time series of the two parameters are shown to be stationary and Markov-like and consequently can be modelled with Langevin equations, which are derived directly from their series of values. Having the evolution equations of the log-normal parameters, we reconstruct the statistics of the first moments of volume-price distributions which fit well the empirical data. Finally, the proposed framework is general enough to study other non-stationary stochastic variables in other research fields, namely, biology, medicine, and geology.

  8. Comparison of hypertabastic survival model with other unimodal hazard rate functions using a goodness-of-fit test.

    PubMed

    Tahir, M Ramzan; Tran, Quang X; Nikulin, Mikhail S

    2017-05-30

    We studied the problem of testing a hypothesized distribution in survival regression models when the data are right censored and survival times are influenced by covariates. A modified chi-squared type test, known as the Nikulin-Rao-Robson statistic, is applied for the comparison of accelerated failure time models. This statistic is used to test the goodness-of-fit of the hypertabastic survival model and four other unimodal hazard rate functions. The results of the simulation study showed that the hypertabastic distribution can be used as an alternative to the log-logistic and log-normal distributions. In statistical modeling, because of the flexible shape of its hazard function, this distribution can also be used as a competitor of the Birnbaum-Saunders and inverse Gaussian distributions. The results for the real data application are shown. Copyright © 2017 John Wiley & Sons, Ltd.

  9. On the application of Rice's exceedance statistics to atmospheric turbulence.

    NASA Technical Reports Server (NTRS)

    Chen, W. Y.

    1972-01-01

    Discrepancies produced by the application of Rice's exceedance statistics to atmospheric turbulence are examined. First- and second-order densities from several data sources have been measured for this purpose. Particular care was taken to select segments of turbulence with stationary mean and variance over the entire segment. Results show that even for a stationary segment of turbulence, the process is still highly non-Gaussian, in spite of a Gaussian appearance of its first-order distribution. Data also indicate strongly non-Gaussian second-order distributions. It is therefore concluded that even stationary atmospheric turbulence with a normal first-order distribution cannot be considered a Gaussian process, and consequently the application of Rice's exceedance statistics should be approached with caution.

  10. How to Introduce Historically the Normal Distribution in Engineering Education: A Classroom Experiment

    ERIC Educational Resources Information Center

    Blanco, Monica; Ginovart, Marta

    2010-01-01

    Little has been explored with regard to introducing historical aspects in the undergraduate statistics classroom in engineering studies. This article focuses on the design, implementation and assessment of a specific activity concerning the introduction of the normal probability curve and related aspects from a historical dimension. Following a…

  11. Probability density function of non-reactive solute concentration in heterogeneous porous formations.

    PubMed

    Bellin, Alberto; Tonina, Daniele

    2007-10-30

    Available models of solute transport in heterogeneous formations lack in providing complete characterization of the predicted concentration. This is a serious drawback especially in risk analysis where confidence intervals and probability of exceeding threshold values are required. Our contribution to fill this gap of knowledge is a probability distribution model for the local concentration of conservative tracers migrating in heterogeneous aquifers. Our model accounts for dilution, mechanical mixing within the sampling volume and spreading due to formation heterogeneity. It is developed by modeling local concentration dynamics with an Ito Stochastic Differential Equation (SDE) that under the hypothesis of statistical stationarity leads to the Beta probability distribution function (pdf) for the solute concentration. This model shows large flexibility in capturing the smoothing effect of the sampling volume and the associated reduction of the probability of exceeding large concentrations. Furthermore, it is fully characterized by the first two moments of the solute concentration, and these are the same pieces of information required for standard geostatistical techniques employing Normal or Log-Normal distributions. Additionally, we show that in the absence of pore-scale dispersion and for point concentrations the pdf model converges to the binary distribution of [Dagan, G., 1982. Stochastic modeling of groundwater flow by unconditional and conditional probabilities, 2, The solute transport. Water Resour. Res. 18 (4), 835-848.], while it approaches the Normal distribution for sampling volumes much larger than the characteristic scale of the aquifer heterogeneity. Furthermore, we demonstrate that the same model with the spatial moments replacing the statistical moments can be applied to estimate the proportion of the plume volume where solute concentrations are above or below critical thresholds. Application of this model to point and vertically averaged bromide concentrations from the first Cape Cod tracer test and to a set of numerical simulations confirms the above findings and for the first time it shows the superiority of the Beta model to both Normal and Log-Normal models in interpreting field data. Furthermore, we show that assuming a-priori that local concentrations are normally or log-normally distributed may result in a severe underestimate of the probability of exceeding large concentrations.
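
    Because the Beta model is fully characterized by the first two moments, its parameters follow from a method-of-moments inversion; assuming a concentration normalised to [0, 1] with hypothetical mean and variance, the exceedance probability of a threshold can be sketched as below.

        import numpy as np
        from scipy.stats import beta

        def beta_from_moments(mean_c, var_c):
            """Beta parameters from the first two moments of a concentration normalised
            to [0, 1]; requires var_c < mean_c * (1 - mean_c)."""
            common = mean_c * (1 - mean_c) / var_c - 1.0
            return mean_c * common, (1 - mean_c) * common

        # Probability that the normalised concentration exceeds a threshold of 0.5,
        # given hypothetical plume moments.
        a, b = beta_from_moments(0.30, 0.02)
        print(round(beta.sf(0.5, a, b), 4))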

  12. Applications of multiscale change point detections to monthly stream flow and rainfall in Xijiang River in southern China, part I: correlation and variance

    NASA Astrophysics Data System (ADS)

    Zhu, Yuxiang; Jiang, Jianmin; Huang, Changxing; Chen, Yongqin David; Zhang, Qiang

    2018-04-01

    This article, as part I, introduces three algorithms and applies them to the series of monthly stream flow and rainfall in the Xijiang River, southern China. The three algorithms are (1) normalization of the probability distribution, (2) a scanning U test for change points in the correlation between two time series, and (3) a scanning F-test for change points in variance. The normalization algorithm adopts the quantile method to map data from a non-normal to the normal probability distribution. The scanning U test and F-test have three common features: grafting the classical statistics onto the wavelet algorithm, adding corrections for independence into each statistical criterion at a given confidence level, and providing nearly objective and automatic detection across multiple time scales. In addition, coherency analyses between the two series are carried out for changes in variance. The application results show that changes in the monthly discharge are still controlled by natural precipitation variations in the Xijiang fluvial system. Human activities may have disturbed the ecological balance to a certain extent and over shorter spells, but have not so far violated the natural relationships of correlation and variance changes.
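
    A greatly simplified, non-wavelet sketch of scanning for a variance change point: at each candidate split the two segment variances are compared with a two-sided F test. It omits the independence corrections and multiscale machinery of the paper; the data and thresholds are illustrative.

        import numpy as np
        from scipy.stats import f as f_dist

        def scan_variance_change(x, min_seg=24, alpha=0.05):
            """Simplified scan for a single change point in variance: at each candidate split,
            compare the two segment variances with a two-sided F test."""
            x = np.asarray(x, dtype=float)
            hits = []
            for k in range(min_seg, x.size - min_seg):
                left, right = x[:k], x[k:]
                F = left.var(ddof=1) / right.var(ddof=1)
                df1, df2 = left.size - 1, right.size - 1
                p = 2 * min(f_dist.cdf(F, df1, df2), f_dist.sf(F, df1, df2))
                if p < alpha:
                    hits.append((k, round(F, 2), round(p, 4)))
            return hits

        # Monthly-flow-like series whose variance doubles halfway through.
        rng = np.random.default_rng(8)
        series = np.concatenate([rng.normal(0, 1.0, 120), rng.normal(0, 2.0, 120)])
        print(scan_variance_change(series)[:3])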

  13. Bootstrap Estimation and Testing for Variance Equality.

    ERIC Educational Resources Information Center

    Olejnik, Stephen; Algina, James

    The purpose of this study was to develop a single procedure for comparing population variances that could be used across distribution forms. Bootstrap methodology was used to estimate the variability of the sample variance statistic when the population distribution was normal, platykurtic, or leptokurtic. The data for the study were generated and…
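
    A minimal sketch of the bootstrap step described here is to resample the data with replacement and use the spread of the resampled variances as an estimate of the sampling variability of the sample variance; the sample below is synthetic and the replicate count is arbitrary.

```python
# Hedged sketch of the bootstrap idea in the study: estimate the sampling
# variability (standard error) of the sample variance by resampling the data.
import numpy as np

def bootstrap_se_of_variance(x, n_boot=5000, seed=0):
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    boot_vars = np.array([
        np.var(rng.choice(x, size=x.size, replace=True), ddof=1)
        for _ in range(n_boot)
    ])
    return boot_vars.std(ddof=1)

rng = np.random.default_rng(1)
sample = rng.standard_t(df=5, size=40)              # leptokurtic example sample
print(bootstrap_se_of_variance(sample))
```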

  14. A brief history of numbers and statistics with cytometric applications.

    PubMed

    Watson, J V

    2001-02-15

    A brief history of numbers and statistics traces the development of numbers from prehistory to the completion of our current system of numeration with the introduction of the decimal fraction by Viete, Stevin, Burgi, and Galileo at the turn of the 16th century. This was followed by the development of what we now know as probability theory by Pascal, Fermat, and Huygens in the mid-17th century, which arose in connection with questions in gambling with dice and can be regarded as the origin of statistics. The three main probability distributions on which statistics depend were introduced and/or formalized between the mid-17th and early 19th centuries: the binomial distribution by Pascal; the normal distribution by de Moivre, Gauss, and Laplace; and the Poisson distribution by Poisson. The formal discipline of statistics commenced with the works of Pearson, Yule, and Gosset at the turn of the 19th century, when the first statistical tests were introduced. Elementary descriptions of the statistical tests most likely to be used in conjunction with cytometric data are given, and it is shown how these can be applied to the analysis of difficult immunofluorescence distributions when there is overlap between the labeled and unlabeled cell populations. Copyright 2001 Wiley-Liss, Inc.

  15. Accumulation risk assessment for the flooding hazard

    NASA Astrophysics Data System (ADS)

    Roth, Giorgio; Ghizzoni, Tatiana; Rudari, Roberto

    2010-05-01

    One of the main consequences of demographic and economic development and of the globalization of markets and trade is the accumulation of risks. In most cases, the accumulation of risks arises intuitively from the geographic concentration of a number of vulnerable elements in a single place. For natural events, risk accumulation can be associated not only with intensity but also with an event's spatial extent. In this case, the magnitude can be such that large areas, possibly including many regions or even large portions of different countries, are struck by single catastrophic events. Among natural risks, the impact of the flooding hazard cannot be overstated. To cope with it, a variety of mitigation actions can be put in place: from the improvement of monitoring and alert systems to the development of hydraulic structures, through land use restrictions, civil protection, and financial and insurance plans. All of these options have social and economic impacts, either positive or negative, whose proper estimation should rely on appropriate - present and future - flood risk scenarios. It is therefore necessary to identify statistical methodologies able to describe the multivariate aspects of the involved physical processes and their spatial dependence. In hydrology and meteorology, but also in finance and insurance practice, it was recognized early on that classical statistical distributions (e.g., the normal and gamma families) are of restricted use for modeling multivariate spatial data. Recent research efforts have therefore been directed towards developing statistical models capable of describing the forms of asymmetry manifest in data sets. This holds, in particular, for the quite frequent case of phenomena whose empirical outcome behaves in a non-normal fashion but still maintains some broad similarity with the multivariate normal distribution. Fruitful approaches have been found in the use of flexible models that include the normal distribution as a special or limiting case (e.g., the skew-normal or skew-t distributions). The present contribution is an attempt to provide a better estimate of the joint probability distribution describing flood events in a multi-site, multi-basin fashion. This goal will be pursued through the multivariate skew-t distribution, which allows the joint probability distribution to be defined analytically. The performance of the skew-t distribution will be discussed with reference to the Tanaro River in Northwestern Italy. To highlight the characteristics of the correlation structure, both nested and non-nested gauging stations will be selected, with significantly different contributing areas.
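
    The multivariate skew-t machinery used here is specialized (it is typically fitted with dedicated packages such as the R library sn). As a hedged, purely illustrative stand-in, the sketch below fits a univariate skew-normal, a member of the same family of flexible models that contains the normal as a special case, to synthetic flood peaks and compares a high quantile against a plain normal fit. All data and parameter values are assumptions.

```python
# Illustrative only: scipy does not provide the multivariate skew-t used in the
# study, so this sketch fits a univariate skew-normal (a special/limiting case
# family that contains the normal) to synthetic annual-maximum flows.
import numpy as np
from scipy.stats import skewnorm, norm

rng = np.random.default_rng(2)
peaks = rng.gumbel(loc=300.0, scale=80.0, size=60)   # skewed synthetic flood peaks

a, loc, scale = skewnorm.fit(peaks)                  # shape a=0 recovers the normal
print(f"skew-normal fit: a={a:.2f}, loc={loc:.1f}, scale={scale:.1f}")

# Compare 100-year (p=0.99) quantiles under skew-normal and plain normal fits
print("skew-normal q99:", skewnorm.ppf(0.99, a, loc, scale).round(1))
print("normal      q99:", norm.ppf(0.99, *norm.fit(peaks)).round(1))
```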

  16. Application of Statistically Derived CPAS Parachute Parameters

    NASA Technical Reports Server (NTRS)

    Romero, Leah M.; Ray, Eric S.

    2013-01-01

    The Capsule Parachute Assembly System (CPAS) Analysis Team is responsible for determining parachute inflation parameters and dispersions that are ultimately used in verifying system requirements. A model memo is internally released semi-annually documenting parachute inflation and other key parameters reconstructed from flight test data. Dispersion probability distributions published in previous versions of the model memo were uniform because insufficient data were available for determination of statistically based distributions. Uniform distributions do not accurately represent the expected distributions since extreme parameter values are just as likely to occur as the nominal value. CPAS has taken incremental steps to move away from uniform distributions. Model Memo version 9 (MMv9) made the first use of non-uniform dispersions, but only for the reefing cutter timing, for which a large number of samples was available. In order to maximize the utility of the available flight test data, clusters of parachutes were reconstructed individually starting with Model Memo version 10. This allowed for statistical assessment of steady-state drag area (CDS) and parachute inflation parameters such as the canopy fill distance (n), profile shape exponent (expopen), over-inflation factor (C(sub k)), and ramp-down time (t(sub k)) distributions. Built-in MATLAB distributions were applied to the histograms, and parameters such as scale (sigma) and location (mu) were output. Engineering judgment was used to determine the "best fit" distribution based on the test data. Results include normal, log normal, and uniform (where available data remains insufficient) fits of nominal and failure (loss of parachute and skipped stage) cases for all CPAS parachutes. This paper discusses the uniform methodology that was previously used, the process and results of the statistical assessment, how the dispersions were incorporated into Monte Carlo analyses, and the application of the distributions in trajectory benchmark testing assessments with parachute inflation parameters, drag area, and reefing cutter timing used by CPAS.
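
    The fitting-and-selection step described above can be sketched in a few lines: fit candidate normal and lognormal models to a small sample of a reconstructed parameter and compare them, for example by log-likelihood. The synthetic "canopy fill distance" values below are assumptions, not CPAS flight data, and the comparison criterion is one reasonable choice rather than the team's documented procedure.

```python
# Sketch of the general idea (not CPAS code or data): fit candidate normal and
# lognormal models to a small sample of reconstructed inflation parameters and
# compare them, e.g. by log-likelihood. The synthetic "fill distance" values
# are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_fill = rng.lognormal(mean=np.log(8.0), sigma=0.25, size=25)   # canopy fill distance sample

fits = {
    "normal":    (stats.norm,    stats.norm.fit(n_fill)),
    "lognormal": (stats.lognorm, stats.lognorm.fit(n_fill, floc=0.0)),
}
for name, (dist, params) in fits.items():
    loglik = np.sum(dist.logpdf(n_fill, *params))
    print(f"{name:9s} params={np.round(params, 3)} log-likelihood={loglik:.2f}")
```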

  17. Estimating statistical uncertainty of Monte Carlo efficiency-gain in the context of a correlated sampling Monte Carlo code for brachytherapy treatment planning with non-normal dose distribution.

    PubMed

    Mukhopadhyay, Nitai D; Sampson, Andrew J; Deniz, Daniel; Alm Carlsson, Gudrun; Williamson, Jeffrey; Malusek, Alexandr

    2012-01-01

    Correlated sampling Monte Carlo methods can shorten computing times in brachytherapy treatment planning. Monte Carlo efficiency is typically estimated via efficiency gain, defined as the reduction in computing time by correlated sampling relative to conventional Monte Carlo methods when equal statistical uncertainties have been achieved. The determination of the efficiency gain uncertainty arising from random effects, however, is not a straightforward task, especially when the error distribution is non-normal. The purpose of this study is to evaluate the applicability of the F distribution and standardized uncertainty propagation methods (widely used in metrology to estimate uncertainty of physical measurements) for predicting confidence intervals about efficiency gain estimates derived from single Monte Carlo runs using fixed-collision correlated sampling in a simplified brachytherapy geometry. A bootstrap-based algorithm was used to simulate the probability distribution of the efficiency gain estimates, and the shortest 95% confidence interval was estimated from this distribution. It was found that the corresponding relative uncertainty was as large as 37% for this particular problem. The uncertainty propagation framework predicted confidence intervals reasonably well; however, its main disadvantage was that uncertainties of input quantities had to be calculated in a separate run via a Monte Carlo method. The F distribution noticeably underestimated the confidence interval. These discrepancies were influenced by several photons with large statistical weights, which made extremely large contributions to the scored absorbed dose difference. The mechanism of acquiring high statistical weights in the fixed-collision correlated sampling method was explained and a mitigation strategy was proposed. Copyright © 2011 Elsevier Ltd. All rights reserved.
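
    The bootstrap construction of the shortest 95% confidence interval can be illustrated with a toy version of the problem: resample the scored quantities, form an efficiency-gain-like ratio for each replicate, and take the narrowest interval containing 95% of the replicates. Reducing the gain to a simple variance ratio is an assumption made only for illustration; the actual estimator also involves computing times.

```python
# Hedged sketch of the bootstrap step described above: resample paired run
# results, form the efficiency-gain estimate for each replicate, and take the
# shortest interval containing 95% of the replicates. The gain is modeled here
# simply as a variance ratio; the real estimator involves computing times too.
import numpy as np

def shortest_interval(samples, coverage=0.95):
    s = np.sort(samples)
    k = int(np.ceil(coverage * s.size))
    widths = s[k - 1:] - s[:s.size - k + 1]
    i = np.argmin(widths)
    return s[i], s[i + k - 1]

rng = np.random.default_rng(4)
d_corr = rng.normal(0.0, 1.0, 2000)       # scored differences, correlated sampling
d_conv = rng.normal(0.0, 3.0, 2000)       # scored doses, conventional MC (larger spread)

gains = []
for _ in range(4000):
    g = np.var(rng.choice(d_conv, d_conv.size)) / np.var(rng.choice(d_corr, d_corr.size))
    gains.append(g)
print(shortest_interval(np.array(gains)))
```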

  18. Statistical analysis tables for truncated or censored samples

    NASA Technical Reports Server (NTRS)

    Cohen, A. C.; Cooley, C. G.

    1971-01-01

    Compilation describes characteristics of truncated and censored samples, and presents six illustrations of the practical use of tables in computing mean and variance estimates for the normal distribution from selected samples.

  19. Rescaled earthquake recurrence time statistics: application to microrepeaters

    NASA Astrophysics Data System (ADS)

    Goltz, Christian; Turcotte, Donald L.; Abaimov, Sergey G.; Nadeau, Robert M.; Uchida, Naoki; Matsuzawa, Toru

    2009-01-01

    Slip on major faults primarily occurs during `characteristic' earthquakes. The recurrence statistics of characteristic earthquakes play an important role in seismic hazard assessment. A major problem in determining applicable statistics is the short sequences of characteristic earthquakes that are available worldwide. In this paper, we introduce a rescaling technique in which sequences can be superimposed to establish larger numbers of data points. We consider the Weibull and log-normal distributions; in both cases we rescale the data using means and standard deviations. We test our approach utilizing sequences of microrepeaters, micro-earthquakes that recur in the same location on a fault. It seems plausible to regard these earthquakes as a miniature version of the classic characteristic earthquakes. Microrepeaters are much more frequent than major earthquakes, leading to longer sequences for analysis. In this paper, we present results for the analysis of recurrence times for several microrepeater sequences from Parkfield, CA as well as NE Japan. We find that, once the respective sequence can be considered to be of sufficient stationarity, the statistics can be well fitted by either a Weibull or a log-normal distribution. We clearly demonstrate this fact by our technique of rescaled combination. We conclude that the recurrence statistics of the microrepeater sequences we consider are similar to the recurrence statistics of characteristic earthquakes on major faults.

  20. A case study: application of statistical process control tool for determining process capability and sigma level.

    PubMed

    Chopra, Vikram; Bairagi, Mukesh; Trivedi, P; Nagar, Mona

    2012-01-01

    Statistical process control is the application of statistical methods to the measurement and analysis of process variation. Various regulatory documents such as the Validation Guidance for Industry (2011), International Conference on Harmonisation ICH Q10 (2009), the Health Canada guidelines (2009), Health Science Authority, Singapore: Guidance for Product Quality Review (2008), and International Organization for Standardization ISO-9000:2005 provide regulatory support for the application of statistical process control for better process control and understanding. In this study risk assessment, normal probability distributions, control charts, and capability charts are employed for selection of critical quality attributes, determination of normal probability distribution, statistical stability, and capability of the production process, respectively. The objective of this study is to determine tablet production process quality in the form of sigma process capability. By interpreting data and graph trends, forecasting of critical quality attributes, sigma process capability, and stability of the process were studied. The overall study contributes to an assessment of the process at the sigma level with respect to out-of-specification attributes produced. Finally, the study points to areas where quality improvement and quality risk assessment principles can be applied to achieve six sigma-capable processes. Statistical process control is the most advantageous tool for determination of the quality of any production process. This tool is new for the pharmaceutical tablet production process. In the case of pharmaceutical tablet production processes, the quality control parameters act as quality assessment parameters. Application of risk assessment allows selection of critical quality attributes from among the quality control parameters. Sequential application of normal probability distributions, control charts, and capability analyses provides a valid statistical process control study of the process. Interpretation of such a study provides information about stability, process variability, changing trends, and quantification of process ability against defective production. Comparative evaluation of critical quality attributes by Pareto charts identifies the least capable and most variable process, which is a candidate for improvement. Statistical process control thus proves to be an important tool for six sigma-capable process development and continuous quality improvement.
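
    The capability computation at the heart of such a study is compact enough to sketch. The example below estimates Cp, Cpk, the expected out-of-specification fraction, and a corresponding sigma level for synthetic assay data; the specification limits are assumptions, and the conventional 1.5-sigma shift is deliberately omitted.

```python
# Minimal sketch of the capability step described above: estimate Cp, Cpk and
# a sigma level from tablet assay data against specification limits. The
# specification limits and data are illustrative assumptions, and sigma level
# is computed here without the conventional 1.5-sigma shift.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(5)
assay = rng.normal(100.0, 1.2, 200)          # e.g. % label claim
lsl, usl = 95.0, 105.0                        # assumed specification limits

mu, sigma = assay.mean(), assay.std(ddof=1)
cp  = (usl - lsl) / (6 * sigma)
cpk = min(usl - mu, mu - lsl) / (3 * sigma)

# Expected fraction out of specification and the corresponding sigma level
p_oos = norm.sf(usl, mu, sigma) + norm.cdf(lsl, mu, sigma)
sigma_level = norm.isf(p_oos)                 # z-value for the out-of-spec rate
print(f"Cp={cp:.2f}  Cpk={cpk:.2f}  OOS={p_oos:.2e}  sigma level~{sigma_level:.2f}")
```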

  1. Application of the Bootstrap Statistical Method in Deriving Vibroacoustic Specifications

    NASA Technical Reports Server (NTRS)

    Hughes, William O.; Paez, Thomas L.

    2006-01-01

    This paper discusses the Bootstrap Method for specification of vibroacoustic test specifications. Vibroacoustic test specifications are necessary to properly accept or qualify a spacecraft and its components for the expected acoustic, random vibration and shock environments seen on an expendable launch vehicle. Traditionally, NASA and the U.S. Air Force have employed methods of Normal Tolerance Limits to derive these test levels based upon the amount of data available, and the probability and confidence levels desired. The Normal Tolerance Limit method contains inherent assumptions about the distribution of the data. The Bootstrap is a distribution-free statistical subsampling method which uses the measured data themselves to establish estimates of statistical measures of random sources. This is achieved through the computation of large numbers of Bootstrap replicates of a data measure of interest and the use of these replicates to derive test levels consistent with the probability and confidence desired. The comparison of the results of these two methods is illustrated via an example utilizing actual spacecraft vibroacoustic data.
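
    A toy contrast between the two approaches can be sketched as follows: a normal-theory estimate of a high percentile of the measured levels versus a bootstrap percentile estimate built only from the data. A formal Normal Tolerance Limit would add a confidence-dependent k-factor taken from tables, which is omitted here for simplicity; the twelve synthetic measurements are assumptions, not spacecraft data.

```python
# Hedged sketch contrasting the two ideas: a normal-theory estimate of a high
# percentile (mean + 1.645*s for P95) versus a bootstrap percentile estimate
# built only from the measured data. A formal normal tolerance limit would add
# a confidence-dependent k-factor, which is omitted here for simplicity.
import numpy as np

rng = np.random.default_rng(6)
levels_db = rng.normal(130.0, 3.0, 12) + rng.exponential(1.0, 12)   # 12 synthetic flight measurements

p95_normal = levels_db.mean() + 1.645 * levels_db.std(ddof=1)

boot = np.array([
    np.percentile(rng.choice(levels_db, levels_db.size, replace=True), 95)
    for _ in range(10000)
])
p95_bootstrap = np.percentile(boot, 50)       # median of the bootstrap replicates

print(f"normal-theory P95 ~ {p95_normal:.1f} dB, bootstrap P95 ~ {p95_bootstrap:.1f} dB")
```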

  2. Circularly-symmetric complex normal ratio distribution for scalar transmissibility functions. Part III: Application to statistical modal analysis

    NASA Astrophysics Data System (ADS)

    Yan, Wang-Ji; Ren, Wei-Xin

    2018-01-01

    This study applies the theoretical findings on the circularly-symmetric complex normal ratio distribution of Yan and Ren (2016) [1,2] to transmissibility-based modal analysis from a statistical viewpoint. A probabilistic model of the transmissibility function in the vicinity of the resonant frequency is formulated in the modal domain, and some insightful comments are offered. It theoretically reveals that the statistics of the transmissibility function around the resonant frequency are solely dependent on the 'noise-to-signal' ratio and the mode shapes. As a sequel to the development of the probabilistic model of the transmissibility function in the modal domain, this study poses the process of modal identification in a Bayesian framework by borrowing a novel paradigm. Implementation issues unique to the proposed approach are resolved by a Lagrange multiplier approach. This study also explores the possibility of applying Bayesian analysis to distinguishing harmonic components from structural ones. The approaches are verified using simulated data and experimental test data. The uncertainty behavior due to variation of different factors is also discussed in detail.

  3. On the extreme value statistics of normal random matrices and 2D Coulomb gases: Universality and finite N corrections

    NASA Astrophysics Data System (ADS)

    Ebrahimi, R.; Zohren, S.

    2018-03-01

    In this paper we extend the orthogonal polynomials approach for extreme value calculations of Hermitian random matrices, developed by Nadal and Majumdar (J. Stat. Mech. P04001 arXiv:1102.0738), to normal random matrices and 2D Coulomb gases in general. Firstly, we show that this approach provides an alternative derivation of results in the literature. More precisely, we show convergence of the rescaled eigenvalue with largest modulus of a normal Gaussian ensemble to a Gumbel distribution, as well as universality for an arbitrary radially symmetric potential. Secondly, it is shown that this approach can be generalised to obtain convergence of the eigenvalue with smallest modulus and its universality for ring distributions. Most interestingly, the techniques presented here are used to compute all slowly varying finite N corrections of the above distributions, which is important for practical applications given the slow convergence. Another interesting aspect of this work is that we can use standard techniques from Hermitian random matrices to obtain the extreme value statistics of non-Hermitian random matrices, resembling the large N expansion used in the context of the double scaling limit of Hermitian matrix models in string theory.
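
    A small numerical experiment (not taken from the paper) illustrates the statement about the largest modulus: sample complex Gaussian (Ginibre) matrices normalized so that the spectrum fills the unit disk, record the largest eigenvalue modulus, and fit a Gumbel law to its fluctuations. The matrix size and trial count are arbitrary choices.

```python
# Numerical illustration (not from the paper): sample normal (complex Ginibre)
# random matrices, record the largest eigenvalue modulus, and fit a Gumbel
# distribution to its fluctuations. Matrix size and sample count are arbitrary.
import numpy as np
from scipy.stats import gumbel_r

rng = np.random.default_rng(7)
N, trials = 100, 500
max_mod = np.empty(trials)
for t in range(trials):
    g = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2 * N)
    max_mod[t] = np.abs(np.linalg.eigvals(g)).max()   # largest modulus, near the unit-circle edge

loc, scale = gumbel_r.fit(max_mod)
print(f"largest-modulus mean={max_mod.mean():.3f}, Gumbel fit loc={loc:.3f}, scale={scale:.4f}")
```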

  4. Probabilistic performance estimators for computational chemistry methods: The empirical cumulative distribution function of absolute errors

    NASA Astrophysics Data System (ADS)

    Pernot, Pascal; Savin, Andreas

    2018-06-01

    Benchmarking studies in computational chemistry use reference datasets to assess the accuracy of a method through error statistics. The commonly used error statistics, such as the mean signed and mean unsigned errors, do not inform end-users on the expected amplitude of prediction errors attached to these methods. We show that, the distributions of model errors being neither normal nor zero-centered, these error statistics cannot be used to infer prediction error probabilities. To overcome this limitation, we advocate for the use of more informative statistics, based on the empirical cumulative distribution function of unsigned errors, namely, (1) the probability for a new calculation to have an absolute error below a chosen threshold and (2) the maximal amplitude of errors one can expect with a chosen high confidence level. Those statistics are also shown to be well suited for benchmarking and ranking studies. Moreover, the standard error on all benchmarking statistics depends on the size of the reference dataset. Systematic publication of these standard errors would be very helpful to assess the statistical reliability of benchmarking conclusions.
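
    The two proposed statistics are straightforward to compute from a benchmark's error list, as the sketch below shows on synthetic errors: the empirical probability of an absolute error below a chosen threshold, and a high quantile of the absolute errors as the amplitude not exceeded with a chosen confidence. The threshold and confidence level are assumptions of the example.

```python
# Minimal sketch of the two proposed statistics, computed from the empirical
# cumulative distribution of absolute errors of a benchmark set (synthetic here).
import numpy as np

rng = np.random.default_rng(8)
errors = rng.normal(0.5, 2.0, 300)            # signed model errors, neither zero-centered nor normal-looking in practice
abs_err = np.abs(errors)

threshold = 1.0                                # chosen accuracy target (same units as the errors)
p_below = np.mean(abs_err < threshold)         # (1) probability of an absolute error below the threshold
q95 = np.quantile(abs_err, 0.95)               # (2) error amplitude not exceeded with 95% confidence

print(f"P(|error| < {threshold}) = {p_below:.2f}; 95th percentile of |error| = {q95:.2f}")
```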

  5. Extracting Spurious Latent Classes in Growth Mixture Modeling with Nonnormal Errors

    ERIC Educational Resources Information Center

    Guerra-Peña, Kiero; Steinley, Douglas

    2016-01-01

    Growth mixture modeling is generally used for two purposes: (1) to identify mixtures of normal subgroups and (2) to approximate oddly shaped distributions by a mixture of normal components. Often in applied research this methodology is applied to both of these situations indistinctly: using the same fit statistics and likelihood ratio tests. This…

  6. A Note on the Assumption of Identical Distributions for Nonparametric Tests of Location

    ERIC Educational Resources Information Center

    Nordstokke, David W.; Colp, S. Mitchell

    2018-01-01

    Often, when testing for shift in location, researchers will utilize nonparametric statistical tests in place of their parametric counterparts when there is evidence or belief that the assumptions of the parametric test are not met (i.e., normally distributed dependent variables). An underlying and often unattended to assumption of nonparametric…

  7. Using R to Simulate Permutation Distributions for Some Elementary Experimental Designs

    ERIC Educational Resources Information Center

    Eudey, T. Lynn; Kerr, Joshua D.; Trumbo, Bruce E.

    2010-01-01

    Null distributions of permutation tests for two-sample, paired, and block designs are simulated using the R statistical programming language. For each design and type of data, permutation tests are compared with standard normal-theory and nonparametric tests. These examples (often using real data) provide for classroom discussion use of metrics…

  8. A novel gamma-fitting statistical method for anti-drug antibody assays to establish assay cut points for data with non-normal distribution.

    PubMed

    Schlain, Brian; Amaravadi, Lakshmi; Donley, Jean; Wickramasekera, Ananda; Bennett, Donald; Subramanyam, Meena

    2010-01-31

    In recent years there has been growing recognition of the impact of anti-drug or anti-therapeutic antibodies (ADAs, ATAs) on the pharmacokinetic and pharmacodynamic behavior of the drug, which ultimately affects drug exposure and activity. These anti-drug antibodies can also impact safety of the therapeutic by inducing a range of reactions from hypersensitivity to neutralization of the activity of an endogenous protein. Assessments of immunogenicity, therefore, are critically dependent on the bioanalytical method used to test samples, in which a positive versus negative reactivity is determined by a statistically derived cut point based on the distribution of drug-naïve samples. For non-normally distributed data, a novel gamma-fitting method for obtaining assay cut points is presented. Non-normal immunogenicity data distributions, which tend to be unimodal and positively skewed, can often be modeled by 3-parameter gamma fits. Under a gamma regime, gamma-based cut points were found to be more accurate (closer to their targeted false positive rates) than normal or log-normal methods and more precise (smaller standard errors of cut point estimators) than the nonparametric percentile method. Under a gamma regime, normal theory based methods for estimating cut points targeting a 5% false positive rate were found in computer simulation experiments to have, on average, false positive rates ranging from 6.2 to 8.3% (or positive biases between +1.2 and +3.3%), with bias decreasing with the magnitude of the gamma shape parameter. The log-normal fits tended, on average, to underestimate false positive rates, with negative biases as large as -2.3% and absolute bias decreasing with the shape parameter. These results were consistent with the well-known fact that gamma distributions become less skewed and closer to a normal distribution as their shape parameters increase. Inflated false positive rates, especially in a screening assay, shift the emphasis to confirming test results in a subsequent test (confirmatory assay). On the other hand, deflated false positive rates in the case of screening immunogenicity assays will not meet the minimum 5% false positive target proposed in the immunogenicity assay guidance white papers. Copyright 2009 Elsevier B.V. All rights reserved.
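
    A hedged sketch of the gamma-based cut point on synthetic drug-naive signals: fit a three-parameter (shifted) gamma and take its 95th percentile as the screening cut point targeting a 5% false positive rate, compared with the normal-theory cut point. The data-generating values and sample size are assumptions, not the paper's simulation settings.

```python
# Hedged sketch of a gamma-based screening cut point: fit a three-parameter
# gamma (shape, location, scale) to drug-naive assay signals and take the
# 95th percentile as the cut point targeting a 5% false positive rate.
# The synthetic data stand in for real immunogenicity screening results.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
naive_signal = 0.05 + rng.gamma(shape=3.0, scale=0.02, size=200)   # positively skewed baseline

shape, loc, scale = stats.gamma.fit(naive_signal)        # 3-parameter (shifted) gamma fit
cut_gamma  = stats.gamma.ppf(0.95, shape, loc, scale)
cut_normal = stats.norm.ppf(0.95, *stats.norm.fit(naive_signal))   # normal-theory alternative

print(f"gamma cut point = {cut_gamma:.4f}, normal cut point = {cut_normal:.4f}")
print("realized false positive rates:",
      np.mean(naive_signal > cut_gamma).round(3),
      np.mean(naive_signal > cut_normal).round(3))
```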

  9. Characterization of the Distribution of the Lz Index of Person Fit According to the Estimated Proficiency Level

    ERIC Educational Resources Information Center

    Raiche, Gilles; Blais, Jean-Guy

    2005-01-01

    The distribution of person fit indices is not easy to describe in tests where the item sample is too small to conform to a theoretical asymptotic statistical distribution, particularly the normal N(0,1). In practice, this is always the case and, consequently, it is difficult to obtain the critical percentile value indicating person misfit. First, we…

  10. On the Seasonality of Sudden Stratospheric Warmings

    NASA Astrophysics Data System (ADS)

    Reichler, T.; Horan, M.

    2017-12-01

    The downward influence of sudden stratospheric warmings (SSWs) creates significant tropospheric circulation anomalies that last for weeks. It is therefore of theoretical and practical interest to understand the time when SSWs are most likely to occur and the controlling factors for the temporal distribution of SSWs. Conceivably, the distribution between mid-winter and late-winter is controlled by the interplay between decreasing eddy convergence in the region of the polar vortex and the weakening strength of the polar vortex. General circulation models (GCMs) tend to produce SSW maxima later in winter than observations, which has been considered as a model deficiency. However, the observed record is short, suggesting that under-sampling of SSWs may contribute to this discrepancy. Here, we study the climatological frequency distribution of SSWs and related events in a long control simulation with a stratosphere resolving GCM. We also create a simple statistical model to determine the primary factors controlling the SSW distribution. The statistical model is based on the daily climatological mean, standard deviation, and autocorrelation of stratospheric winds, and assumes that the winds follow a normal distribution. We find that the null hypothesis, that model and observations stem from the same distribution, cannot be rejected, suggesting that the mid-winter SSW maximum seen in the observations is due to sampling uncertainty. We also find that the statistical model faithfully reproduces the seasonal distribution of SSWs, and that the decreasing climatological strength of the polar vortex is the primary factor for it. We conclude that the late-winter SSW maximum seen in most models is realistic and that late events will be more prominent in future observations. We further conclude that SSWs simply form the tail of normally distributed stratospheric winds, suggesting that there is a continuum of weak polar vortex states and that statistically there is nothing special about the zero-threshold used to define SSWs.
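
    The statistical model described here can be imitated with a few lines of simulation: a first-order autoregressive wind anomaly around a seasonally weakening climatological mean, with an SSW counted on the first day each winter that the total wind crosses zero. Every numerical value in the sketch (mean decline, standard deviation, autocorrelation, season length) is an assumption for illustration, not a parameter fitted in the study.

```python
# Illustrative sketch of the kind of statistical model described above: zonal
# winds follow an AR(1) process around a seasonally varying climatological
# mean, and an "SSW" is counted on the first day each winter that the wind
# crosses zero. All parameter values below are assumptions for illustration.
import numpy as np

rng = np.random.default_rng(10)
days = np.arange(180)                                  # one extended winter season
clim_mean = 40.0 - 0.2 * days                          # weakening vortex through winter (m/s)
clim_std = 12.0
phi = 0.95                                             # day-to-day autocorrelation

n_winters, first_event_day = 5000, []
for _ in range(n_winters):
    anom = 0.0
    for d in days:
        anom = phi * anom + np.sqrt(1 - phi**2) * clim_std * rng.standard_normal()
        if clim_mean[d] + anom < 0.0:                  # easterly wind: SSW analogue
            first_event_day.append(d)
            break

hist, _ = np.histogram(first_event_day, bins=[0, 30, 60, 90, 120, 150, 180])
print("SSWs per 30-day bin:", hist, "| winters with an event:", len(first_event_day))
```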

  11. Textural content in 3T MR: an image-based marker for Alzheimer's disease

    NASA Astrophysics Data System (ADS)

    Bharath Kumar, S. V.; Mullick, Rakesh; Patil, Uday

    2005-04-01

    In this paper, we propose a study that investigates the first-order and second-order distributions of T2 images from magnetic resonance (MR) scans for an age-matched data set of 24 Alzheimer's disease and 17 normal patients. The study is motivated by the desire to analyze brain iron uptake in the hippocampus of Alzheimer's patients, which is captured by low T2 values. Since excess iron deposition occurs locally in certain regions of the brain, we are motivated to investigate the spatial distribution of T2, which is captured by higher-order statistics. Based on the first-order and second-order distributions (involving the gray level co-occurrence matrix) of T2, we show that the second-order statistics provide features with sensitivity >90% (at 80% specificity), which capture the textural content in the T2 data. Hence, we argue that the different texture characteristics of T2 in the hippocampus of Alzheimer's and normal patients could be used as an early indicator of Alzheimer's disease.

  12. Model for neural signaling leap statistics

    NASA Astrophysics Data System (ADS)

    Chevrollier, Martine; Oriá, Marcos

    2011-03-01

    We present a simple model for neural signaling leaps in the brain, considering only the thermodynamic (Nernst) potential in neuron cells and brain temperature. We numerically simulated connections between arbitrarily localized neurons and analyzed the frequency distribution of the distances reached. We observed a qualitative change between normal statistics (at T = 37.5°C, the awake regime) and Lévy statistics (at T = 35.5°C, the sleeping period), the latter characterized by rare events of long-range connections.

  13. Non-linear learning in online tutorial to enhance students’ knowledge on normal distribution application topic

    NASA Astrophysics Data System (ADS)

    Kartono; Suryadi, D.; Herman, T.

    2018-01-01

    This study aimed to analyze the enhancement from non-linear learning (NLL) in online tutorial (OT) content of students' knowledge of normal distribution application (KONDA). KONDA is a competence expected to be achieved after students have studied the topic of normal distribution application in the course named Education Statistics. The analysis was performed with a quasi-experimental study design. The subjects were divided into an experimental class, which was given OT content in the NLL model, and a control class, which was given OT content in the conventional learning (CL) model. Data used in this study were the results of online objective tests measuring students' statistical prior knowledge (SPK) and students' pre- and post-test KONDA. Statistical analysis of the KONDA gain scores showed that, for students with low and moderate SPK scores, those who learned OT content with the NLL model performed better than those who learned OT content with the CL model. For students with high SPK scores, the gain scores of the two groups were similar. Based on these findings, it can be concluded that the NLL model applied to OT content can enhance KONDA for students at low and moderate SPK levels. An extra, more challenging didactical situation is needed for students at the high SPK level to achieve a significant gain.

  14. Lognormal Behavior of the Size Distributions of Animation Characters

    NASA Astrophysics Data System (ADS)

    Yamamoto, Ken

    This study investigates the statistical properties of character sizes in animation, superhero series, and video games. Using online databases for Pokémon (video game) and Power Rangers (superhero series), the height and weight distributions are constructed, and we find that the weight distributions of Pokémon and of Zords (robots in Power Rangers) both follow the lognormal distribution. As a theoretical mechanism for this lognormal behavior, a combination of the normal distribution and the Weber-Fechner law is proposed.

  15. Using the range to calculate the coefficient of variation.

    PubMed

    Rhiel, G Steven

    2004-12-01

    In this research a coefficient of variation (CVhigh-low) is calculated from the highest and lowest values in a set of data. Use of CVhigh-low when the population is normal, leptokurtic, and skewed is discussed. The statistic is the most effective when sampling from the normal distribution. With the leptokurtic distributions, CVhigh-low works well for comparing the relative variability between two or more distributions but does not provide a very "good" point estimate of the population coefficient of variation. With skewed distributions CVhigh-low works well in identifying which data set has the more relative variation but does not specify how much difference there is in the variation. It also does not provide a "good" point estimate.

  16. Statistical description of non-Gaussian samples in the F2 layer of the ionosphere during heliogeophysical disturbances

    NASA Astrophysics Data System (ADS)

    Sergeenko, N. P.

    2017-11-01

    An adequate statistical method should be developed in order to predict probabilistically the range of ionospheric parameters. This problem is addressed in this paper. The time series of the F2-layer critical frequency, foF2(t), were subjected to statistical processing. For the obtained samples {δfoF2}, statistical distributions and invariants up to the fourth order are calculated. The analysis shows that the distributions differ from the Gaussian law during disturbances. At sufficiently small probability levels, there are arbitrarily large deviations from the normal-process model. Therefore, an attempt is made to describe the statistical samples {δfoF2} with a Poisson model. For the studied samples, an exponential characteristic function is selected under the assumption that the time series are a superposition of deterministic and random processes. Using the Fourier transform, the characteristic function is transformed into a nonholomorphic, excessive-asymmetric probability density function. The statistical distributions of the samples {δfoF2} calculated for the disturbed periods are compared with the obtained model distribution function. According to Kolmogorov's criterion, the probabilities of coincidence of the a posteriori distributions with the theoretical ones are P ≈ 0.7-0.9. The analysis thus supports the applicability of a model based on the Poisson random process for the statistical description of the variations {δfoF2} and for probabilistic estimates during heliogeophysical disturbances.

  17. Performance of statistical models to predict mental health and substance abuse cost.

    PubMed

    Montez-Rath, Maria; Christiansen, Cindy L; Ettner, Susan L; Loveland, Susan; Rosen, Amy K

    2006-10-26

    Providers use risk-adjustment systems to help manage healthcare costs. Typically, ordinary least squares (OLS) models on either untransformed or log-transformed cost are used. We examine the predictive ability of several statistical models, demonstrate how model choice depends on the goal for the predictive model, and examine whether building models on samples of the data affects model choice. Our sample consisted of 525,620 Veterans Health Administration patients with mental health (MH) or substance abuse (SA) diagnoses who incurred costs during fiscal year 1999. We tested two models on a transformation of cost: a Log Normal model and a Square-root Normal model, and three generalized linear models on untransformed cost, defined by distributional assumption and link function: Normal with identity link (OLS); Gamma with log link; and Gamma with square-root link. Risk-adjusters included age, sex, and 12 MH/SA categories. To determine the best model among the entire dataset, predictive ability was evaluated using root mean square error (RMSE), mean absolute prediction error (MAPE), and predictive ratios of predicted to observed cost (PR) among deciles of predicted cost, by comparing point estimates and 95% bias-corrected bootstrap confidence intervals. To study the effect of analyzing a random sample of the population on model choice, we re-computed these statistics using random samples beginning with 5,000 patients and ending with the entire sample. The Square-root Normal model had the lowest estimates of the RMSE and MAPE, with bootstrap confidence intervals that were always lower than those for the other models. The Gamma with square-root link was best as measured by the PRs. The choice of best model could vary if smaller samples were used and the Gamma with square-root link model had convergence problems with small samples. Models with square-root transformation or link fit the data best. This function (whether used as transformation or as a link) seems to help deal with the high comorbidity of this population by introducing a form of interaction. The Gamma distribution helps with the long tail of the distribution. However, the Normal distribution is suitable if the correct transformation of the outcome is used.
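
    A sketch of the model comparison on synthetic cost data (statsmodels syntax; not the VHA data or the full risk-adjuster set): OLS on square-root cost, and Gamma GLMs with log and square-root links, compared by RMSE and mean absolute prediction error. Naively squaring the square-root model's predictions ignores the retransformation (smearing) correction and is an assumption of the sketch.

```python
# Sketch of the model comparison described above on synthetic cost data (not
# the VHA data): OLS on square-root cost, and Gamma GLMs with log and
# square-root links, compared by RMSE and mean absolute prediction error.
# The naive back-transformation of the square-root model ignores smearing.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 5000
age = rng.uniform(20, 80, n)
sa = rng.integers(0, 2, n)                             # illustrative substance-abuse indicator
X = sm.add_constant(np.column_stack([age, sa]))
mu = np.exp(5.0 + 0.01 * age + 0.8 * sa)
cost = rng.gamma(shape=1.5, scale=mu / 1.5)            # skewed, long-tailed costs

models = {
    "sqrt-OLS": sm.OLS(np.sqrt(cost), X).fit(),
    "Gamma-log": sm.GLM(cost, X, family=sm.families.Gamma(sm.families.links.Log())).fit(),
    "Gamma-sqrt": sm.GLM(cost, X, family=sm.families.Gamma(sm.families.links.Power(power=0.5))).fit(),
}
for name, res in models.items():
    pred = res.predict(X) ** 2 if name == "sqrt-OLS" else res.predict(X)
    rmse = np.sqrt(np.mean((cost - pred) ** 2))
    mape = np.mean(np.abs(cost - pred))
    print(f"{name:10s} RMSE={rmse:9.1f}  MAPE={mape:9.1f}")
```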

  18. The fracture load and failure types of veneered anterior zirconia crowns: an analysis of normal and Weibull distribution of complete and censored data.

    PubMed

    Stawarczyk, Bogna; Ozcan, Mutlu; Hämmerle, Christoph H F; Roos, Malgorzata

    2012-05-01

    The aim of this study was to compare the fracture load of veneered anterior zirconia crowns using normal and Weibull distributions of complete and censored data. Standardized zirconia frameworks for maxillary canines were milled using a CAD/CAM system and randomly divided into 3 groups (N=90, n=30 per group). They were veneered with three veneering ceramics, namely GC Initial ZR, Vita VM9, and IPS e.max Ceram, using the layering technique. The crowns were cemented with glass ionomer cement on metal abutments. The specimens were then loaded to fracture (1 mm/min) in a universal testing machine. The data were analyzed using the classical method (normal data distribution (μ, σ); Levene test and one-way ANOVA) and according to Weibull statistics (s, m). In addition, fracture load results were analyzed according to complete and censored failure types (only chipping vs. total fracture together with chipping). When computed with complete data, significantly higher mean fracture loads (N) were observed for GC Initial ZR (μ=978, σ=157; s=1043, m=7.2) and VITA VM9 (μ=1074, σ=179; s=1139, m=7.8) than for IPS e.max Ceram (μ=798, σ=174; s=859, m=5.8) (p<0.05), by classical and Weibull statistics, respectively. When the data were censored for only total fracture, IPS e.max Ceram presented the lowest fracture load for chipping with both the classical distribution (μ=790, σ=160) and Weibull statistics (s=836, m=6.5). When total fracture with chipping (classical distribution) was considered as failure, IPS e.max Ceram did not show a significantly different fracture load for total fracture (μ=1054, σ=110) compared to the other groups (GC Initial ZR: μ=1039, σ=152; VITA VM9: μ=1170, σ=166). According to the Weibull-distributed data, VITA VM9 showed a significantly higher fracture load (s=1228, m=9.4) than the other groups. Both the classical distribution and Weibull statistics for complete data yielded similar outcomes. Censored data analysis of all-ceramic systems based on failure types is essential and brings additional information regarding the susceptibility to chipping or total fracture. Copyright © 2011 Academy of Dental Materials. Published by Elsevier Ltd. All rights reserved.
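
    For the complete-data case, the two descriptions can be reproduced with a short fit: a normal (μ, σ) fit and a two-parameter Weibull fit whose scale s and shape m play the roles of characteristic strength and Weibull modulus. The loads below are synthetic values in the same range as those reported, and fixing the Weibull location at zero is an assumption of the sketch.

```python
# Hedged sketch of the complete-data analysis: fit a normal and a two-parameter
# Weibull (location fixed at zero) to fracture loads and report (mu, sigma) and
# the Weibull characteristic strength s and modulus m. Data are synthetic.
import numpy as np
from scipy import stats

rng = np.random.default_rng(12)
loads = stats.weibull_min.rvs(7.5, scale=1050, size=30, random_state=rng)   # N

mu, sigma = stats.norm.fit(loads)
m, _, s = stats.weibull_min.fit(loads, floc=0)        # shape m (modulus), scale s (characteristic load)
print(f"normal: mu={mu:.0f} N, sigma={sigma:.0f} N | Weibull: s={s:.0f} N, m={m:.1f}")
```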

  19. PIXE analysis of elements in gastric cancer and adjacent mucosa

    NASA Astrophysics Data System (ADS)

    Liu, Qixin; Zhong, Ming; Zhang, Xiaofeng; Yan, Lingnuo; Xu, Yongling; Ye, Simao

    1990-04-01

    The elemental regional distributions in 20 resected human stomach tissues were obtained using PIXE analysis. The samples were pathologically divided into four types: normal, adjacent mucosa A, adjacent mucosa B, and cancer. The targets for PIXE analysis were prepared by wet digestion with a pressure bomb system. P, K, Fe, Cu, Zn and Se were measured and statistically analysed. We found significantly higher concentrations of P, K, Cu, and Zn and a higher Cu/Zn ratio in cancer tissue compared with normal tissue, but no statistically significant difference between adjacent mucosa and cancer tissue was found.

  20. Generalized linear models and point count data: statistical considerations for the design and analysis of monitoring studies

    Treesearch

    Nathaniel E. Seavy; Suhel Quader; John D. Alexander; C. John Ralph

    2005-01-01

    The success of avian monitoring programs to effectively guide management decisions requires that studies be efficiently designed and data be properly analyzed. A complicating factor is that point count surveys often generate data with non-normal distributional properties. In this paper we review methods of dealing with deviations from normal assumptions, and we focus...
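
    One standard way of handling the non-normal, count-valued data that point counts produce is a Poisson (or, with overdispersion, negative binomial) generalized linear model. The sketch below, on synthetic counts with an assumed habitat covariate, is a generic illustration of that approach rather than the analysis in this paper.

```python
# Generic sketch of a GLM for point count data (synthetic example, not the
# authors' analysis): model counts as Poisson with a log link and a habitat
# covariate, and check for overdispersion via the Pearson chi-square / df ratio.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(19)
n = 200
shrub_cover = rng.uniform(0, 1, n)
X = sm.add_constant(shrub_cover)
mu = np.exp(0.5 + 1.2 * shrub_cover)          # true mean count increases with cover
counts = rng.poisson(mu)

res = sm.GLM(counts, X, family=sm.families.Poisson()).fit()
print(res.params)                              # intercept and slope on the log scale
print("overdispersion ratio:", round(res.pearson_chi2 / res.df_resid, 2))
```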

  1. Particle size distributions by transmission electron microscopy: an interlaboratory comparison case study

    PubMed Central

    Rice, Stephen B; Chan, Christopher; Brown, Scott C; Eschbach, Peter; Han, Li; Ensor, David S; Stefaniak, Aleksandr B; Bonevich, John; Vladár, András E; Hight Walker, Angela R; Zheng, Jiwen; Starnes, Catherine; Stromberg, Arnold; Ye, Jia; Grulke, Eric A

    2015-01-01

    This paper reports an interlaboratory comparison that evaluated a protocol for measuring and analysing the particle size distribution of discrete, metallic, spheroidal nanoparticles using transmission electron microscopy (TEM). The study was focused on automated image capture and automated particle analysis. NIST RM8012 gold nanoparticles (30 nm nominal diameter) were measured for area-equivalent diameter distributions by eight laboratories. Statistical analysis was used to (1) assess the data quality without using size distribution reference models, (2) determine reference model parameters for different size distribution reference models and non-linear regression fitting methods and (3) assess the measurement uncertainty of a size distribution parameter by using its coefficient of variation. The interlaboratory area-equivalent diameter mean, 27.6 nm ± 2.4 nm (computed based on a normal distribution), was quite similar to the area-equivalent diameter, 27.6 nm, assigned to NIST RM8012. The lognormal reference model was the preferred choice for these particle size distributions as, for all laboratories, its parameters had lower relative standard errors (RSEs) than the other size distribution reference models tested (normal, Weibull and Rosin–Rammler–Bennett). The RSEs for the fitted standard deviations were two orders of magnitude higher than those for the fitted means, suggesting that most of the parameter estimate errors were associated with estimating the breadth of the distributions. The coefficients of variation for the interlaboratory statistics also confirmed the lognormal reference model as the preferred choice. From quasi-linear plots, the typical range for good fits between the model and cumulative number-based distributions was 1.9 fitted standard deviations less than the mean to 2.3 fitted standard deviations above the mean. Automated image capture, automated particle analysis and statistical evaluation of the data and fitting coefficients provide a framework for assessing nanoparticle size distributions using TEM for image acquisition. PMID:26361398
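
    The model-comparison step can be sketched by regressing candidate cumulative distribution functions onto the empirical cumulative number-based distribution and comparing relative standard errors (RSEs) of the fitted parameters, which is the criterion the paper reports. The diameters below are synthetic values centered near 27.6 nm, not the RM8012 measurements, and only the normal and lognormal reference models are shown.

```python
# Sketch of the reference-model comparison: fit normal and lognormal cumulative
# distribution functions to the empirical cumulative number-based distribution
# by non-linear regression and compare relative standard errors (RSEs) of the
# fitted parameters. Diameters here are synthetic, not the RM8012 data.
import numpy as np
from scipy import stats
from scipy.optimize import curve_fit

rng = np.random.default_rng(13)
d = rng.lognormal(mean=np.log(27.6), sigma=0.09, size=500)     # area-equivalent diameters, nm
d_sorted = np.sort(d)
ecdf = (np.arange(d.size) + 0.5) / d.size

def normal_cdf(x, mu, sd):
    return stats.norm.cdf(x, mu, sd)

def lognormal_cdf(x, mu_log, sd_log):
    return stats.norm.cdf(np.log(x), mu_log, sd_log)

for name, model, p0 in [("normal", normal_cdf, (27.0, 3.0)),
                        ("lognormal", lognormal_cdf, (np.log(27.0), 0.1))]:
    popt, pcov = curve_fit(model, d_sorted, ecdf, p0=p0)
    rse = 100 * np.sqrt(np.diag(pcov)) / np.abs(popt)
    print(f"{name:9s} parameters={np.round(popt, 3)}  RSE(%)={np.round(rse, 2)}")
```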

  2. Forest fires in Pennsylvania.

    Treesearch

    Donald A. Haines; William A. Main; Eugene F. McNamara

    1978-01-01

    Describes factors that contribute to forest fires in Pennsylvania. Includes an analysis of basic statistics; distribution of fires during normal, drought, and wet years; fire cause, fire activity by day-of-week; multiple-fire day; and fire climatology.

  3. An Application of Extreme Value Theory to Learning Analytics: Predicting Collaboration Outcome from Eye-Tracking Data

    ERIC Educational Resources Information Center

    Sharma, Kshitij; Chavez-Demoulin, Valérie; Dillenbourg, Pierre

    2017-01-01

    The statistics used in education research are based on central trends such as the mean or standard deviation, discarding outliers. This paper adopts another viewpoint that has emerged in statistics, called extreme value theory (EVT). EVT claims that the bulk of a normal distribution comprises mainly uninteresting variations, while the most…

  4. The concept of normality through history: a didactic review of features related to philosophy, statistics and medicine.

    PubMed

    Conti, A A; Conti, A; Gensini, G F

    2006-09-01

    Normality characterises in medicine any possible qualitative or quantitative situation whose absence implies an illness or a state of abnormality. The concept of illness was at first a philosophical one. But the use of mathematics in the study of biological events, which began with Galton (1822-1911) and with Pearson (1857-1936), changed the frame of reference. In the second part of the 19th century mathematics was used to study the distribution of some biological characteristics in the evolution of the species. Around 1900, statistics became the basis for the study of the diffusion of illnesses. Half a century later statistics made possible the transition from the description of single cases to the description of groups of cases. Even more important is the concept of "normality" in laboratory medicine. In this field the search for the "perfect norm" was, and possibly still is, under way. The widespread use of statistics in the laboratory has allowed the definition, in a certain sense, of a new normality. This is the reason why the term "reference value" has been introduced. However, even the introduction of this new term has merely shifted the problem, not resolved it.

  5. Goodness-of-Fit Tests for Generalized Normal Distribution for Use in Hydrological Frequency Analysis

    NASA Astrophysics Data System (ADS)

    Das, Samiran

    2018-04-01

    The use of the three-parameter generalized normal (GNO) as a hydrological frequency distribution is well recognized, but its application is limited due to the unavailability of popular goodness-of-fit (GOF) test statistics. This study develops popular empirical distribution function (EDF)-based test statistics to investigate the goodness-of-fit of the GNO distribution. The focus is on the case most relevant to the hydrologist, namely that in which the parameter values are unknown and estimated from a sample using the method of L-moments. The widely used EDF tests such as Kolmogorov-Smirnov, Cramer von Mises, and Anderson-Darling (AD) are considered in this study. A modified version of AD, namely the Modified Anderson-Darling (MAD) test, is also considered, and its performance is assessed against the other EDF tests using a power study that incorporates six specific Wakeby distributions (WA-1, WA-2, WA-3, WA-4, WA-5, and WA-6) as the alternative distributions. The critical values of the proposed test statistics are approximated using Monte Carlo techniques and are summarized in chart and regression equation form to show the dependence on shape parameter and sample size. The performance results obtained from the power study suggest that the AD and a variant of the MAD (MAD-L) are the most powerful tests. Finally, the study performs case studies involving annual maximum flow data of selected gauged sites from Irish and US catchments to show the application of the derived critical values and recommends further assessments to be carried out on flow data sets of rivers with various hydrological regimes.

  6. A heteroscedastic generalized linear model with a non-normal speed factor for responses and response times.

    PubMed

    Molenaar, Dylan; Bolsinova, Maria

    2017-05-01

    In generalized linear modelling of responses and response times, the observed response time variables are commonly transformed to make their distribution approximately normal. A normal distribution for the transformed response times is desirable as it justifies the linearity and homoscedasticity assumptions in the underlying linear model. Past research has, however, shown that the transformed response times are not always normal. Models have been developed to accommodate this violation. In the present study, we propose a modelling approach for responses and response times to test and model non-normality in the transformed response times. Most importantly, we distinguish between non-normality due to heteroscedastic residual variances, and non-normality due to a skewed speed factor. In a simulation study, we establish parameter recovery and the power to separate both effects. In addition, we apply the model to a real data set. © 2017 The Authors. British Journal of Mathematical and Statistical Psychology published by John Wiley & Sons Ltd on behalf of British Psychological Society.

  7. Ground winds and winds aloft for Edwards AFB, California (1978 revision)

    NASA Technical Reports Server (NTRS)

    Johnson, D. L.; Brown, S. C.

    1978-01-01

    Ground level runway wind statistics for the Edwards AFB, California area are presented. Crosswind, headwind, tailwind, and headwind reversal percentage frequencies are given with respect to month and hour for the two major Edwards AFB runways. Also presented are Edwards AFB bivariate normal wind statistics for a 90 degree flight azimuth for altitudes 0 through 27 km. Wind probability distributions and statistics for any rotation of axes can be computed from the five given parameters.
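
    The closing statement, that statistics for any rotation of axes follow from the five bivariate normal parameters, can be written out directly: rotate the mean vector and the covariance matrix built from the two standard deviations and the correlation. The numerical values in the example are illustrative, not Edwards AFB statistics.

```python
# Hedged sketch of the stated property: from the five bivariate-normal wind
# parameters (two means, two standard deviations, correlation), compute the
# equivalent parameters for axes rotated to any flight azimuth. The numerical
# values are illustrative, not Edwards AFB statistics.
import numpy as np

def rotate_bivariate_normal(mean_u, mean_v, sd_u, sd_v, rho, angle_deg):
    a = np.radians(angle_deg)
    R = np.array([[np.cos(a), np.sin(a)], [-np.sin(a), np.cos(a)]])
    mean = R @ np.array([mean_u, mean_v])
    cov = np.array([[sd_u**2, rho * sd_u * sd_v],
                    [rho * sd_u * sd_v, sd_v**2]])
    cov_r = R @ cov @ R.T
    sd1, sd2 = np.sqrt(np.diag(cov_r))
    rho_r = cov_r[0, 1] / (sd1 * sd2)
    return mean[0], mean[1], sd1, sd2, rho_r

print(rotate_bivariate_normal(mean_u=3.0, mean_v=-1.0, sd_u=6.0, sd_v=4.0, rho=0.3, angle_deg=90.0))
```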

  8. [Quantitative study of diesel/CNG buses exhaust particulate size distribution in a road tunnel].

    PubMed

    Zhu, Chun; Zhang, Xu

    2010-10-01

    Vehicle emissions are one of the main sources of fine/ultra-fine particles in many cities. This study first presents daily mean particle size distributions of a mixed diesel/CNG bus traffic flow, obtained from 4 days of consecutive real-world measurements in an Australian road tunnel. Emission factors (EFs) for the particle size distributions of diesel buses and CNG buses are obtained by MLR methods; the particle distribution of diesel buses appears as a single accumulation mode and that of CNG buses as a nuclei mode. The particle size distributions of the mixed traffic flow are decomposed into two log-normal fitting curves for each 30 min interval mean scan; the degrees of fit between the combined fitting curves and the corresponding in-situ scans, for a total of 90 fitted scans, range from 0.972 to 0.998. Finally, the particle size distributions of diesel buses and CNG buses are quantified by statistical box-whisker charts. For the log-normal particle size distribution of diesel buses, accumulation mode diameters are 74.5-86.5 nm and geometric standard deviations are 1.88-2.05. For the log-normal particle size distribution of CNG buses, nuclei-mode diameters are 19.9-22.9 nm and geometric standard deviations are 1.27-1.3.
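
    The decomposition into two log-normal modes can be sketched as a non-linear least-squares fit of a two-mode model to a number-size spectrum dN/dlogDp. The spectrum below is synthetic and merely mimics the mode positions reported above; the mode names and starting values are assumptions.

```python
# Sketch of the decomposition step: fit the sum of two log-normal modes (a
# nuclei mode and an accumulation mode) to a number-size distribution
# dN/dlogDp with non-linear least squares. The synthetic spectrum below only
# mimics the shapes reported above.
import numpy as np
from scipy.optimize import curve_fit

def lognormal_mode(dp, n_tot, d_g, sigma_g):
    """dN/dlogDp of one log-normal mode (dp, d_g in nm)."""
    return (n_tot / (np.sqrt(2 * np.pi) * np.log10(sigma_g))
            * np.exp(-(np.log10(dp) - np.log10(d_g))**2 / (2 * np.log10(sigma_g)**2)))

def two_modes(dp, n1, d1, s1, n2, d2, s2):
    return lognormal_mode(dp, n1, d1, s1) + lognormal_mode(dp, n2, d2, s2)

dp = np.logspace(np.log10(10), np.log10(400), 60)                 # nm
truth = two_modes(dp, 8e3, 21.0, 1.28, 4e3, 80.0, 1.95)
measured = truth * (1 + 0.05 * np.random.default_rng(14).standard_normal(dp.size))

p0 = (5e3, 20.0, 1.3, 5e3, 75.0, 1.9)
popt, _ = curve_fit(two_modes, dp, measured, p0=p0)
print("nuclei mode (N, Dg, sigma_g):", np.round(popt[:3], 2))
print("accumulation mode (N, Dg, sigma_g):", np.round(popt[3:], 2))
```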

  9. New approach application of data transformation in mean centering of ratio spectra method

    NASA Astrophysics Data System (ADS)

    Issa, Mahmoud M.; Nejem, R.'afat M.; Van Staden, Raluca Ioana Stefan; Aboul-Enein, Hassan Y.

    2015-05-01

    Most mean centering (MCR) methods are designed to be used with data sets whose values have a normal or nearly normal distribution. The errors associated with the values are also assumed to be independent and random. If the data are skewed, the results obtained may be doubtful. Most of the time, a normal distribution was assumed, and if a confidence interval included a negative value, it was cut off at zero. However, it is possible to transform the data so that at least an approximately normal distribution is attained. Taking the logarithm of each data point is one transformation frequently used. As a result, the geometric mean is considered a better measure of central tendency than the arithmetic mean. The developed MCR method using the geometric mean has been successfully applied to the analysis of a ternary mixture of aspirin (ASP), atorvastatin (ATOR) and clopidogrel (CLOP) as a model. The results obtained were statistically compared with a reported HPLC method.
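
    The log-transform argument reduces to a two-line computation, sketched below on synthetic skewed data: average on the logarithmic scale and back-transform, so that the geometric mean replaces the arithmetic mean as the measure of central tendency.

```python
# Minimal sketch of the transformation idea: take logarithms of skewed data, do
# the averaging there, and back-transform, so the geometric mean replaces the
# arithmetic mean as the measure of central tendency.
import numpy as np

rng = np.random.default_rng(15)
absorbance_ratio = rng.lognormal(mean=0.0, sigma=0.4, size=50)    # skewed, strictly positive

arithmetic_mean = absorbance_ratio.mean()
geometric_mean = np.exp(np.log(absorbance_ratio).mean())
print(f"arithmetic mean = {arithmetic_mean:.3f}, geometric mean = {geometric_mean:.3f}")
```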

  10. Electron microscopic quantification of collagen fibril diameters in the rabbit medial collateral ligament: a baseline for comparison.

    PubMed

    Frank, C; Bray, D; Rademaker, A; Chrusch, C; Sabiston, P; Bodie, D; Rangayyan, R

    1989-01-01

    To establish a normal baseline for comparison, thirty-one thousand collagen fibril diameters were measured in calibrated transmission electron (TEM) photomicrographs of normal rabbit medial collateral ligaments (MCL's). A new automated method of quantitation was used to statistically compare fibril minimum diameter distributions in one midsubstance location in both MCL's from six animals at 3 months of age (immature) and three animals at 10 months of age (mature). Pooled results demonstrate that rabbit MCL's have statistically different (p less than 0.001) mean minimum diameters at these two ages. Interanimal differences in mean fibril minimum diameters were also significant (p less than 0.001) and varied by 20% to 25% in both mature and immature animals. Finally, there were significant differences (p less than 0.001) in mean diameters and distributions from side-to-side in all animals. These mean left-to-right differences were less than 10% in all mature animals but as much as 62% in some immature animals. Statistical analysis of these data demonstrates that animal-to-animal comparisons using these protocols require a large number of animals, with appropriate numbers of fibrils being measured, to detect small intergroup differences. With experiments that compare left to right ligaments, far fewer animals are required to detect similarly small differences. These results demonstrate the necessity for rigorous control of sampling, an extensive normal baseline, and statistically confirmed experimental designs in any TEM comparisons of collagen fibril diameters.

  11. Empirical Reference Distributions for Networks of Different Size

    PubMed Central

    Smith, Anna; Calder, Catherine A.; Browning, Christopher R.

    2016-01-01

    Network analysis has become an increasingly prevalent research tool across a vast range of scientific fields. Here, we focus on the particular issue of comparing network statistics, i.e. graph-level measures of network structural features, across multiple networks that differ in size. Although “normalized” versions of some network statistics exist, we demonstrate via simulation why direct comparison is often inappropriate. We consider normalizing network statistics relative to a simple fully parameterized reference distribution and demonstrate via simulation how this is an improvement over direct comparison, but still sometimes problematic. We propose a new adjustment method based on a reference distribution constructed as a mixture model of random graphs which reflect the dependence structure exhibited in the observed networks. We show that using simple Bernoulli models as mixture components in this reference distribution can provide adjusted network statistics that are relatively comparable across different network sizes but still describe interesting features of networks, and that this can be accomplished at relatively low computational expense. Finally, we apply this methodology to a collection of ecological networks derived from the Los Angeles Family and Neighborhood Survey activity location data. PMID:27721556
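
    A hedged sketch of the reference-distribution idea, using only the simple Bernoulli (Erdos-Renyi) reference rather than the mixture model proposed in the paper: compare an observed network statistic to its distribution over random graphs with the same number of nodes and density, and report the standardized difference. Graph sizes, generators, and the choice of transitivity as the statistic are assumptions of the example.

```python
# Hedged sketch of the reference-distribution idea using a simple Erdos-Renyi
# (Bernoulli) reference: compare an observed network's transitivity to the
# distribution of transitivity in random graphs of the same size and density.
# The mixture-model refinement proposed in the paper is not reproduced here.
import networkx as nx
import numpy as np

def adjusted_statistic(g, stat=nx.transitivity, n_ref=500, seed=0):
    rng = np.random.default_rng(seed)
    n, p = g.number_of_nodes(), nx.density(g)
    ref = np.array([stat(nx.gnp_random_graph(n, p, seed=int(rng.integers(1e9))))
                    for _ in range(n_ref)])
    return (stat(g) - ref.mean()) / ref.std(ddof=1)   # z-score relative to the reference

small = nx.watts_strogatz_graph(50, 6, 0.1, seed=1)
large = nx.watts_strogatz_graph(200, 6, 0.1, seed=1)
print("adjusted transitivity, n=50: ", round(adjusted_statistic(small), 2))
print("adjusted transitivity, n=200:", round(adjusted_statistic(large), 2))
```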

  12. Calculating Student Grades.

    ERIC Educational Resources Information Center

    Allswang, John M.

    1986-01-01

    This article provides two short microcomputer gradebook programs. The programs, written in BASIC for the IBM-PC and Apple II, provide statistical information about class performance and calculate grades either on a normal distribution or based on teacher-defined break points. (JDH)

  13. Final Report: Studies in Structural, Stochastic and Statistical Reliability for Communication Networks and Engineered Systems

    DTIC Science & Technology

    to do so, and (5) three distinct versions of the problem of estimating component reliability from system failure-time data are treated, each resulting in consistent estimators with asymptotically normal distributions.

  14. Effect of non-normality on test statistics for one-way independent groups designs.

    PubMed

    Cribbie, Robert A; Fiksenbaum, Lisa; Keselman, H J; Wilcox, Rand R

    2012-02-01

    The data obtained from one-way independent groups designs is typically non-normal in form and rarely equally variable across treatment populations (i.e., population variances are heterogeneous). Consequently, the classical test statistic that is used to assess statistical significance (i.e., the analysis of variance F test) typically provides invalid results (e.g., too many Type I errors, reduced power). For this reason, there has been considerable interest in finding a test statistic that is appropriate under conditions of non-normality and variance heterogeneity. Previously recommended procedures for analysing such data include the James test, the Welch test applied either to the usual least squares estimators of central tendency and variability, or the Welch test with robust estimators (i.e., trimmed means and Winsorized variances). A new statistic proposed by Krishnamoorthy, Lu, and Mathew, intended to deal with heterogeneous variances, though not non-normality, uses a parametric bootstrap procedure. In their investigation of the parametric bootstrap test, the authors examined its operating characteristics under limited conditions and did not compare it to the Welch test based on robust estimators. Thus, we investigated how the parametric bootstrap procedure and a modified parametric bootstrap procedure based on trimmed means perform relative to previously recommended procedures when data are non-normal and heterogeneous. The results indicated that the tests based on trimmed means offer the best Type I error control and power when variances are unequal and at least some of the distribution shapes are non-normal. © 2011 The British Psychological Society.
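
    As a concrete reference point for the procedures compared above, the sketch below implements the Welch heteroscedastic one-way test applied to the usual least-squares estimators; the robust variant discussed in the paper would substitute trimmed means and Winsorized variances, and the parametric bootstrap tests are not reproduced here. The groups are synthetic.

```python
# Sketch of the Welch-type heteroscedastic one-way test referred to above,
# applied to the usual least-squares estimators; the robust version would
# substitute trimmed means and Winsorized variances for means and variances.
import numpy as np
from scipy.stats import f as f_dist

def welch_anova(*groups):
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])
    w = n / v
    grand = np.sum(w * m) / np.sum(w)
    a = np.sum(w * (m - grand) ** 2) / (k - 1)
    c = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    b = 1 + 2 * (k - 2) / (k ** 2 - 1) * c
    f_stat = a / b
    df2 = (k ** 2 - 1) / (3 * c)
    return f_stat, k - 1, df2, f_dist.sf(f_stat, k - 1, df2)

rng = np.random.default_rng(16)
g1 = rng.normal(0.0, 1.0, 20)
g2 = rng.lognormal(0.0, 0.8, 35)       # skewed group
g3 = rng.normal(0.5, 3.0, 15)          # heteroscedastic group
print(welch_anova(g1, g2, g3))         # F statistic, df1, df2, p-value
```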

  15. Jimsphere wind and turbulence exceedance statistic

    NASA Technical Reports Server (NTRS)

    Adelfang, S. I.; Court, A.

    1972-01-01

    Exceedance statistics of winds and gusts observed over Cape Kennedy with Jimsphere balloon sensors are described. Gust profiles containing positive and negative departures, from smoothed profiles, in the wavelength ranges 100-2500, 100-1900, 100-860, and 100-460 meters were computed from 1578 profiles with four 41 weight digital high pass filters. Extreme values of the square root of gust speed are normally distributed. Monthly and annual exceedance probability distributions of normalized rms gust speeds in three altitude bands (2-7, 6-11, and 9-14 km) are log-normal. The rms gust speeds are largest in the 100-2500 wavelength band between 9 and 14 km in late winter and early spring. A study of monthly and annual exceedance probabilities and the number of occurrences per kilometer of level crossings with positive slope indicates significant variability with season, altitude, and filter configuration. A decile sampling scheme is tested and an optimum approach is suggested for drawing a relatively small random sample that represents the characteristic extreme wind speeds and shears of a large parent population of Jimsphere wind profiles.

  16. Normal and abnormal tissue identification system and method for medical images such as digital mammograms

    NASA Technical Reports Server (NTRS)

    Heine, John J. (Inventor); Clarke, Laurence P. (Inventor); Deans, Stanley R. (Inventor); Stauduhar, Richard Paul (Inventor); Cullers, David Kent (Inventor)

    2001-01-01

    A system and method for analyzing a medical image to determine whether an abnormality is present, for example, in digital mammograms, includes the application of a wavelet expansion to a raw image to obtain subspace images of varying resolution. At least one subspace image is selected that has a resolution commensurate with a desired predetermined detection resolution range. A functional form of a probability distribution function is determined for each selected subspace image, and an optimal statistical normal image region test is determined for each selected subspace image. A threshold level for the probability distribution function is established from the optimal statistical normal image region test for each selected subspace image. A region size comprising at least one sector is defined, and an output image is created that includes a combination of all regions for each selected subspace image. Each region has a first value when the region intensity level is above the threshold and a second value when the region intensity level is below the threshold. This permits the localization of a potential abnormality within the image.

  17. A Monte Carlo Study of Levene's Test of Homogeneity of Variance: Empirical Frequencies of Type I Error in Normal Distributions.

    ERIC Educational Resources Information Center

    Neel, John H.; Stallings, William M.

    An influential statistics text recommends a Levene test for homogeneity of variance. A recent note suggests that Levene's test is upwardly biased for small samples. Another report shows inflated alpha estimates and low power. Neither study utilized more than two sample sizes. This Monte Carlo study involved sampling from a normal population for…

  18. Lithographic stochastics: beyond 3σ

    NASA Astrophysics Data System (ADS)

    Bristol, Robert L.; Krysak, Marie E.

    2017-04-01

    As lithography tools continue their progress in both numerical aperture and wavelength in pursuit of Moore's law, we have reached the point where the number of features printed in a single pass can now easily surpass one trillion. Statistically, one should not be surprised to see some members of such a population exhibit fluctuations as great as 7σ. But what do these fluctuations look like? We consider the problem in terms of variations in the effective local resist sensitivity caused by feature-to-feature differences in absorbed photons and resist component counts, modeling these as a normal distribution. As the CD versus dose curve is generally nonlinear over large ranges, the normal distribution of the local effective sensitivity then maps to a nonnormal distribution in CD. For the case of individual vias printed near the resolution limit, it results in many more undersized or completely closed vias than one would expect from a normal distribution of the CDs. We show examples of this behavior from both EUV exposures in the fab and ebeam exposures in the lab.
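    A Monte Carlo sketch of the mechanism described: a normally distributed local effective dose mapped through a nonlinear CD-versus-dose curve yields a skewed CD distribution with a heavy tail of undersized features. The saturating curve and numeric parameters below are illustrative assumptions, not the paper's model.

```python
# Sketch: a normal distribution of local effective dose, pushed through a
# nonlinear CD-vs-dose curve, yields a non-normal (skewed) CD distribution.
import numpy as np

rng = np.random.default_rng(42)
dose = rng.normal(loc=30.0, scale=1.5, size=1_000_000)   # assumed units: mJ/cm^2

def cd_from_dose(d, d0=20.0, cd_max=40.0):
    """Illustrative saturating CD-vs-dose curve (nm); the via closes below d0."""
    return np.where(d > d0, cd_max * (1.0 - d0 / d), 0.0)

cd = cd_from_dose(dose)
z = (cd - cd.mean()) / cd.std()
print("skewness of CD:", np.mean(z ** 3))
print("fraction of vias below -3 sigma:", np.mean(z < -3),
      "vs. normal expectation ~0.00135")
```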

  19. Confidence Limits for the Indirect Effect: Distribution of the Product and Resampling Methods

    ERIC Educational Resources Information Center

    MacKinnon, David P.; Lockwood, Chondra M.; Williams, Jason

    2004-01-01

    The most commonly used method to test an indirect effect is to divide the estimate of the indirect effect by its standard error and compare the resulting z statistic with a critical value from the standard normal distribution. Confidence limits for the indirect effect are also typically based on critical values from the standard normal…
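    A sketch of the two approaches contrasted in this record: the normal-theory (Sobel-type) z test for the indirect effect a*b and percentile-bootstrap confidence limits. The data and regression setup are simulated for illustration only.

```python
# Normal-theory z test vs. percentile bootstrap for an indirect effect a*b.
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)
m = 0.4 * x + rng.normal(size=n)            # mediator
y = 0.3 * m + 0.1 * x + rng.normal(size=n)  # outcome

def ols_slope(pred, covars, resp):
    """Slope of `pred` (and its SE) in an OLS of resp on [1, pred, covars]."""
    X = np.column_stack([np.ones_like(resp), pred] + covars)
    beta, *_ = np.linalg.lstsq(X, resp, rcond=None)
    resid = resp - X @ beta
    s2 = resid @ resid / (len(resp) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])

a, se_a = ols_slope(x, [], m)
b, se_b = ols_slope(m, [x], y)
sobel_z = (a * b) / np.sqrt(a**2 * se_b**2 + b**2 * se_a**2)

boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    a_b, _ = ols_slope(x[idx], [], m[idx])
    b_b, _ = ols_slope(m[idx], [x[idx]], y[idx])
    boot.append(a_b * b_b)
print("a*b =", a * b, "Sobel z =", sobel_z,
      "95% bootstrap CI =", np.percentile(boot, [2.5, 97.5]))
```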

  20. Reliable and More Powerful Methods for Power Analysis in Structural Equation Modeling

    ERIC Educational Resources Information Center

    Yuan, Ke-Hai; Zhang, Zhiyong; Zhao, Yanyun

    2017-01-01

    The normal-distribution-based likelihood ratio statistic T[subscript ml] = nF[subscript ml] is widely used for power analysis in structural Equation modeling (SEM). In such an analysis, power and sample size are computed by assuming that T[subscript ml] follows a central chi-square distribution under H[subscript 0] and a noncentral chi-square…
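    The power calculation referred to here can be sketched with the standard noncentral chi-square approach: the critical value comes from the central chi-square under H0, and power from a noncentral chi-square with noncentrality commonly taken as n times the population discrepancy F. The df and F values below are placeholders.

```python
# Standard noncentral-chi-square power calculation for T = n*F in SEM.
from scipy.stats import chi2, ncx2

def sem_power(n, df, F_alt, alpha=0.05):
    """Power of the LR test, assuming noncentrality ncp = n * F_alt."""
    crit = chi2.ppf(1 - alpha, df)          # critical value under H0
    return 1 - ncx2.cdf(crit, df, n * F_alt)

for n in (100, 200, 400, 800):
    print(n, round(sem_power(n, df=10, F_alt=0.02), 3))
```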

  1. A Statistical Selection Strategy for Normalization Procedures in LC-MS Proteomics Experiments through Dataset Dependent Ranking of Normalization Scaling Factors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Webb-Robertson, Bobbie-Jo M.; Matzke, Melissa M.; Jacobs, Jon M.

    2011-12-01

    Quantification of LC-MS peak intensities assigned during peptide identification in a typical comparative proteomics experiment will deviate from run-to-run of the instrument due to both technical and biological variation. Thus, normalization of peak intensities across a LC-MS proteomics dataset is a fundamental step in pre-processing. However, the downstream analysis of LC-MS proteomics data can be dramatically affected by the normalization method selected. Current normalization procedures for LC-MS proteomics data are presented in the context of normalization values derived from subsets of the full collection of identified peptides. The distribution of these normalization values is unknown a priori. If they are not independent from the biological factors associated with the experiment the normalization process can introduce bias into the data, which will affect downstream statistical biomarker discovery. We present a novel approach to evaluate normalization strategies, where a normalization strategy includes the peptide selection component associated with the derivation of normalization values. Our approach evaluates the effect of normalization on the between-group variance structure in order to identify candidate normalization strategies that improve the structure of the data without introducing bias into the normalized peak intensities.

  2. Robustness of fit indices to outliers and leverage observations in structural equation modeling.

    PubMed

    Yuan, Ke-Hai; Zhong, Xiaoling

    2013-06-01

    Normal-distribution-based maximum likelihood (NML) is the most widely used method in structural equation modeling (SEM), although practical data tend to be nonnormally distributed. The effect of nonnormally distributed data or data contamination on the normal-distribution-based likelihood ratio (LR) statistic is well understood due to many analytical and empirical studies. In SEM, fit indices are used as widely as the LR statistic. In addition to NML, robust procedures have been developed for more efficient and less biased parameter estimates with practical data. This article studies the effect of outliers and leverage observations on fit indices following NML and two robust methods. Analysis and empirical results indicate that good leverage observations following NML and one of the robust methods lead most fit indices to give more support to the substantive model. While outliers tend to make a good model superficially bad according to many fit indices following NML, they have little effect on those following the two robust procedures. Implications of the results to data analysis are discussed, and recommendations are provided regarding the use of estimation methods and interpretation of fit indices. (PsycINFO Database Record (c) 2013 APA, all rights reserved).

  3. Statistical analysis of landing contact conditions for three lifting body research vehicles

    NASA Technical Reports Server (NTRS)

    Larson, R. R.

    1972-01-01

    The landing contact conditions for the HL-10, M2-F2/F3, and the X-24A lifting body vehicles are analyzed statistically for 81 landings. The landing contact parameters analyzed are true airspeed, peak normal acceleration at the center of gravity, roll angle, and roll velocity. Ground measurement parameters analyzed are lateral and longitudinal distance from intended touchdown, lateral distance from touchdown to full stop, and rollout distance. The results are presented in the form of histograms for frequency distributions and cumulative frequency distribution probability curves with a Pearson Type 3 curve fit for extrapolation purposes.

  4. Wavelet analysis of biological tissue's Mueller-matrix images

    NASA Astrophysics Data System (ADS)

    Tomka, Yu. Ya.

    2008-05-01

    The interrelations between statistics of the 1st-4th orders of the ensemble of Mueller-matrix images and the geometric structure of birefringent architectonic nets of different morphological structure have been analyzed. The sensitivity of the asymmetry and excess (kurtosis) of the statistical distributions of matrix elements Cik to changes in the orientation structure of optically anisotropic protein fibrils in the architectonics of physiologically normal and pathologically changed biological tissues has been shown.

  5. Statistical transformation and the interpretation of inpatient glucose control data.

    PubMed

    Saulnier, George E; Castro, Janna C; Cook, Curtiss B

    2014-03-01

    To introduce a statistical method of assessing hospital-based non-intensive care unit (non-ICU) inpatient glucose control. Point-of-care blood glucose (POC-BG) data from hospital non-ICUs were extracted for January 1 through December 31, 2011. Glucose data distribution was examined before and after Box-Cox transformations and compared to normality. Different subsets of data were used to establish upper and lower control limits, and exponentially weighted moving average (EWMA) control charts were constructed from June, July, and October data as examples to determine if out-of-control events were identified differently in nontransformed versus transformed data. A total of 36,381 POC-BG values were analyzed. In all 3 monthly test samples, glucose distributions in nontransformed data were skewed but approached a normal distribution once transformed. Interpretation of out-of-control events from EWMA control chart analyses also revealed differences. In the June test data, an out-of-control process was identified at sample 53 with nontransformed data, whereas the transformed data remained in control for the duration of the observed period. Analysis of July data demonstrated an out-of-control process sooner in the transformed (sample 55) than nontransformed (sample 111) data, whereas for October, transformed data remained in control longer than nontransformed data. Statistical transformations increase the normal behavior of inpatient non-ICU glycemic data sets. The decision to transform glucose data could influence the interpretation and conclusions about the status of inpatient glycemic control. Further study is required to determine whether transformed versus nontransformed data influence clinical decisions or evaluation of interventions.
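    A sketch of the pipeline described: Box-Cox transform the skewed glucose values, then monitor the transformed series with an EWMA control chart. The data are simulated, and the EWMA weight and control-limit width are conventional defaults, not values from the study.

```python
# Box-Cox transform skewed glucose values, then monitor with an EWMA chart.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
glucose = rng.lognormal(mean=np.log(150), sigma=0.3, size=300)   # mg/dL, simulated
glucose[200:] *= 1.15        # simulated deterioration in glycemic control

transformed, lam = stats.boxcox(glucose)
print("Box-Cox lambda:", round(lam, 2))

def ewma_chart(x, weight=0.2, L=3.0):
    """Return EWMA values and (lower, upper) control limits."""
    mu, sigma = x.mean(), x.std(ddof=1)
    z = np.empty_like(x)
    z[0] = mu
    for t in range(1, len(x)):
        z[t] = weight * x[t] + (1 - weight) * z[t - 1]
    i = np.arange(1, len(x) + 1)
    half = L * sigma * np.sqrt(weight / (2 - weight) * (1 - (1 - weight) ** (2 * i)))
    return z, mu - half, mu + half

z, lo, hi = ewma_chart(transformed)
out = np.where((z < lo) | (z > hi))[0]
print("first out-of-control sample:", out[0] if out.size else "none")
```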

  6. The Statistical Fermi Paradox

    NASA Astrophysics Data System (ADS)

    Maccone, C.

    In this paper, the statistical generalization of the Fermi paradox is provided. The statistics of habitable planets may be based on a set of ten (and possibly more) astrobiological requirements first pointed out by Stephen H. Dole in his book Habitable planets for man (1964). The statistical generalization of the original and by now too simplistic Dole equation is provided by replacing a product of ten positive numbers by the product of ten positive random variables. This is denoted the SEH, an acronym standing for “Statistical Equation for Habitables”. The proof in this paper is based on the Central Limit Theorem (CLT) of Statistics, stating that the sum of any number of independent random variables, each of which may be ARBITRARILY distributed, approaches a Gaussian (i.e. normal) random variable (Lyapunov form of the CLT). It is then shown that: 1. The new random variable NHab, yielding the number of habitables (i.e. habitable planets) in the Galaxy, follows the log-normal distribution. By construction, the mean value of this log-normal distribution is the total number of habitable planets as given by the statistical Dole equation. 2. The ten (or more) astrobiological factors are now positive random variables. The probability distribution of each random variable may be arbitrary. The CLT in the so-called Lyapunov or Lindeberg forms (that both do not assume the factors to be identically distributed) allows for that. In other words, the CLT "translates" into the SEH by allowing an arbitrary probability distribution for each factor. This is both astrobiologically realistic and useful for any further investigations. 3. By applying the SEH it is shown that the (average) distance between any two nearby habitable planets in the Galaxy is inversely proportional to the cubic root of NHab. This distance is denoted by the new random variable D. The relevant probability density function is derived, which was named the "Maccone distribution" by Paul Davies in 2008. 4. A practical example is then given of how the SEH works numerically. Each of the ten random variables is uniformly distributed around its own mean value as given by Dole (1964) and a standard deviation of 10% is assumed. The conclusion is that the average number of habitable planets in the Galaxy should be around 100 million ± 200 million, and the average distance in between any two nearby habitable planets should be about 88 light years ± 40 light years. 5. The SEH results are matched against the results of the Statistical Drake Equation from reference 4. As expected, the number of currently communicating ET civilizations in the Galaxy turns out to be much smaller than the number of habitable planets (about 10,000 against 100 million, i.e. one ET civilization out of 10,000 habitable planets). The average distance between any two nearby habitable planets is much smaller than the average distance between any two neighbouring ET civilizations: 88 light years vs. 2000 light years, respectively. This means the average ET distance is about 20 times larger than the average distance between any pair of adjacent habitable planets. 6. Finally, a statistical model of the Fermi Paradox is derived by applying the above results to the coral expansion model of Galactic colonization. The symbolic manipulator "Macsyma" is used to solve these difficult equations. A new random variable Tcol, representing the time needed to colonize a new planet, is introduced; it follows the lognormal distribution. Then the new quotient random variable Tcol/D is studied and its probability density function is derived by Macsyma. Finally, a linear transformation of random variables yields the overall time TGalaxy needed to colonize the whole Galaxy. We believe that our mathematical work in deriving this STATISTICAL Fermi Paradox is highly innovative and fruitful for the future.
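    A quick numerical check of the SEH's core claim is easy to run: the product of ten independent positive random factors is approximately lognormal, by the CLT applied to the sum of their logarithms. The factor means below are placeholders; only the 10% relative spread follows the worked example in the abstract.

```python
# Numerical check: the product of ten independent positive random factors
# is approximately lognormal (CLT applied to the log of the product).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2008)
means = np.array([3e11, 0.5, 0.3, 0.5, 0.2, 0.1, 0.5, 0.9, 0.7, 0.5])  # placeholders
half_width = np.sqrt(3) * 0.10 * means      # uniform factors with sd = 10% of mean

samples = np.prod(
    rng.uniform(means - half_width, means + half_width, size=(100_000, 10)),
    axis=1)

logs = np.log(samples)                      # should look close to Gaussian
print("skewness of log-product:", round(stats.skew(logs), 3))
print("lognormal fit (shape, loc, scale):", stats.lognorm.fit(samples, floc=0))
```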

  7. On the cause of the non-Gaussian distribution of residuals in geomagnetism

    NASA Astrophysics Data System (ADS)

    Hulot, G.; Khokhlov, A.

    2017-12-01

    To describe errors in the data, Gaussian distributions naturally come to mind. In many practical instances, indeed, Gaussian distributions are appropriate. In the broad field of geomagnetism, however, it has repeatedly been noted that residuals between data and models often display much sharper distributions, sometimes better described by a Laplace distribution. In the present study, we make the case that such non-Gaussian behaviors are very likely the result of what is known as mixture of distributions in the statistical literature. Mixtures arise as soon as the data do not follow a common distribution or are not properly normalized, the resulting global distribution being a mix of the various distributions followed by subsets of the data, or even individual datum. We provide examples of the way such mixtures can lead to distributions that are much sharper than Gaussian distributions and discuss the reasons why such mixtures are likely the cause of the non-Gaussian distributions observed in geomagnetism. We also show that when properly selecting sub-datasets based on geophysical criteria, statistical mixture can sometimes be avoided and much more Gaussian behaviors recovered. We conclude with some general recommendations and point out that although statistical mixture always tends to sharpen the resulting distribution, it does not necessarily lead to a Laplacian distribution. This needs to be taken into account when dealing with such non-Gaussian distributions.

  8. Methods for detrending success metrics to account for inflationary and deflationary factors*

    NASA Astrophysics Data System (ADS)

    Petersen, A. M.; Penner, O.; Stanley, H. E.

    2011-01-01

    Time-dependent economic, technological, and social factors can artificially inflate or deflate quantitative measures for career success. Here we develop and test a statistical method for normalizing career success metrics across time dependent factors. In particular, this method addresses the long standing question: how do we compare the career achievements of professional athletes from different historical eras? Developing an objective approach will be of particular importance over the next decade as major league baseball (MLB) players from the "steroids era" become eligible for Hall of Fame induction. Some experts are calling for asterisks (*) to be placed next to the career statistics of athletes found guilty of using performance enhancing drugs (PED). Here we address this issue, as well as the general problem of comparing statistics from distinct eras, by detrending the seasonal statistics of professional baseball players. We detrend player statistics by normalizing achievements to seasonal averages, which accounts for changes in relative player ability resulting from a range of factors. Our methods are general, and can be extended to various arenas of competition where time-dependent factors play a key role. For five statistical categories, we compare the probability density function (pdf) of detrended career statistics to the pdf of raw career statistics calculated for all player careers in the 90-year period 1920-2009. We find that the functional form of these pdfs is stationary under detrending. This stationarity implies that the statistical regularity observed in the right-skewed distributions for longevity and success in professional sports arises from both the wide range of intrinsic talent among athletes and the underlying nature of competition. We fit the pdfs for career success by the Gamma distribution in order to calculate objective benchmarks based on extreme statistics which can be used for the identification of extraordinary careers.

  9. Probability distributions of the electroencephalogram envelope of preterm infants.

    PubMed

    Saji, Ryoya; Hirasawa, Kyoko; Ito, Masako; Kusuda, Satoshi; Konishi, Yukuo; Taga, Gentaro

    2015-06-01

    To determine the stationary characteristics of electroencephalogram (EEG) envelopes for prematurely born (preterm) infants and investigate the intrinsic characteristics of early brain development in preterm infants. Twenty neurologically normal sets of EEGs recorded in infants with a post-conceptional age (PCA) range of 26-44 weeks (mean 37.5 ± 5.0 weeks) were analyzed. Hilbert transform was applied to extract the envelope. We determined the suitable probability distribution of the envelope and performed a statistical analysis. It was found that (i) the probability distributions for preterm EEG envelopes were best fitted by lognormal distributions at 38 weeks PCA or less, and by gamma distributions at 44 weeks PCA; (ii) the scale parameter of the lognormal distribution had positive correlations with PCA as well as a strong negative correlation with the percentage of low-voltage activity; (iii) the shape parameter of the lognormal distribution had significant positive correlations with PCA; (iv) the statistics of mode showed significant linear relationships with PCA, and, therefore, it was considered a useful index in PCA prediction. These statistics, including the scale parameter of the lognormal distribution and the skewness and mode derived from a suitable probability distribution, may be good indexes for estimating stationary nature in developing brain activity in preterm infants. The stationary characteristics, such as discontinuity, asymmetry, and unimodality, of preterm EEGs are well indicated by the statistics estimated from the probability distribution of the preterm EEG envelopes. Copyright © 2014 International Federation of Clinical Neurophysiology. Published by Elsevier Ireland Ltd. All rights reserved.
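    A sketch of the analysis pipeline described: extract the EEG envelope with the Hilbert transform, then compare lognormal and gamma fits to its distribution by log-likelihood. The signal below is synthetic amplitude-modulated noise standing in for preterm EEG.

```python
# Hilbert-transform envelope of a synthetic EEG-like signal, then compare
# lognormal vs. gamma fits to the envelope distribution.
import numpy as np
from scipy import signal, stats

rng = np.random.default_rng(3)
fs, secs = 250, 60
t = np.arange(fs * secs) / fs
eeg = (rng.lognormal(0, 0.5, t.size) * np.sin(2 * np.pi * 10 * t)
       + 0.1 * rng.normal(size=t.size))      # stand-in for discontinuous EEG

envelope = np.abs(signal.hilbert(eeg))

for name, dist in [("lognormal", stats.lognorm), ("gamma", stats.gamma)]:
    params = dist.fit(envelope, floc=0)
    loglik = np.sum(dist.logpdf(envelope, *params))
    print(name, "log-likelihood:", round(loglik, 1))
```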

  10. Simulation-based estimation of mean and standard deviation for meta-analysis via Approximate Bayesian Computation (ABC).

    PubMed

    Kwon, Deukwoo; Reis, Isildinha M

    2015-08-12

    When conducting a meta-analysis of a continuous outcome, estimated means and standard deviations from the selected studies are required in order to obtain an overall estimate of the mean effect and its confidence interval. If these quantities are not directly reported in the publications, they must be estimated from other reported summary statistics, such as the median, the minimum, the maximum, and quartiles. We propose a simulation-based estimation approach using the Approximate Bayesian Computation (ABC) technique for estimating mean and standard deviation based on various sets of summary statistics found in published studies. We conduct a simulation study to compare the proposed ABC method with the existing methods of Hozo et al. (2005), Bland (2015), and Wan et al. (2014). In the estimation of the standard deviation, our ABC method performs better than the other methods when data are generated from skewed or heavy-tailed distributions. The corresponding average relative error (ARE) approaches zero as sample size increases. In data generated from the normal distribution, our ABC performs well. However, the Wan et al. method is best for estimating standard deviation under normal distribution. In the estimation of the mean, our ABC method is best regardless of assumed distribution. ABC is a flexible method for estimating the study-specific mean and standard deviation for meta-analysis, especially with underlying skewed or heavy-tailed distributions. The ABC method can be applied using other reported summary statistics such as the posterior mean and 95 % credible interval when Bayesian analysis has been employed.
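    A minimal ABC-rejection sketch in the spirit of the proposed approach: draw candidate (mean, SD) pairs from priors, simulate samples of the reported size, and keep candidates whose simulated summaries (median and quartiles here) land close to the reported ones. The priors, tolerance, summary set, and normal data model are simplifying assumptions, not the authors' published algorithm.

```python
# ABC rejection sampling: recover mean and SD from reported summary statistics.
import numpy as np

reported = {"n": 50, "median": 12.0, "q1": 9.5, "q3": 15.0}   # hypothetical study

def summaries(x):
    return np.array([np.median(x), *np.percentile(x, [25, 75])])

target = np.array([reported["median"], reported["q1"], reported["q3"]])

rng = np.random.default_rng(11)
accepted = []
while len(accepted) < 300:
    mu = rng.uniform(0, 30)                  # vague priors (assumed)
    sd = rng.uniform(0.1, 15)
    sim = rng.normal(mu, sd, reported["n"])
    if np.max(np.abs(summaries(sim) - target)) < 0.75:   # tolerance (assumed)
        accepted.append((mu, sd))

accepted = np.array(accepted)
print("ABC estimate of mean:", accepted[:, 0].mean(),
      "and SD:", accepted[:, 1].mean())
```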

  11. Simulation of flight maneuver-load distributions by utilizing stationary, non-Gaussian random load histories

    NASA Technical Reports Server (NTRS)

    Leybold, H. A.

    1971-01-01

    Random numbers were generated with the aid of a digital computer and transformed such that the probability density function of a discrete random load history composed of these random numbers had one of the following non-Gaussian distributions: Poisson, binomial, log-normal, Weibull, and exponential. The resulting random load histories were analyzed to determine their peak statistics and were compared with cumulative peak maneuver-load distributions for fighter and transport aircraft in flight.

  12. Photogrammetry: An available surface characterization tool for solar concentrators. Part 2: Assessment of surfaces

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shortis, M.; Johnston, G.

    1997-11-01

    In a previous paper, the results of photogrammetric measurements of a number of paraboloidal reflecting surfaces were presented. These results showed that photogrammetry can provide three-dimensional surface characterizations of such solar concentrators. The present paper describes the assessment of the quality of these surfaces as a derivation of the photogrammetrically produced surface coordinates. Statistical analysis of the z-coordinate distribution of errors indicates that these generally conform to a univariate Gaussian distribution, while the numerical assessment of the surface normal vectors on these surfaces indicates that the surface normal deviations appear to follow an approximately bivariate Gaussian distribution. Ray tracing of the measured surfaces to predict the expected flux distribution at the focal point of the 400 m² dish shows a close correlation with the videographically measured flux distribution at the focal point of the dish.

  13. Computer routines for probability distributions, random numbers, and related functions

    USGS Publications Warehouse

    Kirby, W.

    1983-01-01

    Use of previously coded and tested subroutines simplifies and speeds up program development and testing. This report presents routines that can be used to calculate various probability distributions and other functions of importance in statistical hydrology. The routines are designed as general-purpose Fortran subroutines and functions to be called from user-written main programs. The probability distributions provided include the beta, chi-square, gamma, Gaussian (normal), Pearson Type III (tables and approximation), and Weibull. Also provided are the distributions of the Grubbs-Beck outlier test, Kolmogorov's and Smirnov's D, Student's t, noncentral t (approximate), and Snedecor F. Other mathematical functions include the Bessel function I0, gamma and log-gamma functions, error functions, and the exponential integral. Auxiliary services include sorting and printer-plotting. Random number generators for uniform and normal numbers are provided and may be used with some of the above routines to generate numbers from other distributions. (USGS)

  14. Computer routines for probability distributions, random numbers, and related functions

    USGS Publications Warehouse

    Kirby, W.H.

    1980-01-01

    Use of previously coded and tested subroutines simplifies and speeds up program development and testing. This report presents routines that can be used to calculate various probability distributions and other functions of importance in statistical hydrology. The routines are designed as general-purpose Fortran subroutines and functions to be called from user-written main programs. The probability distributions provided include the beta, chi-square, gamma, Gaussian (normal), Pearson Type III (tables and approximation), and Weibull. Also provided are the distributions of the Grubbs-Beck outlier test, Kolmogorov's and Smirnov's D, Student's t, noncentral t (approximate), and Snedecor F tests. Other mathematical functions include the Bessel function I0, gamma and log-gamma functions, error functions, and the exponential integral. Auxiliary services include sorting and printer plotting. Random number generators for uniform and normal numbers are provided and may be used with some of the above routines to generate numbers from other distributions. (USGS)

  15. [Statistical (Poisson) motor unit number estimation. Methodological aspects and normal results in the extensor digitorum brevis muscle of healthy subjects].

    PubMed

    Murga Oporto, L; Menéndez-de León, C; Bauzano Poley, E; Núñez-Castaín, M J

    Among the different techniques for motor unit number estimation (MUNE) there is the statistical one (Poisson), in which the activation of motor units is carried out by electrical stimulation and the estimation performed by means of a statistical analysis based on the Poisson distribution. The study was undertaken in order to provide an approach to the MUNE Poisson technique, offering a comprehensible view of its methodology, and also to obtain normal results in the extensor digitorum brevis muscle (EDB) from a healthy population. One hundred fourteen normal volunteers with ages ranging from 10 to 88 years were studied using the MUNE software contained in a Viking IV system. The normal subjects were divided into two age groups (10-59 and 60-88 years). The EDB MUNE for the whole sample was 184 ± 49. Both the MUNE and the amplitude of the compound muscle action potential (CMAP) were significantly lower in the older age group (p < 0.0001), with the MUNE showing a better correlation with age than CMAP amplitude (-0.5002 and -0.4142, respectively; p < 0.0001). The statistical MUNE method is an important tool for assessing the physiology of the motor unit. The value of MUNE correlates better with the neuromuscular aging process than CMAP amplitude does.

  16. On measures of association among genetic variables

    PubMed Central

    Gianola, Daniel; Manfredi, Eduardo; Simianer, Henner

    2012-01-01

    Summary Systems involving many variables are important in population and quantitative genetics, for example, in multi-trait prediction of breeding values and in exploration of multi-locus associations. We studied departures of the joint distribution of sets of genetic variables from independence. New measures of association based on notions of statistical distance between distributions are presented. These are more general than correlations, which are pairwise measures, and lack a clear interpretation beyond the bivariate normal distribution. Our measures are based on logarithmic (Kullback-Leibler) and on relative ‘distances’ between distributions. Indexes of association are developed and illustrated for quantitative genetics settings in which the joint distribution of the variables is either multivariate normal or multivariate-t, and we show how the indexes can be used to study linkage disequilibrium in a two-locus system with multiple alleles and present applications to systems of correlated beta distributions. Two multivariate beta and multivariate beta-binomial processes are examined, and new distributions are introduced: the GMS-Sarmanov multivariate beta and its beta-binomial counterpart. PMID:22742500

  17. Generalized t-statistic for two-group classification.

    PubMed

    Komori, Osamu; Eguchi, Shinto; Copas, John B

    2015-06-01

    In the classic discriminant model of two multivariate normal distributions with equal variance matrices, the linear discriminant function is optimal both in terms of the log likelihood ratio and in terms of maximizing the standardized difference (the t-statistic) between the means of the two distributions. In a typical case-control study, normality may be sensible for the control sample but heterogeneity and uncertainty in diagnosis may suggest that a more flexible model is needed for the cases. We generalize the t-statistic approach by finding the linear function which maximizes a standardized difference but with data from one of the groups (the cases) filtered by a possibly nonlinear function U. We study conditions for consistency of the method and find the function U which is optimal in the sense of asymptotic efficiency. Optimality may also extend to other measures of discriminatory efficiency such as the area under the receiver operating characteristic curve. The optimal function U depends on a scalar probability density function which can be estimated non-parametrically using a standard numerical algorithm. A lasso-like version for variable selection is implemented by adding L1-regularization to the generalized t-statistic. Two microarray data sets in the study of asthma and various cancers are used as motivating examples. © 2014, The International Biometric Society.

  18. Checking distributional assumptions for pharmacokinetic summary statistics based on simulations with compartmental models.

    PubMed

    Shen, Meiyu; Russek-Cohen, Estelle; Slud, Eric V

    2016-08-12

    Bioequivalence (BE) studies are an essential part of the evaluation of generic drugs. The most common in vivo BE study design is the two-period two-treatment crossover design. AUC (area under the concentration-time curve) and Cmax (maximum concentration) are obtained from the observed concentration-time profiles for each subject from each treatment under each sequence. In the BE evaluation of pharmacokinetic crossover studies, the normality of the univariate response variable, e.g. log(AUC) or log(Cmax), is often assumed in the literature without much evidence. Therefore, we investigate the distributional assumption of the normality of response variables, log(AUC) and log(Cmax), by simulating concentration-time profiles from two-stage pharmacokinetic models (commonly used in pharmacokinetic research) for a wide range of pharmacokinetic parameters and measurement error structures. Our simulations show that, under reasonable distributional assumptions on the pharmacokinetic parameters, log(AUC) has heavy tails and log(Cmax) is skewed. Sensitivity analyses are conducted to investigate how the distribution of the standardized log(AUC) (or the standardized log(Cmax)) for a large number of simulated subjects deviates from normality if distributions of errors in the pharmacokinetic model for plasma concentrations deviate from normality and if the plasma concentration can be described by different compartmental models.

  19. Multisample adjusted U-statistics that account for confounding covariates.

    PubMed

    Satten, Glen A; Kong, Maiying; Datta, Somnath

    2018-06-19

    Multisample U-statistics encompass a wide class of test statistics that allow the comparison of 2 or more distributions. U-statistics are especially powerful because they can be applied to both numeric and nonnumeric data, eg, ordinal and categorical data where a pairwise similarity or distance-like measure between categories is available. However, when comparing the distribution of a variable across 2 or more groups, observed differences may be due to confounding covariates. For example, in a case-control study, the distribution of exposure in cases may differ from that in controls entirely because of variables that are related to both exposure and case status and are distributed differently among case and control participants. We propose to use individually reweighted data (ie, using the stratification score for retrospective data or the propensity score for prospective data) to construct adjusted U-statistics that can test the equality of distributions across 2 (or more) groups in the presence of confounding covariates. Asymptotic normality of our adjusted U-statistics is established and a closed form expression of their asymptotic variance is presented. The utility of our approach is demonstrated through simulation studies, as well as in an analysis of data from a case-control study conducted among African-Americans, comparing whether the similarity in haplotypes (ie, sets of adjacent genetic loci inherited from the same parent) occurring in a case and a control participant differs from the similarity in haplotypes occurring in 2 control participants. Copyright © 2018 John Wiley & Sons, Ltd.

  20. Optimum runway orientation relative to crosswinds

    NASA Technical Reports Server (NTRS)

    Falls, L. W.; Brown, S. C.

    1972-01-01

    Specific magnitudes of crosswinds may exist that could be constraints to the success of an aircraft mission such as the landing of the proposed space shuttle. A method is required to determine the orientation or azimuth of the proposed runway which will minimize the probability of certain critical crosswinds. Two procedures for obtaining the optimum runway orientation relative to minimizing a specified crosswind speed are described and illustrated with examples. The empirical procedure requires only hand calculations on an ordinary wind rose. The theoretical method utilizes wind statistics computed after the bivariate normal elliptical distribution is applied to a data sample of component winds. This method requires only the assumption that the wind components are bivariate normally distributed. This assumption seems to be reasonable. Studies are currently in progress for testing wind components for bivariate normality for various stations. The close agreement between the theoretical and empirical results for the example chosen substantiates the bivariate normal assumption.
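    The theoretical method can be sketched as follows: if the east and north wind components are bivariate normal, the crosswind component for a runway at a given azimuth is univariate normal, so the probability of exceeding any crosswind limit can be scanned over azimuth to find the optimum orientation. The wind statistics and crosswind limit below are invented for illustration.

```python
# Crosswind exceedance probability under a bivariate normal wind model,
# scanned over runway azimuth to find the optimum orientation.
import numpy as np
from scipy.stats import norm

mu = np.array([2.0, -1.0])                    # mean u (east), v (north) in m/s, assumed
cov = np.array([[16.0, 5.0], [5.0, 25.0]])    # covariance matrix, assumed
c_lim = 10.0                                  # crosswind limit (m/s), assumed

def crosswind_exceedance(theta_deg):
    t = np.radians(theta_deg)
    p = np.array([np.cos(t), -np.sin(t)])     # unit vector normal to the centerline
    m = p @ mu                                # mean crosswind component
    s = np.sqrt(p @ cov @ p)                  # its standard deviation
    return norm.sf((c_lim - m) / s) + norm.cdf((-c_lim - m) / s)

azimuths = np.arange(0, 180, 5)
best = min(azimuths, key=crosswind_exceedance)
print("optimum runway azimuth:", int(best), "deg,",
      "P(|crosswind| >", c_lim, "m/s) =", round(crosswind_exceedance(best), 4))
```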

  1. On the inequivalence of the CH and CHSH inequalities due to finite statistics

    NASA Astrophysics Data System (ADS)

    Renou, M. O.; Rosset, D.; Martin, A.; Gisin, N.

    2017-06-01

    Different variants of a Bell inequality, such as CHSH and CH, are known to be equivalent when evaluated on nonsignaling outcome probability distributions. However, in experimental setups, the outcome probability distributions are estimated using a finite number of samples. Therefore the nonsignaling conditions are only approximately satisfied and the robustness of the violation depends on the chosen inequality variant. We explain that phenomenon using the decomposition of the space of outcome probability distributions under the action of the symmetry group of the scenario, and propose a method to optimize the statistical robustness of a Bell inequality. In the process, we describe the finite group composed of relabeling of parties, measurement settings and outcomes, and identify correspondences between the irreducible representations of this group and properties of outcome probability distributions such as normalization, signaling or having uniform marginals.

  2. Common Lognormal Behavior in Legal Systems

    NASA Astrophysics Data System (ADS)

    Yamamoto, Ken

    2017-07-01

    This study characterizes a statistical property of legal systems: the distribution of the number of articles in a law follows a lognormal distribution. This property is common to the Japanese, German, and Singaporean laws. To explain this lognormal behavior, tree structure of the law is analyzed. If the depth of a tree follows a normal distribution, the lognormal distribution of the number of articles can be theoretically derived. We analyze the structure of the Japanese laws using chapters, sections, and other levels of organization, and this analysis demonstrates that the proposed model is quantitatively reasonable.

  3. A New Closed Form Approximation for BER for Optical Wireless Systems in Weak Atmospheric Turbulence

    NASA Astrophysics Data System (ADS)

    Kaushik, Rahul; Khandelwal, Vineet; Jain, R. C.

    2018-04-01

    The weak atmospheric turbulence condition in an optical wireless communication (OWC) system is captured by the log-normal distribution. The analytical evaluation of the average bit error rate (BER) of an OWC system under weak turbulence is intractable as it involves the statistical averaging of the Gaussian Q-function over the log-normal distribution. In this paper, a simple closed form approximation for the BER of an OWC system under weak turbulence is given. Computation of BER for various modulation schemes is carried out using the proposed expression. The results obtained using the proposed expression compare favorably with those obtained using the Gauss-Hermite quadrature approximation and Monte Carlo simulations.
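    The averaging problem described here can be reproduced numerically: the mean BER under log-normal fading is E[Q(γ·h)], which Gauss-Hermite quadrature handles well. The sketch below compares quadrature against Monte Carlo; it does not implement the paper's closed-form approximation, and the normalization E[h] = 1 is an assumption.

```python
# Average BER over log-normal fading, E[Q(gamma*h)], by Gauss-Hermite
# quadrature and by Monte Carlo.
import numpy as np
from scipy.special import erfc

Q = lambda x: 0.5 * erfc(x / np.sqrt(2))      # Gaussian Q-function

def ber_gauss_hermite(gamma, sigma, n_nodes=30):
    """E[Q(gamma*h)] with ln h ~ N(mu, sigma^2), mu = -sigma^2/2 so E[h] = 1."""
    x, w = np.polynomial.hermite.hermgauss(n_nodes)
    mu = -0.5 * sigma ** 2
    vals = Q(gamma * np.exp(mu + np.sqrt(2) * sigma * x))
    return np.sum(w * vals) / np.sqrt(np.pi)

def ber_monte_carlo(gamma, sigma, n=1_000_000, seed=0):
    h = np.random.default_rng(seed).lognormal(-0.5 * sigma ** 2, sigma, n)
    return Q(gamma * h).mean()

for gamma in (1, 2, 4, 8):
    print(gamma, ber_gauss_hermite(gamma, sigma=0.5),
          ber_monte_carlo(gamma, sigma=0.5))
```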

  4. Fisher statistics for analysis of diffusion tensor directional information.

    PubMed

    Hutchinson, Elizabeth B; Rutecki, Paul A; Alexander, Andrew L; Sutula, Thomas P

    2012-04-30

    A statistical approach is presented for the quantitative analysis of diffusion tensor imaging (DTI) directional information using Fisher statistics, which were originally developed for the analysis of vectors in the field of paleomagnetism. In this framework, descriptive and inferential statistics have been formulated based on the Fisher probability density function, a spherical analogue of the normal distribution. The Fisher approach was evaluated for investigation of rat brain DTI maps to characterize tissue orientation in the corpus callosum, fornix, and hilus of the dorsal hippocampal dentate gyrus, and to compare directional properties in these regions following status epilepticus (SE) or traumatic brain injury (TBI) with values in healthy brains. Direction vectors were determined for each region of interest (ROI) for each brain sample and Fisher statistics were applied to calculate the mean direction vector and variance parameters in the corpus callosum, fornix, and dentate gyrus of normal rats and rats that experienced TBI or SE. Hypothesis testing was performed by calculation of Watson's F-statistic and associated p-value giving the likelihood that grouped observations were from the same directional distribution. In the fornix and midline corpus callosum, no directional differences were detected between groups, however in the hilus, significant (p<0.0005) differences were found that robustly confirmed observations that were suggested by visual inspection of directionally encoded color DTI maps. The Fisher approach is a potentially useful analysis tool that may extend the current capabilities of DTI investigation by providing a means of statistical comparison of tissue structural orientation. Copyright © 2012 Elsevier B.V. All rights reserved.

  5. Analyzing repeated measures semi-continuous data, with application to an alcohol dependence study.

    PubMed

    Liu, Lei; Strawderman, Robert L; Johnson, Bankole A; O'Quigley, John M

    2016-02-01

    Two-part random effects models (Olsen and Schafer,(1) Tooze et al.(2)) have been applied to repeated measures of semi-continuous data, characterized by a mixture of a substantial proportion of zero values and a skewed distribution of positive values. In the original formulation of this model, the natural logarithm of the positive values is assumed to follow a normal distribution with a constant variance parameter. In this article, we review and consider three extensions of this model, allowing the positive values to follow (a) a generalized gamma distribution, (b) a log-skew-normal distribution, and (c) a normal distribution after the Box-Cox transformation. We allow for the possibility of heteroscedasticity. Maximum likelihood estimation is shown to be conveniently implemented in SAS Proc NLMIXED. The performance of the methods is compared through applications to daily drinking records in a secondary data analysis from a randomized controlled trial of topiramate for alcohol dependence treatment. We find that all three models provide a significantly better fit than the log-normal model, and there exists strong evidence for heteroscedasticity. We also compare the three models by the likelihood ratio tests for non-nested hypotheses (Vuong(3)). The results suggest that the generalized gamma distribution provides the best fit, though no statistically significant differences are found in pairwise model comparisons. © The Author(s) 2012.

  6. Statistics of Weighted Brain Networks Reveal Hierarchical Organization and Gaussian Degree Distribution

    PubMed Central

    Ivković, Miloš; Kuceyeski, Amy; Raj, Ashish

    2012-01-01

    Whole brain weighted connectivity networks were extracted from high resolution diffusion MRI data of 14 healthy volunteers. A statistically robust technique was proposed for the removal of questionable connections. Unlike most previous studies our methods are completely adapted for networks with arbitrary weights. Conventional statistics of these weighted networks were computed and found to be comparable to existing reports. After a robust fitting procedure using multiple parametric distributions it was found that the weighted node degree of our networks is best described by the normal distribution, in contrast to previous reports which have proposed heavy tailed distributions. We show that post-processing of the connectivity weights, such as thresholding, can influence the weighted degree asymptotics. The clustering coefficients were found to be distributed either as gamma or power-law distribution, depending on the formula used. We proposed a new hierarchical graph clustering approach, which revealed that the brain network is divided into a regular base-2 hierarchical tree. Connections within and across this hierarchy were found to be uncommonly ordered. The combined weight of our results supports a hierarchically ordered view of the brain, whose connections have heavy tails, but whose weighted node degrees are comparable. PMID:22761649

  8. A statistical study of EMIC waves observed by Cluster: 1. Wave properties

    NASA Astrophysics Data System (ADS)

    Allen, R. C.; Zhang, J.-C.; Kistler, L. M.; Spence, H. E.; Lin, R.-L.; Klecker, B.; Dunlop, M. W.; André, M.; Jordanova, V. K.

    2015-07-01

    Electromagnetic ion cyclotron (EMIC) waves are an important mechanism for particle energization and losses inside the magnetosphere. In order to better understand the effects of these waves on particle dynamics, detailed information about the occurrence rate, wave power, ellipticity, normal angle, energy propagation angle distributions, and local plasma parameters are required. Previous statistical studies have used in situ observations to investigate the distribution of these parameters in the magnetic local time versus L-shell (MLT-L) frame within a limited magnetic latitude (MLAT) range. In this study, we present a statistical analysis of EMIC wave properties using 10 years (2001-2010) of data from Cluster, totaling 25,431 min of wave activity. Due to the polar orbit of Cluster, we are able to investigate EMIC waves at all MLATs and MLTs. This allows us to further investigate the MLAT dependence of various wave properties inside different MLT sectors and further explore the effects of Shabansky orbits on EMIC wave generation and propagation. The statistical analysis is presented in two papers. This paper focuses on the wave occurrence distribution as well as the distribution of wave properties. The companion paper focuses on local plasma parameters during wave observations as well as wave generation proxies.

  9. Quantifying nonstationary radioactivity concentration fluctuations near Chernobyl: A complete statistical description

    NASA Astrophysics Data System (ADS)

    Viswanathan, G. M.; Buldyrev, S. V.; Garger, E. K.; Kashpur, V. A.; Lucena, L. S.; Shlyakhter, A.; Stanley, H. E.; Tschiersch, J.

    2000-09-01

    We analyze nonstationary 137Cs atmospheric activity concentration fluctuations measured near Chernobyl after the 1986 disaster and find three new results: (i) the histogram of fluctuations is well described by a log-normal distribution; (ii) there is a pronounced spectral component with period T=1yr, and (iii) the fluctuations are long-range correlated. These findings allow us to quantify two fundamental statistical properties of the data: the probability distribution and the correlation properties of the time series. We interpret our findings as evidence that the atmospheric radionuclide resuspension processes are tightly coupled to the surrounding ecosystems and to large time scale weather patterns.

  10. Liquid water breakthrough location distances on a gas diffusion layer of polymer electrolyte membrane fuel cells

    NASA Astrophysics Data System (ADS)

    Yu, Junliang; Froning, Dieter; Reimer, Uwe; Lehnert, Werner

    2018-06-01

    The lattice Boltzmann method is adopted to simulate the three-dimensional dynamic process of liquid water breaking through the gas diffusion layer (GDL) in a polymer electrolyte membrane fuel cell. 22 micro-structures of Toray GDL are built based on a stochastic geometry model. It is found that more than one breakthrough location is formed randomly on the GDL surface. Breakthrough location distances (BLD) are analyzed statistically in two ways. The distribution is evaluated statistically by the Lilliefors test. It is concluded that the BLD can be described by the normal distribution with certain statistical characteristics. Information on the shortest-neighbor breakthrough location distance can serve as input for modeling setups in cell-scale fuel cell simulations.

  11. Possible Statistics of Two Coupled Random Fields: Application to Passive Scalar

    NASA Technical Reports Server (NTRS)

    Dubrulle, B.; He, Guo-Wei; Bushnell, Dennis M. (Technical Monitor)

    2000-01-01

    We use the relativity postulate of scale invariance to derive the similarity transformations between two coupled scale-invariant random fields at different scales. We find the equations leading to the scaling exponents. This formulation is applied to the case of passive scalars advected i) by a random Gaussian velocity field; and ii) by a turbulent velocity field. In the Gaussian case, we show that the passive scalar increments follow a log-Levy distribution generalizing Kraichnan's solution and, in an appropriate limit, a log-normal distribution. In the turbulent case, we show that when the velocity increments follow a log-Poisson statistics, the passive scalar increments follow a statistics close to log-Poisson. This result explains the experimental observations of Ruiz et al. about the temperature increments.

  12. Deterministic annealing for density estimation by multivariate normal mixtures

    NASA Astrophysics Data System (ADS)

    Kloppenburg, Martin; Tavan, Paul

    1997-03-01

    An approach to maximum-likelihood density estimation by mixtures of multivariate normal distributions for large high-dimensional data sets is presented. Conventionally that problem is tackled by notoriously unstable expectation-maximization (EM) algorithms. We remove these instabilities by the introduction of soft constraints, enabling deterministic annealing. Our developments are motivated by the proof that algorithmically stable fuzzy clustering methods that are derived from statistical physics analogs are special cases of EM procedures.

  13. Assessment of the maximum voluntary arm muscle contraction in sign language for the deaf.

    PubMed

    Regalo, S C H; Teixeira, V R; Vitti, M; Chaves, T C; Hallak, J E C; Bevilaqua-Grossi, D; Siriani de Oliveira, A

    2006-01-01

    The purpose of this study was to investigate the levels of upper limb muscle activation in deaf individuals who use the Brazilian sign language (LIBRAS), comparing these findings to those of volunteers with no postural deviations and normal hearing. Forty-eight volunteers were divided into two groups comprising healthy and deaf subjects (24 volunteers in each group). The rest signals were obtained with the volunteer maintaining the upper limb in an anatomical position, but with the forearm flexed and supported by the lower limb. Maximum voluntary isometric contractions (MVIC) of the biceps, triceps, deltoid, and trapezius muscles were performed in the position of muscular function testing. Statistical analysis was performed using SPSS 10.0. Continuous data with normal distribution were analyzed by ANOVA with a significance level of p < 0.01. The normalized electromyographic muscle data obtained at rest did not show statistically significant differences among the studied muscles in either group. In the comparison of normalized RMS values obtained during MVIC, the mean values for the trapezius muscle of the deaf group were statistically lower than those of the control group. This study's results indicate that there are no differences in the levels of muscle activation of the biceps, triceps, and the anterior portion of the deltoid between deaf and healthy individuals.

  14. Statistical properties of two sine waves in Gaussian noise.

    NASA Technical Reports Server (NTRS)

    Esposito, R.; Wilson, L. R.

    1973-01-01

    A detailed study is presented of some statistical properties of a stochastic process that consists of the sum of two sine waves of unknown relative phase and a normal process. Since none of the statistics investigated seem to yield a closed-form expression, all the derivations are cast in a form that is particularly suitable for machine computation. Specifically, results are presented for the probability density function (pdf) of the envelope and the instantaneous value, the moments of these distributions, and the relative cumulative density function (cdf).

  15. Evaluating the event-related synchronization and desynchronization by means of a statistical frequency test.

    PubMed

    Miranda de Sá, Antonio Mauricio F L; Infantosi, Antonio Fernando C; Lazarev, Vladimir V

    2007-01-01

    In the present work, a commonly used index for evaluating the Event-Related Synchronization and Desynchronization (ERS/ERD) in the EEG was expressed as a function of the Spectral F-Test (SFT), which is a statistical test for assessing if two sample spectra are from populations with identical theoretical spectra. The sampling distribution of SFT has been derived, allowing hence ERS/ERD to be evaluated under a statistical basis. An example of the technique was also provided in the EEG signals from 10 normal subjects during intermittent photic stimulation.

  16. Statistical Considerations of Data Processing in Giovanni Online Tool

    NASA Technical Reports Server (NTRS)

    Suhung, Shen; Leptoukh, G.; Acker, J.; Berrick, S.

    2005-01-01

    The GES DISC Interactive Online Visualization and Analysis Infrastructure (Giovanni) is a web-based interface for the rapid visualization and analysis of gridded data from a number of remote sensing instruments. The GES DISC currently employs several Giovanni instances to analyze various products, such as Ocean-Giovanni for ocean products from SeaWiFS and MODIS-Aqua; TOMS & OMI Giovanni for atmospheric chemical trace gases from TOMS and OMI, and MOVAS for aerosols from MODIS, etc. (http://giovanni.gsfc.nasa.gov). Foremost among the Giovanni statistical functions is data averaging. Two aspects of this function are addressed here. The first deals with the accuracy of averaging gridded mapped products vs. averaging from the ungridded Level 2 data. Some mapped products contain mean values only; others contain additional statistics, such as number of pixels (NP) for each grid, standard deviation, etc. Since NP varies spatially and temporally, averaging with or without weighting by NP will be different. In this paper, we address differences among various weighting algorithms for some datasets utilized in Giovanni. The second aspect is related to different averaging methods affecting data quality and interpretation for data with non-normal distribution. The present study demonstrates results of different spatial averaging methods using gridded SeaWiFS Level 3 mapped monthly chlorophyll a data. Spatial averages were calculated using three different methods: arithmetic mean (AVG), geometric mean (GEO), and maximum likelihood estimator (MLE). Biogeochemical data, such as chlorophyll a, are usually considered to have a log-normal distribution. The study determined that differences between methods tend to increase with increasing size of a selected coastal area, with no significant differences in most open oceans. The GEO method consistently produces values lower than AVG and MLE. The AVG method produces values larger than MLE in some cases, but smaller in other cases. Further studies indicated that significant differences between AVG and MLE methods occurred in coastal areas where data have large spatial variations and a log-bimodal distribution instead of a log-normal distribution.
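    The three averaging methods compared in this record can be illustrated on synthetic log-normally distributed "chlorophyll-like" data: the arithmetic mean (AVG), the geometric mean (GEO), and the lognormal maximum-likelihood estimator of the mean (MLE). The data below are simulated, not SeaWiFS values.

```python
# Arithmetic mean (AVG), geometric mean (GEO), and lognormal MLE of the mean
# on simulated log-normally distributed concentration data.
import numpy as np

rng = np.random.default_rng(5)
chl = rng.lognormal(mean=np.log(0.2), sigma=1.0, size=2000)   # mg/m^3, simulated

logs = np.log(chl)
avg = chl.mean()
geo = np.exp(logs.mean())
mle = np.exp(logs.mean() + logs.var() / 2)    # MLE of a lognormal mean

print(f"AVG = {avg:.3f}, GEO = {geo:.3f}, MLE = {mle:.3f}")
# GEO is systematically the lowest; AVG and MLE both target the true mean
# (here exp(log(0.2) + 0.5) ~ 0.33) but differ under strong skew.
```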

  17. Topics in Statistical Calibration

    DTIC Science & Technology

    2014-03-27

    on a parametric bootstrap where, instead of sampling directly from the residuals, samples are drawn from a normal distribution. This procedure will...addition to centering them (Davison and Hinkley, 1997). When there are outliers in the residuals, the bootstrap distribution of x̂0 can become skewed or...based and inversion methods using the linear mixed-effects model. Then, a simple parametric bootstrap algorithm is proposed that can be used to either

  18. ELISPOTs Produced by CD8 and CD4 Cells Follow Log Normal Size Distribution Permitting Objective Counting

    PubMed Central

    Karulin, Alexey Y.; Karacsony, Kinga; Zhang, Wenji; Targoni, Oleg S.; Moldovan, Ioana; Dittrich, Marcus; Sundararaman, Srividya; Lehmann, Paul V.

    2015-01-01

    Each positive well in ELISPOT assays contains spots of variable sizes that can range from tens of micrometers up to a millimeter in diameter. Therefore, when it comes to counting these spots, the decision on setting the lower and the upper spot size thresholds to discriminate between non-specific background noise, spots produced by individual T cells, and spots formed by T cell clusters is critical. If the spot sizes follow a known statistical distribution, precise predictions on minimal and maximal spot sizes, belonging to a given T cell population, can be made. We studied the size distributional properties of IFN-γ, IL-2, IL-4, IL-5 and IL-17 spots elicited in ELISPOT assays with PBMC from 172 healthy donors, upon stimulation with 32 individual viral peptides representing defined HLA Class I-restricted epitopes for CD8 cells, and with protein antigens of CMV and EBV activating CD4 cells. A total of 334 CD8 and 80 CD4 positive T cell responses were analyzed. In 99.7% of the test cases, spot size distributions followed a log-normal function. These data formally demonstrate that it is possible to establish objective, statistically validated parameters for counting T cell ELISPOTs. PMID:25612115
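
    A minimal sketch of the counting-gate idea is shown below: fit a log-normal to measured spot sizes and derive lower and upper thresholds as distribution percentiles. The spot areas, percentile choices and units are hypothetical, not taken from the study.

    ```python
    import numpy as np
    from scipy import stats

    # Fit a log-normal to ELISPOT spot sizes and derive objective counting
    # gates as distribution percentiles (synthetic sizes, e.g. in um^2).
    rng = np.random.default_rng(3)
    spot_areas = rng.lognormal(mean=4.0, sigma=0.6, size=400)

    shape, loc, scale = stats.lognorm.fit(spot_areas, floc=0)  # 2-parameter fit
    lower_gate = stats.lognorm.ppf(0.005, shape, loc, scale)   # below: likely noise
    upper_gate = stats.lognorm.ppf(0.995, shape, loc, scale)   # above: likely clusters

    # Goodness-of-fit check of the log-normal assumption
    ks = stats.kstest(spot_areas, 'lognorm', args=(shape, loc, scale))
    print(f"gates: [{lower_gate:.1f}, {upper_gate:.1f}]  KS p={ks.pvalue:.2f}")
    ```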

  19. Sampling errors for satellite-derived tropical rainfall - Monte Carlo study using a space-time stochastic model

    NASA Technical Reports Server (NTRS)

    Bell, Thomas L.; Abdullah, A.; Martin, Russell L.; North, Gerald R.

    1990-01-01

    Estimates of monthly average rainfall based on satellite observations from a low earth orbit will differ from the true monthly average because the satellite observes a given area only intermittently. This sampling error inherent in satellite monitoring of rainfall would occur even if the satellite instruments could measure rainfall perfectly. The size of this error is estimated for a satellite system being studied at NASA, the Tropical Rainfall Measuring Mission (TRMM). First, the statistical description of rainfall on scales from 1 to 1000 km is examined in detail, based on rainfall data from the Global Atmospheric Research Program Atlantic Tropical Experiment (GATE). A TRMM-like satellite is flown over a two-dimensional time-evolving simulation of rainfall using a stochastic model with statistics tuned to agree with GATE statistics. The distribution of sampling errors found from many months of simulated observations is found to be nearly normal, even though the distribution of area-averaged rainfall is far from normal. For a range of orbits likely to be employed in TRMM, sampling error is found to be less than 10 percent of the mean for rainfall averaged over a 500 x 500 sq km area.

  20. Robust LOD scores for variance component-based linkage analysis.

    PubMed

    Blangero, J; Williams, J T; Almasy, L

    2000-01-01

    The variance component method is now widely used for linkage analysis of quantitative traits. Although this approach offers many advantages, the importance of the underlying assumption of multivariate normality of the trait distribution within pedigrees has not been studied extensively. Simulation studies have shown that traits with leptokurtic distributions yield linkage test statistics that exhibit excessive Type I error when analyzed naively. We derive analytical formulae relating the deviation from the expected asymptotic distribution of the lod score to the kurtosis and total heritability of the quantitative trait. A simple correction constant yields a robust lod score for any deviation from normality and for any pedigree structure, and effectively eliminates the problem of inflated Type I error due to misspecification of the underlying probability model in variance component-based linkage analysis.

  1. Comparison of Grand Median and Cumulative Sum Control Charts on Shuttlecock Weight Variable in CV Marjoko Kompas dan Domas

    NASA Astrophysics Data System (ADS)

    Musdalifah, N.; Handajani, S. S.; Zukhronah, E.

    2017-06-01

    Competition among similar companies forces each company to maintain production quality. To address this, the company controls production with statistical quality control using control charts. The Shewhart control chart is intended for normally distributed data, but production data often follow a non-normal distribution and exhibit small process shifts. The grand median control chart is a control chart for non-normally distributed data, while the cumulative sum (cusum) control chart is sensitive to small process shifts. The purpose of this research is to compare grand median and cusum control charts on the shuttlecock weight variable in CV Marjoko Kompas dan Domas by generating data according to the actual distribution. The generated data are used to simulate the multiplier of the standard deviation on the grand median and cusum control charts. The simulation is run to obtain an average run length (ARL) of 370. The grand median control chart detects ten points that are out of control, while the cusum control chart detects one point out of control. It is concluded that the grand median control chart performs better than the cusum control chart.
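
    The ARL-370 calibration mentioned above can be illustrated with a standard tabular CUSUM chart; the sketch below is generic (not the thesis' grand median chart or its calibrated multipliers), and the reference value k and decision interval h are the textbook choices that give an in-control ARL near 370 for normal data.

    ```python
    import numpy as np

    # In-control and out-of-control average run length (ARL) of a tabular
    # CUSUM chart, estimated by simulation. k and h are illustrative defaults.
    rng = np.random.default_rng(4)

    def cusum_run_length(mu0=0.0, sigma=1.0, k=0.5, h=4.77, shift=0.0, max_n=100_000):
        cp = cm = 0.0
        for n in range(1, max_n + 1):
            z = (rng.normal(mu0 + shift * sigma, sigma) - mu0) / sigma
            cp = max(0.0, cp + z - k)      # upper CUSUM statistic
            cm = max(0.0, cm - z - k)      # lower CUSUM statistic
            if cp > h or cm > h:
                return n
        return max_n

    arl0 = np.mean([cusum_run_length() for _ in range(2000)])
    arl1 = np.mean([cusum_run_length(shift=1.0) for _ in range(2000)])
    print(f"in-control ARL ~ {arl0:.0f} (target 370); ARL at a 1-sigma shift ~ {arl1:.1f}")
    ```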

  2. A Robust Alternative to the Normal Distribution.

    DTIC Science & Technology

    1982-07-07

    for any Purpose of the United States Government. DEPARTMENT OF STATISTICS, STANFORD UNIVERSITY, STANFORD, CALIFORNIA. A Robust Alternative to the...Stanford University Technical Report No. 3. [5] Bhattacharya, S. K. (1966). A Modified Bessel Function Model in Life Testing. Metrika 10, 133-144

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Simonen, F.A.; Khaleel, M.A.

    This paper describes a statistical evaluation of the through-thickness copper variation for welds in reactor pressure vessels, and reviews the historical basis for the static and arrest fracture toughness (K{sub Ic} and K{sub Ia}) equations used in the VISA-II code. Copper variability in welds is due to fabrication procedures, with copper contents randomly distributed and variable from one location to another through the thickness of the vessel. The VISA-II procedure of sampling the copper content from a statistical distribution for every 6.35- to 12.7-mm (1/4- to 1/2-in.) layer through the thickness was found to be consistent with the statistical observations. However, the parameters of the VISA-II distribution and statistical limits required further investigation. Copper contents at a few locations through the thickness were found to exceed the 0.4% upper limit of the VISA-II code. The data also suggest that the mean copper content varies systematically through the thickness. While the assumption of normality is not clearly supported by the available data, a statistical evaluation based on all the available data results in means and standard deviations within the VISA-II code limits.

  4. Generalized ensemble theory with non-extensive statistics

    NASA Astrophysics Data System (ADS)

    Shen, Ke-Ming; Zhang, Ben-Wei; Wang, En-Ke

    2017-12-01

    The non-extensive canonical ensemble theory is reconsidered with the method of Lagrange multipliers by maximizing the Tsallis entropy, with the constraint that the normalized term of Tsallis' q-average of physical quantities, the sum ∑_j p_j^q, is independent of the probability p_i for Tsallis parameter q. The self-referential problem in the deduced probability and thermal quantities in non-extensive statistics is thus avoided, and thermodynamical relationships are obtained in a consistent and natural way. We also extend the study to the non-extensive grand canonical ensemble theory and obtain the q-deformed Bose-Einstein distribution as well as the q-deformed Fermi-Dirac distribution. The theory is further applied to the generalized Planck law to demonstrate the distinct behaviors of the various generalized q-distribution functions discussed in the literature.

  5. GTest: a software tool for graphical assessment of empirical distributions' Gaussianity.

    PubMed

    Barca, E; Bruno, E; Bruno, D E; Passarella, G

    2016-03-01

    In the present paper, the novel software GTest is introduced, designed for testing the normality of a user-specified empirical distribution. It has been implemented with two unusual characteristics: the first is the user option of selecting four different versions of the normality test, each of them suited to a specific dataset or goal, and the second is the inferential paradigm that informs the output of such tests, which is basically graphical and intrinsically self-explanatory. The concept of inference-by-eye is an emerging inferential approach that is likely to find successful application in the near future, given the growing need to widen the audience of users of statistical methods to people with informal statistical skills. For instance, the latest European regulation concerning environmental issues introduced strict protocols for data handling (data quality assurance, outlier detection, etc.) and information exchange (areal statistics, trend detection, etc.) between regional and central environmental agencies. Therefore, more and more frequently, laboratory and field technicians will be asked to use complex software applications for subjecting data coming from monitoring, surveying or laboratory activities to specific statistical analyses. Unfortunately, inferential statistics, which actually influence the decision processes for the correct management of environmental resources, are often implemented in a way that expresses their outcomes in numerical form with brief comments in strict statistical jargon (degrees of freedom, level of significance, accepted/rejected H0, etc.). The interpretation of such outcomes is therefore often difficult for people with limited statistical knowledge. In this framework, the visual-inference paradigm can help fill this gap, providing outcomes in self-explanatory graphical form with a brief comment in common language. The difficulties experienced by colleagues, and their request for an effective tool to address them, motivated us to adopt the inference-by-eye paradigm and to implement an easy-to-use, quick and reliable statistical tool. GTest visualizes its outcomes as a modified version of the Q-Q plot. The application has been developed in Visual Basic for Applications (VBA) within MS Excel 2010, which proved to have all the robustness and reliability needed. GTest provides true graphical normality tests which are as reliable as any quantitative statistical approach but much easier to understand. The Q-Q plots have been integrated with the outlining of an acceptance region around the representation of the theoretical distribution, defined in accordance with the alpha level of significance and the data sample size. The test decision rule is the following: if the empirical scatterplot falls completely within the acceptance region, then it can be concluded that the empirical distribution fits the theoretical one at the given alpha level. A comprehensive case study has been carried out with simulated and real-world data in order to check the robustness and reliability of the software.
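
    The acceptance-region idea can be imitated outside Excel/VBA with a simulated envelope around a normal Q-Q plot; the sketch below mimics the decision rule described above but is not GTest's algorithm, and the sample data, alpha level and number of simulations are all illustrative.

    ```python
    import numpy as np
    import matplotlib.pyplot as plt

    # Normal Q-Q plot with a simulated (1 - alpha) acceptance envelope.
    rng = np.random.default_rng(5)
    data = rng.normal(10.0, 2.0, size=60)          # sample under test (illustrative)
    n, alpha, n_sim = len(data), 0.05, 2000

    mu, sd = data.mean(), data.std(ddof=1)
    sim_order = np.sort(rng.standard_normal((n_sim, n)), axis=1)  # simulated order stats
    expected = mu + sd * np.mean(sim_order, axis=0)
    lower = mu + sd * np.quantile(sim_order, alpha / 2, axis=0)
    upper = mu + sd * np.quantile(sim_order, 1 - alpha / 2, axis=0)

    plt.plot(expected, np.sort(data), 'k*', label='empirical quantiles')
    plt.fill_between(expected, lower, upper, alpha=0.3, label='acceptance region')
    plt.plot(expected, expected, 'b--', label='theoretical normal')
    plt.xlabel('theoretical quantiles'); plt.ylabel('sample quantiles')
    plt.legend(); plt.title('Points inside the band: normality not rejected')
    plt.show()
    ```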

  6. Empirical study of the tails of mutual fund size

    NASA Astrophysics Data System (ADS)

    Schwarzkopf, Yonathan; Farmer, J. Doyne

    2010-06-01

    The mutual fund industry manages about a quarter of the assets in the U.S. stock market and thus plays an important role in the U.S. economy. The question of how much control is concentrated in the hands of the largest players is best quantitatively discussed in terms of the tail behavior of the mutual fund size distribution. We study the distribution empirically and show that the tail is much better described by a log-normal than a power law, indicating less concentration than, for example, personal income. The results are highly statistically significant and are consistent across fifteen years. This contradicts a recent theory concerning the origin of the power law tails of the trading volume distribution. Based on the analysis in a companion paper, the log-normality is to be expected, and indicates that the distribution of mutual funds remains perpetually out of equilibrium.

  7. Probabilistic model of bridge vehicle loads in port area based on in-situ load testing

    NASA Astrophysics Data System (ADS)

    Deng, Ming; Wang, Lei; Zhang, Jianren; Wang, Rei; Yan, Yanhong

    2017-11-01

    Vehicle load is an important factor affecting the safety and usability of bridges. A statistical analysis is carried out in this paper of the vehicle load data for the Tianjin Haibin highway in the Tianjin port of China, collected by a Weigh-in-Motion (WIM) system. Following this, the effect of the vehicle load on a test bridge is calculated and then compared with the result calculated according to HL-93 (AASHTO LRFD). Results show that the overall vehicle load follows a weighted sum of four normal distributions. The maximum vehicle load during the design reference period follows a type I extreme value distribution. The vehicle load effect also follows a weighted sum of four normal distributions, and the standard value of the vehicle load is recommended as 1.8 times the value calculated according to HL-93.
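
    A "weighted sum of four normal distributions" is a four-component Gaussian mixture; the sketch below fits one with scikit-learn to synthetic gross-weight data standing in for the WIM records (component means, weights and units are invented, not the paper's estimates).

    ```python
    import numpy as np
    from sklearn.mixture import GaussianMixture

    # Fit a 4-component normal mixture to synthetic vehicle gross weights.
    rng = np.random.default_rng(6)
    weights_kN = np.concatenate([
        rng.normal(30, 5, 4000),    # cars / light vans
        rng.normal(120, 15, 2500),  # 2-axle trucks
        rng.normal(250, 25, 2000),  # loaded multi-axle trucks
        rng.normal(420, 30, 1500),  # heavy port container traffic
    ]).reshape(-1, 1)

    gmm = GaussianMixture(n_components=4, random_state=0).fit(weights_kN)
    for w, m, v in zip(gmm.weights_, gmm.means_.ravel(), gmm.covariances_.ravel()):
        print(f"weight={w:.2f}  mean={m:.1f} kN  sd={np.sqrt(v):.1f} kN")
    ```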

  8. Measuring Treasury Bond Portfolio Risk and Portfolio Optimization with a Non-Gaussian Multivariate Model

    NASA Astrophysics Data System (ADS)

    Dong, Yijun

    Research on measuring the risk of a bond portfolio and on bond portfolio optimization was previously relatively rare, because the risk factors of bond portfolios are not very volatile. However, this condition has changed recently. The 2008 financial crisis brought high volatility to the risk factors and the related bond securities, even for highly rated U.S. Treasury bonds. Moreover, the risk factors of bond portfolios show fat tails and asymmetry, like the risk factors of equity portfolios. Therefore, we need to use advanced techniques to measure and manage the risk of bond portfolios. In our paper, we first apply an autoregressive moving average-generalized autoregressive conditional heteroscedasticity (ARMA-GARCH) model with multivariate normal tempered stable (MNTS) distribution innovations to predict the risk factors of U.S. Treasury bonds and statistically demonstrate that the MNTS distribution has the ability to capture the properties of the risk factors, based on goodness-of-fit tests. Then, based on empirical evidence, we find that the VaR and AVaR estimated by assuming a normal tempered stable distribution are more realistic and reliable than those estimated by assuming a normal distribution, especially for the financial crisis period. Finally, we use mean-risk portfolio optimization to minimize the portfolios' potential risks. The empirical study indicates that the optimized bond portfolios have better risk-adjusted performance than the benchmark portfolios for some periods. Moreover, the optimized bond portfolios obtained by assuming a normal tempered stable distribution show improved performance in comparison with those obtained by assuming a normal distribution.

  9. Inverse statistics in the foreign exchange market

    NASA Astrophysics Data System (ADS)

    Jensen, M. H.; Johansen, A.; Petroni, F.; Simonsen, I.

    2004-09-01

    We investigate intra-day foreign exchange (FX) time series using the inverse statistic analysis developed by Simonsen et al. (Eur. Phys. J. 27 (2002) 583) and Jensen et al. (Physica A 324 (2003) 338). Specifically, we study the time-averaged distributions of waiting times needed to obtain a certain increase (decrease) ρ in the price of an investment. The analysis is performed for the Deutsche Mark (DM) against the US dollar for the full year of 1998, but similar results are obtained for the Japanese Yen against the US dollar. With high statistical significance, the presence of “resonance peaks” in the waiting time distributions is established. Such peaks are a consequence of the trading habits of the market participants, as they are not present in the corresponding tick (business) waiting time distributions. Furthermore, a new stylized fact is observed for the (normalized) waiting time distribution in the form of a power-law pdf. This result is achieved by rescaling the physical waiting time by the corresponding tick time, thereby partially removing scale-dependent features of the market activity.

  10. Multiplicative processes in visual cognition

    NASA Astrophysics Data System (ADS)

    Credidio, H. F.; Teixeira, E. N.; Reis, S. D. S.; Moreira, A. A.; Andrade, J. S.

    2014-03-01

    The Central Limit Theorem (CLT) is certainly one of the most important results in the field of statistics. The simple fact that the addition of many random variables can generate the same probability curve elucidated the underlying process for a broad spectrum of natural systems, ranging from the statistical distribution of human heights to the distribution of measurement errors, to mention a few. An extension of the CLT can be applied to multiplicative processes, where a given measure is the result of the product of many random variables. The statistical signature of these processes is rather ubiquitous, appearing in a diverse range of natural phenomena, including the distributions of incomes, body weights, rainfall, and fragment sizes in a rock crushing process. Here we corroborate results from previous studies which indicate the presence of multiplicative processes in a particular type of visual cognition task, namely, the visual search for hidden objects. Precisely, our results from eye-tracking experiments show that the distribution of fixation times during visual search obeys a log-normal pattern, while the fixational radii of gyration follow a power-law behavior.
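
    The multiplicative extension of the CLT mentioned above is easy to demonstrate numerically: the product of many independent positive factors is approximately log-normal, because its logarithm is a sum of many independent terms. The factor distribution and counts below are arbitrary choices for illustration.

    ```python
    import numpy as np
    from scipy import stats

    # Product of many independent positive random factors -> log-normal.
    rng = np.random.default_rng(7)
    factors = rng.uniform(0.5, 1.5, size=(50_000, 40))   # 40 multiplicative steps
    products = factors.prod(axis=1)

    # The log of the product should look approximately normal
    print("normality test on log(product), p =",
          stats.normaltest(np.log(products)).pvalue)
    shape, loc, scale = stats.lognorm.fit(products, floc=0)
    print("fitted log-normal sigma:", shape)
    ```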

  11. Superstatistics analysis of the ion current distribution function: Met3PbCl influence study.

    PubMed

    Miśkiewicz, Janusz; Trela, Zenon; Przestalski, Stanisław; Karcz, Waldemar

    2010-09-01

    A novel analysis of ion current time series is proposed. It is shown that higher (second, third and fourth) statistical moments of the ion current probability distribution function (PDF) can yield new information about ion channel properties. The method is illustrated on a two-state model where the PDFs of the compound states are given by normal distributions. The proposed method was applied to the analysis of the SV cation channels of the vacuolar membrane of Beta vulgaris and the influence of trimethyllead chloride (Met(3)PbCl) on the ion current probability distribution. Ion currents were measured by the patch-clamp technique. It was shown that Met(3)PbCl influences the variance of the open-state ion current but does not alter the PDF of the closed-state ion current. Incorporation of higher statistical moments into the standard investigation of ion channel properties is proposed.

  12. Statistics of backscatter radar return from vegetation

    NASA Technical Reports Server (NTRS)

    Karam, M. A.; Chen, K. S.; Fung, A. K.

    1992-01-01

    The statistical characteristics of radar return from vegetation targets are investigated through a simulation study based upon the first-order scattered field. For simulation purposes, the vegetation targets are modeled as a layer of randomly oriented and spaced finite cylinders, needles, or discs, or a combination of them. The finite cylinder is used to represent a branch or a trunk, the needle for a stem or a coniferous leaf, and the disc for a deciduous leaf. For a plane wave illuminating a vegetation canopy, simulation results show that the signal returned from a layer of disc- or needle-shaped leaves follows the Gamma distribution, and that the signal returned from a layer of branches resembles the log normal distribution. The Gamma distribution also represents the signal returned from a layer of a mixture of branches and leaves regardless of the leaf shapes. Results also indicate that the polarization state does not have a significant impact on signal distribution.

  13. A methodology for the stochastic generation of hourly synthetic direct normal irradiation time series

    NASA Astrophysics Data System (ADS)

    Larrañeta, M.; Moreno-Tejera, S.; Lillo-Bravo, I.; Silva-Pérez, M. A.

    2018-02-01

    Many of the available solar radiation databases only provide global horizontal irradiance (GHI), while there is a growing need for extensive databases of direct normal irradiance (DNI), mainly for the development of concentrated solar power and concentrated photovoltaic technologies. In the present work, we propose a methodology for the generation of synthetic DNI hourly data from hourly average GHI values by dividing the irradiance into a deterministic and a stochastic component, intending to emulate the dynamics of the solar radiation. The deterministic component is modeled through a simple classical model. The stochastic component is fitted to measured data in order to maintain the consistency of the synthetic data with the state of the sky, generating statistically significant DNI data with a cumulative frequency distribution very similar to that of the measured data. The adaptation and application of the model to the location of Seville shows significant improvements in terms of frequency distribution over the classical models. Applied to other locations with different climatological characteristics, the proposed methodology also yields better results than the classical models in terms of frequency distribution, reaching a 50% reduction in the Finkelstein-Schafer (FS) and Kolmogorov-Smirnov test integral (KSI) statistics.

  14. Alpha, delta and theta rhythms in a neural net model. Comparison with MEG data.

    PubMed

    Kotini, A; Anninos, P

    2016-01-07

    The aim of this study is to provide information regarding the comparison of a neural model to MEG measurements. Our study population consisted of 10 epileptic patients and 10 normal subjects. The epileptic patients had high MEG amplitudes characterized by θ (4-7 Hz) or δ (2-3 Hz) rhythms and absence of the α-rhythm (8-13 Hz). The statistical distribution of such activities corresponded to a Poisson distribution. Conversely, the MEG from normal subjects had low amplitudes, higher frequencies and presence of the α-rhythm (8-13 Hz). Such activities were not synchronized and their distributions were Gaussian. These findings were in agreement with our theoretical neural model. The comparison of the neural network with MEG data provides information about the status of brain function in epileptic and normal states. Copyright © 2015 Elsevier Ltd. All rights reserved.

  15. Factoring handedness data: II. Geschwind's multidimensional hypothesis.

    PubMed

    Messinger, H B; Messinger, M I

    1996-06-01

    The challenge in this journal by Peters and Murphy to the validity of two published factor analyses of handedness data because of bimodality was dealt with in Part I by identifying measures to normalize the handedness item distributions. A new survey using Oldfield's questionnaire format had 38 bell-shaped (unimodal) handedness-item distributions and 11 that were only marginally bimodal out of the 55 items used in Geschwind's 1986 study. Yet they were still non-normal and the factor analysis was unsatisfactory; bimodality is not the only problem. By choosing a transformation for each item that was optimal as assessed by D'Agostino's K2 statistic, all but two items could be normalized. Seven factors were derived that showed high congruence between maximum likelihood and principal components extractions before and after varimax rotation. Geschwind's assertion that handedness is not unidimensional is therefore supported.

  16. A comparison of likelihood ratio tests and Rao's score test for three separable covariance matrix structures.

    PubMed

    Filipiak, Katarzyna; Klein, Daniel; Roy, Anuradha

    2017-01-01

    The problem of testing the separability of a covariance matrix against an unstructured variance-covariance matrix is studied in the context of multivariate repeated measures data using Rao's score test (RST). The RST statistic is developed with the first component of the separable structure as a first-order autoregressive (AR(1)) correlation matrix or an unstructured (UN) covariance matrix under the assumption of multivariate normality. It is shown that the distribution of the RST statistic under the null hypothesis of any separability does not depend on the true values of the mean or the unstructured components of the separable structure. A significant advantage of the RST is that it can be performed for small samples, even smaller than the dimension of the data, where the likelihood ratio test (LRT) cannot be used, and it outperforms the standard LRT in a number of contexts. Monte Carlo simulations are then used to study the comparative behavior of the null distribution of the RST statistic, as well as that of the LRT statistic, in terms of sample size considerations, and for the estimation of the empirical percentiles. Our findings are compared with existing results where the first component of the separable structure is a compound symmetry (CS) correlation matrix. It is also shown by simulations that the empirical null distribution of the RST statistic converges faster than the empirical null distribution of the LRT statistic to the limiting χ 2 distribution. The tests are implemented on a real dataset from medical studies. © 2016 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  17. Analyzing Large Gene Expression and Methylation Data Profiles Using StatBicRM: Statistical Biclustering-Based Rule Mining

    PubMed Central

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from biological datasets. At first, a novel statistical strategy is utilized to eliminate insignificant/low-significant/redundant genes in such a way that the significance level satisfies the data distribution property (viz., either normal distribution or non-normal distribution). The data are then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to assess how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors, with those of other rule-based classifiers. Statistical significance tests are also performed to verify the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers also starts with the same post-discretized data matrix. Finally, we have also included an integrated analysis of gene expression and methylation for determining the epigenetic effect (viz., the effect of methylation) on gene expression level. PMID:25830807

  18. Analyzing large gene expression and methylation data profiles using StatBicRM: statistical biclustering-based rule mining.

    PubMed

    Maulik, Ujjwal; Mallik, Saurav; Mukhopadhyay, Anirban; Bandyopadhyay, Sanghamitra

    2015-01-01

    Microarray and beadchip are two of the most efficient techniques for measuring gene expression and methylation data in bioinformatics. Biclustering deals with the simultaneous clustering of genes and samples. In this article, we propose a computational rule mining framework, StatBicRM (i.e., statistical biclustering-based rule mining), to identify special types of rules and potential biomarkers using integrated approaches of statistical and binary inclusion-maximal biclustering techniques from biological datasets. At first, a novel statistical strategy is utilized to eliminate insignificant/low-significant/redundant genes in such a way that the significance level satisfies the data distribution property (viz., either normal distribution or non-normal distribution). The data are then discretized and post-discretized, consecutively. Thereafter, the biclustering technique is applied to identify maximal frequent closed homogeneous itemsets. Corresponding special types of rules are then extracted from the selected itemsets. Our proposed rule mining method performs better than other rule mining algorithms as it generates maximal frequent closed homogeneous itemsets instead of frequent itemsets. Thus, it saves elapsed time and can work on big datasets. Pathway and Gene Ontology analyses are conducted on the genes of the evolved rules using the DAVID database. Frequency analysis of the genes appearing in the evolved rules is performed to determine potential biomarkers. Furthermore, we also classify the data to assess how accurately the evolved rules describe the remaining test (unknown) data. Subsequently, we also compare the average classification accuracy, and other related factors, with those of other rule-based classifiers. Statistical significance tests are also performed to verify the statistical relevance of the comparative results. Here, each of the other rule mining methods or rule-based classifiers also starts with the same post-discretized data matrix. Finally, we have also included an integrated analysis of gene expression and methylation for determining the epigenetic effect (viz., the effect of methylation) on gene expression level.

  19. Functional Relationships and Regression Analysis.

    ERIC Educational Resources Information Center

    Preece, Peter F. W.

    1978-01-01

    Using a degenerate multivariate normal model for the distribution of organismic variables, the form of least-squares regression analysis required to estimate a linear functional relationship between variables is derived. It is suggested that the two conventional regression lines may be considered to describe functional, not merely statistical,…

  20. Spear Phishing Attack Detection

    DTIC Science & Technology

    2011-03-24

    the insider amongst senior leaders of an organization [Mes08], the undercover detective within a drug cartel, or the classic secret agent planted in...to a mimicry attack that shapes the embedded malware to have a statistical distribution similar to "normal" or benign behavior.

  1. Bayesian alternative to the ISO-GUM's use of the Welch Satterthwaite formula

    NASA Astrophysics Data System (ADS)

    Kacker, Raghu N.

    2006-02-01

    In certain disciplines, uncertainty is traditionally expressed as an interval about an estimate for the value of the measurand. Development of such uncertainty intervals with a stated coverage probability based on the International Organization for Standardization (ISO) Guide to the Expression of Uncertainty in Measurement (GUM) requires a description of the probability distribution for the value of the measurand. The ISO-GUM propagates the estimates and their associated standard uncertainties for various input quantities through a linear approximation of the measurement equation to determine an estimate and its associated standard uncertainty for the value of the measurand. This procedure does not yield a probability distribution for the value of the measurand. The ISO-GUM suggests that under certain conditions motivated by the central limit theorem the distribution for the value of the measurand may be approximated by a scaled-and-shifted t-distribution with effective degrees of freedom obtained from the Welch-Satterthwaite (W-S) formula. The approximate t-distribution may then be used to develop an uncertainty interval with a stated coverage probability for the value of the measurand. We propose an approximate normal distribution based on a Bayesian uncertainty as an alternative to the t-distribution based on the W-S formula. A benefit of the approximate normal distribution based on a Bayesian uncertainty is that it greatly simplifies the expression of uncertainty by eliminating altogether the need for calculating effective degrees of freedom from the W-S formula. In the special case where the measurand is the difference between two means, each evaluated from statistical analyses of independent normally distributed measurements with unknown and possibly unequal variances, the probability distribution for the value of the measurand is known to be a Behrens-Fisher distribution. We compare the performance of the approximate normal distribution based on a Bayesian uncertainty and the approximate t-distribution based on the W-S formula with respect to the Behrens-Fisher distribution. The approximate normal distribution is simpler and better in this case. A thorough investigation of the relative performance of the two approximate distributions would require comparison for a range of measurement equations by numerical methods.
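
    The ISO-GUM route contrasted above with the Bayesian alternative can be sketched numerically; the uncertainties, degrees of freedom and sensitivity coefficients below are illustrative values, not from the paper, and the final comment states the alternative only in its simplest form.

    ```python
    import numpy as np
    from scipy import stats

    # Combine standard uncertainties, compute Welch-Satterthwaite effective
    # degrees of freedom, and form a 95 % coverage interval from the
    # scaled-and-shifted t-distribution.
    u = np.array([0.12, 0.08, 0.05])      # standard uncertainties of the inputs
    nu = np.array([9, 14, 4])             # their degrees of freedom
    c = np.array([1.0, 1.0, 1.0])         # sensitivity coefficients

    u_c = np.sqrt(np.sum((c * u) ** 2))                  # combined standard uncertainty
    nu_eff = u_c**4 / np.sum((c * u) ** 4 / nu)          # Welch-Satterthwaite formula
    k = stats.t.ppf(0.975, df=nu_eff)                    # 95 % coverage factor
    print(f"u_c={u_c:.3f}, nu_eff={nu_eff:.1f}, expanded uncertainty U={k*u_c:.3f}")

    # The Bayesian alternative described above would instead use a normal
    # distribution, i.e. k = stats.norm.ppf(0.975), with no nu_eff needed.
    ```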

  2. Algae Tile Data: 2004-2007, BPA-51; Preliminary Report, October 28, 2008.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Holderman, Charles

    Multiple files containing 2004 through 2007 Tile Chlorophyll data for the Kootenai River sites designated as KR1, KR2, KR3, KR4 (Downriver) and KR6, KR7, KR9, KR9.1, KR10, KR11, KR12, KR13, KR14 (Upriver) were received by SCS. For a complete description of the sites covered, please refer to http://ktoi.scsnetw.com. To maintain consistency with the previous SCS algae reports, all analyses were carried out separately for the Upriver and Downriver categories, as defined in the aforementioned paragraph. The Upriver designation, however, now includes three additional sites: KR11, KR12, and the nutrient addition site, KR9.1. Summary statistics and information on the four responses (chlorophyll a, chlorophyll a Accrual Rate, Total Chlorophyll, and Total Chlorophyll Accrual Rate) are presented in Print Out 2. Computations were carried out separately for each river position (Upriver and Downriver) and year. For example, the Downriver position in 2004 showed an average Chlorophyll a level of 25.5 mg with a standard deviation of 21.4 and minimum and maximum values of 3.1 and 196 mg, respectively. The Upriver data in 2004 showed a lower overall average chlorophyll a level at 2.23 mg with a lower standard deviation (3.6) and minimum and maximum values of 0.13 and 28.7, respectively. A more comprehensive summary of each variable and position is given in Print Out 3. This lists the information above as well as other summary information such as the variance, standard error, various percentiles and extreme values. Using the 2004 Downriver Chlorophyll a as an example again, the variance of this data was 459.3 and the standard error of the mean was 1.55. The median value or 50th percentile was 21.3, meaning 50% of the data fell above and below this value. It should be noted that this value is somewhat different from the mean of 25.5. This is an indication that the frequency distribution of the data is not symmetrical (skewed). The skewness statistic, listed as part of the first section of each analysis, quantifies this. In a symmetric distribution, such as a Normal distribution, the skewness value would be 0. The tile chlorophyll data, however, shows larger values. Chlorophyll a, in the 2004 Downriver example, has a skewness statistic of 3.54, which is quite high. In the last section of the summary analysis, the stem and leaf plot graphically demonstrates the asymmetry, showing most of the data centered around 25 with a large value at 196. The final plot is referred to as a normal probability plot and graphically compares the data to a theoretical normal distribution. For chlorophyll a, the data (asterisks) deviate substantially from the theoretical normal distribution (diagonal reference line of pluses), indicating that the data is non-normal. Other response variables in both the Downriver and Upriver categories also indicated skewed distributions. Because the sample size and mean comparison procedures below require symmetrical, normally distributed data, each response in the data set was logarithmically transformed. The logarithmic transformation, in this case, can help mitigate skewness problems. The summary statistics for the four transformed responses (log-ChlorA, log-TotChlor, and log-accrual) are given in Print Out 4. For the 2004 Downriver Chlorophyll a data, the logarithmic transformation reduced the skewness value to -0.36 and produced a more bell-shaped symmetric frequency distribution. Similar improvements are shown for the remaining variables and river categories. Hence, all subsequent analyses given below are based on logarithmic transformations of the original responses.

  3. Statistical approaches for the determination of cut points in anti-drug antibody bioassays.

    PubMed

    Schaarschmidt, Frank; Hofmann, Matthias; Jaki, Thomas; Grün, Bettina; Hothorn, Ludwig A

    2015-03-01

    Cut points in immunogenicity assays are used to classify future specimens into anti-drug antibody (ADA) positive or negative. To determine a cut point during pre-study validation, drug-naive specimens are often analyzed on multiple microtiter plates taking sources of future variability into account, such as runs, days, analysts, gender, drug-spiked and the biological variability of un-spiked specimens themselves. Five phenomena may complicate the statistical cut point estimation: i) drug-naive specimens may contain already ADA-positives or lead to signals that erroneously appear to be ADA-positive, ii) mean differences between plates may remain after normalization of observations by negative control means, iii) experimental designs may contain several factors in a crossed or hierarchical structure, iv) low sample sizes in such complex designs lead to low power for pre-tests on distribution, outliers and variance structure, and v) the choice between normal and log-normal distribution has a serious impact on the cut point. We discuss statistical approaches to account for these complex data: i) mixture models, which can be used to analyze sets of specimens containing an unknown, possibly larger proportion of ADA-positive specimens, ii) random effects models, followed by the estimation of prediction intervals, which provide cut points while accounting for several factors, and iii) diagnostic plots, which allow the post hoc assessment of model assumptions. All methods discussed are available in the corresponding R add-on package mixADA. Copyright © 2015 Elsevier B.V. All rights reserved.

  4. Fault detection and diagnosis using neural network approaches

    NASA Technical Reports Server (NTRS)

    Kramer, Mark A.

    1992-01-01

    Neural networks can be used to detect and identify abnormalities in real-time process data. Two basic approaches can be used, the first based on training networks using data representing both normal and abnormal modes of process behavior, and the second based on statistical characterization of the normal mode only. Given data representative of process faults, radial basis function networks can effectively identify failures. This approach is often limited by the lack of fault data, but can be facilitated by process simulation. The second approach employs elliptical and radial basis function neural networks and other models to learn the statistical distributions of process observables under normal conditions. Analytical models of failure modes can then be applied in combination with the neural network models to identify faults. Special methods can be applied to compensate for sensor failures, to produce real-time estimation of missing or failed sensors based on the correlations codified in the neural network.

  5. Ocular biometric parameters among 3-year-old Chinese children: testability, distribution and association with anthropometric parameters

    PubMed Central

    Huang, Dan; Chen, Xuejuan; Gong, Qi; Yuan, Chaoqun; Ding, Hui; Bai, Jing; Zhu, Hui; Fu, Zhujun; Yu, Rongbin; Liu, Hu

    2016-01-01

    This survey was conducted to determine the testability, distribution and associations of ocular biometric parameters in Chinese preschool children. Ocular biometric examinations, including the axial length (AL) and corneal radius of curvature (CR), were conducted on 1,688 3-year-old subjects by using an IOLMaster in August 2015. Anthropometric parameters, including height and weight, were measured according to a standardized protocol, and body mass index (BMI) was calculated. The testability was 93.7% for the AL and 78.6% for the CR overall, and both measures improved with age. Girls performed slightly better in AL measurements (P = 0.08), and the difference in CR was statistically significant (P < 0.05). The AL distribution was normal in girls (P = 0.12), whereas it was not in boys (P < 0.05). For CR1, all subgroups presented normal distributions (P = 0.16 for boys; P = 0.20 for girls), but the combined distribution departed from normality (P < 0.05). CR2 presented a normal distribution (P = 0.11), whereas the AL/CR ratio was not normally distributed (P < 0.001). Boys exhibited a significantly longer AL, a greater CR and a greater AL/CR ratio than girls (all P < 0.001). PMID:27384307

  6. The Statistical Nature of Fatigue Crack Propagation

    DTIC Science & Technology

    1977-03-01

    AFFDL-TRt-T843 THE STATISTICAL NATURE OF FATIGUE CRACK PROPAGATION. D. A. VIRKLER, B. M. HILLBERRY, P. K. GOEL, SCHOOL...function of crack length was best represented by the three-parameter log-normal distribution. Six growth rate calculation methods were investigated and the...dN, which varied moderately as a function of crack length, replicate a vs. N data were predicted. This predicted data reproduced the mean behavior but

  7. Application of survival analysis methodology to the quantitative analysis of LC-MS proteomics data.

    PubMed

    Tekwe, Carmen D; Carroll, Raymond J; Dabney, Alan R

    2012-08-01

    Protein abundance in quantitative proteomics is often based on observed spectral features derived from liquid chromatography mass spectrometry (LC-MS) or LC-MS/MS experiments. Peak intensities are largely non-normal in distribution. Furthermore, LC-MS-based proteomics data frequently have large proportions of missing peak intensities due to censoring mechanisms on low-abundance spectral features. Recognizing that the observed peak intensities detected with the LC-MS method are all positive, skewed and often left-censored, we propose using survival methodology to carry out differential expression analysis of proteins. Various standard statistical techniques, including non-parametric tests such as the Kolmogorov-Smirnov and Wilcoxon-Mann-Whitney rank sum tests, and the parametric survival and accelerated failure time (AFT) models with log-normal, log-logistic and Weibull distributions, were used to detect any differentially expressed proteins. The statistical operating characteristics of each method are explored using both real and simulated datasets. Survival methods generally have greater statistical power than standard differential expression methods when the proportion of missing protein level data is 5% or more. In particular, the AFT models we consider consistently achieve greater statistical power than standard testing procedures, with the discrepancy widening as the proportion of missingness increases. The testing procedures discussed in this article can all be performed using readily available software such as R. The R codes are provided as supplemental materials. ctekwe@stat.tamu.edu.
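
    The left-censoring idea can be sketched directly as a censored log-normal maximum-likelihood fit, which is the likelihood an AFT-style analysis would use for intensities below a detection limit; the data, detection limit and parameter values below are synthetic, and this is not the authors' R implementation.

    ```python
    import numpy as np
    from scipy import optimize, stats

    # Maximum likelihood for log-normal peak intensities when values below a
    # detection limit are left-censored (only their count contributes).
    rng = np.random.default_rng(8)
    true_mu, true_sigma, limit = 2.0, 0.8, 5.0
    raw = rng.lognormal(true_mu, true_sigma, size=300)
    observed = raw[raw >= limit]              # intensities actually reported
    n_censored = np.sum(raw < limit)          # low-abundance features not seen

    def neg_loglik(params):
        mu, log_sigma = params
        sigma = np.exp(log_sigma)
        # log-normal density for observed values ...
        ll = stats.norm.logpdf(np.log(observed), mu, sigma).sum() - np.log(observed).sum()
        # ... plus P(X < limit) for each censored value
        ll += n_censored * stats.norm.logcdf((np.log(limit) - mu) / sigma)
        return -ll

    res = optimize.minimize(neg_loglik, x0=[np.log(observed).mean(), 0.0])
    mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
    print(f"mu={mu_hat:.2f} (true {true_mu}), sigma={sigma_hat:.2f} (true {true_sigma})")
    ```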

  8. Statistical characterization of thermal plumes in turbulent thermal convection

    NASA Astrophysics Data System (ADS)

    Zhou, Sheng-Qi; Xie, Yi-Chao; Sun, Chao; Xia, Ke-Qing

    2016-09-01

    We report an experimental study on the statistical properties of the thermal plumes in turbulent thermal convection. A method has been proposed to extract the basic characteristics of thermal plumes from temporal temperature measurements inside the convection cell. It has been found that both the plume amplitude A and the cap width w, in the time domain, are approximately log-normally distributed. In particular, the normalized most probable front width is found to be a characteristic scale of thermal plumes, which is much larger than the thermal boundary layer thickness. Over a wide range of the Rayleigh number, the statistical characterizations of the thermal fluctuations of the plumes and the turbulent background, the plume front width and the plume spacing have been discussed and compared with theoretical predictions and morphological observations. For the most part, good agreement has been found with the direct observations.

  9. Polynomial probability distribution estimation using the method of moments

    PubMed Central

    Munkhammar, Joakim; Mattsson, Lars; Rydén, Jesper

    2017-01-01

    We suggest a procedure for estimating Nth degree polynomial approximations to unknown (or known) probability density functions (PDFs) based on N statistical moments from each distribution. The procedure is based on the method of moments and is setup algorithmically to aid applicability and to ensure rigor in use. In order to show applicability, polynomial PDF approximations are obtained for the distribution families Normal, Log-Normal, Weibull as well as for a bimodal Weibull distribution and a data set of anonymized household electricity use. The results are compared with results for traditional PDF series expansion methods of Gram–Charlier type. It is concluded that this procedure is a comparatively simple procedure that could be used when traditional distribution families are not applicable or when polynomial expansions of probability distributions might be considered useful approximations. In particular this approach is practical for calculating convolutions of distributions, since such operations become integrals of polynomial expressions. Finally, in order to show an advanced applicability of the method, it is shown to be useful for approximating solutions to the Smoluchowski equation. PMID:28394949
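
    A minimal version of the moment-matching step described above can be written for a polynomial pdf on [0, 1]: with p(x) = Σ_k a_k x^k, the j-th moment is Σ_k a_k / (j + k + 1), so matching the first N+1 empirical moments is a linear solve. The sketch below is a simplification of the procedure (the target distribution, degree and support are assumptions, and positivity of p is not enforced).

    ```python
    import numpy as np

    # Method-of-moments polynomial pdf approximation on [0, 1].
    rng = np.random.default_rng(9)
    data = rng.beta(2.0, 5.0, size=20_000)        # "unknown" target distribution
    N = 4                                         # polynomial degree

    m = np.array([np.mean(data**j) for j in range(N + 1)])   # empirical moments (m_0 = 1)
    A = np.array([[1.0 / (j + k + 1) for k in range(N + 1)]
                  for j in range(N + 1)])                    # moment matrix
    a = np.linalg.solve(A, m)                                # coefficients a_0..a_N

    x = np.linspace(0, 1, 5)
    p = np.polyval(a[::-1], x)                               # evaluate sum_k a_k x^k
    print("coefficients:", np.round(a, 3))
    print("p(x) at", x, "->", np.round(p, 3))
    # Caveats: nothing enforces p(x) >= 0, and the Hilbert-like matrix A
    # becomes ill-conditioned for higher degrees.
    ```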

  10. Simulations of large acoustic scintillations in the straits of Florida.

    PubMed

    Tang, Xin; Tappert, F D; Creamer, Dennis B

    2006-12-01

    Using a full-wave acoustic model, Monte Carlo numerical studies of intensity fluctuations in a realistic shallow water environment that simulates the Straits of Florida, including internal wave fluctuations and bottom roughness, have been performed. Results show that the sound intensity at distant receivers scintillates dramatically. The acoustic scintillation index SI increases rapidly with propagation range and is significantly greater than unity at ranges beyond about 10 km. This result supports a theoretical prediction by one of the authors. Statistical analyses show that the distribution of intensity of the random wave field saturates to the expected Rayleigh distribution with SI= 1 at short range due to multipath interference effects, and then SI continues to increase to large values. This effect, which is denoted supersaturation, is universal at long ranges in waveguides having lossy boundaries (where there is differential mode attenuation). The intensity distribution approaches a log-normal distribution to an excellent approximation; it may not be a universal distribution and comparison is also made to a K distribution. The long tails of the log-normal distribution cause "acoustic intermittency" in which very high, but rare, intensities occur.
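
    The scintillation index used above, SI = ⟨I²⟩/⟨I⟩² − 1, is simple to compute; the sketch below contrasts a Rayleigh-fading field (SI → 1, "saturation") with a heavier-tailed log-normal intensity (SI > 1), using synthetic intensities rather than the simulated acoustic fields of the study.

    ```python
    import numpy as np

    rng = np.random.default_rng(10)

    def scintillation_index(intensity):
        return np.mean(intensity**2) / np.mean(intensity)**2 - 1.0

    # Rayleigh-fading field: intensity is exponentially distributed, SI ~ 1
    rayleigh_I = np.abs(rng.normal(size=200_000) + 1j * rng.normal(size=200_000))**2
    # Log-normal intensity with long tails, SI > 1 ("supersaturation")
    lognormal_I = rng.lognormal(mean=0.0, sigma=1.2, size=200_000)

    print("Rayleigh-field SI  :", round(scintillation_index(rayleigh_I), 2))
    print("log-normal field SI:", round(scintillation_index(lognormal_I), 2))
    ```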

  11. Polynomial probability distribution estimation using the method of moments.

    PubMed

    Munkhammar, Joakim; Mattsson, Lars; Rydén, Jesper

    2017-01-01

    We suggest a procedure for estimating Nth degree polynomial approximations to unknown (or known) probability density functions (PDFs) based on N statistical moments from each distribution. The procedure is based on the method of moments and is setup algorithmically to aid applicability and to ensure rigor in use. In order to show applicability, polynomial PDF approximations are obtained for the distribution families Normal, Log-Normal, Weibull as well as for a bimodal Weibull distribution and a data set of anonymized household electricity use. The results are compared with results for traditional PDF series expansion methods of Gram-Charlier type. It is concluded that this procedure is a comparatively simple procedure that could be used when traditional distribution families are not applicable or when polynomial expansions of probability distributions might be considered useful approximations. In particular this approach is practical for calculating convolutions of distributions, since such operations become integrals of polynomial expressions. Finally, in order to show an advanced applicability of the method, it is shown to be useful for approximating solutions to the Smoluchowski equation.

  12. Bimodal Aldosterone Distribution in Low-Renin Hypertension

    PubMed Central

    2013-01-01

    BACKGROUND In low-renin hypertension (LRH), serum aldosterone levels are higher in those subjects with primary aldosteronism and may be lower in those with non-aldosterone mineralocorticoid excess or primary renal sodium retention. We investigated the hypothesis that the frequency distribution of aldosterone in LRH is bimodal. METHODS Of the 3,532 attendees at the sixth examination cycle of the Framingham Offspring Study, 1,831 were included in this cross-sectional analysis after we excluded those with conditions or taking medications such as antihypertensive drugs that might affect renin or aldosterone. RESULTS Three hundred three subjects (17%) had untreated hypertension (SBP ≥140mm Hg or DBP ≥90mm Hg). LRH, defined as plasma renin ≤5 mU/L, was present in 93 of those 303 hypertensive subjects (31%). Aldosterone values were adjusted statistically for age, sex, and the urinary sodium/creatinine ratio. In the subjects with LRH, the adjusted aldosterone distribution was bimodal (dip test for unimodality, P = 0.008). The adjusted aldosterone distribution was unimodal in the normal subjects (P = 0.98) and in the hypertensive subjects with normal plasma renin (P = 0.94). CONCLUSIONS In this community-based sample of white subjects, those with low-renin hypertension had a bimodal adjusted aldosterone distribution. Subjects with normal-renin hypertension and subjects with normal blood pressure had unimodal adjusted aldosterone distributions. These findings suggest 2 pathophysiological variants of LRH, one that is aldosterone-dependent and one that is non-aldosterone-dependent. PMID:23757402

  13. Statistical computation of tolerance limits

    NASA Technical Reports Server (NTRS)

    Wheeler, J. T.

    1993-01-01

    Based on a new theory, two computer codes were developed specifically to calculate the exact statistical tolerance limits for normal distributions with unknown means and variances, for the one-sided and two-sided cases of the tolerance factor k. The quantity k is defined equivalently in terms of the noncentral t-distribution by the probability equation. Two of the four mathematical methods employ the theory developed for the numerical simulation. Several algorithms for numerically integrating and iteratively root-solving the working equations are written to augment the program simulation. The program codes generate tables of k values associated with varying values of the proportion and sample size for each given probability, to show the accuracy obtained for small sample sizes.
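
    For the one-sided case, the noncentral-t definition referred to above reduces to a standard closed-form expression, k = t'_{γ}(n−1, z_p·√n)/√n; the sketch below evaluates it with SciPy for a few sample sizes (the 95 %/95 % choice is illustrative, and the two-sided case handled by the codes needs a different computation).

    ```python
    import numpy as np
    from scipy import stats

    def one_sided_tolerance_factor(n, proportion=0.95, confidence=0.95):
        """One-sided normal tolerance factor k via the noncentral t-distribution."""
        delta = stats.norm.ppf(proportion) * np.sqrt(n)          # noncentrality
        return stats.nct.ppf(confidence, df=n - 1, nc=delta) / np.sqrt(n)

    for n in (5, 10, 30, 100):
        print(n, round(one_sided_tolerance_factor(n), 3))
    # A limit of xbar + k*s then covers at least 95 % of a normal population
    # with 95 % confidence.
    ```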

  14. Are your covariates under control? How normalization can re-introduce covariate effects.

    PubMed

    Pain, Oliver; Dudbridge, Frank; Ronald, Angelica

    2018-04-30

    Many statistical tests rely on the assumption that the residuals of a model are normally distributed. Rank-based inverse normal transformation (INT) of the dependent variable is one of the most popular approaches to satisfy the normality assumption. When covariates are included in the analysis, a common approach is to first adjust for the covariates and then normalize the residuals. This study investigated the effect of regressing covariates against the dependent variable and then applying rank-based INT to the residuals. The correlation between the dependent variable and covariates at each stage of processing was assessed. An alternative approach was tested in which rank-based INT was applied to the dependent variable before regressing covariates. Analyses based on both simulated and real data examples demonstrated that applying rank-based INT to the dependent variable residuals after regressing out covariates re-introduces a linear correlation between the dependent variable and covariates, increasing type-I errors and reducing power. On the other hand, when rank-based INT was applied prior to controlling for covariate effects, residuals were normally distributed and linearly uncorrelated with covariates. This latter approach is therefore recommended in situations where normality of the dependent variable is required.
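
    A minimal sketch of the recommended ordering (transform first, then model the covariates) is given below using Blom-type rank offsets; the generative model and offset constant are assumptions for illustration, and the full simulation design of the study is not reproduced.

    ```python
    import numpy as np
    from scipy import stats

    def rank_int(x, c=3.0 / 8):
        """Rank-based inverse normal transformation (Blom's offsets)."""
        ranks = stats.rankdata(x)
        return stats.norm.ppf((ranks - c) / (len(x) - 2.0 * c + 1.0))

    rng = np.random.default_rng(11)
    n = 2000
    covariate = rng.normal(size=n)
    y = 0.5 * covariate + rng.standard_exponential(n)     # skewed outcome

    y_int = rank_int(y)                                   # transform the raw outcome ...
    beta = np.polyfit(covariate, y_int, 1)                # ... then adjust for covariates
    residuals = y_int - np.polyval(beta, covariate)

    print("skewness before / after INT:",
          round(stats.skew(y), 2), "/", round(stats.skew(y_int), 2))
    print("corr(residuals, covariate):",
          round(np.corrcoef(residuals, covariate)[0, 1], 3))
    ```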

  15. A Comparison of the Effects of Non-Normal Distributions on Tests of Equivalence

    ERIC Educational Resources Information Center

    Ellington, Linda F.

    2011-01-01

    Statistical theory and its application provide the foundation to modern systematic inquiry in the behavioral, physical and social sciences disciplines (Fisher, 1958; Wilcox, 1996). It provides the tools for scholars and researchers to operationalize constructs, describe populations, and measure and interpret the relations between populations and…

  16. Finite-size effects in transcript sequencing count distribution: its power-law correction necessarily precedes downstream normalization and comparative analysis.

    PubMed

    Wong, Wing-Cheong; Ng, Hong-Kiat; Tantoso, Erwin; Soong, Richie; Eisenhaber, Frank

    2018-02-12

    Though earlier works on modelling transcript abundance from vertebrates to lower eukaryotes have specifically singled out Zipf's law, the observed distributions often deviate from a single power-law slope. In hindsight, while power laws of critical phenomena are derived asymptotically under the condition of infinite observations, real-world observations are finite, where finite-size effects set in to force a power-law distribution into an exponential decay and, consequently, manifest as a curvature (i.e., varying exponent values) in a log-log plot. If transcript abundance is truly power-law distributed, the varying exponent signifies changing mathematical moments (e.g., mean, variance) and creates heteroskedasticity which compromises statistical rigor in analysis. The impact of this deviation from the asymptotic power law on sequencing count data has never truly been examined and quantified. The anecdotal description of transcript abundance as almost Zipf's-law-like distributed can be conceptualized as the imperfect mathematical rendition of the Pareto power-law distribution when subjected to finite-size effects in the real world; this holds regardless of the advancement in sequencing technology, since sampling is finite in practice. Our conceptualization agrees well with our empirical analysis of two modern-day NGS (next-generation sequencing) datasets: an in-house generated dilution miRNA study of two gastric cancer cell lines (NUGC3 and AGS) and a publicly available spike-in miRNA dataset. First, the finite-size effect causes the deviations of sequencing count data from Zipf's law and issues of reproducibility in sequencing experiments. Second, it manifests as heteroskedasticity among experimental replicates and so brings about statistical problems. Surprisingly, a straightforward power-law correction that restores the distribution distortion to a single exponent value can dramatically reduce data heteroskedasticity, yielding an instant increase in signal-to-noise ratio of 50% and in statistical/detection sensitivity of as much as 30%, regardless of the downstream mapping and normalization methods. Most importantly, the power-law correction improves concordance in significant calls among different normalization methods of a data series by 22% on average. When presented with a higher sequencing depth (a 4-fold difference), the improvement in concordance is asymmetrical (32% for the higher sequencing depth instance versus 13% for the lower instance) and demonstrates that the simple power-law correction can increase significant detection with higher sequencing depths. Finally, the correction dramatically enhances the statistical conclusions and alludes to the metastasis potential of the NUGC3 cell line relative to AGS in our dilution analysis. The finite-size effects due to undersampling generally plague transcript count data with reproducibility issues but can be minimized through a simple power-law correction of the count distribution. This distribution correction has direct implications for the biological interpretation of the study and the rigor of the scientific findings. This article was reviewed by Oliviero Carugo, Thomas Dandekar and Sandor Pongor.

  17. A statistical study of EMIC waves observed by Cluster. 1. Wave properties. EMIC Wave Properties

    DOE PAGES

    Allen, R. C.; Zhang, J. -C.; Kistler, L. M.; ...

    2015-07-23

    Electromagnetic ion cyclotron (EMIC) waves are an important mechanism for particle energization and losses inside the magnetosphere. In order to better understand the effects of these waves on particle dynamics, detailed information about the occurrence rate, wave power, ellipticity, normal angle, and energy propagation angle distributions, as well as local plasma parameters, is required. Previous statistical studies have used in situ observations to investigate the distribution of these parameters in the magnetic local time versus L-shell (MLT-L) frame within a limited magnetic latitude (MLAT) range. In our study, we present a statistical analysis of EMIC wave properties using 10 years (2001–2010) of data from Cluster, totaling 25,431 min of wave activity. Due to the polar orbit of Cluster, we are able to investigate EMIC waves at all MLATs and MLTs. This allows us to further investigate the MLAT dependence of various wave properties inside different MLT sectors and further explore the effects of Shabansky orbits on EMIC wave generation and propagation. Thus, the statistical analysis is presented in two papers. Our paper focuses on the wave occurrence distribution as well as the distribution of wave properties. The companion paper focuses on local plasma parameters during wave observations as well as wave generation proxies.

  18. Bidisperse and polydisperse suspension rheology at large solid fraction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pednekar, Sidhant; Chun, Jaehun; Morris, Jeffrey F.

    At the same solid volume fraction, bidisperse and polydisperse suspensions display lower viscosities, and a weaker normal stress response, compared to monodisperse suspensions. The reduction of viscosity associated with size distribution can be explained by an increase of the maximum flowable, or jamming, solid fraction. In this work, concentrated or "dense" suspensions are simulated under strong shearing, where thermal motion and repulsive forces are negligible, but we allow for particle contact with a mild frictional interaction with an interparticle friction coefficient of 0.2. Aspects of bidisperse suspension rheology are first revisited to establish that the approach reproduces established trends; the study of bidisperse suspensions at size ratios of large to small particle radii of 2 to 4 shows that a minimum in the viscosity occurs for zeta slightly above 0.5, where zeta = phi_{large}/phi is the fraction of the total solid volume occupied by the large particles. The simple shear flows of polydisperse suspensions with truncated normal and log normal size distributions, and of bidisperse suspensions which are statistically equivalent to these polydisperse cases up to the third moment of the size distribution, are simulated and the rheologies are extracted. Prior work shows that such distributions with equivalent low-order moments have similar phi_{m}, and the rheological behaviors of the normal, log normal and bidisperse cases are shown to be in close agreement for a wide range of standard deviations in particle size, with standard correlations which are functionally dependent on phi/phi_{m} providing excellent agreement with the rheology found in simulation. The close agreement of both viscosity and normal stress response between bi- and polydisperse suspensions demonstrates the controlling influence of the maximum packing fraction in noncolloidal suspensions. Microstructural investigations and the stress distribution according to particle size are also presented.

  19. Statistics of baryon correlation functions in lattice QCD

    NASA Astrophysics Data System (ADS)

    Wagman, Michael L.; Savage, Martin J.; Nplqcd Collaboration

    2017-12-01

    A systematic analysis of the structure of single-baryon correlation functions calculated with lattice QCD is performed, with a particular focus on characterizing the structure of the noise associated with quantum fluctuations. The signal-to-noise problem in these correlation functions is shown, as long suspected, to result from a sign problem. The log-magnitude and complex phase are found to be approximately described by normal and wrapped normal distributions respectively. Properties of circular statistics are used to understand the emergence of a large time noise region where standard energy measurements are unreliable. Power-law tails in the distribution of baryon correlation functions, associated with stable distributions and "Lévy flights," are found to play a central role in their time evolution. A new method of analyzing correlation functions is considered for which the signal-to-noise ratio of energy measurements is constant, rather than exponentially degrading, with increasing source-sink separation time. This new method includes an additional systematic uncertainty that can be removed by performing an extrapolation, and the signal-to-noise problem reemerges in the statistics of this extrapolation. It is demonstrated that this new method allows accurate results for the nucleon mass to be extracted from the large-time noise region inaccessible to standard methods. The observations presented here are expected to apply to quantum Monte Carlo calculations more generally. Similar methods to those introduced here may lead to practical improvements in analysis of noisier systems.

  20. Statistical Analysis of Hubble /WFC3 Transit Spectroscopy of Extrasolar Planets

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fu, Guangwei; Deming, Drake; Knutson, Heather

    2017-10-01

    Transmission spectroscopy provides a window to study exoplanetary atmospheres, but that window is fogged by clouds and hazes. Clouds and haze introduce a degeneracy between the strength of gaseous absorption features and planetary physical parameters such as abundances. One way to break that degeneracy is via statistical studies. We collect all published HST/WFC3 transit spectra for 1.1–1.65 μm water vapor absorption and perform a statistical study on potential correlations between the water absorption feature and planetary parameters. We fit the observed spectra with a template calculated for each planet using the Exo-transmit code. We express the magnitude of the water absorption in scale heights, thereby removing the known dependence on temperature, surface gravity, and mean molecular weight. We find that the absorption in scale heights has a positive baseline correlation with planetary equilibrium temperature; our hypothesis is that decreasing cloud condensation with increasing temperature is responsible for this baseline slope. However, the observed sample is also intrinsically degenerate in the sense that equilibrium temperature correlates with planetary mass. We compile the distribution of absorption in scale heights, and we find that this distribution is closer to log-normal than Gaussian. However, we also find that the distribution of equilibrium temperatures for the observed planets is similarly log-normal. This indicates that the absorption values are affected by observational bias, whereby observers have not yet targeted a sufficient sample of the hottest planets.
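
    The conversion of a water-band feature into pressure scale heights can be sketched as follows in Python. The function names, the mean molecular weight of 2.3, and the example numbers are illustrative assumptions, not the authors' pipeline.

      import numpy as np

      K_B = 1.380649e-23      # Boltzmann constant, J/K
      M_H = 1.6735575e-27     # mass of a hydrogen atom, kg

      def scale_height_m(T_eq, g, mu=2.3):
          """Pressure scale height H = k_B * T_eq / (mu * m_H * g), in metres.
          mu = 2.3 assumes an H2/He-dominated atmosphere (an assumption here)."""
          return K_B * T_eq / (mu * M_H * g)

      def absorption_in_scale_heights(delta_depth, R_p, R_star, T_eq, g, mu=2.3):
          """Convert a water-band transit-depth modulation delta_depth (difference in
          (R_p/R_star)**2 in and out of the band) into the equivalent change in
          planetary radius, expressed in scale heights."""
          H = scale_height_m(T_eq, g, mu)
          # depth = (R_p/R_star)**2, so d(depth) = 2*R_p*dR_p/R_star**2
          dR_p = delta_depth * R_star**2 / (2.0 * R_p)
          return dR_p / H

      # illustrative numbers only: a 200 ppm water feature on a warm sub-Neptune
      # n_H = absorption_in_scale_heights(200e-6, R_p=2.5e7, R_star=6.0e8, T_eq=900.0, g=10.0)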

  1. A maximally selected test of symmetry about zero.

    PubMed

    Laska, Eugene; Meisner, Morris; Wanderling, Joseph

    2012-11-20

    The problem of testing symmetry about zero has a long and rich history in the statistical literature. We introduce a new test that sequentially discards observations whose absolute value is below increasing thresholds defined by the data. McNemar's statistic is obtained at each threshold and the largest is used as the test statistic. We obtain the exact distribution of this maximally selected McNemar and provide tables of critical values and a program for computing p-values. Power is compared with the t-test, the Wilcoxon Signed Rank Test and the Sign Test. The new test, MM, is slightly less powerful than the t-test and Wilcoxon Signed Rank Test for symmetric normal distributions with nonzero medians and substantially more powerful than all three tests for asymmetric mixtures of normal random variables with or without zero medians. The motivation for this test derives from the need to appraise the safety profile of new medications. If pre and post safety measures are obtained, then under the null hypothesis, the variables are exchangeable and the distribution of their difference is symmetric about a zero median. Large pre-post differences are the major concern of a safety assessment. The discarded small observations are not particularly relevant to safety and can reduce power to detect important asymmetry. The new test was utilized on data from an on-road driving study performed to determine if a hypnotic, a drug used to promote sleep, has next day residual effects. Copyright © 2012 John Wiley & Sons, Ltd.
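
    A minimal Python sketch of a maximally selected McNemar-type statistic in the spirit of the test described above. The data-defined threshold set, the sign-based reduction, and the resampling note are assumptions of this illustration, not the authors' exact algorithm or tabulated critical values.

      import numpy as np

      def max_selected_mcnemar(x):
          """For each threshold c taken from the sorted |x| values, drop observations
          with |x| < c and compare the counts of positive and negative values among
          those retained; return the largest McNemar-type chi-square over thresholds."""
          x = np.asarray(x, dtype=float)
          x = x[x != 0]                       # zeros carry no sign information
          best = 0.0
          for c in np.unique(np.abs(x)):
              kept = x[np.abs(x) >= c]
              n_pos = np.sum(kept > 0)
              n_neg = np.sum(kept < 0)
              n = n_pos + n_neg
              if n == 0:
                  continue
              stat = (n_pos - n_neg) ** 2 / n  # McNemar-type chi-square on the signs
              best = max(best, stat)
          return best

      # p-values would come from the null distribution of the maximum, e.g. by
      # recomputing the statistic on sign-flipped resamples of |x|.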

  2. Statistical Analysis of Hubble/WFC3 Transit Spectroscopy of Extrasolar Planets

    NASA Astrophysics Data System (ADS)

    Fu, Guangwei; Deming, Drake; Knutson, Heather; Madhusudhan, Nikku; Mandell, Avi; Fraine, Jonathan

    2018-01-01

    Transmission spectroscopy provides a window to study exoplanetary atmospheres, but that window is fogged by clouds and hazes. Clouds and haze introduce a degeneracy between the strength of gaseous absorption features and planetary physical parameters such as abundances. One way to break that degeneracy is via statistical studies. We collect all published HST/WFC3 transit spectra for 1.1-1.65 micron water vapor absorption, and perform a statistical study on potential correlations between the water absorption feature and planetary parameters. We fit the observed spectra with a template calculated for each planet using the Exo-Transmit code. We express the magnitude of the water absorption in scale heights, thereby removing the known dependence on temperature, surface gravity, and mean molecular weight. We find that the absorption in scale heights has a positive baseline correlation with planetary equilibrium temperature; our hypothesis is that decreasing cloud condensation with increasing temperature is responsible for this baseline slope. However, the observed sample is also intrinsically degenerate in the sense that equilibrium temperature correlates with planetary mass. We compile the distribution of absorption in scale heights, and we find that this distribution is closer to log-normal than Gaussian. However, we also find that the distribution of equilibrium temperatures for the observed planets is similarly log-normal. This indicates that the absorption values are affected by observational bias, whereby observers have not yet targeted a sufficient sample of the hottest planets.

  3. Statistical Analysis of Hubble/WFC3 Transit Spectroscopy of Extrasolar Planets

    NASA Astrophysics Data System (ADS)

    Fu, Guangwei; Deming, Drake; Knutson, Heather; Madhusudhan, Nikku; Mandell, Avi; Fraine, Jonathan

    2017-10-01

    Transmission spectroscopy provides a window to study exoplanetary atmospheres, but that window is fogged by clouds and hazes. Clouds and haze introduce a degeneracy between the strength of gaseous absorption features and planetary physical parameters such as abundances. One way to break that degeneracy is via statistical studies. We collect all published HST/WFC3 transit spectra for 1.1-1.65 μm water vapor absorption and perform a statistical study on potential correlations between the water absorption feature and planetary parameters. We fit the observed spectra with a template calculated for each planet using the Exo-transmit code. We express the magnitude of the water absorption in scale heights, thereby removing the known dependence on temperature, surface gravity, and mean molecular weight. We find that the absorption in scale heights has a positive baseline correlation with planetary equilibrium temperature; our hypothesis is that decreasing cloud condensation with increasing temperature is responsible for this baseline slope. However, the observed sample is also intrinsically degenerate in the sense that equilibrium temperature correlates with planetary mass. We compile the distribution of absorption in scale heights, and we find that this distribution is closer to log-normal than Gaussian. However, we also find that the distribution of equilibrium temperatures for the observed planets is similarly log-normal. This indicates that the absorption values are affected by observational bias, whereby observers have not yet targeted a sufficient sample of the hottest planets.

  4. A menu-driven software package of Bayesian nonparametric (and parametric) mixed models for regression analysis and density estimation.

    PubMed

    Karabatsos, George

    2017-02-01

    Most of applied statistics involves regression analysis of data. In practice, it is important to specify a regression model that has minimal assumptions which are not violated by data, to ensure that statistical inferences from the model are informative and not misleading. This paper presents a stand-alone and menu-driven software package, Bayesian Regression: Nonparametric and Parametric Models, constructed from MATLAB Compiler. Currently, this package gives the user a choice from 83 Bayesian models for data analysis. They include 47 Bayesian nonparametric (BNP) infinite-mixture regression models; 5 BNP infinite-mixture models for density estimation; and 31 normal random effects models (HLMs), including normal linear models. Each of the 78 regression models handles either a continuous, binary, or ordinal dependent variable, and can handle multi-level (grouped) data. All 83 Bayesian models can handle the analysis of weighted observations (e.g., for meta-analysis), and the analysis of left-censored, right-censored, and/or interval-censored data. Each BNP infinite-mixture model has a mixture distribution assigned one of various BNP prior distributions, including priors defined by either the Dirichlet process, Pitman-Yor process (including the normalized stable process), beta (two-parameter) process, normalized inverse-Gaussian process, geometric weights prior, dependent Dirichlet process, or the dependent infinite-probits prior. The software user can mouse-click to select a Bayesian model and perform data analysis via Markov chain Monte Carlo (MCMC) sampling. After the sampling completes, the software automatically opens text output that reports MCMC-based estimates of the model's posterior distribution and model predictive fit to the data. Additional text and/or graphical output can be generated by mouse-clicking other menu options. This includes output of MCMC convergence analyses, and estimates of the model's posterior predictive distribution, for selected functionals and values of covariates. The software is illustrated through the BNP regression analysis of real data.

  5. Parametric modelling of cost data in medical studies.

    PubMed

    Nixon, R M; Thompson, S G

    2004-04-30

    The cost of medical resources used is often recorded for each patient in clinical studies in order to inform decision-making. Although cost data are generally skewed to the right, interest is in making inferences about the population mean cost. Common methods for non-normal data, such as data transformation, assuming asymptotic normality of the sample mean or non-parametric bootstrapping, are not ideal. This paper describes possible parametric models for analysing cost data. Four example data sets are considered, which have different sample sizes and degrees of skewness. Normal, gamma, log-normal, and log-logistic distributions are fitted, together with three-parameter versions of the latter three distributions. Maximum likelihood estimates of the population mean are found; confidence intervals are derived by a parametric BC(a) bootstrap and checked by MCMC methods. Differences between model fits and inferences are explored. Skewed parametric distributions fit cost data better than the normal distribution, and should in principle be preferred for estimating the population mean cost. However for some data sets, we find that models that fit badly can give similar inferences to those that fit well. Conversely, particularly when sample sizes are not large, different parametric models that fit the data equally well can lead to substantially different inferences. We conclude that inferences are sensitive to choice of statistical model, which itself can remain uncertain unless there is enough data to model the tail of the distribution accurately. Investigating the sensitivity of conclusions to choice of model should thus be an essential component of analysing cost data in practice. Copyright 2004 John Wiley & Sons, Ltd.
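
    As a hedged illustration of one of the parametric approaches above, the Python sketch below fits a log-normal to cost data by maximum likelihood and builds a parametric bootstrap interval for the population mean; a plain percentile interval stands in for the paper's BC(a) interval purely to keep the sketch short.

      import numpy as np

      rng = np.random.default_rng(0)

      def lognormal_mean_ci(costs, n_boot=2000, alpha=0.05):
          """MLE fit of a log-normal to positive cost data, plus a parametric
          bootstrap percentile interval for the population mean."""
          costs = np.asarray(costs, dtype=float)
          mu, sigma = np.log(costs).mean(), np.log(costs).std(ddof=0)   # log-normal MLEs
          mean_hat = np.exp(mu + 0.5 * sigma ** 2)                      # implied population mean
          boot_means = np.empty(n_boot)
          for b in range(n_boot):
              sim = rng.lognormal(mu, sigma, size=costs.size)           # parametric resample
              m, s = np.log(sim).mean(), np.log(sim).std(ddof=0)
              boot_means[b] = np.exp(m + 0.5 * s ** 2)
          lo, hi = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
          return mean_hat, (lo, hi)

      # usage: mean_hat, ci = lognormal_mean_ci(cost_vector)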

  6. Transformation (normalization) of slope gradient and surface curvatures, automated for statistical analyses from DEMs

    NASA Astrophysics Data System (ADS)

    Csillik, O.; Evans, I. S.; Drăguţ, L.

    2015-03-01

    Automated procedures are developed to alleviate long tails in frequency distributions of morphometric variables. They minimize the skewness of slope gradient frequency distributions, and modify the kurtosis of profile and plan curvature distributions toward that of the Gaussian (normal) model. Box-Cox (for slope) and arctangent (for curvature) transformations are tested on nine digital elevation models (DEMs) of varying origin and resolution, and different landscapes, and shown to be effective. Resulting histograms are illustrated and show considerable improvements over those for previously recommended slope transformations (sine, square root of sine, and logarithm of tangent). Unlike previous approaches, the proposed method evaluates the frequency distribution of slope gradient values in a given area and applies the most appropriate transform if required. Sensitivity of the arctangent transformation is tested, showing that Gaussian-kurtosis transformations are acceptable also in terms of histogram shape. Cube root transformations of curvatures produced bimodal histograms. The transforms are applicable to morphometric variables and many others with skewed or long-tailed distributions. By avoiding long tails and outliers, they permit parametric statistics such as correlation, regression and principal component analyses to be applied, with greater confidence that requirements for linearity, additivity and even scatter of residuals (constancy of error variance) are likely to be met. It is suggested that such transformations should be routinely applied in all parametric analyses of long-tailed variables. Our Box-Cox and curvature automated transformations are based on a Python script, implemented as an easy-to-use script tool in ArcGIS.
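
    A short Python sketch of the two transformations discussed above (Box-Cox for slope gradient, arctangent for curvature), assuming numpy and scipy. The zero-slope offset and the curvature scale factor are assumptions of this sketch, not the parameters of the authors' ArcGIS script.

      import numpy as np
      from scipy import stats

      def transform_slope(slope_deg):
          """Box-Cox transform of slope gradient with the lambda that maximises the
          log-likelihood, which suppresses the long upper tail. A small offset guards
          against zero slopes (an assumption of this sketch)."""
          values = np.asarray(slope_deg, dtype=float) + 0.01
          transformed, lam = stats.boxcox(values)
          return transformed, lam

      def transform_curvature(curv, scale=1.0):
          """Arctangent transform of profile/plan curvature. The scale factor controls
          how strongly extreme curvatures are compressed and would be tuned, as in the
          paper, to bring kurtosis toward the Gaussian value."""
          return np.arctan(np.asarray(curv, dtype=float) / scale)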

  7. Antibiotic-Induced Anomalous Statistics of Collective Bacterial Swarming

    NASA Astrophysics Data System (ADS)

    Benisty, Sivan; Ben-Jacob, Eshel; Ariel, Gil; Be'er, Avraham

    2015-01-01

    Under sublethal antibiotic concentrations, the statistics of collectively swarming Bacillus subtilis transition from normal to anomalous, with a heavy-tailed speed distribution and a two-step temporal correlation of velocities. The transition is due to changes in the properties of the bacterial motion and the formation of a motility-defective subpopulation that self-segregates into regions. As a result, neither the colonial expansion nor the growth rate is affected by antibiotics. This phenomenon suggests a new strategy that bacteria employ to fight antibiotic stress.

  8. Effects of Non-Normal Outlier-Prone Error Distribution on Kalman Filter Track

    DTIC Science & Technology

    1991-09-01

    [Fragmentary OCR excerpt: the report notes that other possibilities exist, e.g., the GST (Generic Statistical Tracker) uses four target motion models (Ref. 4), including second-order Gauss-Markov and random-tour targets; the remainder of the excerpt consists of table fragments and FORTRAN menu code.]

  9. Brain tissues volume measurements from 2D MRI using parametric approach

    NASA Astrophysics Data System (ADS)

    L'vov, A. A.; Toropova, O. A.; Litovka, Yu. V.

    2018-04-01

    The purpose of the paper is to propose a fully automated method for assessing the volume of structures within the human brain. Our statistical approach uses the maximum interdependency principle to decide on the consistency of measurements and to identify unequal observations. Outliers are detected using the maximum normalized residual test. We propose a statistical model that utilizes knowledge of tissue distribution in the human brain and applies partial data restoration to improve precision. The proposed approach is computationally efficient and independent of the segmentation algorithm used in the application.
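
    The maximum normalized residual (Grubbs) test mentioned above can be sketched as follows in Python; this is the generic textbook formulation, offered only as an illustration of that step, not the authors' implementation.

      import numpy as np
      from scipy import stats

      def grubbs_test(x, alpha=0.05):
          """Maximum normalized residual (Grubbs) test for a single outlier.
          Returns the suspect value, the test statistic, the critical value,
          and whether the suspect should be flagged."""
          x = np.asarray(x, dtype=float)
          n = x.size
          mean, sd = x.mean(), x.std(ddof=1)
          residuals = np.abs(x - mean) / sd
          g = residuals.max()                      # the maximum normalized residual
          suspect = x[np.argmax(residuals)]
          # two-sided critical value from the t distribution
          t_crit = stats.t.ppf(1 - alpha / (2 * n), n - 2)
          g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t_crit ** 2 / (n - 2 + t_crit ** 2))
          return suspect, g, g_crit, g > g_crit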

  10. Flat Plate Wake Velocity Statistics Obtained With Circular And Elliptic Trailing Edges

    NASA Technical Reports Server (NTRS)

    Rai, Man Mohan

    2016-01-01

    The near wake of a flat plate with circular and elliptic trailing edges is investigated with data from direct numerical simulations. The plate length and thickness are the same in both cases. The separating boundary layers are turbulent and statistically identical; therefore, the wake is symmetric in the two cases. The emphasis in this study is on a comparison of the wake distributions of the velocity components, normal intensity, and fluctuating shear stress obtained in the two cases.

  11. Breast MRI background parenchymal enhancement (BPE) correlates with the risk of breast cancer.

    PubMed

    Telegrafo, Michele; Rella, Leonarda; Stabile Ianora, Amato Antonio; Angelelli, Giuseppe; Moschetta, Marco

    2016-02-01

    To investigate whether background parenchymal enhancement (BPE) and breast cancer correlate, by searching for any significant difference in the distribution of BPE patterns between benign and malignant lesions. 386 patients, including 180 pre-menopausal (group 1) and 206 post-menopausal (group 2), underwent MR examination. Two radiologists evaluated MR images, classifying normal BPE as minimal, mild, moderate or marked. The two groups of patients were subdivided into 3 categories based on MRI findings (negative, benign and malignant lesions). The distribution of BPE patterns within the two groups and within the three MR categories was calculated. The χ2 test was used to evaluate BPE type distribution in the three patient categories, and any statistically significant correlation of BPE with lesion type was calculated. The Student t test was applied to search for any statistically significant difference between BPE type rates in groups 1 and 2. The χ2 test demonstrated a statistically significant difference in the distribution of BPE types in negative patients and benign lesions as compared with malignant ones (p<0.05). A significantly higher prevalence of moderate and marked BPE was found among malignant lesions (group 1: 32% and 42%, respectively; group 2: 31% and 46%, respectively), while a predominance of minimal and mild BPE was found among negative patients (group 1: 60% and 36%, respectively; group 2: 68% and 32%, respectively) and benign lesions (group 1: 54% and 38%, respectively; group 2: 75% and 17%, respectively). The Student t test did not show a statistically significant difference between BPE type rates in groups 1 and 2 (p>0.05). Normal BPE could correlate with the risk of breast cancer, with moderate and marked BPE patterns being associated with malignant lesions in both pre- and post-menopausal women. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. Review of Statistical Methods for Analysing Healthcare Resources and Costs

    PubMed Central

    Mihaylova, Borislava; Briggs, Andrew; O'Hagan, Anthony; Thompson, Simon G

    2011-01-01

    We review statistical methods for analysing healthcare resource use and costs, their ability to address skewness, excess zeros, multimodality and heavy right tails, and their ease for general use. We aim to provide guidance on analysing resource use and costs focusing on randomised trials, although methods often have wider applicability. Twelve broad categories of methods were identified: (I) methods based on the normal distribution, (II) methods following transformation of data, (III) single-distribution generalized linear models (GLMs), (IV) parametric models based on skewed distributions outside the GLM family, (V) models based on mixtures of parametric distributions, (VI) two (or multi)-part and Tobit models, (VII) survival methods, (VIII) non-parametric methods, (IX) methods based on truncation or trimming of data, (X) data components models, (XI) methods based on averaging across models, and (XII) Markov chain methods. Based on this review, our recommendations are that, first, simple methods are preferred in large samples where the near-normality of sample means is assured. Second, in somewhat smaller samples, relatively simple methods, able to deal with one or two of above data characteristics, may be preferable but checking sensitivity to assumptions is necessary. Finally, some more complex methods hold promise, but are relatively untried; their implementation requires substantial expertise and they are not currently recommended for wider applied work. Copyright © 2010 John Wiley & Sons, Ltd. PMID:20799344

  13. The Statistical Value of Raw Fluorescence Signal in Luminex xMAP Based Multiplex Immunoassays

    PubMed Central

    Breen, Edmond J.; Tan, Woei; Khan, Alamgir

    2016-01-01

    Tissue samples (plasma, saliva, serum or urine) from 169 patients, classified as either normal or having one of seven possible diseases, are analysed across three 96-well plates for the presence of 37 analytes using cytokine inflammation multiplexed immunoassay panels. Censoring of concentration data caused problems for the analysis of the low-abundance analytes. Using fluorescence responses instead of concentration-based analysis allowed these low-abundance analytes to be analysed. Mixed-effects analysis of the resulting fluorescence and concentration responses reveals that the combination of censoring and mapping the fluorescence responses to concentration values through a 5PL curve changed the observed analyte concentrations. Simulation verifies this by showing a dependence of the observed analyte concentration levels on the mean fluorescence response and its distribution. Departures from normality in the fluorescence responses can lead to differences in concentration estimates and unreliable probabilities for treatment effects. It is seen that when fluorescence responses are normally distributed, probabilities of treatment effects for fluorescence-based t-tests have greater statistical power than the same probabilities from concentration-based t-tests. We add evidence that the fluorescence response, unlike concentration values, does not require censoring, and we show, with respect to differential analysis on the fluorescence responses, that background correction is not required. PMID:27243383

  14. Statistical modeling of optical attenuation measurements in continental fog conditions

    NASA Astrophysics Data System (ADS)

    Khan, Muhammad Saeed; Amin, Muhammad; Awan, Muhammad Saleem; Minhas, Abid Ali; Saleem, Jawad; Khan, Rahimdad

    2017-03-01

    Free-space optics is an innovative technology that uses the atmosphere as a propagation medium to provide higher data rates. These links are heavily affected by the atmospheric channel, mainly because fog and clouds scatter and even block the modulated beam of light from reaching the receiver, imposing severe attenuation. A comprehensive statistical study of fog effects and a deep physical understanding of the fog phenomena are very important for suggesting improvements (in reliability and efficiency) of such communication systems. In this regard, six months of real-time measured fog attenuation data are considered and statistically investigated. A detailed statistical analysis of each fog event in that period is presented; the best probability density functions are selected on the basis of the Akaike information criterion, while the estimates of the unknown parameters are computed by the maximum likelihood estimation technique. The results show that most fog attenuation events follow a normal mixture distribution and some follow the Weibull distribution.
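
    A minimal Python sketch of the AIC-based selection step described above: fit candidate distributions by maximum likelihood and rank them by AIC. The candidate list here (normal, Weibull, log-normal, gamma) is illustrative; the paper also uses normal mixtures, which scipy does not fit directly.

      import numpy as np
      from scipy import stats

      def best_fit_by_aic(attenuation_db):
          """Fit several candidate distributions to measured attenuation values by
          MLE and return the candidate with the smallest AIC."""
          x = np.asarray(attenuation_db, dtype=float)
          candidates = {
              "normal": stats.norm,
              "weibull": stats.weibull_min,
              "lognormal": stats.lognorm,
              "gamma": stats.gamma,
          }
          results = {}
          for name, dist in candidates.items():
              params = dist.fit(x)                        # maximum likelihood estimates
              loglik = np.sum(dist.logpdf(x, *params))
              aic = 2 * len(params) - 2 * loglik
              results[name] = (aic, params)
          best = min(results, key=lambda k: results[k][0])
          return best, results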

  15. Influence of the nucleus area distribution on the survival fraction after charged particles broad beam irradiation.

    PubMed

    Wéra, A-C; Barazzuol, L; Jeynes, J C G; Merchant, M J; Suzuki, M; Kirkby, K J

    2014-08-07

    It is well known that broad beam irradiation with heavy ions leads to variation in the number of hits received by each cell, as the distribution of particles follows Poisson statistics. Although the nucleus area determines the number of hits received for a given dose, variation across the irradiated cell population is generally not considered. In this work, we investigate the effect of the nucleus area distribution on the survival fraction. More specifically, this work aims to explain the deviation, or tail, which might be observed in the survival fraction at high irradiation doses. For this purpose, the nucleus area distribution was added to the beam Poisson statistics and the Linear-Quadratic model in order to fit the experimental data. As shown in this study, nucleus size variation, and the associated Poisson statistics, can lead to an upward survival trend after broad beam irradiation. The influence of the distribution parameters (mean area and standard deviation) was studied using a normal distribution, along with the Linear-Quadratic model parameters (α and β). Finally, the model proposed here was successfully tested against the survival fraction of LN18 cells irradiated with an 85 keV/µm carbon ion broad beam for which the distribution of the nucleus area had been determined.
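
    A Monte Carlo sketch in Python of the kind of model described above: a normally distributed nucleus area, a Poisson number of traversals, and a linear-quadratic response to each cell's own absorbed dose. The dose-per-traversal relation (≈0.16·LET/A Gy for unit-density tissue), the area truncation, and all parameter names are assumptions of this illustration, not the authors' fitted model.

      import numpy as np

      rng = np.random.default_rng(1)

      def survival_fraction(dose_gy, let_kev_um, area_mean_um2, area_sd_um2,
                            alpha, beta, n_cells=200000):
          """Average linear-quadratic survival over cells whose nucleus areas are
          drawn from a (truncated) normal distribution and whose hit numbers are
          Poisson distributed for the nominal broad-beam dose."""
          areas = rng.normal(area_mean_um2, area_sd_um2, n_cells)
          areas = np.clip(areas, 1.0, None)                # avoid non-physical areas
          dose_per_hit = 0.16 * let_kev_um / areas         # approx. Gy per traversal
          mean_hits = dose_gy / dose_per_hit               # Poisson mean matching the nominal dose
          hits = rng.poisson(mean_hits)
          cell_dose = hits * dose_per_hit
          return np.mean(np.exp(-alpha * cell_dose - beta * cell_dose ** 2))

      # e.g. survival_fraction(2.0, 85.0, area_mean_um2=100.0, area_sd_um2=25.0,
      #                        alpha=0.5, beta=0.05)   # illustrative parameters only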

  16. Distribution of erlotinib in rash and normal skin in cancer patients receiving erlotinib visualized by matrix assisted laser desorption/ionization mass spectrometry imaging.

    PubMed

    Nishimura, Meiko; Hayashi, Mitsuhiro; Mizutani, Yu; Takenaka, Kei; Imamura, Yoshinori; Chayahara, Naoko; Toyoda, Masanori; Kiyota, Naomi; Mukohara, Toru; Aikawa, Hiroaki; Fujiwara, Yasuhiro; Hamada, Akinobu; Minami, Hironobu

    2018-04-06

    The development of skin rashes is the most common adverse event observed in cancer patients treated with epidermal growth factor receptor-tyrosine kinase inhibitors such as erlotinib. However, the pharmacological evidence has not been fully revealed. Erlotinib distribution in the rashes was more heterogeneous than that in the normal skin, and the rashes contained statistically higher concentrations of erlotinib than adjacent normal skin in the superficial skin layer (229 ± 192 vs. 120 ± 103 ions/mm2; P = 0.009 in paired t-test). LC-MS/MS confirmed that the concentration of erlotinib in the skin rashes was higher than that in normal skin in the superficial skin layer (1946 ± 1258 vs. 1174 ± 662 ng/cm3; P = 0.028 in paired t-test). The results of MALDI-MSI and LC-MS/MS were well correlated (coefficient of correlation 0.879, P < 0.0001). Focal distribution of erlotinib in the skin tissue was visualized using non-labeled MALDI-MSI. Erlotinib concentration in the superficial layer of the skin rashes was higher than that in the adjacent normal skin. We examined patients with advanced pancreatic cancer who developed skin rashes after treatment with erlotinib and gemcitabine. We biopsied both the rash and adjacent normal skin tissues, and visualized and compared the distribution of erlotinib within the skin using matrix-assisted laser desorption/ionization mass spectrometry imaging (MALDI-MSI). The tissue concentration of erlotinib was also measured by liquid chromatography-tandem mass spectrometry (LC-MS/MS) with laser microdissection.

  17. Distribution of erlotinib in rash and normal skin in cancer patients receiving erlotinib visualized by matrix assisted laser desorption/ionization mass spectrometry imaging

    PubMed Central

    Mizutani, Yu; Takenaka, Kei; Imamura, Yoshinori; Chayahara, Naoko; Toyoda, Masanori; Kiyota, Naomi; Mukohara, Toru; Aikawa, Hiroaki; Fujiwara, Yasuhiro; Hamada, Akinobu; Minami, Hironobu

    2018-01-01

    Background The development of skin rashes is the most common adverse event observed in cancer patients treated with epidermal growth factor receptor-tyrosine kinase inhibitors such as erlotinib. However, the pharmacological evidence has not been fully revealed. Results Erlotinib distribution in the rashes was more heterogeneous than that in the normal skin, and the rashes contained statistically higher concentrations of erlotinib than adjacent normal skin in the superficial skin layer (229 ± 192 vs. 120 ± 103 ions/mm2; P = 0.009 in paired t-test). LC-MS/MS confirmed that the concentration of erlotinib in the skin rashes was higher than that in normal skin in the superficial skin layer (1946 ± 1258 vs. 1174 ± 662 ng/cm3; P = 0.028 in paired t-test). The results of MALDI-MSI and LC-MS/MS were well correlated (coefficient of correlation 0.879, P < 0.0001). Conclusions Focal distribution of erlotinib in the skin tissue was visualized using non-labeled MALDI-MSI. Erlotinib concentration in the superficial layer of the skin rashes was higher than that in the adjacent normal skin. Methods We examined patients with advanced pancreatic cancer who developed skin rashes after treatment with erlotinib and gemcitabine. We biopsied both the rash and adjacent normal skin tissues, and visualized and compared the distribution of erlotinib within the skin using matrix-assisted laser desorption/ionization mass spectrometry imaging (MALDI-MSI). The tissue concentration of erlotinib was also measured by liquid chromatography-tandem mass spectrometry (LC–MS/MS) with laser microdissection. PMID:29719624

  18. Statistics of Optical Coherence Tomography Data From Human Retina

    PubMed Central

    de Juan, Joaquín; Ferrone, Claudia; Giannini, Daniela; Huang, David; Koch, Giorgio; Russo, Valentina; Tan, Ou; Bruni, Carlo

    2010-01-01

    Optical coherence tomography (OCT) has recently become one of the primary methods for noninvasive probing of the human retina. The pseudoimage formed by OCT (the so-called B-scan) varies probabilistically across pixels due to complexities in the measurement technique. Hence, sensitive automatic procedures of diagnosis using OCT may exploit statistical analysis of the spatial distribution of reflectance. In this paper, we perform a statistical study of retinal OCT data. We find that the stretched exponential probability density function can model well the distribution of intensities in OCT pseudoimages. Moreover, we show a small, but significant correlation between neighbor pixels when measuring OCT intensities with pixels of about 5 µm. We then develop a simple joint probability model for the OCT data consistent with known retinal features. This model fits well the stretched exponential distribution of intensities and their spatial correlation. In normal retinas, fit parameters of this model are relatively constant along retinal layers, but vary across layers. However, in retinas with diabetic retinopathy, large spikes of parameter modulation interrupt the constancy within layers, exactly where pathologies are visible. We argue that these results give hope for improvement in statistical pathology-detection methods even when the disease is in its early stages. PMID:20304733
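
    A hedged Python sketch of fitting a stretched-exponential intensity density of the form f(x) = exp(-(x/s)^beta) / (s·Γ(1+1/beta)) by maximum likelihood; the exact parameterisation used in the paper may differ.

      import numpy as np
      from scipy.optimize import minimize
      from scipy.special import gamma as gamma_fn

      def fit_stretched_exponential(intensity):
          """MLE fit of a stretched-exponential density to positive intensity values;
          returns the scale s and stretching exponent beta."""
          x = np.asarray(intensity, dtype=float)
          x = x[x > 0]

          def nll(params):
              # work with log-parameters so the optimizer stays in the valid region
              log_s, log_beta = params
              s, beta = np.exp(log_s), np.exp(log_beta)
              return np.sum((x / s) ** beta) + x.size * np.log(s * gamma_fn(1 + 1 / beta))

          res = minimize(nll, x0=[np.log(x.mean()), 0.0], method="Nelder-Mead")
          s, beta = np.exp(res.x)
          return s, beta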

  19. ROBUST: an interactive FORTRAN-77 package for exploratory data analysis using parametric, ROBUST and nonparametric location and scale estimates, data transformations, normality tests, and outlier assessment

    NASA Astrophysics Data System (ADS)

    Rock, N. M. S.

    ROBUST calculates 53 statistics, plus significance levels for 6 hypothesis tests, on each of up to 52 variables. These together allow the following properties of the data distribution for each variable to be examined in detail: (1) Location. Three means (arithmetic, geometric, harmonic) are calculated, together with the midrange and 19 high-performance robust L-, M-, and W-estimates of location (combined, adaptive, trimmed estimates, etc.). (2) Scale. The standard deviation is calculated along with the H-spread/2 (≈ semi-interquartile range), the mean and median absolute deviations from both mean and median, and a biweight scale estimator. The 23 location and 6 scale estimators programmed cover all possible degrees of robustness. (3) Normality. Distributions are tested against the null hypothesis that they are normal, using the 3rd (√b1) and 4th (b2) moments, Geary's ratio (mean deviation/standard deviation), Filliben's probability plot correlation coefficient, and a more robust test based on the biweight scale estimator. These statistics collectively are sensitive to most usual departures from normality. (4) Presence of outliers. The maximum and minimum values are assessed individually or jointly using Grubbs' maximum Studentized residuals, Harvey's and Dixon's criteria, and the Studentized range. For a single input variable, outliers can be either winsorized or eliminated, and all estimates recalculated iteratively as desired. The following data transformations also can be applied: linear, log10, generalized Box-Cox power (including log, reciprocal, and square root), exponentiation, and standardization. For more than one variable, all results are tabulated in a single run of ROBUST. Further options are incorporated to assess ratios (of two variables) as well as discrete variables, and to handle missing data. Cumulative S-plots (for assessing normality graphically) also can be generated. The mutual consistency or inconsistency of all these measures helps to detect errors in data as well as to assess data distributions themselves.

  20. Comparable Analysis of the Distribution Functions of Runup Heights of the 1896, 1933 and 2011 Japanese Tsunamis in the Sanriku Area

    NASA Astrophysics Data System (ADS)

    Choi, B. H.; Min, B. I.; Yoshinobu, T.; Kim, K. O.; Pelinovsky, E.

    2012-04-01

    Data from a field survey of the 2011 tsunami in the Sanriku area of Japan is presented and used to plot the distribution function of runup heights along the coast. It is shown that the distribution function can be approximated using a theoretical log-normal curve [Choi et al, 2002]. The characteristics of the distribution functions derived from the runup-heights data obtained during the 2011 event are compared with data from two previous gigantic tsunamis (1896 and 1933) that occurred in almost the same region. The number of observations during the last tsunami is very large (more than 5,247), which provides an opportunity to revise the conception of the distribution of tsunami wave heights and the relationship between statistical characteristics and number of observations suggested by Kajiura [1983]. The distribution function of the 2011 event demonstrates the sensitivity to the number of observation points (many of them cannot be considered independent measurements) and can be used to determine the characteristic scale of the coast, which corresponds to the statistical independence of observed wave heights.

  1. Variance stabilization and normalization for one-color microarray data using a data-driven multiscale approach.

    PubMed

    Motakis, E S; Nason, G P; Fryzlewicz, P; Rutter, G A

    2006-10-15

    Many standard statistical techniques are effective on data that are normally distributed with constant variance. Microarray data typically violate these assumptions since they come from non-Gaussian distributions with a non-trivial mean-variance relationship. Several methods have been proposed that transform microarray data to stabilize variance and draw its distribution towards the Gaussian. Some methods, such as log or generalized log, rely on an underlying model for the data. Others, such as the spread-versus-level plot, do not. We propose an alternative data-driven multiscale approach, called the Data-Driven Haar-Fisz for microarrays (DDHFm) with replicates. DDHFm has the advantage of being 'distribution-free' in the sense that no parametric model for the underlying microarray data is required to be specified or estimated; hence, DDHFm can be applied very generally, not just to microarray data. DDHFm achieves very good variance stabilization of microarray data with replicates and produces transformed intensities that are approximately normally distributed. Simulation studies show that it performs better than other existing methods. Application of DDHFm to real one-color cDNA data validates these results. The R package of the Data-Driven Haar-Fisz transform (DDHFm) for microarrays is available in Bioconductor and CRAN.

  2. Empirical Correction to the Likelihood Ratio Statistic for Structural Equation Modeling with Many Variables.

    PubMed

    Yuan, Ke-Hai; Tian, Yubin; Yanagihara, Hirokazu

    2015-06-01

    Survey data typically contain many variables. Structural equation modeling (SEM) is commonly used in analyzing such data. The most widely used statistic for evaluating the adequacy of a SEM model is T_ML, a slight modification to the likelihood ratio statistic. Under the normality assumption, T_ML approximately follows a chi-square distribution when the number of observations (N) is large and the number of items or variables (p) is small. However, in practice, p can be rather large while N is always limited due to not having enough participants. Even with a relatively large N, empirical results show that T_ML rejects the correct model too often when p is not too small. Various corrections to T_ML have been proposed, but they are mostly heuristic. Following the principle of the Bartlett correction, this paper proposes an empirical approach to correct T_ML so that the mean of the resulting statistic approximately equals the degrees of freedom of the nominal chi-square distribution. Results show that the empirically corrected statistics follow the nominal chi-square distribution much more closely than previously proposed corrections to T_ML, and they control type I errors reasonably well whenever N ≥ max(50, 2p). The formulations of the empirically corrected statistics are further used to predict type I errors of T_ML as reported in the literature, and they perform well.

  3. Statistical procedures for evaluating daily and monthly hydrologic model predictions

    USGS Publications Warehouse

    Coffey, M.E.; Workman, S.R.; Taraba, J.L.; Fogle, A.W.

    2004-01-01

    The overall study objective was to evaluate the applicability of different qualitative and quantitative methods for comparing daily and monthly SWAT computer model hydrologic streamflow predictions to observed data, and to recommend statistical methods for use in future model evaluations. Statistical methods were tested using daily streamflows and monthly equivalent runoff depths. The statistical techniques included linear regression, Nash-Sutcliffe efficiency, nonparametric tests, the t-test, objective functions, autocorrelation, and cross-correlation. None of the methods specifically addressed the non-normal distribution of, and dependence between, data points in the daily predicted and observed data. Of the tested methods, median objective functions, the sign test, autocorrelation, and cross-correlation were most applicable for the daily data. The robust coefficient of determination (CD*) and robust modeling efficiency (EF*) objective functions were the preferred methods for daily model results due to the ease of comparing these values with a fixed ideal reference value of one. Predicted and observed monthly totals were more normally distributed, and there was less dependence between individual monthly totals than was observed for the corresponding predicted and observed daily values. More statistical methods were available for comparing SWAT model-predicted and observed monthly totals. The 1995 monthly SWAT model predictions and observed data had a regression R2 of 0.70, a Nash-Sutcliffe efficiency of 0.41, and the t-test failed to reject the hypothesis of equal data means. The Nash-Sutcliffe coefficient and the R2 coefficient were the preferred methods for monthly results due to the ability to compare these coefficients to a set ideal value of one.
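
    For the two coefficients recommended above for monthly comparisons, a minimal Python sketch of the definitions (not the USGS workflow):

      import numpy as np

      def nash_sutcliffe(observed, predicted):
          """Nash-Sutcliffe efficiency: 1 - SSE / total variance of the observations.
          A value of 1 is a perfect fit; 0 means the model is no better than the observed mean."""
          o = np.asarray(observed, dtype=float)
          p = np.asarray(predicted, dtype=float)
          return 1.0 - np.sum((o - p) ** 2) / np.sum((o - o.mean()) ** 2)

      def r_squared(observed, predicted):
          """Coefficient of determination of the observed-versus-predicted regression."""
          o = np.asarray(observed, dtype=float)
          p = np.asarray(predicted, dtype=float)
          r = np.corrcoef(o, p)[0, 1]
          return r ** 2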

  4. Normal probability plots with confidence.

    PubMed

    Chantarangsi, Wanpen; Liu, Wei; Bretz, Frank; Kiatsupaibul, Seksan; Hayter, Anthony J; Wan, Fang

    2015-01-01

    Normal probability plots are widely used as a statistical tool for assessing whether an observed simple random sample is drawn from a normally distributed population. The users, however, have to judge subjectively, if no objective rule is provided, whether the plotted points fall close to a straight line. In this paper, we focus on how a normal probability plot can be augmented by intervals for all the points so that, if the population distribution is normal, then all the points should fall into the corresponding intervals simultaneously with probability 1-α. These simultaneous 1-α probability intervals therefore provide an objective means of judging whether the plotted points fall close to the straight line: the plotted points fall close to the straight line if and only if all the points fall into the corresponding intervals. The powers of several normal probability plot based (graphical) tests and the most popular nongraphical Anderson-Darling and Shapiro-Wilk tests are compared by simulation. Based on this comparison, recommendations are given in Section 3 on which graphical tests should be used in what circumstances. An example is provided to illustrate the methods. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
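
    A Python sketch of an augmented normal probability plot, using pointwise intervals from the Beta distribution of uniform order statistics. These are *pointwise* intervals only; the paper's contribution is to calibrate such intervals so that all n points are covered simultaneously with probability 1-α (e.g. by widening α via simulation), which is not reproduced here.

      import numpy as np
      from scipy import stats

      def qq_with_intervals(x, alpha=0.05):
          """Return Q-Q plot coordinates plus per-point lower/upper bounds and a flag
          for points falling outside their (pointwise) interval."""
          x = np.sort(np.asarray(x, dtype=float))
          n = x.size
          i = np.arange(1, n + 1)
          # theoretical quantiles at the usual (Blom) plotting positions
          q = stats.norm.ppf((i - 0.375) / (n + 0.25))
          mu, sd = x.mean(), x.std(ddof=1)          # plug-in estimates (a simplification)
          lo = mu + sd * stats.norm.ppf(stats.beta.ppf(alpha / 2, i, n - i + 1))
          hi = mu + sd * stats.norm.ppf(stats.beta.ppf(1 - alpha / 2, i, n - i + 1))
          outside = (x < lo) | (x > hi)
          return q, x, lo, hi, outside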

  5. Statistical patterns of visual search for hidden objects

    PubMed Central

    Credidio, Heitor F.; Teixeira, Elisângela N.; Reis, Saulo D. S.; Moreira, André A.; Andrade Jr, José S.

    2012-01-01

    The movement of the eyes has been the subject of intensive research as a way to elucidate inner mechanisms of cognitive processes. A cognitive task that is rather frequent in our daily life is the visual search for hidden objects. Here we investigate through eye-tracking experiments the statistical properties associated with the search of target images embedded in a landscape of distractors. Specifically, our results show that the twofold process of eye movement, composed of sequences of fixations (small steps) intercalated by saccades (longer jumps), displays characteristic statistical signatures. While the saccadic jumps follow a log-normal distribution of distances, which is typical of multiplicative processes, the lengths of the smaller steps in the fixation trajectories are consistent with a power-law distribution. Moreover, the present analysis reveals a clear transition between a directional serial search to an isotropic random movement as the difficulty level of the searching task is increased. PMID:23226829

  6. Outliers: A Potential Data Problem.

    ERIC Educational Resources Information Center

    Douzenis, Cordelia; Rakow, Ernest A.

    Outliers, extreme data values relative to others in a sample, may distort statistics that assume interval levels of measurement and normal distribution. The outlier may be a valid value or an error. Several procedures are available for identifying outliers, and each may be applied to errors of prediction from the regression lines for utility in a…

  7. A statistical study of ion pitch-angle distributions

    NASA Technical Reports Server (NTRS)

    Sibeck, D. G.; Mcentire, R. W.; Lui, A. T. Y.; Krimigis, S. M.

    1987-01-01

    Preliminary results of a statistical study of energetic (34-50 keV) ion pitch-angle distributions (PADs) within 9 Re of earth provide evidence for an orderly pattern consistent with both drift-shell splitting and magnetopause shadowing. Normal ion PADs dominate the dayside and inner magnetosphere. Butterfly PADs typically occur in a narrow belt stretching from dusk to dawn through midnight, where they approach within 6 Re of earth. While those ion butterfly PADs that typically occur on closed drift paths are mainly caused by drift-shell splitting, there is also evidence for magnetopause shadowing in observations of more frequent butterfly PAD occurrence in the outer magnetosphere near dawn than dusk. Isotropic and gradient boundary PADs terminate the tailward extent of the butterfly ion PAD belt.

  8. Thermodynamic Model of Spatial Memory

    NASA Astrophysics Data System (ADS)

    Kaufman, Miron; Allen, P.

    1998-03-01

    We develop and test a thermodynamic model of spatial memory. Our model is an application of statistical thermodynamics to cognitive science. It is related to applications of the statistical mechanics framework in parallel distributed processing research. Our macroscopic model allows us to evaluate an entropy associated with spatial memory tasks. We find that older adults exhibit higher levels of entropy than younger adults. Thurstone's Law of Categorical Judgment, according to which the discriminal processes along the psychological continuum produced by presentations of a single stimulus are normally distributed, is explained by using a Hooke spring model of spatial memory. We have also analyzed a nonlinear modification of the ideal spring model of spatial memory. This work is supported by NIH/NIA grant AG09282-06.

  9. A Statistical Model of the Fluctuations in the Geomagnetic Field from Paleosecular Variation to Reversal

    PubMed

    Camps; Prevot

    1996-08-09

    The statistical characteristics of the local magnetic field of Earth during paleosecular variation, excursions, and reversals are described on the basis of a database that gathers the cleaned mean direction and average remanent intensity of 2741 lava flows that have erupted over the last 20 million years. A model consisting of a normally distributed axial dipole component plus an independent isotropic set of vectors with a Maxwellian distribution that simulates secular variation fits the range of geomagnetic fluctuations, in terms of both direction and intensity. This result suggests that the magnitude of secular variation vectors is independent of the magnitude of Earth's axial dipole moment and that the amplitude of secular variation is unchanged during reversals.
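
    The model described above (a normally distributed axial dipole component plus an isotropic vector with Maxwellian magnitude) is easy to simulate; the Python sketch below does so under assumed parameter names and a z-axis dipole convention, purely as an illustration.

      import numpy as np

      rng = np.random.default_rng(2)

      def simulate_field(n, dipole_mean, dipole_sd, sv_sigma):
          """Draw n local field vectors: an axial (z) dipole term with normally
          distributed magnitude plus a secular-variation vector whose three Cartesian
          components are i.i.d. normal, so its magnitude follows a Maxwell distribution."""
          dipole = np.zeros((n, 3))
          dipole[:, 2] = rng.normal(dipole_mean, dipole_sd, n)    # axial dipole component
          sv = rng.normal(0.0, sv_sigma, (n, 3))                  # isotropic secular variation
          field = dipole + sv
          intensity = np.linalg.norm(field, axis=1)
          inclination = np.degrees(np.arctan2(field[:, 2], np.hypot(field[:, 0], field[:, 1])))
          return intensity, inclination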

  10. Circularly-symmetric complex normal ratio distribution for scalar transmissibility functions. Part I: Fundamentals

    NASA Astrophysics Data System (ADS)

    Yan, Wang-Ji; Ren, Wei-Xin

    2016-12-01

    Recent advances in signal processing and structural dynamics have spurred the adoption of transmissibility functions in academia and industry alike. Due to the inherent randomness of measurement and variability of environmental conditions, uncertainty impacts its applications. This study is focused on statistical inference for raw scalar transmissibility functions modeled as complex ratio random variables. The goal is achieved through companion papers. This paper (Part I) is dedicated to dealing with a formal mathematical proof. New theorems on multivariate circularly-symmetric complex normal ratio distribution are proved on the basis of principle of probabilistic transformation of continuous random vectors. The closed-form distributional formulas for multivariate ratios of correlated circularly-symmetric complex normal random variables are analytically derived. Afterwards, several properties are deduced as corollaries and lemmas to the new theorems. Monte Carlo simulation (MCS) is utilized to verify the accuracy of some representative cases. This work lays the mathematical groundwork to find probabilistic models for raw scalar transmissibility functions, which are to be expounded in detail in Part II of this study.

  11. Orientational tomography of optical axes directions distributions of multilayer biological tissues birefringent polycrystalline networks

    NASA Astrophysics Data System (ADS)

    Zabolotna, Natalia I.; Dovhaliuk, Rostyslav Y.

    2013-09-01

    We present a novel measurement method for the distribution of optic axis orientations which uses a relatively simple measurement setup. The principal difference of our method from other well-known methods lies in its direct approach to measuring the orientation of the optical axes of the biological crystals in polycrystalline networks. Our test polarimetry setup consists of a HeNe laser, a quarter-wave plate, two linear polarizers and a CCD camera. We also propose a methodology for processing the measured optic axis orientation distribution which consists of evaluating statistical, correlational and spectral moments. Such processing of the obtained data can be used to classify a particular tissue sample as "healthy" or "pathological". For our experiment we use thin histological sections of normal and muscular dystrophy tissue. It is shown that the difference between the mentioned moments' values for normal and pathological samples can be quite noticeable, with a relative difference of up to 6.26.

  12. Exact and Approximate Statistical Inference for Nonlinear Regression and the Estimating Equation Approach.

    PubMed

    Demidenko, Eugene

    2017-09-01

    The exact density distribution of the nonlinear least squares estimator in the one-parameter regression model is derived in closed form and expressed through the cumulative distribution function of the standard normal variable. Several proposals to generalize this result are discussed. The exact density is extended to the estimating equation (EE) approach and the nonlinear regression with an arbitrary number of linear parameters and one intrinsically nonlinear parameter. For a very special nonlinear regression model, the derived density coincides with the distribution of the ratio of two normally distributed random variables previously obtained by Fieller (1932), unlike the approximations suggested by other authors. Approximations to the density of the EE estimators are discussed in the multivariate case. Numerical complications associated with the nonlinear least squares are illustrated, such as nonexistence and/or multiple solutions, as major factors contributing to poor density approximation. The nonlinear Markov-Gauss theorem is formulated based on the near exact EE density approximation.

  13. Statistical hypothesis tests of some micrometeorological observations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    SethuRaman, S.; Tichler, J.

    Chi-square goodness-of-fit is used to test the hypothesis that the medium-scale turbulence in the atmospheric surface layer is normally distributed. Coefficients of skewness and excess are computed from the data. If the data are not normal, these coefficients are used in Edgeworth's asymptotic expansion of the Gram-Charlier series to determine an alternate probability density function. The observed data are then compared with the modified probability densities and the new chi-square values computed. Seventy percent of the data analyzed was either normal or approximately normal. The coefficient of skewness g1 has a good correlation with the chi-square values. Events with |g1| < 0.21 were normal to begin with and those with 0.21…
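
    The chi-square goodness-of-fit step against a fitted normal, together with the skewness and excess coefficients used above, can be sketched as follows in Python; the equal-probability binning and bin count are illustrative choices, not those of the report.

      import numpy as np
      from scipy import stats

      def chi_square_normality(x, n_bins=12):
          """Chi-square goodness-of-fit test of the data against a fitted normal,
          returning the statistic, its p-value, and the skewness/excess coefficients."""
          x = np.asarray(x, dtype=float)
          mu, sd = x.mean(), x.std(ddof=1)
          # equal-probability bins under the fitted normal
          probs = np.linspace(0, 1, n_bins + 1)
          edges = stats.norm.ppf(probs, loc=mu, scale=sd)
          edges[0], edges[-1] = x.min() - 1.0, x.max() + 1.0   # outer bins catch everything
          observed, _ = np.histogram(x, bins=edges)
          expected = np.full(n_bins, x.size / n_bins)
          # two estimated parameters reduce the degrees of freedom by two
          chi2, p = stats.chisquare(observed, expected, ddof=2)
          g1 = stats.skew(x)            # coefficient of skewness
          g2 = stats.kurtosis(x)        # coefficient of excess (excess kurtosis)
          return chi2, p, g1, g2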

  14. Experimental and statistical post-validation of positive example EST sequences carrying peroxisome targeting signals type 1 (PTS1)

    PubMed Central

    Lingner, Thomas; Kataya, Amr R. A.; Reumann, Sigrun

    2012-01-01

    We recently developed the first algorithms specifically for plants to predict proteins carrying peroxisome targeting signals type 1 (PTS1) from genome sequences.1 As validated experimentally, the prediction methods are able to correctly predict unknown peroxisomal Arabidopsis proteins and to infer novel PTS1 tripeptides. The high prediction performance is primarily determined by the large number and sequence diversity of the underlying positive example sequences, which mainly derived from EST databases. However, a few constructs remained cytosolic in experimental validation studies, indicating sequencing errors in some ESTs. To identify erroneous sequences, we validated subcellular targeting of additional positive example sequences in the present study. Moreover, we analyzed the distribution of prediction scores separately for each orthologous group of PTS1 proteins, which generally resembled normal distributions with group-specific mean values. The cytosolic sequences commonly represented outliers of low prediction scores and were located at the very tail of a fitted normal distribution. Three statistical methods for identifying outliers were compared in terms of sensitivity and specificity. Their combined application allows elimination of erroneous ESTs from positive example data sets. This new post-validation method will further improve the prediction accuracy of both PTS1 and PTS2 protein prediction models for plants, fungi, and mammals. PMID:22415050

  15. Experimental and statistical post-validation of positive example EST sequences carrying peroxisome targeting signals type 1 (PTS1).

    PubMed

    Lingner, Thomas; Kataya, Amr R A; Reumann, Sigrun

    2012-02-01

    We recently developed the first algorithms specifically for plants to predict proteins carrying peroxisome targeting signals type 1 (PTS1) from genome sequences. As validated experimentally, the prediction methods are able to correctly predict unknown peroxisomal Arabidopsis proteins and to infer novel PTS1 tripeptides. The high prediction performance is primarily determined by the large number and sequence diversity of the underlying positive example sequences, which mainly derived from EST databases. However, a few constructs remained cytosolic in experimental validation studies, indicating sequencing errors in some ESTs. To identify erroneous sequences, we validated subcellular targeting of additional positive example sequences in the present study. Moreover, we analyzed the distribution of prediction scores separately for each orthologous group of PTS1 proteins, which generally resembled normal distributions with group-specific mean values. The cytosolic sequences commonly represented outliers of low prediction scores and were located at the very tail of a fitted normal distribution. Three statistical methods for identifying outliers were compared in terms of sensitivity and specificity. Their combined application allows elimination of erroneous ESTs from positive example data sets. This new post-validation method will further improve the prediction accuracy of both PTS1 and PTS2 protein prediction models for plants, fungi, and mammals.

  16. Blood pressure in head‐injured patients

    PubMed Central

    Mitchell, Patrick; Gregson, Barbara A; Piper, Ian; Citerio, Giuseppe; Mendelow, A David; Chambers, Iain R

    2007-01-01

    Objective To determine the statistical characteristics of blood pressure (BP) readings from a large number of head‐injured patients. Methods The BrainIT group has collected high time‐resolution physiological and clinical data from head‐injured patients who require intracranial pressure (ICP) monitoring. The statistical features of this dataset of BP measurements with time resolution of 1 min from 200 patients are examined. The distributions of BP measurements and their relationship with simultaneous ICP measurements are described. Results The distributions of mean, systolic and diastolic readings are close to normal with modest skewing towards higher values. There is a trend towards an increase in blood pressure with advancing age, but this is not significant. Simultaneous blood pressure and ICP values suggest a triphasic relationship, with BP rising at 0.28 mm Hg/mm Hg of ICP for ICP up to 32 mm Hg, at 0.9 mm Hg/mm Hg of ICP for ICP from 33 to 55 mm Hg, and falling sharply with rising ICP for ICP >55 mm Hg. Conclusions Patients with head injury appear to have a near normal distribution of blood pressure readings that are skewed towards higher values. The relationship between BP and ICP may be triphasic. PMID:17138594

  17. [The research protocol VI: How to choose the appropriate statistical test. Inferential statistics].

    PubMed

    Flores-Ruiz, Eric; Miranda-Novales, María Guadalupe; Villasís-Keever, Miguel Ángel

    2017-01-01

    The statistical analysis can be divided into two main components: descriptive analysis and inferential analysis. Inference consists of drawing conclusions from tests performed on data obtained from a sample of a population. Statistical tests are used in order to establish the probability that a conclusion obtained from a sample is applicable to the population from which it was obtained. However, choosing the appropriate statistical test in general poses a challenge for novice researchers. To choose the statistical test it is necessary to take into account three aspects: the research design, the number of measurements and the scale of measurement of the variables. Statistical tests are divided into two sets, parametric and nonparametric. Parametric tests can only be used if the data show a normal distribution. Choosing the right statistical test will make it easier for readers to understand and apply the results.
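
    A schematic example of the decision the abstract describes, assuming a Shapiro-Wilk normality check and the t-test/Mann-Whitney pair as the parametric/nonparametric alternatives (other checks and tests are equally valid choices):

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(2)
        group_a = rng.normal(10, 2, 40)          # hypothetical measurements, treatment A
        group_b = rng.lognormal(2.3, 0.4, 40)    # hypothetical measurements, treatment B (skewed)

        # Parametric tests assume (approximate) normality; check it first.
        _, p_a = stats.shapiro(group_a)
        _, p_b = stats.shapiro(group_b)

        if p_a > 0.05 and p_b > 0.05:
            stat, p = stats.ttest_ind(group_a, group_b)        # parametric choice
            test = "independent t-test"
        else:
            stat, p = stats.mannwhitneyu(group_a, group_b)     # nonparametric choice
            test = "Mann-Whitney U"
        print(f"{test}: statistic={stat:.2f}, p={p:.4f}")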

  18. Quantum jumps on Anderson attractors

    NASA Astrophysics Data System (ADS)

    Yusipov, I. I.; Laptyeva, T. V.; Ivanchenko, M. V.

    2018-01-01

    In a closed single-particle quantum system, spatial disorder induces Anderson localization of eigenstates and halts wave propagation. The phenomenon is vulnerable to interaction with the environment and to decoherence, which is believed to restore normal diffusion. We demonstrate that for a class of experimentally feasible non-Hermitian dissipators, which admit signatures of localization in asymptotic states, the quantum particle alternates between diffusive and ballistic regimes, depending on the phase parameter of the dissipators, sticking near localization centers. In the diffusive regime, the statistics of quantum jumps are non-Poissonian and have a power-law interval, a footprint of intermittent locking in Anderson modes. Ballistic propagation reflects dispersion of an ordered lattice and introduces a second timescale for jumps, resulting in a non-monotonic probability distribution. Hermitian dephasing dissipation makes localization features vanish, and Poissonian jump statistics along with normal diffusion are recovered.

  19. Anomaly detection in hyperspectral imagery: statistics vs. graph-based algorithms

    NASA Astrophysics Data System (ADS)

    Berkson, Emily E.; Messinger, David W.

    2016-05-01

    Anomaly detection (AD) algorithms are frequently applied to hyperspectral imagery, but different algorithms produce different outlier results depending on the image scene content and the assumed background model. This work provides the first comparison of anomaly score distributions between common statistics-based anomaly detection algorithms (RX and subspace-RX) and the graph-based Topological Anomaly Detector (TAD). Anomaly scores in statistical AD algorithms should theoretically approximate a chi-squared distribution; however, this is rarely the case with real hyperspectral imagery. The expected distribution of scores found with graph-based methods remains unclear. We also look for general trends in algorithm performance with varied scene content. Three separate scenes were extracted from the hyperspectral MegaScene image taken over downtown Rochester, NY with the VIS-NIR-SWIR ProSpecTIR instrument. In order of most to least cluttered, we study an urban, suburban, and rural scene. The three AD algorithms were applied to each scene, and the distributions of the most anomalous 5% of pixels were compared. We find that subspace-RX performs better than RX, because the data becomes more normal when the highest variance principal components are removed. We also see that compared to statistical detectors, anomalies detected by TAD are easier to separate from the background. Due to their different underlying assumptions, the statistical and graph-based algorithms highlighted different anomalies within the urban scene. These results will lead to a deeper understanding of these algorithms and their applicability across different types of imagery.
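
    For orientation, a minimal sketch of a global RX-style detector on simulated pixels: the anomaly score is the squared Mahalanobis distance from the scene mean, which under a multivariate-normal background follows a chi-squared distribution with one degree of freedom per band. This is a generic illustration, not the subspace-RX or TAD implementations used in the study.

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)
        bands, n_pixels = 30, 5000
        cube = rng.normal(size=(n_pixels, bands))          # hypothetical hyperspectral pixels

        mu = cube.mean(axis=0)
        cov_inv = np.linalg.inv(np.cov(cube, rowvar=False))
        diff = cube - mu
        rx_score = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)   # squared Mahalanobis distance

        # Under a multivariate-normal background the scores follow chi2 with `bands` dof.
        threshold = stats.chi2.ppf(0.95, df=bands)
        print("pixels above the 95% chi-squared threshold:", np.sum(rx_score > threshold))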

  20. A probabilistic approach to photovoltaic generator performance prediction

    NASA Astrophysics Data System (ADS)

    Khallat, M. A.; Rahman, S.

    1986-09-01

    A method for predicting the performance of a photovoltaic (PV) generator based on long term climatological data and expected cell performance is described. The equations for cell model formulation are provided. Use of the statistical model for characterizing the insolation level is discussed. The insolation data is fitted to appropriate probability distribution functions (Weibull, beta, normal). The probability distribution functions are utilized to evaluate the capacity factors of PV panels or arrays. An example is presented revealing the applicability of the procedure.
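
    A hedged sketch of the fitting step described above, using simulated insolation values and comparing the Weibull, beta, and normal candidates by log-likelihood; the abstract does not state the selection criterion, and the beta support bounds below are assumptions:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(4)
        insolation = rng.weibull(2.5, 365) * 6.0      # hypothetical daily insolation, kWh/m2

        candidates = {
            "weibull": stats.weibull_min,
            "beta": stats.beta,
            "normal": stats.norm,
        }
        for name, dist in candidates.items():
            if name == "beta":
                # Beta support is bounded, so fix loc/scale to bracket the data (an assumption).
                params = dist.fit(insolation, floc=0, fscale=insolation.max() * 1.001)
            else:
                params = dist.fit(insolation)
            loglik = np.sum(dist.logpdf(insolation, *params))
            print(f"{name:8s} log-likelihood = {loglik:.1f}")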

  1. Modified retrieval algorithm for three types of precipitation distribution using x-band synthetic aperture radar

    NASA Astrophysics Data System (ADS)

    Xie, Yanan; Zhou, Mingliang; Pan, Dengke

    2017-10-01

    The forward-scattering model is introduced to describe the response of normalized radar cross section (NRCS) of precipitation with synthetic aperture radar (SAR). Since the distribution of near-surface rainfall is related to the rate of near-surface rainfall and a horizontal distribution factor, a retrieval algorithm called modified regression empirical and model-oriented statistical (M-M) based on Volterra integration theory is proposed. The key difference from the model-oriented statistical and Volterra integration (MOSVI) algorithm is that the M-M algorithm retrieves the near-surface rainfall rate with the modified regression empirical algorithm rather than a linear regression formula. The number of empirical parameters in the weighted integration is halved, and a smaller average relative error is obtained when the rainfall rate is less than 100 mm/h. Therefore, the algorithm proposed in this paper can obtain high-precision rainfall information.

  2. Algorithm of probabilistic assessment of fully-mechanized longwall downtime

    NASA Astrophysics Data System (ADS)

    Domrachev, A. N.; Rib, S. V.; Govorukhin, Yu M.; Krivopalov, V. G.

    2017-09-01

    The problem of increasing the load on a long fully-mechanized longwall has several aspects, one of which is the improvement of efficiency in using available stoping equipment through an increase in the machine operating time coefficient of the shearer and other mining machines that form an integral part of the longwall set of equipment. The task of predicting the reliability indicators of stoping equipment is solved by statistical estimation of the parameters of the exponential distributions of downtime and failure recovery. Accounting for downtime caused by accidents in the face workings is more difficult and, despite the statistical data available on accidents in mine workings, no solution has been found to date. The authors propose a probabilistic assessment of workings caving using the Poisson distribution and of the duration of their restoration using the normal distribution. The above results confirm the possibility of implementing the approach proposed by the authors.
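
    A minimal Monte Carlo sketch of the proposed scheme, with Poisson-distributed caving events and normally distributed restoration times; the rate and duration parameters are invented for illustration and are not taken from the paper:

        import numpy as np

        rng = np.random.default_rng(5)
        n_sim = 100_000
        lam = 0.6                        # assumed mean number of cavings per month
        mu_rest, sd_rest = 36.0, 12.0    # assumed restoration duration, hours (normal)

        n_events = rng.poisson(lam, size=n_sim)
        downtime = np.array([
            rng.normal(mu_rest, sd_rest, size=k).clip(min=0).sum() for k in n_events
        ])
        print(f"mean monthly downtime = {downtime.mean():.1f} h, "
              f"95th percentile = {np.percentile(downtime, 95):.1f} h")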

  3. Impact of distributions on the archetypes and prototypes in heterogeneous nanoparticle ensembles.

    PubMed

    Fernandez, Michael; Wilson, Hugh F; Barnard, Amanda S

    2017-01-05

    The magnitude and complexity of the structural and functional data available on nanomaterials requires data analytics, statistical analysis and information technology to drive discovery. We demonstrate that multivariate statistical analysis can recognise the sets of truly significant nanostructures and their most relevant properties in heterogeneous ensembles with different probability distributions. The prototypical and archetypal nanostructures of five virtual ensembles of Si quantum dots (SiQDs) with Boltzmann, frequency, normal, Poisson and random distributions are identified using clustering and archetypal analysis, where we find that their diversity is defined by size and shape, regardless of the type of distribution. At the convex hull of the SiQD ensembles, simple configuration archetypes can efficiently describe a large number of SiQDs, whereas more complex shapes are needed to represent the average ordering of the ensembles. This approach provides a route towards the characterisation of computationally intractable virtual nanomaterial spaces, which can convert big data into smart data, and significantly reduce the workload to simulate experimentally relevant virtual samples.

  4. SU-D-BRD-07: Evaluation of the Effectiveness of Statistical Process Control Methods to Detect Systematic Errors For Routine Electron Energy Verification

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parker, S

    2015-06-15

    Purpose: To evaluate the ability of statistical process control methods to detect systematic errors when using a two dimensional (2D) detector array for routine electron beam energy verification. Methods: Electron beam energy constancy was measured using an aluminum wedge and a 2D diode array on four linear accelerators. Process control limits were established. Measurements were recorded in control charts and compared with both calculated process control limits and TG-142 recommended specification limits. The data was tested for normality, process capability and process acceptability. Additional measurements were recorded while systematic errors were intentionally introduced. Systematic errors included shifts in the alignment of the wedge, incorrect orientation of the wedge, and incorrect array calibration. Results: Control limits calculated for each beam were smaller than the recommended specification limits. Process capability and process acceptability ratios were greater than one in all cases. All data was normally distributed. Shifts in the alignment of the wedge were most apparent for low energies. The smallest shift (0.5 mm) was detectable using process control limits in some cases, while the largest shift (2 mm) was detectable using specification limits in only one case. The wedge orientation tested did not affect the measurements as this did not affect the thickness of aluminum over the detectors of interest. Array calibration dependence varied with energy and selected array calibration. 6 MeV was the least sensitive to array calibration selection while 16 MeV was the most sensitive. Conclusion: Statistical process control methods demonstrated that the data distribution was normally distributed, the process was capable of meeting specifications, and that the process was centered within the specification limits. Though not all systematic errors were distinguishable from random errors, process control limits increased the ability to detect systematic errors using routine measurement of electron beam energy constancy.
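
    A rough sketch of the control-limit and capability calculations referred to above, for an individuals chart on simulated energy-constancy readings; the specification limits and chart constants follow standard I-MR practice rather than the study's own values:

        import numpy as np

        rng = np.random.default_rng(6)
        energy = rng.normal(0.0, 0.3, 60)     # hypothetical daily energy-constancy readings (%)

        # Individuals chart: centre line +/- 2.66 * mean moving range (standard I-MR constant).
        mr = np.abs(np.diff(energy)).mean()
        centre = energy.mean()
        ucl, lcl = centre + 2.66 * mr, centre - 2.66 * mr

        # Capability against specification limits (values assumed for illustration).
        usl, lsl = 2.0, -2.0
        sigma_hat = mr / 1.128                          # d2 constant for subgroup size 2
        cp = (usl - lsl) / (6 * sigma_hat)
        cpk = min(usl - centre, centre - lsl) / (3 * sigma_hat)
        print(f"UCL={ucl:.2f}, LCL={lcl:.2f}, Cp={cp:.2f}, Cpk={cpk:.2f}")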

  5. Indexing the relative abundance of age-0 white sturgeons in an impoundment of the lower Columbia River from highly skewed trawling data

    USGS Publications Warehouse

    Counihan, T.D.; Miller, Allen I.; Parsley, M.J.

    1999-01-01

    The development of recruitment monitoring programs for age-0 white sturgeons Acipenser transmontanus is complicated by the statistical properties of catch-per-unit-effort (CPUE) data. We found that age-0 CPUE distributions from bottom trawl surveys violated assumptions of statistical procedures based on normal probability theory. Further, no single data transformation uniformly satisfied these assumptions because CPUE distribution properties varied with the sample mean CPUE. Given these analytic problems, we propose that an additional index of age-0 white sturgeon relative abundance, the proportion of positive tows (Ep), be used to estimate sample sizes before conducting age-0 recruitment surveys and to evaluate statistical hypothesis tests comparing the relative abundance of age-0 white sturgeons among years. Monte Carlo simulations indicated that Ep was consistently more precise than mean CPUE, and because Ep is binomially rather than normally distributed, surveys can be planned and analyzed without violating the assumptions of procedures based on normal probability theory. However, we show that Ep may underestimate changes in relative abundance at high levels and confound our ability to quantify responses to management actions if relative abundance is consistently high. If data suggest that most samples will contain age-0 white sturgeons, estimators of relative abundance other than Ep should be considered. Because Ep may also obscure correlations to climatic and hydrologic variables if high abundance levels are present in time series data, we recommend mean CPUE be used to describe relations to environmental variables. The use of both Ep and mean CPUE will facilitate the evaluation of hypothesis tests comparing relative abundance levels and correlations to variables affecting age-0 recruitment. Estimated sample sizes for surveys should therefore be based on detecting predetermined differences in Ep, but data necessary to calculate mean CPUE should also be collected.

  6. Multivariate probability distribution for sewer system vulnerability assessment under data-limited conditions.

    PubMed

    Del Giudice, G; Padulano, R; Siciliano, D

    2016-01-01

    The lack of geometrical and hydraulic information about sewer networks often excludes the adoption of in-depth modeling tools to obtain prioritization strategies for funds management. The present paper describes a novel statistical procedure for defining the prioritization scheme for preventive maintenance strategies based on a small sample of failure data collected by the Sewer Office of the Municipality of Naples (IT). Novelty issues involve, among others, considering sewer parameters as continuous statistical variables and accounting for their interdependences. After a statistical analysis of maintenance interventions, the most important available factors affecting the process are selected and their mutual correlations identified. Then, after a Box-Cox transformation of the original variables, a methodology is provided for the evaluation of a vulnerability map of the sewer network by adopting a joint multivariate normal distribution with different parameter sets. The goodness-of-fit is eventually tested for each distribution by means of a multivariate plotting position. The developed methodology is expected to assist municipal engineers in identifying critical sewers, prioritizing sewer inspections in order to fulfill rehabilitation requirements.
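
    A simplified sketch of the described pipeline on invented sewer attributes: Box-Cox transform each (positive) variable, then fit a joint multivariate normal to the transformed data. The variable names and values are assumptions, and the study's factor selection and goodness-of-fit testing are omitted:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(7)
        # Hypothetical sewer attributes: age (yr), diameter (m), slope (%) -- all positive.
        data = np.column_stack([
            rng.lognormal(3.5, 0.4, 300),
            rng.lognormal(-0.5, 0.3, 300),
            rng.lognormal(0.0, 0.5, 300),
        ])

        # Box-Cox each column, keeping the fitted lambdas.
        transformed = np.empty_like(data)
        lambdas = []
        for j in range(data.shape[1]):
            transformed[:, j], lam = stats.boxcox(data[:, j])
            lambdas.append(lam)

        mean_vec = transformed.mean(axis=0)
        cov_mat = np.cov(transformed, rowvar=False)
        mvn = stats.multivariate_normal(mean_vec, cov_mat)
        print("fitted lambdas:", np.round(lambdas, 2))
        print("log-density of first record:", mvn.logpdf(transformed[0]))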

  7. Sample size re-assessment leading to a raised sample size does not inflate type I error rate under mild conditions.

    PubMed

    Broberg, Per

    2013-07-19

    One major concern with adaptive designs, such as the sample size adjustable designs, has been the fear of inflating the type I error rate. In (Stat Med 23:1023-1038, 2004) it is however proven that when observations follow a normal distribution and the interim result shows promise, meaning that the conditional power exceeds 50%, type I error rate is protected. This bound and the distributional assumptions may seem to impose undesirable restrictions on the use of these designs. In (Stat Med 30:3267-3284, 2011) the possibility of going below 50% is explored and a region that permits an increased sample size without inflation is defined in terms of the conditional power at the interim. A criterion which is implicit in (Stat Med 30:3267-3284, 2011) is derived by elementary methods and expressed in terms of the test statistic at the interim to simplify practical use. Mathematical and computational details concerning this criterion are exhibited. Under very general conditions the type I error rate is preserved under sample size adjustable schemes that permit a raise. The main result states that for normally distributed observations raising the sample size when the result looks promising, where the definition of promising depends on the amount of knowledge gathered so far, guarantees the protection of the type I error rate. Also, in the many situations where the test statistic approximately follows a normal law, the deviation from the main result remains negligible. This article provides details regarding the Weibull and binomial distributions and indicates how one may approach these distributions within the current setting. There is thus reason to consider such designs more often, since they offer a means of adjusting an important design feature at little or no cost in terms of error rate.
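
    A hedged sketch of the interim quantity this result hinges on: conditional power under the current trend for a one-sided z-test, written with the standard B-value decomposition. This is a textbook formulation, not the criterion derived in the article:

        from scipy.stats import norm

        def conditional_power(z_interim, info_frac, alpha=0.025):
            """Conditional power under the current trend for a one-sided z-test.

            z_interim : interim test statistic
            info_frac : fraction of the planned information already observed (0 < t < 1)
            """
            t = info_frac
            b = z_interim * t ** 0.5          # B-value at the interim
            drift = z_interim / t ** 0.5      # drift estimated from the current trend
            z_alpha = norm.ppf(1 - alpha)
            return 1 - norm.cdf((z_alpha - b - drift * (1 - t)) / (1 - t) ** 0.5)

        # A promising interim result (conditional power above 50%), halfway through the trial.
        print(f"CP = {conditional_power(z_interim=1.5, info_frac=0.5):.2f}")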

  8. High cumulants of conserved charges and their statistical uncertainties

    NASA Astrophysics Data System (ADS)

    Li-Zhu, Chen; Ye-Yin, Zhao; Xue, Pan; Zhi-Ming, Li; Yuan-Fang, Wu

    2017-10-01

    We study the influence of measured high cumulants of conserved charges on their associated statistical uncertainties in relativistic heavy-ion collisions. With a given number of events, the measured cumulants randomly fluctuate with an approximately normal distribution, while the estimated statistical uncertainties are found to be correlated with corresponding values of the obtained cumulants. Generally, with a given number of events, the larger the cumulants we measure, the larger the statistical uncertainties that are estimated. The error-weighted averaged cumulants are dependent on statistics. Despite this effect, however, it is found that the three sigma rule of thumb is still applicable when the statistics are above one million. Supported by NSFC (11405088, 11521064, 11647093), Major State Basic Research Development Program of China (2014CB845402) and Ministry of Science and Technology (MoST) (2016YFE0104800)

  9. Role of microstructure on twin nucleation and growth in HCP titanium: A statistical study

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arul Kumar, M.; Wroński, M.; McCabe, Rodney James

    In this study, a detailed statistical analysis is performed using Electron Back Scatter Diffraction (EBSD) to establish the effect of microstructure on twin nucleation and growth in deformed commercial purity hexagonal close packed (HCP) titanium. Rolled titanium samples are compressed along rolling, transverse and normal directions to establish statistical correlations for {10–12}, {11–21}, and {11–22} twins. A recently developed automated EBSD-twinning analysis software is employed for the statistical analysis. Finally, the analysis provides the following key findings: (I) grain size and strain dependence is different for twin nucleation and growth; (II) twinning statistics can be generalized for the HCP metals magnesium, zirconium and titanium; and (III) complex microstructure, where grain shape and size distribution is heterogeneous, requires multi-point statistical correlations.

  10. Role of microstructure on twin nucleation and growth in HCP titanium: A statistical study

    DOE PAGES

    Arul Kumar, M.; Wroński, M.; McCabe, Rodney James; ...

    2018-02-01

    In this study, a detailed statistical analysis is performed using Electron Back Scatter Diffraction (EBSD) to establish the effect of microstructure on twin nucleation and growth in deformed commercial purity hexagonal close packed (HCP) titanium. Rolled titanium samples are compressed along rolling, transverse and normal directions to establish statistical correlations for {10–12}, {11–21}, and {11–22} twins. A recently developed automated EBSD-twinning analysis software is employed for the statistical analysis. Finally, the analysis provides the following key findings: (I) grain size and strain dependence is different for twin nucleation and growth; (II) twinning statistics can be generalized for the HCP metals magnesium, zirconium and titanium; and (III) complex microstructure, where grain shape and size distribution is heterogeneous, requires multi-point statistical correlations.

  11. On the intrinsic shape of the gamma-ray spectrum for Fermi blazars

    NASA Astrophysics Data System (ADS)

    Kang, Shi-Ju; Wu, Qingwen; Zheng, Yong-Gang; Yin, Yue; Song, Jia-Li; Zou, Hang; Feng, Jian-Chao; Dong, Ai-Jun; Wu, Zhong-Zu; Zhang, Zhi-Bin; Wu, Lin-Hui

    2018-05-01

    The curvature of the γ-ray spectrum in blazars may reflect the intrinsic distribution of emitting electrons, which will further give some information on the possible acceleration and cooling processes in the emitting region. The γ-ray spectra of Fermi blazars are normally fitted either by a single power-law (PL) or a log-normal (also called Logarithmic Parabola, LP) form. The possible reason for this difference is not clear. We statistically explore this issue based on the different observational properties of 1419 Fermi blazars in the 3LAC Clean Sample. We find that the γ-ray flux (100 MeV–100 GeV) and variability index follow bimodal distributions for PL and LP blazars, where the γ-ray flux and variability index show a positive correlation. However, the distributions of γ-ray luminosity and redshift follow a unimodal distribution. Our results suggest that the bimodal distribution of γ-ray fluxes for LP and PL blazars may not be intrinsic and all blazars may have an intrinsically curved γ-ray spectrum, and the PL spectrum is just caused by the fitting effect due to fewer photons.

  12. Double stars with wide separations in the AGK3 - II. The wide binaries and the multiple systems*

    NASA Astrophysics Data System (ADS)

    Halbwachs, J.-L.; Mayor, M.; Udry, S.

    2017-02-01

    A large observation programme was carried out to measure the radial velocities of the components of a selection of common proper motion (CPM) stars to select the physical binaries. 80 wide binaries (WBs) were detected, and 39 optical pairs were identified. By adding CPM stars with separations close enough to be almost certain that they are physical, a bias-controlled sample of 116 WBs was obtained, and used to derive the distribution of separations from 100 to 30 000 au. The distribution obtained does not match the log-constant distribution, but agrees with the log-normal distribution. The spectroscopic binaries detected among the WB components were used to derive statistical information about the multiple systems. The close binaries in WBs seem to be like those detected in other field stars. As for the WBs, they seem to obey the log-normal distribution of periods. The number of quadruple systems agrees with the no correlation hypothesis; this indicates that an environment conducive to the formation of WBs does not favour the formation of subsystems with periods shorter than 10 yr.

  13. Landscape-scale spatial abundance distributions discriminate core from random components of boreal lake bacterioplankton.

    PubMed

    Niño-García, Juan Pablo; Ruiz-González, Clara; Del Giorgio, Paul A

    2016-12-01

    Aquatic bacterial communities harbour thousands of coexisting taxa. To meet the challenge of discriminating between a 'core' and a sporadically occurring 'random' component of these communities, we explored the spatial abundance distribution of individual bacterioplankton taxa across 198 boreal lakes and their associated fluvial networks (188 rivers). We found that all taxa could be grouped into four distinct categories based on model statistical distributions (normal like, bimodal, logistic and lognormal). The distribution patterns across lakes and their associated river networks showed that lake communities are composed of a core of taxa whose distribution appears to be linked to in-lake environmental sorting (normal-like and bimodal categories), and a large fraction of mostly rare bacteria (94% of all taxa) whose presence appears to be largely random and linked to downstream transport in aquatic networks (logistic and lognormal categories). These rare taxa are thus likely to reflect species sorting at upstream locations, providing a perspective of the conditions prevailing in entire aquatic networks rather than only in lakes. © 2016 John Wiley & Sons Ltd/CNRS.

  14. Analysis of data from NASA B-57B gust gradient program

    NASA Technical Reports Server (NTRS)

    Frost, W.; Lin, M. C.; Chang, H. P.; Ringnes, E.

    1985-01-01

    Statistical analysis of the turbulence measured in flight 6 of the NASA B-57B over Denver, Colorado, from July 7 to July 23, 1982 included the calculations of average turbulence parameters, integral length scales, probability density functions, single point autocorrelation coefficients, two point autocorrelation coefficients, normalized autospectra, normalized two point autospectra, and two point cross spectra for gust velocities. The single point autocorrelation coefficients were compared with the theoretical model developed by von Karman. Theoretical analyses were developed which address the effects of spanwise gust distributions, using two point spatial turbulence correlations.

  15. Statistical process control applied to mechanized peanut sowing as a function of soil texture.

    PubMed

    Zerbato, Cristiano; Furlani, Carlos Eduardo Angeli; Ormond, Antonio Tassio Santana; Gírio, Lucas Augusto da Silva; Carneiro, Franciele Morlin; da Silva, Rouverson Pereira

    2017-01-01

    The successful establishment of agricultural crops depends on sowing quality, machinery performance, soil type and conditions, among other factors. This study evaluates the operational quality of mechanized peanut sowing in three soil types (sand, silt, and clay) with variable moisture contents. The experiment was conducted in three locations in the state of São Paulo, Brazil. The track-sampling scheme was used for 80 sampling locations of each soil type. Descriptive statistics and statistical process control (SPC) were used to evaluate the quality indicators of mechanized peanut sowing. The variables had normal distributions and were stable from the viewpoint of SPC. The best performance for peanut sowing density, normal spacing, and the initial seedling growing stand was found for clayey soil followed by sandy soil and then silty soil. Sandy or clayey soils displayed similar results regarding sowing depth, which was deeper than in the silty soil. Overall, the texture and the moisture of clayey soil provided the best operational performance for mechanized peanut sowing.

  16. Assessment of variations in thermal cycle life data of thermal barrier coated rods

    NASA Astrophysics Data System (ADS)

    Hendricks, R. C.; McDonald, G.

    An analysis of thermal cycle life data for 22 thermal barrier coated (TBC) specimens was conducted. The ZrO2-8Y2O3/NiCrAlY plasma spray coated Rene 41 rods were tested in a Mach 0.3 Jet A/air burner flame. All specimens were subjected to the same coating and subsequent test procedures in an effort to control three parametric groups: material properties, geometry and heat flux. Statistically, the data sample space had a mean of 1330 cycles with a standard deviation of 520 cycles. The data were described by normal or log-normal distributions, but other models could also apply; the sample size must be increased to clearly delineate a statistical failure model. The statistical methods were also applied to adhesive/cohesive strength data for 20 TBC discs of the same composition, with similar results. The sample space had a mean of 9 MPa with a standard deviation of 4.2 MPa.

  17. Assessment of variations in thermal cycle life data of thermal barrier coated rods

    NASA Technical Reports Server (NTRS)

    Hendricks, R. C.; Mcdonald, G.

    1981-01-01

    An analysis of thermal cycle life data for 22 thermal barrier coated (TBC) specimens was conducted. The ZrO2-8Y2O3/NiCrAlY plasma spray coated Rene 41 rods were tested in a Mach 0.3 Jet A/air burner flame. All specimens were subjected to the same coating and subsequent test procedures in an effort to control three parametric groups: material properties, geometry and heat flux. Statistically, the data sample space had a mean of 1330 cycles with a standard deviation of 520 cycles. The data were described by normal or log-normal distributions, but other models could also apply; the sample size must be increased to clearly delineate a statistical failure model. The statistical methods were also applied to adhesive/cohesive strength data for 20 TBC discs of the same composition, with similar results. The sample space had a mean of 9 MPa with a standard deviation of 4.2 MPa.

  18. Statistical process control applied to mechanized peanut sowing as a function of soil texture

    PubMed Central

    Furlani, Carlos Eduardo Angeli; da Silva, Rouverson Pereira

    2017-01-01

    The successful establishment of agricultural crops depends on sowing quality, machinery performance, soil type and conditions, among other factors. This study evaluates the operational quality of mechanized peanut sowing in three soil types (sand, silt, and clay) with variable moisture contents. The experiment was conducted in three locations in the state of São Paulo, Brazil. The track-sampling scheme was used for 80 sampling locations of each soil type. Descriptive statistics and statistical process control (SPC) were used to evaluate the quality indicators of mechanized peanut sowing. The variables had normal distributions and were stable from the viewpoint of SPC. The best performance for peanut sowing density, normal spacing, and the initial seedling growing stand was found for clayey soil followed by sandy soil and then silty soil. Sandy or clayey soils displayed similar results regarding sowing depth, which was deeper than in the silty soil. Overall, the texture and the moisture of clayey soil provided the best operational performance for mechanized peanut sowing. PMID:28742095

  19. Likert scales, levels of measurement and the "laws" of statistics.

    PubMed

    Norman, Geoff

    2010-12-01

    Reviewers of research reports frequently criticize the choice of statistical methods. While some of these criticisms are well-founded, frequently the use of various parametric methods such as analysis of variance, regression, and correlation is faulted because: (a) the sample size is too small, (b) the data may not be normally distributed, or (c) the data are from Likert scales, which are ordinal, so parametric statistics cannot be used. In this paper, I dissect these arguments, and show that many studies, dating back to the 1930s, consistently show that parametric statistics are robust with respect to violations of these assumptions. Hence, challenges like those above are unfounded, and parametric methods can be utilized without concern for "getting the wrong answer".

  20. The social architecture of capitalism

    NASA Astrophysics Data System (ADS)

    Wright, Ian

    2005-02-01

    A dynamic model of the social relations between workers and capitalists is introduced. The model self-organises into a dynamic equilibrium with statistical properties that are in close qualitative and in many cases quantitative agreement with a broad range of known empirical distributions of developed capitalism, including the power-law firm size distribution, the Laplace firm and GDP growth distribution, the lognormal firm demises distribution, the exponential recession duration distribution, the lognormal-Pareto income distribution, and the gamma-like firm rate-of-profit distribution. Normally these distributions are studied in isolation, but this model unifies and connects them within a single causal framework. The model also generates business cycle phenomena, including fluctuating wage and profit shares in national income about values consistent with empirical studies. The generation of an approximately lognormal-Pareto income distribution and an exponential-Pareto wealth distribution demonstrates that the power-law regime of the income distribution can be explained by an additive process on a power-law network that models the social relation between employers and employees organised in firms, rather than a multiplicative process that models returns to investment in financial markets. A testable consequence of the model is the conjecture that the rate-of-profit distribution is consistent with a parameter-mix of a ratio of normal variates with means and variances that depend on a firm size parameter that is distributed according to a power-law.

  1. A normalization method for combination of laboratory test results from different electronic healthcare databases in a distributed research network.

    PubMed

    Yoon, Dukyong; Schuemie, Martijn J; Kim, Ju Han; Kim, Dong Ki; Park, Man Young; Ahn, Eun Kyoung; Jung, Eun-Young; Park, Dong Kyun; Cho, Soo Yeon; Shin, Dahye; Hwang, Yeonsoo; Park, Rae Woong

    2016-03-01

    Distributed research networks (DRNs) afford statistical power by integrating observational data from multiple partners for retrospective studies. However, laboratory test results across care sites are derived using different assays from varying patient populations, making it difficult to simply combine data for analysis. Additionally, existing normalization methods are not suitable for retrospective studies. We normalized laboratory results from different data sources by adjusting for heterogeneous clinico-epidemiologic characteristics of the data and called this the subgroup-adjusted normalization (SAN) method. Subgroup-adjusted normalization renders the means and standard deviations of distributions identical under population structure-adjusted conditions. To evaluate its performance, we compared SAN with existing methods for simulated and real datasets consisting of blood urea nitrogen, serum creatinine, hematocrit, hemoglobin, serum potassium, and total bilirubin. Various clinico-epidemiologic characteristics can be applied together in SAN. For simplicity of comparison, age and gender were used to adjust population heterogeneity in this study. In simulations, SAN had the lowest standardized difference in means (SDM) and Kolmogorov-Smirnov values for all tests (p < 0.05). In a real dataset, SAN had the lowest SDM and Kolmogorov-Smirnov values for blood urea nitrogen, hematocrit, hemoglobin, and serum potassium, and the lowest SDM for serum creatinine (p < 0.05). Subgroup-adjusted normalization performed better than normalization using other methods. The SAN method is applicable in a DRN environment and should facilitate analysis of data integrated across DRN partners for retrospective observational studies. Copyright © 2015 John Wiley & Sons, Ltd.
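
    A deliberately simplified sketch of the idea behind subgroup adjustment (not the published SAN algorithm): standardize each laboratory value within its age/sex subgroup so that subgroup means and standard deviations line up before pooling across sites. The variable names, groupings, and data below are invented:

        import numpy as np
        import pandas as pd

        rng = np.random.default_rng(8)
        # Hypothetical creatinine results from one site, with age group and sex recorded.
        df = pd.DataFrame({
            "value": rng.normal(1.0, 0.3, 1000),
            "age_group": rng.choice(["<40", "40-65", ">65"], 1000),
            "sex": rng.choice(["F", "M"], 1000),
        })

        # Standardize within each clinico-epidemiologic subgroup (a simplified SAN-like step).
        grouped = df.groupby(["age_group", "sex"])["value"]
        df["normalized"] = (df["value"] - grouped.transform("mean")) / grouped.transform("std")

        print(df.groupby(["age_group", "sex"])["normalized"].agg(["mean", "std"]).round(2))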

  2. Performance of Modified Test Statistics in Covariance and Correlation Structure Analysis under Conditions of Multivariate Nonnormality.

    ERIC Educational Resources Information Center

    Fouladi, Rachel T.

    2000-01-01

    Provides an overview of standard and modified normal theory and asymptotically distribution-free covariance and correlation structure analysis techniques and details Monte Carlo simulation results on Type I and Type II error control. Demonstrates through the simulation that robustness and nonrobustness of structure analysis techniques vary as a…

  3. Reliability formulation for the strength and fire endurance of glued-laminated beams

    Treesearch

    D. A. Bender

    A model was developed for predicting the statistical distribution of glued-laminated beam strength and stiffness under normal temperature conditions using available long span modulus of elasticity data, end joint tension test data, and tensile strength data for laminating-grade lumber. The beam strength model predictions compared favorably with test data for glued-...

  4. Aggregate and Individual Replication Probability within an Explicit Model of the Research Process

    ERIC Educational Resources Information Center

    Miller, Jeff; Schwarz, Wolf

    2011-01-01

    We study a model of the research process in which the true effect size, the replication jitter due to changes in experimental procedure, and the statistical error of effect size measurement are all normally distributed random variables. Within this model, we analyze the probability of successfully replicating an initial experimental result by…

  5. Pedagogical Simulation of Sampling Distributions and the Central Limit Theorem

    ERIC Educational Resources Information Center

    Hagtvedt, Reidar; Jones, Gregory Todd; Jones, Kari

    2007-01-01

    Students often find the fact that a sample statistic is a random variable very hard to grasp. Even more mysterious is why a sample mean should become ever more Normal as the sample size increases. This simulation tool is meant to illustrate the process, thereby giving students some intuitive grasp of the relationship between a parent population…

  6. Association of gene polymorphisms in ABO blood group chromosomal regions and menstrual disorders

    PubMed Central

    SU, YONG; KONG, GUI-LIAN; SU, YA-LI; ZHOU, YAN; LV, LI-FANG; WANG, QIONG; HUANG, BAO-PING; ZHENG, RUI-ZHI; LI, QUAN-ZHONG; YUAN, HUI-JUAN; ZHAO, ZHI-GANG

    2015-01-01

    This study aimed to investigate whether single nucleotide polymorphisms (SNPs) located near the gene of the ABO blood group play an important role in the genetic aetiology of menstrual disorders (MDs). Polymerase chain reaction-ligase detection reaction technology was used to detect eight SNPs near the ABO gene location on the chromosomes in 250 cases of MD and 250 cases of normal menstruation. The differences in the distribution of each genotype, as well as the allele frequency in the normal and control groups, were analysed using Pearson's χ2 test to search for disease-associated loci. SHEsis software was used to analyse the linkage disequilibrium and haplotype frequencies and to inspect the correlation between haplotypes and the disease. Compared with the control group, the experimental group exhibited statistically significant differences in the genotype distribution frequencies of the rs657152 locus of the ABO blood group gene and the rs17250673 locus of the tumour necrosis factor cofactor 2 (TRAF2) gene, which is located downstream of the ABO gene. The allele distribution frequencies of rs657152 and rs495828 loci in the ABO blood group gene exhibited significant differences between the groups. Dominant and recessive genetic model analysis of each locus revealed that the experimental group exhibited statistically significant differences from the control group in the genotype distribution frequencies of rs657152 and rs495828 loci, respectively. These results indicate that the ABO blood group gene and TRAF2 gene may be a cause of MDs. PMID:26136981

  7. On designing a new cumulative sum Wilcoxon signed rank chart for monitoring process location

    PubMed Central

    Nazir, Hafiz Zafar; Tahir, Muhammad; Riaz, Muhammad

    2018-01-01

    In this paper, ranked set sampling is used for developing a non-parametric location chart which is developed on the basis of Wilcoxon signed rank statistic. The average run length and some other characteristics of run length are used as the measures to assess the performance of the proposed scheme. Some selective distributions including Laplace (or double exponential), logistic, normal, contaminated normal and student’s t-distributions are considered to examine the performance of the proposed Wilcoxon signed rank control chart. It has been observed that the proposed scheme shows superior shift detection ability than some of the competing counterpart schemes covered in this study. Moreover, the proposed control chart is also implemented and illustrated with a real data set. PMID:29664919

  8. Estimation and confidence intervals for empirical mixing distributions

    USGS Publications Warehouse

    Link, W.A.; Sauer, J.R.

    1995-01-01

    Questions regarding collections of parameter estimates can frequently be expressed in terms of an empirical mixing distribution (EMD). This report discusses empirical Bayes estimation of an EMD, with emphasis on the construction of interval estimates. Estimation of the EMD is accomplished by substitution of estimates of prior parameters in the posterior mean of the EMD. This procedure is examined in a parametric model (the normal-normal mixture) and in a semi-parametric model. In both cases, the empirical Bayes bootstrap of Laird and Louis (1987, Journal of the American Statistical Association 82, 739-757) is used to assess the variability of the estimated EMD arising from the estimation of prior parameters. The proposed methods are applied to a meta-analysis of population trend estimates for groups of birds.

  9. Statistics Refresher for Molecular Imaging Technologists, Part 2: Accuracy of Interpretation, Significance, and Variance.

    PubMed

    Farrell, Mary Beth

    2018-06-01

    This article is the second part of a continuing education series reviewing basic statistics that nuclear medicine and molecular imaging technologists should understand. In this article, the statistics for evaluating interpretation accuracy, significance, and variance are discussed. Throughout the article, actual statistics are pulled from the published literature. We begin by explaining 2 methods for quantifying interpretive accuracy: interreader and intrareader reliability. Agreement among readers can be expressed simply as a percentage. However, the Cohen κ-statistic is a more robust measure of agreement that accounts for chance. The higher the κ-statistic is, the higher is the agreement between readers. When 3 or more readers are being compared, the Fleiss κ-statistic is used. Significance testing determines whether the difference between 2 conditions or interventions is meaningful. Statistical significance is usually expressed using a number called a probability ( P ) value. Calculation of P value is beyond the scope of this review. However, knowing how to interpret P values is important for understanding the scientific literature. Generally, a P value of less than 0.05 is considered significant and indicates that the results of the experiment are due to more than just chance. Variance, standard deviation (SD), confidence interval, and standard error (SE) explain the dispersion of data around a mean of a sample drawn from a population. SD is commonly reported in the literature. A small SD indicates that there is not much variation in the sample data. Many biologic measurements fall into what is referred to as a normal distribution taking the shape of a bell curve. In a normal distribution, 68% of the data will fall within 1 SD, 95% will fall within 2 SDs, and 99.7% will fall within 3 SDs. Confidence interval defines the range of possible values within which the population parameter is likely to lie and gives an idea of the precision of the statistic being measured. A wide confidence interval indicates that if the experiment were repeated multiple times on other samples, the measured statistic would lie within a wide range of possibilities. The confidence interval relies on the SE. © 2018 by the Society of Nuclear Medicine and Molecular Imaging.
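
    Two of the quantities discussed above, computed on invented numbers: Cohen's kappa from a 2 x 2 agreement table between two readers, and the empirical 68-95-99.7 coverage of a simulated normal sample:

        import numpy as np

        # Hypothetical 2x2 agreement table for two readers (rows: reader 1, cols: reader 2).
        table = np.array([[40, 5],
                          [7, 48]], dtype=float)
        n = table.sum()
        p_observed = np.trace(table) / n
        p_chance = (table.sum(axis=1) / n) @ (table.sum(axis=0) / n)
        kappa = (p_observed - p_chance) / (1 - p_chance)

        # Empirical 68-95-99.7 coverage for normally distributed measurements.
        rng = np.random.default_rng(9)
        x = rng.normal(size=100_000)
        coverage = [np.mean(np.abs(x) < k) for k in (1, 2, 3)]
        print(f"kappa = {kappa:.2f}; coverage within 1/2/3 SD = "
              + ", ".join(f"{c:.3f}" for c in coverage))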

  10. Exact infinite-time statistics of the Loschmidt echo for a quantum quench.

    PubMed

    Campos Venuti, Lorenzo; Jacobson, N Tobias; Santra, Siddhartha; Zanardi, Paolo

    2011-07-01

    The equilibration dynamics of a closed quantum system is encoded in the long-time distribution function of generic observables. In this Letter we consider the Loschmidt echo generalized to finite temperature, and show that we can obtain an exact expression for its long-time distribution for a closed system described by a quantum XY chain following a sudden quench. In the thermodynamic limit the logarithm of the Loschmidt echo becomes normally distributed, whereas for small quenches in the opposite, quasicritical regime, the distribution function acquires a universal double-peaked form indicating poor equilibration. These findings, obtained by a central limit theorem-type result, extend to completely general models in the small-quench regime.

  11. Evidence for criticality in financial data

    NASA Astrophysics Data System (ADS)

    Ruiz, G.; de Marcos, A. F.

    2018-01-01

    We provide evidence that cumulative distributions of absolute normalized returns for the 100 American companies with the highest market capitalization, uncover a critical behavior for different time scales Δt. Such cumulative distributions, in accordance with a variety of complex - and financial - systems, can be modeled by the cumulative distribution functions of q-Gaussians, the distribution function that, in the context of nonextensive statistical mechanics, maximizes a non-Boltzmannian entropy. These q-Gaussians are characterized by two parameters, namely ( q, β), that are uniquely defined by Δt. From these dependencies, we find a monotonic relationship between q and β, which can be seen as evidence of criticality. We numerically determine the various exponents which characterize this criticality.

  12. Robust multivariate nonparametric tests for detection of two-sample location shift in clinical trials

    PubMed Central

    Jiang, Xuejun; Guo, Xu; Zhang, Ning; Wang, Bo

    2018-01-01

    This article presents and investigates performance of a series of robust multivariate nonparametric tests for detection of location shift between two multivariate samples in randomized controlled trials. The tests are built upon robust estimators of distribution locations (medians, Hodges-Lehmann estimators, and an extended U statistic) with both unscaled and scaled versions. The nonparametric tests are robust to outliers and do not assume that the two samples are drawn from multivariate normal distributions. Bootstrap and permutation approaches are introduced for determining the p-values of the proposed test statistics. Simulation studies are conducted and numerical results are reported to examine performance of the proposed statistical tests. The numerical results demonstrate that the robust multivariate nonparametric tests constructed from the Hodges-Lehmann estimators are more efficient than those based on medians and the extended U statistic. The permutation approach can provide a more stringent control of Type I error and is generally more powerful than the bootstrap procedure. The proposed robust nonparametric tests are applied to detect multivariate distributional difference between the intervention and control groups in the Thai Healthy Choices study and examine the intervention effect of a four-session motivational interviewing-based intervention developed in the study to reduce risk behaviors among youth living with HIV. PMID:29672555
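
    A generic permutation-test sketch in the spirit of the article (not its exact statistics): the test statistic is the norm of the vector of component-wise median differences between the two samples, and its null distribution is obtained by permuting group labels:

        import numpy as np

        rng = np.random.default_rng(10)
        x = rng.normal(0.0, 1.0, size=(50, 3))           # hypothetical control group
        y = rng.normal(0.4, 1.0, size=(50, 3))           # hypothetical intervention group

        def median_shift_stat(a, b):
            return np.linalg.norm(np.median(a, axis=0) - np.median(b, axis=0))

        observed = median_shift_stat(x, y)
        pooled = np.vstack([x, y])
        n_x = len(x)

        n_perm = 2000
        count = 0
        for _ in range(n_perm):
            idx = rng.permutation(len(pooled))
            if median_shift_stat(pooled[idx[:n_x]], pooled[idx[n_x:]]) >= observed:
                count += 1
        p_value = (count + 1) / (n_perm + 1)
        print(f"observed statistic = {observed:.3f}, permutation p = {p_value:.4f}")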

  13. Determining prescription durations based on the parametric waiting time distribution.

    PubMed

    Støvring, Henrik; Pottegård, Anton; Hallas, Jesper

    2016-12-01

    The purpose of the study is to develop a method to estimate the duration of single prescriptions in pharmacoepidemiological studies when the single prescription duration is not available. We developed an estimation algorithm based on maximum likelihood estimation of a parametric two-component mixture model for the waiting time distribution (WTD). The distribution component for prevalent users estimates the forward recurrence density (FRD), which is related to the distribution of time between subsequent prescription redemptions, the inter-arrival density (IAD), for users in continued treatment. We exploited this to estimate percentiles of the IAD by inversion of the estimated FRD and defined the duration of a prescription as the time within which 80% of current users will have presented themselves again. Statistical properties were examined in simulation studies, and the method was applied to empirical data for four model drugs: non-steroidal anti-inflammatory drugs (NSAIDs), warfarin, bendroflumethiazide, and levothyroxine. Simulation studies found negligible bias when the data-generating model for the IAD coincided with the FRD used in the WTD estimation (Log-Normal). When the IAD consisted of a mixture of two Log-Normal distributions, but was analyzed with a single Log-Normal distribution, relative bias did not exceed 9%. Using a Log-Normal FRD, we estimated prescription durations of 117, 91, 137, and 118 days for NSAIDs, warfarin, bendroflumethiazide, and levothyroxine, respectively. Similar results were found with a Weibull FRD. The algorithm allows valid estimation of single prescription durations, especially when the WTD reliably separates current users from incident users, and may replace ad-hoc decision rules in automated implementations. Copyright © 2016 John Wiley & Sons, Ltd.
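
    A sketch of the final step only, assuming the inter-arrival density has already been estimated as log-normal with illustrative (not fitted) parameters: the prescription duration is taken as the 80th percentile, the time within which 80% of current users would redeem again:

        from math import exp
        from scipy import stats

        # Assumed log-normal inter-arrival density of refill gaps (days); parameters illustrative.
        mu, sigma = 4.3, 0.5                       # log-scale mean and SD
        iad = stats.lognorm(s=sigma, scale=exp(mu))

        duration = iad.ppf(0.80)                   # time within which 80% of current users return
        print(f"estimated prescription duration = {duration:.0f} days")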

  14. Technical note: An improved approach to determining background aerosol concentrations with PILS sampling on aircraft

    NASA Astrophysics Data System (ADS)

    Fukami, Christine S.; Sullivan, Amy P.; Ryan Fulgham, S.; Murschell, Trey; Borch, Thomas; Smith, James N.; Farmer, Delphine K.

    2016-07-01

    Particle-into-Liquid Samplers (PILS) have become a standard aerosol collection technique, and are widely used in both ground and aircraft measurements in conjunction with off-line ion chromatography (IC) measurements. Accurate and precise background samples are essential to account for gas-phase components not efficiently removed and any interference in the instrument lines, collection vials or off-line analysis procedures. For aircraft sampling with PILS, backgrounds are typically taken with in-line filters to remove particles prior to sample collection once or twice per flight with more numerous backgrounds taken on the ground. Here, we use data collected during the Front Range Air Pollution and Photochemistry Éxperiment (FRAPPÉ) to demonstrate that not only are multiple background filter samples essential to attain a representative background, but that the chemical background signals do not follow the Gaussian statistics typically assumed. Instead, the background signals for all chemical components analyzed from 137 background samples (taken from ∼78 total sampling hours over 18 flights) follow a log-normal distribution, meaning that the typical approaches of averaging background samples and/or assuming a Gaussian distribution cause an over-estimation of background samples - and thus an underestimation of sample concentrations. Our approach of deriving backgrounds from the peak of the log-normal distribution results in detection limits of 0.25, 0.32, 3.9, 0.17, 0.75 and 0.57 μg m-3 for sub-micron aerosol nitrate (NO3-), nitrite (NO2-), ammonium (NH4+), sulfate (SO42-), potassium (K+) and calcium (Ca2+), respectively. The difference in backgrounds calculated from assuming a Gaussian distribution versus a log-normal distribution was most extreme for NH4+, resulting in a background that was 1.58× that determined from fitting a log-normal distribution.
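
    A numerical illustration of the central point, on simulated (not FRAPPÉ) background signals: when the signals are log-normal, the arithmetic mean implied by a Gaussian assumption sits above the distribution peak (mode), so mean-based backgrounds run high:

        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(11)
        # Simulated background filter signals (arbitrary units), log-normally distributed.
        background = rng.lognormal(mean=0.0, sigma=0.8, size=137)

        shape, loc, scale = stats.lognorm.fit(background, floc=0)
        mode = scale * np.exp(-shape ** 2)     # peak of a log-normal with loc = 0
        gaussian_mean = background.mean()

        print(f"log-normal peak (mode) = {mode:.2f}, Gaussian-style mean = {gaussian_mean:.2f}")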

  15. Assessing differential expression in two-color microarrays: a resampling-based empirical Bayes approach.

    PubMed

    Li, Dongmei; Le Pape, Marc A; Parikh, Nisha I; Chen, Will X; Dye, Timothy D

    2013-01-01

    Microarrays are widely used for examining differential gene expression, identifying single nucleotide polymorphisms, and detecting methylation loci. Multiple testing methods in microarray data analysis aim at controlling both Type I and Type II error rates; however, real microarray data do not always fit their distribution assumptions. Smyth's ubiquitous parametric method, for example, inadequately accommodates violations of normality assumptions, resulting in inflated Type I error rates. The Significance Analysis of Microarrays, another widely used microarray data analysis method, is based on a permutation test and is robust to non-normally distributed data; however, the Significance Analysis of Microarrays method fold change criteria are problematic, and can critically alter the conclusion of a study, as a result of compositional changes of the control data set in the analysis. We propose a novel approach, combining resampling with empirical Bayes methods: the Resampling-based empirical Bayes Methods. This approach not only reduces false discovery rates for non-normally distributed microarray data, but it is also impervious to fold change threshold since no control data set selection is needed. Through simulation studies, sensitivities, specificities, total rejections, and false discovery rates are compared across the Smyth's parametric method, the Significance Analysis of Microarrays, and the Resampling-based empirical Bayes Methods. Differences in false discovery rates controls between each approach are illustrated through a preterm delivery methylation study. The results show that the Resampling-based empirical Bayes Methods offer significantly higher specificity and lower false discovery rates compared to Smyth's parametric method when data are not normally distributed. The Resampling-based empirical Bayes Methods also offers higher statistical power than the Significance Analysis of Microarrays method when the proportion of significantly differentially expressed genes is large for both normally and non-normally distributed data. Finally, the Resampling-based empirical Bayes Methods are generalizable to next generation sequencing RNA-seq data analysis.

  16. Incorporating big data into treatment plan evaluation: Development of statistical DVH metrics and visualization dashboards.

    PubMed

    Mayo, Charles S; Yao, John; Eisbruch, Avraham; Balter, James M; Litzenberg, Dale W; Matuszak, Martha M; Kessler, Marc L; Weyburn, Grant; Anderson, Carlos J; Owen, Dawn; Jackson, William C; Haken, Randall Ten

    2017-01-01

    To develop statistical dose-volume histogram (DVH)-based metrics and a visualization method to quantify the comparison of treatment plans with historical experience and among different institutions. The descriptive statistical summary (ie, median, first and third quartiles, and 95% confidence intervals) of volume-normalized DVH curve sets of past experiences was visualized through the creation of statistical DVH plots. Detailed distribution parameters were calculated and stored in JavaScript Object Notation files to facilitate management, including transfer and potential multi-institutional comparisons. In the treatment plan evaluation, structure DVH curves were scored against computed statistical DVHs and weighted experience scores (WESs). Individual, clinically used, DVH-based metrics were integrated into a generalized evaluation metric (GEM) as a priority-weighted sum of normalized incomplete gamma functions. Historical treatment plans for 351 patients with head and neck cancer, 104 with prostate cancer who were treated with conventional fractionation, and 94 with liver cancer who were treated with stereotactic body radiation therapy were analyzed to demonstrate the usage of statistical DVH, WES, and GEM in a plan evaluation. A shareable dashboard plugin was created to display statistical DVHs and integrate GEM and WES scores into a clinical plan evaluation within the treatment planning system. Benchmarking with normal tissue complication probability scores was carried out to compare the behavior of GEM and WES scores. DVH curves from historical treatment plans were characterized and presented, with difficult-to-spare structures (ie, frequently compromised organs at risk) identified. Quantitative evaluations by GEM and/or WES compared favorably with the normal tissue complication probability Lyman-Kutcher-Burman model, transforming a set of discrete threshold-priority limits into a continuous model reflecting physician objectives and historical experience. Statistical DVH offers an easy-to-read, detailed, and comprehensive way to visualize the quantitative comparison with historical experiences and among institutions. WES and GEM metrics offer a flexible means of incorporating discrete threshold-prioritizations and historic context into a set of standardized scoring metrics. Together, they provide a practical approach for incorporating big data into clinical practice for treatment plan evaluations.

  17. Using the Bootstrap Method to Evaluate the Critical Range of Misfit for Polytomous Rasch Fit Statistics.

    PubMed

    Seol, Hyunsoo

    2016-06-01

    The purpose of this study was to apply the bootstrap procedure to evaluate how the bootstrapped confidence intervals (CIs) for polytomous Rasch fit statistics might differ according to sample sizes and test lengths in comparison with the rule-of-thumb critical value of misfit. A total of 25 simulated data sets were generated to fit the Rasch measurement model, and then a total of 1,000 replications were conducted to compute the bootstrapped CIs under each of 25 testing conditions. The results showed that rule-of-thumb critical values for assessing the magnitude of misfit were not applicable because the infit and outfit mean square error statistics showed different magnitudes of variability over testing conditions and the standardized fit statistics did not exactly follow the standard normal distribution. Further, they also did not share the same critical range for item and person misfit. Based on the results of the study, the bootstrapped CIs can be used to identify misfitting items or persons, as they offer a reasonable alternative solution, especially when the distributions of the infit and outfit statistics are not well known and depend on sample size. © The Author(s) 2016.
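
    The bootstrap idea in the abstract can be illustrated with a minimal percentile-interval sketch; the statistic here is an arbitrary placeholder (a mean square of residuals) rather than a genuine polytomous Rasch infit or outfit statistic, which would require re-estimating the Rasch model on each replicate.

```python
import numpy as np

def bootstrap_ci(sample, statistic, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for an arbitrary statistic.

    In the study the statistic would be a polytomous Rasch infit or outfit
    mean square computed from re-simulated response matrices; here any
    callable on a 1-D sample works.
    """
    rng = np.random.default_rng(seed)
    sample = np.asarray(sample)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(sample, size=sample.size, replace=True)
        boot[b] = statistic(resample)
    lo, hi = np.percentile(boot, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return lo, hi

# Toy usage: CI for a mean-square-like statistic on simulated residuals.
rng = np.random.default_rng(1)
residuals = rng.standard_normal(500)
print(bootstrap_ci(residuals, lambda x: np.mean(x**2)))
```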

  18. Studies of the Ability to Hold the Eye in Eccentric Gaze: Measurements in Normal Subjects with the Head Erect

    NASA Technical Reports Server (NTRS)

    Reschke, Millard F.; Somers, Jeffrey T.; Feiveson, Alan H.; Leigh, R. John; Wood, Scott J.; Paloski, William H.; Kornilova, Ludmila

    2006-01-01

    We studied the ability to hold the eyes in eccentric horizontal or vertical gaze angles in 68 normal humans, age range 19-56. Subjects attempted to sustain visual fixation of a briefly flashed target located 30° in the horizontal plane and 15° in the vertical plane in a dark environment. Conventionally, the ability to hold eccentric gaze is estimated by fitting centripetal eye drifts with exponential curves and calculating the time constant (t_c) of these slow phases of gaze-evoked nystagmus. Although the distribution of time-constant measurements (t_c) in our normal subjects was extremely skewed due to occasional test runs that exhibited near-perfect stability (large t_c values), we found that log10(t_c) was approximately normally distributed within classes of target direction. Therefore, statistical estimation and inference on the effect of target direction was performed on values of z ≡ log10(t_c). Subjects showed considerable variation in their eye-drift performance over repeated trials; nonetheless, statistically significant differences emerged: values of t_c were significantly higher for gaze elicited to targets in the horizontal plane than for the vertical plane (P < 10^-5), suggesting that eccentric gaze-holding is more stable in the horizontal than in the vertical plane. Furthermore, centrifugal eye drifts were observed in 13.3, 16.0 and 55.6% of cases for horizontal, upgaze and downgaze tests, respectively. Fifth percentile values of the time constant were estimated to be 10.2 sec, 3.3 sec and 3.8 sec for horizontal, upward and downward gaze, respectively. The difference between horizontal and vertical gaze-holding may be ascribed to separate components of the velocity-to-position neural integrator for eye movements, and to differences in orbital mechanics. Our statistical method for representing the range of normal eccentric gaze stability can be readily applied in a clinical setting to patients who were exposed to environments that may have modified their central integrators and thus require monitoring. Patients with gaze-evoked nystagmus can be flagged by comparing them to the above established normative criteria.

  19. L-statistics for Repeated Measurements Data With Application to Trimmed Means, Quantiles and Tolerance Intervals.

    PubMed

    Assaad, Houssein I; Choudhary, Pankaj K

    2013-01-01

    The L-statistics form an important class of estimators in nonparametric statistics. Its members include trimmed means and sample quantiles and functions thereof. This article is devoted to theory and applications of L-statistics for repeated measurements data, wherein the measurements on the same subject are dependent and the measurements from different subjects are independent. This article has three main goals: (a) Show that the L-statistics are asymptotically normal for repeated measurements data. (b) Present three statistical applications of this result, namely, location estimation using trimmed means, quantile estimation and construction of tolerance intervals. (c) Obtain a Bahadur representation for sample quantiles. These results are generalizations of similar results for independently and identically distributed data. The practical usefulness of these results is illustrated by analyzing a real data set involving measurement of systolic blood pressure. The properties of the proposed point and interval estimators are examined via simulation.
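
    As a small illustration of the L-statistics discussed above, the sketch below computes a trimmed mean and sample quartiles from hypothetical repeated blood-pressure measurements; the article's asymptotic variance results, which account for within-subject dependence, are not reproduced here.

```python
import numpy as np
from scipy import stats

# Hypothetical repeated systolic blood pressure readings (mmHg):
# rows are subjects, columns are repeated measurements on the same subject.
rng = np.random.default_rng(7)
subject_means = rng.normal(125, 12, size=40)                 # between-subject variation
data = subject_means[:, None] + rng.normal(0, 6, (40, 5))    # within-subject noise

pooled = data.ravel()

trimmed = stats.trim_mean(pooled, proportiontocut=0.10)      # 10% trimmed mean
q25, q50, q75 = np.quantile(pooled, [0.25, 0.50, 0.75])

print(f"10% trimmed mean: {trimmed:.1f} mmHg")
print(f"quartiles: {q25:.1f}, {q50:.1f}, {q75:.1f} mmHg")
```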

  20. NDVI statistical distribution of pasture areas at different times in the Community of Madrid (Spain)

    NASA Astrophysics Data System (ADS)

    Martín-Sotoca, Juan J.; Saa-Requejo, Antonio; Díaz-Ambrona, Carlos G. H.; Tarquis, Ana M.

    2015-04-01

    The severity of drought has many implications for society, including its impacts on the water supply, water pollution, reservoir management and ecosystems. However, its impacts on rain-fed agriculture are especially direct. Because of the importance of drought, there have been many attempts to characterize its severity, resulting in the numerous drought indices that have been developed (Niemeyer 2008). A 'biomass index' based on the satellite-image-derived Normalized Difference Vegetation Index (NDVI) has been used for pasture and forage crops for some years in countries such as the United States of America, Canada and Spain (Rao, 2010). This type of agricultural insurance is named 'index-based insurance' (IBI). IBI is perceived to be substantially less costly to operate and manage than multiple peril insurance. IBI contracts pay indemnities based not on the actual yield (or revenue) losses experienced by the insurance purchaser but rather on realized NDVI values (historical data) that are correlated with farm-level losses (Xiaohui Deng et al., 2008). The definition of when a drought event occurs is based on NDVI threshold values derived mainly from statistical parameters, the average and standard deviation, that characterize a normal distribution. In this work a pasture area in the north of the Community of Madrid (Spain) has been delimited. Then, NDVI historical data were reconstructed from MODIS remote sensing imagery, with 500 x 500 m2 resolution. A statistical analysis of the NDVI histograms of that area at 46 consecutive intervals was applied to search for the best statistical distribution based on the maximum likelihood criterion. The results show that the normal distribution is not the optimal representation when IBI is available; the implications in the context of crop insurance are discussed (Martín-Sotoca, 2014). References Kolli N Rao. 2010. Index based Crop Insurance. Agriculture and Agricultural Science Procedia 1, 193-203. Martín-Sotoca, J.J. (2014) Estructura Espacial de la Sequía en Pastos y sus Aplicaciones en el Seguro Agrario. Master Thesis, UPM (In Spanish). Niemeyer, S., 2008: New drought indices. First Int. Conf. on Drought Management: Scientific and Technological Innovations, Zaragoza, Spain, Joint Research Centre of the European Commission. [Available online at http://www.iamz.ciheam.org/medroplan/zaragoza2008/Sequia2008/Session3/S.Niemeyer.pdf.] Xiaohui Deng, Barry J. Barnett, Gerrit Hoogenboom, Yingzhuo Yu and Axel Garcia y Garcia 2008. Alternative Crop Insurance Indexes. Journal of Agricultural and Applied Economics, 40(1), 223-237. Acknowledgements: The first author acknowledges the Research Grant obtained from CEIGRAM in 2014.

  1. A positive correlation between energetic electron butterfly distributions and magnetosonic waves in the radiation belt slot region

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Yang, Chang; Su, Z.; Xiao, F.

    Energetic (hundreds of keV) electrons in the radiation belt slot region have been found to exhibit butterfly pitch angle distributions. Resonant interactions with magnetosonic and whistler-mode waves are two potential mechanisms for the formation of these peculiar distributions. Here we perform a statistical study of energetic electron pitch angle distribution characteristics measured by Van Allen Probes in the slot region during a three-year period from May 2013 to May 2016. Our results show that electron butterfly distributions are closely related to magnetosonic waves rather than to whistler-mode waves. Both electron butterfly distributions and magnetosonic waves occur more frequently at geomagnetically active times than at quiet times. In a statistical sense, more distinct butterfly distributions usually correspond to magnetosonic waves with larger amplitudes and vice versa. The averaged magnetosonic wave amplitude is less than 5 pT in the case of normal and flat-top distributions with a butterfly index BI = 1 but reaches ~35-95 pT in the case of distinct butterfly distributions with BI > 1.3. For magnetosonic waves with amplitudes > 50 pT, the occurrence rate of butterfly distributions is above 80%. Our study suggests that energetic electron butterfly distributions in the slot region are primarily caused by magnetosonic waves.

  2. A positive correlation between energetic electron butterfly distributions and magnetosonic waves in the radiation belt slot region

    DOE PAGES

    Yang, Chang; Su, Z.; Xiao, F.; ...

    2017-05-14

    Energetic (hundreds of keV) electrons in the radiation belt slot region have been found to exhibit butterfly pitch angle distributions. Resonant interactions with magnetosonic and whistler-mode waves are two potential mechanisms for the formation of these peculiar distributions. Here we perform a statistical study of energetic electron pitch angle distribution characteristics measured by Van Allen Probes in the slot region during a three-year period from May 2013 to May 2016. Our results show that electron butterfly distributions are closely related to magnetosonic waves rather than to whistler-mode waves. Both electron butterfly distributions and magnetosonic waves occur more frequently at geomagnetically active times than at quiet times. In a statistical sense, more distinct butterfly distributions usually correspond to magnetosonic waves with larger amplitudes and vice versa. The averaged magnetosonic wave amplitude is less than 5 pT in the case of normal and flat-top distributions with a butterfly index BI = 1 but reaches ~35-95 pT in the case of distinct butterfly distributions with BI > 1.3. For magnetosonic waves with amplitudes > 50 pT, the occurrence rate of butterfly distributions is above 80%. Our study suggests that energetic electron butterfly distributions in the slot region are primarily caused by magnetosonic waves.

  3. A modified weighted function method for parameter estimation of Pearson type three distribution

    NASA Astrophysics Data System (ADS)

    Liang, Zhongmin; Hu, Yiming; Li, Binquan; Yu, Zhongbo

    2014-04-01

    In this paper, an unconventional method called the Modified Weighted Function (MWF) is presented for the conventional moment estimation of a probability distribution function. The aim of MWF is to shift the estimation of the coefficient of variation (CV) and coefficient of skewness (CS) from the original higher-order moment computations to first-order moment calculations. The estimators for CV and CS of the Pearson type three distribution function (PE3) were derived by weighting the moments of the distribution with two weight functions, which were constructed by combining two negative exponential-type functions. The selection of these weight functions was based on two considerations: (1) to relate the weight functions to sample size in order to reflect the relationship between the quantity of sample information and the role of the weight function and (2) to allocate more weight to data close to medium-tail positions in a sample series ranked in ascending order. A Monte-Carlo experiment was conducted to simulate a large number of samples upon which the statistical properties of MWF were investigated. For the PE3 parent distribution, results of MWF were compared to those of the original Weighted Function (WF) and Linear Moments (L-M). The results indicate that MWF was superior to WF and slightly better than L-M in terms of statistical unbiasedness and effectiveness. In addition, the robustness of MWF, WF, and L-M was compared by designing a Monte-Carlo experiment in which samples are drawn from the Log-Pearson type three distribution (LPE3), the three-parameter Log-Normal distribution (LN3), and the Generalized Extreme Value distribution (GEV), respectively, but all are treated as samples from the PE3 distribution. The results show that in terms of statistical unbiasedness, no single method possesses an overwhelming advantage among MWF, WF, and L-M, while in terms of statistical effectiveness, MWF is superior to WF and L-M.

  4. Statistics analysis of distribution of Bradysia Ocellaris insect on Oyster mushroom cultivation

    NASA Astrophysics Data System (ADS)

    Sari, Kurnia Novita; Amelia, Ririn

    2015-12-01

    The Bradysia ocellaris insect is a pest in Oyster mushroom cultivation. The distribution of Bradysia ocellaris has a special pattern that can be observed every week under several assumptions, such as independence, normality and homogeneity. We can analyze the number of Bradysia ocellaris for each week through descriptive analysis. Next, the distribution pattern of Bradysia ocellaris is described by the semivariogram, that is, a diagram of the variance of the differences between pairs of observations separated by a distance d. The semivariogram model that best suits the Bradysia ocellaris data is the isotropic spherical model.
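
    The semivariogram and the spherical model mentioned above can be sketched in a few lines; the plot coordinates, counts and model parameters below are hypothetical stand-ins, not the study's data.

```python
import numpy as np

def empirical_semivariogram(coords, values, bins):
    """Classical (Matheron) estimator: gamma(h) is the mean of
    0.5 * (z_i - z_j)^2 over pairs separated by a distance near h."""
    coords, values = np.asarray(coords, float), np.asarray(values, float)
    n = len(values)
    d, sq = [], []
    for i in range(n):
        for j in range(i + 1, n):
            d.append(np.linalg.norm(coords[i] - coords[j]))
            sq.append(0.5 * (values[i] - values[j]) ** 2)
    d, sq = np.array(d), np.array(sq)
    idx = np.digitize(d, bins)
    centers = 0.5 * (bins[:-1] + bins[1:])
    gamma = [sq[idx == k].mean() if np.any(idx == k) else np.nan
             for k in range(1, len(bins))]
    return centers, np.array(gamma)

def spherical(h, nugget, sill, range_):
    """Isotropic spherical semivariogram model."""
    h = np.asarray(h, float)
    g = nugget + (sill - nugget) * (1.5 * h / range_ - 0.5 * (h / range_) ** 3)
    return np.where(h < range_, g, sill)

# Hypothetical weekly insect counts on a grid of mushroom-cultivation plots.
rng = np.random.default_rng(3)
coords = np.array([(x, y) for x in range(8) for y in range(8)], float)
counts = 10 + 2 * coords[:, 0] + rng.poisson(4, len(coords))

h, gamma = empirical_semivariogram(coords, counts, bins=np.linspace(0, 8, 9))
print(np.round(gamma, 2))
print(np.round(spherical(h, nugget=2.0, sill=8.0, range_=5.0), 2))
```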

  5. Bayesian analysis of the kinetics of quantal transmitter secretion at the neuromuscular junction.

    PubMed

    Saveliev, Anatoly; Khuzakhmetova, Venera; Samigullin, Dmitry; Skorinkin, Andrey; Kovyazina, Irina; Nikolsky, Eugeny; Bukharaeva, Ellya

    2015-10-01

    The timing of transmitter release from nerve endings is considered nowadays as one of the factors determining the plasticity and efficacy of synaptic transmission. In the neuromuscular junction, the moments of release of individual acetylcholine quanta are related to the synaptic delays of uniquantal endplate currents recorded under conditions of lowered extracellular calcium. Using Bayesian modelling, we performed a statistical analysis of synaptic delays in mouse neuromuscular junction with different patterns of rhythmic nerve stimulation and when the entry of calcium ions into the nerve terminal was modified. We have obtained a statistical model of the release timing which is represented as the summation of two independent statistical distributions. The first of these is the exponentially modified Gaussian distribution. The mixture of normal and exponential components in this distribution can be interpreted as a two-stage mechanism of early and late periods of phasic synchronous secretion. The parameters of this distribution depend on both the stimulation frequency of the motor nerve and the calcium ions' entry conditions. The second distribution was modelled as quasi-uniform, with parameters independent of nerve stimulation frequency and calcium entry. Two different probability density functions for the distribution of synaptic delays suggest at least two independent processes controlling the time course of secretion, one of them potentially involving two stages. The relative contribution of these processes to the total number of mediator quanta released depends differently on the motor nerve stimulation pattern and on calcium ion entry into nerve endings.
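
    The "normal plus exponential" component described above is the exponentially modified Gaussian, available in scipy as exponnorm; the sketch below fits it to simulated delays by maximum likelihood (the paper's full Bayesian mixture with a quasi-uniform component is not reproduced).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical synaptic delays (ms): a Gaussian early stage plus an
# exponential late tail, i.e. an exponentially modified Gaussian (EMG).
delays = rng.normal(0.8, 0.15, 2000) + rng.exponential(0.5, 2000)

# Fit the EMG by maximum likelihood.  scipy's parameterization:
# K = 1 / (sigma * lambda), loc = mu, scale = sigma.
K, mu, sigma = stats.exponnorm.fit(delays)
lam = 1.0 / (K * sigma)

print(f"mu = {mu:.3f} ms, sigma = {sigma:.3f} ms, 1/lambda = {1 / lam:.3f} ms")
# The Gaussian part (mu, sigma) describes the early synchronous phase and
# the exponential part (1/lambda) the slower late phase of phasic release.
```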

  6. Rocks: A Concrete Activity That Introduces Normal Distribution, Sampling Error, Central Limit Theorem and True Score Theory

    ERIC Educational Resources Information Center

    Van Duzer, Eric

    2011-01-01

    This report introduces a short, hands-on activity that addresses a key challenge in teaching quantitative methods to students who lack confidence or experience with statistical analysis. Used near the beginning of the course, this activity helps students develop an intuitive insight regarding a number of abstract concepts which are key to…

  7. Strange Data: When the Numbers Just Aren't Normal.

    PubMed

    Jupiter, Daniel C

    2015-01-01

    Many statistical tests assume that the populations from which we draw our data samples roughly follow a given probability distribution. Here, I review what these assumptions mean, why they are important, and how to deal with situations where the assumptions are not met. Copyright © 2015 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.

  8. Non-operative management (NOM) of blunt hepatic trauma: 80 cases.

    PubMed

    Özoğul, Bünyami; Kısaoğlu, Abdullah; Aydınlı, Bülent; Öztürk, Gürkan; Bayramoğlu, Atıf; Sarıtemur, Murat; Aköz, Ayhan; Bulut, Özgür Hakan; Atamanalp, Sabri Selçuk

    2014-03-01

    The liver is the most frequently injured organ in abdominal trauma. We present a group of patients with blunt hepatic trauma who were managed without any invasive diagnostic tools and/or surgical intervention. A total of 80 patients with blunt liver injury who were hospitalized in the general surgery clinic or in other clinics due to concomitant injuries were followed non-operatively. Normally distributed numeric variables were evaluated by Student's t-test or one-way analysis of variance, while non-normally distributed variables were analyzed by the Mann-Whitney U-test or Kruskal-Wallis variance analysis. The chi-square test was also employed for the comparison of categorical variables. Statistical significance was assumed for p<0.05. There was no significant relationship between patients' Hgb level and liver injury grade, outcome, or mechanism of injury. Likewise, there was no statistical relationship between ALT or AST levels and liver injury grade, outcome, or mechanism of injury. There was no mortality in any of the patients. During the last quarter century, changes in the diagnosis and treatment of liver injury have been associated with increased survival. NOM of liver injury in patients with stable hemodynamics and hepatic trauma seems to be the gold standard.

  9. Slice simulation from a model of the parenchymous vascularization to evaluate texture features: work in progress.

    PubMed

    Rolland, Y; Bézy-Wendling, J; Duvauferrier, R; Coatrieux, J L

    1999-03-01

    To demonstrate the usefulness of a model of the parenchymous vascularization for evaluating texture analysis methods. Slices with thickness varying from 1 to 4 mm were reformatted from a 3D vascular model corresponding to either normal tissue perfusion or local hypervascularization. Parameters of statistical methods were measured on 16 regions of interest of 128x128 pixels, and mean values and standard deviations were calculated. For each parameter, the performance (discrimination power and stability) was evaluated. Among 11 calculated statistical parameters, three (homogeneity, entropy, mean of gradients) were found to have a good discriminating power to differentiate normal perfusion from hypervascularization, but only the gradient mean was found to have good stability with respect to slice thickness. Five parameters (run percentage, run length distribution, long run emphasis, contrast, and gray level distribution) gave intermediate results. Of the remaining three, kurtosis and correlation were found to have little discriminating power, and skewness none. This 3D vascular model, which allows the generation of various examples of vascular textures, is a powerful tool to assess the performance of texture analysis methods. This improves our knowledge of the methods and should contribute to their a priori choice when designing clinical studies.

  10. Influence of weight and body fat distribution on bone density in postmenopausal women.

    PubMed

    Murillo-Uribe, A; Carranza-Lira, S; Martínez-Trejo, N; Santos-González, J

    2000-01-01

    To determine whether obesity or body fat distribution induces a greater modification of bone remodeling biochemistry (BRB) and bone density in postmenopausal women. One hundred and thirteen postmenopausal patients were studied. They were initially divided according to body mass index (BMI), and afterwards by waist-hip ratio (WHR), as well as by combinations of the two factors. Hormone measurements and assessments of BRB were also done. Dual-energy X-ray absorptiometry of the lumbar column and hip was performed with Lunar DPXL equipment, and the standard deviation in relation to young adults (T) and age-matched subjects (Z) was calculated. Statistical analysis was done by the Mann-Whitney U test. The relation of BMI and WHR with the variables was calculated by simple regression analysis. When divided according to BMI, there was greater bone density in the femoral neck in those with normal weight. After dividing according to WHR, the Z scores showed a trend toward a lesser decrease in those with upper-level body fat distribution. Divided according to BMI and WHR, obese patients with upper-level body fat distribution had greater bone density in the lumbar column than those with normal weight and lower-level body fat distribution. With the same WHR, those with normal weight had greater bone density than those who were obese. A beneficial effect of upper-level body fat distribution on bone density was found. It is greater than that from obesity alone, and obesity and upper-level body fat distribution have an additive effect on bone density.

  11. A model for the statistical description of analytical errors occurring in clinical chemical laboratories with time.

    PubMed

    Hyvärinen, A

    1985-01-01

    The main purpose of the present study was to describe the statistical behaviour of daily analytical errors in the dimensions of place and time, providing a statistical basis for realistic estimates of the analytical error, and hence allowing the importance of the error and the relative contributions of its different sources to be re-evaluated. The observation material consists of creatinine and glucose results for control sera measured in daily routine quality control in five laboratories for a period of one year. The observation data were processed and computed by means of an automated data processing system. Graphic representations of time series of daily observations, as well as their means and dispersion limits when grouped over various time intervals, were investigated. For partition of the total variation several two-way analyses of variance were done with laboratory and various time classifications as factors. Pooled sets of observations were tested for normality of distribution and for consistency of variances, and the distribution characteristics of error variation in different categories of place and time were compared. Errors were found from the time series to vary typically between days. Due to irregular fluctuations in general and particular seasonal effects in creatinine, stable estimates of means or of dispersions for errors in individual laboratories could not be easily obtained over short periods of time but only from data sets pooled over long intervals (preferably at least one year). Pooled estimates of proportions of intralaboratory variation were relatively low (less than 33%) when the variation was pooled within days. However, when the variation was pooled over longer intervals this proportion increased considerably, even to a maximum of 89-98% (95-98% in each method category) when an outlying laboratory in glucose was omitted, with a concomitant decrease in the interaction component (representing laboratory-dependent variation with time). This indicates that a substantial part of the variation comes from intralaboratory variation with time rather than from constant interlaboratory differences. Normality and consistency of statistical distributions were best achieved in the long-term intralaboratory sets of the data, under which conditions the statistical estimates of error variability were also most characteristic of the individual laboratories rather than necessarily being similar to one another. Mixing of data from different laboratories may give heterogeneous and nonparametric distributions and hence is not advisable.(ABSTRACT TRUNCATED AT 400 WORDS)

  12. Estimating the extreme low-temperature event using nonparametric methods

    NASA Astrophysics Data System (ADS)

    D'Silva, Anisha

    This thesis presents a new method of estimating the one-in-N low temperature threshold using a non-parametric statistical method called kernel density estimation applied to daily average wind-adjusted temperatures. We apply our One-in-N Algorithm to local gas distribution companies (LDCs), as they have to forecast the daily natural gas needs of their consumers. In winter, demand for natural gas is high. Extreme low temperature events are not directly related to an LDC's gas demand forecasting, but knowledge of extreme low temperatures is important to ensure that an LDC has enough capacity to meet customer demands when extreme low temperatures are experienced. We present a detailed explanation of our One-in-N Algorithm and compare it to the methods using the generalized extreme value distribution, the normal distribution, and the variance-weighted composite distribution. We show that our One-in-N Algorithm estimates the one-in-N low temperature threshold more accurately than the methods using the generalized extreme value distribution, the normal distribution, and the variance-weighted composite distribution according to the root mean square error (RMSE) measure at a 5% level of significance. The One-in-N Algorithm is tested by counting the number of times the daily average wind-adjusted temperature is less than or equal to the one-in-N low temperature threshold.
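
    A minimal sketch of the kernel-density step is shown below: a Gaussian KDE is fitted to (hypothetical) daily winter temperatures and the threshold is taken where the estimated CDF equals a target probability. The wind adjustment and the thesis' exact definition of "one-in-N" are assumptions here.

```python
import numpy as np
from scipy import stats, optimize

rng = np.random.default_rng(11)

# Hypothetical daily average winter temperatures (deg C); in the thesis these
# would be wind-adjusted observations for an LDC's service territory.
temps = rng.normal(-5, 8, size=30 * 90)        # 30 winters x ~90 days

kde = stats.gaussian_kde(temps)

def kde_cdf(t, lower=-80.0):
    """CDF of the kernel density estimate, integrated numerically."""
    return kde.integrate_box_1d(lower, t)

def one_in_n_threshold(n_years, days_per_season=90):
    # Target non-exceedance probability for roughly one day in n_years
    # seasons; the thesis' exact definition of "one-in-N" may differ.
    p = 1.0 / (n_years * days_per_season)
    return optimize.brentq(lambda t: kde_cdf(t) - p,
                           temps.min() - 30, temps.max())

print(f"one-in-10 low-temperature threshold: {one_in_n_threshold(10):.1f} deg C")
```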

  13. Statistical Data Editing in Scientific Articles.

    PubMed

    Habibzadeh, Farrokh

    2017-07-01

    Scientific journals are important scholarly forums for sharing research findings. Editors have important roles in safeguarding standards of scientific publication and should be familiar with correct presentation of results, among other core competencies. Editors do not have access to the raw data and should thus rely on clues in the submitted manuscripts. To identify probable errors, they should look for inconsistencies in presented results. Common statistical problems that can be picked up by a knowledgeable manuscript editor are discussed in this article. Manuscripts should contain a detailed section on statistical analyses of the data. Numbers should be reported with appropriate precision. The standard error of the mean (SEM) should not be reported as an index of data dispersion. Mean (standard deviation [SD]) and median (interquartile range [IQR]) should be used for description of normally and non-normally distributed data, respectively. If possible, it is better to report the 95% confidence interval (CI) for statistics, at least for the main outcome variables. P values should be presented, and interpreted with caution, if there is a hypothesis. To advance the knowledge and skills of their members, associations of journal editors should develop training courses on basic statistics and research methodology for non-experts. This would in turn improve research reporting and safeguard the body of scientific evidence. © 2017 The Korean Academy of Medical Sciences.

  14. Statistical characteristics of mechanical heart valve cavitation in accelerated testing.

    PubMed

    Wu, Changfu; Hwang, Ned H C; Lin, Yu-Kweng M

    2004-07-01

    Cavitation damage has been observed on mechanical heart valves (MHVs) undergoing accelerated testing. Cavitation itself can be modeled as a stochastic process, as it varies from beat to beat of the testing machine. This in-vitro study was undertaken to investigate the statistical characteristics of MHV cavitation. A 25-mm St. Jude Medical bileaflet MHV (SJM 25) was tested in an accelerated tester at various pulse rates, ranging from 300 to 1,000 bpm, with stepwise increments of 100 bpm. A miniature pressure transducer was placed near a leaflet tip on the inflow side of the valve, to monitor regional transient pressure fluctuations at instants of valve closure. The pressure trace associated with each beat was passed through a 70 kHz high-pass digital filter to extract the high-frequency oscillation (HFO) components resulting from the collapse of cavitation bubbles. Three intensity-related measures were calculated for each HFO burst: its time span; its local root-mean-square (LRMS) value; and the area enveloped by the absolute value of the HFO pressure trace and the time axis, referred to as cavitation impulse. These were treated as stochastic processes, of which the first-order probability density functions (PDFs) were estimated for each test rate. Both the LRMS value and cavitation impulse were log-normal distributed, and the time span was normal distributed. These distribution laws were consistent at different test rates. The present investigation was directed at understanding MHV cavitation as a stochastic process. The results provide a basis for establishing further the statistical relationship between cavitation intensity and time-evolving cavitation damage on MHV surfaces. These data are required to assess and compare the performance of MHVs of different designs.
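
    The three intensity measures can be written down directly once the HFO components are isolated; the sketch below uses a synthetic closing-pressure trace, an assumed sampling rate and an assumed burst-detection threshold, so it illustrates the definitions rather than the study's instrumentation.

```python
import numpy as np
from scipy import signal

fs = 500_000                                   # assumed sampling rate, Hz
rng = np.random.default_rng(5)

# Synthetic closing-pressure trace: a low-frequency closure transient, a
# broadband noise floor, and a short high-frequency oscillation (HFO) burst
# standing in for the signature of collapsing cavitation bubbles.
t = np.arange(0, 0.01, 1 / fs)
trace = 50 * np.exp(-t / 2e-3) * np.sin(2 * np.pi * 800 * t)
trace += 0.2 * rng.standard_normal(t.size)
burst = (t > 0.0020) & (t < 0.0032)
trace[burst] += 5 * rng.standard_normal(burst.sum())

# 70 kHz high-pass filter to extract the HFO components.
sos = signal.butter(4, 70_000, btype="highpass", fs=fs, output="sos")
hfo = signal.sosfiltfilt(sos, trace)

# Burst = contiguous samples exceeding an assumed threshold above the noise floor.
threshold = 5 * np.std(hfo[t < 0.001])
above = np.abs(hfo) > threshold
if above.any():
    i0 = np.argmax(above)
    i1 = len(above) - 1 - np.argmax(above[::-1])
    dt = 1 / fs
    span = (i1 - i0) * dt                            # time span of the burst
    lrms = np.sqrt(np.mean(hfo[i0:i1 + 1] ** 2))     # local root-mean-square value
    impulse = np.sum(np.abs(hfo[i0:i1 + 1])) * dt    # cavitation impulse
    print(f"span = {span * 1e6:.0f} us, LRMS = {lrms:.2f}, impulse = {impulse:.2e}")
```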

  15. Using the Quantile Mapping to improve a weather generator

    NASA Astrophysics Data System (ADS)

    Chen, Y.; Themessl, M.; Gobiet, A.

    2012-04-01

    We developed a weather generator (WG) using statistical and stochastic methods, among them quantile mapping (QM), Monte-Carlo sampling, auto-regression and empirical orthogonal functions (EOFs). One of the important steps in the WG is the use of QM, through which all the variables, whatever their original distributions, are transformed into normally distributed variables. Therefore, the WG can work on normally distributed variables, which greatly facilitates the treatment of random numbers in the WG. Monte-Carlo sampling and auto-regression are used to generate the realizations; EOFs are employed for preserving spatial relationships and the relationships between different meteorological variables. We have established a complete model named WGQM (weather generator and quantile mapping), which can be applied flexibly to generate daily or hourly time series. For example, with 30-year daily (hourly) data and 100-year monthly (daily) data as input, 100-year daily (hourly) data can be produced reasonably well. Some evaluation experiments with WGQM have been carried out in the area of Austria and the evaluation results will be presented.
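
    The core QM step, mapping an arbitrarily distributed variable onto a standard normal and back, can be sketched with empirical quantiles as below; WGQM's auto-regressive and EOF machinery is not reproduced, and the precipitation sample is hypothetical.

```python
import numpy as np
from scipy import stats

def to_normal(x):
    """Empirical quantile mapping of a sample to standard-normal scores."""
    x = np.asarray(x, float)
    p = stats.rankdata(x) / (len(x) + 1.0)     # plotting positions in (0, 1)
    return stats.norm.ppf(p)

def from_normal(z, reference):
    """Map standard-normal values back onto the reference sample's distribution."""
    p = stats.norm.cdf(np.asarray(z, float))
    return np.quantile(np.asarray(reference, float), p)

# Hypothetical daily precipitation amounts (skewed, non-negative).
rng = np.random.default_rng(2)
precip = rng.gamma(shape=0.8, scale=4.0, size=3000)

z = to_normal(precip)                          # ~ N(0, 1) by construction
print(f"skewness before: {stats.skew(precip):.2f}, after: {stats.skew(z):.2f}")

# A synthetic normal series (e.g. from an auto-regressive step) maps back
# onto the observed precipitation distribution:
synthetic = rng.standard_normal(3000)
generated = from_normal(synthetic, precip)
print(f"generated mean {generated.mean():.2f} vs observed {precip.mean():.2f}")
```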

  16. A mechanism producing power law etc. distributions

    NASA Astrophysics Data System (ADS)

    Li, Heling; Shen, Hongjun; Yang, Bin

    2017-07-01

    Power-law distributions play an increasingly important role in the study of complex systems. Starting from the intractability of complex systems, the idea of incomplete statistics is utilized and expanded: three different exponential factors are introduced into the equations for the normalization condition, the statistical average and the Shannon entropy, and probability distribution functions of exponential form, of power-law form, and of the product form of a power function and an exponential function are derived from the Shannon entropy and the maximum entropy principle. It is thus shown that the maximum entropy principle can completely replace the equal-probability hypothesis. Since power-law distributions and distributions of the product form of a power function and an exponential function, which cannot be derived via the equal-probability hypothesis, can be derived with the aid of the maximum entropy principle, it can also be concluded that the maximum entropy principle is a basic principle that embodies concepts more extensively and reveals the basic principles governing the motion of objects more fundamentally. At the same time, this principle also reveals the intrinsic link between Nature and different objects in human society and the principles obeyed by all.

  17. Maximum-likelihood fitting of data dominated by Poisson statistical uncertainties

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stoneking, M.R.; Den Hartog, D.J.

    1996-06-01

    The fitting of data by χ2-minimization is valid only when the uncertainties in the data are normally distributed. When analyzing spectroscopic or particle counting data at very low signal level (e.g., a Thomson scattering diagnostic), the uncertainties are distributed with a Poisson distribution. The authors have developed a maximum-likelihood method for fitting data that correctly treats the Poisson statistical character of the uncertainties. This method maximizes the total probability that the observed data are drawn from the assumed fit function using the Poisson probability function to determine the probability for each data point. The algorithm also returns uncertainty estimates for the fit parameters. They compare this method with a χ2-minimization routine applied to both simulated and real data. Differences in the returned fits are greater at low signal level (less than ~20 counts per measurement). The maximum-likelihood method is found to be more accurate and robust, returning a narrower distribution of values for the fit parameters with fewer outliers.
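
    A minimal sketch of the contrast described above: the same toy counting spectrum (a Gaussian line on a flat background, invented for illustration) is fitted by minimizing the Poisson negative log-likelihood and by chi-square minimization with the variance approximated by the counts.

```python
import numpy as np
from scipy import optimize

rng = np.random.default_rng(8)

# Toy counting experiment: a Gaussian line on a flat background, few counts.
x = np.linspace(-5, 5, 60)

def model(params, x):
    amp, center, width, bkg = params
    return bkg + amp * np.exp(-0.5 * ((x - center) / width) ** 2)

true = (8.0, 0.5, 1.2, 1.0)
counts = rng.poisson(model(true, x))

def neg_log_likelihood(params):
    mu = model(params, x)
    if np.any(mu <= 0):
        return np.inf
    # Poisson negative log-likelihood up to a constant: sum(mu - k*log(mu)).
    return np.sum(mu - counts * np.log(mu))

def chi_square(params):
    mu = model(params, x)
    # Gaussian approximation with sigma^2 ~= counts (ill-defined at 0 counts).
    return np.sum((counts - mu) ** 2 / np.maximum(counts, 1))

start = (5.0, 0.0, 1.0, 2.0)
ml_fit = optimize.minimize(neg_log_likelihood, start, method="Nelder-Mead")
cs_fit = optimize.minimize(chi_square, start, method="Nelder-Mead")
print("max-likelihood:", np.round(ml_fit.x, 2))
print("chi-square    :", np.round(cs_fit.x, 2))
```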

  18. Use of density equalizing map projections (DEMP) in the analysis of childhood cancer in four California counties

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Merrill, D.W.; Selvin, S.; Close, E.R.

    In studying geographic disease distributions, one normally compares rates of arbitrarily defined geographic subareas (e.g. census tracts), thereby sacrificing the geographic detail of the original data. The sparser the data, the larger the subareas must be in order to calculate stable rates. This dilemma is avoided with the technique of Density Equalizing Map Projections (DEMP). Boundaries of geographic subregions are adjusted to equalize population density over the entire study area. Case locations plotted on the transformed map should have a uniform distribution if the underlying disease rates are constant. On the transformed map, the statistical analysis of the observed distribution is greatly simplified. Even for sparse distributions, the statistical significance of a supposed disease cluster can be reliably calculated. The present report describes the first successful application of the DEMP technique to a sizeable 'real-world' data set of epidemiologic interest. An improved DEMP algorithm [GUSE93, CLOS94] was applied to a data set previously analyzed with conventional techniques [SATA90, REYN91]. The results from the DEMP analysis and a conventional analysis are compared.

  19. Regression-assisted deconvolution.

    PubMed

    McIntyre, Julie; Stefanski, Leonard A

    2011-06-30

    We present a semi-parametric deconvolution estimator for the density function of a random variable X that is measured with error, a common challenge in many epidemiological studies. Traditional deconvolution estimators rely only on assumptions about the distribution of X and the error in its measurement, and ignore information available in auxiliary variables. Our method assumes the availability of a covariate vector statistically related to X by a mean-variance function regression model, where regression errors are normally distributed and independent of the measurement errors. Simulations suggest that the estimator achieves a much lower integrated squared error than the observed-data kernel density estimator when models are correctly specified and the assumption of normal regression errors is met. We illustrate the method using anthropometric measurements of newborns to estimate the density function of newborn length. Copyright © 2011 John Wiley & Sons, Ltd.

  20. The probability distribution model of air pollution index and its dominants in Kuala Lumpur

    NASA Astrophysics Data System (ADS)

    AL-Dhurafi, Nasr Ahmed; Razali, Ahmad Mahir; Masseran, Nurulkamal; Zamzuri, Zamira Hasanah

    2016-11-01

    This paper focuses on the statistical modeling of the distributions of the air pollution index (API) and its sub-index data observed at Kuala Lumpur in Malaysia. The measured pollutants or sub-indexes include carbon monoxide (CO), sulphur dioxide (SO2), nitrogen dioxide (NO2) and particulate matter (PM10). Four probability distributions are considered, namely log-normal, exponential, Gamma and Weibull, in the search for the best-fit distribution to the Malaysian air pollutant data. In order to determine the best distribution for describing the air pollutant data, five goodness-of-fit criteria are applied. This will help in minimizing the uncertainty in pollution resource estimates and improving the assessment phase of planning. The conflict in criterion results for selecting the best distribution was overcome by using the weight of ranks method. We found that the Gamma distribution is the best distribution for the majority of air pollutant data in Kuala Lumpur.
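
    A sketch of this selection procedure is given below: the four named families are fitted by maximum likelihood with scipy and ranked by two goodness-of-fit criteria whose ranks are summed with equal weights. The paper's five criteria and their weights are not specified here, so the criteria and the data are placeholders.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
# Hypothetical positive-valued pollutant sub-index data.
data = rng.gamma(shape=2.5, scale=15.0, size=1500)

candidates = {
    "log-normal": stats.lognorm,
    "exponential": stats.expon,
    "gamma": stats.gamma,
    "weibull": stats.weibull_min,
}

scores = {}
for name, dist in candidates.items():
    params = dist.fit(data, floc=0)            # keep the location at zero
    loglik = np.sum(dist.logpdf(data, *params))
    aic = 2 * len(params) - 2 * loglik
    ks = stats.kstest(data, dist.cdf, args=params).statistic
    scores[name] = (aic, ks)

# "Weight of ranks": rank each criterion separately and sum the ranks
# (equal weights assumed); the lowest total wins.
names = list(scores)
totals = {n: 0 for n in names}
for i in range(2):                             # criterion index: AIC, K-S
    order = sorted(names, key=lambda n: scores[n][i])
    for rank, n in enumerate(order, start=1):
        totals[n] += rank

best = min(totals, key=totals.get)
print(totals, "->", best)
```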

  1. Diameter distribution in a Brazilian tropical dry forest domain: predictions for the stand and species.

    PubMed

    Lima, Robson B DE; Bufalino, Lina; Alves, Francisco T; Silva, José A A DA; Ferreira, Rinaldo L C

    2017-01-01

    Currently, there is a lack of studies on the correct utilization of continuous distributions for dry tropical forests. Therefore, this work aims to investigate the diameter structure of a Brazilian tropical dry forest and to select suitable continuous distributions by means of statistical tools for the stand and the main species. Two subsets were randomly selected from 40 plots. Diameter at base height was obtained. The following functions were tested: log-normal, gamma, Weibull 2P and Burr. The best fits were selected by the Akaike information criterion. Overall, the diameter distribution of the dry tropical forest was better described by negative exponential curves and positive skewness. The forest studied showed diameter distributions with decreasing probability for larger trees. This behavior was observed for both the main species and the stand. The generalization of the function fitted for the main species shows that the development of individual models is needed. The Burr function showed good flexibility to describe the diameter structure of the stand and the behavior of the Mimosa ophthalmocentra and Bauhinia cheilantha species. For Poincianella bracteosa, Aspidosperma pyrifolium and Myracrodum urundeuva, better fitting was obtained with the log-normal function.

  2. Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences.

    PubMed

    Warris, Sven; Boymans, Sander; Muiser, Iwe; Noback, Michiel; Krijnen, Wim; Nap, Jan-Peter

    2014-01-13

    Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than the MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. This allows on-the-fly calculation of the normal distribution for any candidate sequence composition. The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this particular property alone will not be sufficiently discriminative to distinguish miRNAs from other sequences, the MFE-based P-value should be added to the parameters of choice to be included in the selection of potential miRNA candidates for experimental verification.
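
    The screening step reduces to a normal-distribution lookup, sketched below; the table of means and standard deviations and the interpolation over GC content are invented placeholders, and the candidate's MFE would in practice come from an RNA folding program rather than being typed in.

```python
from scipy import stats

# Hypothetical precomputed table: (length, GC fraction) -> (mean MFE, std MFE)
# of randomized sequences.  Real values would come from the large-scale
# grid computation described in the paper.
mfe_table = {
    (90, 0.40): (-18.5, 4.1),
    (90, 0.50): (-22.0, 4.3),
    (90, 0.60): (-26.1, 4.6),
}

def interpolate_gc(length, gc, table):
    """Linear interpolation in GC content at fixed length (a simplification
    of the paper's interpolation over sequence composition)."""
    keys = sorted(k[1] for k in table if k[0] == length)
    lo = max(g for g in keys if g <= gc)
    hi = min(g for g in keys if g >= gc)
    m_lo, s_lo = table[(length, lo)]
    m_hi, s_hi = table[(length, hi)]
    if hi == lo:
        return m_lo, s_lo
    w = (gc - lo) / (hi - lo)
    return m_lo + w * (m_hi - m_lo), s_lo + w * (s_hi - s_lo)

def mfe_p_value(candidate_mfe, length, gc):
    """One-sided P-value: probability that a randomized sequence folds at
    least as low as the candidate, under the interpolated normal model."""
    mean, std = interpolate_gc(length, gc, mfe_table)
    return stats.norm.cdf(candidate_mfe, loc=mean, scale=std)

# Candidate hairpin of length 90 with 55% GC and an MFE of -38 kcal/mol.
print(f"P = {mfe_p_value(-38.0, 90, 0.55):.4f}")
```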

  3. Statistical Characteristics of the Gaussian-Noise Spikes Exceeding the Specified Threshold as Applied to Discharges in a Thundercloud

    NASA Astrophysics Data System (ADS)

    Klimenko, V. V.

    2017-12-01

    We obtain expressions for the probabilities of the normal-noise spikes with the Gaussian correlation function and for the probability density of the inter-spike intervals. As distinct from the delta-correlated noise, in which the intervals are distributed by the exponential law, the probability of the subsequent spike depends on the previous spike and the interval-distribution law deviates from the exponential one for a finite noise-correlation time (frequency-bandwidth restriction). This deviation is the most pronounced for a low detection threshold. Similarity of the behaviors of the distributions of the inter-discharge intervals in a thundercloud and the noise spikes for the varying repetition rate of the discharges/spikes, which is determined by the ratio of the detection threshold to the root-mean-square value of noise, is observed. The results of this work can be useful for the quantitative description of the statistical characteristics of the noise spikes and studying the role of fluctuations for the discharge emergence in a thundercloud.

  4. Principle of maximum entropy for reliability analysis in the design of machine components

    NASA Astrophysics Data System (ADS)

    Zhang, Yimin

    2018-03-01

    We studied the reliability of machine components with parameters that follow an arbitrary statistical distribution using the principle of maximum entropy (PME). We used PME to select the statistical distribution that best fits the available information. We also established a probability density function (PDF) and a failure probability model for the parameters of mechanical components using the concept of entropy and the PME. We obtained the first four moments of the state function for reliability analysis and design. Furthermore, we attained an estimate of the PDF with the fewest human bias factors using the PME. This function was used to calculate the reliability of the machine components, including a connecting rod, a vehicle half-shaft, a front axle, a rear axle housing, and a leaf spring, which have parameters that typically follow a non-normal distribution. Simulations were conducted for comparison. This study provides a design methodology for the reliability of mechanical components for practical engineering projects.

  5. Asymptotic approximations to posterior distributions via conditional moment equations

    USGS Publications Warehouse

    Yee, J.L.; Johnson, W.O.; Samaniego, F.J.

    2002-01-01

    We consider asymptotic approximations to joint posterior distributions in situations where the full conditional distributions referred to in Gibbs sampling are asymptotically normal. Our development focuses on problems where data augmentation facilitates simpler calculations, but results hold more generally. Asymptotic mean vectors are obtained as simultaneous solutions to fixed point equations that arise naturally in the development. Asymptotic covariance matrices flow naturally from the work of Arnold & Press (1989) and involve the conditional asymptotic covariance matrices and first derivative matrices for conditional mean functions. When the fixed point equations admit an analytical solution, explicit formulae are subsequently obtained for the covariance structure of the joint limiting distribution, which may shed light on the use of the given statistical model. Two illustrations are given. ?? 2002 Biometrika Trust.

  6. On the scaling of the distribution of daily price fluctuations in the Mexican financial market index

    NASA Astrophysics Data System (ADS)

    Alfonso, Léster; Mansilla, Ricardo; Terrero-Escalante, César A.

    2012-05-01

    In this paper, a statistical analysis of log-return fluctuations of the IPC, the Mexican Stock Market Index, is presented. A sample of daily data covering the period from 04/09/2000-04/09/2010 was analyzed and fitted to different distributions. Tests of the goodness of fit were performed in order to quantitatively assess the quality of the estimation. Special attention was paid to the impact of the size of the sample on the estimated decay of the distribution's tail. In this study a forceful rejection of normality was obtained. On the other hand, the null hypothesis that the log-fluctuations are fitted by an α-stable Lévy distribution cannot be rejected at the 5% significance level.

  7. Statistical analysis of experimental data for mathematical modeling of physical processes in the atmosphere

    NASA Astrophysics Data System (ADS)

    Karpushin, P. A.; Popov, Yu B.; Popova, A. I.; Popova, K. Yu; Krasnenko, N. P.; Lavrinenko, A. V.

    2017-11-01

    In this paper, the probabilities of faultless operation of aerologic stations are analyzed, the hypothesis of normality of the empirical data required for using the Kalman filter algorithms is tested, and the spatial correlation functions of distributions of meteorological parameters are determined. The results of a statistical analysis of two-term (0, 12 GMT) radiosonde observations of the temperature and wind velocity components at some preset altitude ranges in the troposphere in 2001-2016 are presented. These data can be used in mathematical modeling of physical processes in the atmosphere.

  8. Sensitivity of airborne fluorosensor measurements to linear vertical gradients in chlorophyll concentration

    NASA Technical Reports Server (NTRS)

    Venable, D. D.; Punjabi, A. R.; Poole, L. R.

    1984-01-01

    A semianalytic Monte Carlo radiative transfer simulation model for airborne laser fluorosensors has been extended to investigate the effects of inhomogeneities in the vertical distribution of phytoplankton concentrations in clear seawater. Simulation results for linearly varying step concentrations of chlorophyll are presented. The results indicate that statistically significant differences can be seen under certain conditions in the water Raman-normalized fluorescence signals between nonhomogeneous and homogeneous cases. A statistical test has been used to establish ranges of surface concentrations and/or vertical gradients in which calibration by surface samples would be inappropriate, and the results are discussed.

  9. Central Limit Theorems for Linear Statistics of Heavy Tailed Random Matrices

    NASA Astrophysics Data System (ADS)

    Benaych-Georges, Florent; Guionnet, Alice; Male, Camille

    2014-07-01

    We show central limit theorems (CLT) for the linear statistics of symmetric matrices with independent heavy tailed entries, including entries in the domain of attraction of α-stable laws and entries with moments exploding with the dimension, as in the adjacency matrices of Erdös-Rényi graphs. For the second model, we also prove a central limit theorem of the moments of its empirical eigenvalues distribution. The limit laws are Gaussian, but unlike the case of standard Wigner matrices, the normalization is the one of the classical CLT for independent random variables.

  10. A detailed heterogeneous agent model for a single asset financial market with trading via an order book.

    PubMed

    Mota Navarro, Roberto; Larralde, Hernán

    2017-01-01

    We present an agent based model of a single asset financial market that is capable of replicating most of the non-trivial statistical properties observed in real financial markets, generically referred to as stylized facts. In our model agents employ strategies inspired by those used in real markets, and a realistic trade mechanism based on a double auction order book. We study the role of the distinct types of trader on the return statistics: specifically, correlation properties (or lack thereof), volatility clustering, heavy tails, and the degree to which the distribution can be described by a log-normal. Further, by introducing the practice of "profit taking", our model is also capable of replicating the stylized fact related to an asymmetry in the distribution of losses and gains.

  11. A detailed heterogeneous agent model for a single asset financial market with trading via an order book

    PubMed Central

    2017-01-01

    We present an agent based model of a single asset financial market that is capable of replicating most of the non-trivial statistical properties observed in real financial markets, generically referred to as stylized facts. In our model agents employ strategies inspired by those used in real markets, and a realistic trade mechanism based on a double auction order book. We study the role of the distinct types of trader on the return statistics: specifically, correlation properties (or lack thereof), volatility clustering, heavy tails, and the degree to which the distribution can be described by a log-normal. Further, by introducing the practice of “profit taking”, our model is also capable of replicating the stylized fact related to an asymmetry in the distribution of losses and gains. PMID:28245251

  12. Basic biostatistics for post-graduate students

    PubMed Central

    Dakhale, Ganesh N.; Hiware, Sachin K.; Shinde, Abhijit T.; Mahatme, Mohini S.

    2012-01-01

    Statistical methods are important to draw valid conclusions from the obtained data. This article provides background information related to fundamental methods and techniques in biostatistics for the use of postgraduate students. The main focus is given to types of data, measures of central tendency and variation, and basic tests, which are useful for the analysis of different types of observations. A few concepts, such as the normal distribution, calculation of sample size, level of significance, null hypothesis, indices of variability, and different tests, are explained in detail by giving suitable examples. Using these guidelines, we are confident that postgraduate students will be able to classify the distribution of their data and apply the proper test. Information is also given regarding various free software programs and websites useful for statistical calculations. Thus, postgraduate students will benefit in both ways, whether they opt for academics or for industry. PMID:23087501

  13. Disjunctive Normal Shape and Appearance Priors with Applications to Image Segmentation.

    PubMed

    Mesadi, Fitsum; Cetin, Mujdat; Tasdizen, Tolga

    2015-10-01

    The use of appearance and shape priors in image segmentation is known to improve accuracy; however, existing techniques have several drawbacks. Active shape and appearance models require landmark points and assume unimodal shape and appearance distributions. Level set based shape priors are limited to global shape similarity. In this paper, we present novel shape and appearance priors for image segmentation based on an implicit parametric shape representation called the disjunctive normal shape model (DNSM). The DNSM is formed by a disjunction of conjunctions of half-spaces defined by discriminants. We learn shape and appearance statistics at varying spatial scales using nonparametric density estimation. Our method can generate a rich set of shape variations by locally combining training shapes. Additionally, by studying the intensity and texture statistics around each discriminant of our shape model, we construct a local appearance probability map. Experiments carried out on both medical and natural image datasets show the potential of the proposed method.

  14. An Extension of the Chi-Square Procedure for Non-Normal Statistics, with Application to Solar Neutrino Data

    NASA Astrophysics Data System (ADS)

    Sturrock, P. A.

    2008-01-01

    Using the chi-square statistic, one may conveniently test whether a series of measurements of a variable are consistent with a constant value. However, that test is predicated on the assumption that the appropriate probability distribution function (pdf) is normal in form. This requirement is usually not satisfied by experimental measurements of the solar neutrino flux. This article presents an extension of the chi-square procedure that is valid for any form of the pdf. This procedure is applied to the GALLEX-GNO dataset, and it is shown that the results are in good agreement with the results of Monte Carlo simulations. Whereas application of the standard chi-square test to symmetrized data yields evidence significant at the 1% level for variability of the solar neutrino flux, application of the extended chi-square test to the unsymmetrized data yields only weak evidence (significant at the 4% level) of variability.
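
    The spirit of the procedure, calibrating a chi-square-like constancy statistic against the actual (non-normal) measurement pdf instead of the analytic chi-square distribution, can be sketched with a Monte Carlo calibration; the skewed error model below is an assumption, not the GALLEX-GNO likelihoods, and the algebra of the published extension is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(9)

def chi2_constancy(values, errors):
    """Standard chi-square statistic for consistency with a constant,
    using the error-weighted mean as the constant."""
    w = 1.0 / errors**2
    mean = np.sum(w * values) / np.sum(w)
    return np.sum(((values - mean) / errors) ** 2)

# Hypothetical flux measurements with asymmetric (skewed) uncertainties.
n_runs = 30
true_flux = 70.0
errors = np.full(n_runs, 8.0)
measured = true_flux + errors * (rng.gamma(4.0, 0.5, n_runs) - 2.0)  # skewed noise

observed_stat = chi2_constancy(measured, errors)

# Calibrate the statistic against the skewed error pdf by Monte Carlo,
# instead of against the analytic chi-square distribution (which assumes
# normal errors).
n_sim = 20_000
sim_stats = np.empty(n_sim)
for i in range(n_sim):
    sim = true_flux + errors * (rng.gamma(4.0, 0.5, n_runs) - 2.0)
    sim_stats[i] = chi2_constancy(sim, errors)

p_value = np.mean(sim_stats >= observed_stat)
print(f"statistic = {observed_stat:.1f}, Monte Carlo p-value = {p_value:.3f}")
```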

  15. When the Single Matters more than the Group (II): Addressing the Problem of High False Positive Rates in Single Case Voxel Based Morphometry Using Non-parametric Statistics.

    PubMed

    Scarpazza, Cristina; Nichols, Thomas E; Seramondi, Donato; Maumet, Camille; Sartori, Giuseppe; Mechelli, Andrea

    2016-01-01

    In recent years, an increasing number of studies have used Voxel Based Morphometry (VBM) to compare a single patient with a psychiatric or neurological condition of interest against a group of healthy controls. However, the validity of this approach critically relies on the assumption that the single patient is drawn from a hypothetical population with a normal distribution and variance equal to that of the control group. In a previous investigation, we demonstrated that family-wise false positive error rate (i.e., the proportion of statistical comparisons yielding at least one false positive) in single case VBM are much higher than expected (Scarpazza et al., 2013). Here, we examine whether the use of non-parametric statistics, which does not rely on the assumptions of normal distribution and equal variance, would enable the investigation of single subjects with good control of false positive risk. We empirically estimated false positive rates (FPRs) in single case non-parametric VBM, by performing 400 statistical comparisons between a single disease-free individual and a group of 100 disease-free controls. The impact of smoothing (4, 8, and 12 mm) and type of pre-processing (Modulated, Unmodulated) was also examined, as these factors have been found to influence FPRs in previous investigations using parametric statistics. The 400 statistical comparisons were repeated using two independent, freely available data sets in order to maximize the generalizability of the results. We found that the family-wise error rate was 5% for increases and 3.6% for decreases in one data set; and 5.6% for increases and 6.3% for decreases in the other data set (5% nominal). Further, these results were not dependent on the level of smoothing and modulation. Therefore, the present study provides empirical evidence that single case VBM studies with non-parametric statistics are not susceptible to high false positive rates. The critical implication of this finding is that VBM can be used to characterize neuroanatomical alterations in individual subjects as long as non-parametric statistics are employed.

  16. Modification of Kolmogorov-Smirnov test for DNA content data analysis through distribution alignment.

    PubMed

    Huang, Shuguang; Yeo, Adeline A; Li, Shuyu Dan

    2007-10-01

    The Kolmogorov-Smirnov (K-S) test is a statistical method often used for comparing two distributions. In high-throughput screening (HTS) studies, such distributions usually arise from the phenotype of independent cell populations. However, the K-S test has been criticized for being overly sensitive in applications, and it often detects a statistically significant difference that is not biologically meaningful. One major reason is that systematic drift among the distributions is common in HTS studies, due to factors such as instrument variation, plate edge effects, accidental differences in sample handling, etc. In particular, in high-content cellular imaging experiments, the location shift can be dramatic since some compounds themselves are fluorescent. This oversensitivity of the K-S test is particularly pronounced in cellular assays, where the sample sizes are very large (usually several thousand). In this paper, a modified K-S test is proposed to deal with the nonspecific location-shift problem in HTS studies. Specifically, we propose that the distributions are "normalized" by density curve alignment before the K-S test is conducted. In applications to simulation data and real experimental data, the results show that the proposed method has improved specificity.
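
    A minimal sketch of the alignment idea follows; median alignment stands in here for the paper's density-curve alignment, and the data are synthetic.

```python
# Remove a nonspecific location shift before the two-sample K-S test.
# Median alignment is a simplification of the density-curve alignment idea.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
control = rng.normal(loc=0.0, scale=1.0, size=5000)
treated = rng.normal(loc=0.3, scale=1.0, size=5000)   # pure location drift

raw = stats.ks_2samp(control, treated)
aligned = stats.ks_2samp(control - np.median(control),
                         treated - np.median(treated))
print(f"raw K-S p = {raw.pvalue:.2g}, aligned K-S p = {aligned.pvalue:.2g}")
```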

  17. A log-sinh transformation for data normalization and variance stabilization

    NASA Astrophysics Data System (ADS)

    Wang, Q. J.; Shrestha, D. L.; Robertson, D. E.; Pokhrel, P.

    2012-05-01

    When quantifying model prediction uncertainty, it is statistically convenient to represent model errors that are normally distributed with a constant variance. The Box-Cox transformation is the most widely used technique to normalize data and stabilize variance, but it is not without limitations. In this paper, a log-sinh transformation is derived based on a pattern of errors commonly seen in hydrological model predictions. It is suited to applications where prediction variables are positively skewed and the spread of errors is seen to first increase rapidly, then slowly, and eventually approach a constant as the prediction variable becomes greater. The log-sinh transformation is applied in two case studies, and the results are compared with one- and two-parameter Box-Cox transformations.
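
    The transformation itself has the form z = (1/b) ln[sinh(a + b y)]. The sketch below applies it, with arbitrary (not fitted) parameter values, to synthetic positively skewed data.

```python
# Log-sinh transform and its inverse; a and b would normally be fitted, the
# values here are arbitrary, and the lognormal sample stands in for real flows.
import numpy as np
from scipy import stats

def log_sinh(y, a, b):
    return np.log(np.sinh(a + b * y)) / b

def inv_log_sinh(z, a, b):
    return (np.arcsinh(np.exp(b * z)) - a) / b

rng = np.random.default_rng(2)
y = rng.lognormal(mean=1.0, sigma=0.8, size=10_000)   # positively skewed data

a, b = 0.01, 0.1
z = log_sinh(y, a, b)
print(f"skewness before: {stats.skew(y):.2f}, after: {stats.skew(z):.2f}")
assert np.allclose(inv_log_sinh(z, a, b), y)          # exact back-transform
```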

  18. Approach for Input Uncertainty Propagation and Robust Design in CFD Using Sensitivity Derivatives

    NASA Technical Reports Server (NTRS)

    Putko, Michele M.; Taylor, Arthur C., III; Newman, Perry A.; Green, Lawrence L.

    2002-01-01

    An implementation of the approximate statistical moment method for uncertainty propagation and robust optimization for a quasi 3-D Euler CFD code is presented. Given uncertainties in statistically independent, random, normally distributed input variables, first- and second-order statistical moment procedures are performed to approximate the uncertainty in the CFD output. Efficient calculation of both first- and second-order sensitivity derivatives is required. In order to assess the validity of the approximations, these moments are compared with statistical moments generated through Monte Carlo simulations. The uncertainties in the CFD input variables are also incorporated into a robust optimization procedure. For this optimization, statistical moments involving first-order sensitivity derivatives appear in the objective function and system constraints. Second-order sensitivity derivatives are used in a gradient-based search to successfully execute a robust optimization. The approximate methods used throughout the analyses are found to be valid when considering robustness about input parameter mean values.
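
    The first-order part of the approximation is mean ≈ f(μ) and variance ≈ Σ (∂f/∂x_i)² σ_i² for independent normal inputs; the sketch below checks it against Monte Carlo for a toy function standing in for the CFD output.

```python
# First-order moment propagation for independent normal inputs, checked
# against Monte Carlo. f is a toy stand-in for the CFD output functional.
import numpy as np

def f(x):
    return x[0] ** 2 + np.sin(x[1])

mu = np.array([1.0, 0.5])
sigma = np.array([0.05, 0.10])

h = 1e-5                                    # central-difference sensitivities
grad = np.array([(f(mu + h * e) - f(mu - h * e)) / (2 * h) for e in np.eye(2)])

mean_1st = f(mu)
var_1st = np.sum(grad**2 * sigma**2)

rng = np.random.default_rng(3)
samples = rng.normal(mu, sigma, size=(200_000, 2))
fs = f(samples.T)                           # vectorized evaluation
print(f"first-order: mean={mean_1st:.4f}, var={var_1st:.3e}")
print(f"Monte Carlo: mean={fs.mean():.4f}, var={fs.var():.3e}")
```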

  19. Statistical analysis of the calibration procedure for personnel radiation measurement instruments

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bush, W.J.; Bengston, S.J.; Kalbeitzer, F.L.

    1980-11-01

    Thermoluminescent analyzer (TLA) calibration procedures were used to estimate personnel radiation exposure levels at the Idaho National Engineering Laboratory (INEL). A statistical analysis is presented herein based on data collected over a six month period in 1979 on four TLA's located in the Department of Energy (DOE) Radiological and Environmental Sciences Laboratory at the INEL. The data were collected according to the day-to-day procedure in effect at that time. Both gamma and beta radiation models are developed. Observed TLA readings of thermoluminescent dosimeters are correlated with known radiation levels. This correlation is then used to predict unknown radiation doses from future analyzer readings of personnel thermoluminescent dosimeters. The statistical techniques applied in this analysis include weighted linear regression, estimation of systematic and random error variances, prediction interval estimation using Scheffe's theory of calibration, the estimation of the ratio of the means of two normal bivariate distributed random variables and their corresponding confidence limits according to Kendall and Stuart, tests of normality, experimental design, a comparison between instruments, and quality control.
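
    A bare-bones sketch of the calibration step (weighted linear regression of readings on known doses, then inversion to predict dose from a new reading) is given below; the numbers, weights and units are hypothetical, and the report's Scheffe-type prediction intervals are not reproduced.

```python
# Weighted least squares calibration of dosimeter readings, then inversion.
import numpy as np

known_dose = np.array([10, 20, 50, 100, 200, 500], dtype=float)   # mR, known
readings = np.array([11.2, 21.5, 52.0, 104.0, 197.0, 515.0])      # observed
weights = 1.0 / readings            # assume variance grows with signal level

# weighted least squares for: reading = b0 + b1 * dose
X = np.column_stack([np.ones_like(known_dose), known_dose])
W = np.diag(weights)
beta = np.linalg.solve(X.T @ W @ X, X.T @ W @ readings)

new_reading = 250.0
predicted_dose = (new_reading - beta[0]) / beta[1]
print(f"fit: reading = {beta[0]:.2f} + {beta[1]:.3f} * dose; "
      f"predicted dose for reading {new_reading}: {predicted_dose:.0f} mR")
```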

  20. Simultaneous calibration of ensemble river flow predictions over an entire range of lead times

    NASA Astrophysics Data System (ADS)

    Hemri, S.; Fundel, F.; Zappa, M.

    2013-10-01

    Probabilistic estimates of future water levels and river discharge are usually simulated with hydrologic models using ensemble weather forecasts as main inputs. As hydrologic models are imperfect and the meteorological ensembles tend to be biased and underdispersed, the ensemble forecasts for river runoff typically are biased and underdispersed, too. Thus, in order to achieve both reliable and sharp predictions, statistical postprocessing is required. In this work Bayesian model averaging (BMA) is applied to statistically postprocess ensemble runoff raw forecasts for a catchment in Switzerland, at lead times ranging from 1 to 240 h. The raw forecasts have been obtained using deterministic and ensemble forcing meteorological models with different forecast lead time ranges. First, BMA is applied based on mixtures of univariate normal distributions, subject to the assumption of independence between distinct lead times. Then, the independence assumption is relaxed in order to estimate multivariate runoff forecasts over the entire range of lead times simultaneously, based on a BMA version that uses multivariate normal distributions. Since river runoff is a highly skewed variable, Box-Cox transformations are applied in order to achieve approximate normality. Both univariate and multivariate BMA approaches are able to generate well calibrated probabilistic forecasts that are considerably sharper than climatological forecasts. Additionally, multivariate BMA provides a promising approach for incorporating temporal dependencies into the postprocessed forecasts. Its major advantage over univariate BMA is an increase in reliability when the forecast system is changing due to model availability.
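
    The normalization step can be sketched as follows: a Box-Cox transform applied to synthetic, positively skewed runoff so that normal mixture components become a reasonable description; in practice the transformation parameter would be estimated together with the BMA fit.

```python
# Box-Cox transform of skewed synthetic runoff prior to fitting normal mixtures.
import numpy as np
from scipy import stats

rng = np.random.default_rng(10)
runoff = rng.lognormal(mean=3.0, sigma=0.9, size=5000)   # synthetic, skewed

transformed, lam = stats.boxcox(runoff)                  # lambda fitted by MLE
print(f"estimated lambda = {lam:.2f}")
print(f"skewness before = {stats.skew(runoff):.2f}, "
      f"after = {stats.skew(transformed):.2f}")
```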

  1. A Statistical Method for Estimating Missing GHG Emissions in Bottom-Up Inventories: The Case of Fossil Fuel Combustion in Industry in the Bogota Region, Colombia

    NASA Astrophysics Data System (ADS)

    Jimenez-Pizarro, R.; Rojas, A. M.; Pulido-Guio, A. D.

    2012-12-01

    The development of environmentally, socially and financially suitable greenhouse gas (GHG) mitigation portfolios requires detailed disaggregation of emissions by activity sector, preferably at the regional level. Bottom-up (BU) emission inventories are intrinsically disaggregated, but although detailed, they are frequently incomplete. Missing and erroneous activity data are rather common in emission inventories of GHG, criteria and toxic pollutants, even in developed countries. The fraction of missing and erroneous data can be rather large in developing country inventories. In addition, the cost and time for obtaining or correcting this information can be prohibitive or can delay the inventory development. This is particularly true for regional BU inventories in the developing world. Moreover, a rather common practice is to disregard or to arbitrarily impute low default activity or emission values to missing data, which typically leads to significant underestimation of the total emissions. Our investigation focuses on GHG emissions by fossil fuel combustion in industry in the Bogota Region, composed of Bogota and its adjacent, semi-rural area of influence, the Province of Cundinamarca. We found that the BU inventories for this sub-category substantially underestimate emissions when compared to top-down (TD) estimations based on sub-sector specific national fuel consumption data and regional energy intensities. Although both BU inventories have a substantial number of missing and evidently erroneous entries, i.e. information on fuel consumption per combustion unit per company, the validated energy use and emission data display clear and smooth frequency distributions, which can be adequately fitted to bimodal log-normal distributions. This is not unexpected as industrial plant sizes are typically log-normally distributed. Moreover, our statistical tests suggest that industrial sub-sectors, as classified by the International Standard Industrial Classification (ISIC), are also well represented by log-normal distributions. Using the validated data, we tested several missing data estimation procedures, including Monte Carlo sampling of the real and fitted distributions, and a per-ISIC estimation based on bootstrap-calculated mean values. These results will be presented and discussed in detail. Our results suggest that the accuracy of sub-sector BU emission inventories, particularly in developing regions, could be significantly improved if they are designed and carried out to be representative sub-samples (surveys) of the actual universe of emitters. A large fraction of the missing data could be subsequently estimated by robust statistical procedures provided that most of the emitters were accounted for by number and ISIC.
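
    A minimal sketch of the imputation idea, assuming synthetic data and a two-component log-normal mixture fitted as a Gaussian mixture on log values (not the authors' exact procedure):

```python
# Fit a two-component log-normal mixture to "validated" fuel-use data and
# impute missing entries by Monte Carlo sampling from the fitted distribution.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
small = rng.lognormal(mean=3.0, sigma=0.5, size=300)    # small plants (synthetic)
large = rng.lognormal(mean=6.0, sigma=0.7, size=100)    # large plants (synthetic)
validated = np.concatenate([small, large])              # observed consumption

gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(np.log(validated).reshape(-1, 1))

n_missing = 50
imputed = np.exp(gmm.sample(n_missing)[0].ravel())
total = validated.sum() + imputed.sum()
print(f"imputed share of total estimated consumption: {imputed.sum() / total:.1%}")
```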

  2. Surveillance system and method having an adaptive sequential probability fault detection test

    NASA Technical Reports Server (NTRS)

    Herzog, James P. (Inventor); Bickford, Randall L. (Inventor)

    2005-01-01

    System and method providing surveillance of an asset such as a process and/or apparatus by providing training and surveillance procedures that numerically fit a probability density function to an observed residual error signal distribution that is correlative to normal asset operation and then utilizes the fitted probability density function in a dynamic statistical hypothesis test for providing improved asset surveillance.

  3. A Bootstrap Algorithm for Mixture Models and Interval Data in Inter-Comparisons

    DTIC Science & Technology

    2001-07-01

    parametric bootstrap. The present algorithm will be applied to a thermometric inter-comparison, where data cannot be assumed to be normally distributed. ... Experimental methods used in each laboratory often imply that the statistical assumptions are not satisfied, as for example in several thermometric ... triangular). Indeed, in thermometric experiments these three probabilistic models can represent several common stochastic variabilities ...

  4. Surveillance system and method having an adaptive sequential probability fault detection test

    NASA Technical Reports Server (NTRS)

    Bickford, Randall L. (Inventor); Herzog, James P. (Inventor)

    2006-01-01

    System and method providing surveillance of an asset such as a process and/or apparatus by providing training and surveillance procedures that numerically fit a probability density function to an observed residual error signal distribution that is correlative to normal asset operation and then utilizes the fitted probability density function in a dynamic statistical hypothesis test for providing improved asset surveillance.

  5. Surveillance System and Method having an Adaptive Sequential Probability Fault Detection Test

    NASA Technical Reports Server (NTRS)

    Bickford, Randall L. (Inventor); Herzog, James P. (Inventor)

    2008-01-01

    System and method providing surveillance of an asset such as a process and/or apparatus by providing training and surveillance procedures that numerically fit a probability density function to an observed residual error signal distribution that is correlative to normal asset operation and then utilizes the fitted probability density function in a dynamic statistical hypothesis test for providing improved asset surveillance.

  6. A method of using cluster analysis to study statistical dependence in multivariate data

    NASA Technical Reports Server (NTRS)

    Borucki, W. J.; Card, D. H.; Lyle, G. C.

    1975-01-01

    A technique is presented that uses both cluster analysis and a Monte Carlo significance test of clusters to discover associations between variables in multidimensional data. The method is applied to an example of a noisy function in three-dimensional space, to a sample from a mixture of three bivariate normal distributions, and to the well-known Fisher's Iris data.

  7. The Italian primary school-size distribution and the city-size: a complex nexus

    NASA Astrophysics Data System (ADS)

    Belmonte, Alessandro; di Clemente, Riccardo; Buldyrev, Sergey V.

    2014-06-01

    We characterize the statistical law according to which Italian primary school sizes are distributed. We find that the school size can be approximated by a log-normal distribution, with a fat lower tail that collects a large number of very small schools. The upper tail of the school-size distribution decreases exponentially and the growth rates are distributed with a Laplace PDF. These distributions are similar to those observed for firms and are consistent with a Bose-Einstein preferential attachment process. The body of the distribution features a bimodal shape suggesting some source of heterogeneity in the school organization that we uncover by an in-depth analysis of the relation between school-size and city-size. We propose a novel cluster methodology and a new spatial interaction approach among schools which outline the variety of policies implemented in Italy. Different regional policies are also discussed, shedding light on the relation between policy and geographical features.
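
    A log-normal fit of this kind can be sketched as follows; the synthetic sizes stand in for the school data and the tail comparison is purely illustrative.

```python
# Fit a log-normal to (synthetic) school sizes and compare the upper tail of
# the data with the fitted tail probability.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
sizes = rng.lognormal(mean=5.0, sigma=0.6, size=2000).round()

shape, loc, scale = stats.lognorm.fit(sizes, floc=0)
print(f"fitted log-mean = {np.log(scale):.2f}, log-sigma = {shape:.2f}")

threshold = np.quantile(sizes, 0.95)
emp_tail = np.mean(sizes > threshold)
fit_tail = stats.lognorm.sf(threshold, shape, loc, scale)
print(f"P(size > {threshold:.0f}): empirical {emp_tail:.3f}, fitted {fit_tail:.3f}")
```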

  8. A new method of detecting changes in corneal health in response to toxic insults.

    PubMed

    Khan, Mohammad Faisal Jamal; Nag, Tapas C; Igathinathane, C; Osuagwu, Uchechukwu L; Rubini, Michele

    2015-11-01

    The size and arrangement of stromal collagen fibrils (CFs) influence the optical properties of the cornea and hence its function. The spatial arrangement of the collagen in relation to the fibril diameter is still an open question. In the present study, we introduce a new parameter, the edge-fibrillar distance (EFD), to measure how two collagen fibrils are spaced with respect to their closest edges, and we characterize their spatial distribution through the normalized standard deviation of EFD (NSDEFD), assessed after the application of two commercially available multipurpose solutions (MPS): ReNu and Hippia. The corneal buttons were soaked separately in ReNu and Hippia MPS for five hours, fixed overnight in 2.5% glutaraldehyde containing cuprolinic blue and processed for transmission electron microscopy. The electron micrographs were processed using an ImageJ user-coded plugin. Statistical analysis was performed to compare the image-processed equivalent diameter (ED), inter-fibrillar distance (IFD), and EFD of the CFs of treated versus normal corneas. The ReNu-soaked cornea showed a partly degenerated epithelium with loose hemidesmosomes and Bowman's collagen. In contrast, the epithelium of the cornea soaked in Hippia was degenerated or lost but showed closely packed Bowman's collagen. Soaking the corneas in both MPS caused a statistically significant decrease in anterior collagen fibril ED and significant changes in IFD and EFD compared with the untreated corneas (p<0.05 for all comparisons). The introduction of the EFD measurement directly provided a sense of the gap between the peripheries of the collagen bundles and of their spatial distribution; in combination with ED, it showed how the corneal collagen bundles are spaced in relation to their diameters. The spatial distribution parameter NSDEFD indicated that ReNu-treated cornea fibrils were the most uniformly distributed spatially, followed by normal and Hippia. The EFD measurement, with its relatively lower standard deviation and NSDEFD, a characteristic of uniform CF distribution, can be an additional parameter for evaluating collagen organization and assessing the effects of various treatments on corneal health and transparency. Copyright © 2015 Elsevier Ltd. All rights reserved.

  9. Empirical evaluation of data normalization methods for molecular classification.

    PubMed

    Huang, Huei-Chung; Qin, Li-Xuan

    2018-01-01

    Data artifacts due to variations in experimental handling are ubiquitous in microarray studies, and they can lead to biased and irreproducible findings. A popular approach to correct for such artifacts is through post hoc data adjustment such as data normalization. Statistical methods for data normalization have been developed and evaluated primarily for the discovery of individual molecular biomarkers. Their performance has rarely been studied for the development of multi-marker molecular classifiers, an increasingly important application of microarrays in the era of personalized medicine. In this study, we set out to evaluate the performance of three commonly used methods for data normalization in the context of molecular classification, using extensive simulations based on re-sampling from a unique pair of microRNA microarray datasets for the same set of samples. The data and code for our simulations are freely available as R packages at GitHub. In the presence of confounding handling effects, all three normalization methods tended to improve the accuracy of the classifier when evaluated in independent test data. The level of improvement and the relative performance among the normalization methods depended on the relative level of molecular signal, the distributional pattern of handling effects (e.g., location shift vs scale change), and the statistical method used for building the classifier. In addition, cross-validation was associated with biased estimation of classification accuracy in the over-optimistic direction for all three normalization methods. Normalization may improve the accuracy of molecular classification for data with confounding handling effects; however, it cannot circumvent the over-optimistic findings associated with cross-validation for assessing classification accuracy.

  10. Statistical properties of the normalized ice particle size distribution

    NASA Astrophysics Data System (ADS)

    Delanoë, Julien; Protat, Alain; Testud, Jacques; Bouniol, Dominique; Heymsfield, A. J.; Bansemer, A.; Brown, P. R. A.; Forbes, R. M.

    2005-05-01

    Testud et al. (2001) have recently developed a formalism, known as the "normalized particle size distribution (PSD)", which consists of scaling the diameter and concentration axes in such a way that the normalized PSDs are independent of water content and mean volume-weighted diameter. In this paper we investigate the statistical properties of the normalized PSD for the particular case of ice clouds, which are known to play a crucial role in the Earth's radiation balance. To do so, an extensive database of airborne in situ microphysical measurements has been constructed. A remarkable stability in shape of the normalized PSD is obtained. The impact of using a single analytical shape to represent all PSDs in the database is estimated through an error analysis on the instrumental (radar reflectivity and attenuation) and cloud (ice water content, effective radius, terminal fall velocity of ice crystals, visible extinction) properties. This resulted in a roughly unbiased estimate of the instrumental and cloud parameters, with small standard deviations ranging from 5 to 12%. This error is found to be roughly independent of the temperature range. This stability in shape and its single analytical approximation implies that two parameters are now sufficient to describe any normalized PSD in ice clouds: the intercept parameter N*0 and the mean volume-weighted diameter Dm. Statistical relationships (parameterizations) between N*0 and Dm have then been evaluated in order to further reduce the number of unknowns. It has been shown that a parameterization of N*0 and Dm by temperature could not be envisaged to retrieve the cloud parameters. Nevertheless, Dm-T and mean maximum dimension diameter-T parameterizations have been derived and compared to the parameterization of Kristjánsson et al. (2000) currently used to characterize particle size in climate models. The new parameterization generally produces larger particle sizes at any temperature than the Kristjánsson et al. (2000) parameterization. These new parameterizations are believed to better represent particle size at global scale, owing to a better representativity of the in situ microphysical database used to derive them. We then evaluated the potential of a direct N*0-Dm relationship. While the model parameterized by temperature produces strong errors on the cloud parameters, the N*0-Dm model parameterized by radar reflectivity produces accurate cloud parameters (less than 3% bias and 16% standard deviation). This result implies that the cloud parameters can be estimated from the estimate of only one parameter of the normalized PSD (N*0 or Dm) and a radar reflectivity measurement.
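
    The scaling idea can be illustrated numerically: rescaling diameter by Dm and concentration by N*0 proportional to IWC/Dm^4 collapses differently scaled spectra onto a common shape. The gamma-shaped spectra and the use of the third moment as a proxy for water content below are assumptions made only for the sketch.

```python
# Collapse of two gamma-type PSDs after normalization: x = D/Dm, F = N/N0*,
# with Dm = M4/M3 and N0* = M3/Dm^4 (M3 used as a proxy for water content).
import numpy as np

def normalize(D, N):
    dD = D[1] - D[0]
    m3 = np.sum(N * D**3) * dD
    m4 = np.sum(N * D**4) * dD
    Dm = m4 / m3
    return D / Dm, N / (m3 / Dm**4)

D = np.linspace(0.05, 20.0, 1000)                 # mm
x_common = np.linspace(0.2, 2.0, 50)
curves = []
for n0, lam in [(1e3, 2.0), (4e2, 1.2)]:          # same shape, different scales
    N = n0 * D**1.5 * np.exp(-lam * D)
    x, F = normalize(D, N)
    curves.append(np.interp(x_common, x, F))

# the rescaled spectra coincide up to numerical error
print(np.max(np.abs(curves[0] - curves[1]) / curves[0]))
```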

  11. Distribution of normal superficial ocular vessels in digital images.

    PubMed

    Banaee, Touka; Ehsaei, Asieh; Pourreza, Hamidreza; Khajedaluee, Mohammad; Abrishami, Mojtaba; Basiri, Mohsen; Daneshvar Kakhki, Ramin; Pourreza, Reza

    2014-02-01

    To investigate the distribution of different-sized vessels in digital images of the ocular surface, an endeavor which may provide useful information for future studies, this study included 295 healthy individuals. From each participant, four digital photographs of the superior and inferior conjunctivae of both eyes, with a fixed succession of photography (right upper, right lower, left upper, left lower), were taken with a slit lamp mounted camera. Photographs were then analyzed by a previously described algorithm for vessel detection in the digital images. The area (of the image) occupied by vessels (AOV) of different sizes was measured. Height, weight, fasting blood sugar (FBS) and hemoglobin levels were also measured and the relationship between these parameters and the AOV was investigated. The findings indicated a statistically significant difference in the distribution of the AOV among the four conjunctival areas. No significant correlations were noted between the AOV of each conjunctival area and the different demographic and biometric factors. Medium-sized vessels were the most abundant vessels in the photographs of the four investigated conjunctival areas. The AOV of the different sizes of vessels follows a normal distribution curve in the four areas of the conjunctiva. The distribution of the vessels in successive photographs changes in a specific manner, with the mean AOV becoming larger as the photos were taken from the right upper to the left lower area. The AOV of vessel sizes has a normal distribution curve and medium-sized vessels occupy the largest area of the photograph. Copyright © 2013 British Contact Lens Association. Published by Elsevier Ltd. All rights reserved.

  12. Distribution Development for STORM Ingestion Input Parameters

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fulton, John

    The Sandia-developed Transport of Radioactive Materials (STORM) code suite is used as part of the Radioisotope Power System Launch Safety (RPSLS) program to perform statistical modeling of the consequences due to release of radioactive material given a launch accident. As part of this modeling, STORM samples input parameters from probability distributions with some parameters treated as constants. This report described the work done to convert four of these constant inputs (Consumption Rate, Average Crop Yield, Cropland to Landuse Database Ratio, and Crop Uptake Factor) to sampled values. Consumption rate changed from a constant value of 557.68 kg/yr to a normal distribution with a mean of 102.96 kg/yr and a standard deviation of 2.65 kg/yr. Meanwhile, Average Crop Yield changed from a constant value of 3.783 kg edible/m^2 to a normal distribution with a mean of 3.23 kg edible/m^2 and a standard deviation of 0.442 kg edible/m^2. The Cropland to Landuse Database ratio changed from a constant value of 0.0996 (9.96%) to a normal distribution with a mean value of 0.0312 (3.12%) and a standard deviation of 0.00292 (0.29%). Finally, the crop uptake factor changed from a constant value of 6.37e-4 (Bq_crop/kg)/(Bq_soil/kg) to a lognormal distribution with a geometric mean value of 3.38e-4 (Bq_crop/kg)/(Bq_soil/kg) and a standard deviation value of 3.33 (Bq_crop/kg)/(Bq_soil/kg).
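
    For orientation, sampling two of the converted inputs might look like the sketch below; interpreting the reported 3.33 as a geometric standard deviation of the lognormal is an assumption made here for illustration.

```python
# Illustrative draws from two of the report's converted input distributions.
import numpy as np

rng = np.random.default_rng(8)
n = 10_000
consumption = rng.normal(102.96, 2.65, n)                   # kg / yr, normal
# crop uptake factor: lognormal with geometric mean 3.38e-4; treating the
# reported 3.33 as a geometric standard deviation is an assumption
uptake = rng.lognormal(mean=np.log(3.38e-4), sigma=np.log(3.33), size=n)

print(f"consumption mean = {consumption.mean():.1f} kg/yr")
print(f"uptake geometric mean = {np.exp(np.log(uptake).mean()):.2e}")
```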

  13. Statistical crossover characterization of the heterotic localized-extended transition.

    PubMed

    Ugajin, Ryuichi

    2003-07-01

    We investigated the spectral statistics of a quantum particle in a superlattice consisting of a disordered layer and a clean layer, possibly accompanied by random magnetic fields. Because a disordered layer has localized states and a clean layer has extended states, our quantum system shows a heterotic phase of an Anderson insulator and a normal metal. As the ratio of the volume of these two layers changes, the spectral statistics change from Poissonian to one of the Gaussian ensembles which characterize quantum chaos. A crossover distribution specified by two parameters is introduced to distinguish the transition from an integrable system to a quantum chaotic system during the heterotic phase from an Anderson transition in which the degree of random potentials is homogeneous.

  14. Tests of Mediation: Paradoxical Decline in Statistical Power as a Function of Mediator Collinearity

    PubMed Central

    Beasley, T. Mark

    2013-01-01

    Increasing the correlation between the independent variable and the mediator (a coefficient) increases the effect size (ab) for mediation analysis; however, increasing a by definition increases collinearity in mediation models. As a result, the standard errors of product tests increase. The variance inflation due to increases in a at some point outweighs the increase of the effect size (ab) and results in a loss of statistical power. This phenomenon also occurs with nonparametric bootstrapping approaches because the variance of the bootstrap distribution of ab approximates the variance expected from normal theory. Both variances increase dramatically when a exceeds the b coefficient, thus explaining the power decline with increases in a. Implications for statistical analysis and applied researchers are discussed. PMID:24954952
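
    The mechanism can be illustrated with a small simulation: as the a path grows, the Sobel-type standard error of ab inflates through collinearity between the independent variable and the mediator, and the test's power can eventually fall. All settings below are arbitrary.

```python
# Monte Carlo power of a Sobel-type test for the mediated effect ab as a grows.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n, b, n_rep = 100, 0.3, 1000

def sobel_power(a):
    hits = 0
    for _ in range(n_rep):
        x = rng.normal(size=n)
        m = a * x + np.sqrt(max(1 - a**2, 0.05)) * rng.normal(size=n)
        y = b * m + rng.normal(size=n)
        fit_a = sm.OLS(m, sm.add_constant(x)).fit()                      # M ~ X
        fit_b = sm.OLS(y, sm.add_constant(np.column_stack([x, m]))).fit()  # Y ~ X + M
        a_hat, se_a = fit_a.params[1], fit_a.bse[1]
        b_hat, se_b = fit_b.params[2], fit_b.bse[2]
        z = a_hat * b_hat / np.sqrt(a_hat**2 * se_b**2 + b_hat**2 * se_a**2)
        hits += abs(z) > 1.96
    return hits / n_rep

for a in (0.3, 0.6, 0.9, 0.99):
    print(f"a = {a:.2f}: Sobel-test power ~ {sobel_power(a):.2f}")
```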

  15. Simulation of the Focal Spot of the Accelerator Bremsstrahlung Radiation

    NASA Astrophysics Data System (ADS)

    Sorokin, V.; Bespalov, V.

    2016-06-01

    Testing of thick-walled objects by bremsstrahlung radiation (BR) is primarily performed via high-energy quanta. The testing parameters are specified by the focal spot size of the high-energy bremsstrahlung radiation. In determining the focal spot size, the high-energy BR portion cannot be experimentally separated from the low-energy BR so as to use high-energy quanta only. The patterns of BR focal spot formation have been investigated via statistical modeling of the radiation transfer in the target material. The distributions of BR quanta emitted by the target for different energies and emission angles under normal distribution of the accelerated electrons bombarding the target have been obtained, and the ratio of the distribution parameters has been determined.

  16. Stochastic Growth Theory of Spatially-Averaged Distributions of Langmuir Fields in Earth's Foreshock

    NASA Technical Reports Server (NTRS)

    Boshuizen, Christopher R.; Cairns, Iver H.; Robinson, P. A.

    2001-01-01

    Langmuir-like waves in the foreshock of Earth are characteristically bursty and irregular, and are the subject of a number of recent studies. Averaged over the foreshock, it is observed that the probability distribution P̄(log E) of the wave field E is a power law, with the bar denoting this averaging over position. In this paper it is shown that stochastic growth theory (SGT) can explain a power-law spatially-averaged distribution P̄(log E) when the observed power-law variations of the mean and standard deviation of log E with position are combined with the log normal statistics predicted by SGT at each location.

  17. Intercomparison of textural parameters of intertidal sediments generated by different statistical procedures, and implications for a unifying descriptive nomenclature

    NASA Astrophysics Data System (ADS)

    Fan, Daidu; Tu, Junbiao; Cai, Guofu; Shang, Shuai

    2015-06-01

    Grain-size analysis is a basic routine in sedimentology and related fields, but diverse methods of sample collection, processing and statistical analysis often make direct comparisons and interpretations difficult or even impossible. In this paper, 586 published grain-size datasets from the Qiantang Estuary (East China Sea) sampled and analyzed by the same procedures were merged and their textural parameters calculated by a percentile and two moment methods. The aim was to explore which of the statistical procedures performed best in the discrimination of three distinct sedimentary units on the tidal flats of the middle Qiantang Estuary. A Gaussian curve-fitting method served to simulate mixtures of two normal populations having different modal sizes, sorting values and size distributions, enabling a better understanding of the impact of finer tail components on textural parameters, as well as the proposal of a unifying descriptive nomenclature. The results show that percentile and moment procedures yield almost identical results for mean grain size, and that sorting values are also highly correlated. However, more complex relationships exist between percentile and moment skewness (kurtosis), changing from positive to negative correlations when the proportions of the finer populations decrease below 35% (10%). This change results from the overweighting of tail components in moment statistics, which stands in sharp contrast to the underweighting or complete amputation of small tail components by the percentile procedure. Intercomparisons of bivariate plots suggest an advantage of the Friedman & Johnson moment procedure over the McManus moment method in terms of the description of grain-size distributions, and over the percentile method by virtue of a greater sensitivity to small variations in tail components. The textural parameter scalings of Folk & Ward were translated into their Friedman & Johnson moment counterparts by application of mathematical functions derived by regression analysis of measured and modeled grain-size data, or by determining the abscissa values of intersections between auxiliary lines running parallel to the x-axis and vertical lines corresponding to the descriptive percentile limits along the ordinate of representative bivariate plots. Twofold limits were extrapolated for the moment statistics in relation to single descriptive terms in the cases of skewness and kurtosis by considering both positive and negative correlations between percentile and moment statistics. The extrapolated descriptive scalings were further validated by examining entire size-frequency distributions simulated by mixing two normal populations of designated modal size and sorting values, but varying in mixing ratios. These were found to match well in most of the proposed scalings, although platykurtic and very platykurtic categories were questionable when the proportion of the finer population was below 5%. Irrespective of the statistical procedure, descriptive nomenclatures should therefore be cautiously used when tail components contribute less than 5% to grain-size distributions.

  18. Order statistics applied to the most massive and most distant galaxy clusters

    NASA Astrophysics Data System (ADS)

    Waizmann, J.-C.; Ettori, S.; Bartelmann, M.

    2013-06-01

    In this work, we present an analytic framework for calculating the individual and joint distributions of the nth most massive or nth highest redshift galaxy cluster for a given survey characteristic, allowing us to formulate Λ cold dark matter (ΛCDM) exclusion criteria. We show that the cumulative distribution functions steepen with increasing order, giving them a higher constraining power with respect to the extreme value statistics. Additionally, we find that the order statistics in mass (being dominated by clusters at lower redshifts) is sensitive to the matter density and the normalization of the matter fluctuations, whereas the order statistics in redshift is particularly sensitive to the geometric evolution of the Universe. For a fixed cosmology, both order statistics are efficient probes of the functional shape of the mass function at the high-mass end. To allow a quick assessment of both order statistics, we provide fits as a function of the survey area that allow percentile estimation with an accuracy better than 2 per cent. Furthermore, we discuss the joint distributions in the two-dimensional case and find that for the combination of the largest and the second largest observation, it is most likely to find them to be realized with similar values with a broadly peaked distribution. When combining the largest observation with higher orders, it is more likely to find a larger gap between the observations and when combining higher orders in general, the joint probability density function peaks more strongly. Having introduced the theory, we apply the order statistical analysis to the South Pole Telescope (SPT) massive cluster sample and the Meta-Catalogue of X-ray detected Clusters of galaxies, and find that the 10 most massive clusters in the sample are consistent with ΛCDM and the Tinker mass function. For the order statistics in redshift, we find a discrepancy between the data and the theoretical distributions, which could in principle indicate a deviation from the standard cosmology. However, we attribute this deviation to the uncertainty in the modelling of the SPT survey selection function. In turn, by assuming the ΛCDM reference cosmology, order statistics can also be utilized for consistency checks of the completeness of the observed sample and of the modelling of the survey selection function.

  19. Statistical properties of the ice particle distribution in stratiform clouds

    NASA Astrophysics Data System (ADS)

    Delanoe, J.; Tinel, C.; Testud, J.

    2003-04-01

    This paper presents an extensive analysis of several microphysical data bases (CEPEX, EUCREX, CLARE and CARL) to determine statistical properties of the Particle Size Distribution (PSD). The data base covers different types of stratiform clouds: tropical cirrus (CEPEX), mid-latitude cirrus (EUCREX), and mid-latitude cirrus and stratus (CARL, CLARE). The approach for analysis uses the concept of normalisation of the PSD developed by Testud et al. (2001). The normalization aims at isolating three independent characteristics of the PSD: its "intrinsic" shape, the "average size" of the spectrum, and the ice water content IWC. By "average size" is meant the mean mass-weighted diameter. It is shown that concentration should be normalized by N_0^* proportional to IWC/D_m^4. The "intrinsic" shape is defined as F(Deq/D_m) = N(Deq)/N_0^*, where Deq is the equivalent melted diameter. The "intrinsic" shape is found to be very stable in the range Deq/D_m < 1.5; for Deq/D_m > 1.5, more scatter is observed, but future analysis should decide whether this is representative of real physical variation or of statistical "error" due to counting problems. Considering the overall statistics over the full data base, a large scatter of the N_0^* against Dm plot is found. But in the case of a particular event or a particular leg of a flight, the N_0^* vs. Dm plot is much less scattered and shows a systematic decay of N_0^* as Dm increases. This trend is interpreted as the manifestation of the predominance of the aggregation process. Finally, an important point for cloud remote sensing is investigated: the normalised relationship of IWC/N_0^* against Z/N_0^* is much less scattered than the classical relationship of IWC against Z, the radar reflectivity factor.

  20. Phase 1 of the near term hybrid passenger vehicle development program, appendix A. Mission analysis and performance specification studies. Volume 2: Appendices

    NASA Technical Reports Server (NTRS)

    Traversi, M.; Barbarek, L. A. C.

    1979-01-01

    A handy reference for JPL minimum requirements and guidelines is presented, as well as information on the use of the fundamental information source represented by the Nationwide Personal Transportation Survey. Data on U.S. demographic statistics and highway speeds are included, along with methodology for normal parameters evaluation, synthesis of daily distance distributions, and projection of car ownership distributions. The synthesis of tentative mission quantification results, of intermediate mission quantification results, and of mission quantification parameters is considered, and 1985 in-place fleet fuel economy data are included.

  1. Statistical Modeling of Retinal Optical Coherence Tomography.

    PubMed

    Amini, Zahra; Rabbani, Hossein

    2016-06-01

    In this paper, a new model for retinal Optical Coherence Tomography (OCT) images is proposed. This statistical model is based on introducing a nonlinear Gaussianization transform to convert the probability distribution function (pdf) of each OCT intra-retinal layer to a Gaussian distribution. The retina is a layered structure, and in OCT each of these layers has a specific pdf that is corrupted by speckle noise; therefore, a mixture model for statistical modeling of OCT images is proposed. A Normal-Laplace distribution, which is a convolution of a Laplace pdf and Gaussian noise, is proposed as the distribution of each component of this model. The reason for choosing the Laplace pdf is the monotonically decaying behavior of OCT intensities in each layer for healthy cases. After fitting a mixture model to the data, each component is gaussianized and all of them are combined by the Averaged Maximum A Posteriori (AMAP) method. To demonstrate the ability of this method, a new contrast enhancement method based on this statistical model is proposed and tested on thirteen healthy 3D OCTs taken with the Topcon 3D OCT and five 3D OCTs from Age-related Macular Degeneration (AMD) patients, taken with a Zeiss Cirrus HD-OCT. Compared with two contending techniques, the superiority of the proposed method is demonstrated both visually and numerically. Furthermore, to prove the efficacy of the proposed method for a more direct and specific purpose, an improvement in the segmentation of intra-retinal layers using the proposed contrast enhancement method as a preprocessing step is demonstrated.

  2. Empirical research on complex networks modeling of combat SoS based on data from real war-game, Part I: Statistical characteristics

    NASA Astrophysics Data System (ADS)

    Chen, Lei; Kou, Yingxin; Li, Zhanwu; Xu, An; Wu, Cheng

    2018-01-01

    We build a complex-network model of a combat System-of-Systems (SoS) based on empirical data from a real war-game. The model is a combination of a command & control (C2) subnetwork, a sensors subnetwork, an influencers subnetwork and a logistical support subnetwork, each subnetwork having its own components and statistical characteristics. The C2 subnetwork is the core of the whole combat SoS; it has a hierarchical structure with no modularity, and its robustness is strong enough to maintain normal operation after any two nodes are destroyed. The sensors and influencers subnetworks are like the sense organs and limbs of the whole combat SoS; they are both flat modular networks whose degree distributions obey a GEV distribution and a power-law distribution, respectively. The communication network is the combination of all subnetworks; it is an assortative small-world network with a core-periphery structure. The Intelligence & Communication Stations/Command Center integrated with C2 nodes in the first three levels act as the hub nodes of the communication network, while all the fourth-level C2 nodes, sensors, influencers and logistical support nodes have communication capability and act as the periphery nodes. Its degree distribution obeys an exponential distribution in the beginning, a Gaussian distribution in the middle, and a power-law distribution in the end, and its path length obeys a GEV distribution. The betweenness centrality distribution, closeness centrality distribution and eigenvector centrality have also been analyzed to measure the vulnerability of nodes.

  3. Descriptive Quantitative Analysis of Rearfoot Alignment Radiographic Parameters.

    PubMed

    Meyr, Andrew J; Wagoner, Matthew R

    2015-01-01

    Although the radiographic parameters of the transverse talocalcaneal angle (tTCA), calcaneocuboid angle (CCA), talar head uncovering (THU), calcaneal inclination angle (CIA), talar declination angle (TDA), lateral talar-first metatarsal angle (lTFA), and lateral talocalcaneal angle (lTCA) form the basis of the preoperative evaluation and procedure selection for pes planovalgus deformity, the so-called normal values of these measurements are not well-established. The objectives of the present study were to retrospectively evaluate the descriptive statistics of these radiographic parameters (tTCA, CCA, THU, CIA, TDA, lTFA, and lTCA) in a large population, and, second, to determine an objective basis for defining "normal" versus "abnormal" measurements. As a secondary outcome, the relationship of these variables to the body mass index was assessed. Anteroposterior and lateral foot radiographs from 250 consecutive patients without a history of previous foot and ankle surgery and/or trauma were evaluated. The results revealed a mean measurement of 24.12°, 13.20°, 74.32%, 16.41°, 26.64°, 8.37°, and 43.41° for the tTCA, CCA, THU, CIA, TDA, lTFA, and lTCA, respectively. These were generally in line with the reported historical normal values. Descriptive statistical analysis demonstrated that the tTCA, THU, and TDA met the standards to be considered normally distributed but that the CCA, CIA, lTFA, and lTCA demonstrated data characteristics of both parametric and nonparametric distributions. Furthermore, only the CIA (R = -0.2428) and lTCA (R = -0.2449) demonstrated substantial correlation with the body mass index. No differentiations in deformity progression were observed when the radiographic parameters were plotted against each other to lead to a quantitative basis for defining "normal" versus "abnormal" measurements. Copyright © 2015 American College of Foot and Ankle Surgeons. Published by Elsevier Inc. All rights reserved.

  4. Advanced probabilistic methods for quantifying the effects of various uncertainties in structural response

    NASA Technical Reports Server (NTRS)

    Nagpal, Vinod K.

    1988-01-01

    The effects of actual variations, also called uncertainties, in geometry and material properties on the structural response of a space shuttle main engine turbopump blade are evaluated. A normal distribution was assumed to represent the uncertainties statistically. Uncertainties were assumed to be totally random, partially correlated, and fully correlated. The magnitudes of these uncertainties were represented in terms of mean and variance. Blade responses, recorded in terms of displacements, natural frequencies, and maximum stress, were evaluated and plotted in the form of probabilistic distributions under combined uncertainties. These distributions provide an estimate of the range of magnitudes of the response and the probability of occurrence of a given response. Most importantly, these distributions provide the information needed to estimate quantitatively the risk in a structural design.
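
    A minimal sketch of this kind of propagation, assuming a toy response function in place of the blade model and partially correlated normal inputs sampled via a Cholesky factor:

```python
# Monte Carlo propagation of partially correlated normal uncertainties through
# a toy response function standing in for the blade model.
import numpy as np

rng = np.random.default_rng(9)
mu = np.array([1.0, 2.0])
sd = np.array([0.05, 0.10])
rho = 0.5                                       # partial correlation (assumed)
cov = np.array([[sd[0]**2, rho * sd[0] * sd[1]],
                [rho * sd[0] * sd[1], sd[1]**2]])

L = np.linalg.cholesky(cov)
z = rng.standard_normal((100_000, 2))
x = mu + z @ L.T                                # correlated normal inputs

def g(x):                                       # toy "blade response"
    return x[:, 0]**2 * x[:, 1]

resp = g(x)
print(f"response mean = {resp.mean():.3f}, std = {resp.std():.3f}")
print(f"99th percentile = {np.percentile(resp, 99):.3f}")
```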

  5. A generalized Grubbs-Beck test statistic for detecting multiple potentially influential low outliers in flood series

    USGS Publications Warehouse

    Cohn, T.A.; England, J.F.; Berenbrock, C.E.; Mason, R.R.; Stedinger, J.R.; Lamontagne, J.R.

    2013-01-01

    The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as “less-than” values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.
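
    For orientation, a single-low-outlier Grubbs-type screen on log flows can be sketched as below; the critical value uses the generic one-sided Grubbs formula rather than the federal-guidelines values, the multiple-outlier generalization of the paper is not reproduced, and the flow values are hypothetical.

```python
# One-sided Grubbs-type screen for a single low outlier in log-transformed flows.
import numpy as np
from scipy import stats

def grubbs_low(flows, alpha=0.10):
    x = np.log(np.asarray(flows, dtype=float))
    n = x.size
    g = (x.mean() - x.min()) / x.std(ddof=1)       # low-side Grubbs statistic
    t = stats.t.ppf(1 - alpha / n, n - 2)          # one-sided critical t value
    g_crit = (n - 1) / np.sqrt(n) * np.sqrt(t**2 / (n - 2 + t**2))
    return g, g_crit, g > g_crit

annual_peaks = [520, 610, 480, 705, 590, 650, 15, 540, 630, 575, 495, 680]
g, g_crit, is_outlier = grubbs_low(annual_peaks)
print(f"G = {g:.2f}, critical value = {g_crit:.2f}, low outlier flagged: {is_outlier}")
```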

  6. Shielding Effectiveness in a Two-Dimensional Reverberation Chamber Using Finite-Element Techniques

    NASA Technical Reports Server (NTRS)

    Bunting, Charles F.

    2006-01-01

    Reverberation chambers are attaining increased importance in the determination of electromagnetic susceptibility of avionics equipment. Given the nature of the variable boundary condition, the ability of a given source to couple energy into certain modes, and the passband characteristic due to the chamber Q, the fields are typically characterized by statistical means. The emphasis of this work is to apply finite-element techniques at cutoff to the analysis of a two-dimensional structure to examine the notion of shielding-effectiveness issues in a reverberating environment. Simulated mechanical stirring will be used to obtain the appropriate statistical field distribution. The shielding effectiveness (SE) in a simulated reverberating environment is compared to measurements in a reverberation chamber. A log-normal distribution for the SE is observed with implications for system designers. The work is intended to provide further refinement in the consideration of SE in a complex electromagnetic environment.

  7. Experimental programme on absolute fission fragment yields with the lohengrin spectrometer: New optical and statistical methodologies

    NASA Astrophysics Data System (ADS)

    Abdelaziz, Chebboubi; Grégoire, Kessedjian; Olivier, Serot; Sylvain, Julien-Laferriere; Christophe, Sage; Florence, Martin; Olivier, Méplan; David, Bernard; Olivier, Litaize; Aurélien, Blanc; Herbert, Faust; Paolo, Mutti; Ulli, Köster; Alain, Letourneau; Thomas, Materna; Michal, Rapala

    2017-09-01

    The study of fission yields has a major impact on the characterization and understanding of the fission process and is mandatory for reactor applications. In the past, with the LOHENGRIN spectrometer of the ILL, priority was given to studies in the light fission fragment mass range. The LPSC in collaboration with ILL and CEA has developed a measurement program on symmetric and heavy mass fission fragment distributions. The combination of measurements with an ionisation chamber and Ge detectors is necessary to describe precisely the heavy fission fragment region in mass and charge. Recently, new measurements of fission yields and kinetic energy distributions have been made on the 233U(nth,f) reaction. The focus of this work has been on the new optical and statistical methodology and the self-normalization of the data to provide new absolute measurements, independently of any libraries, and the associated experimental covariance matrix.

  8. A generalized Grubbs-Beck test statistic for detecting multiple potentially influential low outliers in flood series

    NASA Astrophysics Data System (ADS)

    Cohn, T. A.; England, J. F.; Berenbrock, C. E.; Mason, R. R.; Stedinger, J. R.; Lamontagne, J. R.

    2013-08-01

    The Grubbs-Beck test is recommended by the federal guidelines for detection of low outliers in flood flow frequency computation in the United States. This paper presents a generalization of the Grubbs-Beck test for normal data (similar to the Rosner (1983) test; see also Spencer and McCuen (1996)) that can provide a consistent standard for identifying multiple potentially influential low flows. In cases where low outliers have been identified, they can be represented as "less-than" values, and a frequency distribution can be developed using censored-data statistical techniques, such as the Expected Moments Algorithm. This approach can improve the fit of the right-hand tail of a frequency distribution and provide protection from lack-of-fit due to unimportant but potentially influential low flows (PILFs) in a flood series, thus making the flood frequency analysis procedure more robust.

  9. GPS FOM Chimney Analysis using Generalized Extreme Value Distribution

    NASA Technical Reports Server (NTRS)

    Ott, Rick; Frisbee, Joe; Saha, Kanan

    2004-01-01

    Often an objective of a statistical analysis is to estimate a limit value, such as a 3-sigma 95% confidence upper limit, from a data sample. The Generalized Extreme Value distribution method can be profitably employed in many situations for such an estimate. It is well known that according to the Central Limit theorem the mean value of a large data set is normally distributed irrespective of the distribution of the data from which the mean value is derived. In a somewhat similar fashion, it is observed that many times the extreme value of a data set has a distribution that can be formulated with a Generalized Extreme Value distribution. In space shuttle entry with 3-string GPS navigation, the Figure Of Merit (FOM) value gives a measure of GPS navigated state accuracy. A GPS navigated state with FOM of 6 or higher is deemed unacceptable and is said to form a FOM 6 or higher chimney. A FOM chimney is a period of time during which the FOM value stays higher than 5. A longer period of FOM of value 6 or higher causes the navigated state to accumulate more error for lack of a state update. For an acceptable landing it is imperative that the state error remain low; hence, at low altitude during entry, GPS data with FOM greater than 5 must not last more than 138 seconds. To test the GPS performance, many entry test cases were simulated at the Avionics Development Laboratory. Only high-value FOM chimneys are consequential. The extreme value statistical technique is applied to analyze high-value FOM chimneys. The maximum likelihood method is used to determine the parameters that characterize the GEV distribution, and then the limit value statistics are estimated.
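
    A hedged sketch of the estimation step: fit a GEV to per-run maximum chimney durations (synthetic here) by maximum likelihood and read off an upper percentile.

```python
# Maximum-likelihood GEV fit to synthetic per-run maximum chimney durations.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# synthetic per-simulation maxima of FOM-chimney duration, in seconds
max_durations = rng.gumbel(loc=60.0, scale=15.0, size=400)

shape, loc, scale = stats.genextreme.fit(max_durations)
q99 = stats.genextreme.ppf(0.99, shape, loc=loc, scale=scale)
print(f"fitted GEV shape = {shape:.3f}, loc = {loc:.1f}, scale = {scale:.1f}")
print(f"estimated 99th-percentile chimney duration: {q99:.0f} s "
      f"(limit of interest: 138 s)")
```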

  10. Evaluation of probabilistic forecasts with the scoringRules package

    NASA Astrophysics Data System (ADS)

    Jordan, Alexander; Krüger, Fabian; Lerch, Sebastian

    2017-04-01

    Over the last decades probabilistic forecasts in the form of predictive distributions have become popular in many scientific disciplines. With the proliferation of probabilistic models arises the need for decision-theoretically principled tools to evaluate the appropriateness of models and forecasts in a generalized way in order to better understand sources of prediction errors and to improve the models. Proper scoring rules are functions S(F,y) which evaluate the accuracy of a forecast distribution F , given that an outcome y was observed. In coherence with decision-theoretical principles they allow to compare alternative models, a crucial ability given the variety of theories, data sources and statistical specifications that is available in many situations. This contribution presents the software package scoringRules for the statistical programming language R, which provides functions to compute popular scoring rules such as the continuous ranked probability score for a variety of distributions F that come up in applied work. For univariate variables, two main classes are parametric distributions like normal, t, or gamma distributions, and distributions that are not known analytically, but are indirectly described through a sample of simulation draws. For example, ensemble weather forecasts take this form. The scoringRules package aims to be a convenient dictionary-like reference for computing scoring rules. We offer state of the art implementations of several known (but not routinely applied) formulas, and implement closed-form expressions that were previously unavailable. Whenever more than one implementation variant exists, we offer statistically principled default choices. Recent developments include the addition of scoring rules to evaluate multivariate forecast distributions. The use of the scoringRules package is illustrated in an example on post-processing ensemble forecasts of temperature.
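
    As an example of the kind of quantity the package computes, the closed-form CRPS of a normal predictive distribution can be written down directly; the snippet below is a Python re-implementation of that analytic formula, not the package's own code.

```python
# Closed-form continuous ranked probability score for a normal forecast N(mu, sigma)
# against an observation y; lower CRPS means a better forecast.
import numpy as np
from scipy import stats

def crps_normal(y, mu, sigma):
    z = (y - mu) / sigma
    return sigma * (z * (2 * stats.norm.cdf(z) - 1)
                    + 2 * stats.norm.pdf(z) - 1 / np.sqrt(np.pi))

y_obs = 2.1
print(crps_normal(y_obs, mu=2.0, sigma=0.5))   # sharp and nearly unbiased
print(crps_normal(y_obs, mu=2.0, sigma=2.0))   # calibrated but too wide
```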

  11. Log-amplitude statistics for Beck-Cohen superstatistics

    NASA Astrophysics Data System (ADS)

    Kiyono, Ken; Konno, Hidetoshi

    2013-05-01

    As a possible generalization of Beck-Cohen superstatistical processes, we study non-Gaussian processes with temporal heterogeneity of local variance. To characterize the variance heterogeneity, we define log-amplitude cumulants and log-amplitude autocovariance and derive closed-form expressions of the log-amplitude cumulants for χ2, inverse χ2, and log-normal superstatistical distributions. Furthermore, we show that χ2 and inverse χ2 superstatistics with degree 2 are closely related to an extreme value distribution, called the Gumbel distribution. In these cases, the corresponding superstatistical distributions result in the q-Gaussian distribution with q=5/3 and the bilateral exponential distribution, respectively. Thus, our finding provides a hypothesis that the asymptotic appearance of these two special distributions may be explained by a link with the asymptotic limit distributions involving extreme values. In addition, as an application of our approach, we demonstrated that non-Gaussian fluctuations observed in a stock index futures market can be well approximated by the χ2 superstatistical distribution with degree 2.

  12. Results of module electrical measurement of the DOE 46-kilowatt procurement

    NASA Technical Reports Server (NTRS)

    Curtis, H. B.

    1978-01-01

    Current-voltage measurements have been made on terrestrial solar cell modules of the DOE/JPL Low Cost Silicon Solar Array procurement. Data on short circuit current, open circuit voltage, and maximum power for the four types of modules are presented in normalized form, showing distribution of the measured values. Standard deviations from the mean values are also given. Tests of the statistical significance of the data are discussed.

  13. Certified Reduced Basis Model Characterization: a Frequentistic Uncertainty Framework

    DTIC Science & Technology

    2011-01-11

    It then follows that the Legendre coefficient random vector, (Z[0], Z[1], ..., Z[I])(ω), is (I+1)-variate normally distributed with mean (δ...I. Note each two-sided inequality represents two constraints. PDE-Based Statistical Inference: We now proceed to the parametrized partial...appearance of defects or geometric variations relative to an initial baseline, or perhaps manufacturing departures from nominal specifications; if our

  14. Asymptotic Normality of Poly-T Densities with Bayesian Applications.

    DTIC Science & Technology

    1987-10-01

    be extended to the case of many t-like factors in a straightforward manner. Obviously, the computational complexity will increase rapidly as the number...York: Marcel-Dekker. Broemeling, L.D. and Abdullah, M.Y. (1984). An approximation to the poly-t distribution. Communications in Statistics A, 11, 1407...

  15. Accessible Information Without Disturbing Partially Known Quantum States on a von Neumann Algebra

    NASA Astrophysics Data System (ADS)

    Kuramochi, Yui

    2018-04-01

    This paper addresses the problem of how much information we can extract without disturbing a statistical experiment, which is a family of partially known normal states on a von Neumann algebra. We define the classical part of a statistical experiment as the restriction of the equivalent minimal sufficient statistical experiment to the center of the outcome space, which, in the case of density operators on a Hilbert space, corresponds to the classical probability distributions appearing in the maximal decomposition by Koashi and Imoto (Phys. Rev. A 66, 022318, 2002). We show that we can access by a Schwarz or completely positive channel at most the classical part of a statistical experiment if we do not disturb the states. We apply this result to the broadcasting problem of a statistical experiment. We also show that the classical part of the direct product of statistical experiments is the direct product of the classical parts of the statistical experiments. The proof of the latter result is based on the theorem that the direct product of minimal sufficient statistical experiments is also minimal sufficient.

  16. A Dosimetric Comparison of Proton and Intensity-Modulated Photon Radiotherapy for Pediatric Parameningeal Rhabdomyosarcomas

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kozak, Kevin R.; Adams, Judith; Krejcarek, Stephanie J.

    Purpose: We compared tumor and normal tissue dosimetry of proton radiation therapy with intensity-modulated radiation therapy (IMRT) for pediatric parameningeal rhabdomyosarcomas (PRMS). Methods and Materials: To quantify dosimetric differences between contemporary proton and photon treatment for pediatric PRMS, proton beam plans were compared with IMRT plans. Ten patients treated with proton radiation therapy at Massachusetts General Hospital had IMRT plans generated. To facilitate dosimetric comparisons, clinical target volumes and normal tissue volumes were held constant. Plans were optimized for target volume coverage and normal tissue sparing. Results: Proton and IMRT plans provided acceptable and comparable target volume coverage, with at least 99% of the CTV receiving 95% of the prescribed dose in all cases. Improved dose conformality provided by proton therapy resulted in significant sparing of all examined normal tissues except for ipsilateral cochlea and mastoid; ipsilateral parotid gland sparing was of borderline statistical significance (p = 0.05). More profound sparing of contralateral structures by protons resulted in greater dose asymmetry between ipsilateral and contralateral retina, optic nerves, cochlea, and mastoids; dose asymmetry between ipsilateral and contralateral parotids was of borderline statistical significance (p = 0.05). Conclusions: For pediatric PRMS, superior normal tissue sparing is achieved with proton radiation therapy compared with IMRT. Because of enhanced conformality, proton plans also demonstrate greater normal tissue dose distribution asymmetry. Longitudinal studies assessing the impact of proton radiotherapy and IMRT on normal tissue function and growth symmetry are necessary to define the clinical consequences of these differences.

  17. The intermediates take it all: asymptotics of higher criticism statistics and a powerful alternative based on equal local levels.

    PubMed

    Gontscharuk, Veronika; Landwehr, Sandra; Finner, Helmut

    2015-01-01

    The higher criticism (HC) statistic, which can be seen as a normalized version of the famous Kolmogorov-Smirnov statistic, has a long history, dating back to the mid-seventies. Originally, HC statistics were used in connection with goodness-of-fit (GOF) tests but they recently gained some attention in the context of testing the global null hypothesis in high-dimensional data. The continuing interest in HC seems to be inspired by a series of nice asymptotic properties related to this statistic. For example, unlike Kolmogorov-Smirnov tests, GOF tests based on the HC statistic are known to be asymptotically sensitive in the moderate tails, hence it is favorably applied for detecting the presence of signals in sparse mixture models. However, some questions around the asymptotic behavior of the HC statistic are still open. We focus on two of them, namely, why a specific intermediate range is crucial for GOF tests based on the HC statistic and why the convergence of the HC distribution to the limiting one is extremely slow. Moreover, the inconsistency between the asymptotic and finite-sample behavior of the HC statistic prompts us to provide a new HC test that has better finite-sample properties than the original HC test while showing the same asymptotics. This test is motivated by the asymptotic behavior of the so-called local levels related to the original HC test. By means of numerical calculations and simulations we show that the new HC test is typically more powerful than the original HC test in normal mixture models. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
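
    For concreteness, the sketch below computes the higher criticism statistic in its widely cited form from a set of p-values, under a global null and under a sparse normal mixture; restricting the maximum to the lower half of the order statistics is one common convention, and both this choice and the simulation parameters are illustrative assumptions rather than the authors' setup.

```python
# Minimal sketch of the higher criticism statistic applied to p-values.
# Range restriction (i <= n/2) and simulation parameters are illustrative.
import numpy as np
from scipy.stats import norm

def higher_criticism(pvalues):
    p = np.sort(np.asarray(pvalues, dtype=float))
    n = p.size
    i = np.arange(1, n + 1)
    hc_terms = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1.0 - p))
    return np.max(hc_terms[: n // 2])   # intermediate range only

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    n = 10_000
    z_null = rng.normal(size=n)               # global null: standard normal
    z_alt = z_null.copy()
    z_alt[: n // 200] += 3.0                  # sparse mixture: 0.5% shifted signals
    for label, z in [("null", z_null), ("sparse mixture", z_alt)]:
        pv = 2 * norm.sf(np.abs(z))
        print(label, higher_criticism(pv))
```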

  18. A statistical assessment of zero-polarization catalogues

    NASA Astrophysics Data System (ADS)

    Clarke, D.; Naghizadeh-Khouei, J.; Simmons, J. F. L.; Stewart, B. G.

    1993-03-01

    The statistical behavior associated with polarization measurements is presented. The cumulative distribution function for measurements of unpolarized sources normalized by the measurement error is considered, and Kolmogorov tests have been applied to data which might be considered as being representative of assemblies of unpolarized stars. Tinbergen's (1979, 1982) and Piirola's I (1977) catalogs have been examined and reveal shortcomings, the former indicating the presence of uncorrected instrumental polarization in part of the data and both suggesting that the quoted errors are in general slightly underestimated. Citations of these catalogs as providing evidence that middle-type stars in general exhibit weak intrinsic polarizations are shown to be invalid.
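
    A minimal sketch of this kind of test is shown below: for a truly unpolarized source with Gaussian errors on Stokes q and u, the normalized degree of polarization p/σ follows a Rayleigh distribution, so a Kolmogorov-Smirnov test of a catalogue against that distribution probes for uncorrected instrumental polarization or misestimated errors. The simulated "catalogue" and its parameters are illustrative.

```python
# Sketch: KS test of normalized polarization of unpolarized sources against the
# Rayleigh distribution. Simulated data are illustrative stand-ins for a catalogue.
import numpy as np
from scipy.stats import kstest

rng = np.random.default_rng(3)
n_stars = 300
sigma = 0.02  # assumed uncertainty of q and u (fractional polarization units)

q = rng.normal(0.0, sigma, n_stars)
u = rng.normal(0.0, sigma, n_stars)
p_over_sigma = np.sqrt(q**2 + u**2) / sigma

# scipy's 'rayleigh' with default loc=0, scale=1 matches p/sigma exactly
stat, pvalue = kstest(p_over_sigma, "rayleigh")
print(f"KS statistic = {stat:.3f}, p-value = {pvalue:.3f}")
# A small p-value would point to residual instrumental polarization or
# over/under-estimated quoted errors, as discussed for the examined catalogs.
```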

  19. Universal Inverse Power-Law Distribution for Fractal Fluctuations in Dynamical Systems: Applications for Predictability of Inter-Annual Variability of Indian and USA Region Rainfall

    NASA Astrophysics Data System (ADS)

    Selvam, A. M.

    2017-01-01

    Dynamical systems in nature exhibit self-similar fractal space-time fluctuations on all scales indicating long-range correlations and, therefore, the statistical normal distribution with its implicit assumption of independence, fixed mean and standard deviation cannot be used for description and quantification of fractal data sets. The author has developed a general systems theory based on classical statistical physics for fractal fluctuations which predicts the following. (1) The fractal fluctuations signify an underlying eddy continuum, the larger eddies being the integrated mean of enclosed smaller-scale fluctuations. (2) The probability distribution of eddy amplitudes and the variance (square of eddy amplitude) spectrum of fractal fluctuations follow the universal Boltzmann inverse power law expressed as a function of the golden mean. (3) Fractal fluctuations are signatures of quantum-like chaos since the additive amplitudes of eddies when squared represent probability densities analogous to the sub-atomic dynamics of quantum systems such as the photon or electron. (4) The model-predicted distribution is very close to the statistical normal distribution for moderate events within two standard deviations from the mean but exhibits a fat long tail that is associated with hazardous extreme events. Continuous periodogram power spectral analyses of available GHCN annual total rainfall time series for the period 1900-2008 for Indian and USA stations show that the power spectra and the corresponding probability distributions follow the model-predicted universal inverse power-law form, signifying an eddy continuum structure underlying the observed inter-annual variability of rainfall. On a global scale, man-made greenhouse-gas-related atmospheric warming would result in intensification of natural climate variability, seen first in high-frequency fluctuations such as the QBO and ENSO and on even shorter timescales. Model concepts and results of analyses are discussed with reference to possible prediction of climate change. Model concepts, if correct, unambiguously rule out linear trends in climate. Climate change will only be manifested as an increase or decrease in the natural variability. However, more stringent tests of model concepts and predictions are required before applications to such an important issue as climate change. Observations and simulations with climate models show that precipitation extremes intensify in response to a warming climate (O'Gorman in Curr Clim Change Rep 1:49-59, 2015).

  20. Statistical variability and confidence intervals for planar dose QA pass rates

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bailey, Daniel W.; Nelms, Benjamin E.; Attwood, Kristopher

    Purpose: The most common metric for comparing measured to calculated dose, such as for pretreatment quality assurance of intensity-modulated photon fields, is a pass rate (%) generated using percent difference (%Diff), distance-to-agreement (DTA), or some combination of the two (e.g., gamma evaluation). For many dosimeters, the grid of analyzed points corresponds to an array with a low areal density of point detectors. In these cases, the pass rates for any given comparison criteria are not absolute but exhibit statistical variability that is a function, in part, of the detector sampling geometry. In this work, the authors analyze the statistics of various methods commonly used to calculate pass rates and propose methods for establishing confidence intervals for pass rates obtained with low-density arrays. Methods: Dose planes were acquired for 25 prostate and 79 head and neck intensity-modulated fields via diode array and electronic portal imaging device (EPID), and matching calculated dose planes were created via a commercial treatment planning system. Pass rates for each dose plane pair (both centered to the beam central axis) were calculated with several common comparison methods: %Diff/DTA composite analysis and gamma evaluation, using absolute dose comparison with both local and global normalization. Specialized software was designed to selectively sample the measured EPID response (very high data density) down to discrete points to simulate low-density measurements. The software was used to realign the simulated detector grid at many simulated positions with respect to the beam central axis, thereby altering the low-density sampled grid. Simulations were repeated with 100 positional iterations using a 1 detector/cm² uniform grid, a 2 detector/cm² uniform grid, and similar random detector grids. For each simulation, %/DTA composite pass rates were calculated with various %Diff/DTA criteria and for both local and global %Diff normalization techniques. Results: For the prostate and head/neck cases studied, the pass rates obtained with gamma analysis of high-density dose planes were 2%-5% higher than respective %/DTA composite analysis on average (ranging as high as 11%), depending on tolerances and normalization. Meanwhile, the pass rates obtained via local normalization were 2%-12% lower than with global maximum normalization on average (ranging as high as 27%), depending on tolerances and calculation method. Repositioning of simulated low-density sampled grids leads to a distribution of possible pass rates for each measured/calculated dose plane pair. These distributions can be predicted using a binomial distribution in order to establish confidence intervals that depend largely on the sampling density and the observed pass rate (i.e., the degree of difference between measured and calculated dose). These results can be extended to apply to 3D arrays of detectors, as well. Conclusions: Dose plane QA analysis can be greatly affected by choice of calculation metric and user-defined parameters, and so all pass rates should be reported with a complete description of calculation method. Pass rates for low-density arrays are subject to statistical uncertainty (vs. the high-density pass rate), but these sampling errors can be modeled using statistical confidence intervals derived from the sampled pass rate and detector density. Thus, pass rates for low-density array measurements should be accompanied by a confidence interval indicating the uncertainty of each pass rate.
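
    As a simple illustration of the confidence-interval idea, the sketch below computes an exact (Clopper-Pearson) binomial interval for a pass rate observed on a low-density array; the detector count and pass rate are illustrative, and the paper's exact interval construction may differ.

```python
# Sketch: binomial confidence interval for an observed pass rate with a
# low-density detector array. Numbers are illustrative, not study data.
from scipy.stats import beta

def clopper_pearson(k, n, conf=0.95):
    """Exact two-sided confidence interval for a binomial proportion."""
    alpha = 1.0 - conf
    lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lower, upper

n_detectors = 400          # e.g., roughly a 20 x 20 cm field at 1 detector/cm^2
observed_pass_rate = 0.95
k_pass = round(observed_pass_rate * n_detectors)
lo, hi = clopper_pearson(k_pass, n_detectors)
print(f"pass rate {observed_pass_rate:.1%} with {n_detectors} detectors: "
      f"95% CI [{lo:.1%}, {hi:.1%}]")
```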

  1. Bridging stylized facts in finance and data non-stationarities

    NASA Astrophysics Data System (ADS)

    Camargo, Sabrina; Duarte Queirós, Sílvio M.; Anteneodo, Celia

    2013-04-01

    Employing a recent technique which allows the representation of nonstationary data by means of a juxtaposition of locally stationary paths of different length, we introduce a comprehensive analysis of the key observables in a financial market: the trading volume and the price fluctuations. From the segmentation procedure we are able to introduce a quantitative description of statistical features of these two quantities, which are often named stylized facts, namely the tails of the distribution of trading volume and price fluctuations and a dynamics compatible with the U-shaped profile of the volume in a trading session and the slow decay of the autocorrelation function. The segmentation of the trading volume series provides evidence of slow evolution of the fluctuating parameters of each patch, pointing to the mixing scenario. Assuming that long-term features are the outcome of a statistical mixture of simple local forms, we test and compare different probability density functions to provide the long-term distribution of the trading volume, concluding that the log-normal gives the best agreement with the empirical distribution. Moreover, the segmentation results for the magnitude of the price fluctuations are quite different from those for the trading volume, indicating that changes in the statistics of price fluctuations occur on a faster scale than in the case of trading volume.

  2. Statistical dynamics of regional populations and economies

    NASA Astrophysics Data System (ADS)

    Huo, Jie; Wang, Xu-Ming; Hao, Rui; Wang, Peng

    Quantitative analysis of human behavior and social development is becoming a hot spot of interdisciplinary studies. A statistical analysis of the population and GDP of 150 cities in China from 1990 to 2013 is conducted. The results indicate that the cumulative probability distributions of the populations and of the GDPs each obey a shifted power law. In order to understand these characteristics, a generalized Langevin equation describing the variation of population is proposed, which is based on the correlations between population and GDP as well as the random fluctuations of the related factors. The equation is transformed into the Fokker-Planck equation to express the evolution of the population distribution. The general solution demonstrates a transition of the distribution from the normal Gaussian distribution to a shifted power law, which suggests a critical point of time at which the transition takes place. The shifted power law distribution in the supercritical situation is qualitatively in accordance with the empirical result. The distribution of the GDPs is derived from the well-known Cobb-Douglas production function. The result presents a change, in the supercritical situation, from a shifted power law to the Gaussian distribution. This is a surprising result: the regional GDP distribution of our world will one day follow a Gaussian distribution. Discussion based on the changing trend of economic growth suggests this will be the case. Therefore, these theoretical attempts may draw a historical picture of our society in the aspects of population and economy.

  3. Estimating the age of Hb G-Coushatta [β22(B4)Glu→Ala] mutation by haplotypes of β-globin gene cluster in Denizli, Turkey.

    PubMed

    Ozturk, Onur; Arikan, Sanem; Atalay, Ayfer; Atalay, Erol O

    2018-05-01

    The Hb G-Coushatta variant has been reported in various populations around the world, including Thailand, Korea, Algeria, China, Japan and Turkey. In our study, we aimed to discuss the possible historical relationships of the Hb G-Coushatta mutation with possible global migration routes. For this purpose, associated haplotypes were determined using polymorphic loci in the beta-globin gene cluster of Hb G-Coushatta and normal populations in Denizli, Turkey. We performed statistical analyses such as haplotype analysis, Hardy-Weinberg equilibrium, measurement of genetic diversity and population differentiation parameters, analysis of molecular variance using F-statistics, historical-demographic analyses and mismatch distribution analysis of both populations, and applied the test statistics in the Arlequin ver. 3.5 software program. The diversity of haplotypes was shown to indicate different genetic origins for the two populations. However, AMOVA results, molecular diversity parameters and population demographic expansion times showed that the Hb G-Coushatta mutation arose within the normal population gene pool. Our estimated τ values showed that the average times since demographic expansion for the normal and Hb G-Coushatta populations were approximately 42,000 and 38,000 ybp, respectively. Our data suggest that the Hb G-Coushatta population originated from the normal population in Denizli, Turkey. These results support the hypothesis of a multiple origin of Hb G-Coushatta and indicate that the mutation may have triggered the formation of new variants on beta-globin haplotypes. © 2018 The Authors. Molecular Genetics & Genomic Medicine published by Wiley Periodicals, Inc.

  4. [IMPORTANCE OF SHEAR WAVE ELASTOGRAPHY OF LIVERS IN PRACTICALLY HEALTHY PREGNANT WOMEN].

    PubMed

    Sariyeva, E; Salahova, S; Bayramov, N

    2017-01-01

    Shear wave elastography (SWE), one of the most widely used methods in recent years, holds an important place in the assessment of liver fibrosis. However, there is no exact information in the world literature on the results of liver elastography in healthy pregnant women. The aim of the study was to investigate the SWE parameters of the liver in practically healthy pregnant women. The subjects of the research were 50 practically healthy pregnant women aged 18-45 years (mean age 27.7±0.7). Pregnant women with genital and extragenital diseases were not included in the research. The research work was carried out in the II Department of Obstetrics and Gynecology of Azerbaijan Medical University. SWE of the liver in pregnant women was conducted in the I Department of Surgical Diseases of Azerbaijan Medical University using a Supersonic Aixplorer Multi Wave device presented by the Scientific Development Foundation under the President of the Azerbaijan Republic. The obtained tissue stiffness indicators were assessed on the METAVIR scale. The results of the research showed that the dimensions of the liver in practically healthy pregnant women were normal, its edges smooth, its echogenicity mainly normal, the echostructure of its parenchyma homogeneous, and its stiffness F0-F1 (normal) on the METAVIR scale, with no fibrosis observed. The obtained results were processed by variational (mean, percentile distribution) and correlation (ρ-Spearman) analyses using the statistical package SPSS-20. A statistical study of the distribution of liver stiffness in healthy women showed that the average value was 4.43±0.01 with a 95% confidence interval of (4.23-4.63). The histogram of the distribution of liver stiffness in practically healthy women belongs to the family of normal distributions, with a coefficient of variation of 16.3%, skewness of -0.861±0.337 and kurtosis of -0.068±0.662. Correlation analysis in healthy women did not reveal a reliable relationship between age and liver stiffness (ρ=0.082, p=0.571), but a significant inverse correlation was found between body mass index (BMI) and liver stiffness (ρ=-0.317; p=0.025). The ease of application, non-invasiveness, real-time accuracy, repeatability of the procedure and absence of risk to the fetus allow shear wave elastography of the liver to be applied in pregnant women. Study of liver elasticity in pregnant women allows the grade of hepatic fibrosis to be assessed and liver disease to be differentiated.

  5. The Statistical Drake Equation

    NASA Astrophysics Data System (ADS)

    Maccone, Claudio

    2010-12-01

    We provide the statistical generalization of the Drake equation. From a simple product of seven positive numbers, the Drake equation is now turned into the product of seven positive random variables. We call this "the Statistical Drake Equation". The mathematical consequences of this transformation are then derived. The proof of our results is based on the Central Limit Theorem (CLT) of Statistics. In loose terms, the CLT states that the sum of any number of independent random variables, each of which may be ARBITRARILY distributed, approaches a Gaussian (i.e. normal) random variable. This is called the Lyapunov Form of the CLT, or the Lindeberg Form of the CLT, depending on the mathematical constraints assumed on the third moments of the various probability distributions. In conclusion, we show that: The new random variable N, yielding the number of communicating civilizations in the Galaxy, follows the LOGNORMAL distribution. Then, as a consequence, the mean value of this lognormal distribution is the ordinary N in the Drake equation. The standard deviation, mode, and all the moments of this lognormal N are also found. The seven factors in the ordinary Drake equation now become seven positive random variables. The probability distribution of each random variable may be ARBITRARY. The CLT in the so-called Lyapunov or Lindeberg forms (neither of which assumes the factors to be identically distributed) allows for that. In other words, the CLT "translates" into our statistical Drake equation by allowing an arbitrary probability distribution for each factor. This is both physically realistic and practically very useful, of course. An application of our statistical Drake equation then follows. The (average) DISTANCE between any two neighboring and communicating civilizations in the Galaxy may be shown to be inversely proportional to the cube root of N. Then, in our approach, this distance becomes a new random variable. We derive the relevant probability density function, apparently previously unknown and dubbed "Maccone distribution" by Paul Davies. DATA ENRICHMENT PRINCIPLE. It should be noticed that ANY positive number of random variables in the Statistical Drake Equation is compatible with the CLT. So, our generalization allows for many more factors to be added in the future as more refined scientific knowledge about each factor becomes available. This capability to make room for more future factors in the statistical Drake equation, we call the "Data Enrichment Principle," and we regard it as the key to more profound future results in the fields of Astrobiology and SETI. Finally, a practical example is given of how our statistical Drake equation works numerically. We work out in detail the case where each of the seven random variables is uniformly distributed around its own mean value and has a given standard deviation. For instance, the number of stars in the Galaxy is assumed to be uniformly distributed around (say) 350 billion with a standard deviation of (say) 1 billion. Then, the resulting lognormal distribution of N is computed numerically by means of a MathCad file that the author has written. This shows that the mean value of the lognormal random variable N is actually of the same order as the classical N given by the ordinary Drake equation, as one might expect from a good statistical generalization.
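
    The following Monte Carlo sketch illustrates the central point numerically: the product of seven independent, uniformly distributed positive factors has a strongly skewed distribution, while its logarithm is close to symmetric, consistent with an approximately lognormal N. The factor ranges below are illustrative choices (only the star-count range mirrors the abstract's example), not the author's values or MathCad computation.

```python
# Sketch: product of seven uniformly distributed factors approaches a lognormal.
# Factor ranges are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(4)
n_draws = 50_000

half = np.sqrt(3.0) * 1e9   # uniform half-width giving a standard deviation of 1e9
factor_bounds = [(3.5e11 - half, 3.5e11 + half),   # stars in the Galaxy (abstract's example)
                 (0.2, 0.6), (0.1, 0.5), (0.05, 0.4),
                 (0.05, 0.3), (0.05, 0.3), (1e3, 1e5)]

factors = np.column_stack([rng.uniform(lo, hi, n_draws) for lo, hi in factor_bounds])
N = factors.prod(axis=1)

def skew(a):
    a = (a - a.mean()) / a.std()
    return float(np.mean(a**3))

print(f"mean of N: {N.mean():.3e}")
print(f"skewness of N:     {skew(N):7.2f}   (strongly right-skewed)")
print(f"skewness of log N: {skew(np.log(N)):7.2f}   (roughly symmetric, i.e. N is near-lognormal)")
```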

  6. Normal and Extreme Wind Conditions for Power at Coastal Locations in China

    PubMed Central

    Gao, Meng; Ning, Jicai; Wu, Xiaoqing

    2015-01-01

    In this paper, the normal and extreme wind conditions for power at 12 coastal locations along China’s coastline were investigated. For this purpose, the daily meteorological data measured at the standard 10-m height above ground for periods of 40–62 years are statistically analyzed. The East Asian Monsoon that affects almost China’s entire coastal region is considered as the leading factor determining wind energy resources. For most stations, the mean wind speed is higher in winter and lower in summer. Meanwhile, the wind direction analysis indicates that the prevalent winds in summer are southerly, while those in winter are northerly. The air densities at different coastal locations differ significantly, resulting in the difference in wind power density. The Weibull and lognormal distributions are applied to fit the yearly wind speeds. The lognormal distribution performs better than the Weibull distribution at 8 coastal stations according to two judgement criteria, the Kolmogorov–Smirnov test and absolute error (AE). Regarding the annual maximum extreme wind speed, the generalized extreme value (GEV) distribution performs better than the commonly-used Gumbel distribution. At these southeastern coastal locations, strong winds usually occur in typhoon season. These 4 coastal provinces, that is, Guangdong, Fujian, Hainan, and Zhejiang, which have abundant wind resources, are also prone to typhoon disasters. PMID:26313256

  7. Methane Leaks from Natural Gas Systems Follow Extreme Distributions.

    PubMed

    Brandt, Adam R; Heath, Garvin A; Cooley, Daniel

    2016-11-15

    Future energy systems may rely on natural gas as a low-cost fuel to support variable renewable power. However, leaking natural gas causes climate damage because methane (CH 4 ) has a high global warming potential. In this study, we use extreme-value theory to explore the distribution of natural gas leak sizes. By analyzing ∼15 000 measurements from 18 prior studies, we show that all available natural gas leakage data sets are statistically heavy-tailed, and that gas leaks are more extremely distributed than other natural and social phenomena. A unifying result is that the largest 5% of leaks typically contribute over 50% of the total leakage volume. While prior studies used log-normal model distributions, we show that log-normal functions poorly represent tail behavior. Our results suggest that published uncertainty ranges of CH 4 emissions are too narrow, and that larger sample sizes are required in future studies to achieve targeted confidence intervals. Additionally, we find that cross-study aggregation of data sets to increase sample size is not recommended due to apparent deviation between sampled populations. Understanding the nature of leak distributions can improve emission estimates, better illustrate their uncertainty, allow prioritization of source categories, and improve sampling design. Also, these data can be used for more effective design of leak detection technologies.
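
    The "largest 5% of leaks" statement can be made concrete with the short sketch below, which compares the top-5% share of total leakage for a log-normal and a heavier-tailed (Pareto) model; the parameters are illustrative and are not fitted to the study's measurements.

```python
# Sketch: share of total leakage contributed by the largest 5% of leaks,
# log-normal vs. Pareto model. Parameters are illustrative.
import numpy as np

rng = np.random.default_rng(5)

def top_share(sizes, frac=0.05):
    sizes = np.sort(sizes)[::-1]
    k = max(1, int(frac * sizes.size))
    return sizes[:k].sum() / sizes.sum()

n = 15_000
lognormal_leaks = rng.lognormal(mean=0.0, sigma=1.5, size=n)
pareto_leaks = rng.pareto(a=1.5, size=n) + 1.0   # Pareto with tail index 1.5

print(f"log-normal: top 5% contribute {top_share(lognormal_leaks):.0%} of volume")
print(f"Pareto:     top 5% contribute {top_share(pareto_leaks):.0%} of volume")
```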

  8. Normal and Extreme Wind Conditions for Power at Coastal Locations in China.

    PubMed

    Gao, Meng; Ning, Jicai; Wu, Xiaoqing

    2015-01-01

    In this paper, the normal and extreme wind conditions for power at 12 coastal locations along China's coastline were investigated. For this purpose, the daily meteorological data measured at the standard 10-m height above ground for periods of 40-62 years are statistically analyzed. The East Asian Monsoon that affects almost China's entire coastal region is considered as the leading factor determining wind energy resources. For most stations, the mean wind speed is higher in winter and lower in summer. Meanwhile, the wind direction analysis indicates that the prevalent winds in summer are southerly, while those in winter are northerly. The air densities at different coastal locations differ significantly, resulting in the difference in wind power density. The Weibull and lognormal distributions are applied to fit the yearly wind speeds. The lognormal distribution performs better than the Weibull distribution at 8 coastal stations according to two judgement criteria, the Kolmogorov-Smirnov test and absolute error (AE). Regarding the annual maximum extreme wind speed, the generalized extreme value (GEV) distribution performs better than the commonly-used Gumbel distribution. At these southeastern coastal locations, strong winds usually occur in typhoon season. These 4 coastal provinces, that is, Guangdong, Fujian, Hainan, and Zhejiang, which have abundant wind resources, are also prone to typhoon disasters.
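
    A minimal sketch of the model comparison described above is given below: Weibull and log-normal distributions are fitted to a sample of wind speeds and compared via the Kolmogorov-Smirnov statistic and a simple absolute-error measure (here the mean absolute difference between fitted and empirical CDFs, which may differ from the paper's AE definition). The synthetic wind speeds are illustrative stand-ins for station data.

```python
# Sketch: fit Weibull and log-normal models to wind speeds and compare fits.
# Synthetic data and the AE definition are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
wind = rng.weibull(2.0, size=2000) * 6.0 + 0.1   # synthetic daily mean speeds (m/s)

fits = {
    "weibull_min": stats.weibull_min.fit(wind, floc=0),
    "lognorm": stats.lognorm.fit(wind, floc=0),
}

x = np.sort(wind)
ecdf = np.arange(1, x.size + 1) / x.size
for name, params in fits.items():
    dist = getattr(stats, name)
    ks_stat, _ = stats.kstest(wind, name, args=params)
    ae = np.mean(np.abs(dist.cdf(x, *params) - ecdf))
    print(f"{name}: KS = {ks_stat:.4f}, AE = {ae:.4f}")
```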

  9. Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences

    PubMed Central

    2014-01-01

    Background Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Results Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. It allows on-the-fly calculation of the normal distribution for any candidate sequence composition. Conclusion The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this particular property alone will not be able to distinguish miRNAs from other sequences sufficiently discriminative, the MFE-based P-value should be added to the parameters of choice to be included in the selection of potential miRNA candidates for experimental verification. PMID:24418292
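
    The sketch below shows the final step of such an MFE-based screen: given the mean and standard deviation of MFEs for randomized sequences of the same composition (passed in directly here; in the paper they would come from interpolation over large pre-computed tables), the P-value is the lower tail of a normal distribution. The numbers are illustrative.

```python
# Sketch: MFE-based P-value under a normal model of randomized-sequence MFEs.
# Input statistics are illustrative placeholders.
from scipy.stats import norm

def mfe_p_value(mfe_candidate, mfe_random_mean, mfe_random_sd):
    """Probability that a randomized sequence of the same composition has an
    MFE at least as low as the candidate's, under the normal approximation."""
    z = (mfe_candidate - mfe_random_mean) / mfe_random_sd
    return norm.cdf(z)

# e.g., a hairpin whose MFE lies far below the randomized-sequence background
print(mfe_p_value(mfe_candidate=-38.0, mfe_random_mean=-22.5, mfe_random_sd=4.1))
```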

  10. Discrete geometric analysis of message passing algorithm on graphs

    NASA Astrophysics Data System (ADS)

    Watanabe, Yusuke

    2010-04-01

    We often encounter probability distributions given as unnormalized products of non-negative functions. The factorization structures are represented by hypergraphs called factor graphs. Such distributions appear in various fields, including statistics, artificial intelligence, statistical physics, error-correcting codes, etc. Given such a distribution, computations of marginal distributions and the normalization constant are often required. However, these computations are often intractable because of their cost. One successful approximation method is the Loopy Belief Propagation (LBP) algorithm. The focus of this thesis is an analysis of the LBP algorithm. If the factor graph is a tree, i.e., has no cycles, the algorithm gives the exact quantities. If the factor graph has cycles, however, the LBP algorithm does not give exact results and possibly exhibits oscillatory and non-convergent behaviors. The thematic question of this thesis is "How are the behaviors of the LBP algorithm affected by the discrete geometry of the factor graph?" The primary contribution of this thesis is the discovery of a formula that establishes the relation between the LBP, the Bethe free energy and the graph zeta function. This formula provides new techniques for analysis of the LBP algorithm, connecting properties of the graph with those of the LBP and the Bethe free energy. We demonstrate applications of the techniques to several problems, including (non)convexity of the Bethe free energy and the uniqueness and stability of the LBP fixed point. We also discuss the loop series initiated by Chertkov and Chernyak. The loop series is a subgraph expansion of the normalization constant, or partition function, and reflects the graph geometry. We investigate the theoretical nature of the series. Moreover, we show a partial connection between the loop series and the graph zeta function.

  11. LORETA imaging of P300 in schizophrenia with individual MRI and 128-channel EEG.

    PubMed

    Pae, Ji Soo; Kwon, Jun Soo; Youn, Tak; Park, Hae-Jeong; Kim, Myung Sun; Lee, Boreom; Park, Kwang Suk

    2003-11-01

    We investigated the characteristics of P300 generators in schizophrenics by using voxel-based statistical parametric mapping of current density images. P300 generators, elicited by a rare target tone of 1500 Hz (15%) presented among frequent nontarget tones of 1000 Hz (85%), were measured in 20 right-handed schizophrenics and 21 controls. Low-resolution electromagnetic tomography (LORETA), using a realistic head model of the boundary element method based on individual MRI, was applied to the 128-channel EEG. Three-dimensional current density images were reconstructed from the LORETA intensity maps that covered the whole cortical gray matter. Spatial normalization and intensity normalization of the smoothed current density images were used to reduce anatomical variance and subject-specific global activity, and statistical parametric mapping (SPM) was applied for the statistical analysis. We found that the sources of P300 were consistently localized at the left superior parietal area in normal subjects, while those of schizophrenics were diversely distributed. Upon statistical comparison, schizophrenics, with globally reduced current densities, showed a significant P300 current density reduction in the left medial temporal area and in the left inferior parietal area, while both left prefrontal and right orbitofrontal areas were relatively activated. The left parietotemporal area was found to correlate negatively with Positive and Negative Syndrome Scale total scores of schizophrenic patients. In conclusion, the areas of reduced and increased current density in schizophrenic patients suggest that the medial temporal and frontal areas contribute to the pathophysiology of schizophrenia, consistent with a frontotemporal circuitry abnormality.

  12. Empirical evaluation of data normalization methods for molecular classification

    PubMed Central

    Huang, Huei-Chung

    2018-01-01

    Background Data artifacts due to variations in experimental handling are ubiquitous in microarray studies, and they can lead to biased and irreproducible findings. A popular approach to correct for such artifacts is through post hoc data adjustment such as data normalization. Statistical methods for data normalization have been developed and evaluated primarily for the discovery of individual molecular biomarkers. Their performance has rarely been studied for the development of multi-marker molecular classifiers—an increasingly important application of microarrays in the era of personalized medicine. Methods In this study, we set out to evaluate the performance of three commonly used methods for data normalization in the context of molecular classification, using extensive simulations based on re-sampling from a unique pair of microRNA microarray datasets for the same set of samples. The data and code for our simulations are freely available as R packages at GitHub. Results In the presence of confounding handling effects, all three normalization methods tended to improve the accuracy of the classifier when evaluated in an independent test data. The level of improvement and the relative performance among the normalization methods depended on the relative level of molecular signal, the distributional pattern of handling effects (e.g., location shift vs scale change), and the statistical method used for building the classifier. In addition, cross-validation was associated with biased estimation of classification accuracy in the over-optimistic direction for all three normalization methods. Conclusion Normalization may improve the accuracy of molecular classification for data with confounding handling effects; however, it cannot circumvent the over-optimistic findings associated with cross-validation for assessing classification accuracy. PMID:29666754

  13. A model for the flux-r.m.s. correlation in blazar variability or the minijets-in-a-jet statistical model

    NASA Astrophysics Data System (ADS)

    Biteau, J.; Giebels, B.

    2012-12-01

    Very high energy gamma-ray variability of blazar emission remains of puzzling origin. Fast flux variations down to the minute time scale, as observed with H.E.S.S. during flares of the blazar PKS 2155-304, suggest that variability originates from the jet, where Doppler boosting can be invoked to relax causal constraints on the size of the emission region. The observation of log-normality in the flux distributions should rule out additive processes, such as those resulting from uncorrelated multiple-zone emission models, and favour an origin of the variability from multiplicative processes not unlike those observed in a broad class of accreting systems. We show, using a simple kinematic model, that Doppler boosting of randomly oriented emitting regions generates flux distributions following a Pareto law, that the linear flux-r.m.s. relation found for a single zone holds for a large number of emitting regions, and that the skewed distribution of the total flux is close to a log-normal, despite arising from an additive process.
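
    A minimal kinematic sketch in this spirit is given below: emitting regions with isotropically random orientations and a common bulk Lorentz factor have Doppler factors δ = 1/(Γ(1 − β cos θ)), and observed fluxes boosted as δ^p. The single-region flux distribution is heavy-tailed, while the sum over many regions is strongly skewed. The choices Γ = 10 and p = 4, and the region counts, are illustrative assumptions, not the paper's setup.

```python
# Sketch: Doppler boosting of randomly oriented emitting regions.
# Gamma, boosting index p and region counts are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(7)

gamma = 10.0
beta = np.sqrt(1.0 - 1.0 / gamma**2)
p = 4.0                       # boosting index for a discrete moving region
n_regions = 200               # emitting regions per realization
n_realizations = 20_000

cos_theta = rng.uniform(-1.0, 1.0, size=(n_realizations, n_regions))  # isotropic
delta = 1.0 / (gamma * (1.0 - beta * cos_theta))
flux = delta**p

single = flux[:, 0]
total = flux.sum(axis=1)

print("single region: mean/median flux ratio =", round(float(single.mean() / np.median(single)), 1))
log_total = np.log(total)
skew = np.mean(((log_total - log_total.mean()) / log_total.std())**3)
print("total flux: skewness of log(total) =", round(float(skew), 3))
```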

  14. Surrogacy assessment using principal stratification when surrogate and outcome measures are multivariate normal.

    PubMed

    Conlon, Anna S C; Taylor, Jeremy M G; Elliott, Michael R

    2014-04-01

    In clinical trials, a surrogate outcome variable (S) can be measured before the outcome of interest (T) and may provide early information regarding the treatment (Z) effect on T. Using the principal surrogacy framework introduced by Frangakis and Rubin (2002. Principal stratification in causal inference. Biometrics 58, 21-29), we consider an approach that has a causal interpretation and develop a Bayesian estimation strategy for surrogate validation when the joint distribution of potential surrogate and outcome measures is multivariate normal. From the joint conditional distribution of the potential outcomes of T, given the potential outcomes of S, we propose surrogacy validation measures from this model. As the model is not fully identifiable from the data, we propose some reasonable prior distributions and assumptions that can be placed on weakly identified parameters to aid in estimation. We explore the relationship between our surrogacy measures and the surrogacy measures proposed by Prentice (1989. Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine 8, 431-440). The method is applied to data from a macular degeneration study and an ovarian cancer study.

  15. Surrogacy assessment using principal stratification when surrogate and outcome measures are multivariate normal

    PubMed Central

    Conlon, Anna S. C.; Taylor, Jeremy M. G.; Elliott, Michael R.

    2014-01-01

    In clinical trials, a surrogate outcome variable (S) can be measured before the outcome of interest (T) and may provide early information regarding the treatment (Z) effect on T. Using the principal surrogacy framework introduced by Frangakis and Rubin (2002. Principal stratification in causal inference. Biometrics 58, 21–29), we consider an approach that has a causal interpretation and develop a Bayesian estimation strategy for surrogate validation when the joint distribution of potential surrogate and outcome measures is multivariate normal. From the joint conditional distribution of the potential outcomes of T, given the potential outcomes of S, we propose surrogacy validation measures from this model. As the model is not fully identifiable from the data, we propose some reasonable prior distributions and assumptions that can be placed on weakly identified parameters to aid in estimation. We explore the relationship between our surrogacy measures and the surrogacy measures proposed by Prentice (1989. Surrogate endpoints in clinical trials: definition and operational criteria. Statistics in Medicine 8, 431–440). The method is applied to data from a macular degeneration study and an ovarian cancer study. PMID:24285772

  16. Effectiveness of the Flipped Classroom Model in Anatomy and Physiology Laboratory Courses at a Hispanic Serving Institution

    NASA Astrophysics Data System (ADS)

    Sanchez, Gerardo

    A flipped laboratory model involves significant preparation by the students on lab material prior to entry to the laboratory. This allows laboratory time to be focused on active learning through experiments. The aim of this study was to observe changes in student performance through the transition from a traditional laboratory format to a flipped format. The data showed that for both Anatomy and Physiology (I and II) laboratories a more normal distribution of grades was observed once labs were flipped, and lecture grade averages increased. Chi-square and analysis of variance tests showed the grade changes to be statistically significant, with a p-value of less than 0.05 in both analyses. Regression analyses showed decreasing trends after the flipped labs were introduced, with an r² value of 0.485 for A&P I and 0.564 for A&P II. Results indicate improved scores in the lecture part of the A&P course, fewer outlying scores above 100, and score distributions that approached a more normal distribution.

  17. Population Synthesis of Radio and γ-ray Normal, Isolated Pulsars Using Markov Chain Monte Carlo

    NASA Astrophysics Data System (ADS)

    Billman, Caleb; Gonthier, P. L.; Harding, A. K.

    2013-04-01

    We present preliminary results of a population statistics study of normal pulsars (NP) from the Galactic disk using Markov Chain Monte Carlo techniques optimized according to two different methods. The first method compares the detected and simulated cumulative distributions of a series of pulsar characteristics, varying the model parameters to maximize the overall agreement. The advantage of this method is that the distributions do not have to be binned. The other method varies the model parameters to maximize the log of the maximum likelihood obtained from the comparisons of four two-dimensional distributions of radio and γ-ray pulsar characteristics. The advantage of this method is that it provides a confidence region of the model parameter space. The computer code simulates neutron stars at birth using Monte Carlo procedures and evolves them to the present assuming initial spatial, kick velocity, magnetic field, and period distributions. Pulsars are spun down to the present and given radio and γ-ray emission characteristics, implementing an empirical γ-ray luminosity model. A comparison group of radio NPs detected in ten radio surveys is used to normalize the simulation, adjusting the model radio luminosity to match a birth rate. We include the Fermi pulsars in the forthcoming second pulsar catalog. We present preliminary results comparing the simulated and detected distributions of radio and γ-ray NPs along with a confidence region in the parameter space of the assumed models. We express our gratitude for the generous support of the National Science Foundation (REU and RUI), Fermi Guest Investigator Program and the NASA Astrophysics Theory and Fundamental Program.

  18. Location tests for biomarker studies: a comparison using simulations for the two-sample case.

    PubMed

    Scheinhardt, M O; Ziegler, A

    2013-01-01

    Gene, protein, or metabolite expression levels are often non-normally distributed, heavy-tailed and contain outliers. Standard statistical approaches may fail as location tests in this situation. In three Monte-Carlo simulation studies, we aimed at comparing the type I error levels and empirical power of standard location tests and three adaptive tests [O'Gorman, Can J Stat 1997; 25: 269-279; Keselman et al., Brit J Math Stat Psychol 2007; 60: 267-293; Szymczak et al., Stat Med 2013; 32: 524-537] for a wide range of distributions. We simulated two-sample scenarios using the g-and-k-distribution family to systematically vary tail length and skewness with identical and varying variability between groups. All tests kept the type I error level when groups did not vary in their variability. The standard non-parametric U-test performed well in all simulated scenarios. It was outperformed by the two non-parametric adaptive methods in the case of heavy tails or large skewness. Most tests did not keep the type I error level for skewed data in the case of heterogeneous variances. The standard U-test was a powerful and robust location test for most of the simulated scenarios except for very heavy-tailed or heavily skewed data, and it is thus to be recommended except for these cases. The non-parametric adaptive tests were powerful for both normal and non-normal distributions under sample variance homogeneity. When sample variances differed, however, they did not keep the type I error level. The parametric adaptive test lacks power for skewed and heavy-tailed distributions.
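
    A minimal sketch of this kind of two-sample simulation is shown below, comparing the empirical type I error of Student's t-test and the Mann-Whitney U-test when both groups are drawn from the same skewed, heavy-tailed distribution; a log-normal is used here as a convenient stand-in for the g-and-k family, and the sample sizes are illustrative.

```python
# Sketch: empirical type I error of t-test vs. Mann-Whitney U under skewed data.
# The log-normal null distribution and sample sizes are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n_sim, n_per_group, alpha = 2000, 25, 0.05

rej_t = rej_u = 0
for _ in range(n_sim):
    x = rng.lognormal(0.0, 1.0, n_per_group)
    y = rng.lognormal(0.0, 1.0, n_per_group)   # same distribution: the null is true
    if stats.ttest_ind(x, y).pvalue < alpha:
        rej_t += 1
    if stats.mannwhitneyu(x, y, alternative="two-sided").pvalue < alpha:
        rej_u += 1

print(f"empirical type I error, t-test:         {rej_t / n_sim:.3f}")
print(f"empirical type I error, Mann-Whitney U: {rej_u / n_sim:.3f}")
```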

  19. Statistical Features of Complex Systems ---Toward Establishing Sociological Physics---

    NASA Astrophysics Data System (ADS)

    Kobayashi, Naoki; Kuninaka, Hiroto; Wakita, Jun-ichi; Matsushita, Mitsugu

    2011-07-01

    Complex systems have recently attracted much attention, both in natural sciences and in sociological sciences. Members constituting a complex system evolve through nonlinear interactions among each other. This means that in a complex system the multiplicative experience or, so to speak, the history of each member produces its present characteristics. If attention is paid to any statistical property in any complex system, the lognormal distribution is the most natural and appropriate among the standard or ``normal'' statistics to overview the whole system. In fact, the lognormality emerges rather conspicuously when we examine, as familiar and typical examples of statistical aspects in complex systems, the nursing-care period for the aged, populations of prefectures and municipalities, and our body height and weight. Many other examples are found in nature and society. On the basis of these observations, we discuss the possibility of sociological physics.

  20. Universal statistics of selected values

    NASA Astrophysics Data System (ADS)

    Smerlak, Matteo; Youssef, Ahmed

    2017-03-01

    Selection, the tendency of some traits to become more frequent than others under the influence of some (natural or artificial) agency, is a key component of Darwinian evolution and countless other natural and social phenomena. Yet a general theory of selection, analogous to the Fisher-Tippett-Gnedenko theory of extreme events, is lacking. Here we introduce a probabilistic definition of selection and show that selected values are attracted to a universal family of limiting distributions which generalize the log-normal distribution. The universality classes and scaling exponents are determined by the tail thickness of the random variable under selection. Our results provide a possible explanation for skewed distributions observed in diverse contexts where selection plays a key role, from molecular biology to agriculture and sport.

  1. Design, development and manufacture of a breadboard radio frequency mass gauging system

    NASA Technical Reports Server (NTRS)

    1975-01-01

    The feasibility of the RF gauging mode-counting technique was demonstrated for gauging liquid hydrogen and liquid oxygen under all attitude conditions. With LH2, it was also demonstrated under dynamic fluid conditions, in which the fluid assumes ever-changing positions within the tank, that the RF gauging technique on average provides a very good indication of mass. It is significant that the distribution of the mode count data at each fill level during dynamic LH2 and LOX orientation testing does approach a statistical normal distribution. Multiple space-diversity probes provide better coupling to the resonant modes than utilization of a single probe element. The variable sweep rate generator technique provides a more uniform mode versus time distribution for processing.

  2. Statistical study of spatio-temporal distribution of precursor solar flares associated with major flares

    NASA Astrophysics Data System (ADS)

    Gyenge, N.; Ballai, I.; Baranyi, T.

    2016-07-01

    The aim of the present investigation is to study the spatio-temporal distribution of precursor flares during the 24 h interval preceding M- and X-class major flares and the evolution of follower flares. Information on associated (precursor and follower) flares is provided by the Reuven Ramaty High Energy Solar Spectroscopic Imager (RHESSI) flare list, while the major flares are observed by the Geostationary Operational Environmental Satellite (GOES) system satellites between 2002 and 2014. There are distinct evolutionary differences between the spatio-temporal distributions of associated flares over the roughly one-day period, depending on the type of the main flare. The spatial distribution was characterized by the normalized frequency distribution of the quantity δ (the distance between the major flare and its precursor flare normalized by the sunspot group diameter) in four 6 h time intervals before the major event. The precursors of X-class flares have a double-peaked spatial distribution for more than half a day prior to the major flare, but it changes to a lognormal-like distribution roughly 6 h prior to the event. The precursors of M-class flares show a lognormal-like distribution in each 6 h subinterval. The most frequent sites of the precursors in the active region are within a distance of about 0.1 sunspot-group diameters from the site of the major flare in each case. Our investigation shows that the build-up of energy is more effective than the release of energy because of precursors.

  3. Distribution analysis of airborne nicotine concentrations in hospitality facilities.

    PubMed

    Schorp, Matthias K; Leyden, Donald E

    2002-02-01

    A number of publications report statistical summaries for environmental tobacco smoke (ETS) concentrations. Despite compelling evidence for the data not being normally distributed, these publications typically report the arithmetic mean and standard deviation of the data, thereby losing important information related to the distribution of values contained in the original data. We were interested in the frequency distributions of reported nicotine concentrations in hospitality environments and subjected available data to distribution analyses. The distribution of experimental indoor airborne nicotine concentration data taken from hospitality facilities worldwide was fit to lognormal, Weibull, exponential, Pearson (Type V), logistic, and loglogistic distribution models. Comparison of goodness of fit (GOF) parameters and indications from the literature verified the selection of a lognormal distribution as the overall best model. When individual data were not reported in the literature, statistical summaries of results were used to model sets of lognormally distributed data that are intended to mimic the original data distribution. Grouping the data into various categories led to 31 frequency distributions that were further interpreted. The median values in nonsmoking environments are about half of the median values in smoking sections. When different continents are compared, Asian, European, and North American median values in restaurants are about a factor of three below levels encountered in other hospitality facilities. On a comparison of nicotine concentrations in North American smoking sections and nonsmoking sections, median values are about one-third of the European levels. The results obtained may be used to address issues related to exposure to ETS in the hospitality sector.

  4. Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment.

    PubMed

    Gierliński, Marek; Cole, Christian; Schofield, Pietà; Schurch, Nicholas J; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J

    2015-11-15

    High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of 'bad' replicates, which can drastically affect the gene read-count distribution. RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. g.j.barton@dundee.ac.uk. © The Author 2015. Published by Oxford University Press.

  5. Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment

    PubMed Central

    Cole, Christian; Schofield, Pietà; Schurch, Nicholas J.; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J.

    2015-01-01

    Motivation: High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. Results: A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of ‘bad’ replicates, which can drastically affect the gene read-count distribution. Availability and implementation: RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. Contact: g.j.barton@dundee.ac.uk PMID:26206307

  6. Low-energy helium-neon laser irradiation and the tensile strength of incisional wounds in the rat.

    PubMed

    Broadley, C; Broadley, K N; Disimone, G; Riensch, L; Davidson, J M

    1995-01-01

    Low-level laser energy has been reported to modulate the wound healing process in some but not all studies. To examine this hypothesis, we investigated incisional wounds made on the dorsal pelt of rats for changes in the healing produced by low-level irradiation with a helium-neon laser. The incisions were made with a scalpel and closed with sutures. The rats were irradiated daily for 12 days with four levels of laser light (0.0, 0.47, 0.93, and 1.73 J/cm²). Analysis of wound tensile strength indicated a possible strengthening of fresh wounds at the highest level of irradiation (1.73 J/cm²). No change was observed in the tensile strength of formalin-fixed wounds. The distributions of measured tensile strengths did not follow normal statistics; instead, they were platykurtic. Using resampling statistics, where no assumption is made as to the nature of the distribution, we found that the results were contrary to other studies: no biostimulatory effect was seen.
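
    The sketch below illustrates one simple distribution-free resampling approach of the kind described: a permutation test for a difference in mean tensile strength between an irradiated and a control group, with no normality assumption. The data values are synthetic placeholders and the permutation variant is an illustrative choice, not necessarily the study's exact procedure.

```python
# Sketch: distribution-free permutation test for a difference in means.
# Data values are synthetic placeholders.
import numpy as np

rng = np.random.default_rng(9)

control = rng.normal(100.0, 20.0, 15)       # placeholder tensile strengths
irradiated = rng.normal(110.0, 20.0, 15)

observed = irradiated.mean() - control.mean()
pooled = np.concatenate([control, irradiated])

n_perm = 10_000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    diff = perm[len(control):].mean() - perm[:len(control)].mean()
    if abs(diff) >= abs(observed):
        count += 1

print(f"observed difference: {observed:.1f}, permutation p-value: {count / n_perm:.3f}")
```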

  7. Two Populations of Sunspots: Differential Rotation

    NASA Astrophysics Data System (ADS)

    Nagovitsyn, Yu. A.; Pevtsov, A. A.; Osipova, A. A.

    2018-03-01

    To investigate the differential rotation of sunspot groups using the Greenwich data, we propose an approach based on a statistical analysis of the histograms of particular longitudinal velocities in different latitude intervals. The general statistical velocity distributions for all such intervals are shown to be described by two normal distributions rather than one, so that two fundamental rotation modes exist simultaneously: fast and slow. The degree of differential rotation is the same for both modes: the coefficient of the sin² term in Faye's law is 2.87-2.88 deg/day, while the equatorial rotation rates differ significantly, by 0.27 deg/day. On the other hand, an analysis of the longitudinal velocities for the two previously revealed, differing populations of sunspot groups has shown that small short-lived groups (SSGs) are associated with the fast rotation mode, while large long-lived groups (LLGs) are associated with both fast and slow modes. The results obtained not only suggest a real physical difference between the two populations of sunspots but also give new empirical data for the development of a dynamo theory, in particular, for the theory of a spatially distributed dynamo.
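
    The decomposition of a velocity histogram into two normal components can be sketched with a standard Gaussian mixture fit, as below. The synthetic velocities, their means and widths are illustrative only (chosen so that the two component means differ by 0.27 deg/day, echoing the abstract), and this is not the authors' histogram-based procedure.

```python
# Sketch: two-component Gaussian mixture fit to a velocity sample.
# Synthetic data; component parameters are illustrative assumptions.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(10)

# synthetic longitudinal velocities (deg/day) from two overlapping modes
fast = rng.normal(14.55, 0.10, 1500)
slow = rng.normal(14.28, 0.10, 1000)
v = np.concatenate([fast, slow]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(v)
means = gmm.means_.ravel()
order = np.argsort(means)
print("component means (deg/day):", np.round(means[order], 3))
print("component weights:        ", np.round(gmm.weights_[order], 3))
```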

  8. Pooling of cross-cultural PRO data in multinational clinical trials: how much can poor measurement affect statistical power?

    PubMed

    Regnault, Antoine; Hamel, Jean-François; Patrick, Donald L

    2015-02-01

    Cultural differences and/or poor linguistic validation of patient-reported outcome (PRO) instruments may result in differences in the assessment of the targeted concept across languages. In the context of multinational clinical trials, these measurement differences may add noise and potentially measurement bias to treatment effect estimation. Our objective was to explore the potential effect on treatment effect estimation of the "contamination" of a cultural subgroup by a flawed PRO measurement. We ran a simulation exercise in which the distribution of the score in the overall sample was considered a mixture of two normal distributions: a standard normal distribution was assumed in a "main" subgroup and a normal distribution which differed either in mean (bias) or in variance (noise) in a "contaminated" subgroup (the subgroup with potential flaws in the PRO measurement). The observed power was compared to the expected power (i.e., the power that would have been observed if the subgroup had not been contaminated). Although differences between the expected and observed power were generally small, some substantial differences were obtained (up to a 0.375-point drop in power). No situation was systematically protected against loss of power. The impact of poor PRO measurement in a cultural subgroup may induce a notable drop in the study power and consequently reduce the chance of showing an actual treatment effect. These results illustrate the importance of the efforts to optimize conceptual and linguistic equivalence of PRO measures when pooling data in international clinical trials.
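
    A minimal simulation in the spirit of the mixture set-up described above is sketched below: scores in a "main" subgroup are standard normal, a "contaminated" subgroup has inflated variance (the noise scenario), and the observed power of a two-sample t-test is compared with the power expected without contamination. All sample sizes, effect sizes and proportions are illustrative, not those used in the paper.

        # Sketch: effect of a "contaminated" cultural subgroup on the power of a
        # two-arm comparison.  Scores are a mixture of a standard normal "main"
        # subgroup and a contaminated subgroup with extra noise (larger variance).
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(7)
        n_per_arm, prop_contaminated = 150, 0.3
        effect = 0.3                 # true treatment effect in SD units
        noise_sd = 2.0               # SD in the contaminated subgroup (noise scenario)
        n_sim = 2000

        def simulate_arm(mean):
            n_cont = int(round(n_per_arm * prop_contaminated))
            main = rng.normal(mean, 1.0, n_per_arm - n_cont)
            cont = rng.normal(mean, noise_sd, n_cont)
            return np.concatenate([main, cont])

        rejections = 0
        for _ in range(n_sim):
            treated, control = simulate_arm(effect), simulate_arm(0.0)
            _, p = stats.ttest_ind(treated, control)
            rejections += p < 0.05
        print("observed power with contamination:", rejections / n_sim)

        # Expected power without contamination (all scores standard normal):
        rejections = 0
        for _ in range(n_sim):
            t = rng.normal(effect, 1.0, n_per_arm)
            c = rng.normal(0.0, 1.0, n_per_arm)
            rejections += stats.ttest_ind(t, c)[1] < 0.05
        print("expected power without contamination:", rejections / n_sim)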

  9. Probability density functions for use when calculating standardised drought indices

    NASA Astrophysics Data System (ADS)

    Svensson, Cecilia; Prosdocimi, Ilaria; Hannaford, Jamie

    2015-04-01

    Time series of drought indices like the standardised precipitation index (SPI) and standardised flow index (SFI) require a statistical probability density function to be fitted to the observed (generally monthly) precipitation and river flow data. Once fitted, the quantiles are transformed to a Normal distribution with mean = 0 and standard deviation = 1. These transformed data are the SPI/SFI, which are widely used in drought studies, including for drought monitoring and early warning applications. Different distributions were fitted to rainfall and river flow data accumulated over 1, 3, 6 and 12 months for 121 catchments in the United Kingdom. These catchments represent a range of catchment characteristics in a mid-latitude climate. Both rainfall and river flow data have a lower bound at 0, as rains and flows cannot be negative. Their empirical distributions also tend to have positive skewness, and therefore the Gamma distribution has often been a natural and suitable choice for describing the data statistically. However, after transformation of the data to Normal distributions to obtain the SPIs and SFIs for the 121 catchments, the distributions are rejected in 11% and 19% of cases, respectively, by the Shapiro-Wilk test. Three-parameter distributions traditionally used in hydrological applications, such as the Pearson type 3 for rainfall and the Generalised Logistic and Generalised Extreme Value distributions for river flow, tend to make the transformed data fit better, with rejection rates of 5% or less. However, none of these three-parameter distributions have a lower bound at zero. This means that the lower tail of the fitted distribution may potentially go below zero, which would result in a lower limit to the calculated SPI and SFI values (as observations can never reach into this lower tail of the theoretical distribution). The Tweedie distribution can overcome the problems found when using either the Gamma or the above three-parameter distributions. The Tweedie is a three-parameter distribution which includes the Gamma distribution as a special case. It is bounded below at zero and has enough flexibility to fit most behaviours observed in the data. It does not always outperform the three-parameter distributions, but the rejection rates are similar. In addition, for certain parameter values the Tweedie distribution has a positive mass at zero, which means that ephemeral streams and months with zero rainfall can be modelled. It holds potential for wider application in drought studies in other climates and types of catchment.
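
    The SPI construction described above can be sketched as follows: fit a distribution to the accumulated totals, map each observation to its fitted non-exceedance probability, and transform that probability to a standard normal quantile. The sketch uses a Gamma fit on synthetic rainfall; the Tweedie alternative discussed above is not shown here.

        # Sketch: computing a standardised precipitation index (SPI) by fitting a
        # Gamma distribution to monthly accumulations and transforming the fitted
        # quantiles to a standard Normal (mean 0, sd 1).  Rainfall values are synthetic.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(3)
        precip = rng.gamma(shape=2.0, scale=30.0, size=360)   # 30 years of monthly totals

        # Fit a two-parameter Gamma (location fixed at the lower bound of zero).
        a, loc, scale = stats.gamma.fit(precip, floc=0)

        # Transform: observed value -> fitted non-exceedance probability -> z-score.
        prob = stats.gamma.cdf(precip, a, loc=loc, scale=scale)
        spi = stats.norm.ppf(prob)

        # Check how Normal the transformed series looks (Shapiro-Wilk, as in the study).
        w, p = stats.shapiro(spi)
        print(f"Shapiro-Wilk W = {w:.3f}, p = {p:.3f}")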

  10. A Guerilla Guide to Common Problems in ‘Neurostatistics’: Essential Statistical Topics in Neuroscience

    PubMed Central

    Smith, Paul F.

    2017-01-01

    Effective inferential statistical analysis is essential for high quality studies in neuroscience. However, recently, neuroscience has been criticised for the poor use of experimental design and statistical analysis. Many of the statistical issues confronting neuroscience are similar to other areas of biology; however, there are some that occur more regularly in neuroscience studies. This review attempts to provide a succinct overview of some of the major issues that arise commonly in the analyses of neuroscience data. These include: the non-normal distribution of the data; inequality of variance between groups; extensive correlation in data for repeated measurements across time or space; excessive multiple testing; inadequate statistical power due to small sample sizes; pseudo-replication; and an over-emphasis on binary conclusions about statistical significance as opposed to effect sizes. Statistical analysis should be viewed as just another neuroscience tool, which is critical to the final outcome of the study. Therefore, it needs to be done well and it is a good idea to be proactive and seek help early, preferably before the study even begins. PMID:29371855

  11. A Guerilla Guide to Common Problems in 'Neurostatistics': Essential Statistical Topics in Neuroscience.

    PubMed

    Smith, Paul F

    2017-01-01

    Effective inferential statistical analysis is essential for high quality studies in neuroscience. However, recently, neuroscience has been criticised for the poor use of experimental design and statistical analysis. Many of the statistical issues confronting neuroscience are similar to other areas of biology; however, there are some that occur more regularly in neuroscience studies. This review attempts to provide a succinct overview of some of the major issues that arise commonly in the analyses of neuroscience data. These include: the non-normal distribution of the data; inequality of variance between groups; extensive correlation in data for repeated measurements across time or space; excessive multiple testing; inadequate statistical power due to small sample sizes; pseudo-replication; and an over-emphasis on binary conclusions about statistical significance as opposed to effect sizes. Statistical analysis should be viewed as just another neuroscience tool, which is critical to the final outcome of the study. Therefore, it needs to be done well and it is a good idea to be proactive and seek help early, preferably before the study even begins.

  12. Evaluation of trace analyte identification in complex matrices by low-resolution gas chromatography--Mass spectrometry through signal simulation.

    PubMed

    Bettencourt da Silva, Ricardo J N

    2016-04-01

    The identification of trace levels of compounds in complex matrices by conventional low-resolution gas chromatography hyphenated with mass spectrometry is based on the comparison of retention times and abundance ratios of characteristic mass spectrum fragments between analyte peaks from calibrators and sample peaks. Statistically sound criteria for the comparison of these parameters were developed based on the normal distribution of retention times and the simulation of possible non-normal distributions of correlated abundance ratios. The confidence level used to set the statistical maximum and minimum limits of the parameters defines the true positive rate of identifications. The false positive rate of identification was estimated from worst-case signal noise models. The estimated true and false positive identification rates from one retention time and two correlated ratios of three fragment abundances were combined using simple Bayes' statistics to estimate the probability of the compound identification being correct, designated the examination uncertainty. Models of the variation of examination uncertainty with analyte quantity allowed the estimation of the Limit of Examination as the lowest quantity that produced "Extremely strong" evidence of compound presence. User-friendly MS-Excel files are made available to allow easy application of the developed approach in routine and research laboratories. The developed approach was successfully applied to the identification of chlorpyrifos-methyl and malathion in QuEChERS method extracts of vegetables with high water content, for which the estimated Limits of Examination are 0.14 mg kg(-1) and 0.23 mg kg(-1), respectively. Copyright © 2015 Elsevier B.V. All rights reserved.
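
    The Bayesian combination step described above can be illustrated, under simplifying assumptions, by combining a true positive rate, a false positive rate and a prior probability that the compound is present into a posterior probability that the identification is correct. The rates and prior below are illustrative, not the values estimated in the paper.

        # Sketch: combining a true-positive rate (sensitivity) and a false-positive
        # rate for an identification criterion with a prior probability of presence,
        # via Bayes' rule, to get the probability that an identification is correct.
        def prob_identification_correct(tpr, fpr, prior_present):
            """P(compound present | criteria met) for given rates and prior."""
            p_evidence = tpr * prior_present + fpr * (1.0 - prior_present)
            return tpr * prior_present / p_evidence

        # Example with illustrative numbers (not the rates estimated in the paper).
        print(prob_identification_correct(tpr=0.95, fpr=0.001, prior_present=0.5))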

  13. Statistical learning of speech sounds is most robust during the period of perceptual attunement.

    PubMed

    Liu, Liquan; Kager, René

    2017-12-01

    Although statistical learning has been shown to be a domain-general mechanism, its constraints, such as its interactions with perceptual development, are less well understood and discussed. This study is among the first to investigate the distributional learning of lexical pitch in non-tone-language-learning infants, exploring its interaction with language-specific perceptual attunement during the first 2 years after birth. A total of 88 normally developing Dutch infants of 5, 11, and 14 months were tested via a distributional learning paradigm and were familiarized on a unimodal or bimodal distribution of high-level versus high-falling tones in Mandarin Chinese. After familiarization, they were tested on a tonal contrast that shared equal distributional information in either modality. At 5 months, infants in both conditions discriminated the contrast, whereas 11-month-olds showed discrimination only in the bimodal condition. By 14 months, infants failed to discriminate the contrast in either condition. Results indicate interplay between infants' long-term linguistic experience throughout development and short-term distributional learning during the experiment, and they suggest that the influence of tonal distributional learning varies along the perceptual attunement trajectory, such that opportunities for distributional learning effects appear to be constrained in the beginning and at the end of perceptual attunement. The current study contributes to previous research by demonstrating an effect of age on learning from distributional cues. Copyright © 2017 Elsevier Inc. All rights reserved.

  14. Investigation into the performance of different models for predicting stutter.

    PubMed

    Bright, Jo-Anne; Curran, James M; Buckleton, John S

    2013-07-01

    In this paper we have examined five possible models for the behaviour of the stutter ratio, SR. These were two log-normal models, two gamma models, and a two-component normal mixture model. A two-component normal mixture model was chosen with different behaviours of variance; at each locus SR was described with two distributions, both with the same mean. The distributions have different variances: one for the majority of the observations and a second for the less well-behaved ones. We apply each model to a set of known single source Identifiler™, NGM SElect™ and PowerPlex(®) 21 DNA profiles to show the applicability of our findings to different data sets. SR determined from the single source profiles were compared to the calculated SR after application of the models. The model performance was tested by calculating the log-likelihoods and comparing the difference in Akaike information criterion (AIC). The two-component normal mixture model systematically outperformed all others, despite the increase in the number of parameters. This model, as well as performing well statistically, has intuitive appeal for forensic biologists and could be implemented in an expert system with a continuous method for DNA interpretation. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
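
    The AIC comparison described above can be sketched for two of the simpler candidate families (log-normal and gamma); the two-component normal mixture would be handled analogously, e.g. with an EM fit. The stutter-ratio values below are placeholders, not data from the profiles analysed in the paper.

        # Sketch: compare log-normal and gamma models for observed stutter ratios
        # using AIC = 2k - 2 logL.  The SR values below are placeholders.
        import numpy as np
        from scipy import stats

        sr = np.array([0.06, 0.08, 0.07, 0.09, 0.05, 0.10, 0.07, 0.08,
                       0.06, 0.11, 0.09, 0.07, 0.08, 0.06, 0.07, 0.09])

        def aic(logl, k):
            return 2 * k - 2 * logl

        # Log-normal (2 free parameters, location fixed at 0).
        s, loc, scale = stats.lognorm.fit(sr, floc=0)
        aic_ln = aic(stats.lognorm.logpdf(sr, s, loc, scale).sum(), k=2)

        # Gamma (2 free parameters, location fixed at 0).
        a, loc, scale = stats.gamma.fit(sr, floc=0)
        aic_g = aic(stats.gamma.logpdf(sr, a, loc, scale).sum(), k=2)

        # Lower AIC indicates the better-supported model for these data.
        print(f"AIC log-normal = {aic_ln:.1f}, AIC gamma = {aic_g:.1f}")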

  15. THE DEPENDENCE OF PRESTELLAR CORE MASS DISTRIBUTIONS ON THE STRUCTURE OF THE PARENTAL CLOUD

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Parravano, Antonio; Sanchez, Nestor; Alfaro, Emilio J.

    2012-08-01

    The mass distribution of prestellar cores is obtained for clouds with arbitrary internal mass distributions using a selection criterion based on the thermal and turbulent Jeans mass and applied hierarchically from small to large scales. We have checked this methodology by comparing our results for a log-normal density probability distribution function with the theoretical core mass function (CMF) derived by Hennebelle and Chabrier, namely a power law at large scales and a log-normal cutoff at low scales, but our method can be applied to any mass distributions representing a star-forming cloud. This methodology enables us to connect the parental cloud structure with the mass distribution of the cores and their spatial distribution, providing an efficient tool for investigating the physical properties of the molecular clouds that give rise to the prestellar core distributions observed. Simulated fractional Brownian motion (fBm) clouds with the Hurst exponent close to the value H = 1/3 give the best agreement with the theoretical CMF derived by Hennebelle and Chabrier and Chabrier's system initial mass function. Likewise, the spatial distribution of the cores derived from our methodology shows a surface density of companions compatible with those observed in Trapezium and Ophiuchus star-forming regions. This method also allows us to analyze the properties of the mass distribution of cores for different realizations. We found that the variations in the number of cores formed in different realizations of fBm clouds (with the same Hurst exponent) are much larger than the expected √N statistical fluctuations, increasing with H.

  16. The Dependence of Prestellar Core Mass Distributions on the Structure of the Parental Cloud

    NASA Astrophysics Data System (ADS)

    Parravano, Antonio; Sánchez, Néstor; Alfaro, Emilio J.

    2012-08-01

    The mass distribution of prestellar cores is obtained for clouds with arbitrary internal mass distributions using a selection criterion based on the thermal and turbulent Jeans mass and applied hierarchically from small to large scales. We have checked this methodology by comparing our results for a log-normal density probability distribution function with the theoretical core mass function (CMF) derived by Hennebelle & Chabrier, namely a power law at large scales and a log-normal cutoff at low scales, but our method can be applied to any mass distributions representing a star-forming cloud. This methodology enables us to connect the parental cloud structure with the mass distribution of the cores and their spatial distribution, providing an efficient tool for investigating the physical properties of the molecular clouds that give rise to the prestellar core distributions observed. Simulated fractional Brownian motion (fBm) clouds with the Hurst exponent close to the value H = 1/3 give the best agreement with the theoretical CMF derived by Hennebelle & Chabrier and Chabrier's system initial mass function. Likewise, the spatial distribution of the cores derived from our methodology shows a surface density of companions compatible with those observed in Trapezium and Ophiuchus star-forming regions. This method also allows us to analyze the properties of the mass distribution of cores for different realizations. We found that the variations in the number of cores formed in different realizations of fBm clouds (with the same Hurst exponent) are much larger than the expected √N statistical fluctuations, increasing with H.

  17. Photon event distribution sampling: an image formation technique for scanning microscopes that permits tracking of sub-diffraction particles with high spatial and temporal resolutions.

    PubMed

    Larkin, J D; Publicover, N G; Sutko, J L

    2011-01-01

    In photon event distribution sampling, an image formation technique for scanning microscopes, the maximum likelihood position of origin of each detected photon is acquired as a data set rather than binning photons in pixels. Subsequently, an intensity-related probability density function describing the uncertainty associated with the photon position measurement is applied to each position and individual photon intensity distributions are summed to form an image. Compared to pixel-based images, photon event distribution sampling images exhibit increased signal-to-noise and comparable spatial resolution. Photon event distribution sampling is superior to pixel-based image formation in recognizing the presence of structured (non-random) photon distributions at low photon counts and permits use of non-raster scanning patterns. A photon event distribution sampling based method for localizing single particles derived from a multi-variate normal distribution is more precise than statistical (Gaussian) fitting to pixel-based images. Using the multi-variate normal distribution method, non-raster scanning and a typical confocal microscope, localizations with 8 nm precision were achieved at 10 ms sampling rates with acquisition of ~200 photons per frame. Single nanometre precision was obtained with a greater number of photons per frame. In summary, photon event distribution sampling provides an efficient way to form images when low numbers of photons are involved and permits particle tracking with confocal point-scanning microscopes with nanometre precision deep within specimens. © 2010 The Authors Journal of Microscopy © 2010 The Royal Microscopical Society.

  18. Is a data set distributed as a power law? A test, with application to gamma-ray burst brightnesses

    NASA Technical Reports Server (NTRS)

    Wijers, Ralph A. M. J.; Lubin, Lori M.

    1994-01-01

    We present a method to determine whether an observed sample of data is drawn from a parent distribution that is a pure power law. The method starts from a class of statistics which have zero expectation value under the null hypothesis, H0, that the distribution is a pure power law: F(x) varies as x^(-alpha). We study one simple member of the class, named the 'bending statistic' B, in detail. It is most effective for detecting a type of deviation from a power law where the power-law slope varies slowly and monotonically as a function of x. Our estimator of B has a distribution under H0 that depends only on the size of the sample, not on the parameters of the parent population, and is approximated well by a normal distribution even for modest sample sizes. The bending statistic can therefore be used to test whether a set of numbers is drawn from any power-law parent population. Since many measurable quantities in astrophysics have distributions that are approximately power laws, and since deviations from the ideal power law often provide interesting information about the object of study (e.g., a 'bend' or 'break' in a luminosity function, a line in an X- or gamma-ray spectrum), we believe that a test of this type will be useful in many different contexts. In the present paper, we apply our test to various subsamples of gamma-ray burst brightnesses from the first-year Burst and Transient Source Experiment (BATSE) catalog and show that we can only marginally detect the expected steepening of the log N(>C_max) versus log C_max distribution.

  19. scoringRules - A software package for probabilistic model evaluation

    NASA Astrophysics Data System (ADS)

    Lerch, Sebastian; Jordan, Alexander; Krüger, Fabian

    2016-04-01

    Models in the geosciences are generally surrounded by uncertainty, and being able to quantify this uncertainty is key to good decision making. Accordingly, probabilistic forecasts in the form of predictive distributions have become popular over the last decades. With the proliferation of probabilistic models arises the need for decision-theoretically principled tools to evaluate the appropriateness of models and forecasts in a generalized way. Various scoring rules have been developed over the past decades to address this demand. Proper scoring rules are functions S(F,y) which evaluate the accuracy of a forecast distribution F, given that an outcome y was observed. As such, they allow comparison of alternative models, a crucial ability given the variety of theories, data sources and statistical specifications that are available in many situations. This poster presents the software package scoringRules for the statistical programming language R, which contains functions to compute popular scoring rules such as the continuous ranked probability score for a variety of distributions F that come up in applied work. Two main classes are parametric distributions like normal, t, or gamma distributions, and distributions that are not known analytically, but are indirectly described through a sample of simulation draws. For example, Bayesian forecasts produced via Markov Chain Monte Carlo take this form. The scoringRules package thereby provides a framework for generalized model evaluation that includes both Bayesian and classical parametric models. The scoringRules package aims to be a convenient dictionary-like reference for computing scoring rules. We offer state-of-the-art implementations of several known (but not routinely applied) formulas, and implement closed-form expressions that were previously unavailable. Whenever more than one implementation variant exists, we offer statistically principled default choices.
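
    scoringRules itself is an R package; as a language-neutral illustration of the kind of closed-form expression it implements, the sketch below computes the continuous ranked probability score of a normal predictive distribution in Python, using the standard closed-form formula for the Gaussian case.

        # Sketch: closed-form continuous ranked probability score (CRPS) for a
        # Normal(mu, sigma) forecast and an observed outcome y.
        import numpy as np
        from scipy.stats import norm

        def crps_normal(y, mu, sigma):
            """CRPS of a N(mu, sigma) predictive distribution at observation y."""
            z = (y - mu) / sigma
            return sigma * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

        # A sharp forecast centred on the outcome scores lower (better) than a vague one.
        print(crps_normal(y=1.0, mu=1.0, sigma=0.5))
        print(crps_normal(y=1.0, mu=0.0, sigma=2.0))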

  20. Rapid and Accurate Multiple Testing Correction and Power Estimation for Millions of Correlated Markers

    PubMed Central

    Han, Buhm; Kang, Hyun Min; Eskin, Eleazar

    2009-01-01

    With the development of high-throughput sequencing and genotyping technologies, the number of markers collected in genetic association studies is growing rapidly, increasing the importance of methods for correcting for multiple hypothesis testing. The permutation test is widely considered the gold standard for accurate multiple testing correction, but it is often computationally impractical for these large datasets. Recently, several studies proposed efficient alternative approaches to the permutation test based on the multivariate normal distribution (MVN). However, they cannot accurately correct for multiple testing in genome-wide association studies for two reasons. First, these methods require partitioning of the genome into many disjoint blocks and ignore all correlations between markers from different blocks. Second, the true null distribution of the test statistic often fails to follow the asymptotic distribution at the tails of the distribution. We propose an accurate and efficient method for multiple testing correction in genome-wide association studies—SLIDE. Our method accounts for all correlation within a sliding window and corrects for the departure of the true null distribution of the statistic from the asymptotic distribution. In simulations using the Wellcome Trust Case Control Consortium data, the error rate of SLIDE's corrected p-values is more than 20 times smaller than the error rate of the previous MVN-based methods' corrected p-values, while SLIDE is orders of magnitude faster than the permutation test and other competing methods. We also extend the MVN framework to the problem of estimating the statistical power of an association study with correlated markers and propose an efficient and accurate power estimation method SLIP. SLIP and SLIDE are available at http://slide.cs.ucla.edu. PMID:19381255
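
    The MVN idea underlying these methods can be sketched as follows (this is an illustration of the general framework, not the SLIDE algorithm): simulate correlated null Z-scores from a multivariate normal with the marker correlation structure and use the distribution of the maximum absolute statistic to set a corrected per-marker threshold. The AR(1) correlation matrix below is a toy stand-in for real linkage disequilibrium.

        # Sketch: multiple-testing correction via the multivariate normal (MVN)
        # framework: simulate correlated null Z-scores and use the max|Z| distribution
        # to find a per-marker threshold that controls the family-wise error rate.
        import numpy as np
        from scipy.stats import norm

        rng = np.random.default_rng(11)
        m, rho, n_sim, alpha = 200, 0.8, 5000, 0.05

        # AR(1)-style correlation between neighbouring "markers" (toy structure).
        idx = np.arange(m)
        corr = rho ** np.abs(idx[:, None] - idx[None, :])

        z_null = rng.multivariate_normal(np.zeros(m), corr, size=n_sim)
        max_abs_z = np.abs(z_null).max(axis=1)

        # Per-marker |Z| threshold whose family-wise error rate is alpha.
        threshold = np.quantile(max_abs_z, 1 - alpha)
        bonf = norm.ppf(1 - alpha / (2 * m))
        print(f"MVN-based |Z| threshold: {threshold:.2f}  vs  Bonferroni: {bonf:.2f}")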

  1. Probabilities and statistics for backscatter estimates obtained by a scatterometer with applications to new scatterometer design data

    NASA Technical Reports Server (NTRS)

    Pierson, Willard J., Jr.

    1989-01-01

    The values of the Normalized Radar Backscattering Cross Section (NRCS), sigma-0, obtained by a scatterometer are random variables whose variance is a known function of the expected value. The probability density function can be obtained from the normal distribution. Models for the expected value express it as a function of the properties of the waves on the ocean and the winds that generated the waves. Point estimates of the expected value were found from various statistics given the parameters that define the probability density function for each value. Random intervals were derived with a preassigned probability of containing that value. A statistical test to determine whether or not successive values of sigma-0 are truly independent was derived. The maximum likelihood estimates for wind speed and direction were found, given a model for backscatter as a function of the properties of the waves on the ocean. These estimates are biased as a result of the terms in the equation that involve natural logarithms, and calculations of the point estimates of the maximum likelihood values are used to show that the contributions of the logarithmic terms are negligible and that the terms can be omitted.

  2. A Note on Asymptotic Joint Distribution of the Eigenvalues of a Noncentral Multivariate F Matrix.

    DTIC Science & Technology

    1984-11-01

    Krishnaiah (1982). Now, let us consider the samples drawn from the k multivariate normal populations. Let (X1t, ..., Xpt) denote the mean vector of the t-th ... to multivariate problems. Sankhya, 4, 381-39. [7] Krishnaiah, P. R. (1982). Selection of variables in discriminant analysis. In Handbook of Statistics, Volume 2 (P. R. Krishnaiah, editor), 805-820. North-Holland Publishing Company.

  3. Technical Reports Prepared Under Contract N00014-76-C-0475.

    DTIC Science & Technology

    1987-05-29

    Technical Report No., Title, Author, Date:
    264  Approximations to Densities in Geometric Probability. H. Solomon, M. A. Stephens. 10/27/78
    265  Sequential ... Certain Multivariate Normal Probabilities. S. Iyengar. 8/12/82
    323  EDF Statistics for Testing for the Gamma Distribution with ... M. A. Stephens. 8/13/82
    ...  ... Nets. ...20-85
    360  Random Sequential Coding by Hamming Distance. Yoshiaki Itoh, Herbert Solomon. 07-11-85
    361  Transforming Censored Samples and Testing Fit

  4. Electron bulk speed lags the protons in the collisionless solar wind

    NASA Astrophysics Data System (ADS)

    Tong, Y.; Bale, S. D.; Salem, C. S.; Pulupa, M.

    2017-12-01

    We use a large, statistical set of in situ measurements of the solar wind electron distribution from the Wind/3DP instrument to show that the magnetic field-aligned core electron-proton drift speed tends to small values at high collisionality and asymptotes towards a large limiting value in the collisionless limit. This collisionless drift limit, when normalized to the local Alfvén speed, is large and may drive instabilities.

  5. Approach for Uncertainty Propagation and Robust Design in CFD Using Sensitivity Derivatives

    NASA Technical Reports Server (NTRS)

    Putko, Michele M.; Newman, Perry A.; Taylor, Arthur C., III; Green, Lawrence L.

    2001-01-01

    This paper presents an implementation of the approximate statistical moment method for uncertainty propagation and robust optimization for a quasi 1-D Euler CFD (computational fluid dynamics) code. Given uncertainties in statistically independent, random, normally distributed input variables, a first- and second-order statistical moment matching procedure is performed to approximate the uncertainty in the CFD output. Efficient calculation of both first- and second-order sensitivity derivatives is required. In order to assess the validity of the approximations, the moments are compared with statistical moments generated through Monte Carlo simulations. The uncertainties in the CFD input variables are also incorporated into a robust optimization procedure. For this optimization, statistical moments involving first-order sensitivity derivatives appear in the objective function and system constraints. Second-order sensitivity derivatives are used in a gradient-based search to successfully execute a robust optimization. The approximate methods used throughout the analyses are found to be valid when considering robustness about input parameter mean values.
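
    The first-order part of the approximate statistical moment method described above can be sketched as follows: for independent, normally distributed inputs, the output mean is approximated by evaluating the function at the input means, and the output variance from first-order sensitivity derivatives. The function below is a simple stand-in for a CFD output, with finite-difference derivatives in place of the code's analytic sensitivities.

        # Sketch: first-order moment propagation using sensitivity derivatives.
        # For independent, normally distributed inputs x_i ~ N(mu_i, sigma_i^2):
        #   mean[f] ~= f(mu),   var[f] ~= sum_i (df/dx_i)^2 sigma_i^2
        # The function f below is a simple stand-in for a CFD output quantity.
        import numpy as np

        def f(x):
            return x[0] ** 2 + 3.0 * x[0] * x[1] + np.sin(x[1])

        def finite_diff_grad(func, x, h=1e-6):
            g = np.zeros_like(x)
            for i in range(len(x)):
                xp, xm = x.copy(), x.copy()
                xp[i] += h
                xm[i] -= h
                g[i] = (func(xp) - func(xm)) / (2 * h)
            return g

        mu = np.array([1.0, 0.5])        # input means
        sigma = np.array([0.05, 0.02])   # input standard deviations

        grad = finite_diff_grad(f, mu)
        mean_f = f(mu)
        var_f = np.sum(grad ** 2 * sigma ** 2)
        print(f"approx mean = {mean_f:.4f}, approx std = {np.sqrt(var_f):.4f}")

        # Cross-check against a Monte Carlo estimate, as done in the paper.
        rng = np.random.default_rng(5)
        samples = rng.normal(mu, sigma, size=(50_000, 2))
        mc = np.array([f(s) for s in samples])
        print(f"Monte Carlo mean = {mc.mean():.4f}, std = {mc.std():.4f}")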

  6. Estimation of distribution overlap of urn models.

    PubMed

    Hampton, Jerrad; Lladser, Manuel E

    2012-01-01

    A classical problem in statistics is estimating the expected coverage of a sample, which has had applications in gene expression, microbial ecology, optimization, and even numismatics. Here we consider a related extension of this problem to random samples of two discrete distributions. Specifically, we estimate what we call the dissimilarity probability of a sample, i.e., the probability of a draw from one distribution not being observed in [Formula: see text] draws from another distribution. We show our estimator of dissimilarity to be a [Formula: see text]-statistic and a uniformly minimum variance unbiased estimator of dissimilarity over the largest appropriate range of [Formula: see text]. Furthermore, despite the non-Markovian nature of our estimator when applied sequentially over [Formula: see text], we show it converges uniformly in probability to the dissimilarity parameter, and we present criteria when it is approximately normally distributed and admits a consistent jackknife estimator of its variance. As proof of concept, we analyze V35 16S rRNA data to discern between various microbial environments. Other potential applications concern any situation where dissimilarity of two discrete distributions may be of interest. For instance, in SELEX experiments, each urn could represent a random RNA pool and each draw a possible solution to a particular binding site problem over that pool. The dissimilarity of these pools is then related to the probability of finding binding site solutions in one pool that are absent in the other.

  7. A Performance Comparison on the Probability Plot Correlation Coefficient Test using Several Plotting Positions for GEV Distribution.

    NASA Astrophysics Data System (ADS)

    Ahn, Hyunjun; Jung, Younghun; Om, Ju-Seong; Heo, Jun-Haeng

    2014-05-01

    Selecting an appropriate probability distribution is very important in statistical hydrology. A goodness-of-fit test is a statistical method for selecting an appropriate probability model for given data. The probability plot correlation coefficient (PPCC) test, one of the goodness-of-fit tests, was originally developed for the normal distribution. Since then, this test has been widely applied to other probability models. The PPCC test is regarded as one of the best goodness-of-fit tests because it shows relatively high rejection power. In this study, we focus on PPCC tests for the GEV distribution, which is widely used around the world. For the GEV model, several plotting position formulas have been suggested. However, PPCC statistics have been derived only for the plotting position formulas (Goel and De, In-na and Nguyen, and Kim et al.) in which the skewness coefficient (or shape parameter) is included. Regression equations are then derived as a function of the shape parameter and sample size for a given significance level. In addition, the rejection powers of these formulas are compared using Monte Carlo simulation. Keywords: Goodness-of-fit test, Probability plot correlation coefficient test, Plotting position, Monte Carlo simulation. ACKNOWLEDGEMENTS: This research was supported by a grant 'Establishing Active Disaster Management System of Flood Control Structures by using 3D BIM Technique' [NEMA-12-NH-57] from the Natural Hazard Mitigation Research Group, National Emergency Management Agency of Korea.
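
    The PPCC statistic itself is simply the correlation between the ordered sample and the fitted quantiles evaluated at a plotting position. The sketch below uses a GEV fit from scipy and the Weibull plotting position i/(n+1) for illustration; it does not reproduce the specific plotting position formulas compared in the abstract.

        # Sketch: probability plot correlation coefficient (PPCC) for a GEV fit.
        # PPCC = Pearson correlation between the ordered observations and the
        # theoretical quantiles at a chosen plotting position.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(9)
        x = stats.genextreme.rvs(c=-0.1, loc=100, scale=30, size=60, random_state=rng)

        # Fit the GEV and compute quantiles at the Weibull plotting position i/(n+1).
        c, loc, scale = stats.genextreme.fit(x)
        n = len(x)
        pp = np.arange(1, n + 1) / (n + 1)           # plotting positions
        q_theoretical = stats.genextreme.ppf(pp, c, loc=loc, scale=scale)

        ppcc = np.corrcoef(np.sort(x), q_theoretical)[0, 1]
        print(f"PPCC = {ppcc:.4f}")   # compare to a critical value for the test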

  8. The Italian primary school-size distribution and the city-size: a complex nexus

    PubMed Central

    Belmonte, Alessandro; Di Clemente, Riccardo; Buldyrev, Sergey V.

    2014-01-01

    We characterize the statistical law according to which Italian primary school sizes are distributed. We find that school size can be approximated by a log-normal distribution, with a fat lower tail that collects a large number of very small schools. The upper tail of the school-size distribution decreases exponentially and the growth rates are distributed with a Laplace PDF. These distributions are similar to those observed for firms and are consistent with a Bose-Einstein preferential attachment process. The body of the distribution features a bimodal shape suggesting some source of heterogeneity in the school organization, which we uncover by an in-depth analysis of the relation between school size and city size. We propose a novel cluster methodology and a new spatial interaction approach among schools which outline the variety of policies implemented in Italy. Different regional policies are also discussed, shedding light on the relation between policy and geographical features. PMID:24954714

  9. Obesity-induced oocyte mitochondrial defects are partially prevented and rescued by supplementation with co-enzyme Q10 in a mouse model

    PubMed Central

    Boots, C.E.; Boudoures, A.; Zhang, W.; Drury, A.; Moley, K.H.

    2016-01-01

    STUDY QUESTION Does supplementation with co-enzyme Q10 (CoQ10) improve the oocyte mitochondrial abnormalities associated with obesity in mice? SUMMARY ANSWER In an obese mouse model, CoQ10 improves the mitochondrial function of oocytes. WHAT IS KNOWN ALREADY Obesity impairs oocyte quality. Oocytes from mice fed a high-fat/high-sugar (HF/HS) diet have abnormalities in mitochondrial distribution and function and in meiotic progression. STUDY DESIGN, SIZE, DURATION Mice were randomly assigned to a normal, chow diet or an isocaloric HF/HS diet for 12 weeks. After 6 weeks on the diet, half of the mice receiving a normal diet and half of the mice receiving a HF/HS diet were randomly assigned to receive CoQ10 supplementation injections for the remaining 6 weeks. PARTICIPANTS/MATERIALS, SETTING, METHODS Dietary intervention was initiated on C57Bl6 female mice at 4 weeks of age, CoQ10 versus vehicle injections were assigned at 10 weeks, and assays were conducted at 16 weeks of age. Mice were super-ovulated, and oocytes were collected and stained to assess mitochondrial distribution, quantify reactive oxygen species (ROS), assess meiotic spindle formation, and measure metabolites. In vitro fertilization was performed, and blastocyst embryos were transferred into control mice. Oocyte number, fertilization rate, blastulation rate and implantation rate were compared between the four cohorts. Bivariate statistics were performed appropriately. MAIN RESULTS AND THE ROLE OF CHANCE HF/HS mice weighed significantly more than normal diet mice (29 versus 22 g, P< 0.001). CoQ10 supplementation did not influence weight. Levels of ATP, citrate, and phosphocreatine were lower and ROS levels were higher in HF/HS mice than in controls (P< 0.001). CoQ10 supplementation significantly increased the levels of metabolites and decreased ROS levels in oocytes from normal diet mice but not in oocytes from HF/HS mice. However, CoQ10 completely prevented the mitochondrial distribution abnormalities observed in the HF/HS mice. Overall, CoQ10 supplementation significantly increased the percentage of normal spindle and chromosome alignment (92.3 versus 80.2%, P= 0.039). In the sub-analysis by diet, the difference did not reach statistical significance. When undergoing IVF, there were no statistically significant differences in the number of mature oocytes, the fertilization rate, blastocyst formation rates, implantation rates, resorption rates or litter size between HF/HS mice receiving CoQ10 or vehicle injections. LIMITATIONS, REASONS FOR CAUTION Experiments were limited to one species and strain of mice. The majority of experiments were performed after ovulation induction, which may not represent natural cycle fertility. WIDER IMPLICATIONS OF THE FINDINGS Improvement in oocyte mitochondrial distribution and function of normal, chow-fed mice and HF/HS-fed mice demonstrates the importance of CoQ10 and the efficiency of the mitochondrial respiratory chain in oocyte competence. Clinical studies are now needed to evaluate the therapeutic potential of CoQ10 in women's reproductive health. STUDY FUNDING/COMPETING INTEREST(S) C.E.B. received support from the National Research Training Program in Reproductive Medicine sponsored by the National Institute of Health (T32 HD040135-13) and the Scientific Advisory Board of Vivere Health. K.H.M received support from the American Diabetes Association and the National Institute of Health (R01 HD083895). There are no conflicts of interest to declare. 
TRIAL REGISTRATION NUMBER This study is not a clinical trial. PMID:27432748

  10. Exploring signatures of positive selection in pigmentation candidate genes in populations of East Asian ancestry

    PubMed Central

    2013-01-01

    Background Currently, there is very limited knowledge about the genes involved in normal pigmentation variation in East Asian populations. We carried out a genome-wide scan of signatures of positive selection using the 1000 Genomes Phase I dataset, in order to identify pigmentation genes showing putative signatures of selective sweeps in East Asia. We applied a broad range of methods to detect signatures of selection including: 1) Tests designed to identify deviations of the Site Frequency Spectrum (SFS) from neutral expectations (Tajima’s D, Fay and Wu’s H and Fu and Li’s D* and F*), 2) Tests focused on the identification of high-frequency haplotypes with extended linkage disequilibrium (iHS and Rsb) and 3) Tests based on genetic differentiation between populations (LSBL). Based on the results obtained from a genome wide analysis of 25 kb windows, we constructed an empirical distribution for each statistic across all windows, and identified pigmentation genes that are outliers in the distribution. Results Our tests identified twenty genes that are relevant for pigmentation biology. Of these, eight genes (ATRN, EDAR, KLHL7, MITF, OCA2, TH, TMEM33 and TRPM1,) were extreme outliers (top 0.1% of the empirical distribution) for at least one statistic, and twelve genes (ADAM17, BNC2, CTSD, DCT, EGFR, LYST, MC1R, MLPH, OPRM1, PDIA6, PMEL (SILV) and TYRP1) were in the top 1% of the empirical distribution for at least one statistic. Additionally, eight of these genes (BNC2, EGFR, LYST, MC1R, OCA2, OPRM1, PMEL (SILV) and TYRP1) have been associated with pigmentary traits in association studies. Conclusions We identified a number of putative pigmentation genes showing extremely unusual patterns of genetic variation in East Asia. Most of these genes are outliers for different tests and/or different populations, and have already been described in previous scans for positive selection, providing strong support to the hypothesis that recent selective sweeps left a signature in these regions. However, it will be necessary to carry out association and functional studies to demonstrate the implication of these genes in normal pigmentation variation. PMID:23848512

  11. Exploring signatures of positive selection in pigmentation candidate genes in populations of East Asian ancestry.

    PubMed

    Hider, Jessica L; Gittelman, Rachel M; Shah, Tapan; Edwards, Melissa; Rosenbloom, Arnold; Akey, Joshua M; Parra, Esteban J

    2013-07-12

    Currently, there is very limited knowledge about the genes involved in normal pigmentation variation in East Asian populations. We carried out a genome-wide scan of signatures of positive selection using the 1000 Genomes Phase I dataset, in order to identify pigmentation genes showing putative signatures of selective sweeps in East Asia. We applied a broad range of methods to detect signatures of selection including: 1) Tests designed to identify deviations of the Site Frequency Spectrum (SFS) from neutral expectations (Tajima's D, Fay and Wu's H and Fu and Li's D* and F*), 2) Tests focused on the identification of high-frequency haplotypes with extended linkage disequilibrium (iHS and Rsb) and 3) Tests based on genetic differentiation between populations (LSBL). Based on the results obtained from a genome wide analysis of 25 kb windows, we constructed an empirical distribution for each statistic across all windows, and identified pigmentation genes that are outliers in the distribution. Our tests identified twenty genes that are relevant for pigmentation biology. Of these, eight genes (ATRN, EDAR, KLHL7, MITF, OCA2, TH, TMEM33 and TRPM1,) were extreme outliers (top 0.1% of the empirical distribution) for at least one statistic, and twelve genes (ADAM17, BNC2, CTSD, DCT, EGFR, LYST, MC1R, MLPH, OPRM1, PDIA6, PMEL (SILV) and TYRP1) were in the top 1% of the empirical distribution for at least one statistic. Additionally, eight of these genes (BNC2, EGFR, LYST, MC1R, OCA2, OPRM1, PMEL (SILV) and TYRP1) have been associated with pigmentary traits in association studies. We identified a number of putative pigmentation genes showing extremely unusual patterns of genetic variation in East Asia. Most of these genes are outliers for different tests and/or different populations, and have already been described in previous scans for positive selection, providing strong support to the hypothesis that recent selective sweeps left a signature in these regions. However, it will be necessary to carry out association and functional studies to demonstrate the implication of these genes in normal pigmentation variation.

  12. Statistical distribution of blood serotonin as a predictor of early autistic brain abnormalities

    PubMed Central

    Janušonis, Skirmantas

    2005-01-01

    Background A wide range of abnormalities has been reported in autistic brains, but these abnormalities may be the result of an earlier underlying developmental alteration that may no longer be evident by the time autism is diagnosed. The most consistent biological finding in autistic individuals has been their statistically elevated levels of 5-hydroxytryptamine (5-HT, serotonin) in blood platelets (platelet hyperserotonemia). The early developmental alteration of the autistic brain and the autistic platelet hyperserotonemia may be caused by the same biological factor expressed in the brain and outside the brain, respectively. Unlike the brain, blood platelets are short-lived and continue to be produced throughout the life span, suggesting that this factor may continue to operate outside the brain years after the brain is formed. The statistical distributions of the platelet 5-HT levels in normal and autistic groups have characteristic features and may contain information about the nature of this yet unidentified factor. Results The identity of this factor was studied by using a novel, quantitative approach that was applied to published distributions of the platelet 5-HT levels in normal and autistic groups. It was shown that the published data are consistent with the hypothesis that a factor that interferes with brain development in autism may also regulate the release of 5-HT from gut enterochromaffin cells. Numerical analysis revealed that this factor may be non-functional in autistic individuals. Conclusion At least some biological factors, the abnormal function of which leads to the development of the autistic brain, may regulate the release of 5-HT from the gut years after birth. If the present model is correct, it will allow future efforts to be focused on a limited number of gene candidates, some of which have not been suspected to be involved in autism (such as the 5-HT4 receptor gene) based on currently available clinical and experimental studies. PMID:16029508

  13. Statistical distribution of blood serotonin as a predictor of early autistic brain abnormalities.

    PubMed

    Janusonis, Skirmantas

    2005-07-19

    A wide range of abnormalities has been reported in autistic brains, but these abnormalities may be the result of an earlier underlying developmental alteration that may no longer be evident by the time autism is diagnosed. The most consistent biological finding in autistic individuals has been their statistically elevated levels of 5-hydroxytryptamine (5-HT, serotonin) in blood platelets (platelet hyperserotonemia). The early developmental alteration of the autistic brain and the autistic platelet hyperserotonemia may be caused by the same biological factor expressed in the brain and outside the brain, respectively. Unlike the brain, blood platelets are short-lived and continue to be produced throughout the life span, suggesting that this factor may continue to operate outside the brain years after the brain is formed. The statistical distributions of the platelet 5-HT levels in normal and autistic groups have characteristic features and may contain information about the nature of this yet unidentified factor. The identity of this factor was studied by using a novel, quantitative approach that was applied to published distributions of the platelet 5-HT levels in normal and autistic groups. It was shown that the published data are consistent with the hypothesis that a factor that interferes with brain development in autism may also regulate the release of 5-HT from gut enterochromaffin cells. Numerical analysis revealed that this factor may be non-functional in autistic individuals. At least some biological factors, the abnormal function of which leads to the development of the autistic brain, may regulate the release of 5-HT from the gut years after birth. If the present model is correct, it will allow future efforts to be focused on a limited number of gene candidates, some of which have not been suspected to be involved in autism (such as the 5-HT4 receptor gene) based on currently available clinical and experimental studies.

  14. Statistical Techniques For Real-time Anomaly Detection Using Spark Over Multi-source VMware Performance Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Solaimani, Mohiuddin; Iftekhar, Mohammed; Khan, Latifur

    Anomaly detection refers to the identification of an irregular or unusual pattern which deviates from what is standard, normal, or expected. Such deviated patterns typically correspond to samples of interest and are assigned different labels in different domains, such as outliers, anomalies, exceptions, or malware. Detecting anomalies in fast, voluminous streams of data is a formidable challenge. This paper presents a novel, generic, real-time distributed anomaly detection framework for heterogeneous streaming data where anomalies appear as a group. We have developed a distributed statistical approach to build a model and later use it to detect anomalies. As a case study, we investigate group anomaly detection for a VMware-based cloud data center, which maintains a large number of virtual machines (VMs). We have built our framework using Apache Spark to get higher throughput and lower data processing time on streaming data. We have developed a window-based statistical anomaly detection technique to detect anomalies that appear sporadically. We then relaxed this constraint with higher accuracy by implementing a cluster-based technique to detect sporadic and continuous anomalies. We conclude that our cluster-based technique outperforms other statistical techniques with higher accuracy and lower processing time.
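
    The framework above is distributed and Spark-based; as a single-machine illustration of the underlying window-based statistical idea, the sketch below flags windows whose mean deviates from a baseline built from past windows. The thresholds, window size, and injected anomaly are all illustrative.

        # Sketch: window-based statistical anomaly detection on a stream of
        # performance readings.  A window is flagged when its mean deviates from the
        # running baseline by more than k standard deviations (a simple z-score rule).
        import numpy as np

        rng = np.random.default_rng(2)
        stream = rng.normal(50.0, 5.0, 1000)          # e.g. CPU utilisation samples
        stream[600:620] += 40.0                       # an injected group anomaly

        window, k = 20, 3.0
        baseline_means = []

        for start in range(0, len(stream) - window + 1, window):
            w = stream[start:start + window]
            if len(baseline_means) >= 5:              # need some history first
                mu = np.mean(baseline_means)
                sd = np.std(baseline_means, ddof=1)
                z = (w.mean() - mu) / sd
                if abs(z) > k:
                    print(f"anomalous window at samples {start}-{start + window}: z={z:.1f}")
                    continue                          # do not pollute the baseline
            baseline_means.append(w.mean())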

  15. The Novel Quantitative Technique for Assessment of Gait Symmetry Using Advanced Statistical Learning Algorithm

    PubMed Central

    Wu, Jianning; Wu, Bin

    2015-01-01

    The accurate identification of gait asymmetry is very beneficial to the assessment of at-risk gait in the clinical applications. This paper investigated the application of classification method based on statistical learning algorithm to quantify gait symmetry based on the assumption that the degree of intrinsic change in dynamical system of gait is associated with the different statistical distributions between gait variables from left-right side of lower limbs; that is, the discrimination of small difference of similarity between lower limbs is considered the reorganization of their different probability distribution. The kinetic gait data of 60 participants were recorded using a strain gauge force platform during normal walking. The classification method is designed based on advanced statistical learning algorithm such as support vector machine algorithm for binary classification and is adopted to quantitatively evaluate gait symmetry. The experiment results showed that the proposed method could capture more intrinsic dynamic information hidden in gait variables and recognize the right-left gait patterns with superior generalization performance. Moreover, our proposed techniques could identify the small significant difference between lower limbs when compared to the traditional symmetry index method for gait. The proposed algorithm would become an effective tool for early identification of the elderly gait asymmetry in the clinical diagnosis. PMID:25705672

  16. The novel quantitative technique for assessment of gait symmetry using advanced statistical learning algorithm.

    PubMed

    Wu, Jianning; Wu, Bin

    2015-01-01

    The accurate identification of gait asymmetry is very beneficial to the assessment of at-risk gait in the clinical applications. This paper investigated the application of classification method based on statistical learning algorithm to quantify gait symmetry based on the assumption that the degree of intrinsic change in dynamical system of gait is associated with the different statistical distributions between gait variables from left-right side of lower limbs; that is, the discrimination of small difference of similarity between lower limbs is considered the reorganization of their different probability distribution. The kinetic gait data of 60 participants were recorded using a strain gauge force platform during normal walking. The classification method is designed based on advanced statistical learning algorithm such as support vector machine algorithm for binary classification and is adopted to quantitatively evaluate gait symmetry. The experiment results showed that the proposed method could capture more intrinsic dynamic information hidden in gait variables and recognize the right-left gait patterns with superior generalization performance. Moreover, our proposed techniques could identify the small significant difference between lower limbs when compared to the traditional symmetry index method for gait. The proposed algorithm would become an effective tool for early identification of the elderly gait asymmetry in the clinical diagnosis.

  17. DENBRAN: A basic program for a significance test for multivariate normality of clusters from branching patterns in dendrograms

    NASA Astrophysics Data System (ADS)

    Sneath, P. H. A.

    A BASIC program is presented for significance tests to determine whether a dendrogram is derived from clustering of points that belong to a single multivariate normal distribution. The significance tests are based on statistics of the Kolmogorov—Smirnov type, obtained by comparing the observed cumulative graph of branch levels with a graph for the hypothesis of multivariate normality. The program also permits testing whether the dendrogram could be from a cluster of lower dimensionality due to character correlations. The program makes provision for three similarity coefficients, (1) Euclidean distances, (2) squared Euclidean distances, and (3) Simple Matching Coefficients, and for five cluster methods (1) WPGMA, (2) UPGMA, (3) Single Linkage (or Minimum Spanning Trees), (4) Complete Linkage, and (5) Ward's Increase in Sums of Squares. The program is entitled DENBRAN.

  18. Spatiotemporal stick-slip phenomena in a coupled continuum-granular system

    NASA Astrophysics Data System (ADS)

    Ecke, Robert

    In sheared granular media, stick-slip behavior is ubiquitous, especially at very small shear rates and weak drive coupling. The resulting slips are characteristic of natural phenomena such as earthquakes, as well as being a delicate probe of the collective dynamics of the granular system. In that spirit, we developed a laboratory experiment consisting of sheared elastic plates separated by a narrow gap filled with quasi-two-dimensional granular material (bi-dispersed nylon rods). We directly determine the spatial and temporal distributions of strain displacements of the elastic continuum over 200 spatial points located adjacent to the gap. Slip events can be divided into large system-spanning events and spatially distributed smaller events. The small events have a probability distribution of event moment consistent with an M^(-3/2) power-law scaling and a Poisson-distributed recurrence time. Large events have a broad, log-normal moment distribution and a mean repetition time. As the applied normal force increases, there are fractionally more (fewer) large (small) events, and the large-event moment distribution broadens. The magnitude of the slip motion of the plates is well correlated with the root-mean-square displacements of the granular matter. Our results are consistent with mean-field descriptions of statistical models of earthquakes and avalanches. We further explore the high-speed dynamics of system events and also discuss the effective granular friction of the sheared layer. We find that large events result from stored elastic energy in the plates in this coupled granular-continuum system.

  19. Intra-Individual Response Variability Assessed by Ex-Gaussian Analysis may be a New Endophenotype for Attention-Deficit/Hyperactivity Disorder.

    PubMed

    Henríquez-Henríquez, Marcela Patricia; Billeke, Pablo; Henríquez, Hugo; Zamorano, Francisco Javier; Rothhammer, Francisco; Aboitiz, Francisco

    2014-01-01

    Intra-individual variability of response times (RTisv) is considered a potential endophenotype for attention-deficit/hyperactivity disorder (ADHD). Traditional methods for estimating RTisv lose information regarding the distribution of response times (RTs) along the task, with possible effects on statistical power. Ex-Gaussian analysis captures the dynamic nature of RTisv, estimating normal and exponential components of the RT distribution, with specific phenomenological correlates. Here, we applied ex-Gaussian analysis to explore whether intra-individual variability of RTs agrees with the criteria proposed by Gottesman and Gould for endophenotypes. Specifically, we evaluated whether the normal and/or exponential components of RTs may (a) present the stair-like distribution expected for endophenotypes (ADHD > siblings > typically developing children (TD) without a family history of ADHD) and (b) represent a phenotypic correlate for previously described genetic risk variants. This is a pilot study including 55 subjects (20 ADHD-discordant sibling-pairs and 15 TD children), all aged between 8 and 13 years. Participants performed a visual Go/Nogo task with 10% Nogo probability. Ex-Gaussian distributions were fitted to individual RT data and compared among the three samples. In order to test whether intra-individual variability may represent a correlate for previously described genetic risk variants, VNTRs at DRD4 and SLC6A3 were identified in all sibling-pairs following standard protocols. Groups were compared by fitting independent general linear models for the exponential and normal components from the ex-Gaussian analysis. Identified trends were confirmed by the non-parametric Jonckheere-Terpstra test. Stair-like distributions were observed for μ (p = 0.036) and σ (p = 0.009). An additional "DRD4-genotype" × "clinical status" interaction was present for τ (p = 0.014), reflecting a possible severity factor. Thus, normal and exponential RTisv components are suitable as ADHD endophenotypes.
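
    The ex-Gaussian decomposition referred to above (a normal component with parameters μ and σ plus an exponential tail τ) can be sketched with scipy's exponentially modified normal distribution; its parameterisation is converted back to μ, σ and τ below. The reaction times are synthetic, not data from the study.

        # Sketch: fitting an ex-Gaussian (exponentially modified Gaussian) to reaction
        # times and recovering the mu, sigma (normal) and tau (exponential) components.
        import numpy as np
        from scipy import stats

        rng = np.random.default_rng(4)
        # Synthetic RTs in ms: normal component plus an exponential tail.
        rt = rng.normal(450, 50, 300) + rng.exponential(120, 300)

        K, loc, scale = stats.exponnorm.fit(rt)
        mu, sigma, tau = loc, scale, K * scale        # convert scipy's parameterisation
        print(f"mu = {mu:.0f} ms, sigma = {sigma:.0f} ms, tau = {tau:.0f} ms")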

  20. About normal distribution on SO(3) group in texture analysis

    NASA Astrophysics Data System (ADS)

    Savyolova, T. I.; Filatov, S. V.

    2017-12-01

    This article studies and compares different normal distributions (NDs) on SO(3) group, which are used in texture analysis. Those NDs are: Fisher normal distribution (FND), Bunge normal distribution (BND), central normal distribution (CND) and wrapped normal distribution (WND). All of the previously mentioned NDs are central functions on SO(3) group. CND is a subcase for normal CLT-motivated distributions on SO(3) (CLT here is Parthasarathy’s central limit theorem). WND is motivated by CLT in R 3 and mapped to SO(3) group. A Monte Carlo method for modeling normally distributed values was studied for both CND and WND. All of the NDs mentioned above are used for modeling different components of crystallites orientation distribution function in texture analysis.

  1. Statistical methods for astronomical data with upper limits. II - Correlation and regression

    NASA Technical Reports Server (NTRS)

    Isobe, T.; Feigelson, E. D.; Nelson, P. I.

    1986-01-01

    Statistical methods for calculating correlations and regressions in bivariate censored data, where the dependent variable can have upper or lower limits, are presented. Cox's regression and a generalization of Kendall's rank correlation coefficient provide significance levels for correlations, while the EM algorithm, under the assumption of normally distributed errors, and its nonparametric analog using the Kaplan-Meier estimator give estimates of the slope of a regression line. Monte Carlo simulations demonstrate that survival analysis is reliable in determining correlations between luminosities at different bands. Survival analysis is applied to CO emission in infrared galaxies, X-ray emission in radio galaxies, H-alpha emission in cooling cluster cores, and radio emission in Seyfert galaxies.
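
    A minimal sketch of the Kaplan-Meier step for data with upper limits, assuming the third-party lifelines package rather than the authors' own implementation, and using invented log-luminosities. Upper limits are left-censored, so the variable is flipped to make them right-censored before applying the estimator.

      import numpy as np
      from lifelines import KaplanMeierFitter

      # Hypothetical log-luminosities; detected=False marks an upper limit.
      logL     = np.array([30.2, 29.8, 31.0, 30.5, 29.5, 30.9, 30.1, 29.9])
      detected = np.array([True, True, False, True, False, True, True, False])

      # Upper limits are left-censored in logL; flipping the axis turns them into
      # right-censored values that the Kaplan-Meier estimator handles directly.
      flipped = logL.max() + 1.0 - logL
      kmf = KaplanMeierFitter()
      kmf.fit(flipped, event_observed=detected)
      print(kmf.survival_function_.head())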

  2. System of Mueller-Jones matrix polarizing mapping of blood plasma films in breast pathology

    NASA Astrophysics Data System (ADS)

    Zabolotna, Natalia I.; Radchenko, Kostiantyn O.; Tarnovskiy, Mykola H.

    2017-08-01

    This paper proposes a system implementing a combined method of Mueller-Jones matrix mapping and blood plasma film analysis. Based on the obtained data about the structure and state of the blood plasma samples, diagnostic conclusions ("normal" or "pathology") can be made about the state of breast cancer patients. Statistical and correlation moments are then obtained by statistical analysis for every coordinate distribution; these indicators serve as diagnostic criteria. The final step is comparing the results and choosing the most effective diagnostic indicators. The paper presents the results of Mueller-Jones matrix mapping of optically thin (attenuation coefficient τ ≤ 0.1) blood plasma layers.

  3. Mueller matrix mapping of biological polycrystalline layers using reference wave

    NASA Astrophysics Data System (ADS)

    Dubolazov, A.; Ushenko, O. G.; Ushenko, Yu. O.; Pidkamin, L. Y.; Sidor, M. I.; Grytsyuk, M.; Prysyazhnyuk, P. V.

    2018-01-01

    The paper consists of two parts. The first part is devoted to brief theoretical foundations of the differential Mueller-matrix description of the properties of partially depolarizing layers. Experimentally measured maps of the first-order differential matrix of the polycrystalline structure of a histological section of brain tissue are provided, and the statistical moments of the 1st-4th orders, which characterize the distribution of matrix elements, are defined. The second part of the paper provides statistical analysis of the birefringence and dichroism of histological sections of mouse liver tissue (normal and diabetic) and defines objective criteria for the differential diagnostics of diabetes.
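
    A minimal sketch of computing statistical moments of the 1st-4th orders (mean, variance, skewness, kurtosis) over a map of matrix-element values, the type of quantity used here as a diagnostic criterion. The map is hypothetical and the authors' exact normalization of the moments is not reproduced; numpy and scipy are assumed.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(3)

      # Hypothetical 2D map of one differential-Mueller-matrix element.
      element_map = rng.normal(0.1, 0.05, size=(256, 256))
      values = element_map.ravel()

      moments = {
          "M1 (mean)":     values.mean(),
          "M2 (variance)": values.var(),
          "M3 (skewness)": stats.skew(values),
          "M4 (kurtosis)": stats.kurtosis(values),   # excess kurtosis
      }
      for name, m in moments.items():
          print(f"{name}: {m:.4f}")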

  4. Cold dark matter and degree-scale cosmic microwave background anisotropy statistics after COBE

    NASA Technical Reports Server (NTRS)

    Gorski, Krzysztof M.; Stompor, Radoslaw; Juszkiewicz, Roman

    1993-01-01

    We conduct a Monte Carlo simulation of the cosmic microwave background (CMB) anisotropy in the UCSB South Pole 1991 degree-scale experiment. We examine cold dark matter cosmology with large-scale structure seeded by the Harrison-Zel'dovich hierarchy of Gaussian-distributed primordial inhomogeneities normalized to the COBE-DMR measurement of large-angle CMB anisotropy. We find it statistically implausible (in the sense of a low cumulative probability, F < 5 percent, of not measuring a cosmological delta-T/T signal) that the degree-scale cosmological CMB anisotropy predicted in such models could have escaped detection at the level of sensitivity achieved in the South Pole 1991 experiment.

  5. Operating Characteristics of Statistical Methods for Detecting Gene-by-Measured Environment Interaction in the Presence of Gene-Environment Correlation under Violations of Distributional Assumptions.

    PubMed

    Van Hulle, Carol A; Rathouz, Paul J

    2015-02-01

    Accurately identifying interactions between genetic vulnerabilities and environmental factors is of critical importance for genetic research on health and behavior. In the previous work of Van Hulle et al. (Behavior Genetics, Vol. 43, 2013, pp. 71-84), we explored the operating characteristics for a set of biometric (e.g., twin) models of Rathouz et al. (Behavior Genetics, Vol. 38, 2008, pp. 301-315), for testing gene-by-measured environment interaction (GxM) in the presence of gene-by-measured environment correlation (rGM) where data followed the assumed distributional structure. Here we explore the effects that violating distributional assumptions have on the operating characteristics of these same models even when structural model assumptions are correct. We simulated N = 2,000 replicates of n = 1,000 twin pairs under a number of conditions. Non-normality was imposed on either the putative moderator or on the ultimate outcome by ordinalizing or censoring the data. We examined the empirical Type I error rates and compared Bayesian information criterion (BIC) values. In general, non-normality in the putative moderator had little impact on the Type I error rates or BIC comparisons. In contrast, non-normality in the outcome was often mistaken for or masked GxM, especially when the outcome data were censored.

  6. Surgical Treatment for Discogenic Low-Back Pain: Lumbar Arthroplasty Results in Superior Pain Reduction and Disability Level Improvement Compared With Lumbar Fusion

    PubMed Central

    2007-01-01

    Background The US Food and Drug Administration approved the Charité artificial disc on October 26, 2004. This approval was based on an extensive analysis and review process; 20 years of disc usage worldwide; and the results of a prospective, randomized, controlled clinical trial that compared lumbar artificial disc replacement to fusion. The results of the investigational device exemption (IDE) study led to a conclusion that clinical outcomes following lumbar arthroplasty were at least as good as outcomes from fusion. Methods The author performed a new analysis of the Visual Analog Scale pain scores and the Oswestry Disability Index scores from the Charité artificial disc IDE study and used a nonparametric statistical test, because observed data distributions were not normal. The analysis included all of the enrolled subjects in both the nonrandomized and randomized phases of the study. Results Subjects from both the treatment and control groups improved from the baseline situation (P < .001) at all follow-up times (6 weeks to 24 months). Additionally, these pain and disability levels with artificial disc replacement were superior (P < .05) to the fusion treatment at all follow-up times including 2 years. Conclusions The a priori statistical plan for an IDE study may not adequately address the final distribution of the data. Therefore, statistical analyses more appropriate to the distribution may be necessary to develop meaningful statistical conclusions from the study. A nonparametric statistical analysis of the Charité artificial disc IDE outcomes scores demonstrates superiority for lumbar arthroplasty versus fusion at all follow-up time points to 24 months. PMID:25802574
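
    A hedged sketch of the kind of nonparametric comparison described, using a Mann-Whitney U test on invented, skewed outcome scores; the IDE study's actual test choice and data are not reproduced. scipy is assumed.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(4)

      # Hypothetical 24-month VAS pain scores (0-100), skewed rather than normal.
      arthroplasty = rng.gamma(shape=2.0, scale=12.0, size=200)
      fusion       = rng.gamma(shape=2.0, scale=16.0, size=200)

      # Rank-based test that does not assume normally distributed outcomes.
      u, p = stats.mannwhitneyu(arthroplasty, fusion, alternative="two-sided")
      print(f"Mann-Whitney U = {u:.0f}, p = {p:.4g}")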

  7. Meteor localization via statistical analysis of spatially temporal fluctuations in image sequences

    NASA Astrophysics Data System (ADS)

    Kukal, Jaromír.; Klimt, Martin; Šihlík, Jan; Fliegel, Karel

    2015-09-01

    Meteor detection is one of the most important procedures in astronomical imaging. A meteor's path in the Earth's atmosphere is traditionally reconstructed from a double-station video observation system generating 2D image sequences. However, atmospheric turbulence and other factors cause spatio-temporal fluctuations of the image background, which makes localization of the meteor path more difficult. Our approach is based on nonlinear preprocessing of the image intensity using the Box-Cox transform, with the logarithmic transform as its particular case. The transformed image sequences are then differentiated along discrete coordinates to obtain a statistical description of the sky background fluctuations, which can be modeled by a multivariate normal distribution. After verification and hypothesis testing, we use the statistical model for outlier detection. While isolated outlier points are ignored, a compact cluster of outliers indicates the presence of a meteoroid after ignition.
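
    A hedged sketch of the described pipeline under simplifying assumptions: Box-Cox transform of the intensities (the logarithmic transform being the limiting case), temporal differencing, a multivariate normal model of the background fluctuations, and outlier flagging by Mahalanobis distance against a chi-square threshold. The frame data, feature construction and thresholds are illustrative, not the authors' settings; numpy and scipy are assumed.

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(5)

      # Hypothetical stack of frames (time, y, x) with positive intensities.
      frames = rng.gamma(shape=4.0, scale=50.0, size=(20, 64, 64))

      # Box-Cox transform (the log transform is the lambda -> 0 special case).
      flat, lam = stats.boxcox(frames.ravel())
      transformed = flat.reshape(frames.shape)

      # Feature per pixel: the vector of temporal differences.
      diffs = np.diff(transformed, axis=0)              # (19, 64, 64)
      X = diffs.reshape(diffs.shape[0], -1).T           # (pixels, 19)

      # Fit a multivariate normal background model and flag outliers.
      mean = X.mean(axis=0)
      cov = np.cov(X, rowvar=False)
      inv_cov = np.linalg.pinv(cov)
      d2 = np.einsum("ij,jk,ik->i", X - mean, inv_cov, X - mean)   # squared Mahalanobis distance
      threshold = stats.chi2.ppf(0.999, df=X.shape[1])
      outliers = d2 > threshold
      print(f"flagged {outliers.sum()} of {outliers.size} pixels as outliers")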

  8. On the Helicity in 3D-Periodic Navier-Stokes Equations II: The Statistical Case

    NASA Astrophysics Data System (ADS)

    Foias, Ciprian; Hoang, Luan; Nicolaenko, Basil

    2009-09-01

    We study the asymptotic behavior of the statistical solutions to the Navier-Stokes equations using the normalization map [9]. It is then applied to the study of the mean energy, mean dissipation rate of energy, and mean helicity of spatially periodic flows driven by potential body forces. The statistical distribution of the asymptotic Beltrami flows is also investigated. We connect our mathematical analysis with the empirical theory of decaying turbulence. With appropriate mathematically defined ensemble averages, the Kolmogorov universal features are shown to be transient in time. We provide an estimate for the time interval in which those features may still be present. Our collaborator and friend Basil Nicolaenko passed away in September of 2007, after this work was completed. Honoring his contribution and friendship, we dedicate this article to him.

  9. Estimating times of surgeries with two component procedures: comparison of the lognormal and normal models.

    PubMed

    Strum, David P; May, Jerrold H; Sampson, Allan R; Vargas, Luis G; Spangler, William E

    2003-01-01

    Variability inherent in the duration of surgical procedures complicates surgical scheduling. Modeling the duration and variability of surgeries might improve time estimates. Accurate time estimates are important operationally to improve utilization, reduce costs, and identify surgeries that might be considered outliers. Surgeries with multiple procedures are difficult to model because they are difficult to segment into homogenous groups and because they are performed less frequently than single-procedure surgeries. The authors studied, retrospectively, 10,740 surgeries each with exactly two CPTs and 46,322 surgical cases with only one CPT from a large teaching hospital to determine if the distribution of dual-procedure surgery times fit more closely a lognormal or a normal model. The authors tested model goodness of fit to their data using Shapiro-Wilk tests, studied factors affecting the variability of time estimates, and examined the impact of coding permutations (ordered combinations) on modeling. The Shapiro-Wilk tests indicated that the lognormal model is statistically superior to the normal model for modeling dual-procedure surgeries. Permutations of component codes did not appear to differ significantly with respect to total procedure time and surgical time. To improve individual models for infrequent dual-procedure surgeries, permutations may be reduced and estimates may be based on the longest component procedure and type of anesthesia. The authors recommend use of the lognormal model for estimating surgical times for surgeries with two component procedures. Their results help legitimize the use of log transforms to normalize surgical procedure times prior to hypothesis testing using linear statistical models. Multiple-procedure surgeries may be modeled using the longest (statistically most important) component procedure and type of anesthesia.
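
    The core model comparison is easy to illustrate: apply the Shapiro-Wilk test to raw durations and to their logarithms; markedly stronger evidence against normality on the raw scale supports the lognormal model. A minimal sketch with invented durations, assuming scipy:

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(6)

      # Hypothetical durations (minutes) for one dual-procedure surgery group.
      durations = rng.lognormal(mean=np.log(120), sigma=0.45, size=80)

      w_raw, p_raw = stats.shapiro(durations)
      w_log, p_log = stats.shapiro(np.log(durations))
      print(f"raw scale: W={w_raw:.3f}, p={p_raw:.3g}")
      print(f"log scale: W={w_log:.3f}, p={p_log:.3g}")  # typically much larger p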

  10. Normality Tests for Statistical Analysis: A Guide for Non-Statisticians

    PubMed Central

    Ghasemi, Asghar; Zahediasl, Saleh

    2012-01-01

    Statistical errors are common in scientific literature and about 50% of the published articles have at least one error. The assumption of normality needs to be checked for many statistical procedures, namely parametric tests, because their validity depends on it. The aim of this commentary is to overview checking for normality in statistical analysis using SPSS. PMID:23843808
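
    The commentary demonstrates these checks in SPSS; a minimal Python analogue (assuming scipy, with invented data) runs the common tests side by side:

      import numpy as np
      from scipy import stats

      rng = np.random.default_rng(7)
      x = rng.normal(5.0, 1.2, size=60)          # hypothetical measurements

      w, p_sw = stats.shapiro(x)                 # Shapiro-Wilk
      k2, p_k2 = stats.normaltest(x)             # D'Agostino-Pearson K^2
      # Plain KS with parameters estimated from the same sample is conservative;
      # a Lilliefors correction would be needed for exact p-values.
      d, p_ks = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
      print(f"Shapiro-Wilk       W={w:.3f}  p={p_sw:.3f}")
      print(f"D'Agostino K^2     s={k2:.3f}  p={p_k2:.3f}")
      print(f"Kolmogorov-Smirnov D={d:.3f}  p={p_ks:.3f}")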

  11. Developing the Effective Method of Spectral Harmonic Energy Ratio to Analyze the Arterial Pulse Spectrum

    PubMed Central

    Huang, Chin-Ming; Wei, Ching-Chuan; Liao, Yin-Tzu; Chang, Hsien-Cheh; Kao, Shung-Te; Li, Tsai-Chung

    2011-01-01

    In this article, we analyze the arterial pulse in the spectral domain. A parameter, the spectral harmonic energy ratio (SHER), is developed to assess the features of the overly decreased spectral energy in the fourth to sixth harmonic for palpitation patients. Compared with normal subjects, the statistical results reveal that the mean value of SHER in the patient group (57.7 ± 27.9) is significantly higher than that of the normal group (39.7 ± 20.9) (P-value = .0066 < .01). This means that the total energy in the fourth to sixth harmonic of palpitation patients is significantly less than it is in normal subjects. In other words, the spectral distribution of the arterial pulse gradually decreases for normal subjects while it decreases abruptly in higher-order harmonics (the fourth, fifth and sixth harmonics) for palpitation patients. Hence, SHER is an effective method to distinguish the two groups in the spectral domain. Also, we can thus know that a “gradual decrease” might mean a “balanced” state, whereas an “abrupt decrease” might mean an “unbalanced” state in blood circulation and pulse diagnosis. By SHER, we can determine the ratio of energy distribution in different harmonic bands, and this method gives us a novel viewpoint from which to comprehend and quantify the spectral harmonic distribution of circulation information conveyed by the arterial pulse. These concepts can be further applied to improve the clinical diagnosis not only in Western medicine but also in traditional Chinese medicine (TCM). PMID:21845200
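
    The abstract does not give the exact SHER formula; under one plausible reading (energy in the first three harmonics relative to energy in the fourth to sixth, which grows as higher-harmonic energy drops), a sketch with a synthetic pulse is shown below. The fundamental frequency, harmonic roll-off and bandwidth are all invented; numpy is assumed.

      import numpy as np

      def harmonic_energies(pulse, fs, f0, n_harmonics=6, bw=0.5):
          # Spectral energy near each harmonic k*f0 of a periodic arterial pulse signal.
          spectrum = np.abs(np.fft.rfft(pulse)) ** 2
          freqs = np.fft.rfftfreq(pulse.size, d=1.0 / fs)
          return np.array([
              spectrum[(freqs > k * f0 - bw) & (freqs < k * f0 + bw)].sum()
              for k in range(1, n_harmonics + 1)
          ])

      # Hypothetical pulse: 1.2 Hz fundamental with decaying harmonics plus noise.
      fs, f0 = 200.0, 1.2
      t = np.arange(0, 30, 1.0 / fs)
      pulse = sum((0.7 ** k) * np.sin(2 * np.pi * k * f0 * t) for k in range(1, 7))
      pulse += 0.05 * np.random.default_rng(8).normal(size=t.size)

      E = harmonic_energies(pulse, fs, f0)
      sher = E[:3].sum() / E[3:6].sum()     # assumed ratio: (1st-3rd) / (4th-6th) harmonic energy
      print(f"SHER (assumed definition): {sher:.1f}")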

  12. Normal and tumoral melanocytes exhibit q-Gaussian random search patterns.

    PubMed

    da Silva, Priscila C A; Rosembach, Tiago V; Santos, Anésia A; Rocha, Márcio S; Martins, Marcelo L

    2014-01-01

    In multicellular organisms, cell motility is central to all morphogenetic processes, tissue maintenance, wound healing and immune surveillance. Hence, failures in its regulation potentiate numerous diseases. Here, cell migration assays on plastic 2D surfaces were performed using normal (Melan A) and tumoral (B16F10) murine melanocytes in random motility conditions. The trajectories of the centroids of the cell perimeters were tracked through time-lapse microscopy. The statistics of these trajectories were analyzed by building velocity and turn-angle distributions, as well as velocity autocorrelations and the scaling of mean-squared displacements. We find that these cells exhibit a crossover from normal to super-diffusive motion without angular persistence at long time scales. Moreover, these melanocytes move with non-Gaussian velocity distributions. This major finding indicates that among those animal cells supposedly migrating through Lévy walks, some can instead perform q-Gaussian walks. Furthermore, our results reveal that B16F10 cells infected by mycoplasmas exhibit essentially the same diffusivity as their healthy counterparts. Finally, a q-Gaussian random walk model was proposed to account for these melanocytic migratory traits. Simulations based on this model correctly describe the crossover to super-diffusivity in the cell migration tracks.
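
    A minimal sketch of the mean-squared-displacement (MSD) analysis used to detect the crossover to super-diffusion: compute MSD(τ) from a track and fit the exponent α in MSD ∝ τ^α (α = 1 for normal diffusion, α > 1 for super-diffusion). The track below is synthetic, not melanocyte data; numpy is assumed.

      import numpy as np

      rng = np.random.default_rng(9)

      # Hypothetical 2D cell track sampled at regular intervals (random walk plus mild drift).
      steps = rng.normal(size=(500, 2)).cumsum(axis=0) * 0.5
      track = steps + 0.2 * np.arange(500)[:, None]      # drift gives a super-diffusive tail

      def msd(track, max_lag):
          lags = np.arange(1, max_lag)
          return lags, np.array([
              np.mean(np.sum((track[lag:] - track[:-lag]) ** 2, axis=1)) for lag in lags
          ])

      lags, m = msd(track, 100)
      alpha = np.polyfit(np.log(lags), np.log(m), 1)[0]   # MSD ~ lag^alpha
      print(f"scaling exponent alpha = {alpha:.2f}  (1: normal, >1: super-diffusive)")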

  13. The MDI Method as a Generalization of Logit, Probit and Hendry Analyses in Marketing.

    DTIC Science & Technology

    1980-04-01

    ...model involves nothing more than fitting a normal distribution function (Hanushek and Jackson (1977)). For a given value of x, the probit model... preference shifts within the soft drink category. For applications of probit models relevant for marketing, see Hausman and Wise (1978) and Hanushek and... "Marketing Research," JMR XIV, Feb. (1977). Hanushek, E.A., and J.E. Jackson, Statistical Methods for Social Scientists, Academic Press, New York (1977).

  14. Counting nodal domains on surfaces of revolution

    NASA Astrophysics Data System (ADS)

    Karageorge, Panos D.; Smilansky, Uzy

    2008-05-01

    We consider eigenfunctions of the Laplace-Beltrami operator on special surfaces of revolution. For this separable system, the nodal domains of the (real) eigenfunctions form a checkerboard pattern, and their number ν_n is proportional to the product of the angular and the 'surface' quantum numbers. Arranging the wavefunctions by increasing values of the Laplace-Beltrami spectrum, we obtain the nodal sequence, whose statistical properties we study. In particular, we investigate the distribution of the normalized counts ν_n/n.

  15. Le vocabulaire disponible du francais, Tome 1. Le vocabulaire concret usuel des enfants francais et acadiens: Etude temoin (The Working French Vocabulary, Volume 1. Common Generic Terms Used by French and Acadian Children: A Field Study).

    ERIC Educational Resources Information Center

    Mackey, William F.; And Others

    The first of a two-volume study of the relative accessibility of French vocabulary in French-speaking Canada presents statistical data concerning the frequency, distribution, valence, and accessibility of vocabulary related to 16 fundamental centers of interest found in normal conversation. The scope, procedures, and results of the study are…

  16. Use of Brain MRI Atlases to Determine Boundaries of Age-Related Pathology: The Importance of Statistical Method

    PubMed Central

    Dickie, David Alexander; Job, Dominic E.; Gonzalez, David Rodriguez; Shenkin, Susan D.; Wardlaw, Joanna M.

    2015-01-01

    Introduction Neurodegenerative disease diagnoses may be supported by the comparison of an individual patient’s brain magnetic resonance image (MRI) with a voxel-based atlas of normal brain MRI. Most current brain MRI atlases are of young to middle-aged adults and parametric, e.g., mean ±standard deviation (SD); these atlases require data to be Gaussian. Brain MRI data, e.g., grey matter (GM) proportion images, from normal older subjects are apparently not Gaussian. We created a nonparametric and a parametric atlas of the normal limits of GM proportions in older subjects and compared their classifications of GM proportions in Alzheimer’s disease (AD) patients. Methods Using publicly available brain MRI from 138 normal subjects and 138 subjects diagnosed with AD (all 55–90 years), we created: a mean ±SD atlas to estimate parametrically the percentile ranks and limits of normal ageing GM; and, separately, a nonparametric, rank order-based GM atlas from the same normal ageing subjects. GM images from AD patients were then classified with respect to each atlas to determine the effect statistical distributions had on classifications of proportions of GM in AD patients. Results The parametric atlas often defined the lower normal limit of the proportion of GM to be negative (which does not make sense physiologically as the lowest possible proportion is zero). Because of this, for approximately half of the AD subjects, 25–45% of voxels were classified as normal when compared to the parametric atlas; but were classified as abnormal when compared to the nonparametric atlas. These voxels were mainly concentrated in the frontal and occipital lobes. Discussion To our knowledge, we have presented the first nonparametric brain MRI atlas. In conditions where there is increasing variability in brain structure, such as in old age, nonparametric brain MRI atlases may represent the limits of normal brain structure more accurately than parametric approaches. Therefore, we conclude that the statistical method used for construction of brain MRI atlases should be selected taking into account the population and aim under study. Parametric methods are generally robust for defining central tendencies, e.g., means, of brain structure. Nonparametric methods are advisable when studying the limits of brain structure in ageing and neurodegenerative disease. PMID:26023913
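
    The central contrast is easy to reproduce in miniature: a parametric lower limit (mean - 1.645 SD, the one-sided 5th percentile under normality) versus a rank-based 5th percentile, computed voxelwise. With skewed, zero-bounded grey-matter proportions the parametric limit can go negative while the nonparametric one cannot. The data below are synthetic beta-distributed proportions, not the study's images; numpy is assumed.

      import numpy as np

      rng = np.random.default_rng(10)

      # Hypothetical grey-matter proportion images for 138 normal older subjects
      # (values in [0, 1], right-skewed at many voxels).
      gm = rng.beta(a=1.2, b=6.0, size=(138, 50, 50))

      # Parametric lower limit: mean - 1.645*SD (one-sided 5th percentile under normality).
      parametric_low = gm.mean(axis=0) - 1.645 * gm.std(axis=0, ddof=1)

      # Nonparametric lower limit: voxelwise rank-based 5th percentile.
      nonparametric_low = np.percentile(gm, 5, axis=0)

      print(f"voxels with negative parametric lower limit: {(parametric_low < 0).mean():.1%}")
      print(f"voxels with negative nonparametric lower limit: {(nonparametric_low < 0).mean():.1%}")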

  17. A reevaluation of data on atmospheric turbulence and airplane gust loads for application in spectral calculations

    NASA Technical Reports Server (NTRS)

    Press, Harry; Meadows, May T; Hadlock, Ivan

    1956-01-01

    The available information on the spectrum of atmospheric turbulence is first briefly reviewed. On the basis of these results, methods are developed for the conversion of available gust statistics, normally given in terms of counts of gusts or acceleration peaks, into a form appropriate for use in spectral calculations. The fundamental quantity for this purpose appears to be the probability distribution of the root-mean-square gust velocity. Estimates of this distribution are derived from data for a number of load histories of transport operations; estimates of the variation of this distribution with altitude and weather condition are also derived from available data, and the method of applying these results to the calculation of airplane gust-response histories in operations is outlined. (author)

  18. Maximally Informative Stimuli and Tuning Curves for Sigmoidal Rate-Coding Neurons and Populations

    NASA Astrophysics Data System (ADS)

    McDonnell, Mark D.; Stocks, Nigel G.

    2008-08-01

    A general method for deriving maximally informative sigmoidal tuning curves for neural systems with small normalized variability is presented. The optimal tuning curve is a nonlinear function of the cumulative distribution function of the stimulus and depends on the mean-variance relationship of the neural system. The derivation is based on a known relationship between Shannon's mutual information and Fisher information, and the optimality of the Jeffreys prior. It relies on the existence of closed-form solutions to the converse problem of optimizing the stimulus distribution for a given tuning curve. It is shown that maximum mutual information corresponds to constant Fisher information only if the stimulus is uniformly distributed. As an example, the case of sub-Poisson binomial firing statistics is analyzed in detail.
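
    Two standard ingredients behind such derivations, stated here as a hedged sketch rather than the paper's exact equations, are the small-noise (Brunel-Nadal) relation between mutual information I(S;R), stimulus entropy H and Fisher information J(s), and the Jeffreys prior p*(s) that maximizes the mutual information over stimulus distributions:

      I(S;R) \;\approx\; H[p(s)] \;-\; \frac{1}{2}\int p(s)\,\log\frac{2\pi e}{J(s)}\,ds,
      \qquad
      p^{*}(s) \;=\; \frac{\sqrt{J(s)}}{\int \sqrt{J(s')}\,ds'}.

    Under these relations, constant Fisher information is maximally informative exactly when the optimal stimulus distribution is uniform, consistent with the statement in the abstract.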

  19. Estimation of the Scatterer Distribution of the Cirrhotic Liver using Ultrasonic Image

    NASA Astrophysics Data System (ADS)

    Yamaguchi, Tadashi; Hachiya, Hiroyuki

    1998-05-01

    In B-mode images of the liver obtained by an ultrasonic imaging system, the speckle pattern changes with the progression of diseases such as liver cirrhosis. In this paper we present the statistical characteristics of the echo envelope of the liver, and a technique to extract information about the scatterer distribution from normal and cirrhotic liver images using constant false alarm rate (CFAR) processing. We analyze the relationship between the extracted scatterer distribution and the stage of liver cirrhosis. The ratio of the area in which the amplitude of the processed signal exceeds the threshold to the entire processed image area is related quantitatively to the stage of liver cirrhosis. It is found that the proposed technique is valid for the quantitative diagnosis of liver cirrhosis.
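
    A hedged one-dimensional sketch of cell-averaging CFAR on an echo envelope line: estimate the local background from training cells around each cell under test (excluding guard cells), threshold at a scaled multiple of that estimate, and report the fraction of above-threshold samples, analogous to the area ratio the authors relate to cirrhosis stage. The envelope data and all CFAR parameters are invented; numpy is assumed.

      import numpy as np

      rng = np.random.default_rng(11)

      # Hypothetical echo envelope line: Rayleigh speckle plus a few strong scatterers.
      envelope = rng.rayleigh(scale=1.0, size=2000)
      envelope[rng.choice(2000, 15, replace=False)] += 6.0

      def ca_cfar(x, n_train=16, n_guard=4, scale=3.0):
          # Cell-averaging CFAR: flag samples exceeding scale * local background mean.
          flags = np.zeros(x.size, dtype=bool)
          half = n_train // 2 + n_guard
          for i in range(half, x.size - half):
              window = np.r_[x[i - half:i - n_guard], x[i + n_guard + 1:i + half + 1]]
              flags[i] = x[i] > scale * window.mean()
          return flags

      flags = ca_cfar(envelope)
      print(f"above-threshold area ratio: {flags.mean():.3f}")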

  20. EEG source analysis of data from paralysed subjects

    NASA Astrophysics Data System (ADS)

    Carabali, Carmen A.; Willoughby, John O.; Fitzgibbon, Sean P.; Grummett, Tyler; Lewis, Trent; DeLosAngeles, Dylan; Pope, Kenneth J.

    2015-12-01

    One of the limitations of electroencephalography (EEG) data is its quality, as it is usually contaminated with electrical signals from muscle. This research studies the results of two EEG source analysis methods applied to scalp recordings taken in paralysis and in normal conditions during the performance of a cognitive task. The aim is to determine which types of analysis are appropriate for dealing with EEG data containing myogenic components. The data used are the scalp recordings of six subjects in normal conditions and during paralysis while performing different cognitive tasks, including the oddball task, which is the focus of this research. The data were pre-processed by filtering and artefact correction; then one-second epochs for targets and distractors were extracted. Distributed source analysis was performed in BESA Research 6.0; using its results and information from the literature, nine ideal locations for source dipoles were identified. The nine dipoles were used to perform discrete source analysis, fitting them to the averaged epochs to obtain source waveforms. The results were statistically analysed by comparing the outcomes before and after the subjects were paralysed. Finally, frequency analysis was performed to better explain the results. The findings were that distributed source analysis could produce confounded results for EEG contaminated with myogenic signals; conversely, statistical analysis of the results from discrete source analysis showed that this method could help in dealing with EEG data contaminated with muscle electrical signals.
