Sample records for statistical distribution models

  1. Results of the Verification of the Statistical Distribution Model of Microseismicity Emission Characteristics

    NASA Astrophysics Data System (ADS)

    Cianciara, Aleksander

    2016-09-01

    The paper presents the results of research aimed at verifying the hypothesis that the Weibull distribution is an appropriate statistical distribution model of microseismicity emission characteristics, namely: energy of phenomena and inter-event time. It is understood that the emission under consideration is induced by the natural rock mass fracturing. Because the recorded emission contain noise, therefore, it is subjected to an appropriate filtering. The study has been conducted using the method of statistical verification of null hypothesis that the Weibull distribution fits the empirical cumulative distribution function. As the model describing the cumulative distribution function is given in an analytical form, its verification may be performed using the Kolmogorov-Smirnov goodness-of-fit test. Interpretations by means of probabilistic methods require specifying the correct model describing the statistical distribution of data. Because in these methods measurement data are not used directly, but their statistical distributions, e.g., in the method based on the hazard analysis, or in that that uses maximum value statistics.

  2. Selecting Summary Statistics in Approximate Bayesian Computation for Calibrating Stochastic Models

    PubMed Central

    Burr, Tom

    2013-01-01

    Approximate Bayesian computation (ABC) is an approach for using measurement data to calibrate stochastic computer models, which are common in biology applications. ABC is becoming the “go-to” option when the data and/or parameter dimension is large because it relies on user-chosen summary statistics rather than the full data and is therefore computationally feasible. One technical challenge with ABC is that the quality of the approximation to the posterior distribution of model parameters depends on the user-chosen summary statistics. In this paper, the user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics. We show that for some choices of summary statistics, the posterior distribution of model parameters is closely approximated and for other choices of summary statistics, the posterior distribution is not closely approximated. A strategy to choose effective summary statistics is suggested in cases where the stochastic computer model can be run at many trial parameter settings, as in the example. PMID:24288668

  3. Selecting summary statistics in approximate Bayesian computation for calibrating stochastic models.

    PubMed

    Burr, Tom; Skurikhin, Alexei

    2013-01-01

    Approximate Bayesian computation (ABC) is an approach for using measurement data to calibrate stochastic computer models, which are common in biology applications. ABC is becoming the "go-to" option when the data and/or parameter dimension is large because it relies on user-chosen summary statistics rather than the full data and is therefore computationally feasible. One technical challenge with ABC is that the quality of the approximation to the posterior distribution of model parameters depends on the user-chosen summary statistics. In this paper, the user requirement to choose effective summary statistics in order to accurately estimate the posterior distribution of model parameters is investigated and illustrated by example, using a model and corresponding real data of mitochondrial DNA population dynamics. We show that for some choices of summary statistics, the posterior distribution of model parameters is closely approximated and for other choices of summary statistics, the posterior distribution is not closely approximated. A strategy to choose effective summary statistics is suggested in cases where the stochastic computer model can be run at many trial parameter settings, as in the example.

  4. Directional statistics-based reflectance model for isotropic bidirectional reflectance distribution functions.

    PubMed

    Nishino, Ko; Lombardi, Stephen

    2011-01-01

    We introduce a novel parametric bidirectional reflectance distribution function (BRDF) model that can accurately encode a wide variety of real-world isotropic BRDFs with a small number of parameters. The key observation we make is that a BRDF may be viewed as a statistical distribution on a unit hemisphere. We derive a novel directional statistics distribution, which we refer to as the hemispherical exponential power distribution, and model real-world isotropic BRDFs as mixtures of it. We derive a canonical probabilistic method for estimating the parameters, including the number of components, of this novel directional statistics BRDF model. We show that the model captures the full spectrum of real-world isotropic BRDFs with high accuracy, but a small footprint. We also demonstrate the advantages of the novel BRDF model by showing its use for reflection component separation and for exploring the space of isotropic BRDFs.

  5. Statistical Parameter Study of the Time Interval Distribution for Nonparalyzable, Paralyzable, and Hybrid Dead Time Models

    NASA Astrophysics Data System (ADS)

    Syam, Nur Syamsi; Maeng, Seongjin; Kim, Myo Gwang; Lim, Soo Yeon; Lee, Sang Hoon

    2018-05-01

    A large dead time of a Geiger Mueller (GM) detector may cause a large count loss in radiation measurements and consequently may cause distortion of the Poisson statistic of radiation events into a new distribution. The new distribution will have different statistical parameters compared to the original distribution. Therefore, the variance, skewness, and excess kurtosis in association with the observed count rate of the time interval distribution for well-known nonparalyzable, paralyzable, and nonparalyzable-paralyzable hybrid dead time models of a Geiger Mueller detector were studied using Monte Carlo simulation (GMSIM). These parameters were then compared with the statistical parameters of a perfect detector to observe the change in the distribution. The results show that the behaviors of the statistical parameters for the three dead time models were different. The values of the skewness and the excess kurtosis of the nonparalyzable model are equal or very close to those of the perfect detector, which are ≅2 for skewness, and ≅6 for excess kurtosis, while the statistical parameters in the paralyzable and hybrid model obtain minimum values that occur around the maximum observed count rates. The different trends of the three models resulting from the GMSIM simulation can be used to distinguish the dead time behavior of a GM counter; i.e. whether the GM counter can be described best by using the nonparalyzable, paralyzable, or hybrid model. In a future study, these statistical parameters need to be analyzed further to determine the possibility of using them to determine a dead time for each model, particularly for paralyzable and hybrid models.

  6. The beta distribution: A statistical model for world cloud cover

    NASA Technical Reports Server (NTRS)

    Falls, L. W.

    1973-01-01

    Much work has been performed in developing empirical global cloud cover models. This investigation was made to determine an underlying theoretical statistical distribution to represent worldwide cloud cover. The beta distribution with probability density function is given to represent the variability of this random variable. It is shown that the beta distribution possesses the versatile statistical characteristics necessary to assume the wide variety of shapes exhibited by cloud cover. A total of 160 representative empirical cloud cover distributions were investigated and the conclusion was reached that this study provides sufficient statical evidence to accept the beta probability distribution as the underlying model for world cloud cover.

  7. Statistics of the geomagnetic secular variation for the past 5Ma

    NASA Technical Reports Server (NTRS)

    Constable, C. G.; Parker, R. L.

    1986-01-01

    A new statistical model is proposed for the geomagnetic secular variation over the past 5Ma. Unlike previous models, the model makes use of statistical characteristics of the present day geomagnetic field. The spatial power spectrum of the non-dipole field is consistent with a white source near the core-mantle boundary with Gaussian distribution. After a suitable scaling, the spherical harmonic coefficients may be regarded as statistical samples from a single giant Gaussian process; this is the model of the non-dipole field. The model can be combined with an arbitrary statistical description of the dipole and probability density functions and cumulative distribution functions can be computed for declination and inclination that would be observed at any site on Earth's surface. Global paleomagnetic data spanning the past 5Ma are used to constrain the statistics of the dipole part of the field. A simple model is found to be consistent with the available data. An advantage of specifying the model in terms of the spherical harmonic coefficients is that it is a complete statistical description of the geomagnetic field, enabling us to test specific properties for a general description. Both intensity and directional data distributions may be tested to see if they satisfy the expected model distributions.

  8. Statistics of the geomagnetic secular variation for the past 5 m.y

    NASA Technical Reports Server (NTRS)

    Constable, C. G.; Parker, R. L.

    1988-01-01

    A new statistical model is proposed for the geomagnetic secular variation over the past 5Ma. Unlike previous models, the model makes use of statistical characteristics of the present day geomagnetic field. The spatial power spectrum of the non-dipole field is consistent with a white source near the core-mantle boundary with Gaussian distribution. After a suitable scaling, the spherical harmonic coefficients may be regarded as statistical samples from a single giant Gaussian process; this is the model of the non-dipole field. The model can be combined with an arbitrary statistical description of the dipole and probability density functions and cumulative distribution functions can be computed for declination and inclination that would be observed at any site on Earth's surface. Global paleomagnetic data spanning the past 5Ma are used to constrain the statistics of the dipole part of the field. A simple model is found to be consistent with the available data. An advantage of specifying the model in terms of the spherical harmonic coefficients is that it is a complete statistical description of the geomagnetic field, enabling us to test specific properties for a general description. Both intensity and directional data distributions may be tested to see if they satisfy the expected model distributions.

  9. New approach in the quantum statistical parton distribution

    NASA Astrophysics Data System (ADS)

    Sohaily, Sozha; Vaziri (Khamedi), Mohammad

    2017-12-01

    An attempt to find simple parton distribution functions (PDFs) based on quantum statistical approach is presented. The PDFs described by the statistical model have very interesting physical properties which help to understand the structure of partons. The longitudinal portion of distribution functions are given by applying the maximum entropy principle. An interesting and simple approach to determine the statistical variables exactly without fitting and fixing parameters is surveyed. Analytic expressions of the x-dependent PDFs are obtained in the whole x region [0, 1], and the computed distributions are consistent with the experimental observations. The agreement with experimental data, gives a robust confirm of our simple presented statistical model.

  10. Statistics for characterizing data on the periphery

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Theiler, James P; Hush, Donald R

    2010-01-01

    We introduce a class of statistics for characterizing the periphery of a distribution, and show that these statistics are particularly valuable for problems in target detection. Because so many detection algorithms are rooted in Gaussian statistics, we concentrate on ellipsoidal models of high-dimensional data distributions (that is to say: covariance matrices), but we recommend several alternatives to the sample covariance matrix that more efficiently model the periphery of a distribution, and can more effectively detect anomalous data samples.

  11. Modified Distribution-Free Goodness-of-Fit Test Statistic.

    PubMed

    Chun, So Yeon; Browne, Michael W; Shapiro, Alexander

    2018-03-01

    Covariance structure analysis and its structural equation modeling extensions have become one of the most widely used methodologies in social sciences such as psychology, education, and economics. An important issue in such analysis is to assess the goodness of fit of a model under analysis. One of the most popular test statistics used in covariance structure analysis is the asymptotically distribution-free (ADF) test statistic introduced by Browne (Br J Math Stat Psychol 37:62-83, 1984). The ADF statistic can be used to test models without any specific distribution assumption (e.g., multivariate normal distribution) of the observed data. Despite its advantage, it has been shown in various empirical studies that unless sample sizes are extremely large, this ADF statistic could perform very poorly in practice. In this paper, we provide a theoretical explanation for this phenomenon and further propose a modified test statistic that improves the performance in samples of realistic size. The proposed statistic deals with the possible ill-conditioning of the involved large-scale covariance matrices.

  12. Current state of the art for statistical modeling of species distributions [Chapter 16

    Treesearch

    Troy M. Hegel; Samuel A. Cushman; Jeffrey Evans; Falk Huettmann

    2010-01-01

    Over the past decade the number of statistical modelling tools available to ecologists to model species' distributions has increased at a rapid pace (e.g. Elith et al. 2006; Austin 2007), as have the number of species distribution models (SDM) published in the literature (e.g. Scott et al. 2002). Ten years ago, basic logistic regression (Hosmer and Lemeshow 2000)...

  13. Statistical modeling of space shuttle environmental data

    NASA Technical Reports Server (NTRS)

    Tubbs, J. D.; Brewer, D. W.

    1983-01-01

    Statistical models which use a class of bivariate gamma distribution are examined. Topics discussed include: (1) the ratio of positively correlated gamma varieties; (2) a method to determine if unequal shape parameters are necessary in bivariate gamma distribution; (3) differential equations for modal location of a family of bivariate gamma distribution; and (4) analysis of some wind gust data using the analytical results developed for modeling application.

  14. Development of uncertainty-based work injury model using Bayesian structural equation modelling.

    PubMed

    Chatterjee, Snehamoy

    2014-01-01

    This paper proposed a Bayesian method-based structural equation model (SEM) of miners' work injury for an underground coal mine in India. The environmental and behavioural variables for work injury were identified and causal relationships were developed. For Bayesian modelling, prior distributions of SEM parameters are necessary to develop the model. In this paper, two approaches were adopted to obtain prior distribution for factor loading parameters and structural parameters of SEM. In the first approach, the prior distributions were considered as a fixed distribution function with specific parameter values, whereas, in the second approach, prior distributions of the parameters were generated from experts' opinions. The posterior distributions of these parameters were obtained by applying Bayesian rule. The Markov Chain Monte Carlo sampling in the form Gibbs sampling was applied for sampling from the posterior distribution. The results revealed that all coefficients of structural and measurement model parameters are statistically significant in experts' opinion-based priors, whereas, two coefficients are not statistically significant when fixed prior-based distributions are applied. The error statistics reveals that Bayesian structural model provides reasonably good fit of work injury with high coefficient of determination (0.91) and less mean squared error as compared to traditional SEM.

  15. Normal versus Noncentral Chi-Square Asymptotics of Misspecified Models

    ERIC Educational Resources Information Center

    Chun, So Yeon; Shapiro, Alexander

    2009-01-01

    The noncentral chi-square approximation of the distribution of the likelihood ratio (LR) test statistic is a critical part of the methodology in structural equation modeling. Recently, it was argued by some authors that in certain situations normal distributions may give a better approximation of the distribution of the LR test statistic. The main…

  16. Comparison of hypertabastic survival model with other unimodal hazard rate functions using a goodness-of-fit test.

    PubMed

    Tahir, M Ramzan; Tran, Quang X; Nikulin, Mikhail S

    2017-05-30

    We studied the problem of testing a hypothesized distribution in survival regression models when the data is right censored and survival times are influenced by covariates. A modified chi-squared type test, known as Nikulin-Rao-Robson statistic, is applied for the comparison of accelerated failure time models. This statistic is used to test the goodness-of-fit for hypertabastic survival model and four other unimodal hazard rate functions. The results of simulation study showed that the hypertabastic distribution can be used as an alternative to log-logistic and log-normal distribution. In statistical modeling, because of its flexible shape of hazard functions, this distribution can also be used as a competitor of Birnbaum-Saunders and inverse Gaussian distributions. The results for the real data application are shown. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  17. Statistical description of non-Gaussian samples in the F2 layer of the ionosphere during heliogeophysical disturbances

    NASA Astrophysics Data System (ADS)

    Sergeenko, N. P.

    2017-11-01

    An adequate statistical method should be developed in order to predict probabilistically the range of ionospheric parameters. This problem is solved in this paper. The time series of the critical frequency of the layer F2- foF2( t) were subjected to statistical processing. For the obtained samples {δ foF2}, statistical distributions and invariants up to the fourth order are calculated. The analysis shows that the distributions differ from the Gaussian law during the disturbances. At levels of sufficiently small probability distributions, there are arbitrarily large deviations from the model of the normal process. Therefore, it is attempted to describe statistical samples {δ foF2} based on the Poisson model. For the studied samples, the exponential characteristic function is selected under the assumption that time series are a superposition of some deterministic and random processes. Using the Fourier transform, the characteristic function is transformed into a nonholomorphic excessive-asymmetric probability-density function. The statistical distributions of the samples {δ foF2} calculated for the disturbed periods are compared with the obtained model distribution function. According to the Kolmogorov's criterion, the probabilities of the coincidence of a posteriori distributions with the theoretical ones are P 0.7-0.9. The conducted analysis makes it possible to draw a conclusion about the applicability of a model based on the Poisson random process for the statistical description and probabilistic variation estimates during heliogeophysical disturbances of the variations {δ foF2}.

  18. Variability-aware compact modeling and statistical circuit validation on SRAM test array

    NASA Astrophysics Data System (ADS)

    Qiao, Ying; Spanos, Costas J.

    2016-03-01

    Variability modeling at the compact transistor model level can enable statistically optimized designs in view of limitations imposed by the fabrication technology. In this work we propose a variability-aware compact model characterization methodology based on stepwise parameter selection. Transistor I-V measurements are obtained from bit transistor accessible SRAM test array fabricated using a collaborating foundry's 28nm FDSOI technology. Our in-house customized Monte Carlo simulation bench can incorporate these statistical compact models; and simulation results on SRAM writability performance are very close to measurements in distribution estimation. Our proposed statistical compact model parameter extraction methodology also has the potential of predicting non-Gaussian behavior in statistical circuit performances through mixtures of Gaussian distributions.

  19. Multiplicative Modeling of Children's Growth and Its Statistical Properties

    NASA Astrophysics Data System (ADS)

    Kuninaka, Hiroto; Matsushita, Mitsugu

    2014-03-01

    We develop a numerical growth model that can predict the statistical properties of the height distribution of Japanese children. Our previous studies have clarified that the height distribution of schoolchildren shows a transition from the lognormal distribution to the normal distribution during puberty. In this study, we demonstrate by simulation that the transition occurs owing to the variability of the onset of puberty.

  20. Tsallis p⊥ distribution from statistical clusters

    NASA Astrophysics Data System (ADS)

    Bialas, A.

    2015-07-01

    It is shown that the transverse momentum distributions of particles emerging from the decay of statistical clusters, distributed according to a power law in their transverse energy, closely resemble those following from the Tsallis non-extensive statistical model. The experimental data are well reproduced with the cluster temperature T ≈ 160 MeV.

  1. Ice Water Classification Using Statistical Distribution Based Conditional Random Fields in RADARSAT-2 Dual Polarization Imagery

    NASA Astrophysics Data System (ADS)

    Zhang, Y.; Li, F.; Zhang, S.; Hao, W.; Zhu, T.; Yuan, L.; Xiao, F.

    2017-09-01

    In this paper, Statistical Distribution based Conditional Random Fields (STA-CRF) algorithm is exploited for improving marginal ice-water classification. Pixel level ice concentration is presented as the comparison of methods based on CRF. Furthermore, in order to explore the effective statistical distribution model to be integrated into STA-CRF, five statistical distribution models are investigated. The STA-CRF methods are tested on 2 scenes around Prydz Bay and Adélie Depression, where contain a variety of ice types during melt season. Experimental results indicate that the proposed method can resolve sea ice edge well in Marginal Ice Zone (MIZ) and show a robust distinction of ice and water.

  2. Scaled test statistics and robust standard errors for non-normal data in covariance structure analysis: a Monte Carlo study.

    PubMed

    Chou, C P; Bentler, P M; Satorra, A

    1991-11-01

    Research studying robustness of maximum likelihood (ML) statistics in covariance structure analysis has concluded that test statistics and standard errors are biased under severe non-normality. An estimation procedure known as asymptotic distribution free (ADF), making no distributional assumption, has been suggested to avoid these biases. Corrections to the normal theory statistics to yield more adequate performance have also been proposed. This study compares the performance of a scaled test statistic and robust standard errors for two models under several non-normal conditions and also compares these with the results from ML and ADF methods. Both ML and ADF test statistics performed rather well in one model and considerably worse in the other. In general, the scaled test statistic seemed to behave better than the ML test statistic and the ADF statistic performed the worst. The robust and ADF standard errors yielded more appropriate estimates of sampling variability than the ML standard errors, which were usually downward biased, in both models under most of the non-normal conditions. ML test statistics and standard errors were found to be quite robust to the violation of the normality assumption when data had either symmetric and platykurtic distributions, or non-symmetric and zero kurtotic distributions.

  3. Leads Detection Using Mixture Statistical Distribution Based CRF Algorithm from Sentinel-1 Dual Polarization SAR Imagery

    NASA Astrophysics Data System (ADS)

    Zhang, Yu; Li, Fei; Zhang, Shengkai; Zhu, Tingting

    2017-04-01

    Synthetic Aperture Radar (SAR) is significantly important for polar remote sensing since it can provide continuous observations in all days and all weather. SAR can be used for extracting the surface roughness information characterized by the variance of dielectric properties and different polarization channels, which make it possible to observe different ice types and surface structure for deformation analysis. In November, 2016, Chinese National Antarctic Research Expedition (CHINARE) 33rd cruise has set sails in sea ice zone in Antarctic. Accurate leads spatial distribution in sea ice zone for routine planning of ship navigation is essential. In this study, the semantic relationship between leads and sea ice categories has been described by the Conditional Random Fields (CRF) model, and leads characteristics have been modeled by statistical distributions in SAR imagery. In the proposed algorithm, a mixture statistical distribution based CRF is developed by considering the contexture information and the statistical characteristics of sea ice for improving leads detection in Sentinel-1A dual polarization SAR imagery. The unary potential and pairwise potential in CRF model is constructed by integrating the posteriori probability estimated from statistical distributions. For mixture statistical distribution parameter estimation, Method of Logarithmic Cumulants (MoLC) is exploited for single statistical distribution parameters estimation. The iteration based Expectation Maximal (EM) algorithm is investigated to calculate the parameters in mixture statistical distribution based CRF model. In the posteriori probability inference, graph-cut energy minimization method is adopted in the initial leads detection. The post-processing procedures including aspect ratio constrain and spatial smoothing approaches are utilized to improve the visual result. The proposed method is validated on Sentinel-1A SAR C-band Extra Wide Swath (EW) Ground Range Detected (GRD) imagery with a pixel spacing of 40 meters near Prydz Bay area, East Antarctica. Main work is listed as follows: 1) A mixture statistical distribution based CRF algorithm has been developed for leads detection from Sentinel-1A dual polarization images. 2) The assessment of the proposed mixture statistical distribution based CRF method and single distribution based CRF algorithm has been presented. 3) The preferable parameters sets including statistical distributions, the aspect ratio threshold and spatial smoothing window size have been provided. In the future, the proposed algorithm will be developed for the operational Sentinel series data sets processing due to its less time consuming cost and high accuracy in leads detection.

  4. From Weakly Chaotic Dynamics to Deterministic Subdiffusion via Copula Modeling

    NASA Astrophysics Data System (ADS)

    Nazé, Pierre

    2018-03-01

    Copula modeling consists in finding a probabilistic distribution, called copula, whereby its coupling with the marginal distributions of a set of random variables produces their joint distribution. The present work aims to use this technique to connect the statistical distributions of weakly chaotic dynamics and deterministic subdiffusion. More precisely, we decompose the jumps distribution of Geisel-Thomae map into a bivariate one and determine the marginal and copula distributions respectively by infinite ergodic theory and statistical inference techniques. We verify therefore that the characteristic tail distribution of subdiffusion is an extreme value copula coupling Mittag-Leffler distributions. We also present a method to calculate the exact copula and joint distributions in the case where weakly chaotic dynamics and deterministic subdiffusion statistical distributions are already known. Numerical simulations and consistency with the dynamical aspects of the map support our results.

  5. Statistical study of air pollutant concentrations via generalized gamma distribution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marani, A.; Lavagnini, I.; Buttazzoni, C.

    1986-11-01

    This paper deals with modeling observed frequency distributions of air quality data measured in the area of Venice, Italy. The paper discusses the application of the generalized gamma distribution (ggd) which has not been commonly applied to air quality data notwithstanding the fact that it embodies most distribution models used for air quality analyses. The approach yields important simplifications for statistical analyses. A comparison among the ggd and other relevant models (standard gamma, Weibull, lognormal), carried out on daily sulfur dioxide concentrations in the area of Venice underlines the efficiency of ggd models in portraying experimental data.

  6. Fragment size distribution statistics in dynamic fragmentation of laser shock-loaded tin

    NASA Astrophysics Data System (ADS)

    He, Weihua; Xin, Jianting; Zhao, Yongqiang; Chu, Genbai; Xi, Tao; Shui, Min; Lu, Feng; Gu, Yuqiu

    2017-06-01

    This work investigates the geometric statistics method to characterize the size distribution of tin fragments produced in the laser shock-loaded dynamic fragmentation process. In the shock experiments, the ejection of the tin sample with etched V-shape groove in the free surface are collected by the soft recovery technique. Subsequently, the produced fragments are automatically detected with the fine post-shot analysis techniques including the X-ray micro-tomography and the improved watershed method. To characterize the size distributions of the fragments, a theoretical random geometric statistics model based on Poisson mixtures is derived for dynamic heterogeneous fragmentation problem, which reveals linear combinational exponential distribution. The experimental data related to fragment size distributions of the laser shock-loaded tin sample are examined with the proposed theoretical model, and its fitting performance is compared with that of other state-of-the-art fragment size distribution models. The comparison results prove that our proposed model can provide far more reasonable fitting result for the laser shock-loaded tin.

  7. The statistical average of optical properties for alumina particle cluster in aircraft plume

    NASA Astrophysics Data System (ADS)

    Li, Jingying; Bai, Lu; Wu, Zhensen; Guo, Lixin

    2018-04-01

    We establish a model for lognormal distribution of monomer radius and number of alumina particle clusters in plume. According to the Multi-Sphere T Matrix (MSTM) theory, we provide a method for finding the statistical average of optical properties for alumina particle clusters in plume, analyze the effect of different distributions and different detection wavelengths on the statistical average of optical properties for alumina particle cluster, and compare the statistical average optical properties under the alumina particle cluster model established in this study and those under three simplified alumina particle models. The calculation results show that the monomer number of alumina particle cluster and its size distribution have a considerable effect on its statistical average optical properties. The statistical average of optical properties for alumina particle cluster at common detection wavelengths exhibit obvious differences, whose differences have a great effect on modeling IR and UV radiation properties of plume. Compared with the three simplified models, the alumina particle cluster model herein features both higher extinction and scattering efficiencies. Therefore, we may find that an accurate description of the scattering properties of alumina particles in aircraft plume is of great significance in the study of plume radiation properties.

  8. Comparing Simulated and Theoretical Sampling Distributions of the U3 Person-Fit Statistic.

    ERIC Educational Resources Information Center

    Emons, Wilco H. M.; Meijer, Rob R.; Sijtsma, Klaas

    2002-01-01

    Studied whether the theoretical sampling distribution of the U3 person-fit statistic is in agreement with the simulated sampling distribution under different item response theory models and varying item and test characteristics. Simulation results suggest that the use of standard normal deviates for the standardized version of the U3 statistic may…

  9. Derivation of the Statistical Distribution of the Mass Peak Centroids of Mass Spectrometers Employing Analog-to-Digital Converters and Electron Multipliers

    DOE PAGES

    Ipsen, Andreas

    2017-02-03

    Here, the mass peak centroid is a quantity that is at the core of mass spectrometry (MS). However, despite its central status in the field, models of its statistical distribution are often chosen quite arbitrarily and without attempts at establishing a proper theoretical justification for their use. Recent work has demonstrated that for mass spectrometers employing analog-to-digital converters (ADCs) and electron multipliers, the statistical distribution of the mass peak intensity can be described via a relatively simple model derived essentially from first principles. Building on this result, the following article derives the corresponding statistical distribution for the mass peak centroidsmore » of such instruments. It is found that for increasing signal strength, the centroid distribution converges to a Gaussian distribution whose mean and variance are determined by physically meaningful parameters and which in turn determine bias and variability of the m/z measurements of the instrument. Through the introduction of the concept of “pulse-peak correlation”, the model also elucidates the complicated relationship between the shape of the voltage pulses produced by the preamplifier and the mean and variance of the centroid distribution. The predictions of the model are validated with empirical data and with Monte Carlo simulations.« less

  10. Derivation of the Statistical Distribution of the Mass Peak Centroids of Mass Spectrometers Employing Analog-to-Digital Converters and Electron Multipliers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ipsen, Andreas

    Here, the mass peak centroid is a quantity that is at the core of mass spectrometry (MS). However, despite its central status in the field, models of its statistical distribution are often chosen quite arbitrarily and without attempts at establishing a proper theoretical justification for their use. Recent work has demonstrated that for mass spectrometers employing analog-to-digital converters (ADCs) and electron multipliers, the statistical distribution of the mass peak intensity can be described via a relatively simple model derived essentially from first principles. Building on this result, the following article derives the corresponding statistical distribution for the mass peak centroidsmore » of such instruments. It is found that for increasing signal strength, the centroid distribution converges to a Gaussian distribution whose mean and variance are determined by physically meaningful parameters and which in turn determine bias and variability of the m/z measurements of the instrument. Through the introduction of the concept of “pulse-peak correlation”, the model also elucidates the complicated relationship between the shape of the voltage pulses produced by the preamplifier and the mean and variance of the centroid distribution. The predictions of the model are validated with empirical data and with Monte Carlo simulations.« less

  11. Statistical wind analysis for near-space applications

    NASA Astrophysics Data System (ADS)

    Roney, Jason A.

    2007-09-01

    Statistical wind models were developed based on the existing observational wind data for near-space altitudes between 60 000 and 100 000 ft (18 30 km) above ground level (AGL) at two locations, Akon, OH, USA, and White Sands, NM, USA. These two sites are envisioned as playing a crucial role in the first flights of high-altitude airships. The analysis shown in this paper has not been previously applied to this region of the stratosphere for such an application. Standard statistics were compiled for these data such as mean, median, maximum wind speed, and standard deviation, and the data were modeled with Weibull distributions. These statistics indicated, on a yearly average, there is a lull or a “knee” in the wind between 65 000 and 72 000 ft AGL (20 22 km). From the standard statistics, trends at both locations indicated substantial seasonal variation in the mean wind speed at these heights. The yearly and monthly statistical modeling indicated that Weibull distributions were a reasonable model for the data. Forecasts and hindcasts were done by using a Weibull model based on 2004 data and comparing the model with the 2003 and 2005 data. The 2004 distribution was also a reasonable model for these years. Lastly, the Weibull distribution and cumulative function were used to predict the 50%, 95%, and 99% winds, which are directly related to the expected power requirements of a near-space station-keeping airship. These values indicated that using only the standard deviation of the mean may underestimate the operational conditions.

  12. Heavy residues from very mass asymmetric heavy ion reactions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hanold, Karl Alan

    1994-08-01

    The isotopic production cross sections and momenta of all residues with nuclear charge (Z) greater than 39 from the reaction of 26, 40, and 50 MeV/nucleon 129Xe + Be, C, and Al were measured. The isotopic cross sections, the momentum distribution for each isotope, and the cross section as a function of nuclear charge and momentum are presented here. The new cross sections are consistent with previous measurements of the cross sections from similar reaction systems. The shape of the cross section distribution, when considered as a function of Z and velocity, was found to be qualitatively consistent with thatmore » expected from an incomplete fusion reaction mechanism. An incomplete fusion model coupled to a statistical decay model is able to reproduce many features of these reactions: the shapes of the elemental cross section distributions, the emission velocity distributions for the intermediate mass fragments, and the Z versus velocity distributions. This model gives a less satisfactory prediction of the momentum distribution for each isotope. A very different model based on the Boltzman-Nordheim-Vlasov equation and which was also coupled to a statistical decay model reproduces many features of these reactions: the shapes of the elemental cross section distributions, the intermediate mass fragment emission velocity distributions, and the Z versus momentum distributions. Both model calculations over-estimate the average mass for each element by two mass units and underestimate the isotopic and isobaric widths of the experimental distributions. It is shown that the predicted average mass for each element can be brought into agreement with the data by small, but systematic, variation of the particle emission barriers used in the statistical model. The predicted isotopic and isobaric widths of the cross section distributions can not be brought into agreement with the experimental data using reasonable parameters for the statistical model.« less

  13. A generalized statistical model for the size distribution of wealth

    NASA Astrophysics Data System (ADS)

    Clementi, F.; Gallegati, M.; Kaniadakis, G.

    2012-12-01

    In a recent paper in this journal (Clementi et al 2009 J. Stat. Mech. P02037), we proposed a new, physically motivated, distribution function for modeling individual incomes, having its roots in the framework of the κ-generalized statistical mechanics. The performance of the κ-generalized distribution was checked against real data on personal income for the United States in 2003. In this paper we extend our previous model so as to be able to account for the distribution of wealth. Probabilistic functions and inequality measures of this generalized model for wealth distribution are obtained in closed form. In order to check the validity of the proposed model, we analyze the US household wealth distributions from 1984 to 2009 and conclude an excellent agreement with the data that is superior to any other model already known in the literature.

  14. Dominant role of many-body effects on the carrier distribution function of quantum dot lasers

    NASA Astrophysics Data System (ADS)

    Peyvast, Negin; Zhou, Kejia; Hogg, Richard A.; Childs, David T. D.

    2016-03-01

    The effects of free-carrier-induced shift and broadening on the carrier distribution function are studied considering different extreme cases for carrier statistics (Fermi-Dirac and random carrier distributions) as well as quantum dot (QD) ensemble inhomogeneity and state separation using a Monte Carlo model. Using this model, we show that the dominant factor determining the carrier distribution function is the free carrier effects and not the choice of carrier statistics. By using empirical values of the free-carrier-induced shift and broadening, good agreement is obtained with experimental data of QD materials obtained under electrical injection for both extreme cases of carrier statistics.

  15. Multi-scale Characterization and Modeling of Surface Slope Probability Distribution for ~20-km Diameter Lunar Craters

    NASA Astrophysics Data System (ADS)

    Mahanti, P.; Robinson, M. S.; Boyd, A. K.

    2013-12-01

    Craters ~20-km diameter and above significantly shaped the lunar landscape. The statistical nature of the slope distribution on their walls and floors dominate the overall slope distribution statistics for the lunar surface. Slope statistics are inherently useful for characterizing the current topography of the surface, determining accurate photometric and surface scattering properties, and in defining lunar surface trafficability [1-4]. Earlier experimental studies on the statistical nature of lunar surface slopes were restricted either by resolution limits (Apollo era photogrammetric studies) or by model error considerations (photoclinometric and radar scattering studies) where the true nature of slope probability distribution was not discernible at baselines smaller than a kilometer[2,3,5]. Accordingly, historical modeling of lunar surface slopes probability distributions for applications such as in scattering theory development or rover traversability assessment is more general in nature (use of simple statistical models such as the Gaussian distribution[1,2,5,6]). With the advent of high resolution, high precision topographic models of the Moon[7,8], slopes in lunar craters can now be obtained at baselines as low as 6-meters allowing unprecedented multi-scale (multiple baselines) modeling possibilities for slope probability distributions. Topographic analysis (Lunar Reconnaissance Orbiter Camera (LROC) Narrow Angle Camera (NAC) 2-m digital elevation models (DEM)) of ~20-km diameter Copernican lunar craters revealed generally steep slopes on interior walls (30° to 36°, locally exceeding 40°) over 15-meter baselines[9]. In this work, we extend the analysis from a probability distribution modeling point-of-view with NAC DEMs to characterize the slope statistics for the floors and walls for the same ~20-km Copernican lunar craters. The difference in slope standard deviations between the Gaussian approximation and the actual distribution (2-meter sampling) was computed over multiple scales. This slope analysis showed that local slope distributions are non-Gaussian for both crater walls and floors. Over larger baselines (~100 meters), crater wall slope probability distributions do approximate Gaussian distributions better, but have long distribution tails. Crater floor probability distributions however, were always asymmetric (for the baseline scales analyzed) and less affected by baseline scale variations. Accordingly, our results suggest that use of long tailed probability distributions (like Cauchy) and a baseline-dependant multi-scale model can be more effective in describing the slope statistics for lunar topography. Refrences: [1]Moore, H.(1971), JGR,75(11) [2]Marcus, A. H.(1969),JGR,74 (22).[3]R.J. Pike (1970),U.S. Geological Survey Working Paper [4]N. C. Costes, J. E. Farmer and E. B. George (1972),NASA Technical Report TR R-401 [5]M. N. Parker and G. L. Tyler(1973), Radio Science, 8(3),177-184 [6]Alekseev, V. A.et al (1968), Soviet Astronomy, Vol. 11, p.860 [7]Burns et al. (2012) Int. Arch. Photogramm. Remote Sens. Spatial Inf. Sci., XXXIX-B4, 483-488.[8]Smith et al. (2010) GRL 37, L18204, DOI: 10.1029/2010GL043751. [9]Wagner R., Robinson, M., Speyerer E., Mahanti, P., LPSC 2013, #2924.

  16. Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison.

    PubMed

    Dai, Qi; Yang, Yanchun; Wang, Tianming

    2008-10-15

    Many proposed statistical measures can efficiently compare biological sequences to further infer their structures, functions and evolutionary information. They are related in spirit because all the ideas for sequence comparison try to use the information on the k-word distributions, Markov model or both. Motivated by adding k-word distributions to Markov model directly, we investigated two novel statistical measures for sequence comparison, called wre.k.r and S2.k.r. The proposed measures were tested by similarity search, evaluation on functionally related regulatory sequences and phylogenetic analysis. This offers the systematic and quantitative experimental assessment of our measures. Moreover, we compared our achievements with these based on alignment or alignment-free. We grouped our experiments into two sets. The first one, performed via ROC (receiver operating curve) analysis, aims at assessing the intrinsic ability of our statistical measures to search for similar sequences from a database and discriminate functionally related regulatory sequences from unrelated sequences. The second one aims at assessing how well our statistical measure is used for phylogenetic analysis. The experimental assessment demonstrates that our similarity measures intending to incorporate k-word distributions into Markov model are more efficient.

  17. Regional climate model downscaling may improve the prediction of alien plant species distributions

    NASA Astrophysics Data System (ADS)

    Liu, Shuyan; Liang, Xin-Zhong; Gao, Wei; Stohlgren, Thomas J.

    2014-12-01

    Distributions of invasive species are commonly predicted with species distribution models that build upon the statistical relationships between observed species presence data and climate data. We used field observations, climate station data, and Maximum Entropy species distribution models for 13 invasive plant species in the United States, and then compared the models with inputs from a General Circulation Model (hereafter GCM-based models) and a downscaled Regional Climate Model (hereafter, RCM-based models).We also compared species distributions based on either GCM-based or RCM-based models for the present (1990-1999) to the future (2046-2055). RCM-based species distribution models replicated observed distributions remarkably better than GCM-based models for all invasive species under the current climate. This was shown for the presence locations of the species, and by using four common statistical metrics to compare modeled distributions. For two widespread invasive taxa ( Bromus tectorum or cheatgrass, and Tamarix spp. or tamarisk), GCM-based models failed miserably to reproduce observed species distributions. In contrast, RCM-based species distribution models closely matched observations. Future species distributions may be significantly affected by using GCM-based inputs. Because invasive plants species often show high resilience and low rates of local extinction, RCM-based species distribution models may perform better than GCM-based species distribution models for planning containment programs for invasive species.

  18. The invariant statistical rule of aerosol scattering pulse signal modulated by random noise

    NASA Astrophysics Data System (ADS)

    Yan, Zhen-gang; Bian, Bao-Min; Yang, Juan; Peng, Gang; Li, Zhen-hua

    2010-11-01

    A model of the random background noise acting on particle signals is established to study the impact of the background noise of the photoelectric sensor in the laser airborne particle counter on the statistical character of the aerosol scattering pulse signals. The results show that the noises broaden the statistical distribution of the particle's measurement. Further numerical research shows that the output of the signal amplitude still has the same distribution when the airborne particle with the lognormal distribution was modulated by random noise which has lognormal distribution. Namely it follows the statistics law of invariance. Based on this model, the background noise of photoelectric sensor and the counting distributions of random signal for aerosol's scattering pulse are obtained and analyzed by using a high-speed data acquisition card PCI-9812. It is found that the experiment results and simulation results are well consistent.

  19. A methodology to design heuristics for model selection based on the characteristics of data: Application to investigate when the Negative Binomial Lindley (NB-L) is preferred over the Negative Binomial (NB).

    PubMed

    Shirazi, Mohammadali; Dhavala, Soma Sekhar; Lord, Dominique; Geedipally, Srinivas Reddy

    2017-10-01

    Safety analysts usually use post-modeling methods, such as the Goodness-of-Fit statistics or the Likelihood Ratio Test, to decide between two or more competitive distributions or models. Such metrics require all competitive distributions to be fitted to the data before any comparisons can be accomplished. Given the continuous growth in introducing new statistical distributions, choosing the best one using such post-modeling methods is not a trivial task, in addition to all theoretical or numerical issues the analyst may face during the analysis. Furthermore, and most importantly, these measures or tests do not provide any intuitions into why a specific distribution (or model) is preferred over another (Goodness-of-Logic). This paper ponders into these issues by proposing a methodology to design heuristics for Model Selection based on the characteristics of data, in terms of descriptive summary statistics, before fitting the models. The proposed methodology employs two analytic tools: (1) Monte-Carlo Simulations and (2) Machine Learning Classifiers, to design easy heuristics to predict the label of the 'most-likely-true' distribution for analyzing data. The proposed methodology was applied to investigate when the recently introduced Negative Binomial Lindley (NB-L) distribution is preferred over the Negative Binomial (NB) distribution. Heuristics were designed to select the 'most-likely-true' distribution between these two distributions, given a set of prescribed summary statistics of data. The proposed heuristics were successfully compared against classical tests for several real or observed datasets. Not only they are easy to use and do not need any post-modeling inputs, but also, using these heuristics, the analyst can attain useful information about why the NB-L is preferred over the NB - or vice versa- when modeling data. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Selecting the right statistical model for analysis of insect count data by using information theoretic measures.

    PubMed

    Sileshi, G

    2006-10-01

    Researchers and regulatory agencies often make statistical inferences from insect count data using modelling approaches that assume homogeneous variance. Such models do not allow for formal appraisal of variability which in its different forms is the subject of interest in ecology. Therefore, the objectives of this paper were to (i) compare models suitable for handling variance heterogeneity and (ii) select optimal models to ensure valid statistical inferences from insect count data. The log-normal, standard Poisson, Poisson corrected for overdispersion, zero-inflated Poisson, the negative binomial distribution and zero-inflated negative binomial models were compared using six count datasets on foliage-dwelling insects and five families of soil-dwelling insects. Akaike's and Schwarz Bayesian information criteria were used for comparing the various models. Over 50% of the counts were zeros even in locally abundant species such as Ootheca bennigseni Weise, Mesoplatys ochroptera Stål and Diaecoderus spp. The Poisson model after correction for overdispersion and the standard negative binomial distribution model provided better description of the probability distribution of seven out of the 11 insects than the log-normal, standard Poisson, zero-inflated Poisson or zero-inflated negative binomial models. It is concluded that excess zeros and variance heterogeneity are common data phenomena in insect counts. If not properly modelled, these properties can invalidate the normal distribution assumptions resulting in biased estimation of ecological effects and jeopardizing the integrity of the scientific inferences. Therefore, it is recommended that statistical models appropriate for handling these data properties be selected using objective criteria to ensure efficient statistical inference.

  1. Statistical distribution of mechanical properties for three graphite-epoxy material systems

    NASA Technical Reports Server (NTRS)

    Reese, C.; Sorem, J., Jr.

    1981-01-01

    Graphite-epoxy composites are playing an increasing role as viable alternative materials in structural applications necessitating thorough investigation into the predictability and reproducibility of their material strength properties. This investigation was concerned with tension, compression, and short beam shear coupon testing of large samples from three different material suppliers to determine their statistical strength behavior. Statistical results indicate that a two Parameter Weibull distribution model provides better overall characterization of material behavior for the graphite-epoxy systems tested than does the standard Normal distribution model that is employed for most design work. While either a Weibull or Normal distribution model provides adequate predictions for average strength values, the Weibull model provides better characterization in the lower tail region where the predictions are of maximum design interest. The two sets of the same material were found to have essentially the same material properties, and indicate that repeatability can be achieved.

  2. Statistical thermodynamics of a two-dimensional relativistic gas.

    PubMed

    Montakhab, Afshin; Ghodrat, Malihe; Barati, Mahmood

    2009-03-01

    In this paper we study a fully relativistic model of a two-dimensional hard-disk gas. This model avoids the general problems associated with relativistic particle collisions and is therefore an ideal system to study relativistic effects in statistical thermodynamics. We study this model using molecular-dynamics simulation, concentrating on the velocity distribution functions. We obtain results for x and y components of velocity in the rest frame (Gamma) as well as the moving frame (Gamma;{'}) . Our results confirm that Jüttner distribution is the correct generalization of Maxwell-Boltzmann distribution. We obtain the same "temperature" parameter beta for both frames consistent with a recent study of a limited one-dimensional model. We also address the controversial topic of temperature transformation. We show that while local thermal equilibrium holds in the moving frame, relying on statistical methods such as distribution functions or equipartition theorem are ultimately inconclusive in deciding on a correct temperature transformation law (if any).

  3. On the Determination of Poisson Statistics for Haystack Radar Observations of Orbital Debris

    NASA Technical Reports Server (NTRS)

    Stokely, Christopher L.; Benbrook, James R.; Horstman, Matt

    2007-01-01

    A convenient and powerful method is used to determine if radar detections of orbital debris are observed according to Poisson statistics. This is done by analyzing the time interval between detection events. For Poisson statistics, the probability distribution of the time interval between events is shown to be an exponential distribution. This distribution is a special case of the Erlang distribution that is used in estimating traffic loads on telecommunication networks. Poisson statistics form the basis of many orbital debris models but the statistical basis of these models has not been clearly demonstrated empirically until now. Interestingly, during the fiscal year 2003 observations with the Haystack radar in a fixed staring mode, there are no statistically significant deviations observed from that expected with Poisson statistics, either independent or dependent of altitude or inclination. One would potentially expect some significant clustering of events in time as a result of satellite breakups, but the presence of Poisson statistics indicates that such debris disperse rapidly with respect to Haystack's very narrow radar beam. An exception to Poisson statistics is observed in the months following the intentional breakup of the Fengyun satellite in January 2007.

  4. Counts-in-cylinders in the Sloan Digital Sky Survey with Comparisons to N-body Simulations

    NASA Astrophysics Data System (ADS)

    Berrier, Heather D.; Barton, Elizabeth J.; Berrier, Joel C.; Bullock, James S.; Zentner, Andrew R.; Wechsler, Risa H.

    2011-01-01

    Environmental statistics provide a necessary means of comparing the properties of galaxies in different environments, and a vital test of models of galaxy formation within the prevailing hierarchical cosmological model. We explore counts-in-cylinders, a common statistic defined as the number of companions of a particular galaxy found within a given projected radius and redshift interval. Galaxy distributions with the same two-point correlation functions do not necessarily have the same companion count distributions. We use this statistic to examine the environments of galaxies in the Sloan Digital Sky Survey Data Release 4 (SDSS DR4). We also make preliminary comparisons to four models for the spatial distributions of galaxies, based on N-body simulations and data from SDSS DR4, to study the utility of the counts-in-cylinders statistic. There is a very large scatter between the number of companions a galaxy has and the mass of its parent dark matter halo and the halo occupation, limiting the utility of this statistic for certain kinds of environmental studies. We also show that prevalent empirical models of galaxy clustering, that match observed two- and three-point clustering statistics well, fail to reproduce some aspects of the observed distribution of counts-in-cylinders on 1, 3, and 6 h -1 Mpc scales. All models that we explore underpredict the fraction of galaxies with few or no companions in 3 and 6 h -1 Mpc cylinders. Roughly 7% of galaxies in the real universe are significantly more isolated within a 6 h -1 Mpc cylinder than the galaxies in any of the models we use. Simple phenomenological models that map galaxies to dark matter halos fail to reproduce high-order clustering statistics in low-density environments.

  5. Statistical prediction with Kanerva's sparse distributed memory

    NASA Technical Reports Server (NTRS)

    Rogers, David

    1989-01-01

    A new viewpoint of the processing performed by Kanerva's sparse distributed memory (SDM) is presented. In conditions of near- or over-capacity, where the associative-memory behavior of the model breaks down, the processing performed by the model can be interpreted as that of a statistical predictor. Mathematical results are presented which serve as the framework for a new statistical viewpoint of sparse distributed memory and for which the standard formulation of SDM is a special case. This viewpoint suggests possible enhancements to the SDM model, including a procedure for improving the predictiveness of the system based on Holland's work with genetic algorithms, and a method for improving the capacity of SDM even when used as an associative memory.

  6. Menzerath-Altmann Law: Statistical Mechanical Interpretation as Applied to a Linguistic Organization

    NASA Astrophysics Data System (ADS)

    Eroglu, Sertac

    2014-10-01

    The distribution behavior described by the empirical Menzerath-Altmann law is frequently encountered during the self-organization of linguistic and non-linguistic natural organizations at various structural levels. This study presents a statistical mechanical derivation of the law based on the analogy between the classical particles of a statistical mechanical organization and the distinct words of a textual organization. The derived model, a transformed (generalized) form of the Menzerath-Altmann model, was termed as the statistical mechanical Menzerath-Altmann model. The derived model allows interpreting the model parameters in terms of physical concepts. We also propose that many organizations presenting the Menzerath-Altmann law behavior, whether linguistic or not, can be methodically examined by the transformed distribution model through the properly defined structure-dependent parameter and the energy associated states.

  7. Fitting statistical distributions to sea duck count data: implications for survey design and abundance estimation

    USGS Publications Warehouse

    Zipkin, Elise F.; Leirness, Jeffery B.; Kinlan, Brian P.; O'Connell, Allan F.; Silverman, Emily D.

    2014-01-01

    Determining appropriate statistical distributions for modeling animal count data is important for accurate estimation of abundance, distribution, and trends. In the case of sea ducks along the U.S. Atlantic coast, managers want to estimate local and regional abundance to detect and track population declines, to define areas of high and low use, and to predict the impact of future habitat change on populations. In this paper, we used a modified marked point process to model survey data that recorded flock sizes of Common eiders, Long-tailed ducks, and Black, Surf, and White-winged scoters. The data come from an experimental aerial survey, conducted by the United States Fish & Wildlife Service (USFWS) Division of Migratory Bird Management, during which east-west transects were flown along the Atlantic Coast from Maine to Florida during the winters of 2009–2011. To model the number of flocks per transect (the points), we compared the fit of four statistical distributions (zero-inflated Poisson, zero-inflated geometric, zero-inflated negative binomial and negative binomial) to data on the number of species-specific sea duck flocks that were recorded for each transect flown. To model the flock sizes (the marks), we compared the fit of flock size data for each species to seven statistical distributions: positive Poisson, positive negative binomial, positive geometric, logarithmic, discretized lognormal, zeta and Yule–Simon. Akaike’s Information Criterion and Vuong’s closeness tests indicated that the negative binomial and discretized lognormal were the best distributions for all species for the points and marks, respectively. These findings have important implications for estimating sea duck abundances as the discretized lognormal is a more skewed distribution than the Poisson and negative binomial, which are frequently used to model avian counts; the lognormal is also less heavy-tailed than the power law distributions (e.g., zeta and Yule–Simon), which are becoming increasingly popular for group size modeling. Choosing appropriate statistical distributions for modeling flock size data is fundamental to accurately estimating population summaries, determining required survey effort, and assessing and propagating uncertainty through decision-making processes.

  8. Statistical characteristics of storm interevent time, depth, and duration for eastern New Mexico, Oklahoma, and Texas

    USGS Publications Warehouse

    Asquith, William H.; Roussel, Meghan C.; Cleveland, Theodore G.; Fang, Xing; Thompson, David B.

    2006-01-01

    The design of small runoff-control structures, from simple floodwater-detention basins to sophisticated best-management practices, requires the statistical characterization of rainfall as a basis for cost-effective, risk-mitigated, hydrologic engineering design. The U.S. Geological Survey, in cooperation with the Texas Department of Transportation, has developed a framework to estimate storm statistics including storm interevent times, distributions of storm depths, and distributions of storm durations for eastern New Mexico, Oklahoma, and Texas. The analysis is based on hourly rainfall recorded by the National Weather Service. The database contains more than 155 million hourly values from 774 stations in the study area. Seven sets of maps depicting ranges of mean storm interevent time, mean storm depth, and mean storm duration, by county, as well as tables listing each of those statistics, by county, were developed. The mean storm interevent time is used in probabilistic models to assess the frequency distribution of storms. The Poisson distribution is suggested to model the distribution of storm occurrence, and the exponential distribution is suggested to model the distribution of storm interevent times. The four-parameter kappa distribution is judged as an appropriate distribution for modeling the distribution of both storm depth and storm duration. Preference for the kappa distribution is based on interpretation of L-moment diagrams. Parameter estimates for the kappa distributions are provided. Separate dimensionless frequency curves for storm depth and duration are defined for eastern New Mexico, Oklahoma, and Texas. Dimension is restored by multiplying curve ordinates by the mean storm depth or mean storm duration to produce quantile functions of storm depth and duration. Minimum interevent time and location have slight influence on the scale and shape of the dimensionless frequency curves. Ten example problems and solutions to possible applications are provided.

  9. Statistical prescission point model of fission fragment angular distributions

    NASA Astrophysics Data System (ADS)

    John, Bency; Kataria, S. K.

    1998-03-01

    In light of recent developments in fission studies such as slow saddle to scission motion and spin equilibration near the scission point, the theory of fission fragment angular distribution is examined and a new statistical prescission point model is developed. The conditional equilibrium of the collective angular bearing modes at the prescission point, which is guided mainly by their relaxation times and population probabilities, is taken into account in the present model. The present model gives a consistent description of the fragment angular and spin distributions for a wide variety of heavy and light ion induced fission reactions.

  10. Extreme value statistics analysis of fracture strengths of a sintered silicon nitride failing from pores

    NASA Technical Reports Server (NTRS)

    Chao, Luen-Yuan; Shetty, Dinesh K.

    1992-01-01

    Statistical analysis and correlation between pore-size distribution and fracture strength distribution using the theory of extreme-value statistics is presented for a sintered silicon nitride. The pore-size distribution on a polished surface of this material was characterized, using an automatic optical image analyzer. The distribution measured on the two-dimensional plane surface was transformed to a population (volume) distribution, using the Schwartz-Saltykov diameter method. The population pore-size distribution and the distribution of the pore size at the fracture origin were correllated by extreme-value statistics. Fracture strength distribution was then predicted from the extreme-value pore-size distribution, usin a linear elastic fracture mechanics model of annular crack around pore and the fracture toughness of the ceramic. The predicted strength distribution was in good agreement with strength measurements in bending. In particular, the extreme-value statistics analysis explained the nonlinear trend in the linearized Weibull plot of measured strengths without postulating a lower-bound strength.

  11. Statistics of Dark Matter Halos from Gravitational Lensing.

    PubMed

    Jain; Van Waerbeke L

    2000-02-10

    We present a new approach to measure the mass function of dark matter halos and to discriminate models with differing values of Omega through weak gravitational lensing. We measure the distribution of peaks from simulated lensing surveys and show that the lensing signal due to dark matter halos can be detected for a wide range of peak heights. Even when the signal-to-noise ratio is well below the limit for detection of individual halos, projected halo statistics can be constrained for halo masses spanning galactic to cluster halos. The use of peak statistics relies on an analytical model of the noise due to the intrinsic ellipticities of source galaxies. The noise model has been shown to accurately describe simulated data for a variety of input ellipticity distributions. We show that the measured peak distribution has distinct signatures of gravitational lensing, and its non-Gaussian shape can be used to distinguish models with different values of Omega. The use of peak statistics is complementary to the measurement of field statistics, such as the ellipticity correlation function, and is possibly not susceptible to the same systematic errors.

  12. Limitations of the Porter-Thomas distribution

    NASA Astrophysics Data System (ADS)

    Weidenmüller, Hans A.

    2017-12-01

    Data on the distribution of reduced partial neutron widths and on the distribution of total gamma decay widths disagree with the Porter-Thomas distribution (PTD) for reduced partial widths or with predictions of the statistical model. We recall why the disagreement is important: The PTD is a direct consequence of the orthogonal invariance of the Gaussian Orthogonal Ensemble (GOE) of random matrices. The disagreement is reviewed. Two possible causes for violation of orthogonal invariance of the GOE are discussed, and their consequences explored. The disagreement of the distribution of total gamma decay widths with theoretical predictions cannot be blamed on the statistical model.

  13. Statistical Models of Fracture Relevant to Nuclear-Grade Graphite: Review and Recommendations

    NASA Technical Reports Server (NTRS)

    Nemeth, Noel N.; Bratton, Robert L.

    2011-01-01

    The nuclear-grade (low-impurity) graphite needed for the fuel element and moderator material for next-generation (Gen IV) reactors displays large scatter in strength and a nonlinear stress-strain response from damage accumulation. This response can be characterized as quasi-brittle. In this expanded review, relevant statistical failure models for various brittle and quasi-brittle material systems are discussed with regard to strength distribution, size effect, multiaxial strength, and damage accumulation. This includes descriptions of the Weibull, Batdorf, and Burchell models as well as models that describe the strength response of composite materials, which involves distributed damage. Results from lattice simulations are included for a physics-based description of material breakdown. Consideration is given to the predicted transition between brittle and quasi-brittle damage behavior versus the density of damage (level of disorder) within the material system. The literature indicates that weakest-link-based failure modeling approaches appear to be reasonably robust in that they can be applied to materials that display distributed damage, provided that the level of disorder in the material is not too large. The Weibull distribution is argued to be the most appropriate statistical distribution to model the stochastic-strength response of graphite.

  14. Evaluation of high-resolution sea ice models on the basis of statistical and scaling properties of Arctic sea ice drift and deformation

    NASA Astrophysics Data System (ADS)

    Girard, L.; Weiss, J.; Molines, J. M.; Barnier, B.; Bouillon, S.

    2009-08-01

    Sea ice drift and deformation from models are evaluated on the basis of statistical and scaling properties. These properties are derived from two observation data sets: the RADARSAT Geophysical Processor System (RGPS) and buoy trajectories from the International Arctic Buoy Program (IABP). Two simulations obtained with the Louvain-la-Neuve Ice Model (LIM) coupled to a high-resolution ocean model and a simulation obtained with the Los Alamos Sea Ice Model (CICE) were analyzed. Model ice drift compares well with observations in terms of large-scale velocity field and distributions of velocity fluctuations although a significant bias on the mean ice speed is noted. On the other hand, the statistical properties of ice deformation are not well simulated by the models: (1) The distributions of strain rates are incorrect: RGPS distributions of strain rates are power law tailed, i.e., exhibit "wild randomness," whereas models distributions remain in the Gaussian attraction basin, i.e., exhibit "mild randomness." (2) The models are unable to reproduce the spatial and temporal correlations of the deformation fields: In the observations, ice deformation follows spatial and temporal scaling laws that express the heterogeneity and the intermittency of deformation. These relations do not appear in simulated ice deformation. Mean deformation in models is almost scale independent. The statistical properties of ice deformation are a signature of the ice mechanical behavior. The present work therefore suggests that the mechanical framework currently used by models is inappropriate. A different modeling framework based on elastic interactions could improve the representation of the statistical and scaling properties of ice deformation.

  15. Statistics of acoustic emissions and stress drops during granular shearing using a stick-slip fiber bundle mode

    NASA Astrophysics Data System (ADS)

    Cohen, D.; Michlmayr, G.; Or, D.

    2012-04-01

    Shearing of dense granular materials appears in many engineering and Earth sciences applications. Under a constant strain rate, the shearing stress at steady state oscillates with slow rises followed by rapid drops that are linked to the build up and failure of force chains. Experiments indicate that these drops display exponential statistics. Measurements of acoustic emissions during shearing indicates that the energy liberated by failure of these force chains has power-law statistics. Representing force chains as fibers, we use a stick-slip fiber bundle model to obtain analytical solutions of the statistical distribution of stress drops and failure energy. In the model, fibers stretch, fail, and regain strength during deformation. Fibers have Weibull-distributed threshold strengths with either quenched and annealed disorder. The shape of the distribution for drops and energy obtained from the model are similar to those measured during shearing experiments. This simple model may be useful to identify failure events linked to force chain failures. Future generalizations of the model that include different types of fiber failure may also allow identification of different types of granular failures that have distinct statistical acoustic emission signatures.

  16. Developing Teachers' Reasoning about Comparing Distributions: A Cross-Institutional Effort

    ERIC Educational Resources Information Center

    Tran, Dung; Lee, Hollylynne; Doerr, Helen

    2016-01-01

    The research reported here uses a pre/post-test model and stimulated recall interviews to assess teachers' statistical reasoning about comparing distributions, when enrolled in a graduate-level statistics education course. We discuss key aspects of the course design aimed at improving teachers' learning and teaching of statistics, and the…

  17. Statistical methods for investigating quiescence and other temporal seismicity patterns

    USGS Publications Warehouse

    Matthews, M.V.; Reasenberg, P.A.

    1988-01-01

    We propose a statistical model and a technique for objective recognition of one of the most commonly cited seismicity patterns:microearthquake quiescence. We use a Poisson process model for seismicity and define a process with quiescence as one with a particular type of piece-wise constant intensity function. From this model, we derive a statistic for testing stationarity against a 'quiescence' alternative. The large-sample null distribution of this statistic is approximated from simulated distributions of appropriate functionals applied to Brownian bridge processes. We point out the restrictiveness of the particular model we propose and of the quiescence idea in general. The fact that there are many point processes which have neither constant nor quiescent rate functions underscores the need to test for and describe nonuniformity thoroughly. We advocate the use of the quiescence test in conjunction with various other tests for nonuniformity and with graphical methods such as density estimation. ideally these methods may promote accurate description of temporal seismicity distributions and useful characterizations of interesting patterns. ?? 1988 Birkha??user Verlag.

  18. Exponential order statistic models of software reliability growth

    NASA Technical Reports Server (NTRS)

    Miller, D. R.

    1985-01-01

    Failure times of a software reliabilty growth process are modeled as order statistics of independent, nonidentically distributed exponential random variables. The Jelinsky-Moranda, Goel-Okumoto, Littlewood, Musa-Okumoto Logarithmic, and Power Law models are all special cases of Exponential Order Statistic Models, but there are many additional examples also. Various characterizations, properties and examples of this class of models are developed and presented.

  19. Invasive Species Distribution Modeling (iSDM): Are absence data and dispersal constraints needed to predict actual distributions?

    Treesearch

    Tomáš Václavík; Ross K. Meentemeyer

    2009-01-01

    Species distribution models (SDMs) based on statistical relationships between occurrence data and underlying environmental conditions are increasingly used to predict spatial patterns of biological invasions and prioritize locations for early detection and control of invasion outbreaks. However, invasive species distribution models (iSDMs) face special challenges...

  20. Predicting the potential distribution of invasive exotic species using GIS and information-theoretic approaches: A case of ragweed (Ambrosia artemisiifolia L.) distribution in China

    USGS Publications Warehouse

    Hao, Chen; LiJun, Chen; Albright, Thomas P.

    2007-01-01

    Invasive exotic species pose a growing threat to the economy, public health, and ecological integrity of nations worldwide. Explaining and predicting the spatial distribution of invasive exotic species is of great importance to prevention and early warning efforts. We are investigating the potential distribution of invasive exotic species, the environmental factors that influence these distributions, and the ability to predict them using statistical and information-theoretic approaches. For some species, detailed presence/absence occurrence data are available, allowing the use of a variety of standard statistical techniques. However, for most species, absence data are not available. Presented with the challenge of developing a model based on presence-only information, we developed an improved logistic regression approach using Information Theory and Frequency Statistics to produce a relative suitability map. This paper generated a variety of distributions of ragweed (Ambrosia artemisiifolia L.) from logistic regression models applied to herbarium specimen location data and a suite of GIS layers including climatic, topographic, and land cover information. Our logistic regression model was based on Akaike's Information Criterion (AIC) from a suite of ecologically reasonable predictor variables. Based on the results we provided a new Frequency Statistical method to compartmentalize habitat-suitability in the native range. Finally, we used the model and the compartmentalized criterion developed in native ranges to "project" a potential distribution onto the exotic ranges to build habitat-suitability maps. ?? Science in China Press 2007.

  1. Statistical Mechanics of Prion Diseases

    NASA Astrophysics Data System (ADS)

    Slepoy, A.; Singh, R. R.; Pázmándi, F.; Kulkarni, R. V.; Cox, D. L.

    2001-07-01

    We present a two-dimensional, lattice based, protein-level statistical mechanical model for prion diseases (e.g., mad cow disease) with concomitant prion protein misfolding and aggregation. Our studies lead us to the hypothesis that the observed broad incubation time distribution in epidemiological data reflect fluctuation dominated growth seeded by a few nanometer scale aggregates, while much narrower incubation time distributions for innoculated lab animals arise from statistical self-averaging. We model ``species barriers'' to prion infection and assess a related treatment protocol.

  2. Estimating order statistics of network degrees

    NASA Astrophysics Data System (ADS)

    Chu, J.; Nadarajah, S.

    2018-01-01

    We model the order statistics of network degrees of big data sets by a range of generalised beta distributions. A three parameter beta distribution due to Libby and Novick (1982) is shown to give the best overall fit for at least four big data sets. The fit of this distribution is significantly better than the fit suggested by Olhede and Wolfe (2012) across the whole range of order statistics for all four data sets.

  3. Statistical distributions of avalanche size and waiting times in an inter-sandpile cascade model

    NASA Astrophysics Data System (ADS)

    Batac, Rene; Longjas, Anthony; Monterola, Christopher

    2012-02-01

    Sandpile-based models have successfully shed light on key features of nonlinear relaxational processes in nature, particularly the occurrence of fat-tailed magnitude distributions and exponential return times, from simple local stress redistributions. In this work, we extend the existing sandpile paradigm into an inter-sandpile cascade, wherein the avalanches emanating from a uniformly-driven sandpile (first layer) is used to trigger the next (second layer), and so on, in a successive fashion. Statistical characterizations reveal that avalanche size distributions evolve from a power-law p(S)≈S-1.3 for the first layer to gamma distributions p(S)≈Sαexp(-S/S0) for layers far away from the uniformly driven sandpile. The resulting avalanche size statistics is found to be associated with the corresponding waiting time distribution, as explained in an accompanying analytic formulation. Interestingly, both the numerical and analytic models show good agreement with actual inventories of non-uniformly driven events in nature.

  4. Random walk to a nonergodic equilibrium concept

    NASA Astrophysics Data System (ADS)

    Bel, G.; Barkai, E.

    2006-01-01

    Random walk models, such as the trap model, continuous time random walks, and comb models, exhibit weak ergodicity breaking, when the average waiting time is infinite. The open question is, what statistical mechanical theory replaces the canonical Boltzmann-Gibbs theory for such systems? In this paper a nonergodic equilibrium concept is investigated, for a continuous time random walk model in a potential field. In particular we show that in the nonergodic phase the distribution of the occupation time of the particle in a finite region of space approaches U- or W-shaped distributions related to the arcsine law. We show that when conditions of detailed balance are applied, these distributions depend on the partition function of the problem, thus establishing a relation between the nonergodic dynamics and canonical statistical mechanics. In the ergodic phase the distribution function of the occupation times approaches a δ function centered on the value predicted based on standard Boltzmann-Gibbs statistics. The relation of our work to single-molecule experiments is briefly discussed.

  5. Statistical methods and neural network approaches for classification of data from multiple sources

    NASA Technical Reports Server (NTRS)

    Benediktsson, Jon Atli; Swain, Philip H.

    1990-01-01

    Statistical methods for classification of data from multiple data sources are investigated and compared to neural network models. A problem with using conventional multivariate statistical approaches for classification of data of multiple types is in general that a multivariate distribution cannot be assumed for the classes in the data sources. Another common problem with statistical classification methods is that the data sources are not equally reliable. This means that the data sources need to be weighted according to their reliability but most statistical classification methods do not have a mechanism for this. This research focuses on statistical methods which can overcome these problems: a method of statistical multisource analysis and consensus theory. Reliability measures for weighting the data sources in these methods are suggested and investigated. Secondly, this research focuses on neural network models. The neural networks are distribution free since no prior knowledge of the statistical distribution of the data is needed. This is an obvious advantage over most statistical classification methods. The neural networks also automatically take care of the problem involving how much weight each data source should have. On the other hand, their training process is iterative and can take a very long time. Methods to speed up the training procedure are introduced and investigated. Experimental results of classification using both neural network models and statistical methods are given, and the approaches are compared based on these results.

  6. On the Seasonality of Sudden Stratospheric Warmings

    NASA Astrophysics Data System (ADS)

    Reichler, T.; Horan, M.

    2017-12-01

    The downward influence of sudden stratospheric warmings (SSWs) creates significant tropospheric circulation anomalies that last for weeks. It is therefore of theoretical and practical interest to understand the time when SSWs are most likely to occur and the controlling factors for the temporal distribution of SSWs. Conceivably, the distribution between mid-winter and late-winter is controlled by the interplay between decreasing eddy convergence in the region of the polar vortex and the weakening strength of the polar vortex. General circulation models (GCMs) tend to produce SSW maxima later in winter than observations, which has been considered as a model deficiency. However, the observed record is short, suggesting that under-sampling of SSWs may contribute to this discrepancy. Here, we study the climatological frequency distribution of SSWs and related events in a long control simulation with a stratosphere resolving GCM. We also create a simple statistical model to determine the primary factors controlling the SSW distribution. The statistical model is based on the daily climatological mean, standard deviation, and autocorrelation of stratospheric winds, and assumes that the winds follow a normal distribution. We find that the null hypothesis, that model and observations stem from the same distribution, cannot be rejected, suggesting that the mid-winter SSW maximum seen in the observations is due to sampling uncertainty. We also find that the statistical model faithfully reproduces the seasonal distribution of SSWs, and that the decreasing climatological strength of the polar vortex is the primary factor for it. We conclude that the late-winter SSW maximum seen in most models is realistic and that late events will be more prominent in future observations. We further conclude that SSWs simply form the tail of normally distributed stratospheric winds, suggesting that there is a continuum of weak polar vortex states and that statistically there is nothing special about the zero-threshold used to define SSWs.

  7. Sample sizes and model comparison metrics for species distribution models

    Treesearch

    B.B. Hanberry; H.S. He; D.C. Dey

    2012-01-01

    Species distribution models use small samples to produce continuous distribution maps. The question of how small a sample can be to produce an accurate model generally has been answered based on comparisons to maximum sample sizes of 200 observations or fewer. In addition, model comparisons often are made with the kappa statistic, which has become controversial....

  8. Modeling and experiments of the adhesion force distribution between particles and a surface.

    PubMed

    You, Siming; Wan, Man Pun

    2014-06-17

    Due to the existence of surface roughness in real surfaces, the adhesion force between particles and the surface where the particles are deposited exhibits certain statistical distributions. Despite the importance of adhesion force distribution in a variety of applications, the current understanding of modeling adhesion force distribution is still limited. In this work, an adhesion force distribution model based on integrating the root-mean-square (RMS) roughness distribution (i.e., the variation of RMS roughness on the surface in terms of location) into recently proposed mean adhesion force models was proposed. The integration was accomplished by statistical analysis and Monte Carlo simulation. A series of centrifuge experiments were conducted to measure the adhesion force distributions between polystyrene particles (146.1 ± 1.99 μm) and various substrates (stainless steel, aluminum and plastic, respectively). The proposed model was validated against the measured adhesion force distributions from this work and another previous study. Based on the proposed model, the effect of RMS roughness distribution on the adhesion force distribution of particles on a rough surface was explored, showing that both the median and standard deviation of adhesion force distribution could be affected by the RMS roughness distribution. The proposed model could predict both van der Waals force and capillary force distributions and consider the multiscale roughness feature, greatly extending the current capability of adhesion force distribution prediction.

  9. Probability and Statistics in Sensor Performance Modeling

    DTIC Science & Technology

    2010-12-01

    language software program is called Environmental Awareness for Sensor and Emitter Employment. Some important numerical issues in the implementation...3 Statistical analysis for measuring sensor performance...complementary cumulative distribution function cdf cumulative distribution function DST decision-support tool EASEE Environmental Awareness of

  10. SMART-DS: Synthetic Models for Advanced, Realistic Testing: Distribution

    Science.gov Websites

    statistical summary of the U.S. distribution systems World-class, high spatial/temporal resolution of solar Systems and Scenarios | Grid Modernization | NREL SMART-DS: Synthetic Models for Advanced , Realistic Testing: Distribution Systems and Scenarios SMART-DS: Synthetic Models for Advanced, Realistic

  11. Variety and volatility in financial markets

    NASA Astrophysics Data System (ADS)

    Lillo, Fabrizio; Mantegna, Rosario N.

    2000-11-01

    We study the price dynamics of stocks traded in a financial market by considering the statistical properties of both a single time series and an ensemble of stocks traded simultaneously. We use the n stocks traded on the New York Stock Exchange to form a statistical ensemble of daily stock returns. For each trading day of our database, we study the ensemble return distribution. We find that a typical ensemble return distribution exists in most of the trading days with the exception of crash and rally days and of the days following these extreme events. We analyze each ensemble return distribution by extracting its first two central moments. We observe that these moments fluctuate in time and are stochastic processes, themselves. We characterize the statistical properties of ensemble return distribution central moments by investigating their probability density functions and temporal correlation properties. In general, time-averaged and portfolio-averaged price returns have different statistical properties. We infer from these differences information about the relative strength of correlation between stocks and between different trading days. Last, we compare our empirical results with those predicted by the single-index model and we conclude that this simple model cannot explain the statistical properties of the second moment of the ensemble return distribution.

  12. Bayesian models: A statistical primer for ecologists

    USGS Publications Warehouse

    Hobbs, N. Thompson; Hooten, Mevin B.

    2015-01-01

    Bayesian modeling has become an indispensable tool for ecological research because it is uniquely suited to deal with complexity in a statistically coherent way. This textbook provides a comprehensive and accessible introduction to the latest Bayesian methods—in language ecologists can understand. Unlike other books on the subject, this one emphasizes the principles behind the computations, giving ecologists a big-picture understanding of how to implement this powerful statistical approach.Bayesian Models is an essential primer for non-statisticians. It begins with a definition of probability and develops a step-by-step sequence of connected ideas, including basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and inference from single and multiple models. This unique book places less emphasis on computer coding, favoring instead a concise presentation of the mathematical statistics needed to understand how and why Bayesian analysis works. It also explains how to write out properly formulated hierarchical Bayesian models and use them in computing, research papers, and proposals.This primer enables ecologists to understand the statistical principles behind Bayesian modeling and apply them to research, teaching, policy, and management.Presents the mathematical and statistical foundations of Bayesian modeling in language accessible to non-statisticiansCovers basic distribution theory, network diagrams, hierarchical models, Markov chain Monte Carlo, and moreDeemphasizes computer coding in favor of basic principlesExplains how to write out properly factored statistical expressions representing Bayesian models

  13. Comparisons of non-Gaussian statistical models in DNA methylation analysis.

    PubMed

    Ma, Zhanyu; Teschendorff, Andrew E; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-06-16

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance.

  14. Comparisons of Non-Gaussian Statistical Models in DNA Methylation Analysis

    PubMed Central

    Ma, Zhanyu; Teschendorff, Andrew E.; Yu, Hong; Taghia, Jalil; Guo, Jun

    2014-01-01

    As a key regulatory mechanism of gene expression, DNA methylation patterns are widely altered in many complex genetic diseases, including cancer. DNA methylation is naturally quantified by bounded support data; therefore, it is non-Gaussian distributed. In order to capture such properties, we introduce some non-Gaussian statistical models to perform dimension reduction on DNA methylation data. Afterwards, non-Gaussian statistical model-based unsupervised clustering strategies are applied to cluster the data. Comparisons and analysis of different dimension reduction strategies and unsupervised clustering methods are presented. Experimental results show that the non-Gaussian statistical model-based methods are superior to the conventional Gaussian distribution-based method. They are meaningful tools for DNA methylation analysis. Moreover, among several non-Gaussian methods, the one that captures the bounded nature of DNA methylation data reveals the best clustering performance. PMID:24937687

  15. Statistical analysis of water-quality data containing multiple detection limits: S-language software for regression on order statistics

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2005-01-01

    Trace contaminants in water, including metals and organics, often are measured at sufficiently low concentrations to be reported only as values below the instrument detection limit. Interpretation of these "less thans" is complicated when multiple detection limits occur. Statistical methods for multiply censored, or multiple-detection limit, datasets have been developed for medical and industrial statistics, and can be employed to estimate summary statistics or model the distributions of trace-level environmental data. We describe S-language-based software tools that perform robust linear regression on order statistics (ROS). The ROS method has been evaluated as one of the most reliable procedures for developing summary statistics of multiply censored data. It is applicable to any dataset that has 0 to 80% of its values censored. These tools are a part of a software library, or add-on package, for the R environment for statistical computing. This library can be used to generate ROS models and associated summary statistics, plot modeled distributions, and predict exceedance probabilities of water-quality standards. ?? 2005 Elsevier Ltd. All rights reserved.

  16. Quantifying risks with exact analytical solutions of derivative pricing distribution

    NASA Astrophysics Data System (ADS)

    Zhang, Kun; Liu, Jing; Wang, Erkang; Wang, Jin

    2017-04-01

    Derivative (i.e. option) pricing is essential for modern financial instrumentations. Despite of the previous efforts, the exact analytical forms of the derivative pricing distributions are still challenging to obtain. In this study, we established a quantitative framework using path integrals to obtain the exact analytical solutions of the statistical distribution for bond and bond option pricing for the Vasicek model. We discuss the importance of statistical fluctuations away from the expected option pricing characterized by the distribution tail and their associations to value at risk (VaR). The framework established here is general and can be applied to other financial derivatives for quantifying the underlying statistical distributions.

  17. Linear models: permutation methods

    USGS Publications Warehouse

    Cade, B.S.; Everitt, B.S.; Howell, D.C.

    2005-01-01

    Permutation tests (see Permutation Based Inference) for the linear model have applications in behavioral studies when traditional parametric assumptions about the error term in a linear model are not tenable. Improved validity of Type I error rates can be achieved with properly constructed permutation tests. Perhaps more importantly, increased statistical power, improved robustness to effects of outliers, and detection of alternative distributional differences can be achieved by coupling permutation inference with alternative linear model estimators. For example, it is well-known that estimates of the mean in linear model are extremely sensitive to even a single outlying value of the dependent variable compared to estimates of the median [7, 19]. Traditionally, linear modeling focused on estimating changes in the center of distributions (means or medians). However, quantile regression allows distributional changes to be estimated in all or any selected part of a distribution or responses, providing a more complete statistical picture that has relevance to many biological questions [6]...

  18. Frequency-selective fading statistics of shallow-water acoustic communication channel with a few multipaths

    NASA Astrophysics Data System (ADS)

    Bae, Minja; Park, Jihyun; Kim, Jongju; Xue, Dandan; Park, Kyu-Chil; Yoon, Jong Rak

    2016-07-01

    The bit error rate of an underwater acoustic communication system is related to multipath fading statistics, which determine the signal-to-noise ratio. The amplitude and delay of each path depend on sea surface roughness, propagation medium properties, and source-to-receiver range as a function of frequency. Therefore, received signals will show frequency-dependent fading. A shallow-water acoustic communication channel generally shows a few strong multipaths that interfere with each other and the resulting interference affects the fading statistics model. In this study, frequency-selective fading statistics are modeled on the basis of the phasor representation of the complex path amplitude. The fading statistics distribution is parameterized by the frequency-dependent constructive or destructive interference of multipaths. At a 16 m depth with a muddy bottom, a wave height of 0.2 m, and source-to-receiver ranges of 100 and 400 m, fading statistics tend to show a Rayleigh distribution at a destructive interference frequency, but a Rice distribution at a constructive interference frequency. The theoretical fading statistics well matched the experimental ones.

  19. Discriminative Random Field Models for Subsurface Contamination Uncertainty Quantification

    NASA Astrophysics Data System (ADS)

    Arshadi, M.; Abriola, L. M.; Miller, E. L.; De Paolis Kaluza, C.

    2017-12-01

    Application of flow and transport simulators for prediction of the release, entrapment, and persistence of dense non-aqueous phase liquids (DNAPLs) and associated contaminant plumes is a computationally intensive process that requires specification of a large number of material properties and hydrologic/chemical parameters. Given its computational burden, this direct simulation approach is particularly ill-suited for quantifying both the expected performance and uncertainty associated with candidate remediation strategies under real field conditions. Prediction uncertainties primarily arise from limited information about contaminant mass distributions, as well as the spatial distribution of subsurface hydrologic properties. Application of direct simulation to quantify uncertainty would, thus, typically require simulating multiphase flow and transport for a large number of permeability and release scenarios to collect statistics associated with remedial effectiveness, a computationally prohibitive process. The primary objective of this work is to develop and demonstrate a methodology that employs measured field data to produce equi-probable stochastic representations of a subsurface source zone that capture the spatial distribution and uncertainty associated with key features that control remediation performance (i.e., permeability and contamination mass). Here we employ probabilistic models known as discriminative random fields (DRFs) to synthesize stochastic realizations of initial mass distributions consistent with known, and typically limited, site characterization data. Using a limited number of full scale simulations as training data, a statistical model is developed for predicting the distribution of contaminant mass (e.g., DNAPL saturation and aqueous concentration) across a heterogeneous domain. Monte-Carlo sampling methods are then employed, in conjunction with the trained statistical model, to generate realizations conditioned on measured borehole data. Performance of the statistical model is illustrated through comparisons of generated realizations with the `true' numerical simulations. Finally, we demonstrate how these realizations can be used to determine statistically optimal locations for further interrogation of the subsurface.

  20. Statistical Maps of Ground Magnetic Disturbance Derived from Global Geospace Models

    NASA Astrophysics Data System (ADS)

    Rigler, E. J.; Wiltberger, M. J.; Love, J. J.

    2017-12-01

    Electric currents in space are the principal driver of magnetic variations measured at Earth's surface. These in turn induce geoelectric fields that present a natural hazard for technological systems like high-voltage power distribution networks. Modern global geospace models can reasonably simulate large-scale geomagnetic response to solar wind variations, but they are less successful at deterministic predictions of intense localized geomagnetic activity that most impacts technological systems on the ground. Still, recent studies have shown that these models can accurately reproduce the spatial statistical distributions of geomagnetic activity, suggesting that their physics are largely correct. Since the magnetosphere is a largely externally driven system, most model-measurement discrepancies probably arise from uncertain boundary conditions. So, with realistic distributions of solar wind parameters to establish its boundary conditions, we use the Lyon-Fedder-Mobarry (LFM) geospace model to build a synthetic multivariate statistical model of gridded ground magnetic disturbance. From this, we analyze the spatial modes of geomagnetic response, regress on available measurements to fill in unsampled locations on the grid, and estimate the global probability distribution of extreme magnetic disturbance. The latter offers a prototype geomagnetic "hazard map", similar to those used to characterize better-known geophysical hazards like earthquakes and floods.

  1. Particle size distributions by transmission electron microscopy: an interlaboratory comparison case study

    PubMed Central

    Rice, Stephen B; Chan, Christopher; Brown, Scott C; Eschbach, Peter; Han, Li; Ensor, David S; Stefaniak, Aleksandr B; Bonevich, John; Vladár, András E; Hight Walker, Angela R; Zheng, Jiwen; Starnes, Catherine; Stromberg, Arnold; Ye, Jia; Grulke, Eric A

    2015-01-01

    This paper reports an interlaboratory comparison that evaluated a protocol for measuring and analysing the particle size distribution of discrete, metallic, spheroidal nanoparticles using transmission electron microscopy (TEM). The study was focused on automated image capture and automated particle analysis. NIST RM8012 gold nanoparticles (30 nm nominal diameter) were measured for area-equivalent diameter distributions by eight laboratories. Statistical analysis was used to (1) assess the data quality without using size distribution reference models, (2) determine reference model parameters for different size distribution reference models and non-linear regression fitting methods and (3) assess the measurement uncertainty of a size distribution parameter by using its coefficient of variation. The interlaboratory area-equivalent diameter mean, 27.6 nm ± 2.4 nm (computed based on a normal distribution), was quite similar to the area-equivalent diameter, 27.6 nm, assigned to NIST RM8012. The lognormal reference model was the preferred choice for these particle size distributions as, for all laboratories, its parameters had lower relative standard errors (RSEs) than the other size distribution reference models tested (normal, Weibull and Rosin–Rammler–Bennett). The RSEs for the fitted standard deviations were two orders of magnitude higher than those for the fitted means, suggesting that most of the parameter estimate errors were associated with estimating the breadth of the distributions. The coefficients of variation for the interlaboratory statistics also confirmed the lognormal reference model as the preferred choice. From quasi-linear plots, the typical range for good fits between the model and cumulative number-based distributions was 1.9 fitted standard deviations less than the mean to 2.3 fitted standard deviations above the mean. Automated image capture, automated particle analysis and statistical evaluation of the data and fitting coefficients provide a framework for assessing nanoparticle size distributions using TEM for image acquisition. PMID:26361398

  2. Pearson-type goodness-of-fit test with bootstrap maximum likelihood estimation.

    PubMed

    Yin, Guosheng; Ma, Yanyuan

    2013-01-01

    The Pearson test statistic is constructed by partitioning the data into bins and computing the difference between the observed and expected counts in these bins. If the maximum likelihood estimator (MLE) of the original data is used, the statistic generally does not follow a chi-squared distribution or any explicit distribution. We propose a bootstrap-based modification of the Pearson test statistic to recover the chi-squared distribution. We compute the observed and expected counts in the partitioned bins by using the MLE obtained from a bootstrap sample. This bootstrap-sample MLE adjusts exactly the right amount of randomness to the test statistic, and recovers the chi-squared distribution. The bootstrap chi-squared test is easy to implement, as it only requires fitting exactly the same model to the bootstrap data to obtain the corresponding MLE, and then constructs the bin counts based on the original data. We examine the test size and power of the new model diagnostic procedure using simulation studies and illustrate it with a real data set.

  3. Probability distribution and statistical properties of spherically compensated cosmic regions in ΛCDM cosmology

    NASA Astrophysics Data System (ADS)

    Alimi, Jean-Michel; de Fromont, Paul

    2018-04-01

    The statistical properties of cosmic structures are well known to be strong probes for cosmology. In particular, several studies tried to use the cosmic void counting number to obtain tight constrains on dark energy. In this paper, we model the statistical properties of these regions using the CoSphere formalism (de Fromont & Alimi) in both primordial and non-linearly evolved Universe in the standard Λ cold dark matter model. This formalism applies similarly for minima (voids) and maxima (such as DM haloes), which are here considered symmetrically. We first derive the full joint Gaussian distribution of CoSphere's parameters in the Gaussian random field. We recover the results of Bardeen et al. only in the limit where the compensation radius becomes very large, i.e. when the central extremum decouples from its cosmic environment. We compute the probability distribution of the compensation size in this primordial field. We show that this distribution is redshift independent and can be used to model cosmic voids size distribution. We also derive the statistical distribution of the peak parameters introduced by Bardeen et al. and discuss their correlation with the cosmic environment. We show that small central extrema with low density are associated with narrow compensation regions with deep compensation density, while higher central extrema are preferentially located in larger but smoother over/under massive regions.

  4. A wavelet-based statistical analysis of FMRI data: I. motivation and data distribution modeling.

    PubMed

    Dinov, Ivo D; Boscardin, John W; Mega, Michael S; Sowell, Elizabeth L; Toga, Arthur W

    2005-01-01

    We propose a new method for statistical analysis of functional magnetic resonance imaging (fMRI) data. The discrete wavelet transformation is employed as a tool for efficient and robust signal representation. We use structural magnetic resonance imaging (MRI) and fMRI to empirically estimate the distribution of the wavelet coefficients of the data both across individuals and spatial locations. An anatomical subvolume probabilistic atlas is used to tessellate the structural and functional signals into smaller regions each of which is processed separately. A frequency-adaptive wavelet shrinkage scheme is employed to obtain essentially optimal estimations of the signals in the wavelet space. The empirical distributions of the signals on all the regions are computed in a compressed wavelet space. These are modeled by heavy-tail distributions because their histograms exhibit slower tail decay than the Gaussian. We discovered that the Cauchy, Bessel K Forms, and Pareto distributions provide the most accurate asymptotic models for the distribution of the wavelet coefficients of the data. Finally, we propose a new model for statistical analysis of functional MRI data using this atlas-based wavelet space representation. In the second part of our investigation, we will apply this technique to analyze a large fMRI dataset involving repeated presentation of sensory-motor response stimuli in young, elderly, and demented subjects.

  5. Earthquakes: Recurrence and Interoccurrence Times

    NASA Astrophysics Data System (ADS)

    Abaimov, S. G.; Turcotte, D. L.; Shcherbakov, R.; Rundle, J. B.; Yakovlev, G.; Goltz, C.; Newman, W. I.

    2008-04-01

    The purpose of this paper is to discuss the statistical distributions of recurrence times of earthquakes. Recurrence times are the time intervals between successive earthquakes at a specified location on a specified fault. Although a number of statistical distributions have been proposed for recurrence times, we argue in favor of the Weibull distribution. The Weibull distribution is the only distribution that has a scale-invariant hazard function. We consider three sets of characteristic earthquakes on the San Andreas fault: (1) The Parkfield earthquakes, (2) the sequence of earthquakes identified by paleoseismic studies at the Wrightwood site, and (3) an example of a sequence of micro-repeating earthquakes at a site near San Juan Bautista. In each case we make a comparison with the applicable Weibull distribution. The number of earthquakes in each of these sequences is too small to make definitive conclusions. To overcome this difficulty we consider a sequence of earthquakes obtained from a one million year “Virtual California” simulation of San Andreas earthquakes. Very good agreement with a Weibull distribution is found. We also obtain recurrence statistics for two other model studies. The first is a modified forest-fire model and the second is a slider-block model. In both cases good agreements with Weibull distributions are obtained. Our conclusion is that the Weibull distribution is the preferred distribution for estimating the risk of future earthquakes on the San Andreas fault and elsewhere.

  6. Colloquium: Statistical mechanics of money, wealth, and income

    NASA Astrophysics Data System (ADS)

    Yakovenko, Victor M.; Rosser, J. Barkley, Jr.

    2009-10-01

    This Colloquium reviews statistical models for money, wealth, and income distributions developed in the econophysics literature since the late 1990s. By analogy with the Boltzmann-Gibbs distribution of energy in physics, it is shown that the probability distribution of money is exponential for certain classes of models with interacting economic agents. Alternative scenarios are also reviewed. Data analysis of the empirical distributions of wealth and income reveals a two-class distribution. The majority of the population belongs to the lower class, characterized by the exponential (“thermal”) distribution, whereas a small fraction of the population in the upper class is characterized by the power-law (“superthermal”) distribution. The lower part is very stable, stationary in time, whereas the upper part is highly dynamical and out of equilibrium.

  7. Making statistical inferences about software reliability

    NASA Technical Reports Server (NTRS)

    Miller, Douglas R.

    1988-01-01

    Failure times of software undergoing random debugging can be modelled as order statistics of independent but nonidentically distributed exponential random variables. Using this model inferences can be made about current reliability and, if debugging continues, future reliability. This model also shows the difficulty inherent in statistical verification of very highly reliable software such as that used by digital avionics in commercial aircraft.

  8. Invariance in the recurrence of large returns and the validation of models of price dynamics

    NASA Astrophysics Data System (ADS)

    Chang, Lo-Bin; Geman, Stuart; Hsieh, Fushing; Hwang, Chii-Ruey

    2013-08-01

    Starting from a robust, nonparametric definition of large returns (“excursions”), we study the statistics of their occurrences, focusing on the recurrence process. The empirical waiting-time distribution between excursions is remarkably invariant to year, stock, and scale (return interval). This invariance is related to self-similarity of the marginal distributions of returns, but the excursion waiting-time distribution is a function of the entire return process and not just its univariate probabilities. Generalized autoregressive conditional heteroskedasticity (GARCH) models, market-time transformations based on volume or trades, and generalized (Lévy) random-walk models all fail to fit the statistical structure of excursions.

  9. Exact extreme-value statistics at mixed-order transitions.

    PubMed

    Bar, Amir; Majumdar, Satya N; Schehr, Grégory; Mukamel, David

    2016-05-01

    We study extreme-value statistics for spatially extended models exhibiting mixed-order phase transitions (MOT). These are phase transitions that exhibit features common to both first-order (discontinuity of the order parameter) and second-order (diverging correlation length) transitions. We consider here the truncated inverse distance squared Ising model, which is a prototypical model exhibiting MOT, and study analytically the extreme-value statistics of the domain lengths The lengths of the domains are identically distributed random variables except for the global constraint that their sum equals the total system size L. In addition, the number of such domains is also a fluctuating variable, and not fixed. In the paramagnetic phase, we show that the distribution of the largest domain length l_{max} converges, in the large L limit, to a Gumbel distribution. However, at the critical point (for a certain range of parameters) and in the ferromagnetic phase, we show that the fluctuations of l_{max} are governed by novel distributions, which we compute exactly. Our main analytical results are verified by numerical simulations.

  10. Evaluating statistical cloud schemes: What can we gain from ground-based remote sensing?

    NASA Astrophysics Data System (ADS)

    Grützun, V.; Quaas, J.; Morcrette, C. J.; Ament, F.

    2013-09-01

    Statistical cloud schemes with prognostic probability distribution functions have become more important in atmospheric modeling, especially since they are in principle scale adaptive and capture cloud physics in more detail. While in theory the schemes have a great potential, their accuracy is still questionable. High-resolution three-dimensional observational data of water vapor and cloud water, which could be used for testing them, are missing. We explore the potential of ground-based remote sensing such as lidar, microwave, and radar to evaluate prognostic distribution moments using the "perfect model approach." This means that we employ a high-resolution weather model as virtual reality and retrieve full three-dimensional atmospheric quantities and virtual ground-based observations. We then use statistics from the virtual observation to validate the modeled 3-D statistics. Since the data are entirely consistent, any discrepancy occurring is due to the method. Focusing on total water mixing ratio, we find that the mean ratio can be evaluated decently but that it strongly depends on the meteorological conditions as to whether the variance and skewness are reliable. Using some simple schematic description of different synoptic conditions, we show how statistics obtained from point or line measurements can be poor at representing the full three-dimensional distribution of water in the atmosphere. We argue that a careful analysis of measurement data and detailed knowledge of the meteorological situation is necessary to judge whether we can use the data for an evaluation of higher moments of the humidity distribution used by a statistical cloud scheme.

  11. EVALUATION OF A NEW MEAN SCALED AND MOMENT ADJUSTED TEST STATISTIC FOR SEM.

    PubMed

    Tong, Xiaoxiao; Bentler, Peter M

    2013-01-01

    Recently a new mean scaled and skewness adjusted test statistic was developed for evaluating structural equation models in small samples and with potentially nonnormal data, but this statistic has received only limited evaluation. The performance of this statistic is compared to normal theory maximum likelihood and two well-known robust test statistics. A modification to the Satorra-Bentler scaled statistic is developed for the condition that sample size is smaller than degrees of freedom. The behavior of the four test statistics is evaluated with a Monte Carlo confirmatory factor analysis study that varies seven sample sizes and three distributional conditions obtained using Headrick's fifth-order transformation to nonnormality. The new statistic performs badly in most conditions except under the normal distribution. The goodness-of-fit χ(2) test based on maximum-likelihood estimation performed well under normal distributions as well as under a condition of asymptotic robustness. The Satorra-Bentler scaled test statistic performed best overall, while the mean scaled and variance adjusted test statistic outperformed the others at small and moderate sample sizes under certain distributional conditions.

  12. Clusters in the distribution of pulsars in period, pulse-width, and age. [statistical analysis/statistical distributions

    NASA Technical Reports Server (NTRS)

    Baker, K. B.; Sturrock, P. A.

    1975-01-01

    The question of whether pulsars form a single group or whether pulsars come in two or more different groups is discussed. It is proposed that such groups might be related to several factors such as the initial creation of the neutron star, or the orientation of the magnetic field axis with the spin axis. Various statistical models are examined.

  13. A note about Gaussian statistics on a sphere

    NASA Astrophysics Data System (ADS)

    Chave, Alan D.

    2015-11-01

    The statistics of directional data on a sphere can be modelled either using the Fisher distribution that is conditioned on the magnitude being unity, in which case the sample space is confined to the unit sphere, or using the latitude-longitude marginal distribution derived from a trivariate Gaussian model that places no constraint on the magnitude. These two distributions are derived from first principles and compared. The Fisher distribution more closely approximates the uniform distribution on a sphere for a given small value of the concentration parameter, while the latitude-longitude marginal distribution is always slightly larger than the Fisher distribution at small off-axis angles for large values of the concentration parameter. Asymptotic analysis shows that the two distributions only become equivalent in the limit of large concentration parameter and very small off-axis angle.

  14. Empirical Correction to the Likelihood Ratio Statistic for Structural Equation Modeling with Many Variables.

    PubMed

    Yuan, Ke-Hai; Tian, Yubin; Yanagihara, Hirokazu

    2015-06-01

    Survey data typically contain many variables. Structural equation modeling (SEM) is commonly used in analyzing such data. The most widely used statistic for evaluating the adequacy of a SEM model is T ML, a slight modification to the likelihood ratio statistic. Under normality assumption, T ML approximately follows a chi-square distribution when the number of observations (N) is large and the number of items or variables (p) is small. However, in practice, p can be rather large while N is always limited due to not having enough participants. Even with a relatively large N, empirical results show that T ML rejects the correct model too often when p is not too small. Various corrections to T ML have been proposed, but they are mostly heuristic. Following the principle of the Bartlett correction, this paper proposes an empirical approach to correct T ML so that the mean of the resulting statistic approximately equals the degrees of freedom of the nominal chi-square distribution. Results show that empirically corrected statistics follow the nominal chi-square distribution much more closely than previously proposed corrections to T ML, and they control type I errors reasonably well whenever N ≥ max(50,2p). The formulations of the empirically corrected statistics are further used to predict type I errors of T ML as reported in the literature, and they perform well.

  15. Statistical Modeling of Retinal Optical Coherence Tomography.

    PubMed

    Amini, Zahra; Rabbani, Hossein

    2016-06-01

    In this paper, a new model for retinal Optical Coherence Tomography (OCT) images is proposed. This statistical model is based on introducing a nonlinear Gaussianization transform to convert the probability distribution function (pdf) of each OCT intra-retinal layer to a Gaussian distribution. The retina is a layered structure and in OCT each of these layers has a specific pdf which is corrupted by speckle noise, therefore a mixture model for statistical modeling of OCT images is proposed. A Normal-Laplace distribution, which is a convolution of a Laplace pdf and Gaussian noise, is proposed as the distribution of each component of this model. The reason for choosing Laplace pdf is the monotonically decaying behavior of OCT intensities in each layer for healthy cases. After fitting a mixture model to the data, each component is gaussianized and all of them are combined by Averaged Maximum A Posterior (AMAP) method. To demonstrate the ability of this method, a new contrast enhancement method based on this statistical model is proposed and tested on thirteen healthy 3D OCTs taken by the Topcon 3D OCT and five 3D OCTs from Age-related Macular Degeneration (AMD) patients, taken by Zeiss Cirrus HD-OCT. Comparing the results with two contending techniques, the prominence of the proposed method is demonstrated both visually and numerically. Furthermore, to prove the efficacy of the proposed method for a more direct and specific purpose, an improvement in the segmentation of intra-retinal layers using the proposed contrast enhancement method as a preprocessing step, is demonstrated.

  16. Meta-analysis of prediction model performance across multiple studies: Which scale helps ensure between-study normality for the C-statistic and calibration measures?

    PubMed

    Snell, Kym Ie; Ensor, Joie; Debray, Thomas Pa; Moons, Karel Gm; Riley, Richard D

    2017-01-01

    If individual participant data are available from multiple studies or clusters, then a prediction model can be externally validated multiple times. This allows the model's discrimination and calibration performance to be examined across different settings. Random-effects meta-analysis can then be used to quantify overall (average) performance and heterogeneity in performance. This typically assumes a normal distribution of 'true' performance across studies. We conducted a simulation study to examine this normality assumption for various performance measures relating to a logistic regression prediction model. We simulated data across multiple studies with varying degrees of variability in baseline risk or predictor effects and then evaluated the shape of the between-study distribution in the C-statistic, calibration slope, calibration-in-the-large, and E/O statistic, and possible transformations thereof. We found that a normal between-study distribution was usually reasonable for the calibration slope and calibration-in-the-large; however, the distributions of the C-statistic and E/O were often skewed across studies, particularly in settings with large variability in the predictor effects. Normality was vastly improved when using the logit transformation for the C-statistic and the log transformation for E/O, and therefore we recommend these scales to be used for meta-analysis. An illustrated example is given using a random-effects meta-analysis of the performance of QRISK2 across 25 general practices.

  17. Comparison of individual-based model output to data using a model of walleye pollock early life history in the Gulf of Alaska

    NASA Astrophysics Data System (ADS)

    Hinckley, Sarah; Parada, Carolina; Horne, John K.; Mazur, Michael; Woillez, Mathieu

    2016-10-01

    Biophysical individual-based models (IBMs) have been used to study aspects of early life history of marine fishes such as recruitment, connectivity of spawning and nursery areas, and marine reserve design. However, there is no consistent approach to validating the spatial outputs of these models. In this study, we hope to rectify this gap. We document additions to an existing individual-based biophysical model for Alaska walleye pollock (Gadus chalcogrammus), some simulations made with this model and methods that were used to describe and compare spatial output of the model versus field data derived from ichthyoplankton surveys in the Gulf of Alaska. We used visual methods (e.g. distributional centroids with directional ellipses), several indices (such as a Normalized Difference Index (NDI), and an Overlap Coefficient (OC), and several statistical methods: the Syrjala method, the Getis-Ord Gi* statistic, and a geostatistical method for comparing spatial indices. We assess the utility of these different methods in analyzing spatial output and comparing model output to data, and give recommendations for their appropriate use. Visual methods are useful for initial comparisons of model and data distributions. Metrics such as the NDI and OC give useful measures of co-location and overlap, but care must be taken in discretizing the fields into bins. The Getis-Ord Gi* statistic is useful to determine the patchiness of the fields. The Syrjala method is an easily implemented statistical measure of the difference between the fields, but does not give information on the details of the distributions. Finally, the geostatistical comparison of spatial indices gives good information of details of the distributions and whether they differ significantly between the model and the data. We conclude that each technique gives quite different information about the model-data distribution comparison, and that some are easy to apply and some more complex. We also give recommendations for a multistep process to validate spatial output from IBMs.

  18. A comparative study of mixed exponential and Weibull distributions in a stochastic model replicating a tropical rainfall process

    NASA Astrophysics Data System (ADS)

    Abas, Norzaida; Daud, Zalina M.; Yusof, Fadhilah

    2014-11-01

    A stochastic rainfall model is presented for the generation of hourly rainfall data in an urban area in Malaysia. In view of the high temporal and spatial variability of rainfall within the tropical rain belt, the Spatial-Temporal Neyman-Scott Rectangular Pulse model was used. The model, which is governed by the Neyman-Scott process, employs a reasonable number of parameters to represent the physical attributes of rainfall. A common approach is to attach each attribute to a mathematical distribution. With respect to rain cell intensity, this study proposes the use of a mixed exponential distribution. The performance of the proposed model was compared to a model that employs the Weibull distribution. Hourly and daily rainfall data from four stations in the Damansara River basin in Malaysia were used as input to the models, and simulations of hourly series were performed for an independent site within the basin. The performance of the models was assessed based on how closely the statistical characteristics of the simulated series resembled the statistics of the observed series. The findings obtained based on graphical representation revealed that the statistical characteristics of the simulated series for both models compared reasonably well with the observed series. However, a further assessment using the AIC, BIC and RMSE showed that the proposed model yields better results. The results of this study indicate that for tropical climates, the proposed model, using a mixed exponential distribution, is the best choice for generation of synthetic data for ungauged sites or for sites with insufficient data within the limit of the fitted region.

  19. Superthermal photon bunching in terms of simple probability distributions

    NASA Astrophysics Data System (ADS)

    Lettau, T.; Leymann, H. A. M.; Melcher, B.; Wiersig, J.

    2018-05-01

    We analyze the second-order photon autocorrelation function g(2 ) with respect to the photon probability distribution and discuss the generic features of a distribution that results in superthermal photon bunching [g(2 )(0 ) >2 ]. Superthermal photon bunching has been reported for a number of optical microcavity systems that exhibit processes such as superradiance or mode competition. We show that a superthermal photon number distribution cannot be constructed from the principle of maximum entropy if only the intensity and the second-order autocorrelation are given. However, for bimodal systems, an unbiased superthermal distribution can be constructed from second-order correlations and the intensities alone. Our findings suggest modeling superthermal single-mode distributions by a mixture of a thermal and a lasinglike state and thus reveal a generic mechanism in the photon probability distribution responsible for creating superthermal photon bunching. We relate our general considerations to a physical system, i.e., a (single-emitter) bimodal laser, and show that its statistics can be approximated and understood within our proposed model. Furthermore, the excellent agreement of the statistics of the bimodal laser and our model reveals that the bimodal laser is an ideal source of bunched photons, in the sense that it can generate statistics that contain no other features but the superthermal bunching.

  20. Three Strategies for the Critical Use of Statistical Methods in Psychological Research

    ERIC Educational Resources Information Center

    Campitelli, Guillermo; Macbeth, Guillermo; Ospina, Raydonal; Marmolejo-Ramos, Fernando

    2017-01-01

    We present three strategies to replace the null hypothesis statistical significance testing approach in psychological research: (1) visual representation of cognitive processes and predictions, (2) visual representation of data distributions and choice of the appropriate distribution for analysis, and (3) model comparison. The three strategies…

  1. A hybrid model for predicting carbon monoxide from vehicular exhausts in urban environments

    NASA Astrophysics Data System (ADS)

    Gokhale, Sharad; Khare, Mukesh

    Several deterministic-based air quality models evaluate and predict the frequently occurring pollutant concentration well but, in general, are incapable of predicting the 'extreme' concentrations. In contrast, the statistical distribution models overcome the above limitation of the deterministic models and predict the 'extreme' concentrations. However, the environmental damages are caused by both extremes as well as by the sustained average concentration of pollutants. Hence, the model should predict not only 'extreme' ranges but also the 'middle' ranges of pollutant concentrations, i.e. the entire range. Hybrid modelling is one of the techniques that estimates/predicts the 'entire range' of the distribution of pollutant concentrations by combining the deterministic based models with suitable statistical distribution models ( Jakeman, et al., 1988). In the present paper, a hybrid model has been developed to predict the carbon monoxide (CO) concentration distributions at one of the traffic intersections, Income Tax Office (ITO), in the Delhi city, where the traffic is heterogeneous in nature and meteorology is 'tropical'. The model combines the general finite line source model (GFLSM) as its deterministic, and log logistic distribution (LLD) model, as its statistical components. The hybrid (GFLSM-LLD) model is then applied at the ITO intersection. The results show that the hybrid model predictions match with that of the observed CO concentration data within the 5-99 percentiles range. The model is further validated at different street location, i.e. Sirifort roadway. The validation results show that the model predicts CO concentrations fairly well ( d=0.91) in 10-95 percentiles range. The regulatory compliance is also developed to estimate the probability of exceedance of hourly CO concentration beyond the National Ambient Air Quality Standards (NAAQS) of India. It consists of light vehicles, heavy vehicles, three- wheelers (auto rickshaws) and two-wheelers (scooters, motorcycles, etc).

  2. Multiplicative point process as a model of trading activity

    NASA Astrophysics Data System (ADS)

    Gontis, V.; Kaulakys, B.

    2004-11-01

    Signals consisting of a sequence of pulses show that inherent origin of the 1/ f noise is a Brownian fluctuation of the average interevent time between subsequent pulses of the pulse sequence. In this paper, we generalize the model of interevent time to reproduce a variety of self-affine time series exhibiting power spectral density S( f) scaling as a power of the frequency f. Furthermore, we analyze the relation between the power-law correlations and the origin of the power-law probability distribution of the signal intensity. We introduce a stochastic multiplicative model for the time intervals between point events and analyze the statistical properties of the signal analytically and numerically. Such model system exhibits power-law spectral density S( f)∼1/ fβ for various values of β, including β= {1}/{2}, 1 and {3}/{2}. Explicit expressions for the power spectra in the low-frequency limit and for the distribution density of the interevent time are obtained. The counting statistics of the events is analyzed analytically and numerically, as well. The specific interest of our analysis is related with the financial markets, where long-range correlations of price fluctuations largely depend on the number of transactions. We analyze the spectral density and counting statistics of the number of transactions. The model reproduces spectral properties of the real markets and explains the mechanism of power-law distribution of trading activity. The study provides evidence that the statistical properties of the financial markets are enclosed in the statistics of the time interval between trades. A multiplicative point process serves as a consistent model generating this statistics.

  3. Supervised variational model with statistical inference and its application in medical image segmentation.

    PubMed

    Li, Changyang; Wang, Xiuying; Eberl, Stefan; Fulham, Michael; Yin, Yong; Dagan Feng, David

    2015-01-01

    Automated and general medical image segmentation can be challenging because the foreground and the background may have complicated and overlapping density distributions in medical imaging. Conventional region-based level set algorithms often assume piecewise constant or piecewise smooth for segments, which are implausible for general medical image segmentation. Furthermore, low contrast and noise make identification of the boundaries between foreground and background difficult for edge-based level set algorithms. Thus, to address these problems, we suggest a supervised variational level set segmentation model to harness the statistical region energy functional with a weighted probability approximation. Our approach models the region density distributions by using the mixture-of-mixtures Gaussian model to better approximate real intensity distributions and distinguish statistical intensity differences between foreground and background. The region-based statistical model in our algorithm can intuitively provide better performance on noisy images. We constructed a weighted probability map on graphs to incorporate spatial indications from user input with a contextual constraint based on the minimization of contextual graphs energy functional. We measured the performance of our approach on ten noisy synthetic images and 58 medical datasets with heterogeneous intensities and ill-defined boundaries and compared our technique to the Chan-Vese region-based level set model, the geodesic active contour model with distance regularization, and the random walker model. Our method consistently achieved the highest Dice similarity coefficient when compared to the other methods.

  4. Statistical Properties of Echosignal Obtained from Human Dermis In Vivo

    NASA Astrophysics Data System (ADS)

    Piotrzkowska, Hanna; Litniewski, Jerzy; Nowicki, Andrzej; Szymańska, Elżbieta

    The paper presents the classification of the healthy skin and the skin lesions (basal cell carcinoma and actinic keratosis), basing on the statistical parameters of the envelope of ultrasonic echoes. The envelope was modeled using Rayleigh and non-Rayleigh (K-distribution) statistics. Furthermore, the characteristic parameter of the K-distribution, the effective number of scatterers was investigated. Also the attenuation coefficient was used for the skin lesion assessment.

  5. A Comparison between the WATCH Flare Data Statistical Properties and Predictions of the Statistical Flare Model

    NASA Astrophysics Data System (ADS)

    Crosby, N.; Georgoulis, M.; Vilmer, N.

    1999-10-01

    Solar burst observations in the deka-keV energy range originating from the WATCH experiment aboard the GRANAT spacecraft were used to perform frequency distributions built on measured X-ray flare parameters (Crosby et al., 1998). The results of the study show that: 1- the overall distribution functions are robust power laws extending over a number of decades. The typical parameters of events (total counts, peak count rates, duration) are all correlated to each other. 2- the overall distribution functions are the convolution of significantly different distribution functions built on parts of the whole data set filtered by the event duration. These "partial" frequency distributions are still power law distributions over several decades, with a slope systematically decreasing with increasing duration. 3- No correlation is found between the elapsed time interval between successive bursts arising from the same active region and the peak intensity of the flare. In this paper, we attempt a tentative comparison between the statistical properties of the self-organized critical (SOC) cellular automaton statistical flare models (see e.g. Lu and Hamilton (1991), Georgoulis and Vlahos (1996, 1998)) and the respective properties of the WATCH flare data. Despite the inherent weaknesses of the SOC models to simulate a number of physical processes in the active region, it is found that most of the observed statistical properties can be reproduced using the SOC models, including the various frequency distributions and scatter plots. We finally conclude that, even if SOC models must be refined to improve the physical links to MHD approaches, they nevertheless represent a good approach to describe the properties of rapid energy dissipation and magnetic field annihilation in complex and magnetized plasmas. Crosby N., Vilmer N., Lund N. and Sunyaev R., A&A; 334; 299-313; 1998 Crosby N., Lund N., Vilmer N. and Sunyaev R.; A&A Supplement Series; 130, 233, 1998 Georgoulis M. and Vlahos L., 1996, Astrophy. J. Letters, 469, L135 Georgoulis M. and Vlahos L., 1998, in preparation Lu E.T. and Hamilton R.J., 1991, Astroph. J., 380, L89

  6. Gear Fatigue Crack Diagnosis by Vibration Analysis Using Embedded Modeling

    DTIC Science & Technology

    2001-04-05

    gave references on Wigner - Ville Distribution ( WVD ) and some statistical based methods including FM4, NA4 and NB4. There are limitations for vibration...Embedded Modeling DISTRIBUTION : Approved for public release, distribution unlimited This paper is part of the following report: TITLE: New Frontiers in

  7. Application of Statistically Derived CPAS Parachute Parameters

    NASA Technical Reports Server (NTRS)

    Romero, Leah M.; Ray, Eric S.

    2013-01-01

    The Capsule Parachute Assembly System (CPAS) Analysis Team is responsible for determining parachute inflation parameters and dispersions that are ultimately used in verifying system requirements. A model memo is internally released semi-annually documenting parachute inflation and other key parameters reconstructed from flight test data. Dispersion probability distributions published in previous versions of the model memo were uniform because insufficient data were available for determination of statistical based distributions. Uniform distributions do not accurately represent the expected distributions since extreme parameter values are just as likely to occur as the nominal value. CPAS has taken incremental steps to move away from uniform distributions. Model Memo version 9 (MMv9) made the first use of non-uniform dispersions, but only for the reefing cutter timing, for which a large number of sample was available. In order to maximize the utility of the available flight test data, clusters of parachutes were reconstructed individually starting with Model Memo version 10. This allowed for statistical assessment for steady-state drag area (CDS) and parachute inflation parameters such as the canopy fill distance (n), profile shape exponent (expopen), over-inflation factor (C(sub k)), and ramp-down time (t(sub k)) distributions. Built-in MATLAB distributions were applied to the histograms, and parameters such as scale (sigma) and location (mu) were output. Engineering judgment was used to determine the "best fit" distribution based on the test data. Results include normal, log normal, and uniform (where available data remains insufficient) fits of nominal and failure (loss of parachute and skipped stage) cases for all CPAS parachutes. This paper discusses the uniform methodology that was previously used, the process and result of the statistical assessment, how the dispersions were incorporated into Monte Carlo analyses, and the application of the distributions in trajectory benchmark testing assessments with parachute inflation parameters, drag area, and reefing cutter timing used by CPAS.

  8. Assessment of corneal properties based on statistical modeling of OCT speckle.

    PubMed

    Jesus, Danilo A; Iskander, D Robert

    2017-01-01

    A new approach to assess the properties of the corneal micro-structure in vivo based on the statistical modeling of speckle obtained from Optical Coherence Tomography (OCT) is presented. A number of statistical models were proposed to fit the corneal speckle data obtained from OCT raw image. Short-term changes in corneal properties were studied by inducing corneal swelling whereas age-related changes were observed analyzing data of sixty-five subjects aged between twenty-four and seventy-three years. Generalized Gamma distribution has shown to be the best model, in terms of the Akaike's Information Criterion, to fit the OCT corneal speckle. Its parameters have shown statistically significant differences (Kruskal-Wallis, p < 0.001) for short and age-related corneal changes. In addition, it was observed that age-related changes influence the corneal biomechanical behaviour when corneal swelling is induced. This study shows that Generalized Gamma distribution can be utilized to modeling corneal speckle in OCT in vivo providing complementary quantified information where micro-structure of corneal tissue is of essence.

  9. Interpreting the concordance statistic of a logistic regression model: relation to the variance and odds ratio of a continuous explanatory variable.

    PubMed

    Austin, Peter C; Steyerberg, Ewout W

    2012-06-20

    When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examine the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in combined sample of those with and without the condition. Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, and uniform in the entire sample of those with and without the condition. The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population.

  10. The impacts of precipitation amount simulation on hydrological modeling in Nordic watersheds

    NASA Astrophysics Data System (ADS)

    Li, Zhi; Brissette, Fancois; Chen, Jie

    2013-04-01

    Stochastic modeling of daily precipitation is very important for hydrological modeling, especially when no observed data are available. Precipitation is usually modeled by two component model: occurrence generation and amount simulation. For occurrence simulation, the most common method is the first-order two-state Markov chain due to its simplification and good performance. However, various probability distributions have been reported to simulate precipitation amount, and spatiotemporal differences exist in the applicability of different distribution models. Therefore, assessing the applicability of different distribution models is necessary in order to provide more accurate precipitation information. Six precipitation probability distributions (exponential, Gamma, Weibull, skewed normal, mixed exponential, and hybrid exponential/Pareto distributions) are directly and indirectly evaluated on their ability to reproduce the original observed time series of precipitation amount. Data from 24 weather stations and two watersheds (Chute-du-Diable and Yamaska watersheds) in the province of Quebec (Canada) are used for this assessment. Various indices or statistics, such as the mean, variance, frequency distribution and extreme values are used to quantify the performance in simulating the precipitation and discharge. Performance in reproducing key statistics of the precipitation time series is well correlated to the number of parameters of the distribution function, and the three-parameter precipitation models outperform the other models, with the mixed exponential distribution being the best at simulating daily precipitation. The advantage of using more complex precipitation distributions is not as clear-cut when the simulated time series are used to drive a hydrological model. While the advantage of using functions with more parameters is not nearly as obvious, the mixed exponential distribution appears nonetheless as the best candidate for hydrological modeling. The implications of choosing a distribution function with respect to hydrological modeling and climate change impact studies are also discussed.

  11. Statistics of Optical Coherence Tomography Data From Human Retina

    PubMed Central

    de Juan, Joaquín; Ferrone, Claudia; Giannini, Daniela; Huang, David; Koch, Giorgio; Russo, Valentina; Tan, Ou; Bruni, Carlo

    2010-01-01

    Optical coherence tomography (OCT) has recently become one of the primary methods for noninvasive probing of the human retina. The pseudoimage formed by OCT (the so-called B-scan) varies probabilistically across pixels due to complexities in the measurement technique. Hence, sensitive automatic procedures of diagnosis using OCT may exploit statistical analysis of the spatial distribution of reflectance. In this paper, we perform a statistical study of retinal OCT data. We find that the stretched exponential probability density function can model well the distribution of intensities in OCT pseudoimages. Moreover, we show a small, but significant correlation between neighbor pixels when measuring OCT intensities with pixels of about 5 µm. We then develop a simple joint probability model for the OCT data consistent with known retinal features. This model fits well the stretched exponential distribution of intensities and their spatial correlation. In normal retinas, fit parameters of this model are relatively constant along retinal layers, but varies across layers. However, in retinas with diabetic retinopathy, large spikes of parameter modulation interrupt the constancy within layers, exactly where pathologies are visible. We argue that these results give hope for improvement in statistical pathology-detection methods even when the disease is in its early stages. PMID:20304733

  12. Work distributions of one-dimensional fermions and bosons with dual contact interactions

    NASA Astrophysics Data System (ADS)

    Wang, Bin; Zhang, Jingning; Quan, H. T.

    2018-05-01

    We extend the well-known static duality [M. Girardeau, J. Math. Phys. 1, 516 (1960), 10.1063/1.1703687; T. Cheon and T. Shigehara, Phys. Rev. Lett. 82, 2536 (1999), 10.1103/PhysRevLett.82.2536] between one-dimensional (1D) bosons and 1D fermions to the dynamical version. By utilizing this dynamical duality, we find the duality of nonequilibrium work distributions between interacting 1D bosonic (Lieb-Liniger model) and 1D fermionic (Cheon-Shigehara model) systems with dual contact interactions. As a special case, the work distribution of the Tonks-Girardeau gas is identical to that of 1D noninteracting fermionic system even though their momentum distributions are significantly different. In the classical limit, the work distributions of Lieb-Liniger models (Cheon-Shigehara models) with arbitrary coupling strength converge to that of the 1D noninteracting distinguishable particles, although their elementary excitations (quasiparticles) obey different statistics, e.g., the Bose-Einstein, the Fermi-Dirac, and the fractional statistics. We also present numerical results of the work distributions of Lieb-Liniger model with various coupling strengths, which demonstrate the convergence of work distributions in the classical limit.

  13. Quantum error-correction failure distributions: Comparison of coherent and stochastic error models

    NASA Astrophysics Data System (ADS)

    Barnes, Jeff P.; Trout, Colin J.; Lucarelli, Dennis; Clader, B. D.

    2017-06-01

    We compare failure distributions of quantum error correction circuits for stochastic errors and coherent errors. We utilize a fully coherent simulation of a fault-tolerant quantum error correcting circuit for a d =3 Steane and surface code. We find that the output distributions are markedly different for the two error models, showing that no simple mapping between the two error models exists. Coherent errors create very broad and heavy-tailed failure distributions. This suggests that they are susceptible to outlier events and that mean statistics, such as pseudothreshold estimates, may not provide the key figure of merit. This provides further statistical insight into why coherent errors can be so harmful for quantum error correction. These output probability distributions may also provide a useful metric that can be utilized when optimizing quantum error correcting codes and decoding procedures for purely coherent errors.

  14. Shallow cumuli ensemble statistics for development of a stochastic parameterization

    NASA Astrophysics Data System (ADS)

    Sakradzija, Mirjana; Seifert, Axel; Heus, Thijs

    2014-05-01

    According to a conventional deterministic approach to the parameterization of moist convection in numerical atmospheric models, a given large scale forcing produces an unique response from the unresolved convective processes. This representation leaves out the small-scale variability of convection, as it is known from the empirical studies of deep and shallow convective cloud ensembles, there is a whole distribution of sub-grid states corresponding to the given large scale forcing. Moreover, this distribution gets broader with the increasing model resolution. This behavior is also consistent with our theoretical understanding of a coarse-grained nonlinear system. We propose an approach to represent the variability of the unresolved shallow-convective states, including the dependence of the sub-grid states distribution spread and shape on the model horizontal resolution. Starting from the Gibbs canonical ensemble theory, Craig and Cohen (2006) developed a theory for the fluctuations in a deep convective ensemble. The micro-states of a deep convective cloud ensemble are characterized by the cloud-base mass flux, which, according to the theory, is exponentially distributed (Boltzmann distribution). Following their work, we study the shallow cumulus ensemble statistics and the distribution of the cloud-base mass flux. We employ a Large-Eddy Simulation model (LES) and a cloud tracking algorithm, followed by a conditional sampling of clouds at the cloud base level, to retrieve the information about the individual cloud life cycles and the cloud ensemble as a whole. In the case of shallow cumulus cloud ensemble, the distribution of micro-states is a generalized exponential distribution. Based on the empirical and theoretical findings, a stochastic model has been developed to simulate the shallow convective cloud ensemble and to test the convective ensemble theory. Stochastic model simulates a compound random process, with the number of convective elements drawn from a Poisson distribution, and cloud properties sub-sampled from a generalized ensemble distribution. We study the role of the different cloud subtypes in a shallow convective ensemble and how the diverse cloud properties and cloud lifetimes affect the system macro-state. To what extent does the cloud-base mass flux distribution deviate from the simple Boltzmann distribution and how does it affect the results from the stochastic model? Is the memory, provided by the finite lifetime of individual clouds, of importance for the ensemble statistics? We also test for the minimal information given as an input to the stochastic model, able to reproduce the ensemble mean statistics and the variability in a convective ensemble. An important property of the resulting distribution of the sub-grid convective states is its scale-adaptivity - the smaller the grid-size, the broader the compound distribution of the sub-grid states.

  15. Clustering, randomness and regularity in cloud fields. I - Theoretical considerations. II - Cumulus cloud fields

    NASA Technical Reports Server (NTRS)

    Weger, R. C.; Lee, J.; Zhu, Tianri; Welch, R. M.

    1992-01-01

    The current controversy existing in reference to the regularity vs. clustering in cloud fields is examined by means of analysis and simulation studies based upon nearest-neighbor cumulative distribution statistics. It is shown that the Poisson representation of random point processes is superior to pseudorandom-number-generated models and that pseudorandom-number-generated models bias the observed nearest-neighbor statistics towards regularity. Interpretation of this nearest-neighbor statistics is discussed for many cases of superpositions of clustering, randomness, and regularity. A detailed analysis is carried out of cumulus cloud field spatial distributions based upon Landsat, AVHRR, and Skylab data, showing that, when both large and small clouds are included in the cloud field distributions, the cloud field always has a strong clustering signal.

  16. Probabilistic Graphical Model Representation in Phylogenetics

    PubMed Central

    Höhna, Sebastian; Heath, Tracy A.; Boussau, Bastien; Landis, Michael J.; Ronquist, Fredrik; Huelsenbeck, John P.

    2014-01-01

    Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to statistical model representation and software development. Clear communication and representation of the chosen model is crucial for: (i) reproducibility of an analysis, (ii) model development, and (iii) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginners and nonspecialists to grasp complex phylogenetic models, including their assumptions and parameter/variable dependencies. Graphical modeling is a unifying framework that has gained in popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions. The strength lies in the comprehensibility, flexibility, and adaptability of this formalism, and the large body of computational work based on it. Graphical models are well-suited to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference. Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and introduce modules to simplify the representation of standard components in large and complex models. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using, for example, Metropolis–Hastings or Gibbs sampling of the posterior distribution. [Computation; graphical models; inference; modularization; statistical phylogenetics; tree plate.] PMID:24951559

  17. The impact of statistical adjustment on conditional standard errors of measurement in the assessment of physician communication skills.

    PubMed

    Raymond, Mark R; Clauser, Brian E; Furman, Gail E

    2010-10-01

    The use of standardized patients to assess communication skills is now an essential part of assessing a physician's readiness for practice. To improve the reliability of communication scores, it has become increasingly common in recent years to use statistical models to adjust ratings provided by standardized patients. This study employed ordinary least squares regression to adjust ratings, and then used generalizability theory to evaluate the impact of these adjustments on score reliability and the overall standard error of measurement. In addition, conditional standard errors of measurement were computed for both observed and adjusted scores to determine whether the improvements in measurement precision were uniform across the score distribution. Results indicated that measurement was generally less precise for communication ratings toward the lower end of the score distribution; and the improvement in measurement precision afforded by statistical modeling varied slightly across the score distribution such that the most improvement occurred in the upper-middle range of the score scale. Possible reasons for these patterns in measurement precision are discussed, as are the limitations of the statistical models used for adjusting performance ratings.

  18. Statistical analyses to support guidelines for marine avian sampling. Final report

    USGS Publications Warehouse

    Kinlan, Brian P.; Zipkin, Elise; O'Connell, Allan F.; Caldow, Chris

    2012-01-01

    Interest in development of offshore renewable energy facilities has led to a need for high-quality, statistically robust information on marine wildlife distributions. A practical approach is described to estimate the amount of sampling effort required to have sufficient statistical power to identify species-specific “hotspots” and “coldspots” of marine bird abundance and occurrence in an offshore environment divided into discrete spatial units (e.g., lease blocks), where “hotspots” and “coldspots” are defined relative to a reference (e.g., regional) mean abundance and/or occurrence probability for each species of interest. For example, a location with average abundance or occurrence that is three times larger the mean (3x effect size) could be defined as a “hotspot,” and a location that is three times smaller than the mean (1/3x effect size) as a “coldspot.” The choice of the effect size used to define hot and coldspots will generally depend on a combination of ecological and regulatory considerations. A method is also developed for testing the statistical significance of possible hotspots and coldspots. Both methods are illustrated with historical seabird survey data from the USGS Avian Compendium Database. Our approach consists of five main components: 1. A review of the primary scientific literature on statistical modeling of animal group size and avian count data to develop a candidate set of statistical distributions that have been used or may be useful to model seabird counts. 2. Statistical power curves for one-sample, one-tailed Monte Carlo significance tests of differences of observed small-sample means from a specified reference distribution. These curves show the power to detect "hotspots" or "coldspots" of occurrence and abundance at a range of effect sizes, given assumptions which we discuss. 3. A model selection procedure, based on maximum likelihood fits of models in the candidate set, to determine an appropriate statistical distribution to describe counts of a given species in a particular region and season. 4. Using a large database of historical at-sea seabird survey data, we applied this technique to identify appropriate statistical distributions for modeling a variety of species, allowing the distribution to vary by season. For each species and season, we used the selected distribution to calculate and map retrospective statistical power to detect hotspots and coldspots, and map pvalues from Monte Carlo significance tests of hotspots and coldspots, in discrete lease blocks designated by the U.S. Department of Interior, Bureau of Ocean Energy Management (BOEM). 5. Because our definition of hotspots and coldspots does not explicitly include variability over time, we examine the relationship between the temporal scale of sampling and the proportion of variance captured in time series of key environmental correlates of marine bird abundance, as well as available marine bird abundance time series, and use these analyses to develop recommendations for the temporal distribution of sampling to adequately represent both shortterm and long-term variability. We conclude by presenting a schematic “decision tree” showing how this power analysis approach would fit in a general framework for avian survey design, and discuss implications of model assumptions and results. We discuss avenues for future development of this work, and recommendations for practical implementation in the context of siting and wildlife assessment for offshore renewable energy development projects.

  19. Approximations to the distribution of a test statistic in covariance structure analysis: A comprehensive study.

    PubMed

    Wu, Hao

    2018-05-01

    In structural equation modelling (SEM), a robust adjustment to the test statistic or to its reference distribution is needed when its null distribution deviates from a χ 2 distribution, which usually arises when data do not follow a multivariate normal distribution. Unfortunately, existing studies on this issue typically focus on only a few methods and neglect the majority of alternative methods in statistics. Existing simulation studies typically consider only non-normal distributions of data that either satisfy asymptotic robustness or lead to an asymptotic scaled χ 2 distribution. In this work we conduct a comprehensive study that involves both typical methods in SEM and less well-known methods from the statistics literature. We also propose the use of several novel non-normal data distributions that are qualitatively different from the non-normal distributions widely used in existing studies. We found that several under-studied methods give the best performance under specific conditions, but the Satorra-Bentler method remains the most viable method for most situations. © 2017 The British Psychological Society.

  20. Noninformative prior in the quantum statistical model of pure states

    NASA Astrophysics Data System (ADS)

    Tanaka, Fuyuhiko

    2012-06-01

    In the present paper, we consider a suitable definition of a noninformative prior on the quantum statistical model of pure states. While the full pure-states model is invariant under unitary rotation and admits the Haar measure, restricted models, which we often see in quantum channel estimation and quantum process tomography, have less symmetry and no compelling rationale for any choice. We adopt a game-theoretic approach that is applicable to classical Bayesian statistics and yields a noninformative prior for a general class of probability distributions. We define the quantum detection game and show that there exist noninformative priors for a general class of a pure-states model. Theoretically, it gives one of the ways that we represent ignorance on the given quantum system with partial information. Practically, our method proposes a default distribution on the model in order to use the Bayesian technique in the quantum-state tomography with a small sample.

  1. Compounding approach for univariate time series with nonstationary variances

    NASA Astrophysics Data System (ADS)

    Schäfer, Rudi; Barkhofen, Sonja; Guhr, Thomas; Stöckmann, Hans-Jürgen; Kuhl, Ulrich

    2015-12-01

    A defining feature of nonstationary systems is the time dependence of their statistical parameters. Measured time series may exhibit Gaussian statistics on short time horizons, due to the central limit theorem. The sample statistics for long time horizons, however, averages over the time-dependent variances. To model the long-term statistical behavior, we compound the local distribution with the distribution of its parameters. Here, we consider two concrete, but diverse, examples of such nonstationary systems: the turbulent air flow of a fan and a time series of foreign exchange rates. Our main focus is to empirically determine the appropriate parameter distribution for the compounding approach. To this end, we extract the relevant time scales by decomposing the time signals into windows and determine the distribution function of the thus obtained local variances.

  2. Compounding approach for univariate time series with nonstationary variances.

    PubMed

    Schäfer, Rudi; Barkhofen, Sonja; Guhr, Thomas; Stöckmann, Hans-Jürgen; Kuhl, Ulrich

    2015-12-01

    A defining feature of nonstationary systems is the time dependence of their statistical parameters. Measured time series may exhibit Gaussian statistics on short time horizons, due to the central limit theorem. The sample statistics for long time horizons, however, averages over the time-dependent variances. To model the long-term statistical behavior, we compound the local distribution with the distribution of its parameters. Here, we consider two concrete, but diverse, examples of such nonstationary systems: the turbulent air flow of a fan and a time series of foreign exchange rates. Our main focus is to empirically determine the appropriate parameter distribution for the compounding approach. To this end, we extract the relevant time scales by decomposing the time signals into windows and determine the distribution function of the thus obtained local variances.

  3. Dendritic growth model of multilevel marketing

    NASA Astrophysics Data System (ADS)

    Pang, James Christopher S.; Monterola, Christopher P.

    2017-02-01

    Biologically inspired dendritic network growth is utilized to model the evolving connections of a multilevel marketing (MLM) enterprise. Starting from agents at random spatial locations, a network is formed by minimizing a distance cost function controlled by a parameter, termed the balancing factor bf, that weighs the wiring and the path length costs of connection. The paradigm is compared to an actual MLM membership data and is shown to be successful in statistically capturing the membership distribution, better than the previously reported agent based preferential attachment or analytic branching process models. Moreover, it recovers the known empirical statistics of previously studied MLM, specifically: (i) a membership distribution characterized by the existence of peak levels indicating limited growth, and (ii) an income distribution obeying the 80 - 20 Pareto principle. Extensive types of income distributions from uniform to Pareto to a "winner-take-all" kind are also modeled by varying bf. Finally, the robustness of our dendritic growth paradigm to random agent removals is explored and its implications to MLM income distributions are discussed.

  4. A Probabilistic Model of Local Sequence Alignment That Simplifies Statistical Significance Estimation

    PubMed Central

    Eddy, Sean R.

    2008-01-01

    Sequence database searches require accurate estimation of the statistical significance of scores. Optimal local sequence alignment scores follow Gumbel distributions, but determining an important parameter of the distribution (λ) requires time-consuming computational simulation. Moreover, optimal alignment scores are less powerful than probabilistic scores that integrate over alignment uncertainty (“Forward” scores), but the expected distribution of Forward scores remains unknown. Here, I conjecture that both expected score distributions have simple, predictable forms when full probabilistic modeling methods are used. For a probabilistic model of local sequence alignment, optimal alignment bit scores (“Viterbi” scores) are Gumbel-distributed with constant λ = log 2, and the high scoring tail of Forward scores is exponential with the same constant λ. Simulation studies support these conjectures over a wide range of profile/sequence comparisons, using 9,318 profile-hidden Markov models from the Pfam database. This enables efficient and accurate determination of expectation values (E-values) for both Viterbi and Forward scores for probabilistic local alignments. PMID:18516236

  5. Fine-Grained Linguistic Soft Constraints on Statistical Natural Language Processing Models

    DTIC Science & Technology

    2009-01-01

    88 4 Monolingually -Derived Phrasal Paraphrase Generation for Statistical Ma- chine Translation 90 4.1...123 4.4 Spanish-English (S2E) results . . . . . . . . . . . . . . . . . . . . . . 125 4.5 Gains from using larger monolingual corpora for...96 4.2 Visual example of a phrasal distributional profile . . . . . . . . . . . . 103 4.3 Monolingual corpus-based distributional

  6. Empirical Reference Distributions for Networks of Different Size

    PubMed Central

    Smith, Anna; Calder, Catherine A.; Browning, Christopher R.

    2016-01-01

    Network analysis has become an increasingly prevalent research tool across a vast range of scientific fields. Here, we focus on the particular issue of comparing network statistics, i.e. graph-level measures of network structural features, across multiple networks that differ in size. Although “normalized” versions of some network statistics exist, we demonstrate via simulation why direct comparison is often inappropriate. We consider normalizing network statistics relative to a simple fully parameterized reference distribution and demonstrate via simulation how this is an improvement over direct comparison, but still sometimes problematic. We propose a new adjustment method based on a reference distribution constructed as a mixture model of random graphs which reflect the dependence structure exhibited in the observed networks. We show that using simple Bernoulli models as mixture components in this reference distribution can provide adjusted network statistics that are relatively comparable across different network sizes but still describe interesting features of networks, and that this can be accomplished at relatively low computational expense. Finally, we apply this methodology to a collection of ecological networks derived from the Los Angeles Family and Neighborhood Survey activity location data. PMID:27721556

  7. Syndromic surveillance models using Web data: the case of scarlet fever in the UK.

    PubMed

    Samaras, Loukas; García-Barriocanal, Elena; Sicilia, Miguel-Angel

    2012-03-01

    Recent research has shown the potential of Web queries as a source for syndromic surveillance, and existing studies show that these queries can be used as a basis for estimation and prediction of the development of a syndromic disease, such as influenza, using log linear (logit) statistical models. Two alternative models are applied to the relationship between cases and Web queries in this paper. We examine the applicability of using statistical methods to relate search engine queries with scarlet fever cases in the UK, taking advantage of tools to acquire the appropriate data from Google, and using an alternative statistical method based on gamma distributions. The results show that using logit models, the Pearson correlation factor between Web queries and the data obtained from the official agencies must be over 0.90, otherwise the prediction of the peak and the spread of the distributions gives significant deviations. In this paper, we describe the gamma distribution model and show that we can obtain better results in all cases using gamma transformations, and especially in those with a smaller correlation factor.

  8. Effects of the turnover rate on the size distribution of firms: An application of the kinetic exchange models

    NASA Astrophysics Data System (ADS)

    Chakrabarti, Anindya S.

    2012-12-01

    We address the issue of the distribution of firm size. To this end we propose a model of firms in a closed, conserved economy populated with zero-intelligence agents who continuously move from one firm to another. We then analyze the size distribution and related statistics obtained from the model. There are three well known statistical features obtained from the panel study of the firms i.e., the power law in size (in terms of income and/or employment), the Laplace distribution in the growth rates and the slowly declining standard deviation of the growth rates conditional on the firm size. First, we show that the model generalizes the usual kinetic exchange models with binary interaction to interactions between an arbitrary number of agents. When the number of interacting agents is in the order of the system itself, it is possible to decouple the model. We provide exact results on the distributions which are not known yet for binary interactions. Our model easily reproduces the power law for the size distribution of firms (Zipf’s law). The fluctuations in the growth rate falls with increasing size following a power law (though the exponent does not match with the data). However, the distribution of the difference of the firm size in this model has Laplace distribution whereas the real data suggests that the difference of the log of sizes has the same distribution.

  9. Best Statistical Distribution of flood variables for Johor River in Malaysia

    NASA Astrophysics Data System (ADS)

    Salarpour Goodarzi, M.; Yusop, Z.; Yusof, F.

    2012-12-01

    A complex flood event is always characterized by a few characteristics such as flood peak, flood volume, and flood duration, which might be mutually correlated. This study explored the statistical distribution of peakflow, flood duration and flood volume at Rantau Panjang gauging station on the Johor River in Malaysia. Hourly data were recorded for 45 years. The data were analysed based on water year (July - June). Five distributions namely, Log Normal, Generalize Pareto, Log Pearson, Normal and Generalize Extreme Value (GEV) were used to model the distribution of all the three variables. Anderson-Darling and Kolmogorov-Smirnov goodness-of-fit tests were used to evaluate the best fit. Goodness-of-fit tests at 5% level of significance indicate that all the models can be used to model the distribution of peakflow, flood duration and flood volume. However, Generalize Pareto distribution is found to be the most suitable model when tested with the Anderson-Darling test and the, Kolmogorov-Smirnov suggested that GEV is the best for peakflow. The result of this research can be used to improve flood frequency analysis. Comparison between Generalized Extreme Value, Generalized Pareto and Log Pearson distributions in the Cumulative Distribution Function of peakflow

  10. Statistical characteristics of the spatial distribution of territorial contamination by radionuclides from the Chernobyl accident

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arutyunyan, R.V.; Bol`shov, L.A.; Vasil`ev, S.K.

    1994-06-01

    The objective of this study was to clarify a number of issues related to the spatial distribution of contaminants from the Chernobyl accident. The effects of local statistics were addressed by collecting and analyzing (for Cesium 137) soil samples from a number of regions, and it was found that sample activity differed by a factor of 3-5. The effect of local non-uniformity was estimated by modeling the distribution of the average activity of a set of five samples for each of the regions, with the spread in the activities for a {+-}2 range being equal to 25%. The statistical characteristicsmore » of the distribution of contamination were then analyzed and found to be a log-normal distribution with the standard deviation being a function of test area. All data for the Bryanskaya Oblast area were analyzed statistically and were adequately described by a log-normal function.« less

  11. Statistical analyses support power law distributions found in neuronal avalanches.

    PubMed

    Klaus, Andreas; Yu, Shan; Plenz, Dietmar

    2011-01-01

    The size distribution of neuronal avalanches in cortical networks has been reported to follow a power law distribution with exponent close to -1.5, which is a reflection of long-range spatial correlations in spontaneous neuronal activity. However, identifying power law scaling in empirical data can be difficult and sometimes controversial. In the present study, we tested the power law hypothesis for neuronal avalanches by using more stringent statistical analyses. In particular, we performed the following steps: (i) analysis of finite-size scaling to identify scale-free dynamics in neuronal avalanches, (ii) model parameter estimation to determine the specific exponent of the power law, and (iii) comparison of the power law to alternative model distributions. Consistent with critical state dynamics, avalanche size distributions exhibited robust scaling behavior in which the maximum avalanche size was limited only by the spatial extent of sampling ("finite size" effect). This scale-free dynamics suggests the power law as a model for the distribution of avalanche sizes. Using both the Kolmogorov-Smirnov statistic and a maximum likelihood approach, we found the slope to be close to -1.5, which is in line with previous reports. Finally, the power law model for neuronal avalanches was compared to the exponential and to various heavy-tail distributions based on the Kolmogorov-Smirnov distance and by using a log-likelihood ratio test. Both the power law distribution without and with exponential cut-off provided significantly better fits to the cluster size distributions in neuronal avalanches than the exponential, the lognormal and the gamma distribution. In summary, our findings strongly support the power law scaling in neuronal avalanches, providing further evidence for critical state dynamics in superficial layers of cortex.

  12. Does Breast Cancer Drive the Building of Survival Probability Models among States? An Assessment of Goodness of Fit for Patient Data from SEER Registries

    PubMed

    Khan, Hafiz; Saxena, Anshul; Perisetti, Abhilash; Rafiq, Aamrin; Gabbidon, Kemesha; Mende, Sarah; Lyuksyutova, Maria; Quesada, Kandi; Blakely, Summre; Torres, Tiffany; Afesse, Mahlet

    2016-12-01

    Background: Breast cancer is a worldwide public health concern and is the most prevalent type of cancer in women in the United States. This study concerned the best fit of statistical probability models on the basis of survival times for nine state cancer registries: California, Connecticut, Georgia, Hawaii, Iowa, Michigan, New Mexico, Utah, and Washington. Materials and Methods: A probability random sampling method was applied to select and extract records of 2,000 breast cancer patients from the Surveillance Epidemiology and End Results (SEER) database for each of the nine state cancer registries used in this study. EasyFit software was utilized to identify the best probability models by using goodness of fit tests, and to estimate parameters for various statistical probability distributions that fit survival data. Results: Statistical analysis for the summary of statistics is reported for each of the states for the years 1973 to 2012. Kolmogorov-Smirnov, Anderson-Darling, and Chi-squared goodness of fit test values were used for survival data, the highest values of goodness of fit statistics being considered indicative of the best fit survival model for each state. Conclusions: It was found that California, Connecticut, Georgia, Iowa, New Mexico, and Washington followed the Burr probability distribution, while the Dagum probability distribution gave the best fit for Michigan and Utah, and Hawaii followed the Gamma probability distribution. These findings highlight differences between states through selected sociodemographic variables and also demonstrate probability modeling differences in breast cancer survival times. The results of this study can be used to guide healthcare providers and researchers for further investigations into social and environmental factors in order to reduce the occurrence of and mortality due to breast cancer. Creative Commons Attribution License

  13. Reliability Estimation of Aero-engine Based on Mixed Weibull Distribution Model

    NASA Astrophysics Data System (ADS)

    Yuan, Zhongda; Deng, Junxiang; Wang, Dawei

    2018-02-01

    Aero-engine is a complex mechanical electronic system, based on analysis of reliability of mechanical electronic system, Weibull distribution model has an irreplaceable role. Till now, only two-parameter Weibull distribution model and three-parameter Weibull distribution are widely used. Due to diversity of engine failure modes, there is a big error with single Weibull distribution model. By contrast, a variety of engine failure modes can be taken into account with mixed Weibull distribution model, so it is a good statistical analysis model. Except the concept of dynamic weight coefficient, in order to make reliability estimation result more accurately, three-parameter correlation coefficient optimization method is applied to enhance Weibull distribution model, thus precision of mixed distribution reliability model is improved greatly. All of these are advantageous to popularize Weibull distribution model in engineering applications.

  14. Understanding baseball team standings and streaks

    NASA Astrophysics Data System (ADS)

    Sire, C.; Redner, S.

    2009-02-01

    Can one understand the statistics of wins and losses of baseball teams? Are their consecutive-game winning and losing streaks self-reinforcing or can they be described statistically? We apply the Bradley-Terry model, which incorporates the heterogeneity of team strengths in a minimalist way, to answer these questions. Excellent agreement is found between the predictions of the Bradley-Terry model and the rank dependence of the average number team wins and losses in major-league baseball over the past century when the distribution of team strengths is taken to be uniformly distributed over a finite range. Using this uniform strength distribution, we also find very good agreement between model predictions and the observed distribution of consecutive-game team winning and losing streaks over the last half-century; however, the agreement is less good for the previous half-century. The behavior of the last half-century supports the hypothesis that long streaks are primarily statistical in origin with little self-reinforcing component. The data further show that the past half-century of baseball has been more competitive than the preceding half-century.

  15. Discrete time rescaling theorem: determining goodness of fit for discrete time statistical models of neural spiking.

    PubMed

    Haslinger, Robert; Pipa, Gordon; Brown, Emery

    2010-10-01

    One approach for understanding the encoding of information by spike trains is to fit statistical models and then test their goodness of fit. The time-rescaling theorem provides a goodness-of-fit test consistent with the point process nature of spike trains. The interspike intervals (ISIs) are rescaled (as a function of the model's spike probability) to be independent and exponentially distributed if the model is accurate. A Kolmogorov-Smirnov (KS) test between the rescaled ISIs and the exponential distribution is then used to check goodness of fit. This rescaling relies on assumptions of continuously defined time and instantaneous events. However, spikes have finite width, and statistical models of spike trains almost always discretize time into bins. Here we demonstrate that finite temporal resolution of discrete time models prevents their rescaled ISIs from being exponentially distributed. Poor goodness of fit may be erroneously indicated even if the model is exactly correct. We present two adaptations of the time-rescaling theorem to discrete time models. In the first we propose that instead of assuming the rescaled times to be exponential, the reference distribution be estimated through direct simulation by the fitted model. In the second, we prove a discrete time version of the time-rescaling theorem that analytically corrects for the effects of finite resolution. This allows us to define a rescaled time that is exponentially distributed, even at arbitrary temporal discretizations. We demonstrate the efficacy of both techniques by fitting generalized linear models to both simulated spike trains and spike trains recorded experimentally in monkey V1 cortex. Both techniques give nearly identical results, reducing the false-positive rate of the KS test and greatly increasing the reliability of model evaluation based on the time-rescaling theorem.

  16. Novel formulation of the ℳ model through the Generalized-K distribution for atmospheric optical channels.

    PubMed

    Garrido-Balsells, José María; Jurado-Navas, Antonio; Paris, José Francisco; Castillo-Vazquez, Miguel; Puerta-Notario, Antonio

    2015-03-09

    In this paper, a novel and deeper physical interpretation on the recently published Málaga or ℳ statistical distribution is provided. This distribution, which is having a wide acceptance by the scientific community, models the optical irradiance scintillation induced by the atmospheric turbulence. Here, the analytical expressions previously published are modified in order to express them by a mixture of the known Generalized-K and discrete Binomial and Negative Binomial distributions. In particular, the probability density function (pdf) of the ℳ model is now obtained as a linear combination of these Generalized-K pdf, in which the coefficients depend directly on the parameters of the ℳ distribution. In this way, the Málaga model can be physically interpreted as a superposition of different optical sub-channels each of them described by the corresponding Generalized-K fading model and weighted by the ℳ dependent coefficients. The expressions here proposed are simpler than the equations of the original ℳ model and are validated by means of numerical simulations by generating ℳ -distributed random sequences and their associated histogram. This novel interpretation of the Málaga statistical distribution provides a valuable tool for analyzing the performance of atmospheric optical channels for every turbulence condition.

  17. Modeling Error Distributions of Growth Curve Models through Bayesian Methods

    ERIC Educational Resources Information Center

    Zhang, Zhiyong

    2016-01-01

    Growth curve models are widely used in social and behavioral sciences. However, typical growth curve models often assume that the errors are normally distributed although non-normal data may be even more common than normal data. In order to avoid possible statistical inference problems in blindly assuming normality, a general Bayesian framework is…

  18. Lognormal Distribution of Cellular Uptake of Radioactivity: Statistical Analysis of α-Particle Track Autoradiography

    PubMed Central

    Neti, Prasad V.S.V.; Howell, Roger W.

    2010-01-01

    Recently, the distribution of radioactivity among a population of cells labeled with 210Po was shown to be well described by a log-normal (LN) distribution function (J Nucl Med. 2006;47:1049–1058) with the aid of autoradiography. To ascertain the influence of Poisson statistics on the interpretation of the autoradiographic data, the present work reports on a detailed statistical analysis of these earlier data. Methods The measured distributions of α-particle tracks per cell were subjected to statistical tests with Poisson, LN, and Poisson-lognormal (P-LN) models. Results The LN distribution function best describes the distribution of radioactivity among cell populations exposed to 0.52 and 3.8 kBq/mL of 210Po-citrate. When cells were exposed to 67 kBq/mL, the P-LN distribution function gave a better fit; however, the underlying activity distribution remained log-normal. Conclusion The present analysis generally provides further support for the use of LN distributions to describe the cellular uptake of radioactivity. Care should be exercised when analyzing autoradiographic data on activity distributions to ensure that Poisson processes do not distort the underlying LN distribution. PMID:18483086

  19. Log Normal Distribution of Cellular Uptake of Radioactivity: Statistical Analysis of Alpha Particle Track Autoradiography

    PubMed Central

    Neti, Prasad V.S.V.; Howell, Roger W.

    2008-01-01

    Recently, the distribution of radioactivity among a population of cells labeled with 210Po was shown to be well described by a log normal distribution function (J Nucl Med 47, 6 (2006) 1049-1058) with the aid of an autoradiographic approach. To ascertain the influence of Poisson statistics on the interpretation of the autoradiographic data, the present work reports on a detailed statistical analyses of these data. Methods The measured distributions of alpha particle tracks per cell were subjected to statistical tests with Poisson (P), log normal (LN), and Poisson – log normal (P – LN) models. Results The LN distribution function best describes the distribution of radioactivity among cell populations exposed to 0.52 and 3.8 kBq/mL 210Po-citrate. When cells were exposed to 67 kBq/mL, the P – LN distribution function gave a better fit, however, the underlying activity distribution remained log normal. Conclusions The present analysis generally provides further support for the use of LN distributions to describe the cellular uptake of radioactivity. Care should be exercised when analyzing autoradiographic data on activity distributions to ensure that Poisson processes do not distort the underlying LN distribution. PMID:16741316

  20. Influence of the nucleus area distribution on the survival fraction after charged particles broad beam irradiation.

    PubMed

    Wéra, A-C; Barazzuol, L; Jeynes, J C G; Merchant, M J; Suzuki, M; Kirkby, K J

    2014-08-07

    It is well known that broad beam irradiation with heavy ions leads to variation in the number of hit(s) received by each cell as the distribution of particles follows the Poisson statistics. Although the nucleus area will determine the number of hit(s) received for a given dose, variation amongst its irradiated cell population is generally not considered. In this work, we investigate the effect of the nucleus area's distribution on the survival fraction. More specifically, this work aims to explain the deviation, or tail, which might be observed in the survival fraction at high irradiation doses. For this purpose, the nucleus area distribution was added to the beam Poisson statistics and the Linear-Quadratic model in order to fit the experimental data. As shown in this study, nucleus size variation, and the associated Poisson statistics, can lead to an upward survival trend after broad beam irradiation. The influence of the distribution parameters (mean area and standard deviation) was studied using a normal distribution, along with the Linear-Quadratic model parameters (α and β). Finally, the model proposed here was successfully tested to the survival fraction of LN18 cells irradiated with a 85 keV µm(- 1) carbon ion broad beam for which the distribution in the area of the nucleus had been determined.

  1. Interactions and triggering in a 3D rate and state asperity model

    NASA Astrophysics Data System (ADS)

    Dublanchet, P.; Bernard, P.

    2012-12-01

    Precise relocation of micro-seismicity and careful analysis of seismic source parameters have progressively imposed the concept of seismic asperities embedded in a creeping fault segment as being one of the most important aspect that should appear in a realistic representation of micro-seismic sources. Another important issue concerning micro-seismic activity is the existence of robust empirical laws describing the temporal and magnitude distribution of earthquakes, such as the Omori law, the distribution of inter-event time and the Gutenberg-Richter law. In this framework, this study aims at understanding statistical properties of earthquakes, by generating synthetic catalogs with a 3D, quasi-dynamic continuous rate and state asperity model, that takes into account a realistic geometry of asperities. Our approach contrasts with ETAS models (Kagan and Knopoff, 1981) usually implemented to produce earthquake catalogs, in the sense that the non linearity observed in rock friction experiments (Dieterich, 1979) is fully taken into account by the use of rate and state friction law. Furthermore, our model differs from discrete models of faults (Ziv and Cochard, 2006) because the continuity allows us to define realistic geometries and distributions of asperities by the assembling of sub-critical computational cells that always fail in a single event. Moreover, this model allows us to adress the question of the influence of barriers and distribution of asperities on the event statistics. After recalling the main observations of asperities in the specific case of Parkfield segment of San-Andreas Fault, we analyse earthquake statistical properties computed for this area. Then, we present synthetic statistics obtained by our model that allow us to discuss the role of barriers on clustering and triggering phenomena among a population of sources. It appears that an effective size of barrier, that depends on its frictional strength, controls the presence or the absence, in the synthetic catalog, of statistical laws that are similar to what is observed for real earthquakes. As an application, we attempt to draw a comparison between synthetic statistics and the observed statistics of Parkfield in order to characterize what could be a realistic frictional model of Parkfield area. More generally, we obtained synthetic statistical properties that are in agreement with power-law decays characterized by exponents that match the observations at a global scale, showing that our mechanical model is able to provide new insights into the understanding of earthquake interaction processes in general.

  2. scoringRules - A software package for probabilistic model evaluation

    NASA Astrophysics Data System (ADS)

    Lerch, Sebastian; Jordan, Alexander; Krüger, Fabian

    2016-04-01

    Models in the geosciences are generally surrounded by uncertainty, and being able to quantify this uncertainty is key to good decision making. Accordingly, probabilistic forecasts in the form of predictive distributions have become popular over the last decades. With the proliferation of probabilistic models arises the need for decision theoretically principled tools to evaluate the appropriateness of models and forecasts in a generalized way. Various scoring rules have been developed over the past decades to address this demand. Proper scoring rules are functions S(F,y) which evaluate the accuracy of a forecast distribution F , given that an outcome y was observed. As such, they allow to compare alternative models, a crucial ability given the variety of theories, data sources and statistical specifications that is available in many situations. This poster presents the software package scoringRules for the statistical programming language R, which contains functions to compute popular scoring rules such as the continuous ranked probability score for a variety of distributions F that come up in applied work. Two main classes are parametric distributions like normal, t, or gamma distributions, and distributions that are not known analytically, but are indirectly described through a sample of simulation draws. For example, Bayesian forecasts produced via Markov Chain Monte Carlo take this form. Thereby, the scoringRules package provides a framework for generalized model evaluation that both includes Bayesian as well as classical parametric models. The scoringRules package aims to be a convenient dictionary-like reference for computing scoring rules. We offer state of the art implementations of several known (but not routinely applied) formulas, and implement closed-form expressions that were previously unavailable. Whenever more than one implementation variant exists, we offer statistically principled default choices.

  3. Discrete Time Rescaling Theorem: Determining Goodness of Fit for Discrete Time Statistical Models of Neural Spiking

    PubMed Central

    Haslinger, Robert; Pipa, Gordon; Brown, Emery

    2010-01-01

    One approach for understanding the encoding of information by spike trains is to fit statistical models and then test their goodness of fit. The time rescaling theorem provides a goodness of fit test consistent with the point process nature of spike trains. The interspike intervals (ISIs) are rescaled (as a function of the model’s spike probability) to be independent and exponentially distributed if the model is accurate. A Kolmogorov Smirnov (KS) test between the rescaled ISIs and the exponential distribution is then used to check goodness of fit. This rescaling relies upon assumptions of continuously defined time and instantaneous events. However spikes have finite width and statistical models of spike trains almost always discretize time into bins. Here we demonstrate that finite temporal resolution of discrete time models prevents their rescaled ISIs from being exponentially distributed. Poor goodness of fit may be erroneously indicated even if the model is exactly correct. We present two adaptations of the time rescaling theorem to discrete time models. In the first we propose that instead of assuming the rescaled times to be exponential, the reference distribution be estimated through direct simulation by the fitted model. In the second, we prove a discrete time version of the time rescaling theorem which analytically corrects for the effects of finite resolution. This allows us to define a rescaled time which is exponentially distributed, even at arbitrary temporal discretizations. We demonstrate the efficacy of both techniques by fitting Generalized Linear Models (GLMs) to both simulated spike trains and spike trains recorded experimentally in monkey V1 cortex. Both techniques give nearly identical results, reducing the false positive rate of the KS test and greatly increasing the reliability of model evaluation based upon the time rescaling theorem. PMID:20608868

  4. RAId_DbS: Peptide Identification using Database Searches with Realistic Statistics

    PubMed Central

    Alves, Gelio; Ogurtsov, Aleksey Y; Yu, Yi-Kuo

    2007-01-01

    Background The key to mass-spectrometry-based proteomics is peptide identification. A major challenge in peptide identification is to obtain realistic E-values when assigning statistical significance to candidate peptides. Results Using a simple scoring scheme, we propose a database search method with theoretically characterized statistics. Taking into account possible skewness in the random variable distribution and the effect of finite sampling, we provide a theoretical derivation for the tail of the score distribution. For every experimental spectrum examined, we collect the scores of peptides in the database, and find good agreement between the collected score statistics and our theoretical distribution. Using Student's t-tests, we quantify the degree of agreement between the theoretical distribution and the score statistics collected. The T-tests may be used to measure the reliability of reported statistics. When combined with reported P-value for a peptide hit using a score distribution model, this new measure prevents exaggerated statistics. Another feature of RAId_DbS is its capability of detecting multiple co-eluted peptides. The peptide identification performance and statistical accuracy of RAId_DbS are assessed and compared with several other search tools. The executables and data related to RAId_DbS are freely available upon request. PMID:17961253

  5. Study of Analytic Statistical Model for Decay of Light and Medium Mass Nuclei in Nuclear Fragmentation

    NASA Technical Reports Server (NTRS)

    Cucinotta, Francis A.; Wilson, John W.

    1996-01-01

    The angular momentum independent statistical decay model is often applied using a Monte-Carlo simulation to describe the decay of prefragment nuclei in heavy ion reactions. This paper presents an analytical approach to the decay problem of nuclei with mass number less than 60, which is important for galactic cosmic ray (GCR) studies. This decay problem of nuclei with mass number less than 60 incorporates well-known levels of the lightest nuclei (A less than 11) to improve convergence and accuracy. A sensitivity study of the model level density function is used to determine the impact on mass and charge distributions in nuclear fragmentation. This angular momentum independent statistical decay model also describes the momentum and energy distribution of emitted particles (n, p, d, t, h, and a) from a prefragment nucleus.

  6. Sandpile-based model for capturing magnitude distributions and spatiotemporal clustering and separation in regional earthquakes

    NASA Astrophysics Data System (ADS)

    Batac, Rene C.; Paguirigan, Antonino A., Jr.; Tarun, Anjali B.; Longjas, Anthony G.

    2017-04-01

    We propose a cellular automata model for earthquake occurrences patterned after the sandpile model of self-organized criticality (SOC). By incorporating a single parameter describing the probability to target the most susceptible site, the model successfully reproduces the statistical signatures of seismicity. The energy distributions closely follow power-law probability density functions (PDFs) with a scaling exponent of around -1. 6, consistent with the expectations of the Gutenberg-Richter (GR) law, for a wide range of the targeted triggering probability values. Additionally, for targeted triggering probabilities within the range 0.004-0.007, we observe spatiotemporal distributions that show bimodal behavior, which is not observed previously for the original sandpile. For this critical range of values for the probability, model statistics show remarkable comparison with long-period empirical data from earthquakes from different seismogenic regions. The proposed model has key advantages, the foremost of which is the fact that it simultaneously captures the energy, space, and time statistics of earthquakes by just introducing a single parameter, while introducing minimal parameters in the simple rules of the sandpile. We believe that the critical targeting probability parameterizes the memory that is inherently present in earthquake-generating regions.

  7. Statistical models for the distribution of modulus of elasticity and modulus of rupture in lumber with implications for reliability calculations

    Treesearch

    Steve P. Verrill; Frank C. Owens; David E. Kretschmann; Rubin Shmulsky

    2017-01-01

    It is common practice to assume that a two-parameter Weibull probability distribution is suitable for modeling lumber properties. Verrill and co-workers demonstrated theoretically and empirically that the modulus of rupture (MOR) distribution of visually graded or machine stress rated (MSR) lumber is not distributed as a Weibull. Instead, the tails of the MOR...

  8. Modeling the Test-Retest Statistics of a Localization Experiment in the Full Horizontal Plane.

    PubMed

    Morsnowski, André; Maune, Steffen

    2016-10-01

    Two approaches to model the test-retest statistics of a localization experiment basing on Gaussian distribution and on surrogate data are introduced. Their efficiency is investigated using different measures describing directional hearing ability. A localization experiment in the full horizontal plane is a challenging task for hearing impaired patients. In clinical routine, we use this experiment to evaluate the progress of our cochlear implant (CI) recipients. Listening and time effort limit the reproducibility. The localization experiment consists of a 12 loudspeaker circle, placed in an anechoic room, a "camera silens". In darkness, HSM sentences are presented at 65 dB pseudo-erratically from all 12 directions with five repetitions. This experiment is modeled by a set of Gaussian distributions with different standard deviations added to a perfect estimator, as well as by surrogate data. Five repetitions per direction are used to produce surrogate data distributions for the sensation directions. To investigate the statistics, we retrospectively use the data of 33 CI patients with 92 pairs of test-retest-measurements from the same day. The first model does not take inversions into account, (i.e., permutations of the direction from back to front and vice versa are not considered), although they are common for hearing impaired persons particularly in the rear hemisphere. The second model considers these inversions but does not work with all measures. The introduced models successfully describe test-retest statistics of directional hearing. However, since their applications on the investigated measures perform differently no general recommendation can be provided. The presented test-retest statistics enable pair test comparisons for localization experiments.

  9. Football goal distributions and extremal statistics

    NASA Astrophysics Data System (ADS)

    Greenhough, J.; Birch, P. C.; Chapman, S. C.; Rowlands, G.

    2002-12-01

    We analyse the distributions of the number of goals scored by home teams, away teams, and the total scored in the match, in domestic football games from 169 countries between 1999 and 2001. The probability density functions (PDFs) of goals scored are too heavy-tailed to be fitted over their entire ranges by Poisson or negative binomial distributions which would be expected for uncorrelated processes. Log-normal distributions cannot include zero scores and here we find that the PDFs are consistent with those arising from extremal statistics. In addition, we show that it is sufficient to model English top division and FA Cup matches in the seasons of 1970/71-2000/01 on Poisson or negative binomial distributions, as reported in analyses of earlier seasons, and that these are not consistent with extremal statistics.

  10. A nonparametric spatial scan statistic for continuous data.

    PubMed

    Jung, Inkyung; Cho, Ho Jin

    2015-10-20

    Spatial scan statistics are widely used for spatial cluster detection, and several parametric models exist. For continuous data, a normal-based scan statistic can be used. However, the performance of the model has not been fully evaluated for non-normal data. We propose a nonparametric spatial scan statistic based on the Wilcoxon rank-sum test statistic and compared the performance of the method with parametric models via a simulation study under various scenarios. The nonparametric method outperforms the normal-based scan statistic in terms of power and accuracy in almost all cases under consideration in the simulation study. The proposed nonparametric spatial scan statistic is therefore an excellent alternative to the normal model for continuous data and is especially useful for data following skewed or heavy-tailed distributions.

  11. Statistics on continuous IBD data: Exact distribution evaluation for a pair of full(half)-sibs and a pair of a (great-) grandchild with a (great-) grandparent

    PubMed Central

    Stefanov, Valeri T

    2002-01-01

    Background Pairs of related individuals are widely used in linkage analysis. Most of the tests for linkage analysis are based on statistics associated with identity by descent (IBD) data. The current biotechnology provides data on very densely packed loci, and therefore, it may provide almost continuous IBD data for pairs of closely related individuals. Therefore, the distribution theory for statistics on continuous IBD data is of interest. In particular, distributional results which allow the evaluation of p-values for relevant tests are of importance. Results A technology is provided for numerical evaluation, with any given accuracy, of the cumulative probabilities of some statistics on continuous genome data for pairs of closely related individuals. In the case of a pair of full-sibs, the following statistics are considered: (i) the proportion of genome with 2 (at least 1) haplotypes shared identical-by-descent (IBD) on a chromosomal segment, (ii) the number of distinct pieces (subsegments) of a chromosomal segment, on each of which exactly 2 (at least 1) haplotypes are shared IBD. The natural counterparts of these statistics for the other relationships are also considered. Relevant Maple codes are provided for a rapid evaluation of the cumulative probabilities of such statistics. The genomic continuum model, with Haldane's model for the crossover process, is assumed. Conclusions A technology, together with relevant software codes for its automated implementation, are provided for exact evaluation of the distributions of relevant statistics associated with continuous genome data on closely related individuals. PMID:11996673

  12. A superstatistical model of metastasis and cancer survival

    NASA Astrophysics Data System (ADS)

    Leon Chen, L.; Beck, Christian

    2008-05-01

    We introduce a superstatistical model for the progression statistics of malignant cancer cells. The metastatic cascade is modeled as a complex nonequilibrium system with several macroscopic pathways and inverse-chi-square distributed parameters of the underlying Poisson processes. The predictions of the model are in excellent agreement with observed survival-time probability distributions of breast cancer patients.

  13. Uncertainties on Networks

    DTIC Science & Technology

    2011-06-03

    distribution, p. The Erdos- Renyi model (E-R model) has been widely used in the past to capture the probability distributions of ADGs (Erdos and Renyi ...experimental data. Journal of the American Statistical Association, 103:778-789. Erdos, R and Renyi , A. (1959). On random graphs, I

  14. A global goodness-of-fit statistic for Cox regression models.

    PubMed

    Parzen, M; Lipsitz, S R

    1999-06-01

    In this paper, a global goodness-of-fit test statistic for a Cox regression model, which has an approximate chi-squared distribution when the model has been correctly specified, is proposed. Our goodness-of-fit statistic is global and has power to detect if interactions or higher order powers of covariates in the model are needed. The proposed statistic is similar to the Hosmer and Lemeshow (1980, Communications in Statistics A10, 1043-1069) goodness-of-fit statistic for binary data as well as Schoenfeld's (1980, Biometrika 67, 145-153) statistic for the Cox model. The methods are illustrated using data from a Mayo Clinic trial in primary billiary cirrhosis of the liver (Fleming and Harrington, 1991, Counting Processes and Survival Analysis), in which the outcome is the time until liver transplantation or death. The are 17 possible covariates. Two Cox proportional hazards models are fit to the data, and the proposed goodness-of-fit statistic is applied to the fitted models.

  15. [Rank distributions in community ecology from the statistical viewpoint].

    PubMed

    Maksimov, V N

    2004-01-01

    Traditional statistical methods for definition of empirical functions of abundance distribution (population, biomass, production, etc.) of species in a community are applicable for processing of multivariate data contained in the above quantitative indices of the communities. In particular, evaluation of moments of distribution suffices for convolution of the data contained in a list of species and their abundance. At the same time, the species should be ranked in the list in ascending rather than descending population and the distribution models should be analyzed on the basis of the data on abundant species only.

  16. Sub-poissonian photon statistics in the coherent state Jaynes-Cummings model in non-resonance

    NASA Astrophysics Data System (ADS)

    Zhang, Jia-tai; Fan, An-fu

    1992-03-01

    We study a model with a two-level atom (TLA) non-resonance interacting with a single-mode quantized cavity field (QCF). The photon number probability function, the mean photon number and Mandel's fluctuation parameter are calculated. The sub-Poissonian distributions of the photon statistics are obtained in non-resonance interaction. This statistical properties are strongly dependent on the detuning parameters.

  17. Monitoring Method of Cow Anthrax Based on Gis and Spatial Statistical Analysis

    NASA Astrophysics Data System (ADS)

    Li, Lin; Yang, Yong; Wang, Hongbin; Dong, Jing; Zhao, Yujun; He, Jianbin; Fan, Honggang

    Geographic information system (GIS) is a computer application system, which possesses the ability of manipulating spatial information and has been used in many fields related with the spatial information management. Many methods and models have been established for analyzing animal diseases distribution models and temporal-spatial transmission models. Great benefits have been gained from the application of GIS in animal disease epidemiology. GIS is now a very important tool in animal disease epidemiological research. Spatial analysis function of GIS can be widened and strengthened by using spatial statistical analysis, allowing for the deeper exploration, analysis, manipulation and interpretation of spatial pattern and spatial correlation of the animal disease. In this paper, we analyzed the cow anthrax spatial distribution characteristics in the target district A (due to the secret of epidemic data we call it district A) based on the established GIS of the cow anthrax in this district in combination of spatial statistical analysis and GIS. The Cow anthrax is biogeochemical disease, and its geographical distribution is related closely to the environmental factors of habitats and has some spatial characteristics, and therefore the correct analysis of the spatial distribution of anthrax cow for monitoring and the prevention and control of anthrax has a very important role. However, the application of classic statistical methods in some areas is very difficult because of the pastoral nomadic context. The high mobility of livestock and the lack of enough suitable sampling for the some of the difficulties in monitoring currently make it nearly impossible to apply rigorous random sampling methods. It is thus necessary to develop an alternative sampling method, which could overcome the lack of sampling and meet the requirements for randomness. The GIS computer application software ArcGIS9.1 was used to overcome the lack of data of sampling sites.Using ArcGIS 9.1 and GEODA to analyze the cow anthrax spatial distribution of district A. we gained some conclusions about cow anthrax' density: (1) there is a spatial clustering model. (2) there is an intensely spatial autocorrelation. We established a prediction model to estimate the anthrax distribution based on the spatial characteristic of the density of cow anthrax. Comparing with the true distribution, the prediction model has a well coincidence and is feasible to the application. The method using a GIS tool facilitates can be implemented significantly in the cow anthrax monitoring and investigation, and the space statistics - related prediction model provides a fundamental use for other study on space-related animal diseases.

  18. Demonstration of fundamental statistics by studying timing of electronics signals in a physics-based laboratory

    NASA Astrophysics Data System (ADS)

    Beach, Shaun E.; Semkow, Thomas M.; Remling, David J.; Bradt, Clayton J.

    2017-07-01

    We have developed accessible methods to demonstrate fundamental statistics in several phenomena, in the context of teaching electronic signal processing in a physics-based college-level curriculum. A relationship between the exponential time-interval distribution and Poisson counting distribution for a Markov process with constant rate is derived in a novel way and demonstrated using nuclear counting. Negative binomial statistics is demonstrated as a model for overdispersion and justified by the effect of electronic noise in nuclear counting. The statistics of digital packets on a computer network are shown to be compatible with the fractal-point stochastic process leading to a power-law as well as generalized inverse Gaussian density distributions of time intervals between packets.

  19. Testing statistical self-similarity in the topology of river networks

    USGS Publications Warehouse

    Troutman, Brent M.; Mantilla, Ricardo; Gupta, Vijay K.

    2010-01-01

    Recent work has demonstrated that the topological properties of real river networks deviate significantly from predictions of Shreve's random model. At the same time the property of mean self-similarity postulated by Tokunaga's model is well supported by data. Recently, a new class of network model called random self-similar networks (RSN) that combines self-similarity and randomness has been introduced to replicate important topological features observed in real river networks. We investigate if the hypothesis of statistical self-similarity in the RSN model is supported by data on a set of 30 basins located across the continental United States that encompass a wide range of hydroclimatic variability. We demonstrate that the generators of the RSN model obey a geometric distribution, and self-similarity holds in a statistical sense in 26 of these 30 basins. The parameters describing the distribution of interior and exterior generators are tested to be statistically different and the difference is shown to produce the well-known Hack's law. The inter-basin variability of RSN parameters is found to be statistically significant. We also test generator dependence on two climatic indices, mean annual precipitation and radiative index of dryness. Some indication of climatic influence on the generators is detected, but this influence is not statistically significant with the sample size available. Finally, two key applications of the RSN model to hydrology and geomorphology are briefly discussed.

  20. Comment on Pisarenko et al., "Characterization of the Tail of the Distribution of Earthquake Magnitudes by Combining the GEV and GPD Descriptions of Extreme Value Theory"

    NASA Astrophysics Data System (ADS)

    Raschke, Mathias

    2016-02-01

    In this short note, I comment on the research of Pisarenko et al. (Pure Appl. Geophys 171:1599-1624, 2014) regarding the extreme value theory and statistics in the case of earthquake magnitudes. The link between the generalized extreme value distribution (GEVD) as an asymptotic model for the block maxima of a random variable and the generalized Pareto distribution (GPD) as a model for the peaks over threshold (POT) of the same random variable is presented more clearly. Inappropriately, Pisarenko et al. (Pure Appl. Geophys 171:1599-1624, 2014) have neglected to note that the approximations by GEVD and GPD work only asymptotically in most cases. This is particularly the case with truncated exponential distribution (TED), a popular distribution model for earthquake magnitudes. I explain why the classical models and methods of the extreme value theory and statistics do not work well for truncated exponential distributions. Consequently, these classical methods should be used for the estimation of the upper bound magnitude and corresponding parameters. Furthermore, I comment on various issues of statistical inference in Pisarenko et al. and propose alternatives. I argue why GPD and GEVD would work for various types of stochastic earthquake processes in time, and not only for the homogeneous (stationary) Poisson process as assumed by Pisarenko et al. (Pure Appl. Geophys 171:1599-1624, 2014). The crucial point of earthquake magnitudes is the poor convergence of their tail distribution to the GPD, and not the earthquake process over time.

  1. Occupation times and ergodicity breaking in biased continuous time random walks

    NASA Astrophysics Data System (ADS)

    Bel, Golan; Barkai, Eli

    2005-12-01

    Continuous time random walk (CTRW) models are widely used to model diffusion in condensed matter. There are two classes of such models, distinguished by the convergence or divergence of the mean waiting time. Systems with finite average sojourn time are ergodic and thus Boltzmann-Gibbs statistics can be applied. We investigate the statistical properties of CTRW models with infinite average sojourn time; in particular, the occupation time probability density function is obtained. It is shown that in the non-ergodic phase the distribution of the occupation time of the particle on a given lattice point exhibits bimodal U or trimodal W shape, related to the arcsine law. The key points are as follows. (a) In a CTRW with finite or infinite mean waiting time, the distribution of the number of visits on a lattice point is determined by the probability that a member of an ensemble of particles in equilibrium occupies the lattice point. (b) The asymmetry parameter of the probability distribution function of occupation times is related to the Boltzmann probability and to the partition function. (c) The ensemble average is given by Boltzmann-Gibbs statistics for either finite or infinite mean sojourn time, when detailed balance conditions hold. (d) A non-ergodic generalization of the Boltzmann-Gibbs statistical mechanics for systems with infinite mean sojourn time is found.

  2. Computational statistics using the Bayesian Inference Engine

    NASA Astrophysics Data System (ADS)

    Weinberg, Martin D.

    2013-09-01

    This paper introduces the Bayesian Inference Engine (BIE), a general parallel, optimized software package for parameter inference and model selection. This package is motivated by the analysis needs of modern astronomical surveys and the need to organize and reuse expensive derived data. The BIE is the first platform for computational statistics designed explicitly to enable Bayesian update and model comparison for astronomical problems. Bayesian update is based on the representation of high-dimensional posterior distributions using metric-ball-tree based kernel density estimation. Among its algorithmic offerings, the BIE emphasizes hybrid tempered Markov chain Monte Carlo schemes that robustly sample multimodal posterior distributions in high-dimensional parameter spaces. Moreover, the BIE implements a full persistence or serialization system that stores the full byte-level image of the running inference and previously characterized posterior distributions for later use. Two new algorithms to compute the marginal likelihood from the posterior distribution, developed for and implemented in the BIE, enable model comparison for complex models and data sets. Finally, the BIE was designed to be a collaborative platform for applying Bayesian methodology to astronomy. It includes an extensible object-oriented and easily extended framework that implements every aspect of the Bayesian inference. By providing a variety of statistical algorithms for all phases of the inference problem, a scientist may explore a variety of approaches with a single model and data implementation. Additional technical details and download details are available from http://www.astro.umass.edu/bie. The BIE is distributed under the GNU General Public License.

  3. Assessment of corneal properties based on statistical modeling of OCT speckle

    PubMed Central

    Jesus, Danilo A.; Iskander, D. Robert

    2016-01-01

    A new approach to assess the properties of the corneal micro-structure in vivo based on the statistical modeling of speckle obtained from Optical Coherence Tomography (OCT) is presented. A number of statistical models were proposed to fit the corneal speckle data obtained from OCT raw image. Short-term changes in corneal properties were studied by inducing corneal swelling whereas age-related changes were observed analyzing data of sixty-five subjects aged between twenty-four and seventy-three years. Generalized Gamma distribution has shown to be the best model, in terms of the Akaike’s Information Criterion, to fit the OCT corneal speckle. Its parameters have shown statistically significant differences (Kruskal-Wallis, p < 0.001) for short and age-related corneal changes. In addition, it was observed that age-related changes influence the corneal biomechanical behaviour when corneal swelling is induced. This study shows that Generalized Gamma distribution can be utilized to modeling corneal speckle in OCT in vivo providing complementary quantified information where micro-structure of corneal tissue is of essence. PMID:28101409

  4. Modeling the Effects of Solar Cell Distribution on Optical Cross Section for Solar Panel Simulation

    DTIC Science & Technology

    2012-09-01

    cell material. The solar panel was created as a CAD model and simulated with the imaging facility parameters with TASAT. TASAT uses a BRDF to apply...1 MODELING THE EFFECTS OF SOLAR CELL DISTRIBUTION ON OPTICAL CROSS SECTION FOR SOLAR PANEL SIMULATION Kelly Feirstine Meiling Klein... model of a solar panel with various solar cell tip and tilt distribution statistics. Modeling a solar panel as a single sheet of “solar cell” material

  5. Constraining the noise-free distribution of halo spin parameters

    NASA Astrophysics Data System (ADS)

    Benson, Andrew J.

    2017-11-01

    Any measurement made using an N-body simulation is subject to noise due to the finite number of particles used to sample the dark matter distribution function, and the lack of structure below the simulation resolution. This noise can be particularly significant when attempting to measure intrinsically small quantities, such as halo spin. In this work, we develop a model to describe the effects of particle noise on halo spin parameters. This model is calibrated using N-body simulations in which the particle noise can be treated as a Poisson process on the underlying dark matter distribution function, and we demonstrate that this calibrated model reproduces measurements of halo spin parameter error distributions previously measured in N-body convergence studies. Utilizing this model, along with previous measurements of the distribution of halo spin parameters in N-body simulations, we place constraints on the noise-free distribution of halo spins. We find that the noise-free median spin is 3 per cent lower than that measured directly from the N-body simulation, corresponding to a shift of approximately 40 times the statistical uncertainty in this measurement arising purely from halo counting statistics. We also show that measurement of the spin of an individual halo to 10 per cent precision requires at least 4 × 104 particles in the halo - for haloes containing 200 particles, the fractional error on spins measured for individual haloes is of order unity. N-body simulations should be viewed as the results of a statistical experiment applied to a model of dark matter structure formation. When viewed in this way, it is clear that determination of any quantity from such a simulation should be made through forward modelling of the effects of particle noise.

  6. Robustness of location estimators under t-distributions: a literature review

    NASA Astrophysics Data System (ADS)

    Sumarni, C.; Sadik, K.; Notodiputro, K. A.; Sartono, B.

    2017-03-01

    The assumption of normality is commonly used in estimation of parameters in statistical modelling, but this assumption is very sensitive to outliers. The t-distribution is more robust than the normal distribution since the t-distributions have longer tails. The robustness measures of location estimators under t-distributions are reviewed and discussed in this paper. For the purpose of illustration we use the onion yield data which includes outliers as a case study and showed that the t model produces better fit than the normal model.

  7. Thermodynamic Model of Spatial Memory

    NASA Astrophysics Data System (ADS)

    Kaufman, Miron; Allen, P.

    1998-03-01

    We develop and test a thermodynamic model of spatial memory. Our model is an application of statistical thermodynamics to cognitive science. It is related to applications of the statistical mechanics framework in parallel distributed processes research. Our macroscopic model allows us to evaluate an entropy associated with spatial memory tasks. We find that older adults exhibit higher levels of entropy than younger adults. Thurstone's Law of Categorical Judgment, according to which the discriminal processes along the psychological continuum produced by presentations of a single stimulus are normally distributed, is explained by using a Hooke spring model of spatial memory. We have also analyzed a nonlinear modification of the ideal spring model of spatial memory. This work is supported by NIH/NIA grant AG09282-06.

  8. Simulating statistics of lightning-induced and man made fires

    NASA Astrophysics Data System (ADS)

    Krenn, R.; Hergarten, S.

    2009-04-01

    The frequency-area distributions of forest fires show power-law behavior with scaling exponents α in a quite narrow range, relating wildfire research to the theoretical framework of self-organized criticality. Examples of self-organized critical behavior can be found in computer simulations of simple cellular automata. The established self-organized critical Drossel-Schwabl forest fire model (DS-FFM) is one of the most widespread models in this context. Despite its qualitative agreement with event-size statistics from nature, its applicability is still questioned. Apart from general concerns that the DS-FFM apparently oversimplifies the complex nature of forest dynamics, it significantly overestimates the frequency of large fires. We present a straightforward modification of the model rules that increases the scaling exponent α by approximately 1•3 and brings the simulated event-size statistics close to those observed in nature. In addition, combined simulations of both the original and the modified model predict a dependence of the overall distribution on the ratio of lightning induced and man made fires as well as a difference between their respective event-size statistics. The increase of the scaling exponent with decreasing lightning probability as well as the splitting of the partial distributions are confirmed by the analysis of the Canadian Large Fire Database. As a consequence, lightning induced and man made forest fires cannot be treated separately in wildfire modeling, hazard assessment and forest management.

  9. SU-E-T-503: IMRT Optimization Using Monte Carlo Dose Engine: The Effect of Statistical Uncertainty.

    PubMed

    Tian, Z; Jia, X; Graves, Y; Uribe-Sanchez, A; Jiang, S

    2012-06-01

    With the development of ultra-fast GPU-based Monte Carlo (MC) dose engine, it becomes clinically realistic to compute the dose-deposition coefficients (DDC) for IMRT optimization using MC simulation. However, it is still time-consuming if we want to compute DDC with small statistical uncertainty. This work studies the effects of the statistical error in DDC matrix on IMRT optimization. The MC-computed DDC matrices are simulated here by adding statistical uncertainties at a desired level to the ones generated with a finite-size pencil beam algorithm. A statistical uncertainty model for MC dose calculation is employed. We adopt a penalty-based quadratic optimization model and gradient descent method to optimize fluence map and then recalculate the corresponding actual dose distribution using the noise-free DDC matrix. The impacts of DDC noise are assessed in terms of the deviation of the resulted dose distributions. We have also used a stochastic perturbation theory to theoretically estimate the statistical errors of dose distributions on a simplified optimization model. A head-and-neck case is used to investigate the perturbation to IMRT plan due to MC's statistical uncertainty. The relative errors of the final dose distributions of the optimized IMRT are found to be much smaller than those in the DDC matrix, which is consistent with our theoretical estimation. When history number is decreased from 108 to 106, the dose-volume-histograms are still very similar to the error-free DVHs while the error in DDC is about 3.8%. The results illustrate that the statistical errors in the DDC matrix have a relatively small effect on IMRT optimization in dose domain. This indicates we can use relatively small number of histories to obtain the DDC matrix with MC simulation within a reasonable amount of time, without considerably compromising the accuracy of the optimized treatment plan. This work is supported by Varian Medical Systems through a Master Research Agreement. © 2012 American Association of Physicists in Medicine.

  10. TH-CD-202-07: A Methodology for Generating Numerical Phantoms for Radiation Therapy Using Geometric Attribute Distribution Models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Dolly, S; Chen, H; Mutic, S

    Purpose: A persistent challenge for the quality assessment of radiation therapy treatments (e.g. contouring accuracy) is the absence of the known, ground truth for patient data. Moreover, assessment results are often patient-dependent. Computer simulation studies utilizing numerical phantoms can be performed for quality assessment with a known ground truth. However, previously reported numerical phantoms do not include the statistical properties of inter-patient variations, as their models are based on only one patient. In addition, these models do not incorporate tumor data. In this study, a methodology was developed for generating numerical phantoms which encapsulate the statistical variations of patients withinmore » radiation therapy, including tumors. Methods: Based on previous work in contouring assessment, geometric attribute distribution (GAD) models were employed to model both the deterministic and stochastic properties of individual organs via principle component analysis. Using pre-existing radiation therapy contour data, the GAD models are trained to model the shape and centroid distributions of each organ. Then, organs with different shapes and positions can be generated by assigning statistically sound weights to the GAD model parameters. Organ contour data from 20 retrospective prostate patient cases were manually extracted and utilized to train the GAD models. As a demonstration, computer-simulated CT images of generated numerical phantoms were calculated and assessed subjectively and objectively for realism. Results: A cohort of numerical phantoms of the male human pelvis was generated. CT images were deemed realistic both subjectively and objectively in terms of image noise power spectrum. Conclusion: A methodology has been developed to generate realistic numerical anthropomorphic phantoms using pre-existing radiation therapy data. The GAD models guarantee that generated organs span the statistical distribution of observed radiation therapy patients, according to the training dataset. The methodology enables radiation therapy treatment assessment with multi-modality imaging and a known ground truth, and without patient-dependent bias.« less

  11. Classroom Research: Assessment of Student Understanding of Sampling Distributions of Means and the Central Limit Theorem in Post-Calculus Probability and Statistics Classes

    ERIC Educational Resources Information Center

    Lunsford, M. Leigh; Rowell, Ginger Holmes; Goodson-Espy, Tracy

    2006-01-01

    We applied a classroom research model to investigate student understanding of sampling distributions of sample means and the Central Limit Theorem in post-calculus introductory probability and statistics courses. Using a quantitative assessment tool developed by previous researchers and a qualitative assessment tool developed by the authors, we…

  12. Self-affirmation model for football goal distributions

    NASA Astrophysics Data System (ADS)

    Bittner, E.; Nußbaumer, A.; Janke, W.; Weigel, M.

    2007-06-01

    Analyzing football score data with statistical techniques, we investigate how the highly co-operative nature of the game is reflected in averaged properties such as the distributions of scored goals for the home and away teams. It turns out that in particular the tails of the distributions are not well described by independent Bernoulli trials, but rather well modeled by negative binomial or generalized extreme value distributions. To understand this behavior from first principles, we suggest to modify the Bernoulli random process to include a simple component of self-affirmation which seems to describe the data surprisingly well and allows to interpret the observed deviation from Gaussian statistics. The phenomenological distributions used before can be understood as special cases within this framework. We analyzed historical football score data from many leagues in Europe as well as from international tournaments and found the proposed models to be applicable rather universally. In particular, here we compare men's and women's leagues and the separate German leagues during the cold war times and find some remarkable differences.

  13. Statistical distributions of extreme dry spell in Peninsular Malaysia

    NASA Astrophysics Data System (ADS)

    Zin, Wan Zawiah Wan; Jemain, Abdul Aziz

    2010-11-01

    Statistical distributions of annual extreme (AE) series and partial duration (PD) series for dry-spell event are analyzed for a database of daily rainfall records of 50 rain-gauge stations in Peninsular Malaysia, with recording period extending from 1975 to 2004. The three-parameter generalized extreme value (GEV) and generalized Pareto (GP) distributions are considered to model both series. In both cases, the parameters of these two distributions are fitted by means of the L-moments method, which provides a robust estimation of them. The goodness-of-fit (GOF) between empirical data and theoretical distributions are then evaluated by means of the L-moment ratio diagram and several goodness-of-fit tests for each of the 50 stations. It is found that for the majority of stations, the AE and PD series are well fitted by the GEV and GP models, respectively. Based on the models that have been identified, we can reasonably predict the risks associated with extreme dry spells for various return periods.

  14. MODELING FISH AND SHELLFISH DISTRIBUTIONS IN THE MOBILE BAY ESTUARY, USA

    EPA Science Inventory

    Estuaries in the Gulf of Mexico provide rich habitat for many fish and shellfish, including those that have been identified as economically and ecologically important. For the Mobile Bay estuary, we developed statistical models to relate distributions of individual species and sp...

  15. Direction dependence analysis: A framework to test the direction of effects in linear models with an implementation in SPSS.

    PubMed

    Wiedermann, Wolfgang; Li, Xintong

    2018-04-16

    In nonexperimental data, at least three possible explanations exist for the association of two variables x and y: (1) x is the cause of y, (2) y is the cause of x, or (3) an unmeasured confounder is present. Statistical tests that identify which of the three explanatory models fits best would be a useful adjunct to the use of theory alone. The present article introduces one such statistical method, direction dependence analysis (DDA), which assesses the relative plausibility of the three explanatory models on the basis of higher-moment information about the variables (i.e., skewness and kurtosis). DDA involves the evaluation of three properties of the data: (1) the observed distributions of the variables, (2) the residual distributions of the competing models, and (3) the independence properties of the predictors and residuals of the competing models. When the observed variables are nonnormally distributed, we show that DDA components can be used to uniquely identify each explanatory model. Statistical inference methods for model selection are presented, and macros to implement DDA in SPSS are provided. An empirical example is given to illustrate the approach. Conceptual and empirical considerations are discussed for best-practice applications in psychological data, and sample size recommendations based on previous simulation studies are provided.

  16. Time-dependent quantum oscillator as attenuator and amplifier: noise and statistical evolutions

    NASA Astrophysics Data System (ADS)

    Portes, D.; Rodrigues, H.; Duarte, S. B.; Baseia, B.

    2004-10-01

    We revisit the quantum oscillator, modelled as a time-dependent LC-circuit. Nonclassical properties concerned with attenuation and amplification regions are considered, as well as time evolution of quantum noise and statistics, with emphasis on revivals of the statistical distribution.

  17. A study of finite mixture model: Bayesian approach on financial time series data

    NASA Astrophysics Data System (ADS)

    Phoong, Seuk-Yen; Ismail, Mohd Tahir

    2014-07-01

    Recently, statistician have emphasized on the fitting finite mixture model by using Bayesian method. Finite mixture model is a mixture of distributions in modeling a statistical distribution meanwhile Bayesian method is a statistical method that use to fit the mixture model. Bayesian method is being used widely because it has asymptotic properties which provide remarkable result. In addition, Bayesian method also shows consistency characteristic which means the parameter estimates are close to the predictive distributions. In the present paper, the number of components for mixture model is studied by using Bayesian Information Criterion. Identify the number of component is important because it may lead to an invalid result. Later, the Bayesian method is utilized to fit the k-component mixture model in order to explore the relationship between rubber price and stock market price for Malaysia, Thailand, Philippines and Indonesia. Lastly, the results showed that there is a negative effect among rubber price and stock market price for all selected countries.

  18. A Conway-Maxwell-Poisson (CMP) model to address data dispersion on positron emission tomography.

    PubMed

    Santarelli, Maria Filomena; Della Latta, Daniele; Scipioni, Michele; Positano, Vincenzo; Landini, Luigi

    2016-10-01

    Positron emission tomography (PET) in medicine exploits the properties of positron-emitting unstable nuclei. The pairs of γ- rays emitted after annihilation are revealed by coincidence detectors and stored as projections in a sinogram. It is well known that radioactive decay follows a Poisson distribution; however, deviation from Poisson statistics occurs on PET projection data prior to reconstruction due to physical effects, measurement errors, correction of deadtime, scatter, and random coincidences. A model that describes the statistical behavior of measured and corrected PET data can aid in understanding the statistical nature of the data: it is a prerequisite to develop efficient reconstruction and processing methods and to reduce noise. The deviation from Poisson statistics in PET data could be described by the Conway-Maxwell-Poisson (CMP) distribution model, which is characterized by the centring parameter λ and the dispersion parameter ν, the latter quantifying the deviation from a Poisson distribution model. In particular, the parameter ν allows quantifying over-dispersion (ν<1) or under-dispersion (ν>1) of data. A simple and efficient method for λ and ν parameters estimation is introduced and assessed using Monte Carlo simulation for a wide range of activity values. The application of the method to simulated and experimental PET phantom data demonstrated that the CMP distribution parameters could detect deviation from the Poisson distribution both in raw and corrected PET data. It may be usefully implemented in image reconstruction algorithms and quantitative PET data analysis, especially in low counting emission data, as in dynamic PET data, where the method demonstrated the best accuracy. Copyright © 2016 Elsevier Ltd. All rights reserved.

  19. Uncertainties in obtaining high reliability from stress-strength models

    NASA Technical Reports Server (NTRS)

    Neal, Donald M.; Matthews, William T.; Vangel, Mark G.

    1992-01-01

    There has been a recent interest in determining high statistical reliability in risk assessment of aircraft components. The potential consequences are identified of incorrectly assuming a particular statistical distribution for stress or strength data used in obtaining the high reliability values. The computation of the reliability is defined as the probability of the strength being greater than the stress over the range of stress values. This method is often referred to as the stress-strength model. A sensitivity analysis was performed involving a comparison of reliability results in order to evaluate the effects of assuming specific statistical distributions. Both known population distributions, and those that differed slightly from the known, were considered. Results showed substantial differences in reliability estimates even for almost nondetectable differences in the assumed distributions. These differences represent a potential problem in using the stress-strength model for high reliability computations, since in practice it is impossible to ever know the exact (population) distribution. An alternative reliability computation procedure is examined involving determination of a lower bound on the reliability values using extreme value distributions. This procedure reduces the possibility of obtaining nonconservative reliability estimates. Results indicated the method can provide conservative bounds when computing high reliability. An alternative reliability computation procedure is examined involving determination of a lower bound on the reliability values using extreme value distributions. This procedure reduces the possibility of obtaining nonconservative reliability estimates. Results indicated the method can provide conservative bounds when computing high reliability.

  20. Interpreting the concordance statistic of a logistic regression model: relation to the variance and odds ratio of a continuous explanatory variable

    PubMed Central

    2012-01-01

    Background When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model. Methods An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examine the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in combined sample of those with and without the condition. Results Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, and uniform in the entire sample of those with and without the condition. Conclusions The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population. PMID:22716998

  1. Gene Level Meta-Analysis of Quantitative Traits by Functional Linear Models.

    PubMed

    Fan, Ruzong; Wang, Yifan; Boehnke, Michael; Chen, Wei; Li, Yun; Ren, Haobo; Lobach, Iryna; Xiong, Momiao

    2015-08-01

    Meta-analysis of genetic data must account for differences among studies including study designs, markers genotyped, and covariates. The effects of genetic variants may differ from population to population, i.e., heterogeneity. Thus, meta-analysis of combining data of multiple studies is difficult. Novel statistical methods for meta-analysis are needed. In this article, functional linear models are developed for meta-analyses that connect genetic data to quantitative traits, adjusting for covariates. The models can be used to analyze rare variants, common variants, or a combination of the two. Both likelihood-ratio test (LRT) and F-distributed statistics are introduced to test association between quantitative traits and multiple variants in one genetic region. Extensive simulations are performed to evaluate empirical type I error rates and power performance of the proposed tests. The proposed LRT and F-distributed statistics control the type I error very well and have higher power than the existing methods of the meta-analysis sequence kernel association test (MetaSKAT). We analyze four blood lipid levels in data from a meta-analysis of eight European studies. The proposed methods detect more significant associations than MetaSKAT and the P-values of the proposed LRT and F-distributed statistics are usually much smaller than those of MetaSKAT. The functional linear models and related test statistics can be useful in whole-genome and whole-exome association studies. Copyright © 2015 by the Genetics Society of America.

  2. Statistical methods for the beta-binomial model in teratology.

    PubMed Central

    Yamamoto, E; Yanagimoto, T

    1994-01-01

    The beta-binomial model is widely used for analyzing teratological data involving littermates. Recent developments in statistical analyses of teratological data are briefly reviewed with emphasis on the model. For statistical inference of the parameters in the beta-binomial distribution, separation of the likelihood introduces an likelihood inference. This leads to reducing biases of estimators and also to improving accuracy of empirical significance levels of tests. Separate inference of the parameters can be conducted in a unified way. PMID:8187716

  3. A meta-analysis and statistical modelling of nitrates in groundwater at the African scale

    NASA Astrophysics Data System (ADS)

    Ouedraogo, Issoufou; Vanclooster, Marnik

    2016-06-01

    Contamination of groundwater with nitrate poses a major health risk to millions of people around Africa. Assessing the space-time distribution of this contamination, as well as understanding the factors that explain this contamination, is important for managing sustainable drinking water at the regional scale. This study aims to assess the variables that contribute to nitrate pollution in groundwater at the African scale by statistical modelling. We compiled a literature database of nitrate concentration in groundwater (around 250 studies) and combined it with digital maps of physical attributes such as soil, geology, climate, hydrogeology, and anthropogenic data for statistical model development. The maximum, medium, and minimum observed nitrate concentrations were analysed. In total, 13 explanatory variables were screened to explain observed nitrate pollution in groundwater. For the mean nitrate concentration, four variables are retained in the statistical explanatory model: (1) depth to groundwater (shallow groundwater, typically < 50 m); (2) recharge rate; (3) aquifer type; and (4) population density. The first three variables represent intrinsic vulnerability of groundwater systems to pollution, while the latter variable is a proxy for anthropogenic pollution pressure. The model explains 65 % of the variation of mean nitrate contamination in groundwater at the African scale. Using the same proxy information, we could develop a statistical model for the maximum nitrate concentrations that explains 42 % of the nitrate variation. For the maximum concentrations, other environmental attributes such as soil type, slope, rainfall, climate class, and region type improve the prediction of maximum nitrate concentrations at the African scale. As to minimal nitrate concentrations, in the absence of normal distribution assumptions of the data set, we do not develop a statistical model for these data. The data-based statistical model presented here represents an important step towards developing tools that will allow us to accurately predict nitrate distribution at the African scale and thus may support groundwater monitoring and water management that aims to protect groundwater systems. Yet they should be further refined and validated when more detailed and harmonized data become available and/or combined with more conceptual descriptions of the fate of nutrients in the hydrosystem.

  4. The use of imputed sibling genotypes in sibship-based association analysis: on modeling alternatives, power and model misspecification.

    PubMed

    Minică, Camelia C; Dolan, Conor V; Hottenga, Jouke-Jan; Willemsen, Gonneke; Vink, Jacqueline M; Boomsma, Dorret I

    2013-05-01

    When phenotypic, but no genotypic data are available for relatives of participants in genetic association studies, previous research has shown that family-based imputed genotypes can boost the statistical power when included in such studies. Here, using simulations, we compared the performance of two statistical approaches suitable to model imputed genotype data: the mixture approach, which involves the full distribution of the imputed genotypes and the dosage approach, where the mean of the conditional distribution features as the imputed genotype. Simulations were run by varying sibship size, size of the phenotypic correlations among siblings, imputation accuracy and minor allele frequency of the causal SNP. Furthermore, as imputing sibling data and extending the model to include sibships of size two or greater requires modeling the familial covariance matrix, we inquired whether model misspecification affects power. Finally, the results obtained via simulations were empirically verified in two datasets with continuous phenotype data (height) and with a dichotomous phenotype (smoking initiation). Across the settings considered, the mixture and the dosage approach are equally powerful and both produce unbiased parameter estimates. In addition, the likelihood-ratio test in the linear mixed model appears to be robust to the considered misspecification in the background covariance structure, given low to moderate phenotypic correlations among siblings. Empirical results show that the inclusion in association analysis of imputed sibling genotypes does not always result in larger test statistic. The actual test statistic may drop in value due to small effect sizes. That is, if the power benefit is small, that the change in distribution of the test statistic under the alternative is relatively small, the probability is greater of obtaining a smaller test statistic. As the genetic effects are typically hypothesized to be small, in practice, the decision on whether family-based imputation could be used as a means to increase power should be informed by prior power calculations and by the consideration of the background correlation.

  5. Software for Statistical Analysis of Weibull Distributions with Application to Gear Fatigue Data: User Manual with Verification

    NASA Technical Reports Server (NTRS)

    Krantz, Timothy L.

    2002-01-01

    The Weibull distribution has been widely adopted for the statistical description and inference of fatigue data. This document provides user instructions, examples, and verification for software to analyze gear fatigue test data. The software was developed presuming the data are adequately modeled using a two-parameter Weibull distribution. The calculations are based on likelihood methods, and the approach taken is valid for data that include type 1 censoring. The software was verified by reproducing results published by others.

  6. Software for Statistical Analysis of Weibull Distributions with Application to Gear Fatigue Data: User Manual with Verification

    NASA Technical Reports Server (NTRS)

    Kranz, Timothy L.

    2002-01-01

    The Weibull distribution has been widely adopted for the statistical description and inference of fatigue data. This document provides user instructions, examples, and verification for software to analyze gear fatigue test data. The software was developed presuming the data are adequately modeled using a two-parameter Weibull distribution. The calculations are based on likelihood methods, and the approach taken is valid for data that include type I censoring. The software was verified by reproducing results published by others.

  7. Modeling species-abundance relationships in multi-species collections

    USGS Publications Warehouse

    Peng, S.; Yin, Z.; Ren, H.; Guo, Q.

    2003-01-01

    Species-abundance relationship is one of the most fundamental aspects of community ecology. Since Motomura first developed the geometric series model to describe the feature of community structure, ecologists have developed many other models to fit the species-abundance data in communities. These models can be classified into empirical and theoretical ones, including (1) statistical models, i.e., negative binomial distribution (and its extension), log-series distribution (and its extension), geometric distribution, lognormal distribution, Poisson-lognormal distribution, (2) niche models, i.e., geometric series, broken stick, overlapping niche, particulate niche, random assortment, dominance pre-emption, dominance decay, random fraction, weighted random fraction, composite niche, Zipf or Zipf-Mandelbrot model, and (3) dynamic models describing community dynamics and restrictive function of environment on community. These models have different characteristics and fit species-abundance data in various communities or collections. Among them, log-series distribution, lognormal distribution, geometric series, and broken stick model have been most widely used.

  8. A statistical model of aggregate fragmentation

    NASA Astrophysics Data System (ADS)

    Spahn, F.; Vieira Neto, E.; Guimarães, A. H. F.; Gorban, A. N.; Brilliantov, N. V.

    2014-01-01

    A statistical model of fragmentation of aggregates is proposed, based on the stochastic propagation of cracks through the body. The propagation rules are formulated on a lattice and mimic two important features of the process—a crack moves against the stress gradient while dissipating energy during its growth. We perform numerical simulations of the model for two-dimensional lattice and reveal that the mass distribution for small- and intermediate-size fragments obeys a power law, F(m)∝m-3/2, in agreement with experimental observations. We develop an analytical theory which explains the detected power law and demonstrate that the overall fragment mass distribution in our model agrees qualitatively with that one observed in experiments.

  9. Exploring Explanations of Subglacial Bedform Sizes Using Statistical Models.

    PubMed

    Hillier, John K; Kougioumtzoglou, Ioannis A; Stokes, Chris R; Smith, Michael J; Clark, Chris D; Spagnolo, Matteo S

    2016-01-01

    Sediments beneath modern ice sheets exert a key control on their flow, but are largely inaccessible except through geophysics or boreholes. In contrast, palaeo-ice sheet beds are accessible, and typically characterised by numerous bedforms. However, the interaction between bedforms and ice flow is poorly constrained and it is not clear how bedform sizes might reflect ice flow conditions. To better understand this link we present a first exploration of a variety of statistical models to explain the size distribution of some common subglacial bedforms (i.e., drumlins, ribbed moraine, MSGL). By considering a range of models, constructed to reflect key aspects of the physical processes, it is possible to infer that the size distributions are most effectively explained when the dynamics of ice-water-sediment interaction associated with bedform growth is fundamentally random. A 'stochastic instability' (SI) model, which integrates random bedform growth and shrinking through time with exponential growth, is preferred and is consistent with other observations of palaeo-bedforms and geophysical surveys of active ice sheets. Furthermore, we give a proof-of-concept demonstration that our statistical approach can bridge the gap between geomorphological observations and physical models, directly linking measurable size-frequency parameters to properties of ice sheet flow (e.g., ice velocity). Moreover, statistically developing existing models as proposed allows quantitative predictions to be made about sizes, making the models testable; a first illustration of this is given for a hypothesised repeat geophysical survey of bedforms under active ice. Thus, we further demonstrate the potential of size-frequency distributions of subglacial bedforms to assist the elucidation of subglacial processes and better constrain ice sheet models.

  10. Rainfall Downscaling Conditional on Upper-air Atmospheric Predictors: Improved Assessment of Rainfall Statistics in a Changing Climate

    NASA Astrophysics Data System (ADS)

    Langousis, Andreas; Mamalakis, Antonis; Deidda, Roberto; Marrocu, Marino

    2015-04-01

    To improve the level skill of Global Climate Models (GCMs) and Regional Climate Models (RCMs) in reproducing the statistics of rainfall at a basin level and at hydrologically relevant temporal scales (e.g. daily), two types of statistical approaches have been suggested. One is the statistical correction of climate model rainfall outputs using historical series of precipitation. The other is the use of stochastic models of rainfall to conditionally simulate precipitation series, based on large-scale atmospheric predictors produced by climate models (e.g. geopotential height, relative vorticity, divergence, mean sea level pressure). The latter approach, usually referred to as statistical rainfall downscaling, aims at reproducing the statistical character of rainfall, while accounting for the effects of large-scale atmospheric circulation (and, therefore, climate forcing) on rainfall statistics. While promising, statistical rainfall downscaling has not attracted much attention in recent years, since the suggested approaches involved complex (i.e. subjective or computationally intense) identification procedures of the local weather, in addition to demonstrating limited success in reproducing several statistical features of rainfall, such as seasonal variations, the distributions of dry and wet spell lengths, the distribution of the mean rainfall intensity inside wet periods, and the distribution of rainfall extremes. In an effort to remedy those shortcomings, Langousis and Kaleris (2014) developed a statistical framework for simulation of daily rainfall intensities conditional on upper air variables, which accurately reproduces the statistical character of rainfall at multiple time-scales. Here, we study the relative performance of: a) quantile-quantile (Q-Q) correction of climate model rainfall products, and b) the statistical downscaling scheme of Langousis and Kaleris (2014), in reproducing the statistical structure of rainfall, as well as rainfall extremes, at a regional level. This is done for an intermediate-sized catchment in Italy, i.e. the Flumendosa catchment, using climate model rainfall and atmospheric data from the ENSEMBLES project (http://ensembleseu.metoffice.com). In doing so, we split the historical rainfall record of mean areal precipitation (MAP) in 15-year calibration and 45-year validation periods, and compare the historical rainfall statistics to those obtained from: a) Q-Q corrected climate model rainfall products, and b) synthetic rainfall series generated by the suggested downscaling scheme. To our knowledge, this is the first time that climate model rainfall and statistically downscaled precipitation are compared to catchment-averaged MAP at a daily resolution. The obtained results are promising, since the proposed downscaling scheme is more accurate and robust in reproducing a number of historical rainfall statistics, independent of the climate model used and the length of the calibration period. This is particularly the case for the yearly rainfall maxima, where direct statistical correction of climate model rainfall outputs shows increased sensitivity to the length of the calibration period and the climate model used. The robustness of the suggested downscaling scheme in modeling rainfall extremes at a daily resolution, is a notable feature that can effectively be used to assess hydrologic risk at a regional level under changing climatic conditions. Acknowledgments The research project is implemented within the framework of the Action «Supporting Postdoctoral Researchers» of the Operational Program "Education and Lifelong Learning" (Action's Beneficiary: General Secretariat for Research and Technology), and is co-financed by the European Social Fund (ESF) and the Greek State. CRS4 highly acknowledges the contribution of the Sardinian regional authorities.

  11. Pinning time statistics for vortex lines in disordered environments.

    PubMed

    Dobramysl, Ulrich; Pleimling, Michel; Täuber, Uwe C

    2014-12-01

    We study the pinning dynamics of magnetic flux (vortex) lines in a disordered type-II superconductor. Using numerical simulations of a directed elastic line model, we extract the pinning time distributions of vortex line segments. We compare different model implementations for the disorder in the surrounding medium: discrete, localized pinning potential wells that are either attractive and repulsive or purely attractive, and whose strengths are drawn from a Gaussian distribution; as well as continuous Gaussian random potential landscapes. We find that both schemes yield power-law distributions in the pinned phase as predicted by extreme-event statistics, yet they differ significantly in their effective scaling exponents and their short-time behavior.

  12. Statistical distributions of earthquake numbers: consequence of branching process

    NASA Astrophysics Data System (ADS)

    Kagan, Yan Y.

    2010-03-01

    We discuss various statistical distributions of earthquake numbers. Previously, we derived several discrete distributions to describe earthquake numbers for the branching model of earthquake occurrence: these distributions are the Poisson, geometric, logarithmic and the negative binomial (NBD). The theoretical model is the `birth and immigration' population process. The first three distributions above can be considered special cases of the NBD. In particular, a point branching process along the magnitude (or log seismic moment) axis with independent events (immigrants) explains the magnitude/moment-frequency relation and the NBD of earthquake counts in large time/space windows, as well as the dependence of the NBD parameters on the magnitude threshold (magnitude of an earthquake catalogue completeness). We discuss applying these distributions, especially the NBD, to approximate event numbers in earthquake catalogues. There are many different representations of the NBD. Most can be traced either to the Pascal distribution or to the mixture of the Poisson distribution with the gamma law. We discuss advantages and drawbacks of both representations for statistical analysis of earthquake catalogues. We also consider applying the NBD to earthquake forecasts and describe the limits of the application for the given equations. In contrast to the one-parameter Poisson distribution so widely used to describe earthquake occurrence, the NBD has two parameters. The second parameter can be used to characterize clustering or overdispersion of a process. We determine the parameter values and their uncertainties for several local and global catalogues, and their subdivisions in various time intervals, magnitude thresholds, spatial windows, and tectonic categories. The theoretical model of how the clustering parameter depends on the corner (maximum) magnitude can be used to predict future earthquake number distribution in regions where very large earthquakes have not yet occurred.

  13. Constraints on the near-Earth asteroid obliquity distribution from the Yarkovsky effect

    NASA Astrophysics Data System (ADS)

    Tardioli, C.; Farnocchia, D.; Rozitis, B.; Cotto-Figueroa, D.; Chesley, S. R.; Statler, T. S.; Vasile, M.

    2017-12-01

    Aims: From light curve and radar data we know the spin axis of only 43 near-Earth asteroids. In this paper we attempt to constrain the spin axis obliquity distribution of near-Earth asteroids by leveraging the Yarkovsky effect and its dependence on an asteroid's obliquity. Methods: By modeling the physical parameters driving the Yarkovsky effect, we solve an inverse problem where we test different simple parametric obliquity distributions. Each distribution results in a predicted Yarkovsky effect distribution that we compare with a χ2 test to a dataset of 125 Yarkovsky estimates. Results: We find different obliquity distributions that are statistically satisfactory. In particular, among the considered models, the best-fit solution is a quadratic function, which only depends on two parameters, favors extreme obliquities consistent with the expected outcomes from the YORP effect, has a 2:1 ratio between retrograde and direct rotators, which is in agreement with theoretical predictions, and is statistically consistent with the distribution of known spin axes of near-Earth asteroids.

  14. Hyperspectral target detection using heavy-tailed distributions

    NASA Astrophysics Data System (ADS)

    Willis, Chris J.

    2009-09-01

    One promising approach to target detection in hyperspectral imagery exploits a statistical mixture model to represent scene content at a pixel level. The process then goes on to look for pixels which are rare, when judged against the model, and marks them as anomalies. It is assumed that military targets will themselves be rare and therefore likely to be detected amongst these anomalies. For the typical assumption of multivariate Gaussianity for the mixture components, the presence of the anomalous pixels within the training data will have a deleterious effect on the quality of the model. In particular, the derivation process itself is adversely affected by the attempt to accommodate the anomalies within the mixture components. This will bias the statistics of at least some of the components away from their true values and towards the anomalies. In many cases this will result in a reduction in the detection performance and an increased false alarm rate. This paper considers the use of heavy-tailed statistical distributions within the mixture model. Such distributions are better able to account for anomalies in the training data within the tails of their distributions, and the balance of the pixels within their central masses. This means that an improved model of the majority of the pixels in the scene may be produced, ultimately leading to a better anomaly detection result. The anomaly detection techniques are examined using both synthetic data and hyperspectral imagery with injected anomalous pixels. A range of results is presented for the baseline Gaussian mixture model and for models accommodating heavy-tailed distributions, for different parameterizations of the algorithms. These include scene understanding results, anomalous pixel maps at given significance levels and Receiver Operating Characteristic curves.

  15. Statistical dielectronic recombination rates for multielectron ions in plasma

    NASA Astrophysics Data System (ADS)

    Demura, A. V.; Leont'iev, D. S.; Lisitsa, V. S.; Shurygin, V. A.

    2017-10-01

    We describe the general analytic derivation of the dielectronic recombination (DR) rate coefficient for multielectron ions in a plasma based on the statistical theory of an atom in terms of the spatial distribution of the atomic electron density. The dielectronic recombination rates for complex multielectron tungsten ions are calculated numerically in a wide range of variation of the plasma temperature, which is important for modern nuclear fusion studies. The results of statistical theory are compared with the data obtained using level-by-level codes ADPAK, FAC, HULLAC, and experimental results. We consider different statistical DR models based on the Thomas-Fermi distribution, viz., integral and differential with respect to the orbital angular momenta of the ion core and the trapped electron, as well as the Rost model, which is an analog of the Frank-Condon model as applied to atomic structures. In view of its universality and relative simplicity, the statistical approach can be used for obtaining express estimates of the dielectronic recombination rate coefficients in complex calculations of the parameters of the thermonuclear plasmas. The application of statistical methods also provides information for the dielectronic recombination rates with much smaller computer time expenditures as compared to available level-by-level codes.

  16. Recurrence and interoccurrence behavior of self-organized complex phenomena

    NASA Astrophysics Data System (ADS)

    Abaimov, S. G.; Turcotte, D. L.; Shcherbakov, R.; Rundle, J. B.

    2007-08-01

    The sandpile, forest-fire and slider-block models are said to exhibit self-organized criticality. Associated natural phenomena include landslides, wildfires, and earthquakes. In all cases the frequency-size distributions are well approximated by power laws (fractals). Another important aspect of both the models and natural phenomena is the statistics of interval times. These statistics are particularly important for earthquakes. For earthquakes it is important to make a distinction between interoccurrence and recurrence times. Interoccurrence times are the interval times between earthquakes on all faults in a region whereas recurrence times are interval times between earthquakes on a single fault or fault segment. In many, but not all cases, interoccurrence time statistics are exponential (Poissonian) and the events occur randomly. However, the distribution of recurrence times are often Weibull to a good approximation. In this paper we study the interval statistics of slip events using a slider-block model. The behavior of this model is sensitive to the stiffness α of the system, α=kC/kL where kC is the spring constant of the connector springs and kL is the spring constant of the loader plate springs. For a soft system (small α) there are no system-wide events and interoccurrence time statistics of the larger events are Poissonian. For a stiff system (large α), system-wide events dominate the energy dissipation and the statistics of the recurrence times between these system-wide events satisfy the Weibull distribution to a good approximation. We argue that this applicability of the Weibull distribution is due to the power-law (scale invariant) behavior of the hazard function, i.e. the probability that the next event will occur at a time t0 after the last event has a power-law dependence on t0. The Weibull distribution is the only distribution that has a scale invariant hazard function. We further show that the onset of system-wide events is a well defined critical point. We find that the number of system-wide events NSWE satisfies the scaling relation NSWE ∝(α-αC)δ where αC is the critical value of the stiffness. The system-wide events represent a new phase for the slider-block system.

  17. Path statistics, memory, and coarse-graining of continuous-time random walks on networks

    PubMed Central

    Kion-Crosby, Willow; Morozov, Alexandre V.

    2015-01-01

    Continuous-time random walks (CTRWs) on discrete state spaces, ranging from regular lattices to complex networks, are ubiquitous across physics, chemistry, and biology. Models with coarse-grained states (for example, those employed in studies of molecular kinetics) or spatial disorder can give rise to memory and non-exponential distributions of waiting times and first-passage statistics. However, existing methods for analyzing CTRWs on complex energy landscapes do not address these effects. Here we use statistical mechanics of the nonequilibrium path ensemble to characterize first-passage CTRWs on networks with arbitrary connectivity, energy landscape, and waiting time distributions. Our approach can be applied to calculating higher moments (beyond the mean) of path length, time, and action, as well as statistics of any conservative or non-conservative force along a path. For homogeneous networks, we derive exact relations between length and time moments, quantifying the validity of approximating a continuous-time process with its discrete-time projection. For more general models, we obtain recursion relations, reminiscent of transfer matrix and exact enumeration techniques, to efficiently calculate path statistics numerically. We have implemented our algorithm in PathMAN (Path Matrix Algorithm for Networks), a Python script that users can apply to their model of choice. We demonstrate the algorithm on a few representative examples which underscore the importance of non-exponential distributions, memory, and coarse-graining in CTRWs. PMID:26646868

  18. SGR-like behaviour of the repeating FRB 121102

    NASA Astrophysics Data System (ADS)

    Wang, F. Y.; Yu, H.

    2017-03-01

    Fast radio bursts (FRBs) are millisecond-duration radio signals occurring at cosmological distances. However the physical model of FRBs is mystery, many models have been proposed. Here we study the frequency distributions of peak flux, fluence, duration and waiting time for the repeating FRB 121102. The cumulative distributions of peak flux, fluence and duration show power-law forms. The waiting time distribution also shows power-law distribution, and is consistent with a non-stationary Poisson process. These distributions are similar as those of soft gamma repeaters (SGRs). We also use the statistical results to test the proposed models for FRBs. These distributions are consistent with the predictions from avalanche models of slowly driven nonlinear dissipative systems.

  19. Regression modeling of particle size distributions in urban storm water: advancements through improved sample collection methods

    USGS Publications Warehouse

    Fienen, Michael N.; Selbig, William R.

    2012-01-01

    A new sample collection system was developed to improve the representation of sediment entrained in urban storm water by integrating water quality samples from the entire water column. The depth-integrated sampler arm (DISA) was able to mitigate sediment stratification bias in storm water, thereby improving the characterization of suspended-sediment concentration and particle size distribution at three independent study locations. Use of the DISA decreased variability, which improved statistical regression to predict particle size distribution using surrogate environmental parameters, such as precipitation depth and intensity. The performance of this statistical modeling technique was compared to results using traditional fixed-point sampling methods and was found to perform better. When environmental parameters can be used to predict particle size distributions, environmental managers have more options when characterizing concentrations, loads, and particle size distributions in urban runoff.

  20. The Effects of Selection Strategies for Bivariate Loglinear Smoothing Models on NEAT Equating Functions

    ERIC Educational Resources Information Center

    Moses, Tim; Holland, Paul W.

    2010-01-01

    In this study, eight statistical strategies were evaluated for selecting the parameterizations of loglinear models for smoothing the bivariate test score distributions used in nonequivalent groups with anchor test (NEAT) equating. Four of the strategies were based on significance tests of chi-square statistics (Likelihood Ratio, Pearson,…

  1. The Extended Erlang-Truncated Exponential distribution: Properties and application to rainfall data.

    PubMed

    Okorie, I E; Akpanta, A C; Ohakwe, J; Chikezie, D C

    2017-06-01

    The Erlang-Truncated Exponential ETE distribution is modified and the new lifetime distribution is called the Extended Erlang-Truncated Exponential EETE distribution. Some statistical and reliability properties of the new distribution are given and the method of maximum likelihood estimate was proposed for estimating the model parameters. The usefulness and flexibility of the EETE distribution was illustrated with an uncensored data set and its fit was compared with that of the ETE and three other three-parameter distributions. Results based on the minimized log-likelihood ([Formula: see text]), Akaike information criterion (AIC), Bayesian information criterion (BIC) and the generalized Cramér-von Mises [Formula: see text] statistics shows that the EETE distribution provides a more reasonable fit than the one based on the other competing distributions.

  2. Football fever: goal distributions and non-Gaussian statistics

    NASA Astrophysics Data System (ADS)

    Bittner, E.; Nußbaumer, A.; Janke, W.; Weigel, M.

    2009-02-01

    Analyzing football score data with statistical techniques, we investigate how the not purely random, but highly co-operative nature of the game is reflected in averaged properties such as the probability distributions of scored goals for the home and away teams. As it turns out, especially the tails of the distributions are not well described by the Poissonian or binomial model resulting from the assumption of uncorrelated random events. Instead, a good effective description of the data is provided by less basic distributions such as the negative binomial one or the probability densities of extreme value statistics. To understand this behavior from a microscopical point of view, however, no waiting time problem or extremal process need be invoked. Instead, modifying the Bernoulli random process underlying the Poissonian model to include a simple component of self-affirmation seems to describe the data surprisingly well and allows to understand the observed deviation from Gaussian statistics. The phenomenological distributions used before can be understood as special cases within this framework. We analyzed historical football score data from many leagues in Europe as well as from international tournaments, including data from all past tournaments of the “FIFA World Cup” series, and found the proposed models to be applicable rather universally. In particular, here we analyze the results of the German women’s premier football league and consider the two separate German men’s premier leagues in the East and West during the cold war times as well as the unified league after 1990 to see how scoring in football and the component of self-affirmation depend on cultural and political circumstances.

  3. Estimating the mean and standard deviation of environmental data with below detection limit observations: Considering highly skewed data and model misspecification.

    PubMed

    Shoari, Niloofar; Dubé, Jean-Sébastien; Chenouri, Shoja'eddin

    2015-11-01

    In environmental studies, concentration measurements frequently fall below detection limits of measuring instruments, resulting in left-censored data. Some studies employ parametric methods such as the maximum likelihood estimator (MLE), robust regression on order statistic (rROS), and gamma regression on order statistic (GROS), while others suggest a non-parametric approach, the Kaplan-Meier method (KM). Using examples of real data from a soil characterization study in Montreal, we highlight the need for additional investigations that aim at unifying the existing literature. A number of studies have examined this issue; however, those considering data skewness and model misspecification are rare. These aspects are investigated in this paper through simulations. Among other findings, results show that for low skewed data, the performance of different statistical methods is comparable, regardless of the censoring percentage and sample size. For highly skewed data, the performance of the MLE method under lognormal and Weibull distributions is questionable; particularly, when the sample size is small or censoring percentage is high. In such conditions, MLE under gamma distribution, rROS, GROS, and KM are less sensitive to skewness. Related to model misspecification, MLE based on lognormal and Weibull distributions provides poor estimates when the true distribution of data is misspecified. However, the methods of rROS, GROS, and MLE under gamma distribution are generally robust to model misspecifications regardless of skewness, sample size, and censoring percentage. Since the characteristics of environmental data (e.g., type of distribution and skewness) are unknown a priori, we suggest using MLE based on gamma distribution, rROS and GROS. Copyright © 2015 Elsevier Ltd. All rights reserved.

  4. Evaluation of probabilistic forecasts with the scoringRules package

    NASA Astrophysics Data System (ADS)

    Jordan, Alexander; Krüger, Fabian; Lerch, Sebastian

    2017-04-01

    Over the last decades probabilistic forecasts in the form of predictive distributions have become popular in many scientific disciplines. With the proliferation of probabilistic models arises the need for decision-theoretically principled tools to evaluate the appropriateness of models and forecasts in a generalized way in order to better understand sources of prediction errors and to improve the models. Proper scoring rules are functions S(F,y) which evaluate the accuracy of a forecast distribution F , given that an outcome y was observed. In coherence with decision-theoretical principles they allow to compare alternative models, a crucial ability given the variety of theories, data sources and statistical specifications that is available in many situations. This contribution presents the software package scoringRules for the statistical programming language R, which provides functions to compute popular scoring rules such as the continuous ranked probability score for a variety of distributions F that come up in applied work. For univariate variables, two main classes are parametric distributions like normal, t, or gamma distributions, and distributions that are not known analytically, but are indirectly described through a sample of simulation draws. For example, ensemble weather forecasts take this form. The scoringRules package aims to be a convenient dictionary-like reference for computing scoring rules. We offer state of the art implementations of several known (but not routinely applied) formulas, and implement closed-form expressions that were previously unavailable. Whenever more than one implementation variant exists, we offer statistically principled default choices. Recent developments include the addition of scoring rules to evaluate multivariate forecast distributions. The use of the scoringRules package is illustrated in an example on post-processing ensemble forecasts of temperature.

  5. Modeling and forecasting the distribution of Vibrio vulnificus in Chesapeake Bay

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jacobs, John M.; Rhodes, M.; Brown, C. W.

    The aim is to construct statistical models to predict the presence, abundance and potential virulence of Vibrio vulnificus in surface waters. A variety of statistical techniques were used in concert to identify water quality parameters associated with V. vulnificus presence, abundance and virulence markers in the interest of developing strong predictive models for use in regional oceanographic modeling systems. A suite of models are provided to represent the best model fit and alternatives using environmental variables that allow them to be put to immediate use in current ecological forecasting efforts. Conclusions: Environmental parameters such as temperature, salinity and turbidity aremore » capable of accurately predicting abundance and distribution of V. vulnificus in Chesapeake Bay. Forcing these empirical models with output from ocean modeling systems allows for spatially explicit forecasts for up to 48 h in the future. This study uses one of the largest data sets compiled to model Vibrio in an estuary, enhances our understanding of environmental correlates with abundance, distribution and presence of potentially virulent strains and offers a method to forecast these pathogens that may be replicated in other regions.« less

  6. Hierarchical spatial models for predicting pygmy rabbit distribution and relative abundance

    USGS Publications Warehouse

    Wilson, T.L.; Odei, J.B.; Hooten, M.B.; Edwards, T.C.

    2010-01-01

    Conservationists routinely use species distribution models to plan conservation, restoration and development actions, while ecologists use them to infer process from pattern. These models tend to work well for common or easily observable species, but are of limited utility for rare and cryptic species. This may be because honest accounting of known observation bias and spatial autocorrelation are rarely included, thereby limiting statistical inference of resulting distribution maps. We specified and implemented a spatially explicit Bayesian hierarchical model for a cryptic mammal species (pygmy rabbit Brachylagus idahoensis). Our approach used two levels of indirect sign that are naturally hierarchical (burrows and faecal pellets) to build a model that allows for inference on regression coefficients as well as spatially explicit model parameters. We also produced maps of rabbit distribution (occupied burrows) and relative abundance (number of burrows expected to be occupied by pygmy rabbits). The model demonstrated statistically rigorous spatial prediction by including spatial autocorrelation and measurement uncertainty. We demonstrated flexibility of our modelling framework by depicting probabilistic distribution predictions using different assumptions of pygmy rabbit habitat requirements. Spatial representations of the variance of posterior predictive distributions were obtained to evaluate heterogeneity in model fit across the spatial domain. Leave-one-out cross-validation was conducted to evaluate the overall model fit. Synthesis and applications. Our method draws on the strengths of previous work, thereby bridging and extending two active areas of ecological research: species distribution models and multi-state occupancy modelling. Our framework can be extended to encompass both larger extents and other species for which direct estimation of abundance is difficult. ?? 2010 The Authors. Journal compilation ?? 2010 British Ecological Society.

  7. Erlang circular model motivated by inverse stereographic projection

    NASA Astrophysics Data System (ADS)

    Pramesti, G.

    2018-05-01

    The Erlang distribution is a special case of the Gamma distribution with the shape parameter is an integer. This paper proposed a new circular model used inverse stereographic projection. The inverse stereographic projection which is a mapping that projects a random variable from a real line onto a circle can be used in circular statistics to construct a distribution on the circle from real domain. From the circular model, then can be derived the characteristics of the Erlang circular model such as the mean resultant length, mean direction, circular variance and trigonometric moments of the distribution.

  8. Tracking Electroencephalographic Changes Using Distributions of Linear Models: Application to Propofol-Based Depth of Anesthesia Monitoring.

    PubMed

    Kuhlmann, Levin; Manton, Jonathan H; Heyse, Bjorn; Vereecke, Hugo E M; Lipping, Tarmo; Struys, Michel M R F; Liley, David T J

    2017-04-01

    Tracking brain states with electrophysiological measurements often relies on short-term averages of extracted features and this may not adequately capture the variability of brain dynamics. The objective is to assess the hypotheses that this can be overcome by tracking distributions of linear models using anesthesia data, and that anesthetic brain state tracking performance of linear models is comparable to that of a high performing depth of anesthesia monitoring feature. Individuals' brain states are classified by comparing the distribution of linear (auto-regressive moving average-ARMA) model parameters estimated from electroencephalographic (EEG) data obtained with a sliding window to distributions of linear model parameters for each brain state. The method is applied to frontal EEG data from 15 subjects undergoing propofol anesthesia and classified by the observers assessment of alertness/sedation (OAA/S) scale. Classification of the OAA/S score was performed using distributions of either ARMA parameters or the benchmark feature, Higuchi fractal dimension. The highest average testing sensitivity of 59% (chance sensitivity: 17%) was found for ARMA (2,1) models and Higuchi fractal dimension achieved 52%, however, no statistical difference was observed. For the same ARMA case, there was no statistical difference if medians are used instead of distributions (sensitivity: 56%). The model-based distribution approach is not necessarily more effective than a median/short-term average approach, however, it performs well compared with a distribution approach based on a high performing anesthesia monitoring measure. These techniques hold potential for anesthesia monitoring and may be generally applicable for tracking brain states.

  9. Descriptive Statistics for Modern Test Score Distributions: Skewness, Kurtosis, Discreteness, and Ceiling Effects.

    PubMed

    Ho, Andrew D; Yu, Carol C

    2015-06-01

    Many statistical analyses benefit from the assumption that unconditional or conditional distributions are continuous and normal. More than 50 years ago in this journal, Lord and Cook chronicled departures from normality in educational tests, and Micerri similarly showed that the normality assumption is met rarely in educational and psychological practice. In this article, the authors extend these previous analyses to state-level educational test score distributions that are an increasingly common target of high-stakes analysis and interpretation. Among 504 scale-score and raw-score distributions from state testing programs from recent years, nonnormal distributions are common and are often associated with particular state programs. The authors explain how scaling procedures from item response theory lead to nonnormal distributions as well as unusual patterns of discreteness. The authors recommend that distributional descriptive statistics be calculated routinely to inform model selection for large-scale test score data, and they illustrate consequences of nonnormality using sensitivity studies that compare baseline results to those from normalized score scales.

  10. System Analysis for the Huntsville Operation Support Center, Distributed Computer System

    NASA Technical Reports Server (NTRS)

    Ingels, F. M.; Massey, D.

    1985-01-01

    HOSC as a distributed computing system, is responsible for data acquisition and analysis during Space Shuttle operations. HOSC also provides computing services for Marshall Space Flight Center's nonmission activities. As mission and nonmission activities change, so do the support functions of HOSC change, demonstrating the need for some method of simulating activity at HOSC in various configurations. The simulation developed in this work primarily models the HYPERchannel network. The model simulates the activity of a steady state network, reporting statistics such as, transmitted bits, collision statistics, frame sequences transmitted, and average message delay. These statistics are used to evaluate such performance indicators as throughout, utilization, and delay. Thus the overall performance of the network is evaluated, as well as predicting possible overload conditions.

  11. Multi-model comparison on the effects of climate change on tree species in the eastern U.S.: results from an enhanced niche model and process-based ecosystem and landscape models

    Treesearch

    Louis R. Iverson; Frank R. Thompson; Stephen Matthews; Matthew Peters; Anantha Prasad; William D. Dijak; Jacob Fraser; Wen J. Wang; Brice Hanberry; Hong He; Maria Janowiak; Patricia Butler; Leslie Brandt; Chris Swanston

    2016-01-01

    Context. Species distribution models (SDM) establish statistical relationships between the current distribution of species and key attributes whereas process-based models simulate ecosystem and tree species dynamics based on representations of physical and biological processes. TreeAtlas, which uses DISTRIB SDM, and Linkages and LANDIS PRO, process...

  12. Model for neural signaling leap statistics

    NASA Astrophysics Data System (ADS)

    Chevrollier, Martine; Oriá, Marcos

    2011-03-01

    We present a simple model for neural signaling leaps in the brain considering only the thermodynamic (Nernst) potential in neuron cells and brain temperature. We numerically simulated connections between arbitrarily localized neurons and analyzed the frequency distribution of the distances reached. We observed qualitative change between Normal statistics (with T = 37.5°C, awaken regime) and Lévy statistics (T = 35.5°C, sleeping period), characterized by rare events of long range connections.

  13. A Quantum Shuffling Game for Teaching Statistical Mechanics

    ERIC Educational Resources Information Center

    Black, P. J.; And Others

    1971-01-01

    A game simulating an Einstein model of a crystal producing a Boltzmann distribution. Computer-made films present the results with large distributions showing heat flow and some applications to entropy. (TS)

  14. Discrete disorder models for many-body localization

    NASA Astrophysics Data System (ADS)

    Janarek, Jakub; Delande, Dominique; Zakrzewski, Jakub

    2018-04-01

    Using exact diagonalization technique, we investigate the many-body localization phenomenon in the 1D Heisenberg chain comparing several disorder models. In particular we consider a family of discrete distributions of disorder strengths and compare the results with the standard uniform distribution. Both statistical properties of energy levels and the long time nonergodic behavior are discussed. The results for different discrete distributions are essentially identical to those obtained for the continuous distribution, provided the disorder strength is rescaled by the standard deviation of the random distribution. Only for the binary distribution significant deviations are observed.

  15. Modeling the spatial distribution of landslide-prone colluvium and shallow groundwater on hillslopes of Seattle, WA

    USGS Publications Warehouse

    Schulz, W.H.; Lidke, D.J.; Godt, J.W.

    2008-01-01

    Landslides in partially saturated colluvium on Seattle, WA, hillslopes have resulted in property damage and human casualties. We developed statistical models of colluvium and shallow-groundwater distributions to aid landslide hazard assessments. The models were developed using a geographic information system, digital geologic maps, digital topography, subsurface exploration results, the groundwater flow modeling software VS2DI and regression analyses. Input to the colluvium model includes slope, distance to a hillslope-crest escarpment, and escarpment slope and height. We developed different statistical relations for thickness of colluvium on four landforms. Groundwater model input includes colluvium basal slope and distance from the Fraser aquifer. This distance was used to estimate hydraulic conductivity based on the assumption that addition of finer-grained material from down-section would result in lower conductivity. Colluvial groundwater is perched so we estimated its saturated thickness. We used VS2DI to establish relations between saturated thickness and the hydraulic conductivity and basal slope of the colluvium. We developed different statistical relations for three groundwater flow regimes. All model results were validated using observational data that were excluded from calibration. Eighty percent of colluvium thickness predictions were within 25% of observed values and 88% of saturated thickness predictions were within 20% of observed values. The models are based on conditions common to many areas, so our method can provide accurate results for similar regions; relations in our statistical models require calibration for new regions. Our results suggest that Seattle landslides occur in native deposits and colluvium, ultimately in response to surface-water erosion of hillstope toes. Regional groundwater conditions do not appear to strongly affect the general distribution of Seattle landslides; historical landslides were equally dispersed within and outside of the area potentially affected by regional groundwater conditions.

  16. Species distribution models for a migratory bird based on citizen science and satellite tracking data

    USGS Publications Warehouse

    Coxen, Christopher L.; Frey, Jennifer K.; Carleton, Scott A.; Collins, Daniel P.

    2017-01-01

    Species distribution models can provide critical baseline distribution information for the conservation of poorly understood species. Here, we compared the performance of band-tailed pigeon (Patagioenas fasciata) species distribution models created using Maxent and derived from two separate presence-only occurrence data sources in New Mexico: 1) satellite tracked birds and 2) observations reported in eBird basic data set. Both models had good accuracy (test AUC > 0.8 and True Skill Statistic > 0.4), and high overlap between suitability scores (I statistic 0.786) and suitable habitat patches (relative rank 0.639). Our results suggest that, at the state-wide level, eBird occurrence data can effectively model similar species distributions as satellite tracking data. Climate change models for the band-tailed pigeon predict a 35% loss in area of suitable climate by 2070 if CO2 emissions drop to 1990 levels by 2100, and a 45% loss by 2070 if we continue current CO2 emission levels through the end of the century. These numbers may be conservative given the predicted increase in drought, wildfire, and forest pest impacts to the coniferous forests the species inhabits in New Mexico. The northern portion of the species’ range in New Mexico is predicted to be the most viable through time.

  17. Modeling Psychological Contract Violation using Dual Regime Models: An Event-based Approach.

    PubMed

    Hofmans, Joeri

    2017-01-01

    A good understanding of the dynamics of psychological contract violation requires theories, research methods and statistical models that explicitly recognize that violation feelings follow from an event that violates one's acceptance limits, after which interpretative processes are set into motion, determining the intensity of these violation feelings. Whereas theories-in the form of the dynamic model of the psychological contract-and research methods-in the form of daily diary research and experience sampling research-are available by now, the statistical tools to model such a two-stage process are still lacking. The aim of the present paper is to fill this gap in the literature by introducing two statistical models-the Zero-Inflated model and the Hurdle model-that closely mimic the theoretical process underlying the elicitation violation feelings via two model components: a binary distribution that models whether violation has occurred or not, and a count distribution that models how severe the negative impact is. Moreover, covariates can be included for both model components separately, which yields insight into their unique and shared antecedents. By doing this, the present paper offers a methodological-substantive synergy, showing how sophisticated methodology can be used to examine an important substantive issue.

  18. Statistical procedures for analyzing mental health services data.

    PubMed

    Elhai, Jon D; Calhoun, Patrick S; Ford, Julian D

    2008-08-15

    In mental health services research, analyzing service utilization data often poses serious problems, given the presence of substantially skewed data distributions. This article presents a non-technical introduction to statistical methods specifically designed to handle the complexly distributed datasets that represent mental health service use, including Poisson, negative binomial, zero-inflated, and zero-truncated regression models. A flowchart is provided to assist the investigator in selecting the most appropriate method. Finally, a dataset of mental health service use reported by medical patients is described, and a comparison of results across several different statistical methods is presented. Implications of matching data analytic techniques appropriately with the often complexly distributed datasets of mental health services utilization variables are discussed.

  19. Human dynamics scaling characteristics for aerial inbound logistics operation

    NASA Astrophysics Data System (ADS)

    Wang, Qing; Guo, Jin-Li

    2010-05-01

    In recent years, the study of power-law scaling characteristics of real-life networks has attracted much interest from scholars; it deviates from the Poisson process. In this paper, we take the whole process of aerial inbound operation in a logistics company as the empirical object. The main aim of this work is to study the statistical scaling characteristics of the task-restricted work patterns. We found that the statistical variables have the scaling characteristics of unimodal distribution with a power-law tail in five statistical distributions - that is to say, there obviously exists a peak in each distribution, the shape of the left part closes to a Poisson distribution, and the right part has a heavy-tailed scaling statistics. Furthermore, to our surprise, there is only one distribution where the right parts can be approximated by the power-law form with exponent α=1.50. Others are bigger than 1.50 (three of four are about 2.50, one of four is about 3.00). We then obtain two inferences based on these empirical results: first, the human behaviors probably both close to the Poisson statistics and power-law distributions on certain levels, and the human-computer interaction behaviors may be the most common in the logistics operational areas, even in the whole task-restricted work pattern areas. Second, the hypothesis in Vázquez et al. (2006) [A. Vázquez, J. G. Oliveira, Z. Dezsö, K.-I. Goh, I. Kondor, A.-L. Barabási. Modeling burst and heavy tails in human dynamics, Phys. Rev. E 73 (2006) 036127] is probably not sufficient; it claimed that human dynamics can be classified as two discrete university classes. There may be a new human dynamics mechanism that is different from the classical Barabási models.

  20. Heavy tailed bacterial motor switching statistics define macroscopic transport properties during upstream contamination by E. coli

    NASA Astrophysics Data System (ADS)

    Figueroa-Morales, N.; Rivera, A.; Altshuler, E.; Darnige, T.; Douarche, C.; Soto, R.; Lindner, A.; Clément, E.

    The motility of E. Coli bacteria is described as a run and tumble process. Changes of direction correspond to a switch in the flagellar motor rotation. The run time distribution is described as an exponential decay of characteristic time close to 1s. Remarkably, it has been demonstrated that the generic response for the distribution of run times is not exponential, but a heavy tailed power law decay, which is at odds with the motility findings. We investigate the consequences of the motor statistics in the macroscopic bacterial transport. During upstream contamination processes in very confined channels, we have identified very long contamination tongues. Using a stochastic model considering bacterial dwelling times on the surfaces related to the run times, we are able to reproduce qualitatively and quantitatively the evolution of the contamination profiles when considering the power law run time distribution. However, the model fails to reproduce the qualitative dynamics when the classical exponential run and tumble distribution is considered. Moreover, we have corroborated the existence of a power law run time distribution by means of 3D Lagrangian tracking. We then argue that the macroscopic transport of bacteria is essentially determined by the motor rotation statistics.

  1. Statistical self-similarity of hotspot seamount volumes modeled as self-similar criticality

    USGS Publications Warehouse

    Tebbens, S.F.; Burroughs, S.M.; Barton, C.C.; Naar, D.F.

    2001-01-01

    The processes responsible for hotspot seamount formation are complex, yet the cumulative frequency-volume distribution of hotspot seamounts in the Easter Island/Salas y Gomez Chain (ESC) is found to be well-described by an upper-truncated power law. We develop a model for hotspot seamount formation where uniform energy input produces events initiated on a self-similar distribution of critical cells. We call this model Self-Similar Criticality (SSC). By allowing the spatial distribution of magma migration to be self-similar, the SSC model recreates the observed ESC seamount volume distribution. The SSC model may have broad applicability to other natural systems.

  2. Statistical self-similarity of width function maxima with implications to floods

    USGS Publications Warehouse

    Veitzer, S.A.; Gupta, V.K.

    2001-01-01

    Recently a new theory of random self-similar river networks, called the RSN model, was introduced to explain empirical observations regarding the scaling properties of distributions of various topologic and geometric variables in natural basins. The RSN model predicts that such variables exhibit statistical simple scaling, when indexed by Horton-Strahler order. The average side tributary structure of RSN networks also exhibits Tokunaga-type self-similarity which is widely observed in nature. We examine the scaling structure of distributions of the maximum of the width function for RSNs for nested, complete Strahler basins by performing ensemble simulations. The maximum of the width function exhibits distributional simple scaling, when indexed by Horton-Strahler order, for both RSNs and natural river networks extracted from digital elevation models (DEMs). We also test a powerlaw relationship between Horton ratios for the maximum of the width function and drainage areas. These results represent first steps in formulating a comprehensive physical statistical theory of floods at multiple space-time scales for RSNs as discrete hierarchical branching structures. ?? 2001 Published by Elsevier Science Ltd.

  3. Evaluation of the statistical evidence for Characteristic Earthquakes in the frequency-magnitude distributions of Sumatra and other subduction zone regions

    NASA Astrophysics Data System (ADS)

    Naylor, M.; Main, I. G.; Greenhough, J.; Bell, A. F.; McCloskey, J.

    2009-04-01

    The Sumatran Boxing Day earthquake and subsequent large events provide an opportunity to re-evaluate the statistical evidence for characteristic earthquake events in frequency-magnitude distributions. Our aims are to (i) improve intuition regarding the properties of samples drawn from power laws, (ii) illustrate using random samples how appropriate Poisson confidence intervals can both aid the eye and provide an appropriate statistical evaluation of data drawn from power-law distributions, and (iii) apply these confidence intervals to test for evidence of characteristic earthquakes in subduction-zone frequency-magnitude distributions. We find no need for a characteristic model to describe frequency magnitude distributions in any of the investigated subduction zones, including Sumatra, due to an emergent skew in residuals of power law count data at high magnitudes combined with a sample bias for examining large earthquakes as candidate characteristic events.

  4. Statistical distribution of building lot frontage: application for Tokyo downtown districts

    NASA Astrophysics Data System (ADS)

    Usui, Hiroyuki

    2018-03-01

    The frontage of a building lot is the determinant factor of the residential environment. The statistical distribution of building lot frontages shows how the perimeters of urban blocks are shared by building lots for a given density of buildings and roads. For practitioners in urban planning, this is indispensable to identify potential districts which comprise a high percentage of building lots with narrow frontage after subdivision and to reconsider the appropriate criteria for the density of buildings and roads as residential environment indices. In the literature, however, the statistical distribution of building lot frontages and the density of buildings and roads has not been fully researched. In this paper, based on the empirical study in the downtown districts of Tokyo, it is found that (1) a log-normal distribution fits the observed distribution of building lot frontages better than a gamma distribution, which is the model of the size distribution of Poisson Voronoi cells on closed curves; (2) the statistical distribution of building lot frontages statistically follows a log-normal distribution, whose parameters are the gross building density, road density, average road width, the coefficient of variation of building lot frontage, and the ratio of the number of building lot frontages to the number of buildings; and (3) the values of the coefficient of variation of building lot frontages, and that of the ratio of the number of building lot frontages to that of buildings are approximately equal to 0.60 and 1.19, respectively.

  5. Statistically advanced, self-similar, radial probability density functions of atmospheric and under-expanded hydrogen jets

    NASA Astrophysics Data System (ADS)

    Ruggles, Adam J.

    2015-11-01

    This paper presents improved statistical insight regarding the self-similar scalar mixing process of atmospheric hydrogen jets and the downstream region of under-expanded hydrogen jets. Quantitative planar laser Rayleigh scattering imaging is used to probe both jets. The self-similarity of statistical moments up to the sixth order (beyond the literature established second order) is documented in both cases. This is achieved using a novel self-similar normalization method that facilitated a degree of statistical convergence that is typically limited to continuous, point-based measurements. This demonstrates that image-based measurements of a limited number of samples can be used for self-similar scalar mixing studies. Both jets exhibit the same radial trends of these moments demonstrating that advanced atmospheric self-similarity can be applied in the analysis of under-expanded jets. Self-similar histograms away from the centerline are shown to be the combination of two distributions. The first is attributed to turbulent mixing. The second, a symmetric Poisson-type distribution centered on zero mass fraction, progressively becomes the dominant and eventually sole distribution at the edge of the jet. This distribution is attributed to shot noise-affected pure air measurements, rather than a diffusive superlayer at the jet boundary. This conclusion is reached after a rigorous measurement uncertainty analysis and inspection of pure air data collected with each hydrogen data set. A threshold based upon the measurement noise analysis is used to separate the turbulent and pure air data, and thusly estimate intermittency. Beta-distributions (four parameters) are used to accurately represent the turbulent distribution moments. This combination of measured intermittency and four-parameter beta-distributions constitutes a new, simple approach to model scalar mixing. Comparisons between global moments from the data and moments calculated using the proposed model show excellent agreement. This was attributed to the high quality of the measurements which reduced the width of the correctly identified, noise-affected pure air distribution, with respect to the turbulent mixing distribution. The ignitability of the atmospheric jet is determined using the flammability factor calculated from both kernel density estimated (KDE) PDFs and PDFs generated using the newly proposed model. Agreement between contours from both approaches is excellent. Ignitability of the under-expanded jet is also calculated using KDE PDFs. Contours are compared with those calculated by applying the atmospheric model to the under-expanded jet. Once again, agreement is excellent. This work demonstrates that self-similar scalar mixing statistics and ignitability of atmospheric jets can be accurately described by the proposed model. This description can be applied with confidence to under-expanded jets, which are more realistic of leak and fuel injection scenarios.

  6. Extreme geomagnetic storms: Probabilistic forecasts and their uncertainties

    USGS Publications Warehouse

    Riley, Pete; Love, Jeffrey J.

    2017-01-01

    Extreme space weather events are low-frequency, high-risk phenomena. Estimating their rates of occurrence, as well as their associated uncertainties, is difficult. In this study, we derive statistical estimates and uncertainties for the occurrence rate of an extreme geomagnetic storm on the scale of the Carrington event (or worse) occurring within the next decade. We model the distribution of events as either a power law or lognormal distribution and use (1) Kolmogorov-Smirnov statistic to estimate goodness of fit, (2) bootstrapping to quantify the uncertainty in the estimates, and (3) likelihood ratio tests to assess whether one distribution is preferred over another. Our best estimate for the probability of another extreme geomagnetic event comparable to the Carrington event occurring within the next 10 years is 10.3% 95%  confidence interval (CI) [0.9,18.7] for a power law distribution but only 3.0% 95% CI [0.6,9.0] for a lognormal distribution. However, our results depend crucially on (1) how we define an extreme event, (2) the statistical model used to describe how the events are distributed in intensity, (3) the techniques used to infer the model parameters, and (4) the data and duration used for the analysis. We test a major assumption that the data represent time stationary processes and discuss the implications. If the current trends persist, suggesting that we are entering a period of lower activity, our forecasts may represent upper limits rather than best estimates.

  7. Bayesian analysis of the kinetics of quantal transmitter secretion at the neuromuscular junction.

    PubMed

    Saveliev, Anatoly; Khuzakhmetova, Venera; Samigullin, Dmitry; Skorinkin, Andrey; Kovyazina, Irina; Nikolsky, Eugeny; Bukharaeva, Ellya

    2015-10-01

    The timing of transmitter release from nerve endings is considered nowadays as one of the factors determining the plasticity and efficacy of synaptic transmission. In the neuromuscular junction, the moments of release of individual acetylcholine quanta are related to the synaptic delays of uniquantal endplate currents recorded under conditions of lowered extracellular calcium. Using Bayesian modelling, we performed a statistical analysis of synaptic delays in mouse neuromuscular junction with different patterns of rhythmic nerve stimulation and when the entry of calcium ions into the nerve terminal was modified. We have obtained a statistical model of the release timing which is represented as the summation of two independent statistical distributions. The first of these is the exponentially modified Gaussian distribution. The mixture of normal and exponential components in this distribution can be interpreted as a two-stage mechanism of early and late periods of phasic synchronous secretion. The parameters of this distribution depend on both the stimulation frequency of the motor nerve and the calcium ions' entry conditions. The second distribution was modelled as quasi-uniform, with parameters independent of nerve stimulation frequency and calcium entry. Two different probability density functions for the distribution of synaptic delays suggest at least two independent processes controlling the time course of secretion, one of them potentially involving two stages. The relative contribution of these processes to the total number of mediator quanta released depends differently on the motor nerve stimulation pattern and on calcium ion entry into nerve endings.

  8. Atom counting in HAADF STEM using a statistical model-based approach: methodology, possibilities, and inherent limitations.

    PubMed

    De Backer, A; Martinez, G T; Rosenauer, A; Van Aert, S

    2013-11-01

    In the present paper, a statistical model-based method to count the number of atoms of monotype crystalline nanostructures from high resolution high-angle annular dark-field (HAADF) scanning transmission electron microscopy (STEM) images is discussed in detail together with a thorough study on the possibilities and inherent limitations. In order to count the number of atoms, it is assumed that the total scattered intensity scales with the number of atoms per atom column. These intensities are quantitatively determined using model-based statistical parameter estimation theory. The distribution describing the probability that intensity values are generated by atomic columns containing a specific number of atoms is inferred on the basis of the experimental scattered intensities. Finally, the number of atoms per atom column is quantified using this estimated probability distribution. The number of atom columns available in the observed STEM image, the number of components in the estimated probability distribution, the width of the components of the probability distribution, and the typical shape of a criterion to assess the number of components in the probability distribution directly affect the accuracy and precision with which the number of atoms in a particular atom column can be estimated. It is shown that single atom sensitivity is feasible taking the latter aspects into consideration. © 2013 Elsevier B.V. All rights reserved.

  9. New scaling model for variables and increments with heavy-tailed distributions

    NASA Astrophysics Data System (ADS)

    Riva, Monica; Neuman, Shlomo P.; Guadagnini, Alberto

    2015-06-01

    Many hydrological (as well as diverse earth, environmental, ecological, biological, physical, social, financial and other) variables, Y, exhibit frequency distributions that are difficult to reconcile with those of their spatial or temporal increments, ΔY. Whereas distributions of Y (or its logarithm) are at times slightly asymmetric with relatively mild peaks and tails, those of ΔY tend to be symmetric with peaks that grow sharper, and tails that become heavier, as the separation distance (lag) between pairs of Y values decreases. No statistical model known to us captures these behaviors of Y and ΔY in a unified and consistent manner. We propose a new, generalized sub-Gaussian model that does so. We derive analytical expressions for probability distribution functions (pdfs) of Y and ΔY as well as corresponding lead statistical moments. In our model the peak and tails of the ΔY pdf scale with lag in line with observed behavior. The model allows one to estimate, accurately and efficiently, all relevant parameters by analyzing jointly sample moments of Y and ΔY. We illustrate key features of our new model and method of inference on synthetically generated samples and neutron porosity data from a deep borehole.

  10. A probabilistic mechanical model for prediction of aggregates’ size distribution effect on concrete compressive strength

    NASA Astrophysics Data System (ADS)

    Miled, Karim; Limam, Oualid; Sab, Karam

    2012-06-01

    To predict aggregates' size distribution effect on the concrete compressive strength, a probabilistic mechanical model is proposed. Within this model, a Voronoi tessellation of a set of non-overlapping and rigid spherical aggregates is used to describe the concrete microstructure. Moreover, aggregates' diameters are defined as statistical variables and their size distribution function is identified to the experimental sieve curve. Then, an inter-aggregate failure criterion is proposed to describe the compressive-shear crushing of the hardened cement paste when concrete is subjected to uniaxial compression. Using a homogenization approach based on statistical homogenization and on geometrical simplifications, an analytical formula predicting the concrete compressive strength is obtained. This formula highlights the effects of cement paste strength and aggregates' size distribution and volume fraction on the concrete compressive strength. According to the proposed model, increasing the concrete strength for the same cement paste and the same aggregates' volume fraction is obtained by decreasing both aggregates' maximum size and the percentage of coarse aggregates. Finally, the validity of the model has been discussed through a comparison with experimental results (15 concrete compressive strengths ranging between 46 and 106 MPa) taken from literature and showing a good agreement with the model predictions.

  11. Discrete statistical model of fatigue crack growth in a Ni-base superalloy, capable of life prediction

    NASA Astrophysics Data System (ADS)

    Boyd-Lee, Ashley; King, Julia

    1992-07-01

    A discrete statistical model of fatigue crack growth in a nickel base superalloy Waspaloy, which is quantitative from the start of the short crack regime to failure, is presented. Instantaneous crack growth rate distributions and persistence of arrest distributions are used to compute fatigue lives and worst case scenarios without extrapolation. The basis of the model is non-material specific, it provides an improved method of analyzing crack growth rate data. For Waspaloy, the model shows the importance of good bulk fatigue crack growth resistance to resist early short fatigue crack growth and the importance of maximizing crack arrest both by the presence of a proportion of small grains and by maximizing grain boundary corrugation.

  12. A fragmentation model of earthquake-like behavior in internet access activity

    NASA Astrophysics Data System (ADS)

    Paguirigan, Antonino A.; Angco, Marc Jordan G.; Bantang, Johnrob Y.

    We present a fragmentation model that generates almost any inverse power-law size distribution, including dual-scaled versions, consistent with the underlying dynamics of systems with earthquake-like behavior. We apply the model to explain the dual-scaled power-law statistics observed in an Internet access dataset that covers more than 32 million requests. The non-Poissonian statistics of the requested data sizes m and the amount of time τ needed for complete processing are consistent with the Gutenberg-Richter-law. Inter-event times δt between subsequent requests are also shown to exhibit power-law distributions consistent with the generalized Omori law. Thus, the dataset is similar to the earthquake data except that two power-law regimes are observed. Using the proposed model, we are able to identify underlying dynamics responsible in generating the observed dual power-law distributions. The model is universal enough for its applicability to any physical and human dynamics that is limited by finite resources such as space, energy, time or opportunity.

  13. The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded.

    PubMed

    Nakagawa, Shinichi; Johnson, Paul C D; Schielzeth, Holger

    2017-09-01

    The coefficient of determination R 2 quantifies the proportion of variance explained by a statistical model and is an important summary statistic of biological interest. However, estimating R 2 for generalized linear mixed models (GLMMs) remains challenging. We have previously introduced a version of R 2 that we called [Formula: see text] for Poisson and binomial GLMMs, but not for other distributional families. Similarly, we earlier discussed how to estimate intra-class correlation coefficients (ICCs) using Poisson and binomial GLMMs. In this paper, we generalize our methods to all other non-Gaussian distributions, in particular to negative binomial and gamma distributions that are commonly used for modelling biological data. While expanding our approach, we highlight two useful concepts for biologists, Jensen's inequality and the delta method, both of which help us in understanding the properties of GLMMs. Jensen's inequality has important implications for biologically meaningful interpretation of GLMMs, whereas the delta method allows a general derivation of variance associated with non-Gaussian distributions. We also discuss some special considerations for binomial GLMMs with binary or proportion data. We illustrate the implementation of our extension by worked examples from the field of ecology and evolution in the R environment. However, our method can be used across disciplines and regardless of statistical environments. © 2017 The Author(s).

  14. Methods of Information Geometry to model complex shapes

    NASA Astrophysics Data System (ADS)

    De Sanctis, A.; Gattone, S. A.

    2016-09-01

    In this paper, a new statistical method to model patterns emerging in complex systems is proposed. A framework for shape analysis of 2- dimensional landmark data is introduced, in which each landmark is represented by a bivariate Gaussian distribution. From Information Geometry we know that Fisher-Rao metric endows the statistical manifold of parameters of a family of probability distributions with a Riemannian metric. Thus this approach allows to reconstruct the intermediate steps in the evolution between observed shapes by computing the geodesic, with respect to the Fisher-Rao metric, between the corresponding distributions. Furthermore, the geodesic path can be used for shape predictions. As application, we study the evolution of the rat skull shape. A future application in Ophthalmology is introduced.

  15. Exploring Explanations of Subglacial Bedform Sizes Using Statistical Models

    PubMed Central

    Kougioumtzoglou, Ioannis A.; Stokes, Chris R.; Smith, Michael J.; Clark, Chris D.; Spagnolo, Matteo S.

    2016-01-01

    Sediments beneath modern ice sheets exert a key control on their flow, but are largely inaccessible except through geophysics or boreholes. In contrast, palaeo-ice sheet beds are accessible, and typically characterised by numerous bedforms. However, the interaction between bedforms and ice flow is poorly constrained and it is not clear how bedform sizes might reflect ice flow conditions. To better understand this link we present a first exploration of a variety of statistical models to explain the size distribution of some common subglacial bedforms (i.e., drumlins, ribbed moraine, MSGL). By considering a range of models, constructed to reflect key aspects of the physical processes, it is possible to infer that the size distributions are most effectively explained when the dynamics of ice-water-sediment interaction associated with bedform growth is fundamentally random. A ‘stochastic instability’ (SI) model, which integrates random bedform growth and shrinking through time with exponential growth, is preferred and is consistent with other observations of palaeo-bedforms and geophysical surveys of active ice sheets. Furthermore, we give a proof-of-concept demonstration that our statistical approach can bridge the gap between geomorphological observations and physical models, directly linking measurable size-frequency parameters to properties of ice sheet flow (e.g., ice velocity). Moreover, statistically developing existing models as proposed allows quantitative predictions to be made about sizes, making the models testable; a first illustration of this is given for a hypothesised repeat geophysical survey of bedforms under active ice. Thus, we further demonstrate the potential of size-frequency distributions of subglacial bedforms to assist the elucidation of subglacial processes and better constrain ice sheet models. PMID:27458921

  16. Modeling Psychological Contract Violation using Dual Regime Models: An Event-based Approach

    PubMed Central

    Hofmans, Joeri

    2017-01-01

    A good understanding of the dynamics of psychological contract violation requires theories, research methods and statistical models that explicitly recognize that violation feelings follow from an event that violates one's acceptance limits, after which interpretative processes are set into motion, determining the intensity of these violation feelings. Whereas theories—in the form of the dynamic model of the psychological contract—and research methods—in the form of daily diary research and experience sampling research—are available by now, the statistical tools to model such a two-stage process are still lacking. The aim of the present paper is to fill this gap in the literature by introducing two statistical models—the Zero-Inflated model and the Hurdle model—that closely mimic the theoretical process underlying the elicitation violation feelings via two model components: a binary distribution that models whether violation has occurred or not, and a count distribution that models how severe the negative impact is. Moreover, covariates can be included for both model components separately, which yields insight into their unique and shared antecedents. By doing this, the present paper offers a methodological-substantive synergy, showing how sophisticated methodology can be used to examine an important substantive issue. PMID:29163316

  17. Level crossings and excess times due to a superposition of uncorrelated exponential pulses

    NASA Astrophysics Data System (ADS)

    Theodorsen, A.; Garcia, O. E.

    2018-01-01

    A well-known stochastic model for intermittent fluctuations in physical systems is investigated. The model is given by a superposition of uncorrelated exponential pulses, and the degree of pulse overlap is interpreted as an intermittency parameter. Expressions for excess time statistics, that is, the rate of level crossings above a given threshold and the average time spent above the threshold, are derived from the joint distribution of the process and its derivative. Limits of both high and low intermittency are investigated and compared to previously known results. In the case of a strongly intermittent process, the distribution of times spent above threshold is obtained analytically. This expression is verified numerically, and the distribution of times above threshold is explored for other intermittency regimes. The numerical simulations compare favorably to known results for the distribution of times above the mean threshold for an Ornstein-Uhlenbeck process. This contribution generalizes the excess time statistics for the stochastic model, which find applications in a wide diversity of natural and technological systems.

  18. [The reentrant binomial model of nuclear anomalies growth in rhabdomyosarcoma RA-23 cell populations under increasing doze of rare ionizing radiation].

    PubMed

    Alekseeva, N P; Alekseev, A O; Vakhtin, Iu B; Kravtsov, V Iu; Kuzovatov, S N; Skorikova, T I

    2008-01-01

    Distributions of nuclear morphology anomalies in transplantable rabdomiosarcoma RA-23 cell populations were investigated under effect of ionizing radiation from 0 to 45 Gy. Internuclear bridges, nuclear protrusions and dumbbell-shaped nuclei were accepted for morphological anomalies. Empirical distributions of the number of anomalies per 100 nuclei were used. The adequate model of reentrant binomial distribution has been found. The sum of binomial random variables with binomial number of summands has such distribution. Averages of these random variables were named, accordingly, internal and external average reentrant components. Their maximum likelihood estimations were received. Statistical properties of these estimations were investigated by means of statistical modeling. It has been received that at equally significant correlation between the radiation dose and the average of nuclear anomalies in cell populations after two-three cellular cycles from the moment of irradiation in vivo the irradiation doze significantly correlates with internal average reentrant component, and in remote descendants of cell transplants irradiated in vitro - with external one.

  19. Statistical model for forecasting monthly large wildfire events in western United States

    Treesearch

    Haiganoush K. Preisler; Anthony L. Westerling

    2006-01-01

    The ability to forecast the number and location of large wildfire events (with specified confidence bounds) is important to fire managers attempting to allocate and distribute suppression efforts during severe fire seasons. This paper describes the development of a statistical model for assessing the forecasting skills of fire-danger predictors and producing 1-month-...

  20. Poisson-process generalization for the trading waiting-time distribution in a double-auction mechanism

    NASA Astrophysics Data System (ADS)

    Cincotti, Silvano; Ponta, Linda; Raberto, Marco; Scalas, Enrico

    2005-05-01

    In this paper, empirical analyses and computational experiments are presented on high-frequency data for a double-auction (book) market. Main objective of the paper is to generalize the order waiting time process in order to properly model such empirical evidences. The empirical study is performed on the best bid and best ask data of 7 U.S. financial markets, for 30-stock time series. In particular, statistical properties of trading waiting times have been analyzed and quality of fits is evaluated by suitable statistical tests, i.e., comparing empirical distributions with theoretical models. Starting from the statistical studies on real data, attention has been focused on the reproducibility of such results in an artificial market. The computational experiments have been performed within the Genoa Artificial Stock Market. In the market model, heterogeneous agents trade one risky asset in exchange for cash. Agents have zero intelligence and issue random limit or market orders depending on their budget constraints. The price is cleared by means of a limit order book. The order generation is modelled with a renewal process. Based on empirical trading estimation, the distribution of waiting times between two consecutive orders is modelled by a mixture of exponential processes. Results show that the empirical waiting-time distribution can be considered as a generalization of a Poisson process. Moreover, the renewal process can approximate real data and implementation on the artificial stocks market can reproduce the trading activity in a realistic way.

  1. Probabilistic performance estimators for computational chemistry methods: The empirical cumulative distribution function of absolute errors

    NASA Astrophysics Data System (ADS)

    Pernot, Pascal; Savin, Andreas

    2018-06-01

    Benchmarking studies in computational chemistry use reference datasets to assess the accuracy of a method through error statistics. The commonly used error statistics, such as the mean signed and mean unsigned errors, do not inform end-users on the expected amplitude of prediction errors attached to these methods. We show that, the distributions of model errors being neither normal nor zero-centered, these error statistics cannot be used to infer prediction error probabilities. To overcome this limitation, we advocate for the use of more informative statistics, based on the empirical cumulative distribution function of unsigned errors, namely, (1) the probability for a new calculation to have an absolute error below a chosen threshold and (2) the maximal amplitude of errors one can expect with a chosen high confidence level. Those statistics are also shown to be well suited for benchmarking and ranking studies. Moreover, the standard error on all benchmarking statistics depends on the size of the reference dataset. Systematic publication of these standard errors would be very helpful to assess the statistical reliability of benchmarking conclusions.

  2. Co-occurrence statistics as a language-dependent cue for speech segmentation.

    PubMed

    Saksida, Amanda; Langus, Alan; Nespor, Marina

    2017-05-01

    To what extent can language acquisition be explained in terms of different associative learning mechanisms? It has been hypothesized that distributional regularities in spoken languages are strong enough to elicit statistical learning about dependencies among speech units. Distributional regularities could be a useful cue for word learning even without rich language-specific knowledge. However, it is not clear how strong and reliable the distributional cues are that humans might use to segment speech. We investigate cross-linguistic viability of different statistical learning strategies by analyzing child-directed speech corpora from nine languages and by modeling possible statistics-based speech segmentations. We show that languages vary as to which statistical segmentation strategies are most successful. The variability of the results can be partially explained by systematic differences between languages, such as rhythmical differences. The results confirm previous findings that different statistical learning strategies are successful in different languages and suggest that infants may have to primarily rely on non-statistical cues when they begin their process of speech segmentation. © 2016 John Wiley & Sons Ltd.

  3. ProbOnto: ontology and knowledge base of probability distributions.

    PubMed

    Swat, Maciej J; Grenon, Pierre; Wimalaratne, Sarala

    2016-09-01

    Probability distributions play a central role in mathematical and statistical modelling. The encoding, annotation and exchange of such models could be greatly simplified by a resource providing a common reference for the definition of probability distributions. Although some resources exist, no suitably detailed and complex ontology exists nor any database allowing programmatic access. ProbOnto, is an ontology-based knowledge base of probability distributions, featuring more than 80 uni- and multivariate distributions with their defining functions, characteristics, relationships and re-parameterization formulas. It can be used for model annotation and facilitates the encoding of distribution-based models, related functions and quantities. http://probonto.org mjswat@ebi.ac.uk Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  4. SGR-like behaviour of the repeating FRB 121102

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, F.Y.; Yu, H., E-mail: fayinwang@nju.edu.cn, E-mail: yuhai@smail.nju.edu.cn

    2017-03-01

    Fast radio bursts (FRBs) are millisecond-duration radio signals occurring at cosmological distances. However the physical model of FRBs is mystery, many models have been proposed. Here we study the frequency distributions of peak flux, fluence, duration and waiting time for the repeating FRB 121102. The cumulative distributions of peak flux, fluence and duration show power-law forms. The waiting time distribution also shows power-law distribution, and is consistent with a non-stationary Poisson process. These distributions are similar as those of soft gamma repeaters (SGRs). We also use the statistical results to test the proposed models for FRBs. These distributions are consistentmore » with the predictions from avalanche models of slowly driven nonlinear dissipative systems.« less

  5. A Monte Carlo Approach to Unidimensionality Testing in Polytomous Rasch Models

    ERIC Educational Resources Information Center

    Christensen, Karl Bang; Kreiner, Svend

    2007-01-01

    Many statistical tests are designed to test the different assumptions of the Rasch model, but only few are directed at detecting multidimensionality. The Martin-Lof test is an attractive approach, the disadvantage being that its null distribution deviates strongly from the asymptotic chi-square distribution for most realistic sample sizes. A Monte…

  6. Statistical analysis of water-quality data containing multiple detection limits II: S-language software for nonparametric distribution modeling and hypothesis testing

    USGS Publications Warehouse

    Lee, L.; Helsel, D.

    2007-01-01

    Analysis of low concentrations of trace contaminants in environmental media often results in left-censored data that are below some limit of analytical precision. Interpretation of values becomes complicated when there are multiple detection limits in the data-perhaps as a result of changing analytical precision over time. Parametric and semi-parametric methods, such as maximum likelihood estimation and robust regression on order statistics, can be employed to model distributions of multiply censored data and provide estimates of summary statistics. However, these methods are based on assumptions about the underlying distribution of data. Nonparametric methods provide an alternative that does not require such assumptions. A standard nonparametric method for estimating summary statistics of multiply-censored data is the Kaplan-Meier (K-M) method. This method has seen widespread usage in the medical sciences within a general framework termed "survival analysis" where it is employed with right-censored time-to-failure data. However, K-M methods are equally valid for the left-censored data common in the geosciences. Our S-language software provides an analytical framework based on K-M methods that is tailored to the needs of the earth and environmental sciences community. This includes routines for the generation of empirical cumulative distribution functions, prediction or exceedance probabilities, and related confidence limits computation. Additionally, our software contains K-M-based routines for nonparametric hypothesis testing among an unlimited number of grouping variables. A primary characteristic of K-M methods is that they do not perform extrapolation and interpolation. Thus, these routines cannot be used to model statistics beyond the observed data range or when linear interpolation is desired. For such applications, the aforementioned parametric and semi-parametric methods must be used.

  7. Bug Distribution and Statistical Pattern Classification.

    ERIC Educational Resources Information Center

    Tatsuoka, Kikumi K.; Tatsuoka, Maurice M.

    1987-01-01

    The rule space model permits measurement of cognitive skill acquisition and error diagnosis. Further discussion introduces Bayesian hypothesis testing and bug distribution. An illustration involves an artificial intelligence approach to testing fractions and arithmetic. (Author/GDC)

  8. Meta-analysis of diagnostic test data: a bivariate Bayesian modeling approach.

    PubMed

    Verde, Pablo E

    2010-12-30

    In the last decades, the amount of published results on clinical diagnostic tests has expanded very rapidly. The counterpart to this development has been the formal evaluation and synthesis of diagnostic results. However, published results present substantial heterogeneity and they can be regarded as so far removed from the classical domain of meta-analysis, that they can provide a rather severe test of classical statistical methods. Recently, bivariate random effects meta-analytic methods, which model the pairs of sensitivities and specificities, have been presented from the classical point of view. In this work a bivariate Bayesian modeling approach is presented. This approach substantially extends the scope of classical bivariate methods by allowing the structural distribution of the random effects to depend on multiple sources of variability. Meta-analysis is summarized by the predictive posterior distributions for sensitivity and specificity. This new approach allows, also, to perform substantial model checking, model diagnostic and model selection. Statistical computations are implemented in the public domain statistical software (WinBUGS and R) and illustrated with real data examples. Copyright © 2010 John Wiley & Sons, Ltd.

  9. Fisher's method of combining dependent statistics using generalizations of the gamma distribution with applications to genetic pleiotropic associations.

    PubMed

    Li, Qizhai; Hu, Jiyuan; Ding, Juan; Zheng, Gang

    2014-04-01

    A classical approach to combine independent test statistics is Fisher's combination of $p$-values, which follows the $\\chi ^2$ distribution. When the test statistics are dependent, the gamma distribution (GD) is commonly used for the Fisher's combination test (FCT). We propose to use two generalizations of the GD: the generalized and the exponentiated GDs. We study some properties of mis-using the GD for the FCT to combine dependent statistics when one of the two proposed distributions are true. Our results show that both generalizations have better control of type I error rates than the GD, which tends to have inflated type I error rates at more extreme tails. In practice, common model selection criteria (e.g. Akaike information criterion/Bayesian information criterion) can be used to help select a better distribution to use for the FCT. A simple strategy of the two generalizations of the GD in genome-wide association studies is discussed. Applications of the results to genetic pleiotrophic associations are described, where multiple traits are tested for association with a single marker.

  10. Passage relevance models for genomics search.

    PubMed

    Urbain, Jay; Frieder, Ophir; Goharian, Nazli

    2009-03-19

    We present a passage relevance model for integrating syntactic and semantic evidence of biomedical concepts and topics using a probabilistic graphical model. Component models of topics, concepts, terms, and document are represented as potential functions within a Markov Random Field. The probability of a passage being relevant to a biologist's information need is represented as the joint distribution across all potential functions. Relevance model feedback of top ranked passages is used to improve distributional estimates of query concepts and topics in context, and a dimensional indexing strategy is used for efficient aggregation of concept and term statistics. By integrating multiple sources of evidence including dependencies between topics, concepts, and terms, we seek to improve genomics literature passage retrieval precision. Using this model, we are able to demonstrate statistically significant improvements in retrieval precision using a large genomics literature corpus.

  11. Hyperparameterization of soil moisture statistical models for North America with Ensemble Learning Models (Elm)

    NASA Astrophysics Data System (ADS)

    Steinberg, P. D.; Brener, G.; Duffy, D.; Nearing, G. S.; Pelissier, C.

    2017-12-01

    Hyperparameterization, of statistical models, i.e. automated model scoring and selection, such as evolutionary algorithms, grid searches, and randomized searches, can improve forecast model skill by reducing errors associated with model parameterization, model structure, and statistical properties of training data. Ensemble Learning Models (Elm), and the related Earthio package, provide a flexible interface for automating the selection of parameters and model structure for machine learning models common in climate science and land cover classification, offering convenient tools for loading NetCDF, HDF, Grib, or GeoTiff files, decomposition methods like PCA and manifold learning, and parallel training and prediction with unsupervised and supervised classification, clustering, and regression estimators. Continuum Analytics is using Elm to experiment with statistical soil moisture forecasting based on meteorological forcing data from NASA's North American Land Data Assimilation System (NLDAS). There Elm is using the NSGA-2 multiobjective optimization algorithm for optimizing statistical preprocessing of forcing data to improve goodness-of-fit for statistical models (i.e. feature engineering). This presentation will discuss Elm and its components, including dask (distributed task scheduling), xarray (data structures for n-dimensional arrays), and scikit-learn (statistical preprocessing, clustering, classification, regression), and it will show how NSGA-2 is being used for automate selection of soil moisture forecast statistical models for North America.

  12. Explorations in statistics: the log transformation.

    PubMed

    Curran-Everett, Douglas

    2018-06-01

    Learning about statistics is a lot like learning about science: the learning is more meaningful if you can actively explore. This thirteenth installment of Explorations in Statistics explores the log transformation, an established technique that rescales the actual observations from an experiment so that the assumptions of some statistical analysis are better met. A general assumption in statistics is that the variability of some response Y is homogeneous across groups or across some predictor variable X. If the variability-the standard deviation-varies in rough proportion to the mean value of Y, a log transformation can equalize the standard deviations. Moreover, if the actual observations from an experiment conform to a skewed distribution, then a log transformation can make the theoretical distribution of the sample mean more consistent with a normal distribution. This is important: the results of a one-sample t test are meaningful only if the theoretical distribution of the sample mean is roughly normal. If we log-transform our observations, then we want to confirm the transformation was useful. We can do this if we use the Box-Cox method, if we bootstrap the sample mean and the statistic t itself, and if we assess the residual plots from the statistical model of the actual and transformed sample observations.

  13. Statistics of intensity in adaptive-optics images and their usefulness for detection and photometry of exoplanets.

    PubMed

    Gladysz, Szymon; Yaitskova, Natalia; Christou, Julian C

    2010-11-01

    This paper is an introduction to the problem of modeling the probability density function of adaptive-optics speckle. We show that with the modified Rician distribution one cannot describe the statistics of light on axis. A dual solution is proposed: the modified Rician distribution for off-axis speckle and gamma-based distribution for the core of the point spread function. From these two distributions we derive optimal statistical discriminators between real sources and quasi-static speckles. In the second part of the paper the morphological difference between the two probability density functions is used to constrain a one-dimensional, "blind," iterative deconvolution at the position of an exoplanet. Separation of the probability density functions of signal and speckle yields accurate differential photometry in our simulations of the SPHERE planet finder instrument.

  14. Statistical physics studies of multilayer adsorption isotherm in food materials and pore size distribution

    NASA Astrophysics Data System (ADS)

    Aouaini, F.; Knani, S.; Ben Yahia, M.; Ben Lamine, A.

    2015-08-01

    Water sorption isotherms of foodstuffs are very important in different areas of food science engineering such as for design, modeling and optimization of many processes. The equilibrium moisture content is an important parameter in models used to predict changes in the moisture content of a product during storage. A formulation of multilayer model with two energy levels was based on statistical physics and theoretical considerations. Thanks to the grand canonical ensemble in statistical physics. Some physicochemical parameters related to the adsorption process were introduced in the analytical model expression. The data tabulated in literature of water adsorption at different temperatures on: chickpea seeds, lentil seeds, potato and on green peppers were described applying the most popular models applied in food science. We also extend the study to the newest proposed model. It is concluded that among studied models the proposed model seems to be the best for description of data in the whole range of relative humidity. By using our model, we were able to determine the thermodynamic functions. The measurement of desorption isotherms, in particular a gas over a solid porous, allows access to the distribution of pore size PSD.

  15. PET image reconstruction: a robust state space approach.

    PubMed

    Liu, Huafeng; Tian, Yi; Shi, Pengcheng

    2005-01-01

    Statistical iterative reconstruction algorithms have shown improved image quality over conventional nonstatistical methods in PET by using accurate system response models and measurement noise models. Strictly speaking, however, PET measurements, pre-corrected for accidental coincidences, are neither Poisson nor Gaussian distributed and thus do not meet basic assumptions of these algorithms. In addition, the difficulty in determining the proper system response model also greatly affects the quality of the reconstructed images. In this paper, we explore the usage of state space principles for the estimation of activity map in tomographic PET imaging. The proposed strategy formulates the organ activity distribution through tracer kinetics models, and the photon-counting measurements through observation equations, thus makes it possible to unify the dynamic reconstruction problem and static reconstruction problem into a general framework. Further, it coherently treats the uncertainties of the statistical model of the imaging system and the noisy nature of measurement data. Since H(infinity) filter seeks minimummaximum-error estimates without any assumptions on the system and data noise statistics, it is particular suited for PET image reconstruction where the statistical properties of measurement data and the system model are very complicated. The performance of the proposed framework is evaluated using Shepp-Logan simulated phantom data and real phantom data with favorable results.

  16. Why inputs matter: Selection of climatic variables for species distribution modelling in the Himalayan region

    NASA Astrophysics Data System (ADS)

    Bobrowski, Maria; Schickhoff, Udo

    2017-04-01

    Betula utilis is a major constituent of alpine treeline ecotones in the western and central Himalayan region. The objective of this study is to provide first time analysis of the potential distribution of Betula utilis in the subalpine and alpine belts of the Himalayan region using species distribution modelling. Using Generalized Linear Models (GLM) we aim at examining climatic factors controlling the species distribution under current climate conditions. Furthermore we evaluate the prediction ability of climate data derived from different statistical methods. GLMs were created using least correlated bioclimatic variables derived from two different climate models: 1) interpolated climate data (i.e. Worldclim, Hijmans et al., 2005) and 2) quasi-mechanistical statistical downscaling (i.e. Chelsa; Karger et al., 2016). Model accuracy was evaluated by the ability to predict the potential species distribution range. We found that models based on variables of Chelsa climate data had higher predictive power, whereas models using Worldclim climate data consistently overpredicted the potential suitable habitat for Betula utilis. Although climatic variables of Worldclim are widely used in modelling species distribution, our results suggest to treat them with caution when remote regions like the Himalayan mountains are in focus. Unmindful usage of climatic variables for species distribution models potentially cause misleading projections and may lead to wrong implications and recommendations for nature conservation. References: Hijmans, R.J., Cameron, S.E., Parra, J.L., Jones, P.G. & Jarvis, A. (2005) Very high resolution interpolated climate surfaces for global land areas. International Journal of Climatology, 25, 1965-1978. Karger, D.N., Conrad, O., Böhner, J., Kawohl, T., Kreft, H., Soria-Auza, R.W., Zimmermann, N., Linder, H.P. & Kessler, M. (2016) Climatologies at high resolution for the earth land surface areas. arXiv:1607.00217 [physics].

  17. Characterization of Cloud Water-Content Distribution

    NASA Technical Reports Server (NTRS)

    Lee, Seungwon

    2010-01-01

    The development of realistic cloud parameterizations for climate models requires accurate characterizations of subgrid distributions of thermodynamic variables. To this end, a software tool was developed to characterize cloud water-content distributions in climate-model sub-grid scales. This software characterizes distributions of cloud water content with respect to cloud phase, cloud type, precipitation occurrence, and geo-location using CloudSat radar measurements. It uses a statistical method called maximum likelihood estimation to estimate the probability density function of the cloud water content.

  18. Liquid water breakthrough location distances on a gas diffusion layer of polymer electrolyte membrane fuel cells

    NASA Astrophysics Data System (ADS)

    Yu, Junliang; Froning, Dieter; Reimer, Uwe; Lehnert, Werner

    2018-06-01

    The lattice Boltzmann method is adopted to simulate the three dimensional dynamic process of liquid water breaking through the gas diffusion layer (GDL) in the polymer electrolyte membrane fuel cell. 22 micro-structures of Toray GDL are built based on a stochastic geometry model. It is found that more than one breakthrough locations are formed randomly on the GDL surface. Breakthrough location distance (BLD) are analyzed statistically in two ways. The distribution is evaluated statistically by the Lilliefors test. It is concluded that the BLD can be described by the normal distribution with certain statistic characteristics. Information of the shortest neighbor breakthrough location distance can be the input modeling setups on the cell-scale simulations in the field of fuel cell simulation.

  19. Slant path rain attenuation and path diversity statistics obtained through radar modeling of rain structure

    NASA Technical Reports Server (NTRS)

    Goldhirsh, J.

    1984-01-01

    Single and joint terminal slant path attenuation statistics at frequencies of 28.56 and 19.04 GHz have been derived, employing a radar data base obtained over a three-year period at Wallops Island, VA. Statistics were independently obtained for path elevation angles of 20, 45, and 90 deg for purposes of examining how elevation angles influences both single-terminal and joint probability distributions. Both diversity gains and autocorrelation function dependence on site spacing and elevation angles were determined employing the radar modeling results. Comparisons with other investigators are presented. An independent path elevation angle prediction technique was developed and demonstrated to fit well with the radar-derived single and joint terminal radar-derived cumulative fade distributions at various elevation angles.

  20. Parameter optimization in biased decoy-state quantum key distribution with both source errors and statistical fluctuations

    NASA Astrophysics Data System (ADS)

    Zhu, Jian-Rong; Li, Jian; Zhang, Chun-Mei; Wang, Qin

    2017-10-01

    The decoy-state method has been widely used in commercial quantum key distribution (QKD) systems. In view of the practical decoy-state QKD with both source errors and statistical fluctuations, we propose a universal model of full parameter optimization in biased decoy-state QKD with phase-randomized sources. Besides, we adopt this model to carry out simulations of two widely used sources: weak coherent source (WCS) and heralded single-photon source (HSPS). Results show that full parameter optimization can significantly improve not only the secure transmission distance but also the final key generation rate. And when taking source errors and statistical fluctuations into account, the performance of decoy-state QKD using HSPS suffered less than that of decoy-state QKD using WCS.

  1. Normal probabilities for Vandenberg AFB wind components - monthly reference periods for all flight azimuths, 0- to 70-km altitudes

    NASA Technical Reports Server (NTRS)

    Falls, L. W.

    1975-01-01

    Vandenberg Air Force Base (AFB), California, wind component statistics are presented to be used for aerospace engineering applications that require component wind probabilities for various flight azimuths and selected altitudes. The normal (Gaussian) distribution is presented as a statistical model to represent component winds at Vandenberg AFB. Head tail, and crosswind components are tabulated for all flight azimuths for altitudes from 0 to 70 km by monthly reference periods. Wind components are given for 11 selected percentiles ranging from 0.135 percent to 99.865 percent for each month. The results of statistical goodness-of-fit tests are presented to verify the use of the Gaussian distribution as an adequate model to represent component winds at Vandenberg AFB.

  2. Normal probabilities for Cape Kennedy wind components: Monthly reference periods for all flight azimuths. Altitudes 0 to 70 kilometers

    NASA Technical Reports Server (NTRS)

    Falls, L. W.

    1973-01-01

    This document replaces Cape Kennedy empirical wind component statistics which are presently being used for aerospace engineering applications that require component wind probabilities for various flight azimuths and selected altitudes. The normal (Gaussian) distribution is presented as an adequate statistical model to represent component winds at Cape Kennedy. Head-, tail-, and crosswind components are tabulated for all flight azimuths for altitudes from 0 to 70 km by monthly reference periods. Wind components are given for 11 selected percentiles ranging from 0.135 percent to 99,865 percent for each month. Results of statistical goodness-of-fit tests are presented to verify the use of the Gaussian distribution as an adequate model to represent component winds at Cape Kennedy, Florida.

  3. Statistical representation of multiphase flow

    NASA Astrophysics Data System (ADS)

    Subramaniam

    2000-11-01

    The relationship between two common statistical representations of multiphase flow, namely, the single--point Eulerian statistical representation of two--phase flow (D. A. Drew, Ann. Rev. Fluid Mech. (15), 1983), and the Lagrangian statistical representation of a spray using the dropet distribution function (F. A. Williams, Phys. Fluids 1 (6), 1958) is established for spherical dispersed--phase elements. This relationship is based on recent work which relates the droplet distribution function to single--droplet pdfs starting from a Liouville description of a spray (Subramaniam, Phys. Fluids 10 (12), 2000). The Eulerian representation, which is based on a random--field model of the flow, is shown to contain different statistical information from the Lagrangian representation, which is based on a point--process model. The two descriptions are shown to be simply related for spherical, monodisperse elements in statistically homogeneous two--phase flow, whereas such a simple relationship is precluded by the inclusion of polydispersity and statistical inhomogeneity. The common origin of these two representations is traced to a more fundamental statistical representation of a multiphase flow, whose concepts derive from a theory for dense sprays recently proposed by Edwards (Atomization and Sprays 10 (3--5), 2000). The issue of what constitutes a minimally complete statistical representation of a multiphase flow is resolved.

  4. TSP Symposium 2012 Proceedings

    DTIC Science & Technology

    2012-11-01

    and Statistical Model 78 7.3 Analysis and Results 79 7.4 Threats to Validity and Limitations 85 7.5 Conclusions 86 7.6 Acknowledgments 87 7.7...Table 12: Overall Statistics of the Experiment 32 Table 13: Results of Pairwise ANOVA Analysis, Highlighting Statistically Significant Differences...we calculated the percentage of defects injected. The distribution statistics are shown in Table 2. Table 2: Mean Lower, Upper Confidence Interval

  5. Statistical fluctuations in pedestrian evacuation times and the effect of social contagion

    NASA Astrophysics Data System (ADS)

    Nicolas, Alexandre; Bouzat, Sebastián; Kuperman, Marcelo N.

    2016-08-01

    Mathematical models of pedestrian evacuation and the associated simulation software have become essential tools for the assessment of the safety of public facilities and buildings. While a variety of models is now available, their calibration and test against empirical data are generally restricted to global averaged quantities; the statistics compiled from the time series of individual escapes ("microscopic" statistics) measured in recent experiments are thus overlooked. In the same spirit, much research has primarily focused on the average global evacuation time, whereas the whole distribution of evacuation times over some set of realizations should matter. In the present paper we propose and discuss the validity of a simple relation between this distribution and the microscopic statistics, which is theoretically valid in the absence of correlations. To this purpose, we develop a minimal cellular automaton, with features that afford a semiquantitative reproduction of the experimental microscopic statistics. We then introduce a process of social contagion of impatient behavior in the model and show that the simple relation under test may dramatically fail at high contagion strengths, the latter being responsible for the emergence of strong correlations in the system. We conclude with comments on the potential practical relevance for safety science of calculations based on microscopic statistics.

  6. Modeling the complexity of acoustic emission during intermittent plastic deformation: Power laws and multifractal spectra

    NASA Astrophysics Data System (ADS)

    Kumar, Jagadish; Ananthakrishna, G.

    2018-01-01

    Scale-invariant power-law distributions for acoustic emission signals are ubiquitous in several plastically deforming materials. However, power-law distributions for acoustic emission energies are reported in distinctly different plastically deforming situations such as hcp and fcc single and polycrystalline samples exhibiting smooth stress-strain curves and in dilute metallic alloys exhibiting discontinuous flow. This is surprising since the underlying dislocation mechanisms in these two types of deformations are very different. So far, there have been no models that predict the power-law statistics for discontinuous flow. Furthermore, the statistics of the acoustic emission signals in jerky flow is even more complex, requiring multifractal measures for a proper characterization. There has been no model that explains the complex statistics either. Here we address the problem of statistical characterization of the acoustic emission signals associated with the three types of the Portevin-Le Chatelier bands. Following our recently proposed general framework for calculating acoustic emission, we set up a wave equation for the elastic degrees of freedom with a plastic strain rate as a source term. The energy dissipated during acoustic emission is represented by the Rayleigh-dissipation function. Using the plastic strain rate obtained from the Ananthakrishna model for the Portevin-Le Chatelier effect, we compute the acoustic emission signals associated with the three Portevin-Le Chatelier bands and the Lüders-like band. The so-calculated acoustic emission signals are used for further statistical characterization. Our results show that the model predicts power-law statistics for all the acoustic emission signals associated with the three types of Portevin-Le Chatelier bands with the exponent values increasing with increasing strain rate. The calculated multifractal spectra corresponding to the acoustic emission signals associated with the three band types have a maximum spread for the type C bands and decreasing with types B and A. We further show that the acoustic emission signals associated with Lüders-like band also exhibit a power-law distribution and multifractality.

  7. Modeling the complexity of acoustic emission during intermittent plastic deformation: Power laws and multifractal spectra.

    PubMed

    Kumar, Jagadish; Ananthakrishna, G

    2018-01-01

    Scale-invariant power-law distributions for acoustic emission signals are ubiquitous in several plastically deforming materials. However, power-law distributions for acoustic emission energies are reported in distinctly different plastically deforming situations such as hcp and fcc single and polycrystalline samples exhibiting smooth stress-strain curves and in dilute metallic alloys exhibiting discontinuous flow. This is surprising since the underlying dislocation mechanisms in these two types of deformations are very different. So far, there have been no models that predict the power-law statistics for discontinuous flow. Furthermore, the statistics of the acoustic emission signals in jerky flow is even more complex, requiring multifractal measures for a proper characterization. There has been no model that explains the complex statistics either. Here we address the problem of statistical characterization of the acoustic emission signals associated with the three types of the Portevin-Le Chatelier bands. Following our recently proposed general framework for calculating acoustic emission, we set up a wave equation for the elastic degrees of freedom with a plastic strain rate as a source term. The energy dissipated during acoustic emission is represented by the Rayleigh-dissipation function. Using the plastic strain rate obtained from the Ananthakrishna model for the Portevin-Le Chatelier effect, we compute the acoustic emission signals associated with the three Portevin-Le Chatelier bands and the Lüders-like band. The so-calculated acoustic emission signals are used for further statistical characterization. Our results show that the model predicts power-law statistics for all the acoustic emission signals associated with the three types of Portevin-Le Chatelier bands with the exponent values increasing with increasing strain rate. The calculated multifractal spectra corresponding to the acoustic emission signals associated with the three band types have a maximum spread for the type C bands and decreasing with types B and A. We further show that the acoustic emission signals associated with Lüders-like band also exhibit a power-law distribution and multifractality.

  8. Statistical Modeling of Robotic Random Walks on Different Terrain

    NASA Astrophysics Data System (ADS)

    Naylor, Austin; Kinnaman, Laura

    Issues of public safety, especially with crowd dynamics and pedestrian movement, have been modeled by physicists using methods from statistical mechanics over the last few years. Complex decision making of humans moving on different terrains can be modeled using random walks (RW) and correlated random walks (CRW). The effect of different terrains, such as a constant increasing slope, on RW and CRW was explored. LEGO robots were programmed to make RW and CRW with uniform step sizes. Level ground tests demonstrated that the robots had the expected step size distribution and correlation angles (for CRW). The mean square displacement was calculated for each RW and CRW on different terrains and matched expected trends. The step size distribution was determined to change based on the terrain; theoretical predictions for the step size distribution were made for various simple terrains. It's Dr. Laura Kinnaman, not sure where to put the Prefix.

  9. Modified retrieval algorithm for three types of precipitation distribution using x-band synthetic aperture radar

    NASA Astrophysics Data System (ADS)

    Xie, Yanan; Zhou, Mingliang; Pan, Dengke

    2017-10-01

    The forward-scattering model is introduced to describe the response of normalized radar cross section (NRCS) of precipitation with synthetic aperture radar (SAR). Since the distribution of near-surface rainfall is related to the rate of near-surface rainfall and horizontal distribution factor, a retrieval algorithm called modified regression empirical and model-oriented statistical (M-M) based on the volterra integration theory is proposed. Compared with the model-oriented statistical and volterra integration (MOSVI) algorithm, the biggest difference is that the M-M algorithm is based on the modified regression empirical algorithm rather than the linear regression formula to retrieve the value of near-surface rainfall rate. Half of the empirical parameters are reduced in the weighted integral work and a smaller average relative error is received while the rainfall rate is less than 100 mm/h. Therefore, the algorithm proposed in this paper can obtain high-precision rainfall information.

  10. Bayesian models based on test statistics for multiple hypothesis testing problems.

    PubMed

    Ji, Yuan; Lu, Yiling; Mills, Gordon B

    2008-04-01

    We propose a Bayesian method for the problem of multiple hypothesis testing that is routinely encountered in bioinformatics research, such as the differential gene expression analysis. Our algorithm is based on modeling the distributions of test statistics under both null and alternative hypotheses. We substantially reduce the complexity of the process of defining posterior model probabilities by modeling the test statistics directly instead of modeling the full data. Computationally, we apply a Bayesian FDR approach to control the number of rejections of null hypotheses. To check if our model assumptions for the test statistics are valid for various bioinformatics experiments, we also propose a simple graphical model-assessment tool. Using extensive simulations, we demonstrate the performance of our models and the utility of the model-assessment tool. In the end, we apply the proposed methodology to an siRNA screening and a gene expression experiment.

  11. The Many Null Distributions of Person Fit Indices.

    ERIC Educational Resources Information Center

    Molenaar, Ivo W.; Hoijtink, Herbert

    1990-01-01

    Statistical properties of person fit indices are reviewed as indicators of the extent to which a person's score pattern is in agreement with a measurement model. Distribution of a fit index and ability-free fit evaluation are discussed. The null distribution was simulated for a test of 20 items. (SLD)

  12. Coronary artery calcium distributions in older persons in the AGES-Reykjavik study

    PubMed Central

    Gudmundsson, Elias Freyr; Gudnason, Vilmundur; Sigurdsson, Sigurdur; Launer, Lenore J.; Harris, Tamara B.; Aspelund, Thor

    2013-01-01

    Coronary Artery Calcium (CAC) is a sign of advanced atherosclerosis and an independent risk factor for cardiac events. Here, we describe CAC-distributions in an unselected aged population and compare modelling methods to characterize CAC-distribution. CAC is difficult to model because it has a skewed and zero inflated distribution with over-dispersion. Data are from the AGES-Reykjavik sample, a large population based study [2002-2006] in Iceland of 5,764 persons aged 66-96 years. Linear regressions using logarithmic- and Box-Cox transformations on CAC+1, quantile regression and a Zero-Inflated Negative Binomial model (ZINB) were applied. Methods were compared visually and with the PRESS-statistic, R2 and number of detected associations with concurrently measured variables. There were pronounced differences in CAC according to sex, age, history of coronary events and presence of plaque in the carotid artery. Associations with conventional coronary artery disease (CAD) risk factors varied between the sexes. The ZINB model provided the best results with respect to the PRESS-statistic, R2, and predicted proportion of zero scores. The ZINB model detected similar numbers of associations as the linear regression on ln(CAC+1) and usually with the same risk factors. PMID:22990371

  13. Pressure balance inconsistency exhibited in a statistical model of magnetospheric plasma

    NASA Astrophysics Data System (ADS)

    Garner, T. W.; Wolf, R. A.; Spiro, R. W.; Thomsen, M. F.; Korth, H.

    2003-08-01

    While quantitative theories of plasma flow from the magnetotail to the inner magnetosphere typically assume adiabatic convection, it has long been understood that these convection models tend to overestimate the plasma pressure in the inner magnetosphere. This phenomenon is called the pressure crisis or the pressure balance inconsistency. In order to analyze it in a new and more detailed manner we utilize an empirical model of the proton and electron distribution functions in the near-Earth plasma sheet (-50 RE < X < -10 RE), which uses the [1989] magnetic field model and a plasma sheet representation based upon several previously published statistical studies. We compare our results to a statistically derived particle distribution function at geosynchronous orbit. In this analysis the particle distribution function is characterized by the isotropic energy invariant λ = EV2/3, where E is the particle's kinetic energy and V is the magnetic flux tube volume. The energy invariant is conserved in guiding center drift under the assumption of strong, elastic pitch angle scattering. If, in addition, loss is negligible, the phase space density f(λ) is also conserved along the same path. The statistical model indicates that f(λ, ?) is approximately independent of X for X ≤ -35 RE but decreases with increasing X for X ≥ -35 RE. The tailward gradient of f(λ, ?) might be attributed to gradient/curvature drift for large isotropic energy invariants but not for small invariants. The tailward gradient of the distribution function indicates a violation of the adiabatic drift condition in the plasma sheet. It also confirms the existence of a "number crisis" in addition to the pressure crisis. In addition, plasma sheet pressure gradients, when crossed with the gradient of flux tube volume computed from the [1989] magnetic field model, indicate Region 1 currents on the dawn and dusk sides of the outer plasma sheet.

  14. Percolation technique for galaxy clustering

    NASA Technical Reports Server (NTRS)

    Klypin, Anatoly; Shandarin, Sergei F.

    1993-01-01

    We study percolation in mass and galaxy distributions obtained in 3D simulations of the CDM, C + HDM, and the power law (n = -1) models in the Omega = 1 universe. Percolation statistics is used here as a quantitative measure of the degree to which a mass or galaxy distribution is of a filamentary or cellular type. The very fast code used calculates the statistics of clusters along with the direct detection of percolation. We found that the two parameters mu(infinity), characterizing the size of the largest cluster, and mu-squared, characterizing the weighted mean size of all clusters excluding the largest one, are extremely useful for evaluating the percolation threshold. An advantage of using these parameters is their low sensitivity to boundary effects. We show that both the CDM and the C + HDM models are extremely filamentary both in mass and galaxy distribution. The percolation thresholds for the mass distributions are determined.

  15. Performance prediction of a synchronization link for distributed aerospace wireless systems.

    PubMed

    Wang, Wen-Qin; Shao, Huaizong

    2013-01-01

    For reasons of stealth and other operational advantages, distributed aerospace wireless systems have received much attention in recent years. In a distributed aerospace wireless system, since the transmitter and receiver placed on separated platforms which use independent master oscillators, there is no cancellation of low-frequency phase noise as in the monostatic cases. Thus, high accurate time and frequency synchronization techniques are required for distributed wireless systems. The use of a dedicated synchronization link to quantify and compensate oscillator frequency instability is investigated in this paper. With the mathematical statistical models of phase noise, closed-form analytic expressions for the synchronization link performance are derived. The possible error contributions including oscillator, phase-locked loop, and receiver noise are quantified. The link synchronization performance is predicted by utilizing the knowledge of the statistical models, system error contributions, and sampling considerations. Simulation results show that effective synchronization error compensation can be achieved by using this dedicated synchronization link.

  16. Full Counting Statistics for Interacting Fermions with Determinantal Quantum Monte Carlo Simulations.

    PubMed

    Humeniuk, Stephan; Büchler, Hans Peter

    2017-12-08

    We present a method for computing the full probability distribution function of quadratic observables such as particle number or magnetization for the Fermi-Hubbard model within the framework of determinantal quantum Monte Carlo calculations. Especially in cold atom experiments with single-site resolution, such a full counting statistics can be obtained from repeated projective measurements. We demonstrate that the full counting statistics can provide important information on the size of preformed pairs. Furthermore, we compute the full counting statistics of the staggered magnetization in the repulsive Hubbard model at half filling and find excellent agreement with recent experimental results. We show that current experiments are capable of probing the difference between the Hubbard model and the limiting Heisenberg model.

  17. Generalized linear and generalized additive models in studies of species distributions: Setting the scene

    USGS Publications Warehouse

    Guisan, Antoine; Edwards, T.C.; Hastie, T.

    2002-01-01

    An important statistical development of the last 30 years has been the advance in regression analysis provided by generalized linear models (GLMs) and generalized additive models (GAMs). Here we introduce a series of papers prepared within the framework of an international workshop entitled: Advances in GLMs/GAMs modeling: from species distribution to environmental management, held in Riederalp, Switzerland, 6-11 August 2001. We first discuss some general uses of statistical models in ecology, as well as provide a short review of several key examples of the use of GLMs and GAMs in ecological modeling efforts. We next present an overview of GLMs and GAMs, and discuss some of their related statistics used for predictor selection, model diagnostics, and evaluation. Included is a discussion of several new approaches applicable to GLMs and GAMs, such as ridge regression, an alternative to stepwise selection of predictors, and methods for the identification of interactions by a combined use of regression trees and several other approaches. We close with an overview of the papers and how we feel they advance our understanding of their application to ecological modeling. ?? 2002 Elsevier Science B.V. All rights reserved.

  18. Statistical ecology comes of age.

    PubMed

    Gimenez, Olivier; Buckland, Stephen T; Morgan, Byron J T; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-12-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1-4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data.

  19. Statistical ecology comes of age

    PubMed Central

    Gimenez, Olivier; Buckland, Stephen T.; Morgan, Byron J. T.; Bez, Nicolas; Bertrand, Sophie; Choquet, Rémi; Dray, Stéphane; Etienne, Marie-Pierre; Fewster, Rachel; Gosselin, Frédéric; Mérigot, Bastien; Monestiez, Pascal; Morales, Juan M.; Mortier, Frédéric; Munoz, François; Ovaskainen, Otso; Pavoine, Sandrine; Pradel, Roger; Schurr, Frank M.; Thomas, Len; Thuiller, Wilfried; Trenkel, Verena; de Valpine, Perry; Rexstad, Eric

    2014-01-01

    The desire to predict the consequences of global environmental change has been the driver towards more realistic models embracing the variability and uncertainties inherent in ecology. Statistical ecology has gelled over the past decade as a discipline that moves away from describing patterns towards modelling the ecological processes that generate these patterns. Following the fourth International Statistical Ecology Conference (1–4 July 2014) in Montpellier, France, we analyse current trends in statistical ecology. Important advances in the analysis of individual movement, and in the modelling of population dynamics and species distributions, are made possible by the increasing use of hierarchical and hidden process models. Exciting research perspectives include the development of methods to interpret citizen science data and of efficient, flexible computational algorithms for model fitting. Statistical ecology has come of age: it now provides a general and mathematically rigorous framework linking ecological theory and empirical data. PMID:25540151

  20. Modelling the distribution of domestic ducks in Monsoon Asia

    USGS Publications Warehouse

    Van Bockel, Thomas P.; Prosser, Diann; Franceschini, Gianluca; Biradar, Chandra; Wint, William; Robinson, Tim; Gilbert, Marius

    2011-01-01

    Domestic ducks are considered to be an important reservoir of highly pathogenic avian influenza (HPAI), as shown by a number of geospatial studies in which they have been identified as a significant risk factor associated with disease presence. Despite their importance in HPAI epidemiology, their large-scale distribution in Monsoon Asia is poorly understood. In this study, we created a spatial database of domestic duck census data in Asia and used it to train statistical distribution models for domestic duck distributions at a spatial resolution of 1km. The method was based on a modelling framework used by the Food and Agriculture Organisation to produce the Gridded Livestock of the World (GLW) database, and relies on stratified regression models between domestic duck densities and a set of agro-ecological explanatory variables. We evaluated different ways of stratifying the analysis and of combining the prediction to optimize the goodness of fit of the predictions. We found that domestic duck density could be predicted with reasonable accuracy (mean RMSE and correlation coefficient between log-transformed observed and predicted densities being 0.58 and 0.80, respectively), using a stratification based on livestock production systems. We tested the use of artificially degraded data on duck distributions in Thailand and Vietnam as training data, and compared the modelled outputs with the original high-resolution data. This showed, for these two countries at least, that these approaches could be used to accurately disaggregate provincial level (administrative level 1) statistical data to provide high resolution model distributions.

  1. Comparative analysis through probability distributions of a data set

    NASA Astrophysics Data System (ADS)

    Cristea, Gabriel; Constantinescu, Dan Mihai

    2018-02-01

    In practice, probability distributions are applied in such diverse fields as risk analysis, reliability engineering, chemical engineering, hydrology, image processing, physics, market research, business and economic research, customer support, medicine, sociology, demography etc. This article highlights important aspects of fitting probability distributions to data and applying the analysis results to make informed decisions. There are a number of statistical methods available which can help us to select the best fitting model. Some of the graphs display both input data and fitted distributions at the same time, as probability density and cumulative distribution. The goodness of fit tests can be used to determine whether a certain distribution is a good fit. The main used idea is to measure the "distance" between the data and the tested distribution, and compare that distance to some threshold values. Calculating the goodness of fit statistics also enables us to order the fitted distributions accordingly to how good they fit to data. This particular feature is very helpful for comparing the fitted models. The paper presents a comparison of most commonly used goodness of fit tests as: Kolmogorov-Smirnov, Anderson-Darling, and Chi-Squared. A large set of data is analyzed and conclusions are drawn by visualizing the data, comparing multiple fitted distributions and selecting the best model. These graphs should be viewed as an addition to the goodness of fit tests.

  2. Reliability Implications in Wood Systems of a Bivariate Gaussian-Weibull Distribution and the Associated Univariate Pseudo-truncated Weibull

    Treesearch

    Steve P. Verrill; James W. Evans; David E. Kretschmann; Cherilyn A. Hatfield

    2014-01-01

    Two important wood properties are the modulus of elasticity (MOE) and the modulus of rupture (MOR). In the past, the statistical distribution of the MOE has often been modeled as Gaussian, and that of the MOR as lognormal or as a two- or three-parameter Weibull distribution. It is well known that MOE and MOR are positively correlated. To model the simultaneous behavior...

  3. Backward deletion to minimize prediction errors in models from factorial experiments with zero to six center points

    NASA Technical Reports Server (NTRS)

    Holms, A. G.

    1980-01-01

    Population model coefficients were chosen to simulate a saturated 2 to the fourth power fixed effects experiment having an unfavorable distribution of relative values. Using random number studies, deletion strategies were compared that were based on the F distribution, on an order statistics distribution of Cochran's, and on a combination of the two. Results of the comparisons and a recommended strategy are given.

  4. Single photon counting linear mode avalanche photodiode technologies

    NASA Astrophysics Data System (ADS)

    Williams, George M.; Huntington, Andrew S.

    2011-10-01

    The false count rate of a single-photon-sensitive photoreceiver consisting of a high-gain, low-excess-noise linear-mode InGaAs avalanche photodiode (APD) and a high-bandwidth transimpedance amplifier (TIA) is fit to a statistical model. The peak height distribution of the APD's multiplied dark current is approximated by the weighted sum of McIntyre distributions, each characterizing dark current generated at a different location within the APD's junction. The peak height distribution approximated in this way is convolved with a Gaussian distribution representing the input-referred noise of the TIA to generate the statistical distribution of the uncorrelated sum. The cumulative distribution function (CDF) representing count probability as a function of detection threshold is computed, and the CDF model fit to empirical false count data. It is found that only k=0 McIntyre distributions fit the empirically measured CDF at high detection threshold, and that false count rate drops faster than photon count rate as detection threshold is raised. Once fit to empirical false count data, the model predicts the improvement of the false count rate to be expected from reductions in TIA noise and APD dark current. Improvement by at least three orders of magnitude is thought feasible with further manufacturing development and a capacitive-feedback TIA (CTIA).

  5. Precipitation Cluster Distributions: Current Climate Storm Statistics and Projected Changes Under Global Warming

    NASA Astrophysics Data System (ADS)

    Quinn, Kevin Martin

    The total amount of precipitation integrated across a precipitation cluster (contiguous precipitating grid cells exceeding a minimum rain rate) is a useful measure of the aggregate size of the disturbance, expressed as the rate of water mass lost or latent heat released, i.e. the power of the disturbance. Probability distributions of cluster power are examined during boreal summer (May-September) and winter (January-March) using satellite-retrieved rain rates from the Tropical Rainfall Measuring Mission (TRMM) 3B42 and Special Sensor Microwave Imager and Sounder (SSM/I and SSMIS) programs, model output from the High Resolution Atmospheric Model (HIRAM, roughly 0.25-0.5 0 resolution), seven 1-2° resolution members of the Coupled Model Intercomparison Project Phase 5 (CMIP5) experiment, and National Center for Atmospheric Research Large Ensemble (NCAR LENS). Spatial distributions of precipitation-weighted centroids are also investigated in observations (TRMM-3B42) and climate models during winter as a metric for changes in mid-latitude storm tracks. Observed probability distributions for both seasons are scale-free from the smallest clusters up to a cutoff scale at high cluster power, after which the probability density drops rapidly. When low rain rates are excluded by choosing a minimum rain rate threshold in defining clusters, the models accurately reproduce observed cluster power statistics and winter storm tracks. Changes in behavior in the tail of the distribution, above the cutoff, are important for impacts since these quantify the frequency of the most powerful storms. End-of-century cluster power distributions and storm track locations are investigated in these models under a "business as usual" global warming scenario. The probability of high cluster power events increases by end-of-century across all models, by up to an order of magnitude for the highest-power events for which statistics can be computed. For the three models in the suite with continuous time series of high resolution output, there is substantial variability on when these probability increases for the most powerful precipitation clusters become detectable, ranging from detectable within the observational period to statistically significant trends emerging only after 2050. A similar analysis of National Centers for Environmental Prediction (NCEP) Reanalysis 2 and SSM/I-SSMIS rain rate retrievals in the recent observational record does not yield reliable evidence of trends in high-power cluster probabilities at this time. Large impacts to mid-latitude storm tracks are projected over the West Coast and eastern North America, with no less than 8 of the 9 models examined showing large increases by end-of-century in the probability density of the most powerful storms, ranging up to a factor of 6.5 in the highest range bin for which historical statistics are computed. However, within these regional domains, there is considerable variation among models in pinpointing exactly where the largest increases will occur.

  6. Assessing Climate Change Impacts for DoD installations in the Southwest United States During the Warm Season

    DTIC Science & Technology

    2017-03-10

    20 4. Statistical analysis methods to characterize distributions and trends...duration precipitation diagram from convective- permitting simulations for Barry Goldwater Range, Arizona. ix Figure 60: Statistically ...Same as Fig. 60 for other DoD facilities in the Southwest as labeled. Figure 62: Statistically significant model ensemble changes in rainfall

  7. Statistical Analysis of Spectral Properties and Prosodic Parameters of Emotional Speech

    NASA Astrophysics Data System (ADS)

    Přibil, J.; Přibilová, A.

    2009-01-01

    The paper addresses reflection of microintonation and spectral properties in male and female acted emotional speech. Microintonation component of speech melody is analyzed regarding its spectral and statistical parameters. According to psychological research of emotional speech, different emotions are accompanied by different spectral noise. We control its amount by spectral flatness according to which the high frequency noise is mixed in voiced frames during cepstral speech synthesis. Our experiments are aimed at statistical analysis of cepstral coefficient values and ranges of spectral flatness in three emotions (joy, sadness, anger), and a neutral state for comparison. Calculated histograms of spectral flatness distribution are visually compared and modelled by Gamma probability distribution. Histograms of cepstral coefficient distribution are evaluated and compared using skewness and kurtosis. Achieved statistical results show good correlation comparing male and female voices for all emotional states portrayed by several Czech and Slovak professional actors.

  8. The Two-Dimensional Gabor Function Adapted to Natural Image Statistics: A Model of Simple-Cell Receptive Fields and Sparse Structure in Images.

    PubMed

    Loxley, P N

    2017-10-01

    The two-dimensional Gabor function is adapted to natural image statistics, leading to a tractable probabilistic generative model that can be used to model simple cell receptive field profiles, or generate basis functions for sparse coding applications. Learning is found to be most pronounced in three Gabor function parameters representing the size and spatial frequency of the two-dimensional Gabor function and characterized by a nonuniform probability distribution with heavy tails. All three parameters are found to be strongly correlated, resulting in a basis of multiscale Gabor functions with similar aspect ratios and size-dependent spatial frequencies. A key finding is that the distribution of receptive-field sizes is scale invariant over a wide range of values, so there is no characteristic receptive field size selected by natural image statistics. The Gabor function aspect ratio is found to be approximately conserved by the learning rules and is therefore not well determined by natural image statistics. This allows for three distinct solutions: a basis of Gabor functions with sharp orientation resolution at the expense of spatial-frequency resolution, a basis of Gabor functions with sharp spatial-frequency resolution at the expense of orientation resolution, or a basis with unit aspect ratio. Arbitrary mixtures of all three cases are also possible. Two parameters controlling the shape of the marginal distributions in a probabilistic generative model fully account for all three solutions. The best-performing probabilistic generative model for sparse coding applications is found to be a gaussian copula with Pareto marginal probability density functions.

  9. Entropy maximization under the constraints on the generalized Gini index and its application in modeling income distributions

    NASA Astrophysics Data System (ADS)

    Khosravi Tanak, A.; Mohtashami Borzadaran, G. R.; Ahmadi, J.

    2015-11-01

    In economics and social sciences, the inequality measures such as Gini index, Pietra index etc., are commonly used to measure the statistical dispersion. There is a generalization of Gini index which includes it as special case. In this paper, we use principle of maximum entropy to approximate the model of income distribution with a given mean and generalized Gini index. Many distributions have been used as descriptive models for the distribution of income. The most widely known of these models are the generalized beta of second kind and its subclass distributions. The obtained maximum entropy distributions are fitted to the US family total money income in 2009, 2011 and 2013 and their relative performances with respect to generalized beta of second kind family are compared.

  10. Rain rate duration statistics derived from the Mid-Atlantic coast rain gauge network

    NASA Technical Reports Server (NTRS)

    Goldhirsh, Julius

    1993-01-01

    A rain gauge network comprised of 10 tipping bucket rain gauges located in the Mid-Atlantic coast of the United States has been in continuous operation since June 1, 1986. Rain rate distributions and estimated slant path fade distributions at 20 GHz and 30 GHz covering the first five year period were derived from the gauge network measurements, and these results were described by Goldhirsh. In this effort, rain rate time duration statistics are presented. The rain duration statistics are of interest for better understanding the physical nature of precipitation and to present a data base which may be used by modelers to convert to slant path fade duration statistics. Such statistics are important for better assessing optimal coding procedures over defined bandwidths.

  11. Filter Tuning Using the Chi-Squared Statistic

    NASA Technical Reports Server (NTRS)

    Lilly-Salkowski, Tyler B.

    2017-01-01

    This paper examines the use of the Chi-square statistic as a means of evaluating filter performance. The goal of the process is to characterize the filter performance in the metric of covariance realism. The Chi-squared statistic is the value calculated to determine the realism of a covariance based on the prediction accuracy and the covariance values at a given point in time. Once calculated, it is the distribution of this statistic that provides insight on the accuracy of the covariance. The process of tuning an Extended Kalman Filter (EKF) for Aqua and Aura support is described, including examination of the measurement errors of available observation types, and methods of dealing with potentially volatile atmospheric drag modeling. Predictive accuracy and the distribution of the Chi-squared statistic, calculated from EKF solutions, are assessed.

  12. Record statistics of financial time series and geometric random walks

    NASA Astrophysics Data System (ADS)

    Sabir, Behlool; Santhanam, M. S.

    2014-09-01

    The study of record statistics of correlated series in physics, such as random walks, is gaining momentum, and several analytical results have been obtained in the past few years. In this work, we study the record statistics of correlated empirical data for which random walk models have relevance. We obtain results for the records statistics of select stock market data and the geometric random walk, primarily through simulations. We show that the distribution of the age of records is a power law with the exponent α lying in the range 1.5≤α≤1.8. Further, the longest record ages follow the Fréchet distribution of extreme value theory. The records statistics of geometric random walk series is in good agreement with that obtained from empirical stock data.

  13. Non-gaussianity versus nonlinearity of cosmological perturbations.

    PubMed

    Verde, L

    2001-06-01

    Following the discovery of the cosmic microwave background, the hot big-bang model has become the standard cosmological model. In this theory, small primordial fluctuations are subsequently amplified by gravity to form the large-scale structure seen today. Different theories for unified models of particle physics, lead to different predictions for the statistical properties of the primordial fluctuations, that can be divided in two classes: gaussian and non-gaussian. Convincing evidence against or for gaussian initial conditions would rule out many scenarios and point us toward a physical theory for the origin of structures. The statistical distribution of cosmological perturbations, as we observe them, can deviate from the gaussian distribution in several different ways. Even if perturbations start off gaussian, nonlinear gravitational evolution can introduce non-gaussian features. Additionally, our knowledge of the Universe comes principally from the study of luminous material such as galaxies, but galaxies might not be faithful tracers of the underlying mass distribution. The relationship between fluctuations in the mass and in the galaxies distribution (bias), is often assumed to be local, but could well be nonlinear. Moreover, galaxy catalogues use the redshift as third spatial coordinate: the resulting redshift-space map of the galaxy distribution is nonlinearly distorted by peculiar velocities. Nonlinear gravitational evolution, biasing, and redshift-space distortion introduce non-gaussianity, even in an initially gaussian fluctuation field. I investigate the statistical tools that allow us, in principle, to disentangle the above different effects, and the observational datasets we require to do so in practice.

  14. Twenty-five years of maximum-entropy principle

    NASA Astrophysics Data System (ADS)

    Kapur, J. N.

    1983-04-01

    The strengths and weaknesses of the maximum entropy principle (MEP) are examined and some challenging problems that remain outstanding at the end of the first quarter century of the principle are discussed. The original formalism of the MEP is presented and its relationship to statistical mechanics is set forth. The use of MEP for characterizing statistical distributions, in statistical inference, nonlinear spectral analysis, transportation models, population density models, models for brand-switching in marketing and vote-switching in elections is discussed. Its application to finance, insurance, image reconstruction, pattern recognition, operations research and engineering, biology and medicine, and nonparametric density estimation is considered.

  15. Evaluation Of Statistical Models For Forecast Errors From The HBV-Model

    NASA Astrophysics Data System (ADS)

    Engeland, K.; Kolberg, S.; Renard, B.; Stensland, I.

    2009-04-01

    Three statistical models for the forecast errors for inflow to the Langvatn reservoir in Northern Norway have been constructed and tested according to how well the distribution and median values of the forecasts errors fit to the observations. For the first model observed and forecasted inflows were transformed by the Box-Cox transformation before a first order autoregressive model was constructed for the forecast errors. The parameters were conditioned on climatic conditions. In the second model the Normal Quantile Transformation (NQT) was applied on observed and forecasted inflows before a similar first order autoregressive model was constructed for the forecast errors. For the last model positive and negative errors were modeled separately. The errors were first NQT-transformed before a model where the mean values were conditioned on climate, forecasted inflow and yesterday's error. To test the three models we applied three criterions: We wanted a) the median values to be close to the observed values; b) the forecast intervals to be narrow; c) the distribution to be correct. The results showed that it is difficult to obtain a correct model for the forecast errors, and that the main challenge is to account for the auto-correlation in the errors. Model 1 and 2 gave similar results, and the main drawback is that the distributions are not correct. The 95% forecast intervals were well identified, but smaller forecast intervals were over-estimated, and larger intervals were under-estimated. Model 3 gave a distribution that fits better, but the median values do not fit well since the auto-correlation is not properly accounted for. If the 95% forecast interval is of interest, Model 2 is recommended. If the whole distribution is of interest, Model 3 is recommended.

  16. Nonlinear GARCH model and 1 / f noise

    NASA Astrophysics Data System (ADS)

    Kononovicius, A.; Ruseckas, J.

    2015-06-01

    Auto-regressive conditionally heteroskedastic (ARCH) family models are still used, by practitioners in business and economic policy making, as a conditional volatility forecasting models. Furthermore ARCH models still are attracting an interest of the researchers. In this contribution we consider the well known GARCH(1,1) process and its nonlinear modifications, reminiscent of NGARCH model. We investigate the possibility to reproduce power law statistics, probability density function and power spectral density, using ARCH family models. For this purpose we derive stochastic differential equations from the GARCH processes in consideration. We find the obtained equations to be similar to a general class of stochastic differential equations known to reproduce power law statistics. We show that linear GARCH(1,1) process has power law distribution, but its power spectral density is Brownian noise-like. However, the nonlinear modifications exhibit both power law distribution and power spectral density of the 1 /fβ form, including 1 / f noise.

  17. A new statistical method for characterizing the atmospheres of extrasolar planets

    NASA Astrophysics Data System (ADS)

    Henderson, Cassandra S.; Skemer, Andrew J.; Morley, Caroline V.; Fortney, Jonathan J.

    2017-10-01

    By detecting light from extrasolar planets, we can measure their compositions and bulk physical properties. The technologies used to make these measurements are still in their infancy, and a lack of self-consistency suggests that previous observations have underestimated their systemic errors. We demonstrate a statistical method, newly applied to exoplanet characterization, which uses a Bayesian formalism to account for underestimated errorbars. We use this method to compare photometry of a substellar companion, GJ 758b, with custom atmospheric models. Our method produces a probability distribution of atmospheric model parameters including temperature, gravity, cloud model (fsed) and chemical abundance for GJ 758b. This distribution is less sensitive to highly variant data and appropriately reflects a greater uncertainty on parameter fits.

  18. Non-linear learning in online tutorial to enhance students’ knowledge on normal distribution application topic

    NASA Astrophysics Data System (ADS)

    Kartono; Suryadi, D.; Herman, T.

    2018-01-01

    This study aimed to analyze the enhancement of non-linear learning (NLL) in the online tutorial (OT) content to students’ knowledge of normal distribution application (KONDA). KONDA is a competence expected to be achieved after students studied the topic of normal distribution application in the course named Education Statistics. The analysis was performed by quasi-experiment study design. The subject of the study was divided into an experimental class that was given OT content in NLL model and a control class which was given OT content in conventional learning (CL) model. Data used in this study were the results of online objective tests to measure students’ statistical prior knowledge (SPK) and students’ pre- and post-test of KONDA. The statistical analysis test of a gain score of KONDA of students who had low and moderate SPK’s scores showed students’ KONDA who learn OT content with NLL model was better than students’ KONDA who learn OT content with CL model. Meanwhile, for students who had high SPK’s scores, the gain score of students who learn OT content with NLL model had relatively similar with the gain score of students who learn OT content with CL model. Based on those findings it could be concluded that the NLL model applied to OT content could enhance KONDA of students in low and moderate SPK’s levels. Extra and more challenging didactical situation was needed for students in high SPK’s level to achieve the significant gain score.

  19. Relative mass distributions of neutron-rich thermally fissile nuclei within a statistical model

    NASA Astrophysics Data System (ADS)

    Kumar, Bharat; Kannan, M. T. Senthil; Balasubramaniam, M.; Agrawal, B. K.; Patra, S. K.

    2017-09-01

    We study the binary mass distribution for the recently predicted thermally fissile neutron-rich uranium and thorium nuclei using a statistical model. The level density parameters needed for the study are evaluated from the excitation energies of the temperature-dependent relativistic mean field formalism. The excitation energy and the level density parameter for a given temperature are employed in the convolution integral method to obtain the probability of the particular fragmentation. As representative cases, we present the results for the binary yields of 250U and 254Th. The relative yields are presented for three different temperatures: T =1 , 2, and 3 MeV.

  20. Rain attenuation measurements: Variability and data quality assessment

    NASA Technical Reports Server (NTRS)

    Crane, Robert K.

    1989-01-01

    Year to year variations in the cumulative distributions of rain rate or rain attenuation are evident in any of the published measurements for a single propagation path that span a period of several years of observation. These variations must be described by models for the prediction of rain attenuation statistics. Now that a large measurement data base has been assembled by the International Radio Consultative Committee, the information needed to assess variability is available. On the basis of 252 sample cumulative distribution functions for the occurrence of attenuation by rain, the expected year to year variation in attenuation at a fixed probability level in the 0.1 to 0.001 percent of a year range is estimated to be 27 percent. The expected deviation from an attenuation model prediction for a single year of observations is estimated to exceed 33 percent when any of the available global rain climate model are employed to estimate the rain rate statistics. The probability distribution for the variation in attenuation or rain rate at a fixed fraction of a year is lognormal. The lognormal behavior of the variate was used to compile the statistics for variability.

  1. The study of combining Latin Hypercube Sampling method and LU decomposition method (LULHS method) for constructing spatial random field

    NASA Astrophysics Data System (ADS)

    WANG, P. T.

    2015-12-01

    Groundwater modeling requires to assign hydrogeological properties to every numerical grid. Due to the lack of detailed information and the inherent spatial heterogeneity, geological properties can be treated as random variables. Hydrogeological property is assumed to be a multivariate distribution with spatial correlations. By sampling random numbers from a given statistical distribution and assigning a value to each grid, a random field for modeling can be completed. Therefore, statistics sampling plays an important role in the efficiency of modeling procedure. Latin Hypercube Sampling (LHS) is a stratified random sampling procedure that provides an efficient way to sample variables from their multivariate distributions. This study combines the the stratified random procedure from LHS and the simulation by using LU decomposition to form LULHS. Both conditional and unconditional simulations of LULHS were develpoed. The simulation efficiency and spatial correlation of LULHS are compared to the other three different simulation methods. The results show that for the conditional simulation and unconditional simulation, LULHS method is more efficient in terms of computational effort. Less realizations are required to achieve the required statistical accuracy and spatial correlation.

  2. The possibilities of using scale-selective polarization cartography in diagnostics of myocardium pathologies

    NASA Astrophysics Data System (ADS)

    Ushenko, Yu. A.; Wanchuliak, O. Y.

    2013-06-01

    The optical model of polycrystalline networks of myocardium protein fibrils is presented. The technique of determining the coordinate distribution of polarization azimuth of the points of laser images of myocardium histological sections is suggested. The results of investigating the interrelation between the values of statistical (statistical moments of the 1st-4th order) parameters are presented which characterize distributions of wavelet-coefficients polarization maps of myocardium layers and death reasons.

  3. Simulating Univariate and Multivariate Burr Type IIII and Type XII Distributions through the Method of L-Moments

    ERIC Educational Resources Information Center

    Pant, Mohan Dev

    2011-01-01

    The Burr families (Type III and Type XII) of distributions are traditionally used in the context of statistical modeling and for simulating non-normal distributions with moment-based parameters (e.g., Skew and Kurtosis). In educational and psychological studies, the Burr families of distributions can be used to simulate extremely asymmetrical and…

  4. Time interval between successive trading in foreign currency market: from microscopic to macroscopic

    NASA Astrophysics Data System (ADS)

    Sato, Aki-Hiro

    2004-12-01

    Recently, it has been shown that inter-transaction interval (ITI) distribution of foreign currency rates has a fat tail. In order to understand the statistical property of the ITI dealer model with N interactive agents is proposed. From numerical simulations it is confirmed that the ITI distribution of the dealer model has a power law tail. The random multiplicative process (RMP) can be approximately derived from the ITI of the dealer model. Consequently, we conclude that the power law tail of the ITI distribution of the dealer model is a result of the RMP.

  5. Using independent component analysis for electrical impedance tomography

    NASA Astrophysics Data System (ADS)

    Yan, Peimin; Mo, Yulong

    2004-05-01

    Independent component analysis (ICA) is a way to resolve signals into independent components based on the statistical characteristics of the signals. It is a method for factoring probability densities of measured signals into a set of densities that are as statistically independent as possible under the assumptions of a linear model. Electrical impedance tomography (EIT) is used to detect variations of the electric conductivity of the human body. Because there are variations of the conductivity distributions inside the body, EIT presents multi-channel data. In order to get all information contained in different location of tissue it is necessary to image the individual conductivity distribution. In this paper we consider to apply ICA to EIT on the signal subspace (individual conductivity distribution). Using ICA the signal subspace will then be decomposed into statistically independent components. The individual conductivity distribution can be reconstructed by the sensitivity theorem in this paper. Compute simulations show that the full information contained in the multi-conductivity distribution will be obtained by this method.

  6. Numerical solutions of ideal quantum gas dynamical flows governed by semiclassical ellipsoidal-statistical distribution.

    PubMed

    Yang, Jaw-Yen; Yan, Chih-Yuan; Diaz, Manuel; Huang, Juan-Chen; Li, Zhihui; Zhang, Hanxin

    2014-01-08

    The ideal quantum gas dynamics as manifested by the semiclassical ellipsoidal-statistical (ES) equilibrium distribution derived in Wu et al. (Wu et al . 2012 Proc. R. Soc. A 468 , 1799-1823 (doi:10.1098/rspa.2011.0673)) is numerically studied for particles of three statistics. This anisotropic ES equilibrium distribution was derived using the maximum entropy principle and conserves the mass, momentum and energy, but differs from the standard Fermi-Dirac or Bose-Einstein distribution. The present numerical method combines the discrete velocity (or momentum) ordinate method in momentum space and the high-resolution shock-capturing method in physical space. A decoding procedure to obtain the necessary parameters for determining the ES distribution is also devised. Computations of two-dimensional Riemann problems are presented, and various contours of the quantities unique to this ES model are illustrated. The main flow features, such as shock waves, expansion waves and slip lines and their complex nonlinear interactions, are depicted and found to be consistent with existing calculations for a classical gas.

  7. Numerical details and SAS programs for parameter recovery of the SB distribution

    Treesearch

    Bernard R. Parresol; Teresa Fidalgo Fonseca; Carlos Pacheco Marques

    2010-01-01

    The four-parameter SB distribution has seen widespread use in growth-and-yield modeling because it covers a broad spectrum of shapes, fitting both positively and negatively skewed data and bimodal configurations. Two recent parameter recovery schemes, an approach whereby characteristics of a statistical distribution are equated with attributes of...

  8. A note on the misuses of the variance test in meteorological studies

    NASA Astrophysics Data System (ADS)

    Hazra, Arnab; Bhattacharya, Sourabh; Banik, Pabitra; Bhattacharya, Sabyasachi

    2017-12-01

    Stochastic modeling of rainfall data is an important area in meteorology. The gamma distribution is a widely used probability model for non-zero rainfall. Typically the choice of the distribution for such meteorological studies is based on two goodness-of-fit tests—the Pearson's Chi-square test and the Kolmogorov-Smirnov test. Inspired by the index of dispersion introduced by Fisher (Statistical methods for research workers. Hafner Publishing Company Inc., New York, 1925), Mooley (Mon Weather Rev 101:160-176, 1973) proposed the variance test as a goodness-of-fit measure in this context and a number of researchers have implemented it since then. We show that the asymptotic distribution of the test statistic for the variance test is generally not comparable to any central Chi-square distribution and hence the test is erroneous. We also describe a method for checking the validity of the asymptotic distribution for a class of distributions. We implement the erroneous test on some simulated, as well as real datasets and demonstrate how it leads to some wrong conclusions.

  9. Statistical analysis and modeling of intermittent transport events in the tokamak scrape-off layer

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Johan, E-mail: anderson.johan@gmail.com; Halpern, Federico D.; Ricci, Paolo

    The turbulence observed in the scrape-off-layer of a tokamak is often characterized by intermittent events of bursty nature, a feature which raises concerns about the prediction of heat loads on the physical boundaries of the device. It appears thus necessary to delve into the statistical properties of turbulent physical fields such as density, electrostatic potential, and temperature, focusing on the mathematical expression of tails of the probability distribution functions. The method followed here is to generate statistical information from time-traces of the plasma density stemming from Braginskii-type fluid simulations and check this against a first-principles theoretical model. The analysis ofmore » the numerical simulations indicates that the probability distribution function of the intermittent process contains strong exponential tails, as predicted by the analytical theory.« less

  10. Statistical ensembles for money and debt

    NASA Astrophysics Data System (ADS)

    Viaggiu, Stefano; Lionetto, Andrea; Bargigli, Leonardo; Longo, Michele

    2012-10-01

    We build a statistical ensemble representation of two economic models describing respectively, in simplified terms, a payment system and a credit market. To this purpose we adopt the Boltzmann-Gibbs distribution where the role of the Hamiltonian is taken by the total money supply (i.e. including money created from debt) of a set of interacting economic agents. As a result, we can read the main thermodynamic quantities in terms of monetary ones. In particular, we define for the credit market model a work term which is related to the impact of monetary policy on credit creation. Furthermore, with our formalism we recover and extend some results concerning the temperature of an economic system, previously presented in the literature by considering only the monetary base as a conserved quantity. Finally, we study the statistical ensemble for the Pareto distribution.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Marekova, Elisaveta

    Series of relatively large earthquakes in different regions of the Earth are studied. The regions chooses are of a high seismic activity and has a good contemporary network for recording of the seismic events along them. The main purpose of this investigation is the attempt to describe analytically the seismic process in the space and time. We are considering the statistical distributions the distances and the times between consecutive earthquakes (so called pair analysis). Studies conducted on approximating the statistical distribution of the parameters of consecutive seismic events indicate the existence of characteristic functions that describe them best. Such amore » mathematical description allows the distributions of the examined parameters to be compared to other model distributions.« less

  12. On the cause of the non-Gaussian distribution of residuals in geomagnetism

    NASA Astrophysics Data System (ADS)

    Hulot, G.; Khokhlov, A.

    2017-12-01

    To describe errors in the data, Gaussian distributions naturally come to mind. In many practical instances, indeed, Gaussian distributions are appropriate. In the broad field of geomagnetism, however, it has repeatedly been noted that residuals between data and models often display much sharper distributions, sometimes better described by a Laplace distribution. In the present study, we make the case that such non-Gaussian behaviors are very likely the result of what is known as mixture of distributions in the statistical literature. Mixtures arise as soon as the data do not follow a common distribution or are not properly normalized, the resulting global distribution being a mix of the various distributions followed by subsets of the data, or even individual datum. We provide examples of the way such mixtures can lead to distributions that are much sharper than Gaussian distributions and discuss the reasons why such mixtures are likely the cause of the non-Gaussian distributions observed in geomagnetism. We also show that when properly selecting sub-datasets based on geophysical criteria, statistical mixture can sometimes be avoided and much more Gaussian behaviors recovered. We conclude with some general recommendations and point out that although statistical mixture always tends to sharpen the resulting distribution, it does not necessarily lead to a Laplacian distribution. This needs to be taken into account when dealing with such non-Gaussian distributions.

  13. Response statistics of rotating shaft with non-linear elastic restoring forces by path integration

    NASA Astrophysics Data System (ADS)

    Gaidai, Oleg; Naess, Arvid; Dimentberg, Michael

    2017-07-01

    Extreme statistics of random vibrations is studied for a Jeffcott rotor under uniaxial white noise excitation. Restoring force is modelled as elastic non-linear; comparison is done with linearized restoring force to see the force non-linearity effect on the response statistics. While for the linear model analytical solutions and stability conditions are available, it is not generally the case for non-linear system except for some special cases. The statistics of non-linear case is studied by applying path integration (PI) method, which is based on the Markov property of the coupled dynamic system. The Jeffcott rotor response statistics can be obtained by solving the Fokker-Planck (FP) equation of the 4D dynamic system. An efficient implementation of PI algorithm is applied, namely fast Fourier transform (FFT) is used to simulate dynamic system additive noise. The latter allows significantly reduce computational time, compared to the classical PI. Excitation is modelled as Gaussian white noise, however any kind distributed white noise can be implemented with the same PI technique. Also multidirectional Markov noise can be modelled with PI in the same way as unidirectional. PI is accelerated by using Monte Carlo (MC) estimated joint probability density function (PDF) as initial input. Symmetry of dynamic system was utilized to afford higher mesh resolution. Both internal (rotating) and external damping are included in mechanical model of the rotor. The main advantage of using PI rather than MC is that PI offers high accuracy in the probability distribution tail. The latter is of critical importance for e.g. extreme value statistics, system reliability, and first passage probability.

  14. Statistical procedures for evaluating daily and monthly hydrologic model predictions

    USGS Publications Warehouse

    Coffey, M.E.; Workman, S.R.; Taraba, J.L.; Fogle, A.W.

    2004-01-01

    The overall study objective was to evaluate the applicability of different qualitative and quantitative methods for comparing daily and monthly SWAT computer model hydrologic streamflow predictions to observed data, and to recommend statistical methods for use in future model evaluations. Statistical methods were tested using daily streamflows and monthly equivalent runoff depths. The statistical techniques included linear regression, Nash-Sutcliffe efficiency, nonparametric tests, t-test, objective functions, autocorrelation, and cross-correlation. None of the methods specifically applied to the non-normal distribution and dependence between data points for the daily predicted and observed data. Of the tested methods, median objective functions, sign test, autocorrelation, and cross-correlation were most applicable for the daily data. The robust coefficient of determination (CD*) and robust modeling efficiency (EF*) objective functions were the preferred methods for daily model results due to the ease of comparing these values with a fixed ideal reference value of one. Predicted and observed monthly totals were more normally distributed, and there was less dependence between individual monthly totals than was observed for the corresponding predicted and observed daily values. More statistical methods were available for comparing SWAT model-predicted and observed monthly totals. The 1995 monthly SWAT model predictions and observed data had a regression Rr2 of 0.70, a Nash-Sutcliffe efficiency of 0.41, and the t-test failed to reject the equal data means hypothesis. The Nash-Sutcliffe coefficient and the R r2 coefficient were the preferred methods for monthly results due to the ability to compare these coefficients to a set ideal value of one.

  15. Boosting Bayesian parameter inference of nonlinear stochastic differential equation models by Hamiltonian scale separation.

    PubMed

    Albert, Carlo; Ulzega, Simone; Stoop, Ruedi

    2016-04-01

    Parameter inference is a fundamental problem in data-driven modeling. Given observed data that is believed to be a realization of some parameterized model, the aim is to find parameter values that are able to explain the observed data. In many situations, the dominant sources of uncertainty must be included into the model for making reliable predictions. This naturally leads to stochastic models. Stochastic models render parameter inference much harder, as the aim then is to find a distribution of likely parameter values. In Bayesian statistics, which is a consistent framework for data-driven learning, this so-called posterior distribution can be used to make probabilistic predictions. We propose a novel, exact, and very efficient approach for generating posterior parameter distributions for stochastic differential equation models calibrated to measured time series. The algorithm is inspired by reinterpreting the posterior distribution as a statistical mechanics partition function of an object akin to a polymer, where the measurements are mapped on heavier beads compared to those of the simulated data. To arrive at distribution samples, we employ a Hamiltonian Monte Carlo approach combined with a multiple time-scale integration. A separation of time scales naturally arises if either the number of measurement points or the number of simulation points becomes large. Furthermore, at least for one-dimensional problems, we can decouple the harmonic modes between measurement points and solve the fastest part of their dynamics analytically. Our approach is applicable to a wide range of inference problems and is highly parallelizable.

  16. Human X-chromosome inactivation pattern distributions fit a model of genetically influenced choice better than models of completely random choice

    PubMed Central

    Renault, Nisa K E; Pritchett, Sonja M; Howell, Robin E; Greer, Wenda L; Sapienza, Carmen; Ørstavik, Karen Helene; Hamilton, David C

    2013-01-01

    In eutherian mammals, one X-chromosome in every XX somatic cell is transcriptionally silenced through the process of X-chromosome inactivation (XCI). Females are thus functional mosaics, where some cells express genes from the paternal X, and the others from the maternal X. The relative abundance of the two cell populations (X-inactivation pattern, XIP) can have significant medical implications for some females. In mice, the ‘choice' of which X to inactivate, maternal or paternal, in each cell of the early embryo is genetically influenced. In humans, the timing of XCI choice and whether choice occurs completely randomly or under a genetic influence is debated. Here, we explore these questions by analysing the distribution of XIPs in large populations of normal females. Models were generated to predict XIP distributions resulting from completely random or genetically influenced choice. Each model describes the discrete primary distribution at the onset of XCI, and the continuous secondary distribution accounting for changes to the XIP as a result of development and ageing. Statistical methods are used to compare models with empirical data from Danish and Utah populations. A rigorous data treatment strategy maximises information content and allows for unbiased use of unphased XIP data. The Anderson–Darling goodness-of-fit statistics and likelihood ratio tests indicate that a model of genetically influenced XCI choice better fits the empirical data than models of completely random choice. PMID:23652377

  17. Determination of errors in derived magnetic field directions in geosynchronous orbit: results from a statistical approach

    NASA Astrophysics Data System (ADS)

    Chen, Yue; Cunningham, Gregory; Henderson, Michael

    2016-09-01

    This study aims to statistically estimate the errors in local magnetic field directions that are derived from electron directional distributions measured by Los Alamos National Laboratory geosynchronous (LANL GEO) satellites. First, by comparing derived and measured magnetic field directions along the GEO orbit to those calculated from three selected empirical global magnetic field models (including a static Olson and Pfitzer 1977 quiet magnetic field model, a simple dynamic Tsyganenko 1989 model, and a sophisticated dynamic Tsyganenko 2001 storm model), it is shown that the errors in both derived and modeled directions are at least comparable. Second, using a newly developed proxy method as well as comparing results from empirical models, we are able to provide for the first time circumstantial evidence showing that derived magnetic field directions should statistically match the real magnetic directions better, with averaged errors < ˜ 2°, than those from the three empirical models with averaged errors > ˜ 5°. In addition, our results suggest that the errors in derived magnetic field directions do not depend much on magnetospheric activity, in contrast to the empirical field models. Finally, as applications of the above conclusions, we show examples of electron pitch angle distributions observed by LANL GEO and also take the derived magnetic field directions as the real ones so as to test the performance of empirical field models along the GEO orbits, with results suggesting dependence on solar cycles as well as satellite locations. This study demonstrates the validity and value of the method that infers local magnetic field directions from particle spin-resolved distributions.

  18. Determination of errors in derived magnetic field directions in geosynchronous orbit: results from a statistical approach

    DOE PAGES

    Chen, Yue; Cunningham, Gregory; Henderson, Michael

    2016-09-21

    Our study aims to statistically estimate the errors in local magnetic field directions that are derived from electron directional distributions measured by Los Alamos National Laboratory geosynchronous (LANL GEO) satellites. First, by comparing derived and measured magnetic field directions along the GEO orbit to those calculated from three selected empirical global magnetic field models (including a static Olson and Pfitzer 1977 quiet magnetic field model, a simple dynamic Tsyganenko 1989 model, and a sophisticated dynamic Tsyganenko 2001 storm model), it is shown that the errors in both derived and modeled directions are at least comparable. Furthermore, using a newly developedmore » proxy method as well as comparing results from empirical models, we are able to provide for the first time circumstantial evidence showing that derived magnetic field directions should statistically match the real magnetic directions better, with averaged errors < ~2°, than those from the three empirical models with averaged errors > ~5°. In addition, our results suggest that the errors in derived magnetic field directions do not depend much on magnetospheric activity, in contrast to the empirical field models. Finally, as applications of the above conclusions, we show examples of electron pitch angle distributions observed by LANL GEO and also take the derived magnetic field directions as the real ones so as to test the performance of empirical field models along the GEO orbits, with results suggesting dependence on solar cycles as well as satellite locations. Finally, this study demonstrates the validity and value of the method that infers local magnetic field directions from particle spin-resolved distributions.« less

  19. Determination of errors in derived magnetic field directions in geosynchronous orbit: results from a statistical approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Chen, Yue; Cunningham, Gregory; Henderson, Michael

    Our study aims to statistically estimate the errors in local magnetic field directions that are derived from electron directional distributions measured by Los Alamos National Laboratory geosynchronous (LANL GEO) satellites. First, by comparing derived and measured magnetic field directions along the GEO orbit to those calculated from three selected empirical global magnetic field models (including a static Olson and Pfitzer 1977 quiet magnetic field model, a simple dynamic Tsyganenko 1989 model, and a sophisticated dynamic Tsyganenko 2001 storm model), it is shown that the errors in both derived and modeled directions are at least comparable. Furthermore, using a newly developedmore » proxy method as well as comparing results from empirical models, we are able to provide for the first time circumstantial evidence showing that derived magnetic field directions should statistically match the real magnetic directions better, with averaged errors < ~2°, than those from the three empirical models with averaged errors > ~5°. In addition, our results suggest that the errors in derived magnetic field directions do not depend much on magnetospheric activity, in contrast to the empirical field models. Finally, as applications of the above conclusions, we show examples of electron pitch angle distributions observed by LANL GEO and also take the derived magnetic field directions as the real ones so as to test the performance of empirical field models along the GEO orbits, with results suggesting dependence on solar cycles as well as satellite locations. Finally, this study demonstrates the validity and value of the method that infers local magnetic field directions from particle spin-resolved distributions.« less

  20. Evolving Scale-Free Networks by Poisson Process: Modeling and Degree Distribution.

    PubMed

    Feng, Minyu; Qu, Hong; Yi, Zhang; Xie, Xiurui; Kurths, Jurgen

    2016-05-01

    Since the great mathematician Leonhard Euler initiated the study of graph theory, the network has been one of the most significant research subject in multidisciplinary. In recent years, the proposition of the small-world and scale-free properties of complex networks in statistical physics made the network science intriguing again for many researchers. One of the challenges of the network science is to propose rational models for complex networks. In this paper, in order to reveal the influence of the vertex generating mechanism of complex networks, we propose three novel models based on the homogeneous Poisson, nonhomogeneous Poisson and birth death process, respectively, which can be regarded as typical scale-free networks and utilized to simulate practical networks. The degree distribution and exponent are analyzed and explained in mathematics by different approaches. In the simulation, we display the modeling process, the degree distribution of empirical data by statistical methods, and reliability of proposed networks, results show our models follow the features of typical complex networks. Finally, some future challenges for complex systems are discussed.

  1. Distribution of Chinese names

    NASA Astrophysics Data System (ADS)

    Huang, Ding-wei

    2013-03-01

    We present a statistical model for the distribution of Chinese names. Both family names and given names are studied on the same basis. With naive expectation, the distribution of family names can be very different from that of given names. One is affected mostly by genealogy, while the other can be dominated by cultural effects. However, we find that both distributions can be well described by the same model. Various scaling behaviors can be understood as a result of stochastic processes. The exponents of different power-law distributions are controlled by a single parameter. We also comment on the significance of full-name repetition in Chinese population.

  2. Empirical research on complex networks modeling of combat SoS based on data from real war-game, Part I: Statistical characteristics

    NASA Astrophysics Data System (ADS)

    Chen, Lei; Kou, Yingxin; Li, Zhanwu; Xu, An; Wu, Cheng

    2018-01-01

    We build a complex networks model of combat System-of-Systems (SoS) based on empirical data from a real war-game, this model is a combination of command & control (C2) subnetwork, sensors subnetwork, influencers subnetwork and logistical support subnetwork, each subnetwork has idiographic components and statistical characteristics. The C2 subnetwork is the core of whole combat SoS, it has a hierarchical structure with no modularity, of which robustness is strong enough to maintain normal operation after any two nodes is destroyed; the sensors subnetwork and influencers subnetwork are like sense organ and limbs of whole combat SoS, they are both flat modular networks of which degree distribution obey GEV distribution and power-law distribution respectively. The communication network is the combination of all subnetworks, it is an assortative Small-World network with core-periphery structure, the Intelligence & Communication Stations/Command Center integrated with C2 nodes in the first three level act as the hub nodes in communication network, and all the fourth-level C2 nodes, sensors, influencers and logistical support nodes have communication capability, they act as the periphery nodes in communication network, its degree distribution obeys exponential distribution in the beginning, Gaussian distribution in the middle, and power-law distribution in the end, and its path length obeys GEV distribution. The betweenness centrality distribution, closeness centrality distribution and eigenvector centrality are also been analyzed to measure the vulnerability of nodes.

  3. WebDISCO: a web service for distributed cox model learning without patient-level data sharing.

    PubMed

    Lu, Chia-Lun; Wang, Shuang; Ji, Zhanglong; Wu, Yuan; Xiong, Li; Jiang, Xiaoqian; Ohno-Machado, Lucila

    2015-11-01

    The Cox proportional hazards model is a widely used method for analyzing survival data. To achieve sufficient statistical power in a survival analysis, it usually requires a large amount of data. Data sharing across institutions could be a potential workaround for providing this added power. The authors develop a web service for distributed Cox model learning (WebDISCO), which focuses on the proof-of-concept and algorithm development for federated survival analysis. The sensitive patient-level data can be processed locally and only the less-sensitive intermediate statistics are exchanged to build a global Cox model. Mathematical derivation shows that the proposed distributed algorithm is identical to the centralized Cox model. The authors evaluated the proposed framework at the University of California, San Diego (UCSD), Emory, and Duke. The experimental results show that both distributed and centralized models result in near-identical model coefficients with differences in the range [Formula: see text] to [Formula: see text]. The results confirm the mathematical derivation and show that the implementation of the distributed model can achieve the same results as the centralized implementation. The proposed method serves as a proof of concept, in which a publicly available dataset was used to evaluate the performance. The authors do not intend to suggest that this method can resolve policy and engineering issues related to the federated use of institutional data, but they should serve as evidence of the technical feasibility of the proposed approach.Conclusions WebDISCO (Web-based Distributed Cox Regression Model; https://webdisco.ucsd-dbmi.org:8443/cox/) provides a proof-of-concept web service that implements a distributed algorithm to conduct distributed survival analysis without sharing patient level data. © The Author 2015. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  4. On the statistical distribution in a deformed solid

    NASA Astrophysics Data System (ADS)

    Gorobei, N. N.; Luk'yanenko, A. S.

    2017-09-01

    A modification of the Gibbs distribution in a thermally insulated mechanically deformed solid, where its linear dimensions (shape parameters) are excluded from statistical averaging and included among the macroscopic parameters of state alongside with the temperature, is proposed. Formally, this modification is reduced to corresponding additional conditions when calculating the statistical sum. The shape parameters and the temperature themselves are found from the conditions of mechanical and thermal equilibria of a body, and their change is determined using the first law of thermodynamics. Known thermodynamic phenomena are analyzed for the simple model of a solid, i.e., an ensemble of anharmonic oscillators, within the proposed formalism with an accuracy of up to the first order by the anharmonicity constant. The distribution modification is considered for the classic and quantum temperature regions apart.

  5. Study of Automobile Market Dynamics : Volume 2. Analysis.

    DOT National Transportation Integrated Search

    1977-08-01

    Volume II describes the work in providing statistical inputs to a computer model by examining the effects of various options on the number of automobiles sold; the distribution of sales among small, medium and large cars; the distribution between aut...

  6. Simple Statistics: - Summarized!

    ERIC Educational Resources Information Center

    Blai, Boris, Jr.

    Statistics are an essential tool for making proper judgement decisions. It is concerned with probability distribution models, testing of hypotheses, significance tests and other means of determining the correctness of deductions and the most likely outcome of decisions. Measures of central tendency include the mean, median and mode. A second…

  7. Alternative Statistical Frameworks for Student Growth Percentile Estimation

    ERIC Educational Resources Information Center

    Lockwood, J. R.; Castellano, Katherine E.

    2015-01-01

    This article suggests two alternative statistical approaches for estimating student growth percentiles (SGP). The first is to estimate percentile ranks of current test scores conditional on past test scores directly, by modeling the conditional cumulative distribution functions, rather than indirectly through quantile regressions. This would…

  8. Reliable and More Powerful Methods for Power Analysis in Structural Equation Modeling

    ERIC Educational Resources Information Center

    Yuan, Ke-Hai; Zhang, Zhiyong; Zhao, Yanyun

    2017-01-01

    The normal-distribution-based likelihood ratio statistic T[subscript ml] = nF[subscript ml] is widely used for power analysis in structural Equation modeling (SEM). In such an analysis, power and sample size are computed by assuming that T[subscript ml] follows a central chi-square distribution under H[subscript 0] and a noncentral chi-square…

  9. A fast elitism Gaussian estimation of distribution algorithm and application for PID optimization.

    PubMed

    Xu, Qingyang; Zhang, Chengjin; Zhang, Li

    2014-01-01

    Estimation of distribution algorithm (EDA) is an intelligent optimization algorithm based on the probability statistics theory. A fast elitism Gaussian estimation of distribution algorithm (FEGEDA) is proposed in this paper. The Gaussian probability model is used to model the solution distribution. The parameters of Gaussian come from the statistical information of the best individuals by fast learning rule. A fast learning rule is used to enhance the efficiency of the algorithm, and an elitism strategy is used to maintain the convergent performance. The performances of the algorithm are examined based upon several benchmarks. In the simulations, a one-dimensional benchmark is used to visualize the optimization process and probability model learning process during the evolution, and several two-dimensional and higher dimensional benchmarks are used to testify the performance of FEGEDA. The experimental results indicate the capability of FEGEDA, especially in the higher dimensional problems, and the FEGEDA exhibits a better performance than some other algorithms and EDAs. Finally, FEGEDA is used in PID controller optimization of PMSM and compared with the classical-PID and GA.

  10. A Fast Elitism Gaussian Estimation of Distribution Algorithm and Application for PID Optimization

    PubMed Central

    Xu, Qingyang; Zhang, Chengjin; Zhang, Li

    2014-01-01

    Estimation of distribution algorithm (EDA) is an intelligent optimization algorithm based on the probability statistics theory. A fast elitism Gaussian estimation of distribution algorithm (FEGEDA) is proposed in this paper. The Gaussian probability model is used to model the solution distribution. The parameters of Gaussian come from the statistical information of the best individuals by fast learning rule. A fast learning rule is used to enhance the efficiency of the algorithm, and an elitism strategy is used to maintain the convergent performance. The performances of the algorithm are examined based upon several benchmarks. In the simulations, a one-dimensional benchmark is used to visualize the optimization process and probability model learning process during the evolution, and several two-dimensional and higher dimensional benchmarks are used to testify the performance of FEGEDA. The experimental results indicate the capability of FEGEDA, especially in the higher dimensional problems, and the FEGEDA exhibits a better performance than some other algorithms and EDAs. Finally, FEGEDA is used in PID controller optimization of PMSM and compared with the classical-PID and GA. PMID:24892059

  11. Robustness of fit indices to outliers and leverage observations in structural equation modeling.

    PubMed

    Yuan, Ke-Hai; Zhong, Xiaoling

    2013-06-01

    Normal-distribution-based maximum likelihood (NML) is the most widely used method in structural equation modeling (SEM), although practical data tend to be nonnormally distributed. The effect of nonnormally distributed data or data contamination on the normal-distribution-based likelihood ratio (LR) statistic is well understood due to many analytical and empirical studies. In SEM, fit indices are used as widely as the LR statistic. In addition to NML, robust procedures have been developed for more efficient and less biased parameter estimates with practical data. This article studies the effect of outliers and leverage observations on fit indices following NML and two robust methods. Analysis and empirical results indicate that good leverage observations following NML and one of the robust methods lead most fit indices to give more support to the substantive model. While outliers tend to make a good model superficially bad according to many fit indices following NML, they have little effect on those following the two robust procedures. Implications of the results to data analysis are discussed, and recommendations are provided regarding the use of estimation methods and interpretation of fit indices. (PsycINFO Database Record (c) 2013 APA, all rights reserved).

  12. Bayesian models for cost-effectiveness analysis in the presence of structural zero costs

    PubMed Central

    Baio, Gianluca

    2014-01-01

    Bayesian modelling for cost-effectiveness data has received much attention in both the health economics and the statistical literature, in recent years. Cost-effectiveness data are characterised by a relatively complex structure of relationships linking a suitable measure of clinical benefit (e.g. quality-adjusted life years) and the associated costs. Simplifying assumptions, such as (bivariate) normality of the underlying distributions, are usually not granted, particularly for the cost variable, which is characterised by markedly skewed distributions. In addition, individual-level data sets are often characterised by the presence of structural zeros in the cost variable. Hurdle models can be used to account for the presence of excess zeros in a distribution and have been applied in the context of cost data. We extend their application to cost-effectiveness data, defining a full Bayesian specification, which consists of a model for the individual probability of null costs, a marginal model for the costs and a conditional model for the measure of effectiveness (given the observed costs). We presented the model using a working example to describe its main features. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd. PMID:24343868

  13. Bayesian models for cost-effectiveness analysis in the presence of structural zero costs.

    PubMed

    Baio, Gianluca

    2014-05-20

    Bayesian modelling for cost-effectiveness data has received much attention in both the health economics and the statistical literature, in recent years. Cost-effectiveness data are characterised by a relatively complex structure of relationships linking a suitable measure of clinical benefit (e.g. quality-adjusted life years) and the associated costs. Simplifying assumptions, such as (bivariate) normality of the underlying distributions, are usually not granted, particularly for the cost variable, which is characterised by markedly skewed distributions. In addition, individual-level data sets are often characterised by the presence of structural zeros in the cost variable. Hurdle models can be used to account for the presence of excess zeros in a distribution and have been applied in the context of cost data. We extend their application to cost-effectiveness data, defining a full Bayesian specification, which consists of a model for the individual probability of null costs, a marginal model for the costs and a conditional model for the measure of effectiveness (given the observed costs). We presented the model using a working example to describe its main features. © 2013 The Authors. Statistics in Medicine published by John Wiley & Sons, Ltd.

  14. The role of presumed probability density functions in the simulation of nonpremixed turbulent combustion

    NASA Astrophysics Data System (ADS)

    Coclite, A.; Pascazio, G.; De Palma, P.; Cutrone, L.

    2016-07-01

    Flamelet-Progress-Variable (FPV) combustion models allow the evaluation of all thermochemical quantities in a reacting flow by computing only the mixture fraction Z and a progress variable C. When using such a method to predict turbulent combustion in conjunction with a turbulence model, a probability density function (PDF) is required to evaluate statistical averages (e. g., Favre averages) of chemical quantities. The choice of the PDF is a compromise between computational costs and accuracy level. The aim of this paper is to investigate the influence of the PDF choice and its modeling aspects to predict turbulent combustion. Three different models are considered: the standard one, based on the choice of a β-distribution for Z and a Dirac-distribution for C; a model employing a β-distribution for both Z and C; and the third model obtained using a β-distribution for Z and the statistically most likely distribution (SMLD) for C. The standard model, although widely used, does not take into account the interaction between turbulence and chemical kinetics as well as the dependence of the progress variable not only on its mean but also on its variance. The SMLD approach establishes a systematic framework to incorporate informations from an arbitrary number of moments, thus providing an improvement over conventionally employed presumed PDF closure models. The rational behind the choice of the three PDFs is described in some details and the prediction capability of the corresponding models is tested vs. well-known test cases, namely, the Sandia flames, and H2-air supersonic combustion.

  15. Long-term changes (1980-2003) in total ozone time series over Northern Hemisphere midlatitudes

    NASA Astrophysics Data System (ADS)

    Białek, Małgorzata

    2006-03-01

    Long-term changes in total ozone time series for Arosa, Belsk, Boulder and Sapporo stations are examined. For each station we analyze time series of the following statistical characteristics of the distribution of daily ozone data: seasonal mean, standard deviation, maximum and minimum of total daily ozone values for all seasons. The iterative statistical model is proposed to estimate trends and long-term changes in the statistical distribution of the daily total ozone data. The trends are calculated for the period 1980-2003. We observe lessening of negative trends in the seasonal means as compared to those calculated by WMO for 1980-2000. We discuss a possibility of a change of the distribution shape of ozone daily data using the Kolmogorov-Smirnov test and comparing trend values in the seasonal mean, standard deviation, maximum and minimum time series for the selected stations and seasons. The distribution shift toward lower values without a change in the distribution shape is suggested with the following exceptions: the spreading of the distribution toward lower values for Belsk during winter and no decisive result for Sapporo and Boulder in summer.

  16. Species Distribution Modelling of Aedes aegypti in two dengue-endemic regions of Pakistan.

    PubMed

    Fatima, Syeda Hira; Atif, Salman; Rasheed, Syed Basit; Zaidi, Farrah; Hussain, Ejaz

    2016-03-01

    Statistical tools are effectively used to determine the distribution of mosquitoes and to make ecological inferences about the vector-borne disease dynamics. In this study, we utilised species distribution models to understand spatial patterns of Aedes aegypti in two dengue-prevalent regions of Pakistan, Lahore and Swat. Species distribution models can potentially indicate the probability of suitability of Ae. aegypti once introduced to new regions like Swat, where invasion of this species is a recent phenomenon. The distribution of Ae. aegypti was determined by applying the MaxEnt algorithm on a set of potential environmental factors and species sample records. The ecological dependency of species on each environmental variable was analysed using response curves. We quantified the statistical performance of the models based on accuracy assessment and spatial predictions. Our results suggest that Ae. aegypti is widely distributed in Lahore. Human population density and urban infrastructure are primarily responsible for greater probability of mosquito occurrence in this region. In Swat, Ae. aegypti has clumped distribution, where urban patches provide refuge to the species in an otherwise hostile heterogeneous environment and road networks are assumed to have facilitated in passive-mediated dispersal of species. In Pakistan, Ae. aegypti is expanding its range northwards; this could be associated with rapid urbanisation, trade and travel. The main implication of this expansion is that more people are at risk of dengue fever in the northern highlands of Pakistan. © 2016 John Wiley & Sons Ltd.

  17. A statistical approach to the brittle fracture of a multi-phase solid

    NASA Technical Reports Server (NTRS)

    Liu, W. K.; Lua, Y. I.; Belytschko, T.

    1991-01-01

    A stochastic damage model is proposed to quantify the inherent statistical distribution of the fracture toughness of a brittle, multi-phase solid. The model, based on the macrocrack-microcrack interaction, incorporates uncertainties in locations and orientations of microcracks. Due to the high concentration of microcracks near the macro-tip, a higher order analysis based on traction boundary integral equations is formulated first for an arbitrary array of cracks. The effects of uncertainties in locations and orientations of microcracks at a macro-tip are analyzed quantitatively by using the boundary integral equations method in conjunction with the computer simulation of the random microcrack array. The short range interactions resulting from surrounding microcracks closet to the main crack tip are investigated. The effects of microcrack density parameter are also explored in the present study. The validity of the present model is demonstrated by comparing its statistical output with the Neville distribution function, which gives correct fits to sets of experimental data from multi-phase solids.

  18. Estimating Preferential Flow in Karstic Aquifers Using Statistical Mixed Models

    PubMed Central

    Anaya, Angel A.; Padilla, Ingrid; Macchiavelli, Raul; Vesper, Dorothy J.; Meeker, John D.; Alshawabkeh, Akram N.

    2013-01-01

    Karst aquifers are highly productive groundwater systems often associated with conduit flow. These systems can be highly vulnerable to contamination, resulting in a high potential for contaminant exposure to humans and ecosystems. This work develops statistical models to spatially characterize flow and transport patterns in karstified limestone and determines the effect of aquifer flow rates on these patterns. A laboratory-scale Geo-HydroBed model is used to simulate flow and transport processes in a karstic limestone unit. The model consists of stainless-steel tanks containing a karstified limestone block collected from a karst aquifer formation in northern Puerto Rico. Experimental work involves making a series of flow and tracer injections, while monitoring hydraulic and tracer response spatially and temporally. Statistical mixed models are applied to hydraulic data to determine likely pathways of preferential flow in the limestone units. The models indicate a highly heterogeneous system with dominant, flow-dependent preferential flow regions. Results indicate that regions of preferential flow tend to expand at higher groundwater flow rates, suggesting a greater volume of the system being flushed by flowing water at higher rates. Spatial and temporal distribution of tracer concentrations indicates the presence of conduit-like and diffuse flow transport in the system, supporting the notion of both combined transport mechanisms in the limestone unit. The temporal response of tracer concentrations at different locations in the model coincide with, and confirms the preferential flow distribution generated with the statistical mixed models used in the study. PMID:23802921

  19. A detailed heterogeneous agent model for a single asset financial market with trading via an order book.

    PubMed

    Mota Navarro, Roberto; Larralde, Hernán

    2017-01-01

    We present an agent based model of a single asset financial market that is capable of replicating most of the non-trivial statistical properties observed in real financial markets, generically referred to as stylized facts. In our model agents employ strategies inspired on those used in real markets, and a realistic trade mechanism based on a double auction order book. We study the role of the distinct types of trader on the return statistics: specifically, correlation properties (or lack thereof), volatility clustering, heavy tails, and the degree to which the distribution can be described by a log-normal. Further, by introducing the practice of "profit taking", our model is also capable of replicating the stylized fact related to an asymmetry in the distribution of losses and gains.

  20. A detailed heterogeneous agent model for a single asset financial market with trading via an order book

    PubMed Central

    2017-01-01

    We present an agent based model of a single asset financial market that is capable of replicating most of the non-trivial statistical properties observed in real financial markets, generically referred to as stylized facts. In our model agents employ strategies inspired on those used in real markets, and a realistic trade mechanism based on a double auction order book. We study the role of the distinct types of trader on the return statistics: specifically, correlation properties (or lack thereof), volatility clustering, heavy tails, and the degree to which the distribution can be described by a log-normal. Further, by introducing the practice of “profit taking”, our model is also capable of replicating the stylized fact related to an asymmetry in the distribution of losses and gains. PMID:28245251

  1. Statistical guides to estimating the number of undiscovered mineral deposits: an example with porphyry copper deposits

    USGS Publications Warehouse

    Singer, Donald A.; Menzie, W.D.; Cheng, Qiuming; Bonham-Carter, G. F.

    2005-01-01

    Estimating numbers of undiscovered mineral deposits is a fundamental part of assessing mineral resources. Some statistical tools can act as guides to low variance, unbiased estimates of the number of deposits. The primary guide is that the estimates must be consistent with the grade and tonnage models. Another statistical guide is the deposit density (i.e., the number of deposits per unit area of permissive rock in well-explored control areas). Preliminary estimates and confidence limits of the number of undiscovered deposits in a tract of given area may be calculated using linear regression and refined using frequency distributions with appropriate parameters. A Poisson distribution leads to estimates having lower relative variances than the regression estimates and implies a random distribution of deposits. Coefficients of variation are used to compare uncertainties of negative binomial, Poisson, or MARK3 empirical distributions that have the same expected number of deposits as the deposit density. Statistical guides presented here allow simple yet robust estimation of the number of undiscovered deposits in permissive terranes. 

  2. Vehicular headways on signalized intersections: theory, models, and reality

    NASA Astrophysics Data System (ADS)

    Krbálek, Milan; Šleis, Jiří

    2015-01-01

    We discuss statistical properties of vehicular headways measured on signalized crossroads. On the basis of mathematical approaches, we formulate theoretical and empirically inspired criteria for the acceptability of theoretical headway distributions. Sequentially, the multifarious families of statistical distributions (commonly used to fit real-road headway statistics) are confronted with these criteria, and with original empirical time clearances gauged among neighboring vehicles leaving signal-controlled crossroads after a green signal appears. Using three different numerical schemes, we demonstrate that an arrangement of vehicles on an intersection is a consequence of the general stochastic nature of queueing systems, rather than a consequence of traffic rules, driver estimation processes, or decision-making procedures.

  3. Performance of statistical models to predict mental health and substance abuse cost.

    PubMed

    Montez-Rath, Maria; Christiansen, Cindy L; Ettner, Susan L; Loveland, Susan; Rosen, Amy K

    2006-10-26

    Providers use risk-adjustment systems to help manage healthcare costs. Typically, ordinary least squares (OLS) models on either untransformed or log-transformed cost are used. We examine the predictive ability of several statistical models, demonstrate how model choice depends on the goal for the predictive model, and examine whether building models on samples of the data affects model choice. Our sample consisted of 525,620 Veterans Health Administration patients with mental health (MH) or substance abuse (SA) diagnoses who incurred costs during fiscal year 1999. We tested two models on a transformation of cost: a Log Normal model and a Square-root Normal model, and three generalized linear models on untransformed cost, defined by distributional assumption and link function: Normal with identity link (OLS); Gamma with log link; and Gamma with square-root link. Risk-adjusters included age, sex, and 12 MH/SA categories. To determine the best model among the entire dataset, predictive ability was evaluated using root mean square error (RMSE), mean absolute prediction error (MAPE), and predictive ratios of predicted to observed cost (PR) among deciles of predicted cost, by comparing point estimates and 95% bias-corrected bootstrap confidence intervals. To study the effect of analyzing a random sample of the population on model choice, we re-computed these statistics using random samples beginning with 5,000 patients and ending with the entire sample. The Square-root Normal model had the lowest estimates of the RMSE and MAPE, with bootstrap confidence intervals that were always lower than those for the other models. The Gamma with square-root link was best as measured by the PRs. The choice of best model could vary if smaller samples were used and the Gamma with square-root link model had convergence problems with small samples. Models with square-root transformation or link fit the data best. This function (whether used as transformation or as a link) seems to help deal with the high comorbidity of this population by introducing a form of interaction. The Gamma distribution helps with the long tail of the distribution. However, the Normal distribution is suitable if the correct transformation of the outcome is used.

  4. Bayesian approach to non-Gaussian field statistics for diffusive broadband terahertz pulses.

    PubMed

    Pearce, Jeremy; Jian, Zhongping; Mittleman, Daniel M

    2005-11-01

    We develop a closed-form expression for the probability distribution function for the field components of a diffusive broadband wave propagating through a random medium. We consider each spectral component to provide an individual observation of a random variable, the configurationally averaged spectral intensity. Since the intensity determines the variance of the field distribution at each frequency, this random variable serves as the Bayesian prior that determines the form of the non-Gaussian field statistics. This model agrees well with experimental results.

  5. Fission fragment angular distributions in the reactions {sup 16}O+{sup 188}Os and {sup 28}Si+{sup 176}Yb

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tripathi, R.; Sudarshan, K.; Sharma, S. K.

    2009-06-15

    Fission fragment angular distributions have been measured in the reactions {sup 16}O+{sup 188}Os and {sup 28}Si+{sup 176}Yb to investigate the contribution from noncompound nucleus fission. Parameters for statistical model calculations were fixed using fission cross section data in the {sup 16}O+{sup 188}Os reaction. Experimental anisotropies were in reasonable agreement with those calculated using the statistical saddle point model for both reactions. The present results are also consistent with those of mass distribution studies in the fission of {sup 202}Po, formed in the reactions with varying entrance channel mass asymmetry. However, the present studies do not show a large fusion hindrancemore » as reported in the pre-actinide region based on the measurement of evaporation residue cross section.« less

  6. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis.

    PubMed

    Lin, Johnny; Bentler, Peter M

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne's asymptotically distribution-free method and Satorra Bentler's mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler's statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby's study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic.

  7. Inverse Gaussian gamma distribution model for turbulence-induced fading in free-space optical communication.

    PubMed

    Cheng, Mingjian; Guo, Ya; Li, Jiangting; Zheng, Xiaotong; Guo, Lixin

    2018-04-20

    We introduce an alternative distribution to the gamma-gamma (GG) distribution, called inverse Gaussian gamma (IGG) distribution, which can efficiently describe moderate-to-strong irradiance fluctuations. The proposed stochastic model is based on a modulation process between small- and large-scale irradiance fluctuations, which are modeled by gamma and inverse Gaussian distributions, respectively. The model parameters of the IGG distribution are directly related to atmospheric parameters. The accuracy of the fit among the IGG, log-normal, and GG distributions with the experimental probability density functions in moderate-to-strong turbulence are compared, and results indicate that the newly proposed IGG model provides an excellent fit to the experimental data. As the receiving diameter is comparable with the atmospheric coherence radius, the proposed IGG model can reproduce the shape of the experimental data, whereas the GG and LN models fail to match the experimental data. The fundamental channel statistics of a free-space optical communication system are also investigated in an IGG-distributed turbulent atmosphere, and a closed-form expression for the outage probability of the system is derived with Meijer's G-function.

  8. Order statistics applied to the most massive and most distant galaxy clusters

    NASA Astrophysics Data System (ADS)

    Waizmann, J.-C.; Ettori, S.; Bartelmann, M.

    2013-06-01

    In this work, we present an analytic framework for calculating the individual and joint distributions of the nth most massive or nth highest redshift galaxy cluster for a given survey characteristic allowing us to formulate Λ cold dark matter (ΛCDM) exclusion criteria. We show that the cumulative distribution functions steepen with increasing order, giving them a higher constraining power with respect to the extreme value statistics. Additionally, we find that the order statistics in mass (being dominated by clusters at lower redshifts) is sensitive to the matter density and the normalization of the matter fluctuations, whereas the order statistics in redshift is particularly sensitive to the geometric evolution of the Universe. For a fixed cosmology, both order statistics are efficient probes of the functional shape of the mass function at the high-mass end. To allow a quick assessment of both order statistics, we provide fits as a function of the survey area that allow percentile estimation with an accuracy better than 2 per cent. Furthermore, we discuss the joint distributions in the two-dimensional case and find that for the combination of the largest and the second largest observation, it is most likely to find them to be realized with similar values with a broadly peaked distribution. When combining the largest observation with higher orders, it is more likely to find a larger gap between the observations and when combining higher orders in general, the joint probability density function peaks more strongly. Having introduced the theory, we apply the order statistical analysis to the Southpole Telescope (SPT) massive cluster sample and metacatalogue of X-ray detected clusters of galaxies catalogue and find that the 10 most massive clusters in the sample are consistent with ΛCDM and the Tinker mass function. For the order statistics in redshift, we find a discrepancy between the data and the theoretical distributions, which could in principle indicate a deviation from the standard cosmology. However, we attribute this deviation to the uncertainty in the modelling of the SPT survey selection function. In turn, by assuming the ΛCDM reference cosmology, order statistics can also be utilized for consistency checks of the completeness of the observed sample and of the modelling of the survey selection function.

  9. Probabilistic Mesomechanical Fatigue Model

    NASA Technical Reports Server (NTRS)

    Tryon, Robert G.

    1997-01-01

    A probabilistic mesomechanical fatigue life model is proposed to link the microstructural material heterogeneities to the statistical scatter in the macrostructural response. The macrostructure is modeled as an ensemble of microelements. Cracks nucleation within the microelements and grow from the microelements to final fracture. Variations of the microelement properties are defined using statistical parameters. A micromechanical slip band decohesion model is used to determine the crack nucleation life and size. A crack tip opening displacement model is used to determine the small crack growth life and size. Paris law is used to determine the long crack growth life. The models are combined in a Monte Carlo simulation to determine the statistical distribution of total fatigue life for the macrostructure. The modeled response is compared to trends in experimental observations from the literature.

  10. The Effect of Velocity Correlation on the Spatial Evolution of Breakthrough Curves in Heterogeneous Media

    NASA Astrophysics Data System (ADS)

    Massoudieh, A.; Dentz, M.; Le Borgne, T.

    2017-12-01

    In heterogeneous media, the velocity distribution and the spatial correlation structure of velocity for solute particles determine the breakthrough curves and how they evolve as one moves away from the solute source. The ability to predict such evolution can help relating the spatio-statistical hydraulic properties of the media to the transport behavior and travel time distributions. While commonly used non-local transport models such as anomalous dispersion and classical continuous time random walk (CTRW) can reproduce breakthrough curve successfully by adjusting the model parameter values, they lack the ability to relate model parameters to the spatio-statistical properties of the media. This in turns limits the transferability of these models. In the research to be presented, we express concentration or flux of solutes as a distribution over their velocity. We then derive an integrodifferential equation that governs the evolution of the particle distribution over velocity at given times and locations for a particle ensemble, based on a presumed velocity correlation structure and an ergodic cross-sectional velocity distribution. This way, the spatial evolution of breakthrough curves away from the source is predicted based on cross-sectional velocity distribution and the connectivity, which is expressed by the velocity transition probability density. The transition probability is specified via a copula function that can help construct a joint distribution with a given correlation and given marginal velocities. Using this approach, we analyze the breakthrough curves depending on the velocity distribution and correlation properties. The model shows how the solute transport behavior evolves from ballistic transport at small spatial scales to Fickian dispersion at large length scales relative to the velocity correlation length.

  11. Managing distribution changes in time series prediction

    NASA Astrophysics Data System (ADS)

    Matias, J. M.; Gonzalez-Manteiga, W.; Taboada, J.; Ordonez, C.

    2006-07-01

    When a problem is modeled statistically, a single distribution model is usually postulated that is assumed to be valid for the entire space. Nonetheless, this practice may be somewhat unrealistic in certain application areas, in which the conditions of the process that generates the data may change; as far as we are aware, however, no techniques have been developed to tackle this problem.This article proposes a technique for modeling and predicting this change in time series with a view to improving estimates and predictions. The technique is applied, among other models, to the hypernormal distribution recently proposed. When tested on real data from a range of stock market indices the technique produces better results that when a single distribution model is assumed to be valid for the entire period of time studied.Moreover, when a global model is postulated, it is highly recommended to select the hypernormal distribution parameter in the same likelihood maximization process.

  12. Distribution of lod scores in oligogenic linkage analysis.

    PubMed

    Williams, J T; North, K E; Martin, L J; Comuzzie, A G; Göring, H H; Blangero, J

    2001-01-01

    In variance component oligogenic linkage analysis it can happen that the residual additive genetic variance bounds to zero when estimating the effect of the ith quantitative trait locus. Using quantitative trait Q1 from the Genetic Analysis Workshop 12 simulated general population data, we compare the observed lod scores from oligogenic linkage analysis with the empirical lod score distribution under a null model of no linkage. We find that zero residual additive genetic variance in the null model alters the usual distribution of the likelihood-ratio statistic.

  13. Estimating migratory fish distribution from altitude and basin area: a case study in a large Neotropical river

    Treesearch

    Jose Ricardo Barradas; Lucas G. Silva; Bret C. Harvey; Nelson F. Fontoura

    2012-01-01

    1. The objective of this study was to identify longitudinal distribution patterns of large migratory fish species in the Uruguay River basin, southern Brazil, and construct statistical distribution models for Salminus brasiliensis, Prochilodus lineatus, Leporinus obtusidens and Pseudoplatystoma corruscans. 2. The sampling programme resulted in 202 interviews with old...

  14. On joint subtree distributions under two evolutionary models.

    PubMed

    Wu, Taoyang; Choi, Kwok Pui

    2016-04-01

    In population and evolutionary biology, hypotheses about micro-evolutionary and macro-evolutionary processes are commonly tested by comparing the shape indices of empirical evolutionary trees with those predicted by neutral models. A key ingredient in this approach is the ability to compute and quantify distributions of various tree shape indices under random models of interest. As a step to meet this challenge, in this paper we investigate the joint distribution of cherries and pitchforks (that is, subtrees with two and three leaves) under two widely used null models: the Yule-Harding-Kingman (YHK) model and the proportional to distinguishable arrangements (PDA) model. Based on two novel recursive formulae, we propose a dynamic approach to numerically compute the exact joint distribution (and hence the marginal distributions) for trees of any size. We also obtained insights into the statistical properties of trees generated under these two models, including a constant correlation between the cherry and the pitchfork distributions under the YHK model, and the log-concavity and unimodality of the cherry distributions under both models. In addition, we show that there exists a unique change point for the cherry distributions between these two models. Copyright © 2015 Elsevier Inc. All rights reserved.

  15. Comparative evaluation of statistical and mechanistic models of Escherichia coli at beaches in southern Lake Michigan

    USGS Publications Warehouse

    Safaie, Ammar; Wendzel, Aaron; Ge, Zhongfu; Nevers, Meredith; Whitman, Richard L.; Corsi, Steven R.; Phanikumar, Mantha S.

    2016-01-01

    Statistical and mechanistic models are popular tools for predicting the levels of indicator bacteria at recreational beaches. Researchers tend to use one class of model or the other, and it is difficult to generalize statements about their relative performance due to differences in how the models are developed, tested, and used. We describe a cooperative modeling approach for freshwater beaches impacted by point sources in which insights derived from mechanistic modeling were used to further improve the statistical models and vice versa. The statistical models provided a basis for assessing the mechanistic models which were further improved using probability distributions to generate high-resolution time series data at the source, long-term “tracer” transport modeling based on observed electrical conductivity, better assimilation of meteorological data, and the use of unstructured-grids to better resolve nearshore features. This approach resulted in improved models of comparable performance for both classes including a parsimonious statistical model suitable for real-time predictions based on an easily measurable environmental variable (turbidity). The modeling approach outlined here can be used at other sites impacted by point sources and has the potential to improve water quality predictions resulting in more accurate estimates of beach closures.

  16. Distribution analysis of airborne nicotine concentrations in hospitality facilities.

    PubMed

    Schorp, Matthias K; Leyden, Donald E

    2002-02-01

    A number of publications report statistical summaries for environmental tobacco smoke (ETS) concentrations. Despite compelling evidence for the data not being normally distributed, these publications typically report the arithmetic mean and standard deviation of the data, thereby losing important information related to the distribution of values contained in the original data. We were interested in the frequency distributions of reported nicotine concentrations in hospitality environments and subjected available data to distribution analyses. The distribution of experimental indoor airborne nicotine concentration data taken from hospitality facilities worldwide was fit to lognormal, Weibull, exponential, Pearson (Type V), logistic, and loglogistic distribution models. Comparison of goodness of fit (GOF) parameters and indications from the literature verified the selection of a lognormal distribution as the overall best model. When individual data were not reported in the literature, statistical summaries of results were used to model sets of lognormally distributed data that are intended to mimic the original data distribution. Grouping the data into various categories led to 31 frequency distributions that were further interpreted. The median values in nonsmoking environments are about half of the median values in smoking sections. When different continents are compared, Asian, European, and North American median values in restaurants are about a factor of three below levels encountered in other hospitality facilities. On a comparison of nicotine concentrations in North American smoking sections and nonsmoking sections, median values are about one-third of the European levels. The results obtained may be used to address issues related to exposure to ETS in the hospitality sector.

  17. FAST TRACK COMMUNICATION: Freezing and extreme-value statistics in a random energy model with logarithmically correlated potential

    NASA Astrophysics Data System (ADS)

    Fyodorov, Yan V.; Bouchaud, Jean-Philippe

    2008-09-01

    We investigate some implications of the freezing scenario proposed by Carpentier and Le Doussal (CLD) for a random energy model (REM) with logarithmically correlated random potential. We introduce a particular (circular) variant of the model, and show that the integer moments of the partition function in the high-temperature phase are given by the well-known Dyson Coulomb gas integrals. The CLD freezing scenario allows one to use those moments for extracting the distribution of the free energy in both high- and low-temperature phases. In particular, it yields the full distribution of the minimal value in the potential sequence. This provides an explicit new class of extreme-value statistics for strongly correlated variables, manifestly different from the standard Gumbel class.

  18. Trans-dimensional inversion of microtremor array dispersion data with hierarchical autoregressive error models

    NASA Astrophysics Data System (ADS)

    Dettmer, Jan; Molnar, Sheri; Steininger, Gavin; Dosso, Stan E.; Cassidy, John F.

    2012-02-01

    This paper applies a general trans-dimensional Bayesian inference methodology and hierarchical autoregressive data-error models to the inversion of microtremor array dispersion data for shear wave velocity (vs) structure. This approach accounts for the limited knowledge of the optimal earth model parametrization (e.g. the number of layers in the vs profile) and of the data-error statistics in the resulting vs parameter uncertainty estimates. The assumed earth model parametrization influences estimates of parameter values and uncertainties due to different parametrizations leading to different ranges of data predictions. The support of the data for a particular model is often non-unique and several parametrizations may be supported. A trans-dimensional formulation accounts for this non-uniqueness by including a model-indexing parameter as an unknown so that groups of models (identified by the indexing parameter) are considered in the results. The earth model is parametrized in terms of a partition model with interfaces given over a depth-range of interest. In this work, the number of interfaces (layers) in the partition model represents the trans-dimensional model indexing. In addition, serial data-error correlations are addressed by augmenting the geophysical forward model with a hierarchical autoregressive error model that can account for a wide range of error processes with a small number of parameters. Hence, the limited knowledge about the true statistical distribution of data errors is also accounted for in the earth model parameter estimates, resulting in more realistic uncertainties and parameter values. Hierarchical autoregressive error models do not rely on point estimates of the model vector to estimate data-error statistics, and have no requirement for computing the inverse or determinant of a data-error covariance matrix. This approach is particularly useful for trans-dimensional inverse problems, as point estimates may not be representative of the state space that spans multiple subspaces of different dimensionalities. The order of the autoregressive process required to fit the data is determined here by posterior residual-sample examination and statistical tests. Inference for earth model parameters is carried out on the trans-dimensional posterior probability distribution by considering ensembles of parameter vectors. In particular, vs uncertainty estimates are obtained by marginalizing the trans-dimensional posterior distribution in terms of vs-profile marginal distributions. The methodology is applied to microtremor array dispersion data collected at two sites with significantly different geology in British Columbia, Canada. At both sites, results show excellent agreement with estimates from invasive measurements.

  19. Estimating procedure times for surgeries by determining location parameters for the lognormal model.

    PubMed

    Spangler, William E; Strum, David P; Vargas, Luis G; May, Jerrold H

    2004-05-01

    We present an empirical study of methods for estimating the location parameter of the lognormal distribution. Our results identify the best order statistic to use, and indicate that using the best order statistic instead of the median may lead to less frequent incorrect rejection of the lognormal model, more accurate critical value estimates, and higher goodness-of-fit. Using simulation data, we constructed and compared two models for identifying the best order statistic, one based on conventional nonlinear regression and the other using a data mining/machine learning technique. Better surgical procedure time estimates may lead to improved surgical operations.

  20. Statistical Models for the Analysis of Zero-Inflated Pain Intensity Numeric Rating Scale Data.

    PubMed

    Goulet, Joseph L; Buta, Eugenia; Bathulapalli, Harini; Gueorguieva, Ralitza; Brandt, Cynthia A

    2017-03-01

    Pain intensity is often measured in clinical and research settings using the 0 to 10 numeric rating scale (NRS). NRS scores are recorded as discrete values, and in some samples they may display a high proportion of zeroes and a right-skewed distribution. Despite this, statistical methods for normally distributed data are frequently used in the analysis of NRS data. We present results from an observational cross-sectional study examining the association of NRS scores with patient characteristics using data collected from a large cohort of 18,935 veterans in Department of Veterans Affairs care diagnosed with a potentially painful musculoskeletal disorder. The mean (variance) NRS pain was 3.0 (7.5), and 34% of patients reported no pain (NRS = 0). We compared the following statistical models for analyzing NRS scores: linear regression, generalized linear models (Poisson and negative binomial), zero-inflated and hurdle models for data with an excess of zeroes, and a cumulative logit model for ordinal data. We examined model fit, interpretability of results, and whether conclusions about the predictor effects changed across models. In this study, models that accommodate zero inflation provided a better fit than the other models. These models should be considered for the analysis of NRS data with a large proportion of zeroes. We examined and analyzed pain data from a large cohort of veterans with musculoskeletal disorders. We found that many reported no current pain on the NRS on the diagnosis date. We present several alternative statistical methods for the analysis of pain intensity data with a large proportion of zeroes. Published by Elsevier Inc.

  1. Avalanches and generalized memory associativity in a network model for conscious and unconscious mental functioning

    NASA Astrophysics Data System (ADS)

    Siddiqui, Maheen; Wedemann, Roseli S.; Jensen, Henrik Jeldtoft

    2018-01-01

    We explore statistical characteristics of avalanches associated with the dynamics of a complex-network model, where two modules corresponding to sensorial and symbolic memories interact, representing unconscious and conscious mental processes. The model illustrates Freud's ideas regarding the neuroses and that consciousness is related with symbolic and linguistic memory activity in the brain. It incorporates the Stariolo-Tsallis generalization of the Boltzmann Machine in order to model memory retrieval and associativity. In the present work, we define and measure avalanche size distributions during memory retrieval, in order to gain insight regarding basic aspects of the functioning of these complex networks. The avalanche sizes defined for our model should be related to the time consumed and also to the size of the neuronal region which is activated, during memory retrieval. This allows the qualitative comparison of the behaviour of the distribution of cluster sizes, obtained during fMRI measurements of the propagation of signals in the brain, with the distribution of avalanche sizes obtained in our simulation experiments. This comparison corroborates the indication that the Nonextensive Statistical Mechanics formalism may indeed be more well suited to model the complex networks which constitute brain and mental structure.

  2. The Assessment of Climatological Impacts on Agricultural Production and Residential Energy Demand

    NASA Astrophysics Data System (ADS)

    Cooter, Ellen Jean

    The assessment of climatological impacts on selected economic activities is presented as a multi-step, inter -disciplinary problem. The assessment process which is addressed explicitly in this report focuses on (1) user identification, (2) direct impact model selection, (3) methodological development, (4) product development and (5) product communication. Two user groups of major economic importance were selected for study; agriculture and gas utilities. The broad agricultural sector is further defined as U.S.A. corn production. The general category of utilities is narrowed to Oklahoma residential gas heating demand. The CERES physiological growth model was selected as the process model for corn production. The statistical analysis for corn production suggests that (1) although this is a statistically complex model, it can yield useful impact information, (2) as a result of output distributional biases, traditional statistical techniques are not adequate analytical tools, (3) the model yield distribution as a whole is probably non-Gausian, particularly in the tails and (4) there appears to be identifiable weekly patterns of forecasted yields throughout the growing season. Agricultural quantities developed include point yield impact estimates and distributional characteristics, geographic corn weather distributions, return period estimates, decision making criteria (confidence limits) and time series of indices. These products were communicated in economic terms through the use of a Bayesian decision example and an econometric model. The NBSLD energy load model was selected to represent residential gas heating consumption. A cursory statistical analysis suggests relationships among weather variables across the Oklahoma study sites. No linear trend in "technology -free" modeled energy demand or input weather variables which would correspond to that contained in observed state -level residential energy use was detected. It is suggested that this trend is largely the result of non-weather factors such as population and home usage patterns rather than regional climate change. Year-to-year changes in modeled residential heating demand on the order of 10('6) Btu's per household were determined and later related to state -level components of the Oklahoma economy. Products developed include the definition of regional forecast areas, likelihood estimates of extreme seasonal conditions and an energy/climate index. This information is communicated in economic terms through an input/output model which is used to estimate changes in Gross State Product and Household income attributable to weather variability.

  3. Statistical, economic and other tools for assessing natural aggregate

    USGS Publications Warehouse

    Bliss, J.D.; Moyle, P.R.; Bolm, K.S.

    2003-01-01

    Quantitative aggregate resource assessment provides resource estimates useful for explorationists, land managers and those who make decisions about land allocation, which may have long-term implications concerning cost and the availability of aggregate resources. Aggregate assessment needs to be systematic and consistent, yet flexible enough to allow updating without invalidating other parts of the assessment. Evaluators need to use standard or consistent aggregate classification and statistic distributions or, in other words, models with geological, geotechnical and economic variables or interrelationships between these variables. These models can be used with subjective estimates, if needed, to estimate how much aggregate may be present in a region or country using distributions generated by Monte Carlo computer simulations.

  4. The effect of clulstering of galaxies on the statistics of gravitational lenses

    NASA Technical Reports Server (NTRS)

    Anderson, N.; Alcock, C.

    1986-01-01

    It is examined whether clustering of galaxies can significantly alter the statistical properties of gravitational lenses? Only models of clustering that resemble the observed distribution of galaxies in the properties of the two-point correlation function are considered. Monte-Carlo simulations of the imaging process are described. It is found that the effect of clustering is too small to be significant, unless the mass of the deflectors is so large that gravitational lenses become common occurrences. A special model is described which was concocted to optimize the effect of clustering on gravitational lensing but still resemble the observed distribution of galaxies; even this simulation did not satisfactorily produce large numbers of wide-angle lenses.

  5. Notes on power of normality tests of error terms in regression models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Střelec, Luboš

    2015-03-10

    Normality is one of the basic assumptions in applying statistical procedures. For example in linear regression most of the inferential procedures are based on the assumption of normality, i.e. the disturbance vector is assumed to be normally distributed. Failure to assess non-normality of the error terms may lead to incorrect results of usual statistical inference techniques such as t-test or F-test. Thus, error terms should be normally distributed in order to allow us to make exact inferences. As a consequence, normally distributed stochastic errors are necessary in order to make a not misleading inferences which explains a necessity and importancemore » of robust tests of normality. Therefore, the aim of this contribution is to discuss normality testing of error terms in regression models. In this contribution, we introduce the general RT class of robust tests for normality, and present and discuss the trade-off between power and robustness of selected classical and robust normality tests of error terms in regression models.« less

  6. A lognormal distribution of the lengths of terminal twigs on self-similar branches of elm trees.

    PubMed

    Koyama, Kohei; Yamamoto, Ken; Ushio, Masayuki

    2017-01-11

    Lognormal distributions and self-similarity are characteristics associated with a wide range of biological systems. The sequential breakage model has established a link between lognormal distributions and self-similarity and has been used to explain species abundance distributions. To date, however, there has been no similar evidence in studies of multicellular organismal forms. We tested the hypotheses that the distribution of the lengths of terminal stems of Japanese elm trees (Ulmus davidiana), the end products of a self-similar branching process, approaches a lognormal distribution. We measured the length of the stem segments of three elm branches and obtained the following results: (i) each occurrence of branching caused variations or errors in the lengths of the child stems relative to their parent stems; (ii) the branches showed statistical self-similarity; the observed error distributions were similar at all scales within each branch and (iii) the multiplicative effect of these errors generated variations of the lengths of terminal twigs that were well approximated by a lognormal distribution, although some statistically significant deviations from strict lognormality were observed for one branch. Our results provide the first empirical evidence that statistical self-similarity of an organismal form generates a lognormal distribution of organ sizes. © 2017 The Author(s).

  7. Numerically exact full counting statistics of the nonequilibrium Anderson impurity model

    NASA Astrophysics Data System (ADS)

    Ridley, Michael; Singh, Viveka N.; Gull, Emanuel; Cohen, Guy

    2018-03-01

    The time-dependent full counting statistics of charge transport through an interacting quantum junction is evaluated from its generating function, controllably computed with the inchworm Monte Carlo method. Exact noninteracting results are reproduced; then, we continue to explore the effect of electron-electron interactions on the time-dependent charge cumulants, first-passage time distributions, and n -electron transfer distributions. We observe a crossover in the noise from Coulomb blockade to Kondo-dominated physics as the temperature is decreased. In addition, we uncover long-tailed spin distributions in the Kondo regime and analyze queuing behavior caused by correlations between single-electron transfer events.

  8. Numerically exact full counting statistics of the nonequilibrium Anderson impurity model

    DOE PAGES

    Ridley, Michael; Singh, Viveka N.; Gull, Emanuel; ...

    2018-03-06

    The time-dependent full counting statistics of charge transport through an interacting quantum junction is evaluated from its generating function, controllably computed with the inchworm Monte Carlo method. Exact noninteracting results are reproduced; then, we continue to explore the effect of electron-electron interactions on the time-dependent charge cumulants, first-passage time distributions, and n-electron transfer distributions. We observe a crossover in the noise from Coulomb blockade to Kondo-dominated physics as the temperature is decreased. In addition, we uncover long-tailed spin distributions in the Kondo regime and analyze queuing behavior caused by correlations between single-electron transfer events

  9. Paretian Poisson Processes

    NASA Astrophysics Data System (ADS)

    Eliazar, Iddo; Klafter, Joseph

    2008-05-01

    Many random populations can be modeled as a countable set of points scattered randomly on the positive half-line. The points may represent magnitudes of earthquakes and tornados, masses of stars, market values of public companies, etc. In this article we explore a specific class of random such populations we coin ` Paretian Poisson processes'. This class is elemental in statistical physics—connecting together, in a deep and fundamental way, diverse issues including: the Poisson distribution of the Law of Small Numbers; Paretian tail statistics; the Fréchet distribution of Extreme Value Theory; the one-sided Lévy distribution of the Central Limit Theorem; scale-invariance, renormalization and fractality; resilience to random perturbations.

  10. Numerically exact full counting statistics of the nonequilibrium Anderson impurity model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ridley, Michael; Singh, Viveka N.; Gull, Emanuel

    The time-dependent full counting statistics of charge transport through an interacting quantum junction is evaluated from its generating function, controllably computed with the inchworm Monte Carlo method. Exact noninteracting results are reproduced; then, we continue to explore the effect of electron-electron interactions on the time-dependent charge cumulants, first-passage time distributions, and n-electron transfer distributions. We observe a crossover in the noise from Coulomb blockade to Kondo-dominated physics as the temperature is decreased. In addition, we uncover long-tailed spin distributions in the Kondo regime and analyze queuing behavior caused by correlations between single-electron transfer events

  11. Economic Statistical Design of Integrated X-bar-S Control Chart with Preventive Maintenance and General Failure Distribution

    PubMed Central

    Caballero Morales, Santiago Omar

    2013-01-01

    The application of Preventive Maintenance (PM) and Statistical Process Control (SPC) are important practices to achieve high product quality, small frequency of failures, and cost reduction in a production process. However there are some points that have not been explored in depth about its joint application. First, most SPC is performed with the X-bar control chart which does not fully consider the variability of the production process. Second, many studies of design of control charts consider just the economic aspect while statistical restrictions must be considered to achieve charts with low probabilities of false detection of failures. Third, the effect of PM on processes with different failure probability distributions has not been studied. Hence, this paper covers these points, presenting the Economic Statistical Design (ESD) of joint X-bar-S control charts with a cost model that integrates PM with general failure distribution. Experiments showed statistically significant reductions in costs when PM is performed on processes with high failure rates and reductions in the sampling frequency of units for testing under SPC. PMID:23527082

  12. Millisecond Microwave Spikes: Statistical Study and Application for Plasma Diagnostics

    NASA Astrophysics Data System (ADS)

    Rozhansky, I. V.; Fleishman, G. D.; Huang, G.-L.

    2008-07-01

    We analyze a dense cluster of solar radio spikes registered at 4.5-6 GHz by the Purple Mountain Observatory spectrometer (Nanjing, China), operating in the 4.5-7.5 GHz range with 5 ms temporal resolution. To handle the data from the spectrometer, we developed a new technique that uses a nonlinear multi-Gaussian spectral fit based on χ2 criteria to extract individual spikes from the originally recorded spectra. Applying this method to the experimental raw data, we eventually identified about 3000 spikes for this event, which allows us to make a detailed statistical analysis. Various statistical characteristics of the spikes have been evaluated, including the intensity distributions, the spectral bandwidth distributions, and the distribution of the spike mean frequencies. The most striking finding of this analysis is the distributions of the spike bandwidth, which are remarkably asymmetric. To reveal the underlaying microphysics, we explore the local-trap model with the renormalized theory of spectral profiles of the electron cyclotron maser (ECM) emission peak in a source with random magnetic irregularities. The distribution of the solar spike relative bandwidths calculated within the local-trap model represents an excellent fit to the experimental data. Accordingly, the developed technique may offer a new tool with which to study very low levels of magnetic turbulence in the spike sources, when the ECM mechanism of the spike cluster is confirmed.

  13. Bayesian Parameter Inference and Model Selection by Population Annealing in Systems Biology

    PubMed Central

    Murakami, Yohei

    2014-01-01

    Parameter inference and model selection are very important for mathematical modeling in systems biology. Bayesian statistics can be used to conduct both parameter inference and model selection. Especially, the framework named approximate Bayesian computation is often used for parameter inference and model selection in systems biology. However, Monte Carlo methods needs to be used to compute Bayesian posterior distributions. In addition, the posterior distributions of parameters are sometimes almost uniform or very similar to their prior distributions. In such cases, it is difficult to choose one specific value of parameter with high credibility as the representative value of the distribution. To overcome the problems, we introduced one of the population Monte Carlo algorithms, population annealing. Although population annealing is usually used in statistical mechanics, we showed that population annealing can be used to compute Bayesian posterior distributions in the approximate Bayesian computation framework. To deal with un-identifiability of the representative values of parameters, we proposed to run the simulations with the parameter ensemble sampled from the posterior distribution, named “posterior parameter ensemble”. We showed that population annealing is an efficient and convenient algorithm to generate posterior parameter ensemble. We also showed that the simulations with the posterior parameter ensemble can, not only reproduce the data used for parameter inference, but also capture and predict the data which was not used for parameter inference. Lastly, we introduced the marginal likelihood in the approximate Bayesian computation framework for Bayesian model selection. We showed that population annealing enables us to compute the marginal likelihood in the approximate Bayesian computation framework and conduct model selection depending on the Bayes factor. PMID:25089832

  14. Mathematical and Statistical Software Index.

    DTIC Science & Technology

    1986-08-01

    geometric) mean HMEAN - harmonic mean MEDIAN - median MODE - mode QUANT - quantiles OGIVE - distribution curve IQRNG - interpercentile range RANGE ... range mutliphase pivoting algorithm cross-classification multiple discriminant analysis cross-tabul ation mul tipl e-objecti ve model curve fitting...Statistics). .. .. .... ...... ..... ...... ..... .. 21 *RANGEX (Correct Correlations for Curtailment of Range ). .. .. .... ...... ... 21 *RUMMAGE II (Analysis

  15. Modeling Statistical Insensitivity: Sources of Suboptimal Behavior

    ERIC Educational Resources Information Center

    Gagliardi, Annie; Feldman, Naomi H.; Lidz, Jeffrey

    2017-01-01

    Children acquiring languages with noun classes (grammatical gender) have ample statistical information available that characterizes the distribution of nouns into these classes, but their use of this information to classify novel nouns differs from the predictions made by an optimal Bayesian classifier. We use rational analysis to investigate the…

  16. General solution of the chemical master equation and modality of marginal distributions for hierarchic first-order reaction networks.

    PubMed

    Reis, Matthias; Kromer, Justus A; Klipp, Edda

    2018-01-20

    Multimodality is a phenomenon which complicates the analysis of statistical data based exclusively on mean and variance. Here, we present criteria for multimodality in hierarchic first-order reaction networks, consisting of catalytic and splitting reactions. Those networks are characterized by independent and dependent subnetworks. First, we prove the general solvability of the Chemical Master Equation (CME) for this type of reaction network and thereby extend the class of solvable CME's. Our general solution is analytical in the sense that it allows for a detailed analysis of its statistical properties. Given Poisson/deterministic initial conditions, we then prove the independent species to be Poisson/binomially distributed, while the dependent species exhibit generalized Poisson/Khatri Type B distributions. Generalized Poisson/Khatri Type B distributions are multimodal for an appropriate choice of parameters. We illustrate our criteria for multimodality by several basic models, as well as the well-known two-stage transcription-translation network and Bateman's model from nuclear physics. For both examples, multimodality was previously not reported.

  17. Performance Prediction of a Synchronization Link for Distributed Aerospace Wireless Systems

    PubMed Central

    Shao, Huaizong

    2013-01-01

    For reasons of stealth and other operational advantages, distributed aerospace wireless systems have received much attention in recent years. In a distributed aerospace wireless system, since the transmitter and receiver placed on separated platforms which use independent master oscillators, there is no cancellation of low-frequency phase noise as in the monostatic cases. Thus, high accurate time and frequency synchronization techniques are required for distributed wireless systems. The use of a dedicated synchronization link to quantify and compensate oscillator frequency instability is investigated in this paper. With the mathematical statistical models of phase noise, closed-form analytic expressions for the synchronization link performance are derived. The possible error contributions including oscillator, phase-locked loop, and receiver noise are quantified. The link synchronization performance is predicted by utilizing the knowledge of the statistical models, system error contributions, and sampling considerations. Simulation results show that effective synchronization error compensation can be achieved by using this dedicated synchronization link. PMID:23970828

  18. Toward statistical modeling of saccadic eye-movement and visual saliency.

    PubMed

    Sun, Xiaoshuai; Yao, Hongxun; Ji, Rongrong; Liu, Xian-Ming

    2014-11-01

    In this paper, we present a unified statistical framework for modeling both saccadic eye movements and visual saliency. By analyzing the statistical properties of human eye fixations on natural images, we found that human attention is sparsely distributed and usually deployed to locations with abundant structural information. This observations inspired us to model saccadic behavior and visual saliency based on super-Gaussian component (SGC) analysis. Our model sequentially obtains SGC using projection pursuit, and generates eye movements by selecting the location with maximum SGC response. Besides human saccadic behavior simulation, we also demonstrated our superior effectiveness and robustness over state-of-the-arts by carrying out dense experiments on synthetic patterns and human eye fixation benchmarks. Multiple key issues in saliency modeling research, such as individual differences, the effects of scale and blur, are explored in this paper. Based on extensive qualitative and quantitative experimental results, we show promising potentials of statistical approaches for human behavior research.

  19. A Geostatistical Scaling Approach for the Generation of Non Gaussian Random Variables and Increments

    NASA Astrophysics Data System (ADS)

    Guadagnini, Alberto; Neuman, Shlomo P.; Riva, Monica; Panzeri, Marco

    2016-04-01

    We address manifestations of non-Gaussian statistical scaling displayed by many variables, Y, and their (spatial or temporal) increments. Evidence of such behavior includes symmetry of increment distributions at all separation distances (or lags) with sharp peaks and heavy tails which tend to decay asymptotically as lag increases. Variables reported to exhibit such distributions include quantities of direct relevance to hydrogeological sciences, e.g. porosity, log permeability, electrical resistivity, soil and sediment texture, sediment transport rate, rainfall, measured and simulated turbulent fluid velocity, and other. No model known to us captures all of the documented statistical scaling behaviors in a unique and consistent manner. We recently proposed a generalized sub-Gaussian model (GSG) which reconciles within a unique theoretical framework the probability distributions of a target variable and its increments. We presented an algorithm to generate unconditional random realizations of statistically isotropic or anisotropic GSG functions and illustrated it in two dimensions. In this context, we demonstrated the feasibility of estimating all key parameters of a GSG model underlying a single realization of Y by analyzing jointly spatial moments of Y data and corresponding increments. Here, we extend our GSG model to account for noisy measurements of Y at a discrete set of points in space (or time), present an algorithm to generate conditional realizations of corresponding isotropic or anisotropic random field, and explore them on one- and two-dimensional synthetic test cases.

  20. Detailed Spectral Analysis of the 260 ks XMM-Newton Data of 1E 1207.4-5209 and Significance of a 2.1 keV Absorption Feature

    NASA Astrophysics Data System (ADS)

    Mori, Kaya; Chonko, James C.; Hailey, Charles J.

    2005-10-01

    We have reanalyzed the 260 ks XMM-Newton observation of 1E 1207.4-5209. There are several significant improvements over previous work. First, a much broader range of physically plausible spectral models was used. Second, we have used a more rigorous statistical analysis. The standard F-distribution was not employed, but rather the exact finite statistics F-distribution was determined by Monte Carlo simulations. This approach was motivated by the recent work of Protassov and coworkers and Freeman and coworkers. They demonstrated that the standard F-distribution is not even asymptotically correct when applied to assess the significance of additional absorption features in a spectrum. With our improved analysis we do not find a third and fourth spectral feature in 1E 1207.4-5209 but only the two broad absorption features previously reported. Two additional statistical tests, one line model dependent and the other line model independent, confirmed our modified F-test analysis. For all physically plausible continuum models in which the weak residuals are strong enough to fit, the residuals occur at the instrument Au M edge. As a sanity check we confirmed that the residuals are consistent in strength and position with the instrument Au M residuals observed in 3C 273.

  1. The beta Burr type X distribution properties with application.

    PubMed

    Merovci, Faton; Khaleel, Mundher Abdullah; Ibrahim, Noor Akma; Shitan, Mahendran

    2016-01-01

    We develop a new continuous distribution called the beta-Burr type X distribution that extends the Burr type X distribution. The properties provide a comprehensive mathematical treatment of this distribution. Further more, various structural properties of the new distribution are derived, that includes moment generating function and the rth moment thus generalizing some results in the literature. We also obtain expressions for the density, moment generating function and rth moment of the order statistics. We consider the maximum likelihood estimation to estimate the parameters. Additionally, the asymptotic confidence intervals for the parameters are derived from the Fisher information matrix. Finally, simulation study is carried at under varying sample size to assess the performance of this model. Illustration the real dataset indicates that this new distribution can serve as a good alternative model to model positive real data in many areas.

  2. Impacts of climate change on precipitation and discharge extremes through the use of statistical downscaling approaches in a Mediterranean basin.

    PubMed

    Piras, Monica; Mascaro, Giuseppe; Deidda, Roberto; Vivoni, Enrique R

    2016-02-01

    Mediterranean region is characterized by high precipitation variability often enhanced by orography, with strong seasonality and large inter-annual fluctuations, and by high heterogeneity of terrain and land surface properties. As a consequence, catchments in this area are often prone to the occurrence of hydrometeorological extremes, including storms, floods and flash-floods. A number of climate studies focused in the Mediterranean region predict that extreme events will occur with higher intensity and frequency, thus requiring further analyses to assess their effect at the land surface, particularly in small- and medium-sized watersheds. In this study, climate and hydrologic simulations produced within the Climate Induced Changes on the Hydrology of Mediterranean Basins (CLIMB) EU FP7 research project were used to analyze how precipitation extremes propagate into discharge extremes in the Rio Mannu basin (472.5km(2)), located in Sardinia, Italy. The basin hydrologic response to climate forcings in a reference (1971-2000) and a future (2041-2070) period was simulated through the combined use of a set of global and regional climate models, statistical downscaling techniques, and a process based distributed hydrologic model. We analyzed and compared the distribution of annual maxima extracted from hourly and daily precipitation and peak discharge time series, simulated by the hydrologic model under climate forcing. For this aim, yearly maxima were fit by the Generalized Extreme Value (GEV) distribution using a regional approach. Next, we discussed commonality and contrasting behaviors of precipitation and discharge maxima distributions to better understand how hydrological transformations impact propagation of extremes. Finally, we show how rainfall statistical downscaling algorithms produce more reliable forcings for hydrological models than coarse climate model outputs. Copyright © 2015 Elsevier B.V. All rights reserved.

  3. Model Checking Techniques for Assessing Functional Form Specifications in Censored Linear Regression Models.

    PubMed

    León, Larry F; Cai, Tianxi

    2012-04-01

    In this paper we develop model checking techniques for assessing functional form specifications of covariates in censored linear regression models. These procedures are based on a censored data analog to taking cumulative sums of "robust" residuals over the space of the covariate under investigation. These cumulative sums are formed by integrating certain Kaplan-Meier estimators and may be viewed as "robust" censored data analogs to the processes considered by Lin, Wei & Ying (2002). The null distributions of these stochastic processes can be approximated by the distributions of certain zero-mean Gaussian processes whose realizations can be generated by computer simulation. Each observed process can then be graphically compared with a few realizations from the Gaussian process. We also develop formal test statistics for numerical comparison. Such comparisons enable one to assess objectively whether an apparent trend seen in a residual plot reects model misspecification or natural variation. We illustrate the methods with a well known dataset. In addition, we examine the finite sample performance of the proposed test statistics in simulation experiments. In our simulation experiments, the proposed test statistics have good power of detecting misspecification while at the same time controlling the size of the test.

  4. Predicting future protection of respirator users: Statistical approaches and practical implications.

    PubMed

    Hu, Chengcheng; Harber, Philip; Su, Jing

    2016-01-01

    The purpose of this article is to describe a statistical approach for predicting a respirator user's fit factor in the future based upon results from initial tests. A statistical prediction model was developed based upon joint distribution of multiple fit factor measurements over time obtained from linear mixed effect models. The model accounts for within-subject correlation as well as short-term (within one day) and longer-term variability. As an example of applying this approach, model parameters were estimated from a research study in which volunteers were trained by three different modalities to use one of two types of respirators. They underwent two quantitative fit tests at the initial session and two on the same day approximately six months later. The fitted models demonstrated correlation and gave the estimated distribution of future fit test results conditional on past results for an individual worker. This approach can be applied to establishing a criterion value for passing an initial fit test to provide reasonable likelihood that a worker will be adequately protected in the future; and to optimizing the repeat fit factor test intervals individually for each user for cost-effective testing.

  5. [Establishment of the mathematic model of total quantum statistical moment standard similarity for application to medical theoretical research].

    PubMed

    He, Fu-yuan; Deng, Kai-wen; Huang, Sheng; Liu, Wen-long; Shi, Ji-lian

    2013-09-01

    The paper aims to elucidate and establish a new mathematic model: the total quantum statistical moment standard similarity (TQSMSS) on the base of the original total quantum statistical moment model and to illustrate the application of the model to medical theoretical research. The model was established combined with the statistical moment principle and the normal distribution probability density function properties, then validated and illustrated by the pharmacokinetics of three ingredients in Buyanghuanwu decoction and of three data analytical method for them, and by analysis of chromatographic fingerprint for various extracts with different solubility parameter solvents dissolving the Buyanghanwu-decoction extract. The established model consists of four mainly parameters: (1) total quantum statistical moment similarity as ST, an overlapped area by two normal distribution probability density curves in conversion of the two TQSM parameters; (2) total variability as DT, a confidence limit of standard normal accumulation probability which is equal to the absolute difference value between the two normal accumulation probabilities within integration of their curve nodical; (3) total variable probability as 1-Ss, standard normal distribution probability within interval of D(T); (4) total variable probability (1-beta)alpha and (5) stable confident probability beta(1-alpha): the correct probability to make positive and negative conclusions under confident coefficient alpha. With the model, we had analyzed the TQSMS similarities of pharmacokinetics of three ingredients in Buyanghuanwu decoction and of three data analytical methods for them were at range of 0.3852-0.9875 that illuminated different pharmacokinetic behaviors of each other; and the TQSMS similarities (ST) of chromatographic fingerprint for various extracts with different solubility parameter solvents dissolving Buyanghuanwu-decoction-extract were at range of 0.6842-0.999 2 that showed different constituents with various solvent extracts. The TQSMSS can characterize the sample similarity, by which we can quantitate the correct probability with the test of power under to make positive and negative conclusions no matter the samples come from same population under confident coefficient a or not, by which we can realize an analysis at both macroscopic and microcosmic levels, as an important similar analytical method for medical theoretical research.

  6. Competing risk models in reliability systems, an exponential distribution model with Bayesian analysis approach

    NASA Astrophysics Data System (ADS)

    Iskandar, I.

    2018-03-01

    The exponential distribution is the most widely used reliability analysis. This distribution is very suitable for representing the lengths of life of many cases and is available in a simple statistical form. The characteristic of this distribution is a constant hazard rate. The exponential distribution is the lower rank of the Weibull distributions. In this paper our effort is to introduce the basic notions that constitute an exponential competing risks model in reliability analysis using Bayesian analysis approach and presenting their analytic methods. The cases are limited to the models with independent causes of failure. A non-informative prior distribution is used in our analysis. This model describes the likelihood function and follows with the description of the posterior function and the estimations of the point, interval, hazard function, and reliability. The net probability of failure if only one specific risk is present, crude probability of failure due to a specific risk in the presence of other causes, and partial crude probabilities are also included.

  7. Rates of profit as correlated sums of random variables

    NASA Astrophysics Data System (ADS)

    Greenblatt, R. E.

    2013-10-01

    Profit realization is the dominant feature of market-based economic systems, determining their dynamics to a large extent. Rather than attaining an equilibrium, profit rates vary widely across firms, and the variation persists over time. Differing definitions of profit result in differing empirical distributions. To study the statistical properties of profit rates, I used data from a publicly available database for the US Economy for 2009-2010 (Risk Management Association). For each of three profit rate measures, the sample space consists of 771 points. Each point represents aggregate data from a small number of US manufacturing firms of similar size and type (NAICS code of principal product). When comparing the empirical distributions of profit rates, significant ‘heavy tails’ were observed, corresponding principally to a number of firms with larger profit rates than would be expected from simple models. An apparently novel correlated sum of random variables statistical model was used to model the data. In the case of operating and net profit rates, a number of firms show negative profits (losses), ruling out simple gamma or lognormal distributions as complete models for these data.

  8. Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment.

    PubMed

    Gierliński, Marek; Cole, Christian; Schofield, Pietà; Schurch, Nicholas J; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J

    2015-11-15

    High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of 'bad' replicates, which can drastically affect the gene read-count distribution. RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. g.j.barton@dundee.ac.uk. © The Author 2015. Published by Oxford University Press.

  9. Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment

    PubMed Central

    Cole, Christian; Schofield, Pietà; Schurch, Nicholas J.; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J.

    2015-01-01

    Motivation: High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. Results: A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of ‘bad’ replicates, which can drastically affect the gene read-count distribution. Availability and implementation: RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. Contact: g.j.barton@dundee.ac.uk PMID:26206307

  10. Stochastic modelling of non-stationary financial assets

    NASA Astrophysics Data System (ADS)

    Estevens, Joana; Rocha, Paulo; Boto, João P.; Lind, Pedro G.

    2017-11-01

    We model non-stationary volume-price distributions with a log-normal distribution and collect the time series of its two parameters. The time series of the two parameters are shown to be stationary and Markov-like and consequently can be modelled with Langevin equations, which are derived directly from their series of values. Having the evolution equations of the log-normal parameters, we reconstruct the statistics of the first moments of volume-price distributions which fit well the empirical data. Finally, the proposed framework is general enough to study other non-stationary stochastic variables in other research fields, namely, biology, medicine, and geology.

  11. Considering inventory distributions in a stochastic periodic inventory routing system

    NASA Astrophysics Data System (ADS)

    Yadollahi, Ehsan; Aghezzaf, El-Houssaine

    2017-07-01

    Dealing with the stochasticity of parameters is one of the critical issues in business and industry nowadays. Supply chain planners have difficulties in forecasting stochastic parameters of a distribution system. Demand rates of customers during their lead time are one of these parameters. In addition, holding a huge level of inventory at the retailers is costly and inefficient. To cover the uncertainty of forecasting demand rates, researchers have proposed the usage of safety stock to avoid stock-out. However, finding the precise level of safety stock depends on forecasting the statistical distribution of demand rates and their variations in different settings among the planning horizon. In this paper the demand rate distributions and its parameters are taken into account for each time period in a stochastic periodic IRP. An analysis of the achieved statistical distribution of the inventory and safety stock level is provided to measure the effects of input parameters on the output indicators. Different values for coefficient of variation are applied to the customers' demand rate in the optimization model. The outcome of the deterministic equivalent model of SPIRP is simulated in form of an illustrative case.

  12. Dependence of Microlensing on Source Size and Lens Mass

    NASA Astrophysics Data System (ADS)

    Congdon, A. B.; Keeton, C. R.

    2007-11-01

    In gravitational lensed quasars, the magnification of an image depends on the configuration of stars in the lensing galaxy. We study the statistics of the magnification distribution for random star fields. The width of the distribution characterizes the amount by which the observed magnification is likely to differ from models in which the mass is smoothly distributed. We use numerical simulations to explore how the width of the magnification distribution depends on the mass function of stars, and on the size of the source quasar. We then propose a semi-analytic model to describe the distribution width for different source sizes and stellar mass functions.

  13. Biomechanical influence of crown-to-implant ratio on stress distribution over internal hexagon short implant: 3-D finite element analysis with statistical test.

    PubMed

    Ramos Verri, Fellippo; Santiago Junior, Joel Ferreira; de Faria Almeida, Daniel Augusto; de Oliveira, Guilherme Bérgamo Brandão; de Souza Batista, Victor Eduardo; Marques Honório, Heitor; Noritomi, Pedro Yoshito; Pellizzer, Eduardo Piza

    2015-01-02

    The study of short implants is relevant to the biomechanics of dental implants, and research on crown increase has implications for the daily clinic. The aim of this study was to analyze the biomechanical interactions of a singular implant-supported prosthesis of different crown heights under vertical and oblique force, using the 3-D finite element method. Six 3-D models were designed with Invesalius 3.0, Rhinoceros 3D 4.0, and Solidworks 2010 software. Each model was constructed with a mandibular segment of bone block, including an implant supporting a screwed metal-ceramic crown. The crown height was set at 10, 12.5, and 15 mm. The applied force was 200 N (axial) and 100 N (oblique). We performed an ANOVA statistical test and Tukey tests; p<0.05 was considered statistically significant. The increase of crown height did not influence the stress distribution on screw prosthetic (p>0.05) under axial load. However, crown heights of 12.5 and 15 mm caused statistically significant damage to the stress distribution of screws and to the cortical bone (p<0.001) under oblique load. High crown to implant (C/I) ratio harmed microstrain distribution on bone tissue under axial and oblique loads (p<0.001). Crown increase was a possible deleterious factor to the screws and to the different regions of bone tissue. Copyright © 2014 Elsevier Ltd. All rights reserved.

  14. An improved approach for flight readiness certification: Probabilistic models for flaw propagation and turbine blade failure. Volume 1: Methodology and applications

    NASA Technical Reports Server (NTRS)

    Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.

    1992-01-01

    An improved methodology for quantitatively evaluating failure risk of spaceflight systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with analytical modeling of failure phenomena to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in analytical modeling, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which analytical models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. State-of-the-art analytical models currently employed for designs failure prediction, or performance analysis are used in this methodology. The rationale for the statistical approach taken in the PFA methodology is discussed, the PFA methodology is described, and examples of its application to structural failure modes are presented. The engineering models and computer software used in fatigue crack growth and fatigue crack initiation applications are thoroughly documented.

  15. An improved approach for flight readiness certification: Probabilistic models for flaw propagation and turbine blade failure. Volume 2: Software documentation

    NASA Technical Reports Server (NTRS)

    Moore, N. R.; Ebbeler, D. H.; Newlin, L. E.; Sutharshana, S.; Creager, M.

    1992-01-01

    An improved methodology for quantitatively evaluating failure risk of spaceflights systems to assess flight readiness and identify risk control measures is presented. This methodology, called Probabilistic Failure Assessment (PFA), combines operating experience from tests and flights with analytical modeling of failure phenomena to estimate failure risk. The PFA methodology is of particular value when information on which to base an assessment of failure risk, including test experience and knowledge of parameters used in analytical modeling, is expensive or difficult to acquire. The PFA methodology is a prescribed statistical structure in which analytical models that characterize failure phenomena are used conjointly with uncertainties about analysis parameters and/or modeling accuracy to estimate failure probability distributions for specific failure modes. These distributions can then be modified, by means of statistical procedures of the PFA methodology, to reflect any test or flight experience. State-of-the-art analytical models currently employed for design, failure prediction, or performance analysis are used in this methodology. The rationale for the statistical approach taken in the PFA methodology is discussed, the PFA methodology is described, and examples of its application to structural failure modes are presented. The engineering models and computer software used in fatigue crack growth and fatigue crack initiation applications are thoroughly documented.

  16. Optimizing Sensing: From Water to the Web

    DTIC Science & Technology

    2009-05-01

    e.g., when applied to busi - ness practices, the Pareto Principle says that “80% of your sales come from 20% of your clients.” In water distribution...of this challenge included a realistic model of a real metropolitan area water distribution network (Figure 4(a)) with 12,527 nodes, as well as a...we learn a probabilistic model P (Y,XV) from training data collected by [37] from 20 users. This model encodes the statistical dependencies between

  17. Review of Statistical Methods for Analysing Healthcare Resources and Costs

    PubMed Central

    Mihaylova, Borislava; Briggs, Andrew; O'Hagan, Anthony; Thompson, Simon G

    2011-01-01

    We review statistical methods for analysing healthcare resource use and costs, their ability to address skewness, excess zeros, multimodality and heavy right tails, and their ease for general use. We aim to provide guidance on analysing resource use and costs focusing on randomised trials, although methods often have wider applicability. Twelve broad categories of methods were identified: (I) methods based on the normal distribution, (II) methods following transformation of data, (III) single-distribution generalized linear models (GLMs), (IV) parametric models based on skewed distributions outside the GLM family, (V) models based on mixtures of parametric distributions, (VI) two (or multi)-part and Tobit models, (VII) survival methods, (VIII) non-parametric methods, (IX) methods based on truncation or trimming of data, (X) data components models, (XI) methods based on averaging across models, and (XII) Markov chain methods. Based on this review, our recommendations are that, first, simple methods are preferred in large samples where the near-normality of sample means is assured. Second, in somewhat smaller samples, relatively simple methods, able to deal with one or two of above data characteristics, may be preferable but checking sensitivity to assumptions is necessary. Finally, some more complex methods hold promise, but are relatively untried; their implementation requires substantial expertise and they are not currently recommended for wider applied work. Copyright © 2010 John Wiley & Sons, Ltd. PMID:20799344

  18. Statistical characteristics of trajectories of diamagnetic unicellular organisms in a magnetic field.

    PubMed

    Gorobets, Yu I; Gorobets, O Yu

    2015-01-01

    The statistical model is proposed in this paper for description of orientation of trajectories of unicellular diamagnetic organisms in a magnetic field. The statistical parameter such as the effective energy is calculated on basis of this model. The resulting effective energy is the statistical characteristics of trajectories of diamagnetic microorganisms in a magnetic field connected with their metabolism. The statistical model is applicable for the case when the energy of the thermal motion of bacteria is negligible in comparison with their energy in a magnetic field and the bacteria manifest the significant "active random movement", i.e. there is the randomizing motion of the bacteria of non thermal nature, for example, movement of bacteria by means of flagellum. The energy of the randomizing active self-motion of bacteria is characterized by the new statistical parameter for biological objects. The parameter replaces the energy of the randomizing thermal motion in calculation of the statistical distribution. Copyright © 2014 Elsevier Ltd. All rights reserved.

  19. A κ-generalized statistical mechanics approach to income analysis

    NASA Astrophysics Data System (ADS)

    Clementi, F.; Gallegati, M.; Kaniadakis, G.

    2009-02-01

    This paper proposes a statistical mechanics approach to the analysis of income distribution and inequality. A new distribution function, having its roots in the framework of κ-generalized statistics, is derived that is particularly suitable for describing the whole spectrum of incomes, from the low-middle income region up to the high income Pareto power-law regime. Analytical expressions for the shape, moments and some other basic statistical properties are given. Furthermore, several well-known econometric tools for measuring inequality, which all exist in a closed form, are considered. A method for parameter estimation is also discussed. The model is shown to fit remarkably well the data on personal income for the United States, and the analysis of inequality performed in terms of its parameters is revealed as very powerful.

  20. The forecasting of menstruation based on a state-space modeling of basal body temperature time series.

    PubMed

    Fukaya, Keiichi; Kawamori, Ai; Osada, Yutaka; Kitazawa, Masumi; Ishiguro, Makio

    2017-09-20

    Women's basal body temperature (BBT) shows a periodic pattern that associates with menstrual cycle. Although this fact suggests a possibility that daily BBT time series can be useful for estimating the underlying phase state as well as for predicting the length of current menstrual cycle, little attention has been paid to model BBT time series. In this study, we propose a state-space model that involves the menstrual phase as a latent state variable to explain the daily fluctuation of BBT and the menstruation cycle length. Conditional distributions of the phase are obtained by using sequential Bayesian filtering techniques. A predictive distribution of the next menstruation day can be derived based on this conditional distribution and the model, leading to a novel statistical framework that provides a sequentially updated prediction for upcoming menstruation day. We applied this framework to a real data set of women's BBT and menstruation days and compared prediction accuracy of the proposed method with that of previous methods, showing that the proposed method generally provides a better prediction. Because BBT can be obtained with relatively small cost and effort, the proposed method can be useful for women's health management. Potential extensions of this framework as the basis of modeling and predicting events that are associated with the menstrual cycles are discussed. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd. © 2017 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.

  1. Normality of raw data in general linear models: The most widespread myth in statistics

    USGS Publications Warehouse

    Kery, Marc; Hatfield, Jeff S.

    2003-01-01

    In years of statistical consulting for ecologists and wildlife biologists, by far the most common misconception we have come across has been the one about normality in general linear models. These comprise a very large part of the statistical models used in ecology and include t tests, simple and multiple linear regression, polynomial regression, and analysis of variance (ANOVA) and covariance (ANCOVA). There is a widely held belief that the normality assumption pertains to the raw data rather than to the model residuals. We suspect that this error may also occur in countless published studies, whenever the normality assumption is tested prior to analysis. This may lead to the use of nonparametric alternatives (if there are any), when parametric tests would indeed be appropriate, or to use of transformations of raw data, which may introduce hidden assumptions such as multiplicative effects on the natural scale in the case of log-transformed data. Our aim here is to dispel this myth. We very briefly describe relevant theory for two cases of general linear models to show that the residuals need to be normally distributed if tests requiring normality are to be used, such as t and F tests. We then give two examples demonstrating that the distribution of the response variable may be nonnormal, and yet the residuals are well behaved. We do not go into the issue of how to test normality; instead we display the distributions of response variables and residuals graphically.

  2. Uncertainty propagation for statistical impact prediction of space debris

    NASA Astrophysics Data System (ADS)

    Hoogendoorn, R.; Mooij, E.; Geul, J.

    2018-01-01

    Predictions of the impact time and location of space debris in a decaying trajectory are highly influenced by uncertainties. The traditional Monte Carlo (MC) method can be used to perform accurate statistical impact predictions, but requires a large computational effort. A method is investigated that directly propagates a Probability Density Function (PDF) in time, which has the potential to obtain more accurate results with less computational effort. The decaying trajectory of Delta-K rocket stages was used to test the methods using a six degrees-of-freedom state model. The PDF of the state of the body was propagated in time to obtain impact-time distributions. This Direct PDF Propagation (DPP) method results in a multi-dimensional scattered dataset of the PDF of the state, which is highly challenging to process. No accurate results could be obtained, because of the structure of the DPP data and the high dimensionality. Therefore, the DPP method is less suitable for practical uncontrolled entry problems and the traditional MC method remains superior. Additionally, the MC method was used with two improved uncertainty models to obtain impact-time distributions, which were validated using observations of true impacts. For one of the two uncertainty models, statistically more valid impact-time distributions were obtained than in previous research.

  3. Bayesian inference on earthquake size distribution: a case study in Italy

    NASA Astrophysics Data System (ADS)

    Licia, Faenza; Carlo, Meletti; Laura, Sandri

    2010-05-01

    This paper is focused on the study of earthquake size statistical distribution by using Bayesian inference. The strategy consists in the definition of an a priori distribution based on instrumental seismicity, and modeled as a power law distribution. By using the observed historical data, the power law is then modified in order to obtain the posterior distribution. The aim of this paper is to define the earthquake size distribution using all the seismic database available (i.e., instrumental and historical catalogs) and a robust statistical technique. We apply this methodology to the Italian seismicity, dividing the territory in source zones as done for the seismic hazard assessment, taken here as a reference model. The results suggest that each area has its own peculiar trend: while the power law is able to capture the mean aspect of the earthquake size distribution, the posterior emphasizes different slopes in different areas. Our results are in general agreement with the ones used in the seismic hazard assessment in Italy. However, there are areas in which a flattening in the curve is shown, meaning a significant departure from the power law behavior and implying that there are some local aspects that a power law distribution is not able to capture.

  4. Modeling error distributions of growth curve models through Bayesian methods.

    PubMed

    Zhang, Zhiyong

    2016-06-01

    Growth curve models are widely used in social and behavioral sciences. However, typical growth curve models often assume that the errors are normally distributed although non-normal data may be even more common than normal data. In order to avoid possible statistical inference problems in blindly assuming normality, a general Bayesian framework is proposed to flexibly model normal and non-normal data through the explicit specification of the error distributions. A simulation study shows when the distribution of the error is correctly specified, one can avoid the loss in the efficiency of standard error estimates. A real example on the analysis of mathematical ability growth data from the Early Childhood Longitudinal Study, Kindergarten Class of 1998-99 is used to show the application of the proposed methods. Instructions and code on how to conduct growth curve analysis with both normal and non-normal error distributions using the the MCMC procedure of SAS are provided.

  5. An ensemble Kalman filter for statistical estimation of physics constrained nonlinear regression models

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Harlim, John, E-mail: jharlim@psu.edu; Mahdi, Adam, E-mail: amahdi@ncsu.edu; Majda, Andrew J., E-mail: jonjon@cims.nyu.edu

    2014-01-15

    A central issue in contemporary science is the development of nonlinear data driven statistical–dynamical models for time series of noisy partial observations from nature or a complex model. It has been established recently that ad-hoc quadratic multi-level regression models can have finite-time blow-up of statistical solutions and/or pathological behavior of their invariant measure. Recently, a new class of physics constrained nonlinear regression models were developed to ameliorate this pathological behavior. Here a new finite ensemble Kalman filtering algorithm is developed for estimating the state, the linear and nonlinear model coefficients, the model and the observation noise covariances from available partialmore » noisy observations of the state. Several stringent tests and applications of the method are developed here. In the most complex application, the perfect model has 57 degrees of freedom involving a zonal (east–west) jet, two topographic Rossby waves, and 54 nonlinearly interacting Rossby waves; the perfect model has significant non-Gaussian statistics in the zonal jet with blocked and unblocked regimes and a non-Gaussian skewed distribution due to interaction with the other 56 modes. We only observe the zonal jet contaminated by noise and apply the ensemble filter algorithm for estimation. Numerically, we find that a three dimensional nonlinear stochastic model with one level of memory mimics the statistical effect of the other 56 modes on the zonal jet in an accurate fashion, including the skew non-Gaussian distribution and autocorrelation decay. On the other hand, a similar stochastic model with zero memory levels fails to capture the crucial non-Gaussian behavior of the zonal jet from the perfect 57-mode model.« less

  6. Application of the Conway-Maxwell-Poisson generalized linear model for analyzing motor vehicle crashes.

    PubMed

    Lord, Dominique; Guikema, Seth D; Geedipally, Srinivas Reddy

    2008-05-01

    This paper documents the application of the Conway-Maxwell-Poisson (COM-Poisson) generalized linear model (GLM) for modeling motor vehicle crashes. The COM-Poisson distribution, originally developed in 1962, has recently been re-introduced by statisticians for analyzing count data subjected to over- and under-dispersion. This innovative distribution is an extension of the Poisson distribution. The objectives of this study were to evaluate the application of the COM-Poisson GLM for analyzing motor vehicle crashes and compare the results with the traditional negative binomial (NB) model. The comparison analysis was carried out using the most common functional forms employed by transportation safety analysts, which link crashes to the entering flows at intersections or on segments. To accomplish the objectives of the study, several NB and COM-Poisson GLMs were developed and compared using two datasets. The first dataset contained crash data collected at signalized four-legged intersections in Toronto, Ont. The second dataset included data collected for rural four-lane divided and undivided highways in Texas. Several methods were used to assess the statistical fit and predictive performance of the models. The results of this study show that COM-Poisson GLMs perform as well as NB models in terms of GOF statistics and predictive performance. Given the fact the COM-Poisson distribution can also handle under-dispersed data (while the NB distribution cannot or has difficulties converging), which have sometimes been observed in crash databases, the COM-Poisson GLM offers a better alternative over the NB model for modeling motor vehicle crashes, especially given the important limitations recently documented in the safety literature about the latter type of model.

  7. Statistical parameters of random heterogeneity estimated by analysing coda waves based on finite difference method

    NASA Astrophysics Data System (ADS)

    Emoto, K.; Saito, T.; Shiomi, K.

    2017-12-01

    Short-period (<1 s) seismograms are strongly affected by small-scale (<10 km) heterogeneities in the lithosphere. In general, short-period seismograms are analysed based on the statistical method by considering the interaction between seismic waves and randomly distributed small-scale heterogeneities. Statistical properties of the random heterogeneities have been estimated by analysing short-period seismograms. However, generally, the small-scale random heterogeneity is not taken into account for the modelling of long-period (>2 s) seismograms. We found that the energy of the coda of long-period seismograms shows a spatially flat distribution. This phenomenon is well known in short-period seismograms and results from the scattering by small-scale heterogeneities. We estimate the statistical parameters that characterize the small-scale random heterogeneity by modelling the spatiotemporal energy distribution of long-period seismograms. We analyse three moderate-size earthquakes that occurred in southwest Japan. We calculate the spatial distribution of the energy density recorded by a dense seismograph network in Japan at the period bands of 8-16 s, 4-8 s and 2-4 s and model them by using 3-D finite difference (FD) simulations. Compared to conventional methods based on statistical theories, we can calculate more realistic synthetics by using the FD simulation. It is not necessary to assume a uniform background velocity, body or surface waves and scattering properties considered in general scattering theories. By taking the ratio of the energy of the coda area to that of the entire area, we can separately estimate the scattering and the intrinsic absorption effects. Our result reveals the spectrum of the random inhomogeneity in a wide wavenumber range including the intensity around the corner wavenumber as P(m) = 8πε2a3/(1 + a2m2)2, where ε = 0.05 and a = 3.1 km, even though past studies analysing higher-frequency records could not detect the corner. Finally, we estimate the intrinsic attenuation by modelling the decay rate of the energy. The method proposed in this study is suitable for quantifying the statistical properties of long-wavelength subsurface random inhomogeneity, which leads the way to characterizing a wider wavenumber range of spectra, including the corner wavenumber.

  8. Development and evaluation of statistical shape modeling for principal inner organs on torso CT images.

    PubMed

    Zhou, Xiangrong; Xu, Rui; Hara, Takeshi; Hirano, Yasushi; Yokoyama, Ryujiro; Kanematsu, Masayuki; Hoshi, Hiroaki; Kido, Shoji; Fujita, Hiroshi

    2014-07-01

    The shapes of the inner organs are important information for medical image analysis. Statistical shape modeling provides a way of quantifying and measuring shape variations of the inner organs in different patients. In this study, we developed a universal scheme that can be used for building the statistical shape models for different inner organs efficiently. This scheme combines the traditional point distribution modeling with a group-wise optimization method based on a measure called minimum description length to provide a practical means for 3D organ shape modeling. In experiments, the proposed scheme was applied to the building of five statistical shape models for hearts, livers, spleens, and right and left kidneys by use of 50 cases of 3D torso CT images. The performance of these models was evaluated by three measures: model compactness, model generalization, and model specificity. The experimental results showed that the constructed shape models have good "compactness" and satisfied the "generalization" performance for different organ shape representations; however, the "specificity" of these models should be improved in the future.

  9. Semantic Coherence Facilitates Distributional Learning.

    PubMed

    Ouyang, Long; Boroditsky, Lera; Frank, Michael C

    2017-04-01

    Computational models have shown that purely statistical knowledge about words' linguistic contexts is sufficient to learn many properties of words, including syntactic and semantic category. For example, models can infer that "postman" and "mailman" are semantically similar because they have quantitatively similar patterns of association with other words (e.g., they both tend to occur with words like "deliver," "truck," "package"). In contrast to these computational results, artificial language learning experiments suggest that distributional statistics alone do not facilitate learning of linguistic categories. However, experiments in this paradigm expose participants to entirely novel words, whereas real language learners encounter input that contains some known words that are semantically organized. In three experiments, we show that (a) the presence of familiar semantic reference points facilitates distributional learning and (b) this effect crucially depends both on the presence of known words and the adherence of these known words to some semantic organization. Copyright © 2016 Cognitive Science Society, Inc.

  10. Robust approaches to quantification of margin and uncertainty for sparse data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hund, Lauren; Schroeder, Benjamin B.; Rumsey, Kelin

    Characterizing the tails of probability distributions plays a key role in quantification of margins and uncertainties (QMU), where the goal is characterization of low probability, high consequence events based on continuous measures of performance. When data are collected using physical experimentation, probability distributions are typically fit using statistical methods based on the collected data, and these parametric distributional assumptions are often used to extrapolate about the extreme tail behavior of the underlying probability distribution. In this project, we character- ize the risk associated with such tail extrapolation. Specifically, we conducted a scaling study to demonstrate the large magnitude of themore » risk; then, we developed new methods for communicat- ing risk associated with tail extrapolation from unvalidated statistical models; lastly, we proposed a Bayesian data-integration framework to mitigate tail extrapolation risk through integrating ad- ditional information. We conclude that decision-making using QMU is a complex process that cannot be achieved using statistical analyses alone.« less

  11. Statistical Compression for Climate Model Output

    NASA Astrophysics Data System (ADS)

    Hammerling, D.; Guinness, J.; Soh, Y. J.

    2017-12-01

    Numerical climate model simulations run at high spatial and temporal resolutions generate massive quantities of data. As our computing capabilities continue to increase, storing all of the data is not sustainable, and thus is it important to develop methods for representing the full datasets by smaller compressed versions. We propose a statistical compression and decompression algorithm based on storing a set of summary statistics as well as a statistical model describing the conditional distribution of the full dataset given the summary statistics. We decompress the data by computing conditional expectations and conditional simulations from the model given the summary statistics. Conditional expectations represent our best estimate of the original data but are subject to oversmoothing in space and time. Conditional simulations introduce realistic small-scale noise so that the decompressed fields are neither too smooth nor too rough compared with the original data. Considerable attention is paid to accurately modeling the original dataset-one year of daily mean temperature data-particularly with regard to the inherent spatial nonstationarity in global fields, and to determining the statistics to be stored, so that the variation in the original data can be closely captured, while allowing for fast decompression and conditional emulation on modest computers.

  12. A Performance Comparison on the Probability Plot Correlation Coefficient Test using Several Plotting Positions for GEV Distribution.

    NASA Astrophysics Data System (ADS)

    Ahn, Hyunjun; Jung, Younghun; Om, Ju-Seong; Heo, Jun-Haeng

    2014-05-01

    It is very important to select the probability distribution in Statistical hydrology. Goodness of fit test is a statistical method that selects an appropriate probability model for a given data. The probability plot correlation coefficient (PPCC) test as one of the goodness of fit tests was originally developed for normal distribution. Since then, this test has been widely applied to other probability models. The PPCC test is known as one of the best goodness of fit test because it shows higher rejection powers among them. In this study, we focus on the PPCC tests for the GEV distribution which is widely used in the world. For the GEV model, several plotting position formulas are suggested. However, the PPCC statistics are derived only for the plotting position formulas (Goel and De, In-na and Nguyen, and Kim et al.) in which the skewness coefficient (or shape parameter) are included. And then the regression equations are derived as a function of the shape parameter and sample size for a given significance level. In addition, the rejection powers of these formulas are compared using Monte-Carlo simulation. Keywords: Goodness-of-fit test, Probability plot correlation coefficient test, Plotting position, Monte-Carlo Simulation ACKNOWLEDGEMENTS This research was supported by a grant 'Establishing Active Disaster Management System of Flood Control Structures by using 3D BIM Technique' [NEMA-12-NH-57] from the Natural Hazard Mitigation Research Group, National Emergency Management Agency of Korea.

  13. Reaction Event Counting Statistics of Biopolymer Reaction Systems with Dynamic Heterogeneity.

    PubMed

    Lim, Yu Rim; Park, Seong Jun; Park, Bo Jung; Cao, Jianshu; Silbey, Robert J; Sung, Jaeyoung

    2012-04-10

    We investigate the reaction event counting statistics (RECS) of an elementary biopolymer reaction in which the rate coefficient is dependent on states of the biopolymer and the surrounding environment and discover a universal kinetic phase transition in the RECS of the reaction system with dynamic heterogeneity. From an exact analysis for a general model of elementary biopolymer reactions, we find that the variance in the number of reaction events is dependent on the square of the mean number of the reaction events when the size of measurement time is small on the relaxation time scale of rate coefficient fluctuations, which does not conform to renewal statistics. On the other hand, when the size of the measurement time interval is much greater than the relaxation time of rate coefficient fluctuations, the variance becomes linearly proportional to the mean reaction number in accordance with renewal statistics. Gillespie's stochastic simulation method is generalized for the reaction system with a rate coefficient fluctuation. The simulation results confirm the correctness of the analytic results for the time dependent mean and variance of the reaction event number distribution. On the basis of the obtained results, we propose a method of quantitative analysis for the reaction event counting statistics of reaction systems with rate coefficient fluctuations, which enables one to extract information about the magnitude and the relaxation times of the fluctuating reaction rate coefficient, without a bias that can be introduced by assuming a particular kinetic model of conformational dynamics and the conformation dependent reactivity. An exact relationship is established between a higher moment of the reaction event number distribution and the multitime correlation of the reaction rate for the reaction system with a nonequilibrium initial state distribution as well as for the system with the equilibrium initial state distribution.

  14. Numerical solutions of ideal quantum gas dynamical flows governed by semiclassical ellipsoidal-statistical distribution

    PubMed Central

    Yang, Jaw-Yen; Yan, Chih-Yuan; Diaz, Manuel; Huang, Juan-Chen; Li, Zhihui; Zhang, Hanxin

    2014-01-01

    The ideal quantum gas dynamics as manifested by the semiclassical ellipsoidal-statistical (ES) equilibrium distribution derived in Wu et al. (Wu et al. 2012 Proc. R. Soc. A 468, 1799–1823 (doi:10.1098/rspa.2011.0673)) is numerically studied for particles of three statistics. This anisotropic ES equilibrium distribution was derived using the maximum entropy principle and conserves the mass, momentum and energy, but differs from the standard Fermi–Dirac or Bose–Einstein distribution. The present numerical method combines the discrete velocity (or momentum) ordinate method in momentum space and the high-resolution shock-capturing method in physical space. A decoding procedure to obtain the necessary parameters for determining the ES distribution is also devised. Computations of two-dimensional Riemann problems are presented, and various contours of the quantities unique to this ES model are illustrated. The main flow features, such as shock waves, expansion waves and slip lines and their complex nonlinear interactions, are depicted and found to be consistent with existing calculations for a classical gas. PMID:24399919

  15. ASYMPTOTIC DISTRIBUTION OF ΔAUC, NRIs, AND IDI BASED ON THEORY OF U-STATISTICS

    PubMed Central

    Demler, Olga V.; Pencina, Michael J.; Cook, Nancy R.; D’Agostino, Ralph B.

    2017-01-01

    The change in AUC (ΔAUC), the IDI, and NRI are commonly used measures of risk prediction model performance. Some authors have reported good validity of associated methods of estimating their standard errors (SE) and construction of confidence intervals, whereas others have questioned their performance. To address these issues we unite the ΔAUC, IDI, and three versions of the NRI under the umbrella of the U-statistics family. We rigorously show that the asymptotic behavior of ΔAUC, NRIs, and IDI fits the asymptotic distribution theory developed for U-statistics. We prove that the ΔAUC, NRIs, and IDI are asymptotically normal, unless they compare nested models under the null hypothesis. In the latter case, asymptotic normality and existing SE estimates cannot be applied to ΔAUC, NRIs, or IDI. In the former case SE formulas proposed in the literature are equivalent to SE formulas obtained from U-statistics theory if we ignore adjustment for estimated parameters. We use Sukhatme-Randles-deWet condition to determine when adjustment for estimated parameters is necessary. We show that adjustment is not necessary for SEs of the ΔAUC and two versions of the NRI when added predictor variables are significant and normally distributed. The SEs of the IDI and three-category NRI should always be adjusted for estimated parameters. These results allow us to define when existing formulas for SE estimates can be used and when resampling methods such as the bootstrap should be used instead when comparing nested models. We also use the U-statistic theory to develop a new SE estimate of ΔAUC. PMID:28627112

  16. Asymptotic distribution of ∆AUC, NRIs, and IDI based on theory of U-statistics.

    PubMed

    Demler, Olga V; Pencina, Michael J; Cook, Nancy R; D'Agostino, Ralph B

    2017-09-20

    The change in area under the curve (∆AUC), the integrated discrimination improvement (IDI), and net reclassification index (NRI) are commonly used measures of risk prediction model performance. Some authors have reported good validity of associated methods of estimating their standard errors (SE) and construction of confidence intervals, whereas others have questioned their performance. To address these issues, we unite the ∆AUC, IDI, and three versions of the NRI under the umbrella of the U-statistics family. We rigorously show that the asymptotic behavior of ∆AUC, NRIs, and IDI fits the asymptotic distribution theory developed for U-statistics. We prove that the ∆AUC, NRIs, and IDI are asymptotically normal, unless they compare nested models under the null hypothesis. In the latter case, asymptotic normality and existing SE estimates cannot be applied to ∆AUC, NRIs, or IDI. In the former case, SE formulas proposed in the literature are equivalent to SE formulas obtained from U-statistics theory if we ignore adjustment for estimated parameters. We use Sukhatme-Randles-deWet condition to determine when adjustment for estimated parameters is necessary. We show that adjustment is not necessary for SEs of the ∆AUC and two versions of the NRI when added predictor variables are significant and normally distributed. The SEs of the IDI and three-category NRI should always be adjusted for estimated parameters. These results allow us to define when existing formulas for SE estimates can be used and when resampling methods such as the bootstrap should be used instead when comparing nested models. We also use the U-statistic theory to develop a new SE estimate of ∆AUC. Copyright © 2017 John Wiley & Sons, Ltd. Copyright © 2017 John Wiley & Sons, Ltd.

  17. Climate Change and Runoff Statistics: a Process Study for the Rhine Basin using a coupled Climate-Runoff Model

    NASA Astrophysics Data System (ADS)

    Kleinn, J.; Frei, C.; Gurtz, J.; Vidale, P. L.; Schär, C.

    2003-04-01

    The consequences of extreme runoff and extreme water levels are within the most important weather induced natural hazards. The question about the impact of a global climate change on the runoff regime, especially on the frequency of floods, is of utmost importance. In winter-time, two possible climate effects could influence the runoff statistis of large Central European rivers: the shift from snowfall to rain as a consequence of higher temperatures and the increase of heavy precipitation events due to an intensification of the hydrological cycle. The combined effect on the runoff statistics is examined in this study for the river Rhine. To this end, sensitivity experiments with a model chain including a regional climate model and a distributed runoff model are presented. The experiments are based on an idealized surrogate climate change scenario which stipulates a uniform increase in temperature by 2 Kelvin and an increase in atmospheric specific humidity by 15% (resulting from unchanged relative humidity) in the forcing fields for the regional climate model. The regional climate model CHRM is based on the mesoscale weather prediction model HRM of the German Weather Service (DWD) and has been adapted for climate simulations. The model is being used in a nested mode with horizontal resolutions of 56 km and 14 km. The boundary conditions are taken from the original ECMWF reanalysis and from a modified version representing the surrogate scenario. The distributed runoff model (WaSiM) is used at a horizontal resolution of 1 km for the whole Rhine basin down to Cologne. The coupling of the models is provided by a downscaling of the climate model fields (precipitaion, temperature, radiation, humidity, and wind) to the resolution of the distributed runoff model. The simulations cover the period of September 1987 to January 1994 with a special emphasis on the five winter seasons 1989/90 until 1993/94, each from November until January. A detailed validation of the control simulation shows a good correspondence of the precipitation fields from the regional climate model with measured fields regarding the distribution of precipitation at the scale of the Rhine basin. Systematic errors are visible at the scale of single subcatchements, in the altitudinal distribution and in the frequency distribution of precipitation. These errors only marginally affect the runoff simulations, which show good correspondence with runoff observations. The presentation includes results from the scenario simulations for the whole basin as well as for Alpine and lowland subcatchements. The change in the runoff statistics is being analyzed with respect to the changes in snowfall and to the fequency distribution of precipitation.

  18. Non-parametric model selection for subject-specific topological organization of resting-state functional connectivity.

    PubMed

    Ferrarini, Luca; Veer, Ilya M; van Lew, Baldur; Oei, Nicole Y L; van Buchem, Mark A; Reiber, Johan H C; Rombouts, Serge A R B; Milles, J

    2011-06-01

    In recent years, graph theory has been successfully applied to study functional and anatomical connectivity networks in the human brain. Most of these networks have shown small-world topological characteristics: high efficiency in long distance communication between nodes, combined with highly interconnected local clusters of nodes. Moreover, functional studies performed at high resolutions have presented convincing evidence that resting-state functional connectivity networks exhibits (exponentially truncated) scale-free behavior. Such evidence, however, was mostly presented qualitatively, in terms of linear regressions of the degree distributions on log-log plots. Even when quantitative measures were given, these were usually limited to the r(2) correlation coefficient. However, the r(2) statistic is not an optimal estimator of explained variance, when dealing with (truncated) power-law models. Recent developments in statistics have introduced new non-parametric approaches, based on the Kolmogorov-Smirnov test, for the problem of model selection. In this work, we have built on this idea to statistically tackle the issue of model selection for the degree distribution of functional connectivity at rest. The analysis, performed at voxel level and in a subject-specific fashion, confirmed the superiority of a truncated power-law model, showing high consistency across subjects. Moreover, the most highly connected voxels were found to be consistently part of the default mode network. Our results provide statistically sound support to the evidence previously presented in literature for a truncated power-law model of resting-state functional connectivity. Copyright © 2010 Elsevier Inc. All rights reserved.

  19. Longitudinal analysis of the strengths and difficulties questionnaire scores of the Millennium Cohort Study children in England using M-quantile random-effects regression.

    PubMed

    Tzavidis, Nikos; Salvati, Nicola; Schmid, Timo; Flouri, Eirini; Midouhas, Emily

    2016-02-01

    Multilevel modelling is a popular approach for longitudinal data analysis. Statistical models conventionally target a parameter at the centre of a distribution. However, when the distribution of the data is asymmetric, modelling other location parameters, e.g. percentiles, may be more informative. We present a new approach, M -quantile random-effects regression, for modelling multilevel data. The proposed method is used for modelling location parameters of the distribution of the strengths and difficulties questionnaire scores of children in England who participate in the Millennium Cohort Study. Quantile mixed models are also considered. The analyses offer insights to child psychologists about the differential effects of risk factors on children's outcomes.

  20. Evaluation of statistical models for forecast errors from the HBV model

    NASA Astrophysics Data System (ADS)

    Engeland, Kolbjørn; Renard, Benjamin; Steinsland, Ingelin; Kolberg, Sjur

    2010-04-01

    SummaryThree statistical models for the forecast errors for inflow into the Langvatn reservoir in Northern Norway have been constructed and tested according to the agreement between (i) the forecast distribution and the observations and (ii) median values of the forecast distribution and the observations. For the first model observed and forecasted inflows were transformed by the Box-Cox transformation before a first order auto-regressive model was constructed for the forecast errors. The parameters were conditioned on weather classes. In the second model the Normal Quantile Transformation (NQT) was applied on observed and forecasted inflows before a similar first order auto-regressive model was constructed for the forecast errors. For the third model positive and negative errors were modeled separately. The errors were first NQT-transformed before conditioning the mean error values on climate, forecasted inflow and yesterday's error. To test the three models we applied three criterions: we wanted (a) the forecast distribution to be reliable; (b) the forecast intervals to be narrow; (c) the median values of the forecast distribution to be close to the observed values. Models 1 and 2 gave almost identical results. The median values improved the forecast with Nash-Sutcliffe R eff increasing from 0.77 for the original forecast to 0.87 for the corrected forecasts. Models 1 and 2 over-estimated the forecast intervals but gave the narrowest intervals. Their main drawback was that the distributions are less reliable than Model 3. For Model 3 the median values did not fit well since the auto-correlation was not accounted for. Since Model 3 did not benefit from the potential variance reduction that lies in bias estimation and removal it gave on average wider forecasts intervals than the two other models. At the same time Model 3 on average slightly under-estimated the forecast intervals, probably explained by the use of average measures to evaluate the fit.

  1. Comparison of climate envelope models developed using expert-selected variables versus statistical selection

    USGS Publications Warehouse

    Brandt, Laura A.; Benscoter, Allison; Harvey, Rebecca G.; Speroterra, Carolina; Bucklin, David N.; Romañach, Stephanie; Watling, James I.; Mazzotti, Frank J.

    2017-01-01

    Climate envelope models are widely used to describe potential future distribution of species under different climate change scenarios. It is broadly recognized that there are both strengths and limitations to using climate envelope models and that outcomes are sensitive to initial assumptions, inputs, and modeling methods Selection of predictor variables, a central step in modeling, is one of the areas where different techniques can yield varying results. Selection of climate variables to use as predictors is often done using statistical approaches that develop correlations between occurrences and climate data. These approaches have received criticism in that they rely on the statistical properties of the data rather than directly incorporating biological information about species responses to temperature and precipitation. We evaluated and compared models and prediction maps for 15 threatened or endangered species in Florida based on two variable selection techniques: expert opinion and a statistical method. We compared model performance between these two approaches for contemporary predictions, and the spatial correlation, spatial overlap and area predicted for contemporary and future climate predictions. In general, experts identified more variables as being important than the statistical method and there was low overlap in the variable sets (<40%) between the two methods Despite these differences in variable sets (expert versus statistical), models had high performance metrics (>0.9 for area under the curve (AUC) and >0.7 for true skill statistic (TSS). Spatial overlap, which compares the spatial configuration between maps constructed using the different variable selection techniques, was only moderate overall (about 60%), with a great deal of variability across species. Difference in spatial overlap was even greater under future climate projections, indicating additional divergence of model outputs from different variable selection techniques. Our work is in agreement with other studies which have found that for broad-scale species distribution modeling, using statistical methods of variable selection is a useful first step, especially when there is a need to model a large number of species or expert knowledge of the species is limited. Expert input can then be used to refine models that seem unrealistic or for species that experts believe are particularly sensitive to change. It also emphasizes the importance of using multiple models to reduce uncertainty and improve map outputs for conservation planning. Where outputs overlap or show the same direction of change there is greater certainty in the predictions. Areas of disagreement can be used for learning by asking why the models do not agree, and may highlight areas where additional on-the-ground data collection could improve the models.

  2. A methodology for the stochastic generation of hourly synthetic direct normal irradiation time series

    NASA Astrophysics Data System (ADS)

    Larrañeta, M.; Moreno-Tejera, S.; Lillo-Bravo, I.; Silva-Pérez, M. A.

    2018-02-01

    Many of the available solar radiation databases only provide global horizontal irradiance (GHI) while there is a growing need of extensive databases of direct normal radiation (DNI) mainly for the development of concentrated solar power and concentrated photovoltaic technologies. In the present work, we propose a methodology for the generation of synthetic DNI hourly data from the hourly average GHI values by dividing the irradiance into a deterministic and stochastic component intending to emulate the dynamics of the solar radiation. The deterministic component is modeled through a simple classical model. The stochastic component is fitted to measured data in order to maintain the consistency of the synthetic data with the state of the sky, generating statistically significant DNI data with a cumulative frequency distribution very similar to the measured data. The adaptation and application of the model to the location of Seville shows significant improvements in terms of frequency distribution over the classical models. The proposed methodology applied to other locations with different climatological characteristics better results than the classical models in terms of frequency distribution reaching a reduction of the 50% in the Finkelstein-Schafer (FS) and Kolmogorov-Smirnov test integral (KSI) statistics.

  3. A semi-nonparametric Poisson regression model for analyzing motor vehicle crash data.

    PubMed

    Ye, Xin; Wang, Ke; Zou, Yajie; Lord, Dominique

    2018-01-01

    This paper develops a semi-nonparametric Poisson regression model to analyze motor vehicle crash frequency data collected from rural multilane highway segments in California, US. Motor vehicle crash frequency on rural highway is a topic of interest in the area of transportation safety due to higher driving speeds and the resultant severity level. Unlike the traditional Negative Binomial (NB) model, the semi-nonparametric Poisson regression model can accommodate an unobserved heterogeneity following a highly flexible semi-nonparametric (SNP) distribution. Simulation experiments are conducted to demonstrate that the SNP distribution can well mimic a large family of distributions, including normal distributions, log-gamma distributions, bimodal and trimodal distributions. Empirical estimation results show that such flexibility offered by the SNP distribution can greatly improve model precision and the overall goodness-of-fit. The semi-nonparametric distribution can provide a better understanding of crash data structure through its ability to capture potential multimodality in the distribution of unobserved heterogeneity. When estimated coefficients in empirical models are compared, SNP and NB models are found to have a substantially different coefficient for the dummy variable indicating the lane width. The SNP model with better statistical performance suggests that the NB model overestimates the effect of lane width on crash frequency reduction by 83.1%.

  4. An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data.

    PubMed

    Jenkinson, Garrett; Abante, Jordi; Feinberg, Andrew P; Goutsias, John

    2018-03-07

    DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Statistical modeling and analysis is employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied on single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrate a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data. This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods.

  5. A climate-based multivariate extreme emulator of met-ocean-hydrological events for coastal flooding

    NASA Astrophysics Data System (ADS)

    Camus, Paula; Rueda, Ana; Mendez, Fernando J.; Tomas, Antonio; Del Jesus, Manuel; Losada, Iñigo J.

    2015-04-01

    Atmosphere-ocean general circulation models (AOGCMs) are useful to analyze large-scale climate variability (long-term historical periods, future climate projections). However, applications such as coastal flood modeling require climate information at finer scale. Besides, flooding events depend on multiple climate conditions: waves, surge levels from the open-ocean and river discharge caused by precipitation. Therefore, a multivariate statistical downscaling approach is adopted to reproduce relationships between variables and due to its low computational cost. The proposed method can be considered as a hybrid approach which combines a probabilistic weather type downscaling model with a stochastic weather generator component. Predictand distributions are reproduced modeling the relationship with AOGCM predictors based on a physical division in weather types (Camus et al., 2012). The multivariate dependence structure of the predictand (extreme events) is introduced linking the independent marginal distributions of the variables by a probabilistic copula regression (Ben Ayala et al., 2014). This hybrid approach is applied for the downscaling of AOGCM data to daily precipitation and maximum significant wave height and storm-surge in different locations along the Spanish coast. Reanalysis data is used to assess the proposed method. A commonly predictor for the three variables involved is classified using a regression-guided clustering algorithm. The most appropriate statistical model (general extreme value distribution, pareto distribution) for daily conditions is fitted. Stochastic simulation of the present climate is performed obtaining the set of hydraulic boundary conditions needed for high resolution coastal flood modeling. References: Camus, P., Menéndez, M., Méndez, F.J., Izaguirre, C., Espejo, A., Cánovas, V., Pérez, J., Rueda, A., Losada, I.J., Medina, R. (2014b). A weather-type statistical downscaling framework for ocean wave climate. Journal of Geophysical Research, doi: 10.1002/2014JC010141. Ben Ayala, M.A., Chebana, F., Ouarda, T.B.M.J. (2014). Probabilistic Gaussian Copula Regression Model for Multisite and Multivariable Downscaling, Journal of Climate, 27, 3331-3347.

  6. A Statistical Multimodel Ensemble Approach to Improving Long-Range Forecasting in Pakistan

    DTIC Science & Technology

    2012-03-01

    Impact of global warming on monsoon variability in Pakistan. J. Anim. Pl. Sci., 21, no. 1, 107–110. Gillies, S., T. Murphree, and D. Meyer, 2012...are generated by multiple regression models that relate globally distributed oceanic and atmospheric predictors to local predictands. The...generated by multiple regression models that relate globally distributed oceanic and atmospheric predictors to local predictands. The predictands are

  7. Computer program to minimize prediction error in models from experiments with 16 hypercube points and 0 to 6 center points

    NASA Technical Reports Server (NTRS)

    Holms, A. G.

    1982-01-01

    A previous report described a backward deletion procedure of model selection that was optimized for minimum prediction error and which used a multiparameter combination of the F - distribution and an order statistics distribution of Cochran's. A computer program is described that applies the previously optimized procedure to real data. The use of the program is illustrated by examples.

  8. Widening Disparity and its Suppression in a Stochastic Replicator Model

    NASA Astrophysics Data System (ADS)

    Sakaguchi, Hidetsugu

    2016-04-01

    Winner-take-all phenomena are observed in various competitive systems. We find similar phenomena in replicator models with randomly fluctuating growth rates. The disparity between winners and losers increases indefinitely, even if all elements are statistically equivalent. A lognormal distribution describes well the nonstationary time evolution. If a nonlinear load corresponding to progressive taxation is introduced, a stationary distribution is obtained and disparity widening is suppressed.

  9. A consistent framework for Horton regression statistics that leads to a modified Hack's law

    USGS Publications Warehouse

    Furey, P.R.; Troutman, B.M.

    2008-01-01

    A statistical framework is introduced that resolves important problems with the interpretation and use of traditional Horton regression statistics. The framework is based on a univariate regression model that leads to an alternative expression for Horton ratio, connects Horton regression statistics to distributional simple scaling, and improves the accuracy in estimating Horton plot parameters. The model is used to examine data for drainage area A and mainstream length L from two groups of basins located in different physiographic settings. Results show that confidence intervals for the Horton plot regression statistics are quite wide. Nonetheless, an analysis of covariance shows that regression intercepts, but not regression slopes, can be used to distinguish between basin groups. The univariate model is generalized to include n > 1 dependent variables. For the case where the dependent variables represent ln A and ln L, the generalized model performs somewhat better at distinguishing between basin groups than two separate univariate models. The generalized model leads to a modification of Hack's law where L depends on both A and Strahler order ??. Data show that ?? plays a statistically significant role in the modified Hack's law expression. ?? 2008 Elsevier B.V.

  10. A Third Moment Adjusted Test Statistic for Small Sample Factor Analysis

    PubMed Central

    Lin, Johnny; Bentler, Peter M.

    2012-01-01

    Goodness of fit testing in factor analysis is based on the assumption that the test statistic is asymptotically chi-square; but this property may not hold in small samples even when the factors and errors are normally distributed in the population. Robust methods such as Browne’s asymptotically distribution-free method and Satorra Bentler’s mean scaling statistic were developed under the presumption of non-normality in the factors and errors. This paper finds new application to the case where factors and errors are normally distributed in the population but the skewness of the obtained test statistic is still high due to sampling error in the observed indicators. An extension of Satorra Bentler’s statistic is proposed that not only scales the mean but also adjusts the degrees of freedom based on the skewness of the obtained test statistic in order to improve its robustness under small samples. A simple simulation study shows that this third moment adjusted statistic asymptotically performs on par with previously proposed methods, and at a very small sample size offers superior Type I error rates under a properly specified model. Data from Mardia, Kent and Bibby’s study of students tested for their ability in five content areas that were either open or closed book were used to illustrate the real-world performance of this statistic. PMID:23144511

  11. Nakagami-based total variation method for speckle reduction in thyroid ultrasound images.

    PubMed

    Koundal, Deepika; Gupta, Savita; Singh, Sukhwinder

    2016-02-01

    A good statistical model is necessary for the reduction in speckle noise. The Nakagami model is more general than the Rayleigh distribution for statistical modeling of speckle in ultrasound images. In this article, the Nakagami-based noise removal method is presented to enhance thyroid ultrasound images and to improve clinical diagnosis. The statistics of log-compressed image are derived from the Nakagami distribution following a maximum a posteriori estimation framework. The minimization problem is solved by optimizing an augmented Lagrange and Chambolle's projection method. The proposed method is evaluated on both artificial speckle-simulated and real ultrasound images. The experimental findings reveal the superiority of the proposed method both quantitatively and qualitatively in comparison with other speckle reduction methods reported in the literature. The proposed method yields an average signal-to-noise ratio gain of more than 2.16 dB over the non-convex regularizer-based speckle noise removal method, 3.83 dB over the Aubert-Aujol model, 1.71 dB over the Shi-Osher model and 3.21 dB over the Rudin-Lions-Osher model on speckle-simulated synthetic images. Furthermore, visual evaluation of the despeckled images shows that the proposed method suppresses speckle noise well while preserving the textures and fine details. © IMechE 2015.

  12. A Review of Multivariate Distributions for Count Data Derived from the Poisson Distribution.

    PubMed

    Inouye, David; Yang, Eunho; Allen, Genevera; Ravikumar, Pradeep

    2017-01-01

    The Poisson distribution has been widely studied and used for modeling univariate count-valued data. Multivariate generalizations of the Poisson distribution that permit dependencies, however, have been far less popular. Yet, real-world high-dimensional count-valued data found in word counts, genomics, and crime statistics, for example, exhibit rich dependencies, and motivate the need for multivariate distributions that can appropriately model this data. We review multivariate distributions derived from the univariate Poisson, categorizing these models into three main classes: 1) where the marginal distributions are Poisson, 2) where the joint distribution is a mixture of independent multivariate Poisson distributions, and 3) where the node-conditional distributions are derived from the Poisson. We discuss the development of multiple instances of these classes and compare the models in terms of interpretability and theory. Then, we empirically compare multiple models from each class on three real-world datasets that have varying data characteristics from different domains, namely traffic accident data, biological next generation sequencing data, and text data. These empirical experiments develop intuition about the comparative advantages and disadvantages of each class of multivariate distribution that was derived from the Poisson. Finally, we suggest new research directions as explored in the subsequent discussion section.

  13. An initial-abstraction, constant-loss model for unit hydrograph modeling for applicable watersheds in Texas

    USGS Publications Warehouse

    Asquith, William H.; Roussel, Meghan C.

    2007-01-01

    Estimation of representative hydrographs from design storms, which are known as design hydrographs, provides for cost-effective, riskmitigated design of drainage structures such as bridges, culverts, roadways, and other infrastructure. During 2001?07, the U.S. Geological Survey (USGS), in cooperation with the Texas Department of Transportation, investigated runoff hydrographs, design storms, unit hydrographs,and watershed-loss models to enhance design hydrograph estimation in Texas. Design hydrographs ideally should mimic the general volume, peak, and shape of observed runoff hydrographs. Design hydrographs commonly are estimated in part by unit hydrographs. A unit hydrograph is defined as the runoff hydrograph that results from a unit pulse of excess rainfall uniformly distributed over the watershed at a constant rate for a specific duration. A time-distributed, watershed-loss model is required for modeling by unit hydrographs. This report develops a specific time-distributed, watershed-loss model known as an initial-abstraction, constant-loss model. For this watershed-loss model, a watershed is conceptualized to have the capacity to store or abstract an absolute depth of rainfall at and near the beginning of a storm. Depths of total rainfall less than this initial abstraction do not produce runoff. The watershed also is conceptualized to have the capacity to remove rainfall at a constant rate (loss) after the initial abstraction is satisfied. Additional rainfall inputs after the initial abstraction is satisfied contribute to runoff if the rainfall rate (intensity) is larger than the constant loss. The initial abstraction, constant-loss model thus is a two-parameter model. The initial-abstraction, constant-loss model is investigated through detailed computational and statistical analysis of observed rainfall and runoff data for 92 USGS streamflow-gaging stations (watersheds) in Texas with contributing drainage areas from 0.26 to 166 square miles. The analysis is limited to a previously described, watershed-specific, gamma distribution model of the unit hydrograph. In particular, the initial-abstraction, constant-loss model is tuned to the gamma distribution model of the unit hydrograph. A complex computational analysis of observed rainfall and runoff for the 92 watersheds was done to determine, by storm, optimal values of initial abstraction and constant loss. Optimal parameter values for a given storm were defined as those values that produced a modeled runoff hydrograph with volume equal to the observed runoff hydrograph and also minimized the residual sum of squares of the two hydrographs. Subsequently, the means of the optimal parameters were computed on a watershed-specific basis. These means for each watershed are considered the most representative, are tabulated, and are used in further statistical analyses. Statistical analyses of watershed-specific, initial abstraction and constant loss include documentation of the distribution of each parameter using the generalized lambda distribution. The analyses show that watershed development has substantial influence on initial abstraction and limited influence on constant loss. The means and medians of the 92 watershed-specific parameters are tabulated with respect to watershed development; although they have considerable uncertainty, these parameters can be used for parameter prediction for ungaged watersheds. The statistical analyses of watershed-specific, initial abstraction and constant loss also include development of predictive procedures for estimation of each parameter for ungaged watersheds. Both regression equations and regression trees for estimation of initial abstraction and constant loss are provided. The watershed characteristics included in the regression analyses are (1) main-channel length, (2) a binary factor representing watershed development, (3) a binary factor representing watersheds with an abundance of rocky and thin-soiled terrain, and (4) curve numb

  14. The effect of noise-induced variance on parameter recovery from reaction times.

    PubMed

    Vadillo, Miguel A; Garaizar, Pablo

    2016-03-31

    Technical noise can compromise the precision and accuracy of the reaction times collected in psychological experiments, especially in the case of Internet-based studies. Although this noise seems to have only a small impact on traditional statistical analyses, its effects on model fit to reaction-time distributions remains unexplored. Across four simulations we study the impact of technical noise on parameter recovery from data generated from an ex-Gaussian distribution and from a Ratcliff Diffusion Model. Our results suggest that the impact of noise-induced variance tends to be limited to specific parameters and conditions. Although we encourage researchers to adopt all measures to reduce the impact of noise on reaction-time experiments, we conclude that the typical amount of noise-induced variance found in these experiments does not pose substantial problems for statistical analyses based on model fitting.

  15. Statistical interpretation of transient current power-law decay in colloidal quantum dot arrays

    NASA Astrophysics Data System (ADS)

    Sibatov, R. T.

    2011-08-01

    A new statistical model of the charge transport in colloidal quantum dot arrays is proposed. It takes into account Coulomb blockade forbidding multiple occupancy of nanocrystals and the influence of energetic disorder of interdot space. The model explains power-law current transients and the presence of the memory effect. The fractional differential analogue of the Ohm law is found phenomenologically for nanocrystal arrays. The model combines ideas that were considered as conflicting by other authors: the Scher-Montroll idea about the power-law distribution of waiting times in localized states for disordered semiconductors is applied taking into account Coulomb blockade; Novikov's condition about the asymptotic power-law distribution of time intervals between successful current pulses in conduction channels is fulfilled; and the carrier injection blocking predicted by Ginger and Greenham (2000 J. Appl. Phys. 87 1361) takes place.

  16. Development of a statistical model for cervical cancer cell death with irreversible electroporation in vitro.

    PubMed

    Yang, Yongji; Moser, Michael A J; Zhang, Edwin; Zhang, Wenjun; Zhang, Bing

    2018-01-01

    The aim of this study was to develop a statistical model for cell death by irreversible electroporation (IRE) and to show that the statistic model is more accurate than the electric field threshold model in the literature using cervical cancer cells in vitro. HeLa cell line was cultured and treated with different IRE protocols in order to obtain data for modeling the statistical relationship between the cell death and pulse-setting parameters. In total, 340 in vitro experiments were performed with a commercial IRE pulse system, including a pulse generator and an electric cuvette. Trypan blue staining technique was used to evaluate cell death after 4 hours of incubation following IRE treatment. Peleg-Fermi model was used in the study to build the statistical relationship using the cell viability data obtained from the in vitro experiments. A finite element model of IRE for the electric field distribution was also built. Comparison of ablation zones between the statistical model and electric threshold model (drawn from the finite element model) was used to show the accuracy of the proposed statistical model in the description of the ablation zone and its applicability in different pulse-setting parameters. The statistical models describing the relationships between HeLa cell death and pulse length and the number of pulses, respectively, were built. The values of the curve fitting parameters were obtained using the Peleg-Fermi model for the treatment of cervical cancer with IRE. The difference in the ablation zone between the statistical model and the electric threshold model was also illustrated to show the accuracy of the proposed statistical model in the representation of ablation zone in IRE. This study concluded that: (1) the proposed statistical model accurately described the ablation zone of IRE with cervical cancer cells, and was more accurate compared with the electric field model; (2) the proposed statistical model was able to estimate the value of electric field threshold for the computer simulation of IRE in the treatment of cervical cancer; and (3) the proposed statistical model was able to express the change in ablation zone with the change in pulse-setting parameters.

  17. Bayesian transformation cure frailty models with multivariate failure time data.

    PubMed

    Yin, Guosheng

    2008-12-10

    We propose a class of transformation cure frailty models to accommodate a survival fraction in multivariate failure time data. Established through a general power transformation, this family of cure frailty models includes the proportional hazards and the proportional odds modeling structures as two special cases. Within the Bayesian paradigm, we obtain the joint posterior distribution and the corresponding full conditional distributions of the model parameters for the implementation of Gibbs sampling. Model selection is based on the conditional predictive ordinate statistic and deviance information criterion. As an illustration, we apply the proposed method to a real data set from dentistry.

  18. Renewal models and coseismic stress transfer in the Corinth Gulf, Greece, fault system

    NASA Astrophysics Data System (ADS)

    Console, Rodolfo; Falcone, Giuseppe; Karakostas, Vassilis; Murru, Maura; Papadimitriou, Eleftheria; Rhoades, David

    2013-07-01

    model interevent times and Coulomb static stress transfer on the rupture segments along the Corinth Gulf extension zone, a region with a wealth of observations on strong-earthquake recurrence behavior. From the available information on past seismic activity, we have identified eight segments without significant overlapping that are aligned along the southern boundary of the Corinth rift. We aim to test if strong earthquakes on these segments are characterized by some kind of time-predictable behavior, rather than by complete randomness. The rationale for time-predictable behavior is based on the characteristic earthquake hypothesis, the necessary ingredients of which are a known faulting geometry and slip rate. The tectonic loading rate is characterized by slip of 6 mm/yr on the westernmost fault segment, diminishing to 4 mm/yr on the easternmost segment, based on the most reliable geodetic data. In this study, we employ statistical and physical modeling to account for stress transfer among these fault segments. The statistical modeling is based on the definition of a probability density distribution of the interevent times for each segment. Both the Brownian Passage-Time (BPT) and Weibull distributions are tested. The time-dependent hazard rate thus obtained is then modified by the inclusion of a permanent physical effect due to the Coulomb static stress change caused by failure of neighboring faults since the latest characteristic earthquake on the fault of interest. The validity of the renewal model is assessed retrospectively, using the data of the last 300 years, by comparison with a plain time-independent Poisson model, by means of statistical tools including the Relative Operating Characteristic diagram, the R-score, the probability gain and the log-likelihood ratio. We treat the uncertainties in the parameters of each examined fault source, such as linear dimensions, depth of the fault center, focal mechanism, recurrence time, coseismic slip, and aperiodicity of the statistical distribution, by a Monte Carlo technique. The Monte Carlo samples for all these parameters are drawn from a uniform distribution within their uncertainty limits. We find that the BPT and the Weibull renewal models yield comparable results, and both of them perform significantly better than the Poisson hypothesis. No clear performance enhancement is achieved by the introduction of the Coulomb static stress change into the renewal model.

  19. Establishment of a center of excellence for applied mathematical and statistical research

    NASA Technical Reports Server (NTRS)

    Woodward, W. A.; Gray, H. L.

    1983-01-01

    The state of the art was assessed with regards to efforts in support of the crop production estimation problem and alternative generic proportion estimation techniques were investigated. Topics covered include modeling the greeness profile (Badhwarmos model), parameter estimation using mixture models such as CLASSY, and minimum distance estimation as an alternative to maximum likelihood estimation. Approaches to the problem of obtaining proportion estimates when the underlying distributions are asymmetric are examined including the properties of Weibull distribution.

  20. Accurate Modeling of Galaxy Clustering on Small Scales: Testing the Standard ΛCDM + Halo Model

    NASA Astrophysics Data System (ADS)

    Sinha, Manodeep; Berlind, Andreas A.; McBride, Cameron; Scoccimarro, Roman

    2015-01-01

    The large-scale distribution of galaxies can be explained fairly simply by assuming (i) a cosmological model, which determines the dark matter halo distribution, and (ii) a simple connection between galaxies and the halos they inhabit. This conceptually simple framework, called the halo model, has been remarkably successful at reproducing the clustering of galaxies on all scales, as observed in various galaxy redshift surveys. However, none of these previous studies have carefully modeled the systematics and thus truly tested the halo model in a statistically rigorous sense. We present a new accurate and fully numerical halo model framework and test it against clustering measurements from two luminosity samples of galaxies drawn from the SDSS DR7. We show that the simple ΛCDM cosmology + halo model is not able to simultaneously reproduce the galaxy projected correlation function and the group multiplicity function. In particular, the more luminous sample shows significant tension with theory. We discuss the implications of our findings and how this work paves the way for constraining galaxy formation by accurate simultaneous modeling of multiple galaxy clustering statistics.

  1. Statistical Methodology for the Analysis of Repeated Duration Data in Behavioral Studies

    ERIC Educational Resources Information Center

    Letué, Frédérique; Martinez, Marie-José; Samson, Adeline; Vilain, Anne; Vilain, Coriandre

    2018-01-01

    Purpose: Repeated duration data are frequently used in behavioral studies. Classical linear or log-linear mixed models are often inadequate to analyze such data, because they usually consist of nonnegative and skew-distributed variables. Therefore, we recommend use of a statistical methodology specific to duration data. Method: We propose a…

  2. A statistical simulation model for field testing of non-target organisms in environmental risk assessment of genetically modified plants.

    PubMed

    Goedhart, Paul W; van der Voet, Hilko; Baldacchino, Ferdinando; Arpaia, Salvatore

    2014-04-01

    Genetic modification of plants may result in unintended effects causing potentially adverse effects on the environment. A comparative safety assessment is therefore required by authorities, such as the European Food Safety Authority, in which the genetically modified plant is compared with its conventional counterpart. Part of the environmental risk assessment is a comparative field experiment in which the effect on non-target organisms is compared. Statistical analysis of such trials come in two flavors: difference testing and equivalence testing. It is important to know the statistical properties of these, for example, the power to detect environmental change of a given magnitude, before the start of an experiment. Such prospective power analysis can best be studied by means of a statistical simulation model. This paper describes a general framework for simulating data typically encountered in environmental risk assessment of genetically modified plants. The simulation model, available as Supplementary Material, can be used to generate count data having different statistical distributions possibly with excess-zeros. In addition the model employs completely randomized or randomized block experiments, can be used to simulate single or multiple trials across environments, enables genotype by environment interaction by adding random variety effects, and finally includes repeated measures in time following a constant, linear or quadratic pattern in time possibly with some form of autocorrelation. The model also allows to add a set of reference varieties to the GM plants and its comparator to assess the natural variation which can then be used to set limits of concern for equivalence testing. The different count distributions are described in some detail and some examples of how to use the simulation model to study various aspects, including a prospective power analysis, are provided.

  3. A statistical simulation model for field testing of non-target organisms in environmental risk assessment of genetically modified plants

    PubMed Central

    Goedhart, Paul W; van der Voet, Hilko; Baldacchino, Ferdinando; Arpaia, Salvatore

    2014-01-01

    Genetic modification of plants may result in unintended effects causing potentially adverse effects on the environment. A comparative safety assessment is therefore required by authorities, such as the European Food Safety Authority, in which the genetically modified plant is compared with its conventional counterpart. Part of the environmental risk assessment is a comparative field experiment in which the effect on non-target organisms is compared. Statistical analysis of such trials come in two flavors: difference testing and equivalence testing. It is important to know the statistical properties of these, for example, the power to detect environmental change of a given magnitude, before the start of an experiment. Such prospective power analysis can best be studied by means of a statistical simulation model. This paper describes a general framework for simulating data typically encountered in environmental risk assessment of genetically modified plants. The simulation model, available as Supplementary Material, can be used to generate count data having different statistical distributions possibly with excess-zeros. In addition the model employs completely randomized or randomized block experiments, can be used to simulate single or multiple trials across environments, enables genotype by environment interaction by adding random variety effects, and finally includes repeated measures in time following a constant, linear or quadratic pattern in time possibly with some form of autocorrelation. The model also allows to add a set of reference varieties to the GM plants and its comparator to assess the natural variation which can then be used to set limits of concern for equivalence testing. The different count distributions are described in some detail and some examples of how to use the simulation model to study various aspects, including a prospective power analysis, are provided. PMID:24834325

  4. Refining calibration and predictions of a Bayesian statistical-dynamical model for long term avalanche forecasting using dendrochronological reconstructions

    NASA Astrophysics Data System (ADS)

    Eckert, Nicolas; Schläppy, Romain; Jomelli, Vincent; Naaim, Mohamed

    2013-04-01

    A crucial step for proposing relevant long-term mitigation measures in long term avalanche forecasting is the accurate definition of high return period avalanches. Recently, "statistical-dynamical" approach combining a numerical model with stochastic operators describing the variability of its inputs-outputs have emerged. Their main interests is to take into account the topographic dependency of snow avalanche runout distances, and to constrain the correlation structure between model's variables by physical rules, so as to simulate the different marginal distributions of interest (pressure, flow depth, etc.) with a reasonable realism. Bayesian methods have been shown to be well adapted to achieve model inference, getting rid of identifiability problems thanks to prior information. An important problem which has virtually never been considered before is the validation of the predictions resulting from a statistical-dynamical approach (or from any other engineering method for computing extreme avalanches). In hydrology, independent "fossil" data such as flood deposits in caves are sometimes confronted to design discharges corresponding to high return periods. Hence, the aim of this work is to implement a similar comparison between high return period avalanches obtained with a statistical-dynamical approach and independent validation data resulting from careful dendrogeomorphological reconstructions. To do so, an up-to-date statistical model based on the depth-averaged equations and the classical Voellmy friction law is used on a well-documented case study. First, parameter values resulting from another path are applied, and the dendrological validation sample shows that this approach fails in providing realistic prediction for the case study. This may be due to the strongly bounded behaviour of runouts in this case (the extreme of their distribution is identified as belonging to the Weibull attraction domain). Second, local calibration on the available avalanche chronicle is performed with various prior distributions resulting from expert knowledge and/or other paths. For all calibrations, a very successful convergence is obtained, which confirms the robustness of the used Metropolis-Hastings estimation algorithm. This also demonstrates the interest of the Bayesian framework for aggregating information by sequential assimilation in the frequently encountered case of limited data quantity. Confrontation with the dendrological sample stresses the predominant role of the Coulombian friction coefficient distribution's variance on predicted high magnitude runouts. The optimal fit is obtained for a strong prior reflecting the local bounded behavior, and results in a 10-40 m difference for return periods ranging between 10 and 300 years. Implementing predictive simulations shows that this is largely within the range of magnitude of uncertainties to be taken into account. On the other hand, the different priors tested for the turbulent friction coefficient influence predictive performances only slightly, but have a large influence on predicted velocity and flow depth distributions. This all may be of high interest to refine calibration and predictive use of the statistical-dynamical model for any engineering application.

  5. ON A POSSIBLE SIZE/COLOR RELATIONSHIP IN THE KUIPER BELT

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pike, R. E.; Kavelaars, J. J., E-mail: repike@uvic.ca

    2013-10-01

    Color measurements and albedo distributions introduce non-intuitive observational biases in size-color relationships among Kuiper Belt Objects (KBOs) that cannot be disentangled without a well characterized sample population with systematic photometry. Peixinho et al. report that the form of the KBO color distribution varies with absolute magnitude, H. However, Tegler et al. find that KBO color distributions are a property of object classification. We construct synthetic models of observed KBO colors based on two B-R color distribution scenarios: color distribution dependent on H magnitude (H-Model) and color distribution based on object classification (Class-Model). These synthetic B-R color distributions were modified tomore » account for observational flux biases. We compare our synthetic B-R distributions to the observed ''Hot'' and ''Cold'' detected objects from the Canada-France Ecliptic Plane Survey and the Meudon Multicolor Survey. For both surveys, the Hot population color distribution rejects the H-Model, but is well described by the Class-Model. The Cold objects reject the H-Model, but the Class-Model (while not statistically rejected) also does not provide a compelling match for data. Although we formally reject models where the structure of the color distribution is a strong function of H magnitude, we also do not find that a simple dependence of color distribution on orbit classification is sufficient to describe the color distribution of classical KBOs.« less

  6. Probability Distributions in Library and Information Science: A Historical and Practitioner Viewpoint.

    ERIC Educational Resources Information Center

    Bensman, Stephen J.

    2000-01-01

    This speculative historiographic essay attempts to fix the present position of library and information science within the context of the probabilistic revolution that has been encompassing all of science. Comprises a guide to statistical research in library and information science, discussing skewed distributions, biostatistics, stochastic models,…

  7. Statistical power to detect violation of the proportional hazards assumption when using the Cox regression model.

    PubMed

    Austin, Peter C

    2018-01-01

    The use of the Cox proportional hazards regression model is widespread. A key assumption of the model is that of proportional hazards. Analysts frequently test the validity of this assumption using statistical significance testing. However, the statistical power of such assessments is frequently unknown. We used Monte Carlo simulations to estimate the statistical power of two different methods for detecting violations of this assumption. When the covariate was binary, we found that a model-based method had greater power than a method based on cumulative sums of martingale residuals. Furthermore, the parametric nature of the distribution of event times had an impact on power when the covariate was binary. Statistical power to detect a strong violation of the proportional hazards assumption was low to moderate even when the number of observed events was high. In many data sets, power to detect a violation of this assumption is likely to be low to modest.

  8. Statistical power to detect violation of the proportional hazards assumption when using the Cox regression model

    PubMed Central

    Austin, Peter C.

    2017-01-01

    The use of the Cox proportional hazards regression model is widespread. A key assumption of the model is that of proportional hazards. Analysts frequently test the validity of this assumption using statistical significance testing. However, the statistical power of such assessments is frequently unknown. We used Monte Carlo simulations to estimate the statistical power of two different methods for detecting violations of this assumption. When the covariate was binary, we found that a model-based method had greater power than a method based on cumulative sums of martingale residuals. Furthermore, the parametric nature of the distribution of event times had an impact on power when the covariate was binary. Statistical power to detect a strong violation of the proportional hazards assumption was low to moderate even when the number of observed events was high. In many data sets, power to detect a violation of this assumption is likely to be low to modest. PMID:29321694

  9. Accumulated distribution of material gain at dislocation crystal growth

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rakin, V. I., E-mail: rakin@geo.komisc.ru

    2016-05-15

    A model for slowing down the tangential growth rate of an elementary step at dislocation crystal growth is proposed based on the exponential law of impurity particle distribution over adsorption energy. It is established that the statistical distribution of material gain on structurally equivalent faces obeys the Erlang law. The Erlang distribution is proposed to be used to calculate the occurrence rates of morphological combinatorial types of polyhedra, presenting real simple crystallographic forms.

  10. Generation of future potential scenarios in an Alpine Catchment by applying bias-correction techniques, delta-change approaches and stochastic Weather Generators at different spatial scale. Analysis of their influence on basic and drought statistics.

    NASA Astrophysics Data System (ADS)

    Collados-Lara, Antonio-Juan; Pulido-Velazquez, David; Pardo-Iguzquiza, Eulogio

    2017-04-01

    Assessing impacts of potential future climate change scenarios in precipitation and temperature is essential to design adaptive strategies in water resources systems. The objective of this work is to analyze the possibilities of different statistical downscaling methods to generate future potential scenarios in an Alpine Catchment from historical data and the available climate models simulations performed in the frame of the CORDEX EU project. The initial information employed to define these downscaling approaches are the historical climatic data (taken from the Spain02 project for the period 1971-2000 with a spatial resolution of 12.5 Km) and the future series provided by climatic models in the horizon period 2071-2100 . We have used information coming from nine climate model simulations (obtained from five different Regional climate models (RCM) nested to four different Global Climate Models (GCM)) from the European CORDEX project. In our application we have focused on the Representative Concentration Pathways (RCP) 8.5 emissions scenario, which is the most unfavorable scenario considered in the fifth Assessment Report (AR5) by the Intergovernmental Panel on Climate Change (IPCC). For each RCM we have generated future climate series for the period 2071-2100 by applying two different approaches, bias correction and delta change, and five different transformation techniques (first moment correction, first and second moment correction, regression functions, quantile mapping using distribution derived transformation and quantile mapping using empirical quantiles) for both of them. Ensembles of the obtained series were proposed to obtain more representative potential future climate scenarios to be employed to study potential impacts. In this work we propose a non-equifeaseble combination of the future series giving more weight to those coming from models (delta change approaches) or combination of models and techniques that provides better approximation to the basic and drought statistic of the historical data. A multi-objective analysis using basic statistics (mean, standard deviation and asymmetry coefficient) and droughts statistics (duration, magnitude and intensity) has been performed to identify which models are better in terms of goodness of fit to reproduce the historical series. The drought statistics have been obtained from the Standard Precipitation index (SPI) series using the Theory of Runs. This analysis allows discriminate the best RCM and the best combination of model and correction technique in the bias-correction method. We have also analyzed the possibilities of using different Stochastic Weather Generators to approximate the basic and droughts statistics of the historical series. These analyses have been performed in our case study in a lumped and in a distributed way in order to assess its sensibility to the spatial scale. The statistic of the future temperature series obtained with different ensemble options are quite homogeneous, but the precipitation shows a higher sensibility to the adopted method and spatial scale. The global increment in the mean temperature values are 31.79 %, 31.79 %, 31.03 % and 31.74 % for the distributed bias-correction, distributed delta-change, lumped bias-correction and lumped delta-change ensembles respectively and in the precipitation they are -25.48 %, -28.49 %, -26.42 % and -27.35% respectively. Acknowledgments: This research work has been partially supported by the GESINHIMPADAPT project (CGL2013-48424-C2-2-R) with Spanish MINECO funds. We would also like to thank Spain02 and CORDEX projects for the data provided for this study and the R package qmap.

  11. Incorporating linguistic knowledge for learning distributed word representations.

    PubMed

    Wang, Yan; Liu, Zhiyuan; Sun, Maosong

    2015-01-01

    Combined with neural language models, distributed word representations achieve significant advantages in computational linguistics and text mining. Most existing models estimate distributed word vectors from large-scale data in an unsupervised fashion, which, however, do not take rich linguistic knowledge into consideration. Linguistic knowledge can be represented as either link-based knowledge or preference-based knowledge, and we propose knowledge regularized word representation models (KRWR) to incorporate these prior knowledge for learning distributed word representations. Experiment results demonstrate that our estimated word representation achieves better performance in task of semantic relatedness ranking. This indicates that our methods can efficiently encode both prior knowledge from knowledge bases and statistical knowledge from large-scale text corpora into a unified word representation model, which will benefit many tasks in text mining.

  12. Incorporating Linguistic Knowledge for Learning Distributed Word Representations

    PubMed Central

    Wang, Yan; Liu, Zhiyuan; Sun, Maosong

    2015-01-01

    Combined with neural language models, distributed word representations achieve significant advantages in computational linguistics and text mining. Most existing models estimate distributed word vectors from large-scale data in an unsupervised fashion, which, however, do not take rich linguistic knowledge into consideration. Linguistic knowledge can be represented as either link-based knowledge or preference-based knowledge, and we propose knowledge regularized word representation models (KRWR) to incorporate these prior knowledge for learning distributed word representations. Experiment results demonstrate that our estimated word representation achieves better performance in task of semantic relatedness ranking. This indicates that our methods can efficiently encode both prior knowledge from knowledge bases and statistical knowledge from large-scale text corpora into a unified word representation model, which will benefit many tasks in text mining. PMID:25874581

  13. Statistical Analysis of Instantaneous Frequency Scaling Factor as Derived From Optical Disdrometer Measurements At KQ Bands

    NASA Technical Reports Server (NTRS)

    Zemba, Michael; Nessel, James; Houts, Jacquelynne; Luini, Lorenzo; Riva, Carlo

    2016-01-01

    The rain rate data and statistics of a location are often used in conjunction with models to predict rain attenuation. However, the true attenuation is a function not only of rain rate, but also of the drop size distribution (DSD). Generally, models utilize an average drop size distribution (Laws and Parsons or Marshall and Palmer. However, individual rain events may deviate from these models significantly if their DSD is not well approximated by the average. Therefore, characterizing the relationship between the DSD and attenuation is valuable in improving modeled predictions of rain attenuation statistics. The DSD may also be used to derive the instantaneous frequency scaling factor and thus validate frequency scaling models. Since June of 2014, NASA Glenn Research Center (GRC) and the Politecnico di Milano (POLIMI) have jointly conducted a propagation study in Milan, Italy utilizing the 20 and 40 GHz beacon signals of the Alphasat TDP#5 Aldo Paraboni payload. The Ka- and Q-band beacon receivers provide a direct measurement of the signal attenuation while concurrent weather instrumentation provides measurements of the atmospheric conditions at the receiver. Among these instruments is a Thies Clima Laser Precipitation Monitor (optical disdrometer) which yields droplet size distributions (DSD); this DSD information can be used to derive a scaling factor that scales the measured 20 GHz data to expected 40 GHz attenuation. Given the capability to both predict and directly observe 40 GHz attenuation, this site is uniquely situated to assess and characterize such predictions. Previous work using this data has examined the relationship between the measured drop-size distribution and the measured attenuation of the link]. The focus of this paper now turns to a deeper analysis of the scaling factor, including the prediction error as a function of attenuation level, correlation between the scaling factor and the rain rate, and the temporal variability of the drop size distribution both within a given rain event and across different varieties of rain events. Index Terms-drop size distribution, frequency scaling, propagation losses, radiowave propagation.

  14. Statistical Analysis of Instantaneous Frequency Scaling Factor as Derived From Optical Disdrometer Measurements At KQ Bands

    NASA Technical Reports Server (NTRS)

    Zemba, Michael; Nessel, James; Houts, Jacquelynne; Luini, Lorenzo; Riva, Carlo

    2016-01-01

    The rain rate data and statistics of a location are often used in conjunction with models to predict rain attenuation. However, the true attenuation is a function not only of rain rate, but also of the drop size distribution (DSD). Generally, models utilize an average drop size distribution (Laws and Parsons or Marshall and Palmer [1]). However, individual rain events may deviate from these models significantly if their DSD is not well approximated by the average. Therefore, characterizing the relationship between the DSD and attenuation is valuable in improving modeled predictions of rain attenuation statistics. The DSD may also be used to derive the instantaneous frequency scaling factor and thus validate frequency scaling models. Since June of 2014, NASA Glenn Research Center (GRC) and the Politecnico di Milano (POLIMI) have jointly conducted a propagation study in Milan, Italy utilizing the 20 and 40 GHz beacon signals of the Alphasat TDP#5 Aldo Paraboni payload. The Ka- and Q-band beacon receivers provide a direct measurement of the signal attenuation while concurrent weather instrumentation provides measurements of the atmospheric conditions at the receiver. Among these instruments is a Thies Clima Laser Precipitation Monitor (optical disdrometer) which yields droplet size distributions (DSD); this DSD information can be used to derive a scaling factor that scales the measured 20 GHz data to expected 40 GHz attenuation. Given the capability to both predict and directly observe 40 GHz attenuation, this site is uniquely situated to assess and characterize such predictions. Previous work using this data has examined the relationship between the measured drop-size distribution and the measured attenuation of the link [2]. The focus of this paper now turns to a deeper analysis of the scaling factor, including the prediction error as a function of attenuation level, correlation between the scaling factor and the rain rate, and the temporal variability of the drop size distribution both within a given rain event and across different varieties of rain events. Index Terms-drop size distribution, frequency scaling, propagation losses, radiowave propagation.

  15. A Statistical Model of the Fluctuations in the Geomagnetic Field from Paleosecular Variation to Reversal

    PubMed

    Camps; Prevot

    1996-08-09

    The statistical characteristics of the local magnetic field of Earth during paleosecular variation, excursions, and reversals are described on the basis of a database that gathers the cleaned mean direction and average remanent intensity of 2741 lava flows that have erupted over the last 20 million years. A model consisting of a normally distributed axial dipole component plus an independent isotropic set of vectors with a Maxwellian distribution that simulates secular variation fits the range of geomagnetic fluctuations, in terms of both direction and intensity. This result suggests that the magnitude of secular variation vectors is independent of the magnitude of Earth's axial dipole moment and that the amplitude of secular variation is unchanged during reversals.

  16. Quantum statistics of Raman scattering model with Stokes mode generation

    NASA Technical Reports Server (NTRS)

    Tanatar, Bilal; Shumovsky, Alexander S.

    1994-01-01

    The model describing three coupled quantum oscillators with decay of Rayleigh mode into the Stokes and vibration (phonon) modes is examined. Due to the Manley-Rowe relations the problem of exact eigenvalues and eigenstates is reduced to the calculation of new orthogonal polynomials defined both by the difference and differential equations. The quantum statistical properties are examined in the case when initially: the Stokes mode is in the vacuum state; the Rayleigh mode is in the number state; and the vibration mode is in the number of or squeezed states. The collapses and revivals are obtained for different initial conditions as well as the change in time the sub-Poisson distribution by the super-Poisson distribution and vice versa.

  17. Exponentiated power Lindley distribution.

    PubMed

    Ashour, Samir K; Eltehiwy, Mahmoud A

    2015-11-01

    A new generalization of the Lindley distribution is recently proposed by Ghitany et al. [1], called as the power Lindley distribution. Another generalization of the Lindley distribution was introduced by Nadarajah et al. [2], named as the generalized Lindley distribution. This paper proposes a more generalization of the Lindley distribution which generalizes the two. We refer to this new generalization as the exponentiated power Lindley distribution. The new distribution is important since it contains as special sub-models some widely well-known distributions in addition to the above two models, such as the Lindley distribution among many others. It also provides more flexibility to analyze complex real data sets. We study some statistical properties for the new distribution. We discuss maximum likelihood estimation of the distribution parameters. Least square estimation is used to evaluate the parameters. Three algorithms are proposed for generating random data from the proposed distribution. An application of the model to a real data set is analyzed using the new distribution, which shows that the exponentiated power Lindley distribution can be used quite effectively in analyzing real lifetime data.

  18. The Birth-Death-Mutation Process: A New Paradigm for Fat Tailed Distributions

    PubMed Central

    Maruvka, Yosef E.; Kessler, David A.; Shnerb, Nadav M.

    2011-01-01

    Fat tailed statistics and power-laws are ubiquitous in many complex systems. Usually the appearance of of a few anomalously successful individuals (bio-species, investors, websites) is interpreted as reflecting some inherent “quality” (fitness, talent, giftedness) as in Darwin's theory of natural selection. Here we adopt the opposite, “neutral”, outlook, suggesting that the main factor explaining success is merely luck. The statistics emerging from the neutral birth-death-mutation (BDM) process is shown to fit marvelously many empirical distributions. While previous neutral theories have focused on the power-law tail, our theory economically and accurately explains the entire distribution. We thus suggest the BDM distribution as a standard neutral model: effects of fitness and selection are to be identified by substantial deviations from it. PMID:22069453

  19. The social architecture of capitalism

    NASA Astrophysics Data System (ADS)

    Wright, Ian

    2005-02-01

    A dynamic model of the social relations between workers and capitalists is introduced. The model self-organises into a dynamic equilibrium with statistical properties that are in close qualitative and in many cases quantitative agreement with a broad range of known empirical distributions of developed capitalism, including the power-law firm size distribution, the Laplace firm and GDP growth distribution, the lognormal firm demises distribution, the exponential recession duration distribution, the lognormal-Pareto income distribution, and the gamma-like firm rate-of-profit distribution. Normally these distributions are studied in isolation, but this model unifies and connects them within a single causal framework. The model also generates business cycle phenomena, including fluctuating wage and profit shares in national income about values consistent with empirical studies. The generation of an approximately lognormal-Pareto income distribution and an exponential-Pareto wealth distribution demonstrates that the power-law regime of the income distribution can be explained by an additive process on a power-law network that models the social relation between employers and employees organised in firms, rather than a multiplicative process that models returns to investment in financial markets. A testable consequence of the model is the conjecture that the rate-of-profit distribution is consistent with a parameter-mix of a ratio of normal variates with means and variances that depend on a firm size parameter that is distributed according to a power-law.

  20. Statistical steady states in turbulent droplet condensation

    NASA Astrophysics Data System (ADS)

    Bec, Jeremie; Krstulovic, Giorgio; Siewert, Christoph

    2017-11-01

    We investigate the general problem of turbulent condensation. Using direct numerical simulations we show that the fluctuations of the supersaturation field offer different conditions for the growth of droplets which evolve in time due to turbulent transport and mixing. This leads to propose a Lagrangian stochastic model consisting of a set of integro-differential equations for the joint evolution of the squared radius and the supersaturation along droplet trajectories. The model has two parameters fixed by the total amount of water and the thermodynamic properties, as well as the Lagrangian integral timescale of the turbulent supersaturation. The model reproduces very well the droplet size distributions obtained from direct numerical simulations and their time evolution. A noticeable result is that, after a stage where the squared radius simply diffuses, the system converges exponentially fast to a statistical steady state independent of the initial conditions. The main mechanism involved in this convergence is a loss of memory induced by a significant number of droplets undergoing a complete evaporation before growing again. The statistical steady state is characterised by an exponential tail in the droplet mass distribution.

  1. Secure and scalable deduplication of horizontally partitioned health data for privacy-preserving distributed statistical computation.

    PubMed

    Yigzaw, Kassaye Yitbarek; Michalas, Antonis; Bellika, Johan Gustav

    2017-01-03

    Techniques have been developed to compute statistics on distributed datasets without revealing private information except the statistical results. However, duplicate records in a distributed dataset may lead to incorrect statistical results. Therefore, to increase the accuracy of the statistical analysis of a distributed dataset, secure deduplication is an important preprocessing step. We designed a secure protocol for the deduplication of horizontally partitioned datasets with deterministic record linkage algorithms. We provided a formal security analysis of the protocol in the presence of semi-honest adversaries. The protocol was implemented and deployed across three microbiology laboratories located in Norway, and we ran experiments on the datasets in which the number of records for each laboratory varied. Experiments were also performed on simulated microbiology datasets and data custodians connected through a local area network. The security analysis demonstrated that the protocol protects the privacy of individuals and data custodians under a semi-honest adversarial model. More precisely, the protocol remains secure with the collusion of up to N - 2 corrupt data custodians. The total runtime for the protocol scales linearly with the addition of data custodians and records. One million simulated records distributed across 20 data custodians were deduplicated within 45 s. The experimental results showed that the protocol is more efficient and scalable than previous protocols for the same problem. The proposed deduplication protocol is efficient and scalable for practical uses while protecting the privacy of patients and data custodians.

  2. Transmuted of Rayleigh Distribution with Estimation and Application on Noise Signal

    NASA Astrophysics Data System (ADS)

    Ahmed, Suhad; Qasim, Zainab

    2018-05-01

    This paper deals with transforming one parameter Rayleigh distribution, into transmuted probability distribution through introducing a new parameter (λ), since this studied distribution is necessary in representing signal data distribution and failure data model the value of this transmuted parameter |λ| ≤ 1, is also estimated as well as the original parameter (⊖) by methods of moments and maximum likelihood using different sample size (n=25, 50, 75, 100) and comparing the results of estimation by statistical measure (mean square error, MSE).

  3. Modeling Linguistic Variables With Regression Models: Addressing Non-Gaussian Distributions, Non-independent Observations, and Non-linear Predictors With Random Effects and Generalized Additive Models for Location, Scale, and Shape

    PubMed Central

    Coupé, Christophe

    2018-01-01

    As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for ‘difficult’ variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables. PMID:29713298

  4. Modeling Linguistic Variables With Regression Models: Addressing Non-Gaussian Distributions, Non-independent Observations, and Non-linear Predictors With Random Effects and Generalized Additive Models for Location, Scale, and Shape.

    PubMed

    Coupé, Christophe

    2018-01-01

    As statistical approaches are getting increasingly used in linguistics, attention must be paid to the choice of methods and algorithms used. This is especially true since they require assumptions to be satisfied to provide valid results, and because scientific articles still often fall short of reporting whether such assumptions are met. Progress is being, however, made in various directions, one of them being the introduction of techniques able to model data that cannot be properly analyzed with simpler linear regression models. We report recent advances in statistical modeling in linguistics. We first describe linear mixed-effects regression models (LMM), which address grouping of observations, and generalized linear mixed-effects models (GLMM), which offer a family of distributions for the dependent variable. Generalized additive models (GAM) are then introduced, which allow modeling non-linear parametric or non-parametric relationships between the dependent variable and the predictors. We then highlight the possibilities offered by generalized additive models for location, scale, and shape (GAMLSS). We explain how they make it possible to go beyond common distributions, such as Gaussian or Poisson, and offer the appropriate inferential framework to account for 'difficult' variables such as count data with strong overdispersion. We also demonstrate how they offer interesting perspectives on data when not only the mean of the dependent variable is modeled, but also its variance, skewness, and kurtosis. As an illustration, the case of phonemic inventory size is analyzed throughout the article. For over 1,500 languages, we consider as predictors the number of speakers, the distance from Africa, an estimation of the intensity of language contact, and linguistic relationships. We discuss the use of random effects to account for genealogical relationships, the choice of appropriate distributions to model count data, and non-linear relationships. Relying on GAMLSS, we assess a range of candidate distributions, including the Sichel, Delaporte, Box-Cox Green and Cole, and Box-Cox t distributions. We find that the Box-Cox t distribution, with appropriate modeling of its parameters, best fits the conditional distribution of phonemic inventory size. We finally discuss the specificities of phoneme counts, weak effects, and how GAMLSS should be considered for other linguistic variables.

  5. Multivariate normality

    NASA Technical Reports Server (NTRS)

    Crutcher, H. L.; Falls, L. W.

    1976-01-01

    Sets of experimentally determined or routinely observed data provide information about the past, present and, hopefully, future sets of similarly produced data. An infinite set of statistical models exists which may be used to describe the data sets. The normal distribution is one model. If it serves at all, it serves well. If a data set, or a transformation of the set, representative of a larger population can be described by the normal distribution, then valid statistical inferences can be drawn. There are several tests which may be applied to a data set to determine whether the univariate normal model adequately describes the set. The chi-square test based on Pearson's work in the late nineteenth and early twentieth centuries is often used. Like all tests, it has some weaknesses which are discussed in elementary texts. Extension of the chi-square test to the multivariate normal model is provided. Tables and graphs permit easier application of the test in the higher dimensions. Several examples, using recorded data, illustrate the procedures. Tests of maximum absolute differences, mean sum of squares of residuals, runs and changes of sign are included in these tests. Dimensions one through five with selected sample sizes 11 to 101 are used to illustrate the statistical tests developed.

  6. Model Uncertainty and Robustness: A Computational Framework for Multimodel Analysis

    ERIC Educational Resources Information Center

    Young, Cristobal; Holsteen, Katherine

    2017-01-01

    Model uncertainty is pervasive in social science. A key question is how robust empirical results are to sensible changes in model specification. We present a new approach and applied statistical software for computational multimodel analysis. Our approach proceeds in two steps: First, we estimate the modeling distribution of estimates across all…

  7. Exploring Contextual Models in Chemical Patent Search

    NASA Astrophysics Data System (ADS)

    Urbain, Jay; Frieder, Ophir

    We explore the development of probabilistic retrieval models for integrating term statistics with entity search using multiple levels of document context to improve the performance of chemical patent search. A distributed indexing model was developed to enable efficient named entity search and aggregation of term statistics at multiple levels of patent structure including individual words, sentences, claims, descriptions, abstracts, and titles. The system can be scaled to an arbitrary number of compute instances in a cloud computing environment to support concurrent indexing and query processing operations on large patent collections.

  8. Hydrological responses to dynamically and statistically downscaled climate model output

    USGS Publications Warehouse

    Wilby, R.L.; Hay, L.E.; Gutowski, W.J.; Arritt, R.W.; Takle, E.S.; Pan, Z.; Leavesley, G.H.; Clark, M.P.

    2000-01-01

    Daily rainfall and surface temperature series were simulated for the Animas River basin, Colorado using dynamically and statistically downscaled output from the National Center for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) re-analysis. A distributed hydrological model was then applied to the downscaled data. Relative to raw NCEP output, downscaled climate variables provided more realistic stimulations of basin scale hydrology. However, the results highlight the sensitivity of modeled processes to the choice of downscaling technique, and point to the need for caution when interpreting future hydrological scenarios.

  9. A Decision Model for Supporting Task Allocation Processes in Global Software Development

    NASA Astrophysics Data System (ADS)

    Lamersdorf, Ansgar; Münch, Jürgen; Rombach, Dieter

    Today, software-intensive systems are increasingly being developed in a globally distributed way. However, besides its benefit, global development also bears a set of risks and problems. One critical factor for successful project management of distributed software development is the allocation of tasks to sites, as this is assumed to have a major influence on the benefits and risks. We introduce a model that aims at improving management processes in globally distributed projects by giving decision support for task allocation that systematically regards multiple criteria. The criteria and causal relationships were identified in a literature study and refined in a qualitative interview study. The model uses existing approaches from distributed systems and statistical modeling. The article gives an overview of the problem and related work, introduces the empirical and theoretical foundations of the model, and shows the use of the model in an example scenario.

  10. An Algebraic Implicitization and Specialization of Minimum KL-Divergence Models

    NASA Astrophysics Data System (ADS)

    Dukkipati, Ambedkar; Manathara, Joel George

    In this paper we study representation of KL-divergence minimization, in the cases where integer sufficient statistics exists, using tools from polynomial algebra. We show that the estimation of parametric statistical models in this case can be transformed to solving a system of polynomial equations. In particular, we also study the case of Kullback-Csisźar iteration scheme. We present implicit descriptions of these models and show that implicitization preserves specialization of prior distribution. This result leads us to a Gröbner bases method to compute an implicit representation of minimum KL-divergence models.

  11. Self-organization of cosmic radiation pressure instability. II - One-dimensional simulations

    NASA Technical Reports Server (NTRS)

    Hogan, Craig J.; Woods, Jorden

    1992-01-01

    The clustering of statistically uniform discrete absorbing particles moving solely under the influence of radiation pressure from uniformly distributed emitters is studied in a simple one-dimensional model. Radiation pressure tends to amplify statistical clustering in the absorbers; the absorbing material is swept into empty bubbles, the biggest bubbles grow bigger almost as they would in a uniform medium, and the smaller ones get crushed and disappear. Numerical simulations of a one-dimensional system are used to support the conjecture that the system is self-organizing. Simple statistics indicate that a wide range of initial conditions produce structure approaching the same self-similar statistical distribution, whose scaling properties follow those of the attractor solution for an isolated bubble. The importance of the process for large-scale structuring of the interstellar medium is briefly discussed.

  12. Generalised Central Limit Theorems for Growth Rate Distribution of Complex Systems

    NASA Astrophysics Data System (ADS)

    Takayasu, Misako; Watanabe, Hayafumi; Takayasu, Hideki

    2014-04-01

    We introduce a solvable model of randomly growing systems consisting of many independent subunits. Scaling relations and growth rate distributions in the limit of infinite subunits are analysed theoretically. Various types of scaling properties and distributions reported for growth rates of complex systems in a variety of fields can be derived from this basic physical model. Statistical data of growth rates for about 1 million business firms are analysed as a real-world example of randomly growing systems. Not only are the scaling relations consistent with the theoretical solution, but the entire functional form of the growth rate distribution is fitted with a theoretical distribution that has a power-law tail.

  13. Local operators in kinetic wealth distribution

    NASA Astrophysics Data System (ADS)

    Andrecut, M.

    2016-05-01

    The statistical mechanics approach to wealth distribution is based on the conservative kinetic multi-agent model for money exchange, where the local interaction rule between the agents is analogous to the elastic particle scattering process. Here, we discuss the role of a class of conservative local operators, and we show that, depending on the values of their parameters, they can be used to generate all the relevant distributions. We also show numerically that in order to generate the power-law tail, an heterogeneous risk aversion model is required. By changing the parameters of these operators, one can also fine tune the resulting distributions in order to provide support for the emergence of a more egalitarian wealth distribution.

  14. Resolving Structural Variability in Network Models and the Brain

    PubMed Central

    Klimm, Florian; Bassett, Danielle S.; Carlson, Jean M.; Mucha, Peter J.

    2014-01-01

    Large-scale white matter pathways crisscrossing the cortex create a complex pattern of connectivity that underlies human cognitive function. Generative mechanisms for this architecture have been difficult to identify in part because little is known in general about mechanistic drivers of structured networks. Here we contrast network properties derived from diffusion spectrum imaging data of the human brain with 13 synthetic network models chosen to probe the roles of physical network embedding and temporal network growth. We characterize both the empirical and synthetic networks using familiar graph metrics, but presented here in a more complete statistical form, as scatter plots and distributions, to reveal the full range of variability of each measure across scales in the network. We focus specifically on the degree distribution, degree assortativity, hierarchy, topological Rentian scaling, and topological fractal scaling—in addition to several summary statistics, including the mean clustering coefficient, the shortest path-length, and the network diameter. The models are investigated in a progressive, branching sequence, aimed at capturing different elements thought to be important in the brain, and range from simple random and regular networks, to models that incorporate specific growth rules and constraints. We find that synthetic models that constrain the network nodes to be physically embedded in anatomical brain regions tend to produce distributions that are most similar to the corresponding measurements for the brain. We also find that network models hardcoded to display one network property (e.g., assortativity) do not in general simultaneously display a second (e.g., hierarchy). This relative independence of network properties suggests that multiple neurobiological mechanisms might be at play in the development of human brain network architecture. Together, the network models that we develop and employ provide a potentially useful starting point for the statistical inference of brain network structure from neuroimaging data. PMID:24675546

  15. Forecasting distributions of large federal-lands fires utilizing satellite and gridded weather information

    Treesearch

    H.K. Preisler; R.E. Burgan; J.C. Eidenshink; J.M. Klaver; R.W. Klaver

    2009-01-01

    The current study presents a statistical model for assessing the skill of fire danger indices and for forecasting the distribution of the expected numbers of large fires over a given region and for the upcoming week. The procedure permits development of daily maps that forecast, for the forthcoming week and within federal lands, percentiles of the distributions of (i)...

  16. ON CONTINUOUS-REVIEW (S-1,S) INVENTORY POLICIES WITH STATE-DEPENDENT LEADTIMES,

    DTIC Science & Technology

    INVENTORY CONTROL, *REPLACEMENT THEORY), MATHEMATICAL MODELS, LEAD TIME , MANAGEMENT ENGINEERING, DISTRIBUTION FUNCTIONS, PROBABILITY, QUEUEING THEORY, COSTS, OPTIMIZATION, STATISTICAL PROCESSES, DIFFERENCE EQUATIONS

  17. Modeling and Simulation of Upset-Inducing Disturbances for Digital Systems in an Electromagnetic Reverberation Chamber

    NASA Technical Reports Server (NTRS)

    Torres-Pomales, Wilfredo

    2014-01-01

    This report describes a modeling and simulation approach for disturbance patterns representative of the environment experienced by a digital system in an electromagnetic reverberation chamber. The disturbance is modeled by a multi-variate statistical distribution based on empirical observations. Extended versions of the Rejection Samping and Inverse Transform Sampling techniques are developed to generate multi-variate random samples of the disturbance. The results show that Inverse Transform Sampling returns samples with higher fidelity relative to the empirical distribution. This work is part of an ongoing effort to develop a resilience assessment methodology for complex safety-critical distributed systems.

  18. Statistical electric field and switching time distributions in PZT 1Nb2Sr ceramics: Crystal- and microstructure effects

    NASA Astrophysics Data System (ADS)

    Zhukov, Sergey; Kungl, Hans; Genenko, Yuri A.; von Seggern, Heinz

    2014-01-01

    Dispersive polarization response of ferroelectric PZT ceramics is analyzed assuming the inhomogeneous field mechanism of polarization switching. In terms of this model, the local polarization switching proceeds according to the Kolmogorov-Avrami-Ishibashi scenario with the switching time determined by the local electric field. As a result, the total polarization reversal is dominated by the statistical distribution of the local field magnitudes. Microscopic parameters of this model (the high-field switching time and the activation field) as well as the statistical field and consequent switching time distributions due to disorder at a mesoscopic scale can be directly determined from a set of experiments measuring the time dependence of the total polarization switching, when applying electric fields of different magnitudes. PZT 1Nb2Sr ceramics with Zr/Ti ratios 51.5/48.5, 52.25/47.75, and 60/40 with four different grain sizes each were analyzed following this approach. Pronounced differences of field and switching time distributions were found depending on the Zr/Ti ratios. Varying grain size also affects polarization reversal parameters, but in another way. The field distributions remain almost constant with grain size whereas switching times and activation field tend to decrease with increasing grain size. The quantitative changes of the latter parameters with grain size are very different depending on composition. The origin of the effects on the field and switching time distributions are related to differences in structural and microstructural characteristics of the materials and are discussed with respect to the hysteresis loops observed under bipolar electrical cycling.

  19. Separate-channel analysis of two-channel microarrays: recovering inter-spot information.

    PubMed

    Smyth, Gordon K; Altman, Naomi S

    2013-05-26

    Two-channel (or two-color) microarrays are cost-effective platforms for comparative analysis of gene expression. They are traditionally analysed in terms of the log-ratios (M-values) of the two channel intensities at each spot, but this analysis does not use all the information available in the separate channel observations. Mixed models have been proposed to analyse intensities from the two channels as separate observations, but such models can be complex to use and the gain in efficiency over the log-ratio analysis is difficult to quantify. Mixed models yield test statistics for the null distributions can be specified only approximately, and some approaches do not borrow strength between genes. This article reformulates the mixed model to clarify the relationship with the traditional log-ratio analysis, to facilitate information borrowing between genes, and to obtain an exact distributional theory for the resulting test statistics. The mixed model is transformed to operate on the M-values and A-values (average log-expression for each spot) instead of on the log-expression values. The log-ratio analysis is shown to ignore information contained in the A-values. The relative efficiency of the log-ratio analysis is shown to depend on the size of the intraspot correlation. A new separate channel analysis method is proposed that assumes a constant intra-spot correlation coefficient across all genes. This approach permits the mixed model to be transformed into an ordinary linear model, allowing the data analysis to use a well-understood empirical Bayes analysis pipeline for linear modeling of microarray data. This yields statistically powerful test statistics that have an exact distributional theory. The log-ratio, mixed model and common correlation methods are compared using three case studies. The results show that separate channel analyses that borrow strength between genes are more powerful than log-ratio analyses. The common correlation analysis is the most powerful of all. The common correlation method proposed in this article for separate-channel analysis of two-channel microarray data is no more difficult to apply in practice than the traditional log-ratio analysis. It provides an intuitive and powerful means to conduct analyses and make comparisons that might otherwise not be possible.

  20. A SIGNIFICANCE TEST FOR THE LASSO1

    PubMed Central

    Lockhart, Richard; Taylor, Jonathan; Tibshirani, Ryan J.; Tibshirani, Robert

    2014-01-01

    In the sparse linear regression setting, we consider testing the significance of the predictor variable that enters the current lasso model, in the sequence of models visited along the lasso solution path. We propose a simple test statistic based on lasso fitted values, called the covariance test statistic, and show that when the true model is linear, this statistic has an Exp(1) asymptotic distribution under the null hypothesis (the null being that all truly active variables are contained in the current lasso model). Our proof of this result for the special case of the first predictor to enter the model (i.e., testing for a single significant predictor variable against the global null) requires only weak assumptions on the predictor matrix X. On the other hand, our proof for a general step in the lasso path places further technical assumptions on X and the generative model, but still allows for the important high-dimensional case p > n, and does not necessarily require that the current lasso model achieves perfect recovery of the truly active variables. Of course, for testing the significance of an additional variable between two nested linear models, one typically uses the chi-squared test, comparing the drop in residual sum of squares (RSS) to a χ12 distribution. But when this additional variable is not fixed, and has been chosen adaptively or greedily, this test is no longer appropriate: adaptivity makes the drop in RSS stochastically much larger than χ12 under the null hypothesis. Our analysis explicitly accounts for adaptivity, as it must, since the lasso builds an adaptive sequence of linear models as the tuning parameter λ decreases. In this analysis, shrinkage plays a key role: though additional variables are chosen adaptively, the coefficients of lasso active variables are shrunken due to the l1 penalty. Therefore, the test statistic (which is based on lasso fitted values) is in a sense balanced by these two opposing properties—adaptivity and shrinkage—and its null distribution is tractable and asymptotically Exp(1). PMID:25574062

  1. Modifying climate change habitat models using tree species-specific assessments of model uncertainty and life history-factors

    Treesearch

    Stephen N. Matthews; Louis R. Iverson; Anantha M. Prasad; Matthew P. Peters; Paul G. Rodewald

    2011-01-01

    Species distribution models (SDMs) to evaluate trees' potential responses to climate change are essential for developing appropriate forest management strategies. However, there is a great need to better understand these models' limitations and evaluate their uncertainties. We have previously developed statistical models of suitable habitat, based on both...

  2. Projecting changes in the distribution and productivity of living marine resources: A critical review of the suite of modelling approaches used in the large European project VECTORS

    NASA Astrophysics Data System (ADS)

    Peck, Myron A.; Arvanitidis, Christos; Butenschön, Momme; Canu, Donata Melaku; Chatzinikolaou, Eva; Cucco, Andrea; Domenici, Paolo; Fernandes, Jose A.; Gasche, Loic; Huebert, Klaus B.; Hufnagl, Marc; Jones, Miranda C.; Kempf, Alexander; Keyl, Friedemann; Maar, Marie; Mahévas, Stéphanie; Marchal, Paul; Nicolas, Delphine; Pinnegar, John K.; Rivot, Etienne; Rochette, Sébastien; Sell, Anne F.; Sinerchia, Matteo; Solidoro, Cosimo; Somerfield, Paul J.; Teal, Lorna R.; Travers-Trolet, Morgan; van de Wolfshaar, Karen E.

    2018-02-01

    We review and compare four broad categories of spatially-explicit modelling approaches currently used to understand and project changes in the distribution and productivity of living marine resources including: 1) statistical species distribution models, 2) physiology-based, biophysical models of single life stages or the whole life cycle of species, 3) food web models, and 4) end-to-end models. Single pressures are rare and, in the future, models must be able to examine multiple factors affecting living marine resources such as interactions between: i) climate-driven changes in temperature regimes and acidification, ii) reductions in water quality due to eutrophication, iii) the introduction of alien invasive species, and/or iv) (over-)exploitation by fisheries. Statistical (correlative) approaches can be used to detect historical patterns which may not be relevant in the future. Advancing predictive capacity of changes in distribution and productivity of living marine resources requires explicit modelling of biological and physical mechanisms. New formulations are needed which (depending on the question) will need to strive for more realism in ecophysiology and behaviour of individuals, life history strategies of species, as well as trophodynamic interactions occurring at different spatial scales. Coupling existing models (e.g. physical, biological, economic) is one avenue that has proven successful. However, fundamental advancements are needed to address key issues such as the adaptive capacity of species/groups and ecosystems. The continued development of end-to-end models (e.g., physics to fish to human sectors) will be critical if we hope to assess how multiple pressures may interact to cause changes in living marine resources including the ecological and economic costs and trade-offs of different spatial management strategies. Given the strengths and weaknesses of the various types of models reviewed here, confidence in projections of changes in the distribution and productivity of living marine resources will be increased by assessing model structural uncertainty through biological ensemble modelling.

  3. Distributed Monitoring of the R(sup 2) Statistic for Linear Regression

    NASA Technical Reports Server (NTRS)

    Bhaduri, Kanishka; Das, Kamalika; Giannella, Chris R.

    2011-01-01

    The problem of monitoring a multivariate linear regression model is relevant in studying the evolving relationship between a set of input variables (features) and one or more dependent target variables. This problem becomes challenging for large scale data in a distributed computing environment when only a subset of instances is available at individual nodes and the local data changes frequently. Data centralization and periodic model recomputation can add high overhead to tasks like anomaly detection in such dynamic settings. Therefore, the goal is to develop techniques for monitoring and updating the model over the union of all nodes data in a communication-efficient fashion. Correctness guarantees on such techniques are also often highly desirable, especially in safety-critical application scenarios. In this paper we develop DReMo a distributed algorithm with very low resource overhead, for monitoring the quality of a regression model in terms of its coefficient of determination (R2 statistic). When the nodes collectively determine that R2 has dropped below a fixed threshold, the linear regression model is recomputed via a network-wide convergecast and the updated model is broadcast back to all nodes. We show empirically, using both synthetic and real data, that our proposed method is highly communication-efficient and scalable, and also provide theoretical guarantees on correctness.

  4. Modeling evaporation of Jet A, JP-7 and RP-1 drops at 1 to 15 bars

    NASA Technical Reports Server (NTRS)

    Harstad, K.; Bellan, J.

    2003-01-01

    A model describing the evaportion of an isolated drop of a multicomponent fuel containing hundreds of species has been developed. The model is based on Continuous Thermodynamics concepts wherein the composition of a fuel is statistically described using a Probability Distribution Function (PDF).

  5. Statistical models for predicting pair dispersion and particle clustering in isotropic turbulence and their applications

    NASA Astrophysics Data System (ADS)

    Zaichik, Leonid I.; Alipchenkov, Vladimir M.

    2009-10-01

    The purpose of this paper is twofold: (i) to advance and extend the statistical two-point models of pair dispersion and particle clustering in isotropic turbulence that were previously proposed by Zaichik and Alipchenkov (2003 Phys. Fluids15 1776-87 2007 Phys. Fluids 19, 113308) and (ii) to present some applications of these models. The models developed are based on a kinetic equation for the two-point probability density function of the relative velocity distribution of two particles. These models predict the pair relative velocity statistics and the preferential accumulation of heavy particles in stationary and decaying homogeneous isotropic turbulent flows. Moreover, the models are applied to predict the effect of particle clustering on turbulent collisions, sedimentation and intensity of microwave radiation as well as to calculate the mean filtered subgrid stress of the particulate phase. Model predictions are compared with direct numerical simulations and experimental measurements.

  6. Species distribution models: A comparison of statistical approaches for livestock and disease epidemics.

    PubMed

    Hollings, Tracey; Robinson, Andrew; van Andel, Mary; Jewell, Chris; Burgman, Mark

    2017-01-01

    In livestock industries, reliable up-to-date spatial distribution and abundance records for animals and farms are critical for governments to manage and respond to risks. Yet few, if any, countries can afford to maintain comprehensive, up-to-date agricultural census data. Statistical modelling can be used as a proxy for such data but comparative modelling studies have rarely been undertaken for livestock populations. Widespread species, including livestock, can be difficult to model effectively due to complex spatial distributions that do not respond predictably to environmental gradients. We assessed three machine learning species distribution models (SDM) for their capacity to estimate national-level farm animal population numbers within property boundaries: boosted regression trees (BRT), random forests (RF) and K-nearest neighbour (K-NN). The models were built from a commercial livestock database and environmental and socio-economic predictor data for New Zealand. We used two spatial data stratifications to test (i) support for decision making in an emergency response situation, and (ii) the ability for the models to predict to new geographic regions. The performance of the three model types varied substantially, but the best performing models showed very high accuracy. BRTs had the best performance overall, but RF performed equally well or better in many simulations; RFs were superior at predicting livestock numbers for all but very large commercial farms. K-NN performed poorly relative to both RF and BRT in all simulations. The predictions of both multi species and single species models for farms and within hypothetical quarantine zones were very close to observed data. These models are generally applicable for livestock estimation with broad applications in disease risk modelling, biosecurity, policy and planning.

  7. Species distribution models: A comparison of statistical approaches for livestock and disease epidemics

    PubMed Central

    Robinson, Andrew; van Andel, Mary; Jewell, Chris; Burgman, Mark

    2017-01-01

    In livestock industries, reliable up-to-date spatial distribution and abundance records for animals and farms are critical for governments to manage and respond to risks. Yet few, if any, countries can afford to maintain comprehensive, up-to-date agricultural census data. Statistical modelling can be used as a proxy for such data but comparative modelling studies have rarely been undertaken for livestock populations. Widespread species, including livestock, can be difficult to model effectively due to complex spatial distributions that do not respond predictably to environmental gradients. We assessed three machine learning species distribution models (SDM) for their capacity to estimate national-level farm animal population numbers within property boundaries: boosted regression trees (BRT), random forests (RF) and K-nearest neighbour (K-NN). The models were built from a commercial livestock database and environmental and socio-economic predictor data for New Zealand. We used two spatial data stratifications to test (i) support for decision making in an emergency response situation, and (ii) the ability for the models to predict to new geographic regions. The performance of the three model types varied substantially, but the best performing models showed very high accuracy. BRTs had the best performance overall, but RF performed equally well or better in many simulations; RFs were superior at predicting livestock numbers for all but very large commercial farms. K-NN performed poorly relative to both RF and BRT in all simulations. The predictions of both multi species and single species models for farms and within hypothetical quarantine zones were very close to observed data. These models are generally applicable for livestock estimation with broad applications in disease risk modelling, biosecurity, policy and planning. PMID:28837685

  8. A Hybrid Semi-supervised Classification Scheme for Mining Multisource Geospatial Data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Vatsavai, Raju; Bhaduri, Budhendra L

    2011-01-01

    Supervised learning methods such as Maximum Likelihood (ML) are often used in land cover (thematic) classification of remote sensing imagery. ML classifier relies exclusively on spectral characteristics of thematic classes whose statistical distributions (class conditional probability densities) are often overlapping. The spectral response distributions of thematic classes are dependent on many factors including elevation, soil types, and ecological zones. A second problem with statistical classifiers is the requirement of large number of accurate training samples (10 to 30 |dimensions|), which are often costly and time consuming to acquire over large geographic regions. With the increasing availability of geospatial databases, itmore » is possible to exploit the knowledge derived from these ancillary datasets to improve classification accuracies even when the class distributions are highly overlapping. Likewise newer semi-supervised techniques can be adopted to improve the parameter estimates of statistical model by utilizing a large number of easily available unlabeled training samples. Unfortunately there is no convenient multivariate statistical model that can be employed for mulitsource geospatial databases. In this paper we present a hybrid semi-supervised learning algorithm that effectively exploits freely available unlabeled training samples from multispectral remote sensing images and also incorporates ancillary geospatial databases. We have conducted several experiments on real datasets, and our new hybrid approach shows over 25 to 35% improvement in overall classification accuracy over conventional classification schemes.« less

  9. Attempting to physically explain space-time correlation of extremes

    NASA Astrophysics Data System (ADS)

    Bernardara, Pietro; Gailhard, Joel

    2010-05-01

    Spatial and temporal clustering of hydro-meteorological extreme events is scientific evidence. Moreover, the statistical parameters characterizing their local frequencies of occurrence show clear spatial patterns. Thus, in order to robustly assess the hydro-meteorological hazard, statistical models need to be able to take into account spatial and temporal dependencies. Statistical models considering long term correlation for quantifying and qualifying temporal and spatial dependencies are available, such as multifractal approach. Furthermore, the development of regional frequency analysis techniques allows estimating the frequency of occurrence of extreme events taking into account spatial patterns on the extreme quantiles behaviour. However, in order to understand the origin of spatio-temporal clustering, an attempt to find physical explanation should be done. Here, some statistical evidences of spatio-temporal correlation and spatial patterns of extreme behaviour are given on a large database of more than 400 rainfall and discharge series in France. In particular, the spatial distribution of multifractal and Generalized Pareto distribution parameters shows evident correlation patterns in the behaviour of frequency of occurrence of extremes. It is then shown that the identification of atmospheric circulation pattern (weather types) can physically explain the temporal clustering of extreme rainfall events (seasonality) and the spatial pattern of the frequency of occurrence. Moreover, coupling this information with the hydrological modelization of a watershed (as in the Schadex approach) an explanation of spatio-temporal distribution of extreme discharge can also be provided. We finally show that a hydro-meteorological approach (as the Schadex approach) can explain and take into account space and time dependencies of hydro-meteorological extreme events.

  10. Fundamental questions of earthquake statistics, source behavior, and the estimation of earthquake probabilities from possible foreshocks

    USGS Publications Warehouse

    Michael, Andrew J.

    2012-01-01

    Estimates of the probability that an ML 4.8 earthquake, which occurred near the southern end of the San Andreas fault on 24 March 2009, would be followed by an M 7 mainshock over the following three days vary from 0.0009 using a Gutenberg–Richter model of aftershock statistics (Reasenberg and Jones, 1989) to 0.04 using a statistical model of foreshock behavior and long‐term estimates of large earthquake probabilities, including characteristic earthquakes (Agnew and Jones, 1991). I demonstrate that the disparity between the existing approaches depends on whether or not they conform to Gutenberg–Richter behavior. While Gutenberg–Richter behavior is well established over large regions, it could be violated on individual faults if they have characteristic earthquakes or over small areas if the spatial distribution of large‐event nucleations is disproportional to the rate of smaller events. I develop a new form of the aftershock model that includes characteristic behavior and combines the features of both models. This new model and the older foreshock model yield the same results when given the same inputs, but the new model has the advantage of producing probabilities for events of all magnitudes, rather than just for events larger than the initial one. Compared with the aftershock model, the new model has the advantage of taking into account long‐term earthquake probability models. Using consistent parameters, the probability of an M 7 mainshock on the southernmost San Andreas fault is 0.0001 for three days from long‐term models and the clustering probabilities following the ML 4.8 event are 0.00035 for a Gutenberg–Richter distribution and 0.013 for a characteristic‐earthquake magnitude–frequency distribution. Our decisions about the existence of characteristic earthquakes and how large earthquakes nucleate have a first‐order effect on the probabilities obtained from short‐term clustering models for these large events.

  11. Noise and the statistical mechanics of distributed transport in a colony of interacting agents

    NASA Astrophysics Data System (ADS)

    Katifori, Eleni; Graewer, Johannes; Ronellenfitsch, Henrik; Mazza, Marco G.

    Inspired by the process of liquid food distribution between individuals in an ant colony, in this work we consider the statistical mechanics of resource dissemination between interacting agents with finite carrying capacity. The agents move inside a confined space (nest), pick up the food at the entrance of the nest and share it with other agents that they encounter. We calculate analytically and via a series of simulations the global food intake rate for the whole colony as well as observables describing how uniformly the food is distributed within the nest. Our model and predictions provide a useful benchmark to assess which strategies can lead to efficient food distribution within the nest and also to what level the observed food uptake rates and efficiency in food distribution are due to stochastic fluctuations or specific food exchange strategies by an actual ant colony.

  12. Modeling the sound transmission between rooms coupled through partition walls by using a diffusion model.

    PubMed

    Billon, Alexis; Foy, Cédric; Picaut, Judicaël; Valeau, Vincent; Sakout, Anas

    2008-06-01

    In this paper, a modification of the diffusion model for room acoustics is proposed to account for sound transmission between two rooms, a source room and an adjacent room, which are coupled through a partition wall. A system of two diffusion equations, one for each room, together with a set of two boundary conditions, one for the partition wall and one for the other walls of a room, is obtained and numerically solved. The modified diffusion model is validated by numerical comparisons with the statistical theory for several coupled-room configurations by varying the coupling area surface, the absorption coefficient of each room, and the volume of the adjacent room. An experimental comparison is also carried out for two coupled classrooms. The modified diffusion model results agree very well with both the statistical theory and the experimental data. The diffusion model can then be used as an alternative to the statistical theory, especially when the statistical theory is not applicable, that is, when the reverberant sound field is not diffuse. Moreover, the diffusion model allows the prediction of the spatial distribution of sound energy within each coupled room, while the statistical theory gives only one sound level for each room.

  13. Statistics of SU(5) D-brane models on a type II orientifold

    NASA Astrophysics Data System (ADS)

    Gmeiner, Florian; Stein, Maren

    2006-06-01

    We perform a statistical analysis of models with SU(5) and flipped SU(5) gauge group in a type II orientifold setup. We investigate the distribution and correlation of properties of these models, including the number of generations and the hidden sector gauge group. Compared to the recent analysis [F. Gmeiner, R. Blumenhagen, G. Honecker, D. Lüst, and T. Weigand, J. High Energy Phys.JHEPFG1029-8479 01 (2006) 004; F. Gmeiner, Fortschr. Phys.FPYKA60015-8208 54, 391 (2006).10.1088/1126-6708/2006/01/004] of models with a standard model-like gauge group, we find very similar results.

  14. A statistical model for interpreting computerized dynamic posturography data

    NASA Technical Reports Server (NTRS)

    Feiveson, Alan H.; Metter, E. Jeffrey; Paloski, William H.

    2002-01-01

    Computerized dynamic posturography (CDP) is widely used for assessment of altered balance control. CDP trials are quantified using the equilibrium score (ES), which ranges from zero to 100, as a decreasing function of peak sway angle. The problem of how best to model and analyze ESs from a controlled study is considered. The ES often exhibits a skewed distribution in repeated trials, which can lead to incorrect inference when applying standard regression or analysis of variance models. Furthermore, CDP trials are terminated when a patient loses balance. In these situations, the ES is not observable, but is assigned the lowest possible score--zero. As a result, the response variable has a mixed discrete-continuous distribution, further compromising inference obtained by standard statistical methods. Here, we develop alternative methodology for analyzing ESs under a stochastic model extending the ES to a continuous latent random variable that always exists, but is unobserved in the event of a fall. Loss of balance occurs conditionally, with probability depending on the realized latent ES. After fitting the model by a form of quasi-maximum-likelihood, one may perform statistical inference to assess the effects of explanatory variables. An example is provided, using data from the NIH/NIA Baltimore Longitudinal Study on Aging.

  15. From Sub-basin to Grid Scale Soil Moisture Disaggregation in SMART, A Semi-distributed Hydrologic Modeling Framework

    NASA Astrophysics Data System (ADS)

    Ajami, H.; Sharma, A.

    2016-12-01

    A computationally efficient, semi-distributed hydrologic modeling framework is developed to simulate water balance at a catchment scale. The Soil Moisture and Runoff simulation Toolkit (SMART) is based upon the delineation of contiguous and topologically connected Hydrologic Response Units (HRUs). In SMART, HRUs are delineated using thresholds obtained from topographic and geomorphic analysis of a catchment, and simulation elements are distributed cross sections or equivalent cross sections (ECS) delineated in first order sub-basins. ECSs are formulated by aggregating topographic and physiographic properties of the part or entire first order sub-basins to further reduce computational time in SMART. Previous investigations using SMART have shown that temporal dynamics of soil moisture are well captured at a HRU level using the ECS delineation approach. However, spatial variability of soil moisture within a given HRU is ignored. Here, we examined a number of disaggregation schemes for soil moisture distribution in each HRU. The disaggregation schemes are either based on topographic based indices or a covariance matrix obtained from distributed soil moisture simulations. To assess the performance of the disaggregation schemes, soil moisture simulations from an integrated land surface-groundwater model, ParFlow.CLM in Baldry sub-catchment, Australia are used. ParFlow is a variably saturated sub-surface flow model that is coupled to the Common Land Model (CLM). Our results illustrate that the statistical disaggregation scheme performs better than the methods based on topographic data in approximating soil moisture distribution at a 60m scale. Moreover, the statistical disaggregation scheme maintains temporal correlation of simulated daily soil moisture while preserves the mean sub-basin soil moisture. Future work is focused on assessing the performance of this scheme in catchments with various topographic and climate settings.

  16. Patch-Based Generative Shape Model and MDL Model Selection for Statistical Analysis of Archipelagos

    NASA Astrophysics Data System (ADS)

    Ganz, Melanie; Nielsen, Mads; Brandt, Sami

    We propose a statistical generative shape model for archipelago-like structures. These kind of structures occur, for instance, in medical images, where our intention is to model the appearance and shapes of calcifications in x-ray radio graphs. The generative model is constructed by (1) learning a patch-based dictionary for possible shapes, (2) building up a time-homogeneous Markov model to model the neighbourhood correlations between the patches, and (3) automatic selection of the model complexity by the minimum description length principle. The generative shape model is proposed as a probability distribution of a binary image where the model is intended to facilitate sequential simulation. Our results show that a relatively simple model is able to generate structures visually similar to calcifications. Furthermore, we used the shape model as a shape prior in the statistical segmentation of calcifications, where the area overlap with the ground truth shapes improved significantly compared to the case where the prior was not used.

  17. heterogeneous mixture distributions for multi-source extreme rainfall

    NASA Astrophysics Data System (ADS)

    Ouarda, T.; Shin, J.; Lee, T. S.

    2013-12-01

    Mixture distributions have been used to model hydro-meteorological variables showing mixture distributional characteristics, e.g. bimodality. Homogeneous mixture (HOM) distributions (e.g. Normal-Normal and Gumbel-Gumbel) have been traditionally applied to hydro-meteorological variables. However, there is no reason to restrict the mixture distribution as the combination of one identical type. It might be beneficial to characterize the statistical behavior of hydro-meteorological variables from the application of heterogeneous mixture (HTM) distributions such as Normal-Gamma. In the present work, we focus on assessing the suitability of HTM distributions for the frequency analysis of hydro-meteorological variables. In the present work, in order to estimate the parameters of HTM distributions, the meta-heuristic algorithm (Genetic Algorithm) is employed to maximize the likelihood function. In the present study, a number of distributions are compared, including the Gamma-Extreme value type-one (EV1) HTM distribution, the EV1-EV1 HOM distribution, and EV1 distribution. The proposed distribution models are applied to the annual maximum precipitation data in South Korea. The Akaike Information Criterion (AIC), the root mean squared errors (RMSE) and the log-likelihood are used as measures of goodness-of-fit of the tested distributions. Results indicate that the HTM distribution (Gamma-EV1) presents the best fitness. The HTM distribution shows significant improvement in the estimation of quantiles corresponding to the 20-year return period. It is shown that extreme rainfall in the coastal region of South Korea presents strong heterogeneous mixture distributional characteristics. Results indicate that HTM distributions are a good alternative for the frequency analysis of hydro-meteorological variables when disparate statistical characteristics are presented.

  18. Origin of Pareto-like spatial distributions in ecosystems.

    PubMed

    Manor, Alon; Shnerb, Nadav M

    2008-12-31

    Recent studies of cluster distribution in various ecosystems revealed Pareto statistics for the size of spatial colonies. These results were supported by cellular automata simulations that yield robust criticality for endogenous pattern formation based on positive feedback. We show that this patch statistics is a manifestation of the law of proportionate effect. Mapping the stochastic model to a Markov birth-death process, the transition rates are shown to scale linearly with cluster size. This mapping provides a connection between patch statistics and the dynamics of the ecosystem; the "first passage time" for different colonies emerges as a powerful tool that discriminates between endogenous and exogenous clustering mechanisms. Imminent catastrophic shifts (such as desertification) manifest themselves in a drastic change of the stability properties of spatial colonies.

  19. Wavelet analysis of polarization maps of polycrystalline biological fluids networks

    NASA Astrophysics Data System (ADS)

    Ushenko, Y. A.

    2011-12-01

    The optical model of human joints synovial fluid is proposed. The statistic (statistic moments), correlation (autocorrelation function) and self-similar (Log-Log dependencies of power spectrum) structure of polarization two-dimensional distributions (polarization maps) of synovial fluid has been analyzed. It has been shown that differentiation of polarization maps of joint synovial fluid with different physiological state samples is expected of scale-discriminative analysis. To mark out of small-scale domain structure of synovial fluid polarization maps, the wavelet analysis has been used. The set of parameters, which characterize statistic, correlation and self-similar structure of wavelet coefficients' distributions of different scales of polarization domains for diagnostics and differentiation of polycrystalline network transformation connected with the pathological processes, has been determined.

  20. Nonparametric statistical modeling of binary star separations

    NASA Technical Reports Server (NTRS)

    Heacox, William D.; Gathright, John

    1994-01-01

    We develop a comprehensive statistical model for the distribution of observed separations in binary star systems, in terms of distributions of orbital elements, projection effects, and distances to systems. We use this model to derive several diagnostics for estimating the completeness of imaging searches for stellar companions, and the underlying stellar multiplicities. In application to recent imaging searches for low-luminosity companions to nearby M dwarf stars, and for companions to young stars in nearby star-forming regions, our analyses reveal substantial uncertainty in estimates of stellar multiplicity. For binary stars with late-type dwarf companions, semimajor axes appear to be distributed approximately as a(exp -1) for values ranging from about one to several thousand astronomical units. About one-quarter of the companions to field F and G dwarf stars have semimajor axes less than 1 AU, and about 15% lie beyond 1000 AU. The geometric efficiency (fraction of companions imaged onto the detector) of imaging searches is nearly independent of distances to program stars and orbital eccentricities, and varies only slowly with detector spatial limitations.

Top