binomial distributions: Topics by Science.gov

Sample records for binomial distributions

Zero-truncated negative binomial - Erlang distribution

NASA Astrophysics Data System (ADS)

Bodhisuwan, Winai; Pudprommarat, Chookait; Bodhisuwan, Rujira; Saothayanun, Luckhana

2017-11-01

The zero-truncated negative binomial-Erlang distribution is introduced. It is developed from the negative binomial-Erlang distribution. In this work, the probability mass function is derived and some properties are included. The parameters of the zero-truncated negative binomial-Erlang distribution are estimated by maximum likelihood. Finally, the proposed distribution is applied to real data, the number of methamphetamine in Bangkok, Thailand. The results show that the zero-truncated negative binomial-Erlang distribution provides a better fit than the zero-truncated Poisson, zero-truncated negative binomial, zero-truncated generalized negative-binomial and zero-truncated Poisson-Lindley distributions for these data.
Distinguishing between Binomial, Hypergeometric and Negative Binomial Distributions

ERIC Educational Resources Information Center

Wroughton, Jacqueline; Cole, Tarah

2013-01-01

Recognizing the differences between three discrete distributions (Binomial, Hypergeometric and Negative Binomial) can be challenging for students. We present an activity designed to help students differentiate among these distributions. In addition, we present assessment results in the form of pre- and post-tests that were designed to assess the…
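As a rough illustration of how the three distributions in this record differ, the sketch below writes out their probability mass functions with the Python standard library. The function names and the urn numbers are hypothetical and are not taken from the activity described in the abstract.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(k successes in n independent trials, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeometric_pmf(k, N, K, n):
    """P(k successes in n draws WITHOUT replacement from N items, K of them successes)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def negative_binomial_pmf(n, r, p):
    """P(the r-th success occurs exactly on trial n), with n >= r."""
    return comb(n - 1, r - 1) * p**r * (1 - p)**(n - r)

# Hypothetical urn: 20 items, 5 of them marked; draw 4 items.
with_replacement = binomial_pmf(1, 4, 5 / 20)          # fixed trials, constant p
without_replacement = hypergeometric_pmf(1, 20, 5, 4)  # fixed draws, p changes per draw
wait_for_second = negative_binomial_pmf(3, 2, 0.5)     # fixed successes, random trial count
```

The distinguishing question is what is held fixed: the number of trials (binomial), the number of draws from a finite population (hypergeometric), or the number of successes (negative binomial).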
Library Book Circulation and the Beta-Binomial Distribution.

ERIC Educational Resources Information Center

Gelman, E.; Sichel, H. S.

1987-01-01

Argues that library book circulation is a binomial rather than a Poisson process, and that individual book popularities are continuous beta distributions. Three examples demonstrate the superiority of beta over negative binomial distribution, and it is suggested that a bivariate-binomial process would be helpful in predicting future book…
Comparison of multiplicity distributions to the negative binomial distribution in muon-proton scattering

NASA Astrophysics Data System (ADS)

Arneodo, M.; Arvidson, A.; Aubert, J. J.; Badełek, B.; Beaufays, J.; Bee, C. P.; Benchouk, C.; Berghoff, G.; Bird, I.; Blum, D.; Böhm, E.; de Bouard, X.; Brasse, F. W.; Braun, H.; Broll, C.; Brown, S.; Brück, H.; Calen, H.; Chima, J. S.; Ciborowski, J.; Clifft, R.; Coignet, G.; Combley, F.; Coughlan, J.; D'Agostini, G.; Dahlgren, S.; Dengler, F.; Derado, I.; Dreyer, T.; Drees, J.; Düren, M.; Eckardt, V.; Edwards, A.; Edwards, M.; Ernst, T.; Eszes, G.; Favier, J.; Ferrero, M. I.; Figiel, J.; Flauger, W.; Foster, J.; Ftáčnik, J.; Gabathuler, E.; Gajewski, J.; Gamet, R.; Gayler, J.; Geddes, N.; Grafström, P.; Grard, F.; Haas, J.; Hagberg, E.; Hasert, F. J.; Hayman, P.; Heusse, P.; Jaffré, M.; Jachołkowska, A.; Janata, F.; Jancsó, G.; Johnson, A. S.; Kabuss, E. M.; Kellner, G.; Korbel, V.; Krüger, J.; Kullander, S.; Landgraf, U.; Lanske, D.; Loken, J.; Long, K.; Maire, M.; Malecki, P.; Manz, A.; Maselli, S.; Mohr, W.; Montanet, F.; Montgomery, H. E.; Nagy, E.; Nassalski, J.; Norton, P. R.; Oakham, F. G.; Osborne, A. M.; Pascaud, C.; Pawlik, B.; Payre, P.; Peroni, C.; Peschel, H.; Pessard, H.; Pettinghale, J.; Pietrzyk, B.; Pietrzyk, U.; Pönsgen, B.; Pötsch, M.; Renton, P.; Ribarics, P.; Rith, K.; Rondio, E.; Sandacz, A.; Scheer, M.; Schlagböhmer, A.; Schiemann, H.; Schmitz, N.; Schneegans, M.; Schneider, A.; Scholz, M.; Schröder, T.; Schultze, K.; Sloan, T.; Stier, H. E.; Studt, M.; Taylor, G. N.; Thénard, J. M.; Thompson, J. C.; de La Torre, A.; Toth, J.; Urban, L.; Urban, L.; Wallucks, W.; Whalley, M.; Wheeler, S.; Williams, W. S. C.; Wimpenny, S. J.; Windmolders, R.; Wolf, G.

1987-09-01

The multiplicity distributions of charged hadrons produced in the deep inelastic muon-proton scattering at 280 GeV are analysed in various rapidity intervals, as a function of the total hadronic centre of mass energy W ranging from 4 to 20 GeV. Multiplicity distributions for the backward and forward hemispheres are also analysed separately. The data can be well parameterized by binomial distributions, extending their range of applicability to the case of lepton-proton scattering. The energy and the rapidity dependence of the parameters is presented and a smooth transition from the negative binomial distribution via Poissonian to the ordinary binomial is observed.
On the p, q-binomial distribution and the Ising model

NASA Astrophysics Data System (ADS)

Lundow, P. H.; Rosengren, A.

2010-08-01

We employ p, q-binomial coefficients, a generalisation of the binomial coefficients, to describe the magnetisation distributions of the Ising model. For the complete graph this distribution corresponds exactly to the limit case p = q. We apply our investigation to the simple d-dimensional lattices for d = 1, 2, 3, 4, 5 and fit p, q-binomial distributions to our data, some of which are exact but most are sampled. For d = 1 and d = 5, the magnetisation distributions are remarkably well-fitted by p, q-binomial distributions. For d = 4 we are only slightly less successful, while for d = 2, 3 we see some deviations (with exceptions!) between the p, q-binomial and the Ising distribution. However, at certain temperatures near T_c the statistical moments of the fitted distribution agree with the moments of the sampled data within the precision of sampling. We begin the paper by giving results of the behaviour of the p, q-distribution and its moment growth exponents given a certain parameterisation of p, q. Since the moment exponents are known for the Ising model (or at least approximately for d = 3) we can predict how p, q should behave and compare this to our measured p, q. The results speak in favour of the p, q-binomial distribution's correctness regarding its general behaviour in comparison to the Ising model. The full extent to which they correctly model the Ising distribution, however, is not settled.
Distribution-free Inference of Zero-inflated Binomial Data for Longitudinal Studies.

PubMed

He, H; Wang, W J; Hu, J; Gallop, R; Crits-Christoph, P; Xia, Y L

2015-10-01

Count responses with structural zeros are very common in medical and psychosocial research, especially in alcohol and HIV research, and the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models are widely used for modeling such outcomes. However, as alcohol drinking outcomes such as days of drinking are counts within a given period, their distributions are bounded above by an upper limit (total days in the period) and thus inherently follow a binomial or zero-inflated binomial (ZIB) distribution, rather than a Poisson or zero-inflated Poisson (ZIP) distribution, in the presence of structural zeros. In this paper, we develop a new semiparametric approach for modeling zero-inflated binomial (ZIB)-like count responses for cross-sectional as well as longitudinal data. We illustrate this approach with both simulated and real study data.
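To make the bounded, zero-inflated structure concrete, here is a minimal sketch of the fully parametric ZIB probability mass function. It is not the semiparametric method the paper develops; the function name `zib_pmf` and the parameter values (a 30-day period, p = 0.2, structural-zero probability 0.3) are assumptions chosen for illustration only.

```python
from math import comb

def zib_pmf(k, n, p, pi):
    """Zero-inflated binomial pmf.

    With probability pi the response is a structural zero; otherwise it is
    Binomial(n, p), so the count can never exceed the upper limit n.
    """
    binom = comb(n, k) * p**k * (1 - p)**(n - k)
    return pi * (1 if k == 0 else 0) + (1 - pi) * binom

# Hypothetical example: drinking days in a 30-day period.
n, p, pi = 30, 0.2, 0.3
probs = [zib_pmf(k, n, p, pi) for k in range(n + 1)]
```

Unlike a Poisson or ZIP model, all probability mass sits on 0..n, which is the point the abstract makes about counts within a fixed period.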
The magnetisation distribution of the Ising model - a new approach

NASA Astrophysics Data System (ADS)

Hakan Lundow, Per; Rosengren, Anders

2010-03-01

A completely new approach to the Ising model in 1 to 5 dimensions is developed. We employ a generalisation of the binomial coefficients to describe the magnetisation distributions of the Ising model. For the complete graph this distribution is exact. For simple lattices of dimensions d=1 and d=5 the magnetisation distributions are remarkably well-fitted by the generalized binomial distributions. For d=4 we are only slightly less successful, while for d=2,3 we see some deviations (with exceptions!) between the generalized binomial and the Ising distribution. The results speak in favour of the generalized binomial distribution's correctness regarding their general behaviour in comparison to the Ising model. A theoretical analysis of the distribution's moments also lends support to their being correct asymptotically, including the logarithmic corrections in d=4. The full extent to which they correctly model the Ising distribution, and for which graph families, is not settled though.
C-5A Cargo Deck Low-Frequency Vibration Environment

DTIC Science & Technology

1975-02-01

SAMPLE VIBRATION CALCULATIONS 13 1. Normal Distribution 13 2. Binomial Distribution 15 IV CONCLUSIONS 17 V REFERENCES 18 ...Calculation for Binomial Distribution 108 (Vertical Acceleration, Right Rear Cargo Deck) xi I. INTRODUCTION The availability of large transport...the end of taxi. These peaks could then be used directly to compile the probability of occurrence of specific values of acceleration using the binomial
Analysis of generalized negative binomial distributions attached to hyperbolic Landau levels

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chhaiba, Hassan, E-mail: chhaiba.hassan@gmail.com; Demni, Nizar, E-mail: nizar.demni@univ-rennes1.fr; Mouayn, Zouhair, E-mail: mouayn@fstbm.ac.ma

2016-07-15

To each hyperbolic Landau level of the Poincaré disc is attached a generalized negative binomial distribution. In this paper, we compute the moment generating function of this distribution and supply its atomic decomposition as a perturbation of the negative binomial distribution by a finitely supported measure. Using the Mandel parameter, we also discuss the nonclassical nature of the associated coherent states. Next, we derive a Lévy-Khintchine-type representation of its characteristic function when the latter does not vanish and deduce that it is quasi-infinitely divisible except for the lowest hyperbolic Landau level corresponding to the negative binomial distribution. By considering the total variation of the obtained quasi-Lévy measure, we introduce a new infinitely divisible distribution for which we derive the characteristic function.
Sampling--how big a sample?

PubMed

Aitken, C G

1999-07-01

It is thought that, in a consignment of discrete units, a certain proportion of the units contain illegal material. A sample of the consignment is to be inspected. Various methods for the determination of the sample size are compared. The consignment will be considered as a random sample from some super-population of units, a certain proportion of which contain drugs. For large consignments, a probability distribution, known as the beta distribution, for the proportion of the consignment which contains illegal material is obtained. This distribution is based on prior beliefs about the proportion. Under certain specific conditions the beta distribution gives the same numerical results as an approach based on the binomial distribution. The binomial distribution provides a probability for the number of units in a sample which contain illegal material, conditional on knowing the proportion of the consignment which contains illegal material. This is in contrast to the beta distribution which provides probabilities for the proportion of a consignment which contains illegal material, conditional on knowing the number of units in the sample which contain illegal material. The interpretation when the beta distribution is used is much more intuitively satisfactory. It is also much more flexible in its ability to cater for prior beliefs which may vary given the different circumstances of different crimes. For small consignments, a distribution, known as the beta-binomial distribution, for the number of units in the consignment which are found to contain illegal material, is obtained, based on prior beliefs about the number of units in the consignment which are thought to contain illegal material. As with the beta and binomial distributions for large samples, it is shown that, in certain specific conditions, the beta-binomial and hypergeometric distributions give the same numerical results. 
However, the beta-binomial distribution, as with the beta distribution, has a more intuitively satisfactory interpretation and greater flexibility. The beta and the beta-binomial distributions provide methods for the determination of the minimum sample size to be taken from a consignment in order to satisfy a certain criterion. The criterion requires the specification of a proportion and a probability.
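The beta-distribution criterion for large consignments can be sketched under two simplifying assumptions that the abstract does not state: a uniform Beta(1, 1) prior on the proportion, and every inspected unit turning out to contain illegal material. Under those assumptions the posterior after m positive units is Beta(m + 1, 1), so the probability that the proportion exceeds a threshold theta0 is 1 - theta0^(m + 1). The function names below are hypothetical.

```python
def posterior_prob_exceeds(theta0, m):
    """P(proportion > theta0 | all m inspected units positive), uniform prior.

    The posterior is Beta(m + 1, 1), whose CDF at theta0 is theta0 ** (m + 1).
    """
    return 1.0 - theta0 ** (m + 1)

def min_sample_size(theta0, prob):
    """Smallest m such that P(proportion > theta0) >= prob under the assumptions above."""
    if not (0.0 < theta0 < 1.0 and 0.0 < prob < 1.0):
        raise ValueError("theta0 and prob must lie strictly between 0 and 1")
    m = 0
    while posterior_prob_exceeds(theta0, m) < prob:
        m += 1
    return m

# Criterion: be 95% sure that more than half of the consignment is positive.
print(min_sample_size(0.5, 0.95))  # prints 4
```

The two arguments correspond to the proportion and the probability that the criterion requires. A more informative prior, or a sample containing some negative units, would change the posterior and therefore the answer.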
Choosing a Transformation in Analyses of Insect Counts from Contagious Distributions with Low Means

Treesearch

W.D. Pepper; S.J. Zarnoch; G.L. DeBarr; P. de Groot; C.D. Tangren

1997-01-01

Guidelines based on computer simulation are suggested for choosing a transformation of insect counts from negative binomial distributions with low mean counts and high levels of contagion. Typical values and ranges of negative binomial model parameters were determined by fitting the model to data from 19 entomological field studies. Random sampling of negative binomial...
A Monte Carlo Risk Analysis of Life Cycle Cost Prediction.

DTIC Science & Technology

1975-09-01

process which occurs with each FLU failure. With this in mind there is no alternative other than the binomial distribution. 24 GOR/SM/75D-6 With all of...Weibull distribution of failures as selected by user. For each failure of the ith FLU, the model then samples from the binomial distribution to determine...which is sampled from the binomial. Neither of the two conditions for normality are met, i.e., that RTS be close to .5 and the number of samples close
The Binomial Distribution in Shooting

ERIC Educational Resources Information Center

Chalikias, Miltiadis S.

2009-01-01

The binomial distribution is used to predict the winner of the 49th International Shooting Sport Federation World Championship in double trap shooting held in 2006 in Zagreb, Croatia. The outcome of the competition was definitely unexpected.
Phase transition and information cascade in a voting model

NASA Astrophysics Data System (ADS)

Hisakado, M.; Mori, S.

2010-08-01

In this paper, we introduce a voting model that is similar to a Keynesian beauty contest and analyse it from a mathematical point of view. There are two types of voters—copycat and independent—and two candidates. Our voting model is a binomial distribution (independent voters) doped in a beta binomial distribution (copycat voters). We find that the phase transition in this system is at the upper limit of t, where t is the time (or the number of the votes). Our model contains three phases. If copycats constitute a majority or even half of the total voters, the voting rate converges more slowly than it would in a binomial distribution. If independents constitute the majority of voters, the voting rate converges at the same rate as it would in a binomial distribution. We also study why it is difficult to estimate the conclusion of a Keynesian beauty contest when there is an information cascade.
Estimating the Parameters of the Beta-Binomial Distribution.

ERIC Educational Resources Information Center

Wilcox, Rand R.

1979-01-01

For some situations the beta-binomial distribution might be used to describe the marginal distribution of test scores for a particular population of examinees. Several different methods of approximating the maximum likelihood estimate were investigated, and it was found that the Newton-Raphson method should be used when it yields admissible…
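For orientation, the sketch below writes out the beta-binomial probability mass function and the log-likelihood that a maximum likelihood procedure would maximize over the two shape parameters. It does not implement Newton-Raphson or any of the approximations the record compares; the function names and the symbols a and b for the shape parameters are assumptions.

```python
from math import comb, exp, lgamma, log

def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    """P(score = k out of n items) when the success probability is Beta(a, b)."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))

def log_likelihood(scores, n, a, b):
    """Log-likelihood of observed test scores; an optimizer would maximize this in (a, b)."""
    return sum(log(beta_binomial_pmf(k, n, a, b)) for k in scores)

# With a = b = 1 the success probability is uniform, so every score 0..n is equally likely.
uniform_case = beta_binomial_pmf(3, 10, 1.0, 1.0)
```

Estimates are admissible only when both shape parameters are positive, which is why an iterative method can fail to return a usable answer for some data sets.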
Reliability of environmental sampling culture results using the negative binomial intraclass correlation coefficient.

PubMed

Aly, Sharif S; Zhao, Jianyang; Li, Ben; Jiang, Jiming

2014-01-01

The Intraclass Correlation Coefficient (ICC) is commonly used to estimate the similarity between quantitative measures obtained from different sources. Overdispersed data is traditionally transformed so that linear mixed model (LMM) based ICC can be estimated. A common transformation used is the natural logarithm. The reliability of environmental sampling of fecal slurry on freestall pens has been estimated for Mycobacterium avium subsp. paratuberculosis using the natural logarithm transformed culture results. Recently, the negative binomial ICC was defined based on a generalized linear mixed model for negative binomial distributed data. The current study reports on the negative binomial ICC estimate which includes fixed effects using culture results of environmental samples. Simulations using a wide variety of inputs and negative binomial distribution parameters (r; p) showed better performance of the new negative binomial ICC compared to the ICC based on LMM, even when the negative binomial data were logarithm- and square-root-transformed. A second comparison that targeted a wider range of ICC values showed that the mean of estimated ICC closely approximated the true ICC.
Simulation on Poisson and negative binomial models of count road accident modeling

NASA Astrophysics Data System (ADS)

Sapuan, M. S.; Razali, A. M.; Zamzuri, Z. H.; Ibrahim, K.

2016-11-01

Accident count data have often been shown to be overdispersed. In addition, the data might contain excess zero counts. A simulation study was conducted to create scenarios in which accidents happen at a T-junction, under the assumption that the dependent variable of the generated data follows a certain distribution, namely the Poisson or the negative binomial distribution, with sample sizes from n=30 to n=500. The study objective was accomplished by fitting Poisson regression, negative binomial regression and hurdle negative binomial models to the simulated data. The model validation results were compared, and the simulation results show that, for each sample size, not all models fit the data well even when the data were generated from the model's own distribution, especially when the sample size is larger. Furthermore, larger sample sizes were associated with more zero accident counts in the dataset.
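The two features named in this abstract, overdispersion and excess zeros, can be seen directly from the closed-form moments of the two distributions. The sketch below assumes the mean/dispersion parameterization of the negative binomial (mean mu, dispersion k, variance mu + mu^2/k); the study's own parameterization and values are not given in the abstract, so the numbers here are purely illustrative.

```python
from math import exp

def poisson_zero_prob(mu):
    """P(count = 0) for a Poisson distribution with mean mu (variance equals mu)."""
    return exp(-mu)

def negbin_zero_prob(mu, k):
    """P(count = 0) for a negative binomial with mean mu and dispersion k."""
    return (k / (k + mu)) ** k

def negbin_variance(mu, k):
    """Variance of the negative binomial; it exceeds mu for every finite k."""
    return mu + mu**2 / k

# Same mean, very different spread and zero mass (illustrative values).
mu, k = 2.0, 0.5
print(poisson_zero_prob(mu), negbin_zero_prob(mu, k), negbin_variance(mu, k))
```

As k grows the negative binomial approaches the Poisson, so a small k is what produces both the inflated variance and the larger share of zero counts.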
[The reentrant binomial model of nuclear anomalies growth in rhabdomyosarcoma RA-23 cell populations under increasing dose of rare ionizing radiation].

PubMed

Alekseeva, N P; Alekseev, A O; Vakhtin, Iu B; Kravtsov, V Iu; Kuzovatov, S N; Skorikova, T I

2008-01-01

Distributions of nuclear morphology anomalies in transplantable rhabdomyosarcoma RA-23 cell populations were investigated under ionizing radiation doses from 0 to 45 Gy. Internuclear bridges, nuclear protrusions and dumbbell-shaped nuclei were counted as morphological anomalies. Empirical distributions of the number of anomalies per 100 nuclei were used. The reentrant binomial distribution was found to be an adequate model. It is the distribution of a sum of binomial random variables with a binomial number of summands. The averages of these random variables were named, accordingly, the internal and external average reentrant components. Their maximum likelihood estimates were obtained. Statistical properties of these estimates were investigated by means of statistical modeling. It was found that, at equally significant correlation between the radiation dose and the average number of nuclear anomalies, in cell populations two to three cell cycles after irradiation in vivo the radiation dose significantly correlates with the internal average reentrant component, whereas in remote descendants of cell transplants irradiated in vitro it correlates with the external one.
Meta-analysis of studies with bivariate binary outcomes: a marginal beta-binomial model approach

PubMed Central

Chen, Yong; Hong, Chuan; Ning, Yang; Su, Xiao

2018-01-01

When conducting a meta-analysis of studies with bivariate binary outcomes, challenges arise when the within-study correlation and between-study heterogeneity should be taken into account. In this paper, we propose a marginal beta-binomial model for the meta-analysis of studies with binary outcomes. This model is based on the composite likelihood approach, and has several attractive features compared to the existing models such as bivariate generalized linear mixed model (Chu and Cole, 2006) and Sarmanov beta-binomial model (Chen et al., 2012). The advantages of the proposed marginal model include modeling the probabilities in the original scale, not requiring any transformation of probabilities or any link function, having closed-form expression of likelihood function, and no constraints on the correlation parameter. More importantly, since the marginal beta-binomial model is only based on the marginal distributions, it does not suffer from potential misspecification of the joint distribution of bivariate study-specific probabilities. Such misspecification is difficult to detect and can lead to biased inference using current methods. We compare the performance of the marginal beta-binomial model with the bivariate generalized linear mixed model and the Sarmanov beta-binomial model by simulation studies. Interestingly, the results show that the marginal beta-binomial model performs better than the Sarmanov beta-binomial model, whether or not the true model is Sarmanov beta-binomial, and the marginal beta-binomial model is more robust than the bivariate generalized linear mixed model under model misspecifications. Two meta-analyses of diagnostic accuracy studies and a meta-analysis of case-control studies are conducted for illustration. PMID:26303591
Binomial Baseball.

ERIC Educational Resources Information Center

Levin, Eugene M.

1981-01-01

Student access to programmable calculators and computer terminals, coupled with a familiarity with baseball, provides opportunities to enhance their understanding of the binomial distribution and other aspects of analysis. (MP)

A Three-Parameter Generalisation of the Beta-Binomial Distribution with Applications

DTIC Science & Technology

1987-07-01

York. Rust, R.T. and Klompmaker, J.E. (1981). Improving the estimation procedure for the beta binomial t.v. exposure model. Journal of Marketing ... Research. 18, 442-448. Sabavala, D.J. and Morrison, D.G. (1977). Television show loyalty: a beta-binomial model using recall data. Journal of Advertising
Use of the negative binomial-truncated Poisson distribution in thunderstorm prediction

NASA Technical Reports Server (NTRS)

Cohen, A. C.

1971-01-01

A probability model is presented for the distribution of thunderstorms over a small area given that thunderstorm events (1 or more thunderstorms) are occurring over a larger area. The model incorporates the negative binomial and truncated Poisson distributions. Probability tables for Cape Kennedy for spring, summer, and fall months and seasons are presented. The computer program used to compute these probabilities is appended.
Meta-analysis of studies with bivariate binary outcomes: a marginal beta-binomial model approach.

PubMed

Chen, Yong; Hong, Chuan; Ning, Yang; Su, Xiao

2016-01-15

When conducting a meta-analysis of studies with bivariate binary outcomes, challenges arise when the within-study correlation and between-study heterogeneity should be taken into account. In this paper, we propose a marginal beta-binomial model for the meta-analysis of studies with binary outcomes. This model is based on the composite likelihood approach and has several attractive features compared with the existing models such as bivariate generalized linear mixed model (Chu and Cole, 2006) and Sarmanov beta-binomial model (Chen et al., 2012). The advantages of the proposed marginal model include modeling the probabilities in the original scale, not requiring any transformation of probabilities or any link function, having closed-form expression of likelihood function, and no constraints on the correlation parameter. More importantly, because the marginal beta-binomial model is only based on the marginal distributions, it does not suffer from potential misspecification of the joint distribution of bivariate study-specific probabilities. Such misspecification is difficult to detect and can lead to biased inference using current methods. We compare the performance of the marginal beta-binomial model with the bivariate generalized linear mixed model and the Sarmanov beta-binomial model by simulation studies. Interestingly, the results show that the marginal beta-binomial model performs better than the Sarmanov beta-binomial model, whether or not the true model is Sarmanov beta-binomial, and the marginal beta-binomial model is more robust than the bivariate generalized linear mixed model under model misspecifications. Two meta-analyses of diagnostic accuracy studies and a meta-analysis of case-control studies are conducted for illustration. Copyright © 2015 John Wiley & Sons, Ltd.
Using the β-binomial distribution to characterize forest health

Treesearch

S.J. Zarnoch; R.L. Anderson; R.M. Sheffield

1995-01-01

The β-binomial distribution is suggested as a model for describing and analyzing the dichotomous data obtained from programs monitoring the health of forests in the United States. Maximum likelihood estimation of the parameters is given as well as asymptotic likelihood ratio tests. The procedure is illustrated with data on dogwood anthracnose infection (caused...
Pricing American Asian options with higher moments in the underlying distribution

NASA Astrophysics Data System (ADS)

Lo, Keng-Hsin; Wang, Kehluh; Hsu, Ming-Feng

2009-01-01

We develop a modified Edgeworth binomial model with higher moment consideration for pricing American Asian options. With lognormal underlying distribution for benchmark comparison, our algorithm is as precise as that of Chalasani et al. [P. Chalasani, S. Jha, F. Egriboyun, A. Varikooty, A refined binomial lattice for pricing American Asian options, Rev. Derivatives Res. 3 (1) (1999) 85-105] if the number of the time steps increases. If the underlying distribution displays negative skewness and leptokurtosis as often observed for stock index returns, our estimates can work better than those in Chalasani et al. [P. Chalasani, S. Jha, F. Egriboyun, A. Varikooty, A refined binomial lattice for pricing American Asian options, Rev. Derivatives Res. 3 (1) (1999) 85-105] and are very similar to the benchmarks in Hull and White [J. Hull, A. White, Efficient procedures for valuing European and American path-dependent options, J. Derivatives 1 (Fall) (1993) 21-31]. The numerical analysis shows that our modified Edgeworth binomial model can value American Asian options with greater accuracy and speed given higher moments in their underlying distribution.
Distribution pattern of public transport passenger in Yogyakarta, Indonesia

NASA Astrophysics Data System (ADS)

Narendra, Alfa; Malkhamah, Siti; Sopha, Bertha Maya

2018-03-01

The arrival and departure distribution pattern of Trans Jogja bus passengers is one of the fundamental models for simulation. The purpose of this paper is to build models of passenger flows. This research used passenger data from January to May 2014. No policy has since changed the operating system in a way that affects the nature of this pattern. The roads, buses, land uses, schedule, and people are still relatively the same. The data were categorized by direction, day, and location. Each category was then fitted to several well-known discrete distributions. Those distributions were compared based on their AIC and BIC values. The chosen model is the one with the smallest AIC and BIC values, which was found to be the negative binomial distribution. Probability mass function (PMF) plots of those models were compared to draw a generic model from the categorical negative binomial distribution models. The accepted generic negative binomial distribution has values of 0.7064 and 1.4504 of mu. The minimum and maximum passenger vector values of the distribution are 0 and 41.
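The model-selection step described here, picking the distribution with the smallest AIC and BIC, can be sketched as follows. The passenger counts are invented for illustration, and the two candidates (Poisson and geometric) were chosen only because their maximum likelihood estimates have closed forms; the paper's own candidate set and data are not reproduced.

```python
from math import lgamma, log

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * loglik

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * log(n_obs) - 2 * loglik

def poisson_loglik(counts):
    """Maximized Poisson log-likelihood; the MLE of the rate is the sample mean."""
    lam = sum(counts) / len(counts)
    return sum(k * log(lam) - lam - lgamma(k + 1) for k in counts)

def geometric_loglik(counts):
    """Maximized geometric log-likelihood on support 0, 1, 2, ...; MLE p = 1 / (1 + mean)."""
    p = 1.0 / (1.0 + sum(counts) / len(counts))
    return sum(log(p) + k * log(1 - p) for k in counts)

# Hypothetical, overdispersed passenger counts per bus arrival.
counts = [0, 0, 0, 0, 0, 1, 1, 2, 3, 8, 0, 0, 1, 5, 12]
scores = {
    "poisson": aic(poisson_loglik(counts), 1),
    "geometric": aic(geometric_loglik(counts), 1),
}
best = min(scores, key=scores.get)  # smallest AIC wins
```

Both criteria penalize extra parameters, BIC more heavily for larger samples, so they can disagree when candidate models differ in the number of parameters.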
Fitting statistical distributions to sea duck count data: implications for survey design and abundance estimation

USGS Publications Warehouse

Zipkin, Elise F.; Leirness, Jeffery B.; Kinlan, Brian P.; O'Connell, Allan F.; Silverman, Emily D.

2014-01-01

Determining appropriate statistical distributions for modeling animal count data is important for accurate estimation of abundance, distribution, and trends. In the case of sea ducks along the U.S. Atlantic coast, managers want to estimate local and regional abundance to detect and track population declines, to define areas of high and low use, and to predict the impact of future habitat change on populations. In this paper, we used a modified marked point process to model survey data that recorded flock sizes of Common eiders, Long-tailed ducks, and Black, Surf, and White-winged scoters. The data come from an experimental aerial survey, conducted by the United States Fish & Wildlife Service (USFWS) Division of Migratory Bird Management, during which east-west transects were flown along the Atlantic Coast from Maine to Florida during the winters of 2009–2011. To model the number of flocks per transect (the points), we compared the fit of four statistical distributions (zero-inflated Poisson, zero-inflated geometric, zero-inflated negative binomial and negative binomial) to data on the number of species-specific sea duck flocks that were recorded for each transect flown. To model the flock sizes (the marks), we compared the fit of flock size data for each species to seven statistical distributions: positive Poisson, positive negative binomial, positive geometric, logarithmic, discretized lognormal, zeta and Yule–Simon. Akaike’s Information Criterion and Vuong’s closeness tests indicated that the negative binomial and discretized lognormal were the best distributions for all species for the points and marks, respectively. 
These findings have important implications for estimating sea duck abundances as the discretized lognormal is a more skewed distribution than the Poisson and negative binomial, which are frequently used to model avian counts; the lognormal is also less heavy-tailed than the power law distributions (e.g., zeta and Yule–Simon), which are becoming increasingly popular for group size modeling. Choosing appropriate statistical distributions for modeling flock size data is fundamental to accurately estimating population summaries, determining required survey effort, and assessing and propagating uncertainty through decision-making processes.
Statistical methods for the beta-binomial model in teratology.

PubMed Central

Yamamoto, E; Yanagimoto, T

1994-01-01

The beta-binomial model is widely used for analyzing teratological data involving littermates. Recent developments in statistical analyses of teratological data are briefly reviewed with emphasis on the model. For statistical inference of the parameters in the beta-binomial distribution, separation of the likelihood introduces a likelihood inference. This leads to reducing biases of estimators and also to improving accuracy of empirical significance levels of tests. Separate inference of the parameters can be conducted in a unified way. PMID:8187716
The coefficient of determination R2 and intra-class correlation coefficient from generalized linear mixed-effects models revisited and expanded.

PubMed

Nakagawa, Shinichi; Johnson, Paul C D; Schielzeth, Holger

2017-09-01

The coefficient of determination R2 quantifies the proportion of variance explained by a statistical model and is an important summary statistic of biological interest. However, estimating R2 for generalized linear mixed models (GLMMs) remains challenging. We have previously introduced a version of R2 that we called [Formula: see text] for Poisson and binomial GLMMs, but not for other distributional families. Similarly, we earlier discussed how to estimate intra-class correlation coefficients (ICCs) using Poisson and binomial GLMMs. In this paper, we generalize our methods to all other non-Gaussian distributions, in particular to negative binomial and gamma distributions that are commonly used for modelling biological data. While expanding our approach, we highlight two useful concepts for biologists, Jensen's inequality and the delta method, both of which help us in understanding the properties of GLMMs. Jensen's inequality has important implications for biologically meaningful interpretation of GLMMs, whereas the delta method allows a general derivation of variance associated with non-Gaussian distributions. We also discuss some special considerations for binomial GLMMs with binary or proportion data. We illustrate the implementation of our extension by worked examples from the field of ecology and evolution in the R environment. However, our method can be used across disciplines and regardless of statistical environments. © 2017 The Author(s).
Distribution pattern of phthirapterans infesting certain common Indian birds.

PubMed

Saxena, A K; Kumar, Sandeep; Gupta, Nidhi; Mitra, J D; Ali, S A; Srivastava, Roshni

2007-08-01

The prevalence and frequency distribution patterns of 10 phthirapteran species infesting house sparrows, Indian parakeets, common mynas, and white breasted kingfishers were recorded in the district of Rampur, India, during 2004-05. The sample mean abundances, mean intensities, range of infestations, variance to mean ratios, values of the exponent of the negative binomial distribution, and the indices of discrepancy were also computed. Frequency distribution patterns of all phthirapteran species were skewed, but the observed frequencies did not correspond to the negative binomial distribution. Thus, adult-nymph ratios varied in different species from 1:0.53 to 1:1.25. Sex ratios of different phthirapteran species ranged from 1:1.10 to 1:1.65 and were female biased.
A methodology to design heuristics for model selection based on the characteristics of data: Application to investigate when the Negative Binomial Lindley (NB-L) is preferred over the Negative Binomial (NB).

PubMed

Shirazi, Mohammadali; Dhavala, Soma Sekhar; Lord, Dominique; Geedipally, Srinivas Reddy

2017-10-01

Safety analysts usually use post-modeling methods, such as the Goodness-of-Fit statistics or the Likelihood Ratio Test, to decide between two or more competing distributions or models. Such metrics require all competing distributions to be fitted to the data before any comparison can be made. Given the continuous introduction of new statistical distributions, choosing the best one using such post-modeling methods is not a trivial task, in addition to all the theoretical or numerical issues the analyst may face during the analysis. Furthermore, and most importantly, these measures or tests do not provide any intuition as to why a specific distribution (or model) is preferred over another (Goodness-of-Logic). This paper addresses these issues by proposing a methodology to design heuristics for Model Selection based on the characteristics of data, in terms of descriptive summary statistics, before fitting the models. The proposed methodology employs two analytic tools, (1) Monte-Carlo Simulations and (2) Machine Learning Classifiers, to design easy heuristics to predict the label of the 'most-likely-true' distribution for analyzing data. The proposed methodology was applied to investigate when the recently introduced Negative Binomial Lindley (NB-L) distribution is preferred over the Negative Binomial (NB) distribution. Heuristics were designed to select the 'most-likely-true' distribution between these two distributions, given a set of prescribed summary statistics of data. The proposed heuristics were successfully compared against classical tests for several real or observed datasets. Not only are they easy to use, requiring no post-modeling inputs, but they also give the analyst useful information about why the NB-L is preferred over the NB, or vice versa, when modeling data. Copyright © 2017 Elsevier Ltd. All rights reserved.
Dispersion and sampling of adult Dermacentor andersoni in rangeland in Western North America.

PubMed

Rochon, K; Scoles, G A; Lysyk, T J

2012-03-01

A fixed precision sampling plan was developed for off-host populations of adult Rocky Mountain wood tick, Dermacentor andersoni (Stiles) based on data collected by dragging at 13 locations in Alberta, Canada; Washington; and Oregon. In total, 222 site-date combinations were sampled. Each site-date combination was considered a sample, and each sample ranged in size from 86 to 250 10 m2 quadrats. Analysis of simulated quadrats ranging in size from 10 to 50 m2 indicated that the most precise sample unit was the 10 m2 quadrat. Samples taken when abundance < 0.04 ticks per 10 m2 were more likely to not depart significantly from statistical randomness than samples taken when abundance was greater. Data were grouped into ten abundance classes and assessed for fit to the Poisson and negative binomial distributions. The Poisson distribution fit only data in abundance classes < 0.02 ticks per 10 m2, while the negative binomial distribution fit data from all abundance classes. A negative binomial distribution with common k = 0.3742 fit data in eight of the 10 abundance classes. Both the Taylor and Iwao mean-variance relationships were fit and used to predict sample sizes for a fixed level of precision. Sample sizes predicted using the Taylor model tended to underestimate actual sample sizes, while sample sizes estimated using the Iwao model tended to overestimate actual sample sizes. Using a negative binomial with common k provided estimates of required sample sizes closest to empirically calculated sample sizes.
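The sample-size predictions described above follow from fixing the precision D = SE/mean and substituting a mean-variance relationship. The sketch below is a generic illustration, not the authors' code: the function names, the precision level 0.25, and the example densities are assumptions; only k = 0.3742 comes from the abstract. It assumes the usual forms s² = m + m²/k (negative binomial), s² = a·m^b (Taylor), and s² = (α + 1)m + (β − 1)m² (Iwao).

```python
def sample_size_negbin(mean, k, precision):
    """Quadrats needed for precision D = SE/mean when variance = mean + mean^2 / k."""
    return (1.0 / mean + 1.0 / k) / precision ** 2


def sample_size_taylor(mean, a, b, precision):
    """Quadrats needed when variance = a * mean^b (Taylor's power law)."""
    return a * mean ** (b - 2.0) / precision ** 2


def sample_size_iwao(mean, alpha, beta, precision):
    """Quadrats needed when variance = (alpha + 1) * mean + (beta - 1) * mean^2 (Iwao)."""
    return ((alpha + 1.0) / mean + beta - 1.0) / precision ** 2


if __name__ == "__main__":
    common_k = 0.3742   # common k reported in the abstract
    precision = 0.25    # assumed precision level, for illustration only
    for density in (0.02, 0.04, 0.1, 0.5):  # hypothetical ticks per 10 m2 quadrat
        n = sample_size_negbin(density, common_k, precision)
        print(f"mean = {density:>4}: about {n:.0f} quadrats")
```

With a = 1 and b = 1 the Taylor form reduces to the Poisson case, which is a convenient sanity check on fitted coefficients.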
Use of negative binomial distribution to describe the presence of Anisakis in Thyrsites atun.

PubMed

Peña-Rehbein, Patricio; De los Ríos-Escalante, Patricio

2012-01-01

Nematodes of the genus Anisakis have marine fishes as intermediate hosts. One of these hosts is Thyrsites atun, an important fishery resource in Chile between 38 and 41° S. This paper describes the frequency and number of Anisakis nematodes in the internal organs of Thyrsites atun. An analysis based on spatial distribution models showed that the parasites tend to be clustered. The variation in the number of parasites per host could be described by the negative binomial distribution. The maximum observed number of parasites was nine parasites per host. The environmental and zoonotic aspects of the study are also discussed.
Enumerative and binomial sequential sampling plans for the multicolored Asian lady beetle (Coleoptera: Coccinellidae) in wine grapes.

PubMed

Galvan, T L; Burkness, E C; Hutchison, W D

2007-06-01

To develop a practical integrated pest management (IPM) system for the multicolored Asian lady beetle, Harmonia axyridis (Pallas) (Coleoptera: Coccinellidae), in wine grapes, we assessed the spatial distribution of H. axyridis and developed eight sampling plans to estimate adult density or infestation level in grape clusters. We used 49 data sets collected from commercial vineyards in 2004 and 2005, in Minnesota and Wisconsin. Enumerative plans were developed using two precision levels (0.10 and 0.25); the six binomial plans reflected six unique action thresholds (3, 7, 12, 18, 22, and 31% of cluster samples infested with at least one H. axyridis). The spatial distribution of H. axyridis in wine grapes was aggregated, independent of cultivar and year, but it was more randomly distributed as mean density declined. The average sample number (ASN) for each sampling plan was determined using resampling software. For research purposes, an enumerative plan with a precision level of 0.10 (SE/X) resulted in a mean ASN of 546 clusters. For IPM applications, the enumerative plan with a precision level of 0.25 resulted in a mean ASN of 180 clusters. In contrast, the binomial plans resulted in much lower ASNs and provided high probabilities of arriving at correct "treat or no-treat" decisions, making these plans more efficient for IPM applications. For a tally threshold of one adult per cluster, the operating characteristic curves for the six action thresholds provided binomial sequential sampling plans with mean ASNs of only 19-26 clusters, and probabilities of making correct decisions between 83 and 96%. The benefits of the binomial sampling plans are discussed within the context of improving IPM programs for wine grapes.
Football goal distributions and extremal statistics

NASA Astrophysics Data System (ADS)

Greenhough, J.; Birch, P. C.; Chapman, S. C.; Rowlands, G.

2002-12-01

We analyse the distributions of the number of goals scored by home teams, away teams, and the total scored in the match, in domestic football games from 169 countries between 1999 and 2001. The probability density functions (PDFs) of goals scored are too heavy-tailed to be fitted over their entire ranges by Poisson or negative binomial distributions which would be expected for uncorrelated processes. Log-normal distributions cannot include zero scores and here we find that the PDFs are consistent with those arising from extremal statistics. In addition, we show that it is sufficient to model English top division and FA Cup matches in the seasons of 1970/71-2000/01 on Poisson or negative binomial distributions, as reported in analyses of earlier seasons, and that these are not consistent with extremal statistics.
Accounting for non-independent detection when estimating abundance of organisms with a Bayesian approach

USGS Publications Warehouse

Martin, Julien; Royle, J. Andrew; MacKenzie, Darryl I.; Edwards, Holly H.; Kery, Marc; Gardner, Beth

2011-01-01

Summary 1. Binomial mixture models use repeated count data to estimate abundance. They are becoming increasingly popular because they provide a simple and cost-effective way to account for imperfect detection. However, these models assume that individuals are detected independently of each other. This assumption may often be violated in the field. For instance, manatees (Trichechus manatus latirostris) may surface in turbid water (i.e. become available for detection during aerial surveys) in a correlated manner (i.e. in groups). However, correlated behaviour, affecting the non-independence of individual detections, may also be relevant in other systems (e.g. correlated patterns of singing in birds and amphibians). 2. We extend binomial mixture models to account for correlated behaviour and therefore to account for non-independent detection of individuals. We simulated correlated behaviour using beta-binomial random variables. Our approach can be used to simultaneously estimate abundance, detection probability and a correlation parameter. 3. Fitting binomial mixture models to data that followed a beta-binomial distribution resulted in an overestimation of abundance even for moderate levels of correlation. In contrast, the beta-binomial mixture model performed considerably better in our simulation scenarios. We also present a goodness-of-fit procedure to evaluate the fit of beta-binomial mixture models. 4. We illustrate our approach by fitting both binomial and beta-binomial mixture models to aerial survey data of manatees in Florida. We found that the binomial mixture model did not fit the data, whereas there was no evidence of lack of fit for the beta-binomial mixture model. This example helps illustrate the importance of using simulations and assessing goodness-of-fit when analysing ecological data with N-mixture models. 
Indeed, both the simulations and the goodness-of-fit procedure highlighted the limitations of the standard binomial mixture model for aerial manatee surveys. 5. Overestimation of abundance by binomial mixture models owing to non-independent detections is problematic for ecological studies, but also for conservation. For example, in the case of endangered species, it could lead to inappropriate management decisions, such as downlisting. These issues will be increasingly relevant as more ecologists apply flexible N-mixture models to ecological data.
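The idea of simulating correlated detections with beta-binomial random variables can be sketched as follows. This is a minimal illustration under assumptions, not the authors' simulation code: the function names are hypothetical, and the correlation parameter is assumed to be the usual intra-class correlation ρ = 1/(a + b + 1) of a Beta(a, b) detection probability shared by all individuals at a site visit.

```python
import random
import statistics


def beta_params(p, rho):
    """Beta(a, b) with mean p and intra-class correlation rho, for 0 < rho < 1."""
    scale = (1.0 - rho) / rho
    return p * scale, (1.0 - p) * scale


def simulate_count(n_individuals, p, rho, rng):
    """One beta-binomial count: all individuals share a random detection probability."""
    a, b = beta_params(p, rho)
    shared_p = rng.betavariate(a, b)
    return sum(rng.random() < shared_p for _ in range(n_individuals))


def simulate_independent_count(n_individuals, p, rng):
    """One ordinary binomial count: individuals are detected independently."""
    return sum(rng.random() < p for _ in range(n_individuals))


if __name__ == "__main__":
    rng = random.Random(1)
    correlated = [simulate_count(50, 0.5, 0.3, rng) for _ in range(2000)]
    independent = [simulate_independent_count(50, 0.5, rng) for _ in range(2000)]
    # Correlated detection inflates the variance of repeated counts.
    print("variance, correlated :", statistics.variance(correlated))
    print("variance, independent:", statistics.variance(independent))
```

The extra variance in the correlated counts is what a plain binomial mixture model has to absorb, which is consistent with the overestimation of abundance reported in the abstract.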
Selecting the right statistical model for analysis of insect count data by using information theoretic measures.

PubMed

Sileshi, G

2006-10-01

Researchers and regulatory agencies often make statistical inferences from insect count data using modelling approaches that assume homogeneous variance. Such models do not allow for formal appraisal of variability which in its different forms is the subject of interest in ecology. Therefore, the objectives of this paper were to (i) compare models suitable for handling variance heterogeneity and (ii) select optimal models to ensure valid statistical inferences from insect count data. The log-normal, standard Poisson, Poisson corrected for overdispersion, zero-inflated Poisson, the negative binomial distribution and zero-inflated negative binomial models were compared using six count datasets on foliage-dwelling insects and five families of soil-dwelling insects. Akaike's and Schwarz Bayesian information criteria were used for comparing the various models. Over 50% of the counts were zeros even in locally abundant species such as Ootheca bennigseni Weise, Mesoplatys ochroptera Stål and Diaecoderus spp. The Poisson model after correction for overdispersion and the standard negative binomial distribution model provided better description of the probability distribution of seven out of the 11 insects than the log-normal, standard Poisson, zero-inflated Poisson or zero-inflated negative binomial models. It is concluded that excess zeros and variance heterogeneity are common data phenomena in insect counts. If not properly modelled, these properties can invalidate the normal distribution assumptions resulting in biased estimation of ecological effects and jeopardizing the integrity of the scientific inferences. Therefore, it is recommended that statistical models appropriate for handling these data properties be selected using objective criteria to ensure efficient statistical inference.
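The information criteria used for model comparison in this study have simple closed forms once a model's maximized log-likelihood is known. The sketch below is illustrative only: the counts are made up, the function names are hypothetical, and only the Poisson likelihood is shown.

```python
import math


def poisson_loglik(counts):
    """Maximized Poisson log-likelihood (the MLE of the rate is the sample mean)."""
    lam = sum(counts) / len(counts)
    if lam == 0:
        return 0.0  # all-zero data: every observation has probability 1
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in counts)


def aic(loglik, n_params):
    """Akaike information criterion: 2p - 2 ln L."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Schwarz Bayesian information criterion: p ln n - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * loglik


if __name__ == "__main__":
    counts = [0, 0, 0, 0, 0, 0, 1, 0, 3, 0, 12, 0, 0, 2, 0]  # hypothetical insect counts
    ll = poisson_loglik(counts)
    print("Poisson AIC:", aic(ll, 1))
    print("Poisson BIC:", bic(ll, 1, len(counts)))
```

The same two functions apply to any competing model (negative binomial, zero-inflated, and so on) given its log-likelihood and parameter count; lower values indicate the preferred model.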
Orchestrating Semiotic Leaps from Tacit to Cultural Quantitative Reasoning--The Case of Anticipating Experimental Outcomes of a Quasi-Binomial Random Generator

ERIC Educational Resources Information Center

Abrahamson, Dor

2009-01-01

This article reports on a case study from a design-based research project that investigated how students make sense of the disciplinary tools they are taught to use, and specifically, what personal, interpersonal, and material resources support this process. The probability topic of binomial distribution was selected due to robust documentation of…
Marginalized zero-inflated negative binomial regression with application to dental caries

PubMed Central

Preisser, John S.; Das, Kalyan; Long, D. Leann; Divaris, Kimon

2015-01-01

The zero-inflated negative binomial regression model (ZINB) is often employed in diverse fields such as dentistry, health care utilization, highway safety, and medicine to examine relationships between exposures of interest and overdispersed count outcomes exhibiting many zeros. The regression coefficients of ZINB have latent class interpretations for a susceptible subpopulation at risk for the disease/condition under study with counts generated from a negative binomial distribution and for a non-susceptible subpopulation that provides only zero counts. The ZINB parameters, however, are not well-suited for estimating overall exposure effects, specifically, in quantifying the effect of an explanatory variable in the overall mixture population. In this paper, a marginalized zero-inflated negative binomial regression (MZINB) model for independent responses is proposed to model the population marginal mean count directly, providing straightforward inference for overall exposure effects based on maximum likelihood estimation. Through simulation studies, the finite sample performance of MZINB is compared to marginalized zero-inflated Poisson, Poisson, and negative binomial regression. The MZINB model is applied in the evaluation of a school-based fluoride mouthrinse program on dental caries in 677 children. PMID:26568034
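The mixture structure described above can be written down directly. This is a sketch of the standard ZINB probability mass function under an assumed parameterization (mean μ and dispersion k for the negative binomial part, mixing probability π for the non-susceptible class); the function names are hypothetical and no regression structure is included.

```python
import math


def nb_pmf(y, mu, k):
    """Negative binomial pmf with mean mu and dispersion (size) k."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)


def zinb_pmf(y, mu, k, pi_zero):
    """ZINB pmf: a structural zero with probability pi_zero, otherwise a NB count."""
    base = (1.0 - pi_zero) * nb_pmf(y, mu, k)
    return pi_zero + base if y == 0 else base


def zinb_marginal_mean(mu, pi_zero):
    """Overall (marginal) mean of the mixture population."""
    return (1.0 - pi_zero) * mu


if __name__ == "__main__":
    mu, k, pi_zero = 2.0, 1.5, 0.3  # illustrative values only
    print("P(Y = 0):", zinb_pmf(0, mu, k, pi_zero))
    print("marginal mean:", zinb_marginal_mean(mu, pi_zero))
```

The marginal mean (1 − π)μ mixes both latent classes, which is why coefficients for μ alone do not describe overall exposure effects.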
Smisc - A collection of miscellaneous functions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Landon Sego, PNNL

2015-08-31

A collection of functions for statistical computing and data manipulation. These include routines for rapidly aggregating heterogeneous matrices, manipulating file names, loading R objects, sourcing multiple R files, formatting datetimes, multi-core parallel computing, stream editing, specialized plotting, etc.
Smisc-package: A collection of miscellaneous functions
allMissing: Identifies missing rows or columns in a data frame or matrix
as.numericSilent: Silent wrapper for coercing a vector to numeric
comboList: Produces all possible combinations of a set of linear model predictors
cumMax: Computes the maximum of the vector up to the current index
cumsumNA: Computes the cumulative sum of a vector without propagating NAs
d2binom: Probability functions for the sum of two independent binomials
dataIn: A flexible way to import data into R.
dbb: The Beta-Binomial Distribution
df2list: Row-wise conversion of a data frame to a list
dfplapply: Parallelized single row processing of a data frame
dframeEquiv: Examines the equivalence of two dataframes or matrices
dkbinom: Probability functions for the sum of k independent binomials
factor2character: Converts all factor variables in a dataframe to character variables
findDepMat: Identify linearly dependent rows or columns in a matrix
formatDT: Converts date or datetime strings into alternate formats
getExtension: Filename manipulations: remove the extension or path, extract the extension or path
getPath: Filename manipulations: remove the extension or path, extract the extension or path
grabLast: Filename manipulations: remove the extension or path, extract the extension or path
ifelse1: Non-vectorized version of ifelse
integ: Simple numerical integration routine
interactionPlot: Two-way Interaction Plot with Error Bar
linearMap: Linear mapping of a numerical vector or scalar
list2df: Convert a list to a data frame
loadObject: Loads and returns the object(s) in an ".Rdata" file
more: Display the contents of a file to the R terminal
movAvg2: Calculate the moving average using a 2-sided window
openDevice: Opens a graphics device based on the filename extension
p2binom: Probability functions for the sum of two independent binomials
padZero: Pad a vector of numbers with zeros
parseJob: Parses a collection of elements into (almost) equal sized groups
pbb: The Beta-Binomial Distribution
pcbinom: A continuous version of the binomial cdf
pkbinom: Probability functions for the sum of k independent binomials
plapply: Simple parallelization of lapply
plotFun: Plot one or more functions on a single plot
PowerData: An example of power data
pvar: Prints the name and value of one or more objects
qbb: The Beta-Binomial Distribution
rbb
And numerous others (space limits reporting).

CUMBIN - CUMULATIVE BINOMIAL PROGRAMS

NASA Technical Reports Server (NTRS)

Bowerman, P. N.

1994-01-01

The cumulative binomial program, CUMBIN, is one of a set of three programs which calculate cumulative binomial probability distributions for arbitrary inputs. The three programs, CUMBIN, NEWTONP (NPO-17556), and CROSSER (NPO-17557), can be used independently of one another. CUMBIN can be used by statisticians and users of statistical procedures, test planners, designers, and numerical analysts. The program has been used for reliability/availability calculations. CUMBIN calculates the probability that a system of n components has at least k operating if the probability that any one is operating is p and the components are independent. Equivalently, this is the reliability of a k-out-of-n system having independent components with common reliability p. CUMBIN can evaluate the incomplete beta distribution for two positive integer arguments. CUMBIN can also evaluate the cumulative F distribution and the negative binomial distribution, and can determine the sample size in a test design. CUMBIN is designed to work well with all integer values 0 < k <= n. To run the program, the user simply runs the executable version and inputs the information requested by the program. The program is not designed to weed out incorrect inputs, so the user must take care to make sure the inputs are correct. Once all input has been entered, the program calculates and lists the result. The CUMBIN program is written in C. It was developed on an IBM AT with a numeric co-processor using Microsoft C 5.0. Because the source code is written using standard C structures and functions, it should compile correctly with most C compilers. The program format is interactive. It has been implemented under DOS 3.2 and has a memory requirement of 26K. CUMBIN was developed in 1988.
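The quantity described above, the reliability of a k-out-of-n system, is the upper tail of a binomial distribution. The following sketch computes it by direct summation; it is not CUMBIN's source or algorithm (the program itself is written in C), and the function name is hypothetical.

```python
from math import comb


def at_least_k_of_n(k, n, p):
    """P(at least k of n independent components operate), each operating with probability p."""
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("require 0 <= p <= 1")
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


if __name__ == "__main__":
    # Reliability of a 2-out-of-3 system with component reliability 0.9.
    print(at_least_k_of_n(2, 3, 0.9))
```

Unlike the original program as described, this sketch rejects out-of-range inputs rather than relying on the user to enter valid values. Direct summation is adequate for moderate n; for very large n a routine based on the incomplete beta function would be numerically preferable.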
The analysis of incontinence episodes and other count data in patients with overactive bladder by Poisson and negative binomial regression.

PubMed

Martina, R; Kay, R; van Maanen, R; Ridder, A

2015-01-01

Clinical studies in overactive bladder have traditionally used analysis of covariance or nonparametric methods to analyse the number of incontinence episodes and other count data. It is known that if the underlying distributional assumptions of a particular parametric method do not hold, an alternative parametric method may be more efficient than a nonparametric one, which makes no assumptions regarding the underlying distribution of the data. Therefore, there are advantages in using methods based on the Poisson distribution or extensions of that method, which incorporate specific features that provide a modelling framework for count data. One challenge with count data is overdispersion, but methods are available that can account for this through the introduction of random effect terms in the modelling, and it is this modelling framework that leads to the negative binomial distribution. These models can also provide clinicians with a clearer and more appropriate interpretation of treatment effects in terms of rate ratios. In this paper, the previously used parametric and non-parametric approaches are contrasted with those based on Poisson regression and various extensions in trials evaluating solifenacin and mirabegron in patients with overactive bladder. In these applications, negative binomial models are seen to fit the data well. Copyright © 2014 John Wiley & Sons, Ltd.
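The route from a random effect to the negative binomial distribution mentioned above can be summarized in a few lines. This is the standard Poisson-gamma mixture argument in notation assumed here (μ for the mean count, u for a gamma random effect with mean 1 and shape k); it is not taken from the paper.

```latex
% Conditional model: Poisson counts with a multiplicative random effect u
Y \mid u \sim \operatorname{Poisson}(\mu u), \qquad
u \sim \operatorname{Gamma}(\text{shape } k,\ \text{mean } 1),
\qquad \operatorname{Var}(u) = \frac{1}{k}.

% Marginal moments by the laws of total expectation and total variance
\operatorname{E}[Y] = \mu, \qquad
\operatorname{Var}(Y) = \operatorname{E}[\mu u] + \operatorname{Var}(\mu u)
= \mu + \frac{\mu^{2}}{k}.

% Log-linear treatment effect: the rate ratio is the exponentiated coefficient
\ln \mu = \beta_{0} + \beta_{1}\,\mathrm{treatment}
\;\Longrightarrow\;
\text{rate ratio} = e^{\beta_{1}}.
```

The term μ²/k is the overdispersion relative to the Poisson variance μ, and it vanishes as k grows large.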
Lotka's Law and Institutional Productivity.

ERIC Educational Resources Information Center

Kumar, Suresh; Sharma, Praveen; Garg, K. C.

1998-01-01

Examines the applicability of Lotka's Law, negative binomial distribution, and lognormal distribution for institutional productivity in the same way as it is to authors and their productivity. Results indicate that none of the distributions are applicable for institutional productivity in engineering sciences. (Author/LRW)
On measures of association among genetic variables

PubMed Central

Gianola, Daniel; Manfredi, Eduardo; Simianer, Henner

2012-01-01

Summary Systems involving many variables are important in population and quantitative genetics, for example, in multi-trait prediction of breeding values and in exploration of multi-locus associations. We studied departures of the joint distribution of sets of genetic variables from independence. New measures of association based on notions of statistical distance between distributions are presented. These are more general than correlations, which are pairwise measures, and lack a clear interpretation beyond the bivariate normal distribution. Our measures are based on logarithmic (Kullback-Leibler) and on relative ‘distances’ between distributions. Indexes of association are developed and illustrated for quantitative genetics settings in which the joint distribution of the variables is either multivariate normal or multivariate-t, and we show how the indexes can be used to study linkage disequilibrium in a two-locus system with multiple alleles and present applications to systems of correlated beta distributions. Two multivariate beta and multivariate beta-binomial processes are examined, and new distributions are introduced: the GMS-Sarmanov multivariate beta and its beta-binomial counterpart. PMID:22742500
Bayesian inference for disease prevalence using negative binomial group testing

PubMed Central

Pritchard, Nicholas A.; Tebbs, Joshua M.

2011-01-01

Group testing, also known as pooled testing, and inverse sampling are both widely used methods of data collection when the goal is to estimate a small proportion. Taking a Bayesian approach, we consider the new problem of estimating disease prevalence from group testing when inverse (negative binomial) sampling is used. Using different distributions to incorporate prior knowledge of disease incidence and different loss functions, we derive closed form expressions for posterior distributions and resulting point and credible interval estimators. We then evaluate our new estimators, on Bayesian and classical grounds, and apply our methods to a West Nile Virus data set. PMID:21259308
Extended Poisson process modelling and analysis of grouped binary data.

PubMed

Faddy, Malcolm J; Smith, David M

2012-05-01

A simple extension of the Poisson process results in binomially distributed counts of events in a time interval. A further extension generalises this to probability distributions under- or over-dispersed relative to the binomial distribution. Substantial levels of under-dispersion are possible with this modelling, but only modest levels of over-dispersion - up to Poisson-like variation. Although simple analytical expressions for the moments of these probability distributions are not available, approximate expressions for the mean and variance are derived, and used to re-parameterise the models. The modelling is applied in the analysis of two published data sets, one showing under-dispersion and the other over-dispersion. More appropriate assessment of the precision of estimated parameters and reliable model checking diagnostics follow from this more general modelling of these data sets. © 2012 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim.
Spatial Distribution of Adult Anthonomus grandis Boheman (Coleoptera: Curculionidae) and Damage to Cotton Flower Buds Due to Feeding and Oviposition.

PubMed

Grigolli, J F J; Souza, L A; Fernandes, M G; Busoli, A C

2017-08-01

The cotton boll weevil Anthonomus grandis Boheman (Coleoptera: Curculionidae) is the main pest in cotton crop around the world, directly affecting cotton production. In order to establish a sequential sampling plan, it is crucial to understand the spatial distribution of the pest population and the damage it causes to the crop through the different developmental stages of cotton plants. Therefore, this study aimed to investigate the spatial distribution of adults in the cultivation area and their oviposition and feeding behavior throughout the development of the cotton plants. The experiment was conducted in Maracaju, Mato Grosso do Sul, Brazil, in the 2012/2013 and 2013/2014 growing seasons, in an area of 10,000 m2, planted with the cotton cultivar FM 993. The experimental area was divided into 100 plots of 100 m2 (10 × 10 m) each, and five plants per plot were sampled weekly throughout the crop cycle. The number of flower buds with feeding and oviposition punctures and of adult A. grandis was recorded throughout the crop cycle in five plants per plot. After determining the aggregation indices (variance/mean ratio, Morisita's index, exponent k of the negative binomial distribution, and Green's coefficient) and adjusting the frequencies observed in the field to the distribution of frequencies (Poisson, negative binomial, and positive binomial) using the chi-squared test, it was observed that flower buds with punctures derived from feeding, oviposition, and feeding + oviposition showed an aggregated distribution in the cultivation area until 85 days after emergence and a random distribution after this stage. The adults of A. grandis presented a random distribution in the cultivation area.
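The four aggregation indices named above have standard textbook definitions, sketched below for a list of counts per sampling unit. This is not the authors' code; the function name is hypothetical, k is the moment estimate mean²/(variance − mean), and the example counts are made up.

```python
from statistics import mean, variance


def aggregation_indices(counts):
    """Variance/mean ratio, Morisita's index, moment estimate of k, Green's coefficient."""
    n = len(counts)
    total = sum(counts)
    if n < 2 or total < 2:
        raise ValueError("need at least two sampling units and two individuals")
    m = mean(counts)
    s2 = variance(counts)  # sample variance with n - 1 in the denominator
    sum_sq = sum(x * x for x in counts)
    vmr = s2 / m
    morisita = n * (sum_sq - total) / (total ** 2 - total)
    # k is finite only when the data are overdispersed (variance > mean).
    k = m * m / (s2 - m) if s2 > m else float("inf")
    green = (vmr - 1.0) / (total - 1.0)
    return {"vmr": vmr, "morisita": morisita, "k": k, "green": green}


if __name__ == "__main__":
    # Hypothetical counts of punctured flower buds in ten plots.
    print(aggregation_indices([0, 3, 0, 0, 7, 1, 0, 0, 12, 2]))
```

A variance/mean ratio and Morisita's index near 1 and Green's coefficient near 0 are consistent with a random pattern; larger values, together with a small k, indicate aggregation.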
The Difference Calculus and the Negative Binomial Distribution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bowman, Kimiko O; Shenton, LR

2007-01-01

In a previous paper we state the dominant term in the third central moment of the maximum likelihood estimator $\hat{k}$ of the parameter $k$ in the negative binomial probability function where the probability generating function is $(p + 1 - pt)^{-k}$. A partial sum of the series $\sum 1/(k + x)^{3}$ is involved, where $x$ is a negative binomial random variate. In expectation this sum can only be found numerically using the computer. Here we give a simple definite integral in (0,1) for the generalized case. This means that now we do have a valid expression for $\sqrt{\beta_{11}}(k)$ and $\sqrt{\beta_{11}}(p)$. In addition we use the finite difference operator $\Delta$, and $E = 1 + \Delta$ to set up formulas for low order moments. Other examples of the operators are quoted relating to the orthogonal set of polynomials associated with the negative binomial probability function used as a weight function.
Temporal acceleration of spatially distributed kinetic Monte Carlo simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chatterjee, Abhijit; Vlachos, Dionisios G.

The computational intensity of kinetic Monte Carlo (KMC) simulation is a major impediment in simulating large length and time scales. In recent work, an approximate method for KMC simulation of spatially uniform systems, termed the binomial τ-leap method, was introduced [A. Chatterjee, D.G. Vlachos, M.A. Katsoulakis, Binomial distribution based τ-leap accelerated stochastic simulation, J. Chem. Phys. 122 (2005) 024112], where molecular bundles instead of individual processes are executed over coarse-grained time increments. This temporal coarse-graining can lead to significant computational savings but its generalization to spatially lattice KMC simulation has not been realized yet. Here we extend the binomial τ-leap method to lattice KMC simulations by combining it with spatially adaptive coarse-graining. Absolute stability and computational speed-up analyses for spatial systems along with simulations provide insights into the conditions where accuracy and substantial acceleration of the new spatio-temporal coarse-graining method are ensured. Model systems demonstrate that the r-time increment criterion of Chatterjee et al. obeys the absolute stability limit for values of r up to near 1.
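The core idea of a binomial leap, executing a bundle of events per coarse time step with the bundle size drawn from a binomial distribution, can be sketched for the simplest possible case. The example below assumes a single well-mixed first-order decay A → products with propensity rate × N; it is not the authors' lattice method, and all names and parameter values are hypothetical.

```python
import random


def binomial_draw(n, p, rng):
    """Binomial(n, p) sample as a sum of Bernoulli trials (adequate for small n)."""
    return sum(rng.random() < p for _ in range(n))


def binomial_tau_leap_decay(n0, rate, tau, steps, rng):
    """Leap a first-order decay forward in fixed increments tau.

    In each step the number of firings is Binomial(N, rate * tau), so the bundle
    can never exceed the N molecules available and the population stays >= 0.
    """
    p = rate * tau
    if not 0.0 <= p <= 1.0:
        raise ValueError("rate * tau must lie in [0, 1]")
    n = n0
    trajectory = [n]
    for _ in range(steps):
        n -= binomial_draw(n, p, rng)
        trajectory.append(n)
    return trajectory


if __name__ == "__main__":
    rng = random.Random(42)
    print(binomial_tau_leap_decay(n0=1000, rate=1.0, tau=0.1, steps=10, rng=rng))
```

A Poisson-distributed bundle has no upper bound and can drive a population negative; bounding the draw by the available molecules is the property the binomial choice provides in this sketch.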
Distribution of apparent activation energy counterparts during thermo- and thermo-oxidative degradation of Aronia melanocarpa (black chokeberry).

PubMed

Janković, Bojan; Marinović-Cincović, Milena; Janković, Marija

2017-09-01

Kinetics of degradation for Aronia melanocarpa fresh fruits in argon and air atmospheres were investigated. The investigation was based on probability distributions of apparent activation energy of counterparts (εa). Isoconversional analysis results indicated that the degradation process in an inert atmosphere was governed by decomposition reactions of esterified compounds. Also, based on the same kinetics approach, it was assumed that in an air atmosphere, the primary compound in degradation pathways could be anthocyanins, which undergo rapid chemical reactions. A new model of reactivity demonstrated that, under inert atmospheres, expectation values for εa occurred at levels of statistical probability. These values corresponded to decomposition processes in which polyphenolic compounds might be involved. εa values obeyed laws of binomial distribution. It was established that, for thermo-oxidative degradation, Poisson distribution represented a very successful approximation for εa values where there was additional mechanistic complexity and the binomial distribution was no longer valid. Copyright © 2017 Elsevier Ltd. All rights reserved.
Spatial distribution and sequential sampling plans for Tuta absoluta (Lepidoptera: Gelechiidae) in greenhouse tomato crops.

PubMed

Cocco, Arturo; Serra, Giuseppe; Lentini, Andrea; Deliperi, Salvatore; Delrio, Gavino

2015-09-01

The within- and between-plant distribution of the tomato leafminer, Tuta absoluta (Meyrick), was investigated in order to define action thresholds based on leaf infestation and to propose enumerative and binomial sequential sampling plans for pest management applications in protected crops. The pest spatial distribution was aggregated between plants, and median leaves were the most suitable sample to evaluate the pest density. Action thresholds of 36 and 48%, 43 and 56% and 60 and 73% infested leaves, corresponding to economic thresholds of 1 and 3% damaged fruits, were defined for tomato cultivars with big, medium and small fruits respectively. Green's method was a more suitable enumerative sampling plan as it required a lower sampling effort. Binomial sampling plans needed lower average sample sizes than enumerative plans to make a treatment decision, with probabilities of error of <0.10. The enumerative sampling plan required 87 or 343 leaves to estimate the population density in extensive or intensive ecological studies respectively. Binomial plans would be more practical and efficient for control purposes, needing average sample sizes of 17, 20 and 14 leaves to take a pest management decision in order to avoid fruit damage higher than 1% in cultivars with big, medium and small fruits respectively. © 2014 Society of Chemical Industry.
Selecting a distributional assumption for modelling relative densities of benthic macroinvertebrates

USGS Publications Warehouse

Gray, B.R.

2005-01-01

The selection of a distributional assumption suitable for modelling macroinvertebrate density data is typically challenging. Macroinvertebrate data often exhibit substantially larger variances than expected under a standard count assumption, that of the Poisson distribution. Such overdispersion may derive from multiple sources, including heterogeneity of habitat (historically and spatially), differing life histories for organisms collected within a single collection in space and time, and autocorrelation. Taken to extreme, heterogeneity of habitat may be argued to explain the frequent large proportions of zero observations in macroinvertebrate data. Sampling locations may consist of habitats defined qualitatively as either suitable or unsuitable. The former category may yield random or stochastic zeroes and the latter structural zeroes. Heterogeneity among counts may be accommodated by treating the count mean itself as a random variable, while extra zeroes may be accommodated using zero-modified count assumptions, including zero-inflated and two-stage (or hurdle) approaches. These and linear assumptions (following log- and square root-transformations) were evaluated using 9 years of mayfly density data from a 52 km, ninth-order reach of the Upper Mississippi River (n = 959). The data exhibited substantial overdispersion relative to that expected under a Poisson assumption (i.e. variance:mean ratio = 23 ≫ 1), and 43% of the sampling locations yielded zero mayflies. Based on the Akaike Information Criterion (AIC), count models were improved most by treating the count mean as a random variable (via a Poisson-gamma distributional assumption) and secondarily by zero modification (i.e. improvements in AIC values = 9184 units and 47-48 units, respectively).
Zeroes were underestimated by the Poisson, log-transform and square root-transform models, slightly by the standard negative binomial model but not by the zero-modified models (61%, 24%, 32%, 7%, and 0%, respectively). However, the zero-modified Poisson models underestimated small counts (1 ≤ y ≤ 4) and overestimated intermediate counts (7 ≤ y ≤ 23). Counts greater than zero were estimated well by zero-modified negative binomial models, while counts greater than one were also estimated well by the standard negative binomial model. Based on AIC and percent zero estimation criteria, the two-stage and zero-inflated models performed similarly. The above inferences were largely confirmed when the models were used to predict values from a separate, evaluation data set (n = 110). An exception was that, using the evaluation data set, the standard negative binomial model appeared superior to its zero-modified counterparts using the AIC (but not percent zero criteria). This and other evidence suggest that a negative binomial distributional assumption should be routinely considered when modelling benthic macroinvertebrate data from low flow environments. Whether negative binomial models should themselves be routinely examined for extra zeroes requires, from a statistical perspective, more investigation. However, this question may best be answered by ecological arguments that may be specific to the sampled species and locations. © 2004 Elsevier B.V. All rights reserved.
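The overdispersion argument in this abstract can be illustrated with a short calculation. The sketch below is not the study's analysis: it assumes the common negative binomial parameterization with variance = mean + mean²/k, fits k by the method of moments from the reported variance:mean ratio of 23, and uses a hypothetical mean density (the abstract does not report one) to compare the share of zeros each assumption predicts.

```python
import math


def poisson_zero_prob(mean):
    """P(Y = 0) under a Poisson distribution with the given mean."""
    return math.exp(-mean)


def nb_size_from_ratio(mean, var_mean_ratio):
    """Method-of-moments size parameter k, assuming var = mean + mean**2 / k.

    Dividing by the mean gives ratio = 1 + mean / k, so k = mean / (ratio - 1).
    """
    if var_mean_ratio <= 1.0:
        raise ValueError("ratio must exceed 1 for an overdispersed fit")
    return mean / (var_mean_ratio - 1.0)


def nb_zero_prob(mean, k):
    """P(Y = 0) under a negative binomial with the given mean and size k."""
    return (k / (k + mean)) ** k


mean_density = 5.0    # hypothetical value, chosen only for illustration
ratio = 23.0          # variance:mean ratio reported in the abstract
k = nb_size_from_ratio(mean_density, ratio)

print("size parameter k:", round(k, 4))
print("Poisson P(0):", round(poisson_zero_prob(mean_density), 4))
print("Negative binomial P(0):", round(nb_zero_prob(mean_density, k), 4))
```

With strong overdispersion the negative binomial assigns far more probability to zero than a Poisson with the same mean, which is consistent with the abstract's finding that the Poisson model underestimated zeroes.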
FluBreaks: early epidemic detection from Google flu trends.

PubMed

Pervaiz, Fahad; Pervaiz, Mansoor; Abdur Rehman, Nabeel; Saif, Umar

2012-10-04

The Google Flu Trends service was launched in 2008 to track changes in the volume of online search queries related to flu-like symptoms. Over the last few years, the trend data produced by this service has shown a consistent relationship with the actual number of flu reports collected by the US Centers for Disease Control and Prevention (CDC), often identifying increases in flu cases weeks in advance of CDC records. However, contrary to popular belief, Google Flu Trends is not an early epidemic detection system. Instead, it is designed as a baseline indicator of the trend, or changes, in the number of disease cases. Our objective was to evaluate whether these trends can be used as a basis for an early warning system for epidemics. We present the first detailed algorithmic analysis of how Google Flu Trends can be used as a basis for building a fully automated system for early warning of epidemics in advance of methods used by the CDC. Based on our work, we present a novel early epidemic detection system, called FluBreaks (dritte.org/flubreaks), based on Google Flu Trends data. We compared the accuracy and practicality of three types of algorithms: normal distribution algorithms, Poisson distribution algorithms, and negative binomial distribution algorithms. We explored the relative merits of these methods, and related our findings to changes in Internet penetration and population size for the regions in Google Flu Trends providing data. Across our performance metrics of percentage true-positives (RTP), percentage false-positives (RFP), percentage overlap (OT), and percentage early alarms (EA), Poisson- and negative binomial-based algorithms performed better in all except RFP. Poisson-based algorithms had average values of 99%, 28%, 71%, and 76% for RTP, RFP, OT, and EA, respectively, whereas negative binomial-based algorithms had average values of 97.8%, 17.8%, 60%, and 55% for RTP, RFP, OT, and EA, respectively. Moreover, the EA was also affected by the region's population size.
Regions with larger populations (regions 4 and 6) had higher values of EA than region 10 (which had the smallest population) for negative binomial- and Poisson-based algorithms. The difference was 12.5% and 13.5% on average in negative binomial- and Poisson-based algorithms, respectively. We present the first detailed comparative analysis of popular early epidemic detection algorithms on Google Flu Trends data. We note that realizing this opportunity requires moving beyond the cumulative sum and historical limits method-based normal distribution approaches, traditionally employed by the CDC, to negative binomial- and Poisson-based algorithms to deal with potentially noisy search query data from regions with varying population and Internet penetrations. Based on our work, we have developed FluBreaks, an early warning system for flu epidemics using Google Flu Trends.
A Random Variable Transformation Process.

ERIC Educational Resources Information Center

Scheuermann, Larry

1989-01-01

Provides a short BASIC program, RANVAR, which generates random variates for various theoretical probability distributions. The seven variates include: uniform, exponential, normal, binomial, Poisson, Pascal, and triangular. (MVL)
A binomial stochastic kinetic approach to the Michaelis-Menten mechanism

NASA Astrophysics Data System (ADS)

Lente, Gábor

2013-05-01

This Letter presents a new method that gives an analytical approximation of the exact solution of the stochastic Michaelis-Menten mechanism without computationally demanding matrix operations. The method is based on solving the deterministic rate equations and then using the results as guiding variables of calculating probability values using binomial distributions. This principle can be generalized to a number of different kinetic schemes and is expected to be very useful in the evaluation of measurements focusing on the catalytic activity of one or a few individual enzyme molecules.
Studying the Binomial Distribution Using LabVIEW

ERIC Educational Resources Information Center

George, Danielle J.; Hammer, Nathan I.

2015-01-01

This undergraduate physical chemistry laboratory exercise introduces students to the study of probability distributions both experimentally and using computer simulations. Students perform the classic coin toss experiment individually and then pool all of their data together to study the effect of experimental sample size on the binomial…
Enumerative and binomial sampling plans for citrus mealybug (Homoptera: Pseudococcidae) in citrus groves.

PubMed

Martínez-Ferrer, María Teresa; Ripollés, José Luís; Garcia-Marí, Ferran

2006-06-01

The spatial distribution of the citrus mealybug, Planococcus citri (Risso) (Homoptera: Pseudococcidae), was studied in citrus groves in northeastern Spain. Constant precision sampling plans were designed for all developmental stages of citrus mealybug under the fruit calyx, for late stages on fruit, and for females on trunks and main branches; more than 66, 286, and 101 data sets, respectively, were collected from nine commercial fields during 1992-1998. Dispersion parameters were determined using Taylor's power law, giving aggregated spatial patterns for citrus mealybug populations in three locations of the tree sampled. A significant relationship between the number of insects per organ and the percentage of occupied organs was established using either Wilson and Room's binomial model or Kono and Sugino's empirical formula. Constant precision (E = 0.25) sampling plans (i.e., enumerative plans) for estimating mean densities were developed using Green's equation and the two binomial models. For making management decisions, enumerative counts may be less labor-intensive than binomial sampling. Therefore, we recommend enumerative sampling plans for the use in an integrated pest management program in citrus. Required sample sizes for the range of population densities near current management thresholds, in the three plant locations calyx, fruit, and trunk were 50, 110-330, and 30, respectively. Binomial sampling, especially the empirical model, required a higher sample size to achieve equivalent levels of precision.
Use of the binomial distribution to predict impairment: application in a nonclinical sample.

PubMed

Axelrod, Bradley N; Wall, Jacqueline R; Estes, Bradley W

2008-01-01

A mathematical model based on the binomial theory was developed to illustrate when abnormal score variations occur by chance in a multitest battery (Ingraham & Aiken, 1996). It has been successfully used as a comparison for obtained test scores in clinical samples, but not in nonclinical samples. In the current study, this model has been applied to demographically corrected scores on the Halstead-Reitan Neuropsychological Test Battery, obtained from a sample of 94 nonclinical college students. Results found that 15% of the sample had impairments suggested by the Halstead Impairment Index, using criteria established by Reitan and Wolfson (1993). In addition, one-half of the sample obtained impaired scores on one or two tests. These results were compared with those predicted by the binomial model and were found to be consistent. The model therefore serves as a useful resource for clinicians considering the probability of impaired test performance.
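The kind of calculation a binomial model of chance impairment involves can be sketched as follows. This is an illustration under stated assumptions, not the Ingraham and Aiken procedure itself: the battery size and per-test probability of an abnormal score are hypothetical, and the tests are treated as independent, which real batteries only approximate.

```python
from math import comb


def binom_pmf(k, n, p):
    """Probability of exactly k abnormal scores among n independent tests."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def prob_at_least(k, n, p):
    """Probability of k or more abnormal scores arising by chance alone."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


n_tests = 7          # hypothetical number of tests in the battery
p_abnormal = 0.15    # hypothetical chance that one test is abnormal

for k in range(0, 4):
    chance = prob_at_least(k, n_tests, p_abnormal)
    print(f"P(at least {k} abnormal scores) = {chance:.3f}")
```

Even with a modest per-test rate, at least one abnormal score becomes likely across a multitest battery, which is the reason to compare observed counts against a chance baseline.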
Estimating relative risks in multicenter studies with a small number of centers - which methods to use? A simulation study.

PubMed

Pedroza, Claudia; Truong, Van Thi Thanh

2017-11-02

Analyses of multicenter studies often need to account for center clustering to ensure valid inference. For binary outcomes, it is particularly challenging to properly adjust for center when the number of centers or total sample size is small, or when there are few events per center. Our objective was to evaluate the performance of generalized estimating equation (GEE) log-binomial and Poisson models, generalized linear mixed models (GLMMs) assuming binomial and Poisson distributions, and a Bayesian binomial GLMM to account for center effect in these scenarios. We conducted a simulation study with few centers (≤30) and 50 or fewer subjects per center, using both a randomized controlled trial and an observational study design to estimate relative risk. We compared the GEE and GLMM models with a log-binomial model without adjustment for clustering in terms of bias, root mean square error (RMSE), and coverage. For the Bayesian GLMM, we used informative neutral priors that are skeptical of large treatment effects that are almost never observed in studies of medical interventions. All frequentist methods exhibited little bias, and the RMSE was very similar across the models. The binomial GLMM had poor convergence rates, ranging from 27% to 85%, but performed well otherwise. The results show that both GEE models need to use small sample corrections for robust SEs to achieve proper coverage of 95% CIs. The Bayesian GLMM had similar convergence rates but resulted in slightly more biased estimates for the smallest sample sizes. However, it had the smallest RMSE and good coverage across all scenarios. These results were very similar for both study designs. For the analyses of multicenter studies with a binary outcome and few centers, we recommend adjustment for center with either a GEE log-binomial or Poisson model with appropriate small sample corrections or a Bayesian binomial GLMM with informative priors.
Co-Infestation and Spatial Distribution of Bactrocera carambolae and Anastrepha spp. (Diptera: Tephritidae) in Common Guava in the Eastern Amazon

PubMed Central

Deus, E. G.; Godoy, W. A. C.; Sousa, M. S. M.; Lopes, G. N.; Jesus-Barros, C. R.; Silva, J. G.; Adaime, R.

2016-01-01

Field infestation and spatial distribution of introduced Bactrocera carambolae Drew and Hancock and native species of Anastrepha in common guavas [Psidium guajava (L.)] were investigated in the eastern Amazon. Fruit sampling was carried out in the municipalities of Calçoene and Oiapoque in the state of Amapá, Brazil. The frequency distribution of larvae in fruit was fitted to the negative binomial distribution. Anastrepha striata was more abundant in both sampled areas in comparison to Anastrepha fraterculus (Wiedemann) and B. carambolae. The frequency distribution analysis of adults revealed an aggregated pattern for B. carambolae as well as for A. fraterculus and Anastrepha striata Schiner, described by the negative binomial distribution. Although the populations of Anastrepha spp. may have suffered some impact due to the presence of B. carambolae, the results are still not robust enough to indicate effective reduction in the abundance of Anastrepha spp. caused by B. carambolae in a general sense. The high degree of aggregation observed for both species suggests interspecific co-occurrence with the simultaneous presence of both species in the analysed fruit. Moreover, a significant fraction of uninfested guavas also indicated absence of competitive displacement. PMID:27638949

Modeling number of claims and prediction of total claim amount

NASA Astrophysics Data System (ADS)

Acar, Aslıhan Şentürk; Karabey, Uǧur

2017-07-01

In this study we focus on annual number of claims of a private health insurance data set which belongs to a local insurance company in Turkey. In addition to Poisson model and negative binomial model, zero-inflated Poisson model and zero-inflated negative binomial model are used to model the number of claims in order to take into account excess zeros. To investigate the impact of different distributional assumptions for the number of claims on the prediction of total claim amount, predictive performances of candidate models are compared by using root mean square error (RMSE) and mean absolute error (MAE) criteria.
Confidence Intervals for True Scores Using the Skew-Normal Distribution

ERIC Educational Resources Information Center

Garcia-Perez, Miguel A.

2010-01-01

A recent comparative analysis of alternative interval estimation approaches and procedures has shown that confidence intervals (CIs) for true raw scores determined with the Score method--which uses the normal approximation to the binomial distribution--have actual coverage probabilities that are closest to their nominal level. It has also recently…
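For readers who want to see a score-type interval in practice, the sketch below computes the Wilson score interval for a binomial proportion. It assumes that the "Score method" named in the abstract corresponds to this standard interval; the abstract is truncated and does not give the formula, so the correspondence is an assumption, and the example counts are hypothetical.

```python
from math import sqrt


def wilson_score_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z is the standard normal quantile (1.96 for a nominal 95% interval).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2.0 * n)) / denom
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n + z * z / (4.0 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical example: 5 correct answers on a 10-item test.
low, high = wilson_score_interval(5, 10)
print(f"95% interval for the true proportion: ({low:.4f}, {high:.4f})")
```

Unlike the simple Wald interval, this interval stays inside [0, 1] and remains usable when the observed proportion is 0 or 1.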
The Detection of Signals in Impulsive Noise.

DTIC Science & Technology

1983-06-01

Approved for Public Release; Distribution Unlimited. ...has a symmetric distribution, sgn(x_i) will be -1 with probability 1/2 and +1 with probability 1/2. Considering the sum of observations as a binomial
Using beta binomials to estimate classification uncertainty for ensemble models.

PubMed

Clark, Robert D; Liang, Wenkel; Lee, Adam C; Lawless, Michael S; Fraczkiewicz, Robert; Waldman, Marvin

2014-01-01

Quantitative structure-activity (QSAR) models have enormous potential for reducing drug discovery and development costs as well as the need for animal testing. Great strides have been made in estimating their overall reliability, but to fully realize that potential, researchers and regulators need to know how confident they can be in individual predictions. Submodels in an ensemble model which have been trained on different subsets of a shared training pool represent multiple samples of the model space, and the degree of agreement among them contains information on the reliability of ensemble predictions. For artificial neural network ensembles (ANNEs) using two different methods for determining ensemble classification - one using vote tallies and the other averaging individual network outputs - we have found that the distribution of predictions across positive vote tallies can be reasonably well-modeled as a beta binomial distribution, as can the distribution of errors. Together, these two distributions can be used to estimate the probability that a given predictive classification will be in error. Large data sets comprised of logP, Ames mutagenicity, and CYP2D6 inhibition data are used to illustrate and validate the method. The distributions of predictions and errors for the training pool accurately predicted the distribution of predictions and errors for large external validation sets, even when the number of positive and negative examples in the training pool were not balanced. Moreover, the likelihood of a given compound being prospectively misclassified as a function of the degree of consensus between networks in the ensemble could in most cases be estimated accurately from the fitted beta binomial distributions for the training pool. 
Confidence in an individual predictive classification by an ensemble model can be accurately assessed by examining the distributions of predictions and errors as a function of the degree of agreement among the constituent submodels. Further, ensemble uncertainty estimation can often be improved by adjusting the voting or classification threshold based on the parameters of the error distribution. Finally, the profiles for models whose predictive uncertainty estimates are not reliable provide clues to that effect without the need for comparison to an external test set.
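The beta binomial distribution used to model vote tallies can be written down compactly. The sketch below is a generic implementation of its probability mass function, not the authors' fitting procedure; the ensemble size and the shape parameters a and b are hypothetical values chosen for illustration.

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binom_pmf(k, n, a, b):
    """P(K = k) when K ~ Binomial(n, p) and p ~ Beta(a, b)."""
    return comb(n, k) * exp(log_beta(k + a, n - k + b) - log_beta(a, b))


n_networks = 33      # hypothetical ensemble size
a, b = 0.4, 0.6      # hypothetical shape parameters (U-shaped when both < 1)

pmf = [beta_binom_pmf(k, n_networks, a, b) for k in range(n_networks + 1)]
print("P(no positive votes):", round(pmf[0], 4))
print("P(unanimous positive votes):", round(pmf[-1], 4))
print("total probability:", round(sum(pmf), 6))
```

Shape parameters below 1 put most of the mass near unanimous tallies, which is the pattern one would expect when submodels usually agree with each other.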
Novel formulation of the ℳ model through the Generalized-K distribution for atmospheric optical channels.

PubMed

Garrido-Balsells, José María; Jurado-Navas, Antonio; Paris, José Francisco; Castillo-Vazquez, Miguel; Puerta-Notario, Antonio

2015-03-09

In this paper, a novel and deeper physical interpretation on the recently published Málaga or ℳ statistical distribution is provided. This distribution, which is having a wide acceptance by the scientific community, models the optical irradiance scintillation induced by the atmospheric turbulence. Here, the analytical expressions previously published are modified in order to express them by a mixture of the known Generalized-K and discrete Binomial and Negative Binomial distributions. In particular, the probability density function (pdf) of the ℳ model is now obtained as a linear combination of these Generalized-K pdf, in which the coefficients depend directly on the parameters of the ℳ distribution. In this way, the Málaga model can be physically interpreted as a superposition of different optical sub-channels each of them described by the corresponding Generalized-K fading model and weighted by the ℳ dependent coefficients. The expressions here proposed are simpler than the equations of the original ℳ model and are validated by means of numerical simulations by generating ℳ -distributed random sequences and their associated histogram. This novel interpretation of the Málaga statistical distribution provides a valuable tool for analyzing the performance of atmospheric optical channels for every turbulence condition.
A Methodology for Quantifying Certain Design Requirements During the Design Phase

NASA Technical Reports Server (NTRS)

Adams, Timothy; Rhodes, Russel

2005-01-01

A methodology for developing and balancing quantitative design requirements for safety, reliability, and maintainability has been proposed. Conceived as the basis of a more rational approach to the design of spacecraft, the methodology would also be applicable to the design of automobiles, washing machines, television receivers, or almost any other commercial product. Heretofore, it has been common practice to start by determining the requirements for reliability of elements of a spacecraft or other system to ensure a given design life for the system. Next, safety requirements are determined by assessing the total reliability of the system and adding redundant components and subsystems necessary to attain safety goals. As thus described, common practice leaves the maintainability burden to fall to chance; therefore, there is no control of recurring costs or of the responsiveness of the system. The means that have been used in assessing maintainability have been oriented toward determining the logistical sparing of components so that the components are available when needed. The process established for developing and balancing quantitative requirements for safety (S), reliability (R), and maintainability (M) derives and integrates NASA's top-level safety requirements and the controls needed to obtain program key objectives for safety and recurring cost (see figure). Being quantitative, the process conveniently uses common mathematical models. Even though the process is shown as being worked from the top down, it can also be worked from the bottom up. This process uses three math models: (1) the binomial distribution (greater-than-or-equal-to case), (2) reliability for a series system, and (3) the Poisson distribution (less-than-or-equal-to case). The zero-fail case for the binomial distribution approximates the commonly known exponential distribution or "constant failure rate" distribution. Either model can be used.
The binomial distribution was selected for modeling flexibility because it conveniently addresses both the zero-fail and failure cases. The failure case is typically used for unmanned spacecraft as with missiles.
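The statement that the zero-fail binomial case approximates the exponential model can be checked numerically. The sketch below is an illustration only: the per-trial failure probability and number of trials are hypothetical, and the comparison rests on the identity (1 - p)^n = exp(n ln(1 - p)), which is close to exp(-np) when p is small.

```python
import math


def zero_fail_binomial(p_fail, n_trials):
    """Probability of zero failures in n independent trials."""
    return (1.0 - p_fail) ** n_trials


def exponential_reliability(failure_rate, exposure):
    """Constant-failure-rate reliability exp(-rate * exposure)."""
    return math.exp(-failure_rate * exposure)


p_fail = 0.001     # hypothetical failure probability per trial
n_trials = 100     # hypothetical number of trials (or unit time steps)

binomial_value = zero_fail_binomial(p_fail, n_trials)
exponential_value = exponential_reliability(p_fail, n_trials)

print("zero-fail binomial:", round(binomial_value, 6))
print("exponential model: ", round(exponential_value, 6))
print("absolute difference:", abs(binomial_value - exponential_value))
```

The two values diverge as the per-trial failure probability grows, so the approximation is best suited to highly reliable components.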
Introductory Statistics in the Garden

ERIC Educational Resources Information Center

Wagaman, John C.

2017-01-01

This article describes four semesters of introductory statistics courses that incorporate service learning and gardening into the curriculum with applications of the binomial distribution, least squares regression and hypothesis testing. The activities span multiple semesters and are iterative in nature.
A comparison of LMC and SDL complexity measures on binomial distributions

NASA Astrophysics Data System (ADS)

Piqueira, José Roberto C.

2016-02-01

The concept of complexity has been widely discussed in the last forty years, with contributions coming from all areas of human knowledge, including Philosophy, Linguistics, History, Biology, Physics, Chemistry and many others, and with mathematicians trying to give a rigorous view of it. In this sense, thermodynamics meets information theory and, by using the entropy definition, López-Ruiz, Mancini and Calbet proposed a definition of complexity that is referred to as the LMC measure. Shiner, Davison and Landsberg, by slightly changing the LMC definition, proposed the SDL measure, and both LMC and SDL are satisfactory measures of complexity for many problems. Here, the SDL and LMC measures are applied to the case of a binomial probability distribution, to clarify how the length of the data set affects complexity and how the success probability of the repeated trials determines how complex the whole set is.
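A rough sketch of how the two measures could be evaluated on a binomial distribution follows. The abstract does not state the formulas, so the code assumes the commonly cited forms: LMC as normalized Shannon entropy times disequilibrium (the squared distance from the uniform distribution), and SDL as Δ(1 - Δ), where Δ is the normalized entropy. The entropy normalization and the SDL exponents are assumptions here and may differ from the paper's choices.

```python
from math import comb, log


def binomial_pmf_vector(n, p):
    """Probabilities of 0..n successes in n trials with success probability p."""
    return [comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(n + 1)]


def normalized_entropy(probs):
    """Shannon entropy divided by its maximum, log(number of states)."""
    entropy = -sum(q * log(q) for q in probs if q > 0.0)
    return entropy / log(len(probs))


def lmc_complexity(probs):
    """Assumed LMC form: normalized entropy times disequilibrium."""
    uniform = 1.0 / len(probs)
    disequilibrium = sum((q - uniform) ** 2 for q in probs)
    return normalized_entropy(probs) * disequilibrium


def sdl_complexity(probs):
    """Assumed SDL form with both exponents equal to 1: delta * (1 - delta)."""
    delta = normalized_entropy(probs)
    return delta * (1.0 - delta)


# Hypothetical scan over the success probability for a fixed number of trials.
for p in (0.1, 0.3, 0.5):
    probs = binomial_pmf_vector(10, p)
    print(p, round(lmc_complexity(probs), 4), round(sdl_complexity(probs), 4))
```

Both measures vanish for a uniform distribution, which for a binomial happens only in the single-trial case with success probability 0.5.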
Inferring subunit stoichiometry from single molecule photobleaching

PubMed Central

2013-01-01

Single molecule photobleaching is a powerful tool for determining the stoichiometry of protein complexes. By attaching fluorophores to proteins of interest, the number of associated subunits in a complex can be deduced by imaging single molecules and counting fluorophore photobleaching steps. Because some bleaching steps might be unobserved, the ensemble of steps will be binomially distributed. In this work, it is shown that inferring the true composition of a complex from such data is nontrivial because binomially distributed observations present an ill-posed inference problem. That is, a unique and optimal estimate of the relevant parameters cannot be extracted from the observations. Because of this, a method has not been firmly established to quantify confidence when using this technique. This paper presents a general inference model for interpreting such data and provides methods for accurately estimating parameter confidence. The formalization and methods presented here provide a rigorous analytical basis for this pervasive experimental tool. PMID:23712552
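The difficulty described here can be seen in a toy likelihood calculation. The sketch below is not the paper's inference model: it assumes each of n subunits yields an observed bleaching step independently with probability theta, uses hypothetical step counts, and plugs in the maximum likelihood theta for each candidate n.

```python
from math import comb, log


def binom_pmf(k, n, theta):
    """Probability of observing k bleaching steps from n labeled subunits."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * theta**k * (1.0 - theta) ** (n - k)


def log_likelihood(observed_steps, n, theta):
    """Log-likelihood of the step counts for a candidate (n, theta) pair."""
    total = 0.0
    for k in observed_steps:
        prob = binom_pmf(k, n, theta)
        if prob == 0.0:
            return float("-inf")
        total += log(prob)
    return total


observed = [2, 3, 3, 4, 3, 2, 4, 3]    # hypothetical step counts per complex
mean_steps = sum(observed) / len(observed)

for n in (4, 5, 6):
    theta_hat = mean_steps / n          # maximum likelihood theta for fixed n
    fit = log_likelihood(observed, n, theta_hat)
    print(f"n={n}  theta={theta_hat:.3f}  log-likelihood={fit:.3f}")
```

Each candidate n reproduces the observed mean by adjusting theta, so any preference between candidates rests on the shape of the distribution rather than its center; this is one way to see why n and theta are hard to separate from step counts alone.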
Non-normal Distributions Commonly Used in Health, Education, and Social Sciences: A Systematic Review

PubMed Central

Bono, Roser; Blanca, María J.; Arnau, Jaume; Gómez-Benito, Juana

2017-01-01

Statistical analysis is crucial for research and the choice of analytical technique should take into account the specific distribution of data. Although the data obtained from health, educational, and social sciences research are often not normally distributed, there are very few studies detailing which distributions are most likely to represent data in these disciplines. The aim of this systematic review was to determine the frequency of appearance of the most common non-normal distributions in the health, educational, and social sciences. The search was carried out in the Web of Science database, from which we retrieved the abstracts of papers published between 2010 and 2015. The selection was made on the basis of the title and the abstract, and was performed independently by two reviewers. The inter-rater reliability for article selection was high (Cohen’s kappa = 0.84), and agreement regarding the type of distribution reached 96.5%. A total of 262 abstracts were included in the final review. The distribution of the response variable was reported in 231 of these abstracts, while in the remaining 31 it was merely stated that the distribution was non-normal. In terms of their frequency of appearance, the most-common non-normal distributions can be ranked in descending order as follows: gamma, negative binomial, multinomial, binomial, lognormal, and exponential. In addition to identifying the distributions most commonly used in empirical studies these results will help researchers to decide which distributions should be included in simulation studies examining statistical procedures. PMID:28959227
Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment.

PubMed

Gierliński, Marek; Cole, Christian; Schofield, Pietà; Schurch, Nicholas J; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J

2015-11-15

High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of 'bad' replicates, which can drastically affect the gene read-count distribution. RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. Contact: g.j.barton@dundee.ac.uk. © The Author 2015. Published by Oxford University Press.
Statistical models for RNA-seq data derived from a two-condition 48-replicate experiment

PubMed Central

Cole, Christian; Schofield, Pietà; Schurch, Nicholas J.; Sherstnev, Alexander; Singh, Vijender; Wrobel, Nicola; Gharbi, Karim; Simpson, Gordon; Owen-Hughes, Tom; Blaxter, Mark; Barton, Geoffrey J.

2015-01-01

Motivation: High-throughput RNA sequencing (RNA-seq) is now the standard method to determine differential gene expression. Identifying differentially expressed genes crucially depends on estimates of read-count variability. These estimates are typically based on statistical models such as the negative binomial distribution, which is employed by the tools edgeR, DESeq and cuffdiff. Until now, the validity of these models has usually been tested on either low-replicate RNA-seq data or simulations. Results: A 48-replicate RNA-seq experiment in yeast was performed and data tested against theoretical models. The observed gene read counts were consistent with both log-normal and negative binomial distributions, while the mean-variance relation followed the line of constant dispersion parameter of ∼0.01. The high-replicate data also allowed for strict quality control and screening of ‘bad’ replicates, which can drastically affect the gene read-count distribution. Availability and implementation: RNA-seq data have been submitted to ENA archive with project ID PRJEB5348. Contact: g.j.barton@dundee.ac.uk PMID:26206307
Negative Binomial Process Count and Mixture Modeling.

PubMed

Zhou, Mingyuan; Carin, Lawrence

2015-02-01

The seemingly disjoint problems of count and mixture modeling are united under the negative binomial (NB) process. A gamma process is employed to model the rate measure of a Poisson process, whose normalization provides a random probability measure for mixture modeling and whose marginalization leads to an NB process for count modeling. A draw from the NB process consists of a Poisson distributed finite number of distinct atoms, each of which is associated with a logarithmic distributed number of data samples. We reveal relationships between various count- and mixture-modeling distributions and construct a Poisson-logarithmic bivariate distribution that connects the NB and Chinese restaurant table distributions. Fundamental properties of the models are developed, and we derive efficient Bayesian inference. It is shown that with augmentation and normalization, the NB process and gamma-NB process can be reduced to the Dirichlet process and hierarchical Dirichlet process, respectively. These relationships highlight theoretical, structural, and computational advantages of the NB process. A variety of NB processes, including the beta-geometric, beta-NB, marked-beta-NB, marked-gamma-NB and zero-inflated-NB processes, with distinct sharing mechanisms, are also constructed. These models are applied to topic modeling, with connections made to existing algorithms under Poisson factor analysis. Example results show the importance of inferring both the NB dispersion and probability parameters.
Technical and biological variance structure in mRNA-Seq data: life in the real world

PubMed Central

2012-01-01

Background: mRNA expression data from next generation sequencing platforms is obtained in the form of counts per gene or exon. Counts have classically been assumed to follow a Poisson distribution in which the variance is equal to the mean. The Negative Binomial distribution which allows for over-dispersion, i.e., for the variance to be greater than the mean, is commonly used to model count data as well. Results: In mRNA-Seq data from 25 subjects, we found technical variation to generally follow a Poisson distribution as has been reported previously and biological variability was over-dispersed relative to the Poisson model. The mean-variance relationship across all genes was quadratic, in keeping with a Negative Binomial (NB) distribution. Over-dispersed Poisson and NB distributional assumptions demonstrated marked improvements in goodness-of-fit (GOF) over the standard Poisson model assumptions, but with evidence of over-fitting in some genes. Modeling of experimental effects improved GOF for high variance genes but increased the over-fitting problem. Conclusions: These conclusions will guide development of analytical strategies for accurate modeling of variance structure in these data and sample size determination which in turn will aid in the identification of true biological signals that inform our understanding of biological systems. PMID:22769017
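The quadratic mean-variance relationship mentioned above can be illustrated in a few lines. The sketch assumes the usual negative binomial parameterization, variance = mean + dispersion × mean², and a hypothetical dispersion value; neither is taken from this study.

```python
def poisson_variance(mean):
    """Under a Poisson model the variance equals the mean."""
    return mean


def nb_variance(mean, dispersion):
    """Negative binomial variance: linear Poisson term plus a quadratic term."""
    return mean + dispersion * mean**2


dispersion = 0.05    # hypothetical biological dispersion, for illustration only

for mean_count in (10, 100, 1000, 10000):
    p_var = poisson_variance(mean_count)
    nb_var = nb_variance(mean_count, dispersion)
    print(f"mean={mean_count:>6}  Poisson var={p_var:>8}  NB var={nb_var:>12.1f}")
```

At low counts the two models nearly coincide, while at high counts the quadratic term dominates, which is why over-dispersion matters most for highly expressed genes.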
Generalization of multifractal theory within quantum calculus

NASA Astrophysics Data System (ADS)

Olemskoi, A.; Shuda, I.; Borisyuk, V.

2010-03-01

On the basis of the deformed series in quantum calculus, we generalize the partition function and the mass exponent of a multifractal, as well as the average of a random variable distributed over a self-similar set. For the partition function, such expansion is shown to be determined by binomial-type combinations of the Tsallis entropies related to manifold deformations, while the mass exponent expansion generalizes the known relation τq=Dq(q-1). We find the equation for the set of averages related to ordinary, escort, and generalized probabilities in terms of the deformed expansion as well. Multifractals related to the Cantor binomial set, exchange currency series, and porous-surface condensates are considered as examples.
Simulation of flight maneuver-load distributions by utilizing stationary, non-Gaussian random load histories

NASA Technical Reports Server (NTRS)

Leybold, H. A.

1971-01-01

Random numbers were generated with the aid of a digital computer and transformed such that the probability density function of a discrete random load history composed of these random numbers had one of the following non-Gaussian distributions: Poisson, binomial, log-normal, Weibull, and exponential. The resulting random load histories were analyzed to determine their peak statistics and were compared with cumulative peak maneuver-load distributions for fighter and transport aircraft in flight.
An analytical framework for estimating aquatic species density from environmental DNA

USGS Publications Warehouse

Chambert, Thierry; Pilliod, David S.; Goldberg, Caren S.; Doi, Hideyuki; Takahara, Teruhiko

2018-01-01

Environmental DNA (eDNA) analysis of water samples is on the brink of becoming a standard monitoring method for aquatic species. This method has improved detection rates over conventional survey methods and thus has demonstrated effectiveness for estimation of site occupancy and species distribution. The frontier of eDNA applications, however, is to infer species density. Building upon previous studies, we present and assess a modeling approach that aims at inferring animal density from eDNA. The modeling combines eDNA and animal count data from a subset of sites to estimate species density (and associated uncertainties) at other sites where only eDNA data are available. As a proof of concept, we first perform a cross-validation study using experimental data on carp in mesocosms. In these data, fish densities are known without error, which allows us to test the performance of the method with known data. We then evaluate the model using field data from a study on a stream salamander species to assess the potential of this method to work in natural settings, where density can never be known with absolute certainty. Two alternative distributions (Normal and Negative Binomial) to model variability in eDNA concentration data are assessed. Assessment based on the proof of concept data (carp) revealed that the Negative Binomial model provided much more accurate estimates than the model based on a Normal distribution, likely because eDNA data tend to be overdispersed. Greater imprecision was found when we applied the method to the field data, but the Negative Binomial model still provided useful density estimates. We call for further model development in this direction, as well as further research targeted at sampling design optimization. It will be important to assess these approaches on a broad range of study systems.
Statistical inference involving binomial and negative binomial parameters.

PubMed

García-Pérez, Miguel A; Núñez-Antón, Vicente

2009-05-01

Statistical inference about two binomial parameters implies that they are both estimated by binomial sampling. There are occasions in which one aims at testing the equality of two binomial parameters before and after the occurrence of the first success along a sequence of Bernoulli trials. In these cases, the binomial parameter before the first success is estimated by negative binomial sampling whereas that after the first success is estimated by binomial sampling, and both estimates are related. This paper derives statistical tools to test two hypotheses, namely, that both binomial parameters equal some specified value and that both parameters are equal though unknown. Simulation studies are used to show that in small samples both tests are accurate in keeping the nominal Type-I error rates, and also to determine sample size requirements to detect large, medium, and small effects with adequate power. Additional simulations also show that the tests are sufficiently robust to certain violations of their assumptions.
Modeling avian abundance from replicated counts using binomial mixture models

USGS Publications Warehouse

Kery, Marc; Royle, J. Andrew; Schmid, Hans

2005-01-01

Abundance estimation in ecology is usually accomplished by capture–recapture, removal, or distance sampling methods. These may be hard to implement at large spatial scales. In contrast, binomial mixture models enable abundance estimation without individual identification, based simply on temporally and spatially replicated counts. Here, we evaluate mixture models using data from the national breeding bird monitoring program in Switzerland, where some 250 1-km² quadrats are surveyed using the territory mapping method three times during each breeding season. We chose eight species with contrasting distribution (wide–narrow), abundance (high–low), and detectability (easy–difficult). Abundance was modeled as a random effect with a Poisson or negative binomial distribution, with mean affected by forest cover, elevation, and route length. Detectability was a logit-linear function of survey date, survey date-by-elevation, and sampling effort (time per transect unit). Resulting covariate effects and parameter estimates were consistent with expectations. Detectability per territory (for three surveys) ranged from 0.66 to 0.94 (mean 0.84) for easy species, and from 0.16 to 0.83 (mean 0.53) for difficult species, depended on survey effort for two easy and all four difficult species, and changed seasonally for three easy and three difficult species. Abundance was positively related to route length in three high-abundance and one low-abundance (one easy and three difficult) species, and increased with forest cover in five forest species, decreased for two nonforest species, and was unaffected for a generalist species. Abundance estimates under the most parsimonious mixture models were between 1.1 and 8.9 (median 1.8) times greater than estimates based on territory mapping; hence, three surveys were insufficient to detect all territories for each species.
We conclude that binomial mixture models are an important new approach for estimating abundance corrected for detectability when only repeated-count data are available. Future developments envisioned include estimation of trend, occupancy, and total regional abundance.
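As an illustration of how replicated counts inform abundance in a binomial mixture model, the sketch below computes the likelihood contribution of a single site. It assumes the simplest form (Poisson abundance, constant detectability, no covariates); the function names and the truncation bound `n_max` are hypothetical choices, not part of the study.

```python
from math import comb, exp, lgamma, log


def poisson_pmf(n, lam):
    """Poisson probability of n events with mean lam (lam > 0)."""
    return exp(-lam + n * log(lam) - lgamma(n + 1))


def site_likelihood(counts, lam, p, n_max=200):
    """Likelihood of replicated counts at one site.

    Latent abundance N ~ Poisson(lam); each survey count y ~ Binomial(N, p).
    N is summed out from max(counts) up to the truncation bound n_max.
    """
    total = 0.0
    for n in range(max(counts), n_max + 1):
        detection = 1.0
        for y in counts:
            detection *= comb(n, y) * p ** y * (1 - p) ** (n - y)
        total += poisson_pmf(n, lam) * detection
    return total


# Three hypothetical surveys of one quadrat.
print(site_likelihood([3, 5, 4], lam=6.0, p=0.7))
```

For a single survey the result reduces to a Poisson distribution with mean `lam * p`, which gives a convenient sanity check on the truncation bound.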
Spatial distribution of single-nucleotide polymorphisms related to fungicide resistance and implications for sampling.

PubMed

Van der Heyden, H; Dutilleul, P; Brodeur, L; Carisse, O

2014-06-01

Spatial distribution of single-nucleotide polymorphisms (SNPs) related to fungicide resistance was studied for Botrytis cinerea populations in vineyards and for B. squamosa populations in onion fields. Heterogeneity in this distribution was characterized by performing geostatistical analyses based on semivariograms and through the fitting of discrete probability distributions. Two SNPs known to be responsible for boscalid resistance (H272R and H272Y), both located on the B subunit of the succinate dehydrogenase gene, and one SNP known to be responsible for dicarboximide resistance (I365S) were chosen for B. cinerea in grape. For B. squamosa in onion, one SNP responsible for dicarboximide resistance (I365S homologous) was chosen. One onion field was sampled in 2009 and another one was sampled in 2010 for B. squamosa, and two vineyards were sampled in 2011 for B. cinerea, for a total of four sampled sites. Cluster sampling was carried on a 10-by-10 grid, each of the 100 nodes being the center of a 10-by-10-m quadrat. In each quadrat, 10 samples were collected and analyzed by restriction fragment length polymorphism polymerase chain reaction (PCR) or allele specific PCR. Mean SNP incidence varied from 16 to 68%, with an overall mean incidence of 43%. In the geostatistical analyses, omnidirectional variograms showed spatial autocorrelation characterized by ranges of 21 to 1 m. Various levels of anisotropy were detected, however, with variograms computed in four directions (at 0°, 45°, 90°, and 135° from the within-row direction used as reference), indicating that spatial autocorrelation was prevalent or characterized by a longer range in one direction. For all eight data sets, the β-binomial distribution was found to fit the data better than the binomial distribution. This indicates local aggregation of fungicide resistance among sampling units, as supported by estimates of the parameter θ of the β-binomial distribution of 0.09 to 0.23 (overall median value = 0.20). 
On the basis of the observed spatial distribution patterns of SNP incidence, sampling curves were computed for different levels of reliability, emphasizing the importance of sample size for the detection of mutation incidence below the risk threshold for control failure.

NEWTONP - CUMULATIVE BINOMIAL PROGRAMS

NASA Technical Reports Server (NTRS)

Bowerman, P. N.

1994-01-01

The cumulative binomial program, NEWTONP, is one of a set of three programs which calculate cumulative binomial probability distributions for arbitrary inputs. The three programs, NEWTONP, CUMBIN (NPO-17555), and CROSSER (NPO-17557), can be used independently of one another. NEWTONP can be used by statisticians and users of statistical procedures, test planners, designers, and numerical analysts. The program has been used for reliability/availability calculations. NEWTONP calculates the probability p required to yield a given system reliability V for a k-out-of-n system. It can also be used to determine the Clopper-Pearson confidence limits (either one-sided or two-sided) for the parameter p of a Bernoulli distribution. NEWTONP can determine Bayesian probability limits for a proportion (if the beta prior has positive integer parameters). It can determine the percentiles of incomplete beta distributions with positive integer parameters. It can also determine the percentiles of F distributions and the median plotting positions in probability plotting. NEWTONP is designed to work well with all integer values 0 < k <= n. To run the program, the user simply runs the executable version and inputs the information requested by the program. NEWTONP is not designed to weed out incorrect inputs, so the user must take care to make sure the inputs are correct. Once all input has been entered, the program calculates and lists the result. It also lists the number of iterations of Newton's method required to calculate the answer within the given error. The NEWTONP program is written in C. It was developed on an IBM AT with a numeric co-processor using Microsoft C 5.0. Because the source code is written using standard C structures and functions, it should compile correctly with most C compilers. The program format is interactive. It has been implemented under DOS 3.2 and has a memory requirement of 26K. NEWTONP was developed in 1988.
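The central calculation described above, finding the component probability p that yields system reliability V for a k-out-of-n system, can be sketched as follows. This is not the NEWTONP source: it is a minimal Python illustration that assumes independent, identical components and adds a bisection safeguard (an assumption of this sketch) so that a Newton step can never leave the interval (0, 1). All function names are hypothetical.

```python
from math import comb


def k_out_of_n_reliability(p, k, n):
    """Probability that at least k of n independent components work."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k, n + 1))


def required_p(V, k, n, tol=1e-12, max_iter=200):
    """Solve k_out_of_n_reliability(p) = V for p; return (p, iterations used)."""
    if not (0 < k <= n and 0.0 < V < 1.0):
        raise ValueError("need 0 < k <= n and 0 < V < 1")
    lo, hi = 0.0, 1.0  # bracket; reliability is increasing in p
    p = 0.5
    for iteration in range(1, max_iter + 1):
        f = k_out_of_n_reliability(p, k, n) - V
        if abs(f) < tol:
            return p, iteration
        if f > 0:
            hi = p
        else:
            lo = p
        # Derivative of the upper binomial tail with respect to p.
        slope = n * comb(n - 1, k - 1) * p ** (k - 1) * (1 - p) ** (n - k)
        candidate = p - f / slope if slope > 0 else lo
        if not (lo < candidate < hi):
            candidate = 0.5 * (lo + hi)  # fall back to bisection
        p = candidate
    return p, max_iter


p, iterations = required_p(V=0.99, k=2, n=3)
print(p, iterations)
```

Reporting the iteration count alongside the root mirrors the behavior described in the abstract.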
Distribution of chewing lice upon the polygynous peacock Pavo cristatus.

PubMed

Stewart, I R; Clark, F; Petrie, M

1996-04-01

An opportunistic survey of louse distribution upon the peacock Pavo cristatus was undertaken following a cull of 23 birds from an English zoo. After complete skin and feather dissolution, 2 species of lice were retrieved, Goniodes pavonis and Amyrsidea minuta. The distribution of both louse species could be described by a negative binomial model. The significance of this is discussed in relation to transmission dynamics of lice in the atypical avian mating system found in the peacock, which involves no male parental care.
Estimation of aquifer scale proportion using equal area grids: assessment of regional scale groundwater quality

USGS Publications Warehouse

Belitz, Kenneth; Jurgens, Bryant C.; Landon, Matthew K.; Fram, Miranda S.; Johnson, Tyler D.

2010-01-01

The proportion of an aquifer with constituent concentrations above a specified threshold (high concentrations) is taken as a nondimensional measure of regional scale water quality. If computed on the basis of area, it can be referred to as the aquifer scale proportion. A spatially unbiased estimate of aquifer scale proportion and a confidence interval for that estimate are obtained through the use of equal area grids and the binomial distribution. Traditionally, the confidence interval for a binomial proportion is computed using either the standard interval or the exact interval. Research from the statistics literature has shown that the standard interval should not be used and that the exact interval is overly conservative. On the basis of coverage probability and interval width, the Jeffreys interval is preferred. If more than one sample per cell is available, cell declustering is used to estimate the aquifer scale proportion, and Kish's design effect may be useful for estimating an effective number of samples. The binomial distribution is also used to quantify the adequacy of a grid with a given number of cells for identifying a small target, defined as a constituent that is present at high concentrations in a small proportion of the aquifer. Case studies illustrate a consistency between approaches that use one well per grid cell and many wells per cell. The methods presented in this paper provide a quantitative basis for designing a sampling program and for utilizing existing data.
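The use of the binomial distribution to judge grid adequacy can be illustrated with a short calculation. The sketch assumes one independent sample per equal-area cell, so the chance of sampling a target that occupies a given proportion of the aquifer at least once is 1 - (1 - proportion)^n; the function names are hypothetical.

```python
from math import ceil, log


def detection_probability(n_cells, proportion):
    """Chance that at least one of n_cells samples falls in the target."""
    return 1.0 - (1.0 - proportion) ** n_cells


def cells_needed(proportion, confidence):
    """Smallest number of cells giving at least `confidence` of detection."""
    if not (0.0 < proportion < 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("proportion and confidence must lie in (0, 1)")
    return ceil(log(1.0 - confidence) / log(1.0 - proportion))


# A target occupying 10% of the aquifer, detected with 90% confidence.
print(cells_needed(0.10, 0.90))
```

Smaller targets require sharply larger grids, which is why the target proportion should be fixed before the sampling program is designed.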
Parameter Estimation for the Dirichlet-Multinomial Distribution Using Supplementary Beta-Binomial Data.

DTIC Science & Technology

1987-07-01

multinomial distribution as a magazine exposure model. J. of Marketing Research 21, 100-106. Lehmann, E.L. (1983). Theory of Point Estimation. John Wiley and... Marketing Research 21, 89-99.
Logistic quantile regression provides improved estimates for bounded avian counts: a case study of California Spotted Owl fledgling production

Treesearch

Brian S. Cade; Barry R. Noon; Rick D. Scherer; John J. Keane

2017-01-01

Counts of avian fledglings, nestlings, or clutch size that are bounded below by zero and above by some small integer form a discrete random variable distribution that is not approximated well by conventional parametric count distributions such as the Poisson or negative binomial. We developed a logistic quantile regression model to provide estimates of the empirical...
A Statistical Treatment of Bioassay Pour Fractions

NASA Technical Reports Server (NTRS)

Barengoltz, Jack; Hughes, David W.

2014-01-01

The binomial probability distribution is used to treat the statistics of a microbiological sample that is split into two parts, with only one part evaluated for spore count. One wishes to estimate the total number of spores in the sample based on the counts obtained from the part that is evaluated (pour fraction). Formally, the binomial distribution is recharacterized as a function of the observed counts (successes), with the total number (trials) an unknown. The pour fraction is the probability of success per spore (trial). This distribution must be renormalized in terms of the total number. Finally, the new renormalized distribution is integrated and mathematically inverted to yield the maximum estimate of the total number as a function of a desired level of confidence ( P(
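One way to read the procedure described above is sketched below. It is an interpretation, not the authors' derivation: the binomial probability of the observed count is treated as a function of the unknown total N, renormalized so that it sums to one over N (the normalizing constant works out to the pour fraction itself), and accumulated until the desired confidence is reached. The function name and the implied equal weighting of all N are assumptions.

```python
from math import comb


def upper_bound_total(observed, pour_fraction, confidence):
    """Smallest N whose cumulative renormalized probability >= confidence.

    observed      -- count from the evaluated part (successes)
    pour_fraction -- probability that any one spore lands in that part
    """
    if not (0.0 < pour_fraction <= 1.0 and 0.0 < confidence < 1.0):
        raise ValueError("pour_fraction in (0, 1], confidence in (0, 1)")
    cumulative = 0.0
    n_total = observed  # the total cannot be smaller than the observed count
    while True:
        likelihood = (comb(n_total, observed)
                      * pour_fraction ** observed
                      * (1.0 - pour_fraction) ** (n_total - observed))
        # Summing the likelihood over all N gives 1 / pour_fraction, so
        # multiplying by pour_fraction renormalizes it.
        cumulative += pour_fraction * likelihood
        if cumulative >= confidence:
            return n_total
        n_total += 1


# No spores seen in a half-sample: bound on the total at 95% confidence.
print(upper_bound_total(0, 0.5, 0.95))
```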
Yes, the GIGP Really Does Work--And Is Workable!

ERIC Educational Resources Information Center

Burrell, Quentin L.; Fenton, Michael R.

1993-01-01

Discusses the generalized inverse Gaussian-Poisson (GIGP) process for informetric modeling. Negative binomial distribution is discussed, construction of the GIGP process is explained, zero-truncated GIGP is considered, and applications of the process with journals, library circulation statistics, and database index terms are described. (50…
Topics in Bayesian Hierarchical Modeling and its Monte Carlo Computations

NASA Astrophysics Data System (ADS)

Tak, Hyung Suk

The first chapter addresses a Beta-Binomial-Logit model that is a Beta-Binomial conjugate hierarchical model with covariate information incorporated via a logistic regression. Various researchers in the literature have unknowingly used improper posterior distributions or have given incorrect statements about posterior propriety because checking posterior propriety can be challenging due to the complicated functional form of a Beta-Binomial-Logit model. We derive data-dependent necessary and sufficient conditions for posterior propriety within a class of hyper-prior distributions that encompass those used in previous studies. Frequency coverage properties of several hyper-prior distributions are also investigated to see when and whether Bayesian interval estimates of random effects meet their nominal confidence levels. The second chapter deals with a time delay estimation problem in astrophysics. When the gravitational field of an intervening galaxy between a quasar and the Earth is strong enough to split light into two or more images, the time delay is defined as the difference between their travel times. The time delay can be used to constrain cosmological parameters and can be inferred from the time series of brightness data of each image. To estimate the time delay, we construct a Gaussian hierarchical model based on a state-space representation for irregularly observed time series generated by a latent continuous-time Ornstein-Uhlenbeck process. Our Bayesian approach jointly infers model parameters via a Gibbs sampler. We also introduce a profile likelihood of the time delay as an approximation of its marginal posterior distribution. The last chapter specifies a repelling-attracting Metropolis algorithm, a new Markov chain Monte Carlo method to explore multi-modal distributions in a simple and fast manner. 
This algorithm is essentially a Metropolis-Hastings algorithm with a proposal that consists of a downhill move in density that aims to make local modes repelling, followed by an uphill move in density that aims to make local modes attracting. The downhill move is achieved via a reciprocal Metropolis ratio so that the algorithm prefers downward movement. The uphill move does the opposite using the standard Metropolis ratio which prefers upward movement. This down-up movement in density increases the probability of a proposed move to a different mode.
Analysis of multiple tank car releases in train accidents.

PubMed

Liu, Xiang; Liu, Chang; Hong, Yili

2017-10-01

Over two million carloads of hazardous materials are transported by rail in the United States each year. American railroads use large blocks of tank cars to transport petroleum crude oil and other flammable liquids from production to consumption sites. Unlike roadway transport of hazardous materials, a train accident can result in the derailment of, and release from, multiple tank cars, which may have significant consequences. The prior literature predominantly assumes that the occurrence of multiple tank car releases in a train accident is a series of independent Bernoulli processes, and thus uses the binomial distribution to estimate the total number of tank car releases given the number of tank cars derailing or damaged. This paper shows that the traditional binomial model can incorrectly estimate multiple tank car release probability by orders of magnitude in certain circumstances, thereby significantly affecting railroad safety and risk analysis. To bridge this knowledge gap, this paper proposes a novel, alternative Correlated Binomial (CB) model that accounts for the possible correlations of multiple tank car releases in the same train. We test three distinct correlation structures in the CB model, and find that they all outperform the conventional binomial model based on empirical tank car accident data. The analysis shows that accounting for tank car release correlations gives a significantly better fit to the empirical data than ignoring them. Consequently, it is prudent to consider alternative modeling techniques when analyzing the probability of multiple tank car releases in railroad accidents. Copyright © 2017 Elsevier Ltd. All rights reserved.
Linking parasite populations in hosts to parasite populations in space through Taylor's law and the negative binomial distribution

PubMed Central

Poulin, Robert; Lagrue, Clément

2017-01-01

The spatial distribution of individuals of any species is a basic concern of ecology. The spatial distribution of parasites matters to control and conservation of parasites that affect human and nonhuman populations. This paper develops a quantitative theory to predict the spatial distribution of parasites based on the distribution of parasites in hosts and the spatial distribution of hosts. Four models are tested against observations of metazoan hosts and their parasites in littoral zones of four lakes in Otago, New Zealand. These models differ in two dichotomous assumptions, constituting a 2 × 2 theoretical design. One assumption specifies whether the variance function of the number of parasites per host individual is described by Taylor's law (TL) or the negative binomial distribution (NBD). The other assumption specifies whether the numbers of parasite individuals within each host in a square meter of habitat are independent or perfectly correlated among host individuals. We find empirically that the variance–mean relationship of the numbers of parasites per square meter is very well described by TL but is not well described by NBD. Two models that posit perfect correlation of the parasite loads of hosts in a square meter of habitat approximate observations much better than two models that posit independence of parasite loads of hosts in a square meter, regardless of whether the variance–mean relationship of parasites per host individual obeys TL or NBD. We infer that high local interhost correlations in parasite load strongly influence the spatial distribution of parasites. Local hotspots could influence control and conservation of parasites. PMID:27994156
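The two variance functions being compared can be written out concretely. The sketch below assumes the usual forms, variance = a * mean^b for Taylor's law and variance = mean + mean^2 / k for the negative binomial distribution, and fits the Taylor's law parameters by ordinary least squares on the log-log scale. The function names and the example data are hypothetical and are not the Otago observations.

```python
from math import exp, log


def fit_taylors_law(means, variances):
    """Fit variance = a * mean**b by least squares on log-log axes."""
    xs = [log(m) for m in means]
    ys = [log(v) for v in variances]
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = exp(y_bar - b * x_bar)
    return a, b


def nbd_variance(mean, k):
    """Negative binomial variance function with aggregation parameter k."""
    return mean + mean ** 2 / k


# Hypothetical per-sample means and variances of parasite counts.
means = [1.0, 2.0, 4.0, 8.0]
variances = [2.0 * m ** 1.5 for m in means]
print(fit_taylors_law(means, variances))
```

Comparing observed variances against both curves shows which variance function describes the data better.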
A Unifying Probability Example.

ERIC Educational Resources Information Center

Maruszewski, Richard F., Jr.

2002-01-01

Presents an example from probability and statistics that ties together several topics including the mean and variance of a discrete random variable, the binomial distribution and its particular mean and variance, the sum of independent random variables, the mean and variance of the sum, and the central limit theorem. Uses Excel to illustrate these…
Temporal Evolution of Non-equilibrium Gamma’ Precipitates in a Rapidly Quenched Nickel Base Superalloy (Preprint)

DTIC Science & Technology

2014-04-01

with the binomial distribution for a particular dataset. This technique is more commonly known as the Langer, Bar-on and Miller ( LBM ) method [22,23...distribution unlimited. Using the LBM method, the frequency distribution plot for a dataset corresponding to a phase separated system, exhibiting a split peak...estimated parameters (namely μ1, μ2, σ, fγ’ and fγ) obtained from the LBM plots in Fig. 5 are summarized in Table 3. The EWQ sample does not exhibit any
Estimation of the cure rate in Iranian breast cancer patients.

PubMed

Rahimzadeh, Mitra; Baghestani, Ahmad Reza; Gohari, Mahmood Reza; Pourhoseingholi, Mohamad Amin

2014-01-01

Although Cox's proportional hazards model is the most popular approach in survival analysis for investigating significant risk factors of cancer patient survival, it is not appropriate in the case of long-term disease-free survival. Recently, cure rate models have been introduced to distinguish between clinical determinants of cure and variables associated with the time to the event of interest. The aim of this study was to use a cure rate model to determine the clinical factors associated with cure rates of patients with breast cancer (BC). This prospective cohort study covered 305 patients with BC, admitted at Shahid Faiazbakhsh Hospital, Tehran, during 2006 to 2008 and followed until April 2012. Cases of patient death were confirmed by telephone contact. For data analysis, non-mixed cure rate models with Poisson and negative binomial distributions were employed. All analyses were carried out using a macro developed in WinBUGS. The deviance information criterion (DIC) was used to find the best model. The overall 1-year, 3-year and 5-year relative survival rates were 97%, 89% and 74%. Metastasis and stage of BC were the significant factors, but age was significant only in the negative binomial model. The DIC also showed that the negative binomial model had a better fit. This study indicated that metastasis and stage of BC were the clinical criteria for cure rates. Few studies on BC survival have employed these cure rate models to identify the clinical factors associated with cure. These models are preferable to the Cox model in the case of long-term survival.
How Interesting Is a Cricket Match?

ERIC Educational Resources Information Center

Glaister, P.

2006-01-01

Even for those passionate about both cricket and maths, each can have their dull moments. This article brings together the sometimes-dry binomial distribution with a problem of cricket matches where the result of the series has already been decided, and so are "dead". It is hoped that readers will become more interested in at least one…
Learner-Controlled Scaffolding Linked to Open-Ended Problems in a Digital Learning Environment

ERIC Educational Resources Information Center

Edson, Alden Jack

2017-01-01

This exploratory study reports on how students activated learner-controlled scaffolding and navigated through sequences of connected problems in a digital learning environment. A design experiment was completed to (re)design, iteratively develop, test, and evaluate a digital version of an instructional unit focusing on binomial distributions and…
A chi-square goodness-of-fit test for non-identically distributed random variables: with application to empirical Bayes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Conover, W.J.; Cox, D.D.; Martz, H.F.

1997-12-01

When using parametric empirical Bayes estimation methods for estimating the binomial or Poisson parameter, the validity of the assumed beta or gamma conjugate prior distribution is an important diagnostic consideration. Chi-square goodness-of-fit tests of the beta or gamma prior hypothesis are developed for use when the binomial sample sizes or Poisson exposure times vary. Nine examples illustrate the application of the methods, using real data from such diverse applications as the loss of feedwater flow rates in nuclear power plants, the probability of failure to run on demand and the failure rates of the high pressure coolant injection systems at US commercial boiling water reactors, the probability of failure to run on demand of emergency diesel generators in US commercial nuclear power plants, the rate of failure of aircraft air conditioners, baseball batting averages, the probability of testing positive for toxoplasmosis, and the probability of tumors in rats. The tests are easily applied in practice by means of corresponding Mathematica® computer programs which are provided.
Simplified pupal surveys of Aedes aegypti (L.) for entomologic surveillance and dengue control.

PubMed

Barrera, Roberto

2009-07-01

Pupal surveys of Aedes aegypti (L.) are useful indicators of risk for dengue transmission, although sample sizes for reliable estimations can be large. This study explores two methods for making pupal surveys more practical yet reliable and used data from 10 pupal surveys conducted in Puerto Rico during 2004-2008. The number of pupae per person for each sampling followed a negative binomial distribution, thus showing aggregation. One method found a common aggregation parameter (k) for the negative binomial distribution, a finding that enabled the application of a sequential sampling method requiring few samples to determine whether the number of pupae/person was above a vector density threshold for dengue transmission. A second approach used the finding that the mean number of pupae/person is correlated with the proportion of pupa-infested households and calculated equivalent threshold proportions of pupa-positive households. A sequential sampling program was also developed for this method to determine whether observed proportions of infested households were above threshold levels. These methods can be used to validate entomological thresholds for dengue transmission.
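The link between mean density and the proportion of infested households can be illustrated with the zero term of the negative binomial distribution. This is an assumed relationship used here for illustration (the study itself reports an empirical correlation): if pupae per unit follow a negative binomial with mean m and common aggregation parameter k, the expected proportion of positive units is 1 - (1 + m/k)^(-k). The function names and the example values are hypothetical.

```python
def proportion_positive(mean_pupae, k):
    """Expected proportion of units with at least one pupa under an NB model."""
    return 1.0 - (1.0 + mean_pupae / k) ** (-k)


def mean_from_proportion(proportion, k):
    """Invert proportion_positive: the mean density implied by a proportion."""
    return k * ((1.0 - proportion) ** (-1.0 / k) - 1.0)


# Hypothetical threshold: a mean of 1.5 pupae per unit with k = 0.4.
threshold = proportion_positive(1.5, 0.4)
print(threshold, mean_from_proportion(threshold, 0.4))
```

Converting a density threshold into an equivalent proportion in this way allows presence/absence surveys to stand in for full pupal counts, provided the common-k assumption holds.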
Metaprop: a Stata command to perform meta-analysis of binomial data.

PubMed

Nyaga, Victoria N; Arbyn, Marc; Aerts, Marc

2014-01-01

Meta-analyses have become an essential tool in synthesizing evidence on clinical and epidemiological questions derived from a multitude of similar studies assessing the particular issue. Appropriate and accessible statistical software is needed to produce the summary statistic of interest. Metaprop is a statistical program implemented to perform meta-analyses of proportions in Stata. It builds further on the existing Stata procedure metan which is typically used to pool effects (risk ratios, odds ratios, differences of risks or means) but which is also used to pool proportions. Metaprop implements procedures which are specific to binomial data and allows computation of exact binomial and score test-based confidence intervals. It provides appropriate methods for dealing with proportions close to or at the margins where the normal approximation procedures often break down, by use of the binomial distribution to model the within-study variability or by allowing Freeman-Tukey double arcsine transformation to stabilize the variances. Metaprop was applied on two published meta-analyses: 1) prevalence of HPV-infection in women with a Pap smear showing ASC-US; 2) cure rate after treatment for cervical precancer using cold coagulation. The first meta-analysis showed a pooled HPV-prevalence of 43% (95% CI: 38%-48%). In the second meta-analysis, the pooled percentage of cured women was 94% (95% CI: 86%-97%). By using metaprop, no studies with 0% or 100% proportions were excluded from the meta-analysis. Furthermore, study specific and pooled confidence intervals always were within admissible values, contrary to the original publication, where metan was used.
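To show why score-based intervals stay within admissible values at the margins, the sketch below computes a Wilson score interval for a single study's proportion. It is written in Python rather than Stata and is not metaprop's implementation; the function name and default critical value are assumptions of this sketch.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.959964):
    """Wilson score confidence interval for a binomial proportion.

    z is the standard normal critical value (about 1.96 for 95%).
    """
    if n <= 0 or not (0 <= successes <= n):
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# A study with 0 events out of 10 still gets a usable, bounded interval.
print(wilson_interval(0, 10))
```

By contrast, the normal-approximation (Wald) interval for 0 out of 10 collapses to a single point at zero, which is the breakdown the abstract refers to.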
A comparison of methods for the analysis of binomial clustered outcomes in behavioral research.

PubMed

Ferrari, Alberto; Comelli, Mario

2016-12-01

In behavioral research, data consisting of a per-subject proportion of "successes" and "failures" over a finite number of trials often arise. Such clustered binary data are usually non-normally distributed, which can distort inference if the usual general linear model is applied and sample size is small. A number of more advanced methods are available, but they are often technically challenging, and a comparative assessment of their performance in behavioral setups has not been performed. We studied the performance of some methods applicable to the analysis of proportions, namely linear regression, Poisson regression, beta-binomial regression and Generalized Linear Mixed Models (GLMMs). We report on a simulation study evaluating power and Type I error rate of these models in hypothetical scenarios met by behavioral researchers; in addition, we describe results from the application of these methods to data from real experiments. Our results show that, while GLMMs are powerful instruments for the analysis of clustered binary outcomes, beta-binomial regression can outperform them in a range of scenarios. Linear regression gave results consistent with the nominal level of significance, but was overall less powerful. Poisson regression, instead, mostly led to anticonservative inference. GLMMs and beta-binomial regression are generally more powerful than linear regression; yet linear regression is robust to model misspecification in some conditions, whereas Poisson regression suffers heavily from violations of the assumptions when used to model proportion data. We conclude by providing directions to behavioral scientists dealing with clustered binary data and small sample sizes. Copyright © 2016 Elsevier B.V. All rights reserved.
HYPERSAMP - HYPERGEOMETRIC ATTRIBUTE SAMPLING SYSTEM BASED ON RISK AND FRACTION DEFECTIVE

NASA Technical Reports Server (NTRS)

De, Salvo L. J.

1994-01-01

HYPERSAMP is a demonstration of an attribute sampling system developed to determine the minimum sample size required for any preselected value for consumer's risk and fraction of nonconforming. This statistical method can be used in place of MIL-STD-105E sampling plans when a minimum sample size is desirable, such as when tests are destructive or expensive. HYPERSAMP utilizes the Hypergeometric Distribution and can be used for any fraction nonconforming. The program employs an iterative technique that circumvents the obstacle presented by the factorial of a non-whole number. HYPERSAMP provides the required Hypergeometric sample size for any equivalent real number of nonconformances in the lot or batch under evaluation. Many currently used sampling systems, such as the MIL-STD-105E, utilize the Binomial or the Poisson equations as an estimate of the Hypergeometric when performing inspection by attributes. However, this is primarily because of the difficulty in calculation of the factorials required by the Hypergeometric. Sampling plans based on the Binomial or Poisson equations will result in the maximum sample size possible with the Hypergeometric. The difference in the sample sizes between the Poisson or Binomial and the Hypergeometric can be significant. For example, a lot size of 400 devices with an error rate of 1.0% and a confidence of 99% would require a sample size of 400 (all units would need to be inspected) for the Binomial sampling plan and only 273 for a Hypergeometric sampling plan. The Hypergeometric results in a savings of 127 units, a significant reduction in the required sample size. HYPERSAMP is a demonstration program and is limited to sampling plans with zero defectives in the sample (acceptance number of zero). Since it is only a demonstration program, the sample size determination is limited to sample sizes of 1500 or less. 
The Hypergeometric Attribute Sampling System demonstration code is a spreadsheet program written for IBM PC compatible computers running DOS and Lotus 1-2-3 or Quattro Pro. This program is distributed on a 5.25 inch 360K MS-DOS format diskette, and the program price includes documentation. This statistical method was developed in 1992.
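The worked example in the abstract above (lot of 400, 1.0% nonconforming, 99% confidence, acceptance number zero) can be checked with a short sketch. This is not HYPERSAMP itself: the function names are hypothetical, and the sketch assumes a whole number of nonconforming units (4 of 400), whereas the original program also handles non-integer values.

```python
import math


def hypergeom_zero_accept_n(lot_size, defectives, consumer_risk):
    """Smallest n with P(no defectives in a sample of n) <= consumer_risk."""
    for n in range(1, lot_size + 1):
        # math.comb returns 0 once n exceeds the number of good units,
        # so the acceptance probability drops to 0 in that case.
        p_accept = math.comb(lot_size - defectives, n) / math.comb(lot_size, n)
        if p_accept <= consumer_risk:
            return n
    return lot_size


def binomial_zero_accept_n(lot_size, fraction_defective, consumer_risk):
    """Binomial zero-acceptance sample size, capped at the lot size."""
    n = math.ceil(math.log(consumer_risk) / math.log(1 - fraction_defective))
    return min(n, lot_size)
```

Under these assumptions, `hypergeom_zero_accept_n(400, 4, 0.01)` gives 273, while the binomial formula asks for more units than the lot contains and is therefore capped at 400, matching the figures quoted in the abstract.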

Higher moments of net-proton multiplicity distributions in a heavy-ion event pile-up scenario

NASA Astrophysics Data System (ADS)

Garg, P.; Mishra, D. K.

2017-10-01

High-luminosity modern accelerators, like the Relativistic Heavy Ion Collider (RHIC) at Brookhaven National Laboratory (BNL) and Large Hadron Collider (LHC) at European Organization for Nuclear Research (CERN), inherently have event pile-up scenarios which significantly contribute to physics events as a background. While state-of-the-art tracking algorithms and detector concepts take care of these event pile-up scenarios, several offline analytical techniques are used to remove such events from the physics analysis. It is still difficult to identify the remaining pile-up events in an event sample for physics analysis. Since the fraction of these events is significantly small, it may not be as serious of an issue for other analyses as it would be for an event-by-event analysis. Particularly when the characteristics of the multiplicity distribution are observable, one needs to be very careful. In the present work, we demonstrate how a small fraction of residual pile-up events can change the moments and their ratios of an event-by-event net-proton multiplicity distribution, which are sensitive to the dynamical fluctuations due to the QCD critical point. For this study, we assume that the individual event-by-event proton and antiproton multiplicity distributions follow Poisson, negative binomial, or binomial distributions. We observe a significant effect in cumulants and their ratios of net-proton multiplicity distributions due to pile-up events, particularly at lower energies. It might be crucial to estimate the fraction of pile-up events in the data sample while interpreting the experimental observable for the critical point.
Scoring in genetically modified organism proficiency tests based on log-transformed results.

PubMed

Thompson, Michael; Ellison, Stephen L R; Owen, Linda; Mathieson, Kenneth; Powell, Joanne; Key, Pauline; Wood, Roger; Damant, Andrew P

2006-01-01

The study considers data from 2 UK-based proficiency schemes and includes data from a total of 29 rounds and 43 test materials over a period of 3 years. The results from the 2 schemes are similar and reinforce each other. The amplification process used in quantitative polymerase chain reaction determinations predicts a mixture of normal, binomial, and lognormal distributions dominated by the latter 2. As predicted, the study results consistently follow a positively skewed distribution. Log-transformation prior to calculating z-scores is effective in establishing near-symmetric distributions that are sufficiently close to normal to justify interpretation on the basis of the normal distribution.
Probing the statistics of primordial fluctuations and their evolution

NASA Technical Reports Server (NTRS)

Gaztanaga, Enrique; Yokoyama, Jun'ichi

1993-01-01

The statistical distribution of fluctuations on various scales is analyzed in terms of the counts in cells of smoothed density fields, using volume-limited samples of galaxy redshift catalogs. It is shown that the distribution on large scales, with volume average of the two-point correlation function of the smoothed field less than about 0.05, is consistent with Gaussian. Statistics are shown to agree remarkably well with the negative binomial distribution, which has hierarchical correlations and a Gaussian behavior at large scales. If these observed properties correspond to the matter distribution, they suggest that our universe started with Gaussian fluctuations and evolved while keeping a hierarchical form.
[Distribution of individuals by spontaneous frequencies of lymphocytes with micronuclei. Particularity and consequences].

PubMed

Serebrianyĭ, A M; Akleev, A V; Aleshchenko, A V; Antoshchina, M M; Kudriashova, O V; Riabchenko, N I; Semenova, L P; Pelevina, I I

2011-01-01

Using the micronucleus (MN) assay with cytochalasin B cytokinesis block, the mean frequency of blood lymphocytes with MN was determined in 76 Moscow inhabitants, 35 people from Obninsk and 122 from the Chelyabinsk region. In contrast to the distribution of individuals by spontaneous frequency of cells with aberrations, which was shown to be binomial (Kusnetzov et al., 1980), the distribution of individuals by spontaneous frequency of cells with MN in all three samples can be regarded as log-normal (chi-square test). The distribution of individuals in the pooled sample (Moscow and Obninsk inhabitants) and in the combined sample of all those examined must, with high reliability, be regarded as log-normal (0.70 and 0.86, respectively), and it cannot be regarded as Poisson, binomial or normal. Given that a log-normal distribution of children by spontaneous frequency of lymphocytes with MN was also observed in an examination of 473 children from different kindergartens in Moscow, we conclude that log-normality is a regularity inherent in this type of damage to the lymphocyte genome. By contrast, the distribution of individuals by the frequency of lymphocytes with MN induced by irradiation in vitro must in most cases be regarded as normal. This character of the distribution indicates that the appearance of damage (genomic instability) in a single lymphocyte of an individual increases the probability of damage appearing in other lymphocytes. We propose that damaged stem cells, the lymphocyte progenitors, exchange information with undamaged cells, a process of the bystander-effect type. It can also be supposed that damage is transmitted to daughter cells during stem cell division.
A Classroom Note on the Binomial and Poisson Distributions: Biomedical Examples for Use in Teaching Introductory Statistics

ERIC Educational Resources Information Center

Holland, Bart K.

2006-01-01

A generally-educated individual should have some insight into how decisions are made in the very wide range of fields that employ statistical and probabilistic reasoning. Also, students of introductory probability and statistics are often best motivated by specific applications rather than by theory and mathematical development, because most…
Analysis of overdispersed count data by mixtures of Poisson variables and Poisson processes.

PubMed

Hougaard, P; Lee, M L; Whitmore, G A

1997-12-01

Count data often show overdispersion compared to the Poisson distribution. Overdispersion is typically modeled by a random effect for the mean, based on the gamma distribution, leading to the negative binomial distribution for the count. This paper considers a larger family of mixture distributions, including the inverse Gaussian mixture distribution. It is demonstrated that it gives a significantly better fit for a data set on the frequency of epileptic seizures. The same approach can be used to generate counting processes from Poisson processes, where the rate or the time is random. A random rate corresponds to variation between patients, whereas a random time corresponds to variation within patients.
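The gamma-mixing argument in the abstract above can be checked numerically. The following is a minimal sketch with hypothetical function names and assumed parameter values (mean 4, shape 2): it integrates the Poisson probability over a gamma-distributed rate and compares the result with the closed-form negative binomial probability.

```python
import math


def nb_pmf(y, mu, k):
    """Negative binomial pmf with mean mu and shape (dispersion) k."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)


def gamma_poisson_pmf(y, mu, k, upper=100.0, steps=100_000):
    """Poisson pmf averaged over a gamma-distributed rate (midpoint rule)."""
    theta = mu / k  # gamma scale, so the random rate has mean k * theta = mu
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        log_pois = y * math.log(lam) - lam - math.lgamma(y + 1)
        log_gamma = ((k - 1) * math.log(lam) - lam / theta
                     - math.lgamma(k) - k * math.log(theta))
        total += math.exp(log_pois + log_gamma)
    return total * h
```

The two functions agree to within the integration error, and summing `nb_pmf` over its support recovers a mean of mu and a variance of mu + mu²/k, which exceeds the Poisson variance mu for any finite k.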
Charged particle multiplicities in deep inelastic scattering at HERA

NASA Astrophysics Data System (ADS)

Aid, S.; Anderson, M.; Andreev, V.; Andrieu, B.; Appuhn, R.-D.; Babaev, A.; Bähr, J.; Bán, J.; Ban, Y.; Baranov, P.; Barrelet, E.; Barschke, R.; Bartel, W.; Barth, M.; Bassler, U.; Beck, H. P.; Behrend, H.-J.; Belousov, A.; Berger, Ch.; Bernardi, G.; Bertrand-Coremans, G.; Besançon, M.; Beyer, R.; Biddulph, P.; Bispham, P.; Bizot, J. C.; Blobel, V.; Borras, K.; Botterweck, F.; Boudry, V.; Braemer, A.; Braunschweig, W.; Brisson, V.; Bruel, P.; Bruncko, D.; Brune, C.; Buchholz, R.; Büngener, L.; Bürger, J.; Büsser, F. W.; Buniatian, A.; Burke, S.; Burton, M. J.; Calvet, D.; Campbell, A. J.; Carli, T.; Charlet, M.; Clarke, D.; Clegg, A. B.; Clerbaux, B.; Cocks, S.; Contreras, J. G.; Cormack, C.; Coughlan, J. A.; Courau, A.; Cousinou, M.-C.; Cozzika, G.; Criegee, L.; Cussans, D. G.; Cvach, J.; Dagoret, S.; Dainton, J. B.; Dau, W. D.; Daum, K.; David, M.; Davis, C. L.; Delcourt, B.; de Roeck, A.; de Wolf, E. A.; Dirkmann, M.; Dixon, P.; di Nezza, P.; Dlugosz, W.; Dollfus, C.; Dowell, J. D.; Dreis, H. B.; Droutskoi, A.; Dünger, O.; Duhm, H.; Ebert, J.; Ebert, T. R.; Eckerlin, G.; Efremenko, V.; Egli, S.; Eichler, R.; Eisele, F.; Eisenhandler, E.; Elsen, E.; Erdmann, M.; Erdmann, W.; Evrard, E.; Fahr, A. B.; Favart, L.; Fedotov, A.; Feeken, D.; Felst, R.; Feltesse, J.; Ferencei, J.; Ferrarotto, F.; Flamm, K.; Fleischer, M.; Flieser, M.; Flügge, G.; Fomenko, A.; Fominykh, B.; Formánek, J.; Foster, J. M.; Franke, G.; Fretwurst, E.; Gabathuler, E.; Gabathuler, K.; Gaede, F.; Garvey, J.; Gayler, J.; Gebauer, M.; Genzel, H.; Gerhards, R.; Glazov, A.; Goerlach, U.; Goerlich, L.; Gogitidze, N.; Goldberg, M.; Goldner, D.; Golec-Biernat, K.; Gonzalez-Pineiro, B.; Gorelov, I.; Grab, C.; Grässler, H.; Greenshaw, T.; Griffiths, R. K.; Grindhammer, G.; Gruber, A.; Gruber, C.; Haack, J.; Hadig, T.; Haidt, D.; Hajduk, L.; Hampel, M.; Haynes, W. J.; Heinzelmann, G.; Henderson, R. C. W.; Henschel, H.; Herynek, I.; Hess, M. F.; Hewitt, K.; Hildesheim, W.; Hiller, K. H.; Hilton, C. 
D.; Hladký, J.; Hoeger, K. C.; Höppner, M.; Hoffmann, D.; Holtom, T.; Horisberger, R.; Hudgson, V. L.; Hütte, M.; Ibbotson, M.; Itterbeck, H.; Jacholkowska, A.; Jacobsson, C.; Jaffre, M.; Janoth, J.; Jansen, T.; Jönsson, L.; Johnson, D. P.; Jung, H.; Kalmus, P. I. P.; Kander, M.; Kant, D.; Kaschowitz, R.; Kathage, U.; Katzy, J.; Kaufmann, H. H.; Kaufmann, O.; Kazarian, S.; Kenyon, I. R.; Kermiche, S.; Keuker, C.; Kiesling, C.; Klein, M.; Kleinwort, C.; Knies, G.; Köhler, T.; Köhne, J. H.; Kolanoski, H.; Kole, F.; Kolya, S. D.; Korbel, V.; Korn, M.; Kostka, P.; Kotelnikov, S. K.; Krämerkämper, T.; Krasny, M. W.; Krehbiel, H.; Krücker, D.; Küster, H.; Kuhlen, M.; Kurča, T.; Kurzhöfer, J.; Lacour, D.; Laforge, B.; Lander, R.; Landon, M. P. J.; Lange, W.; Langenegger, U.; Laporte, J.-F.; Lebedev, A.; Lehner, F.; Levonian, S.; Lindström, G.; Lindstroem, M.; Link, J.; Linsel, F.; Lipinski, J.; List, B.; Lobo, G.; Lomas, J. W.; Lopez, G. C.; Lubimov, V.; Lüke, D.; Magnussen, N.; Malinovski, E.; Mani, S.; Maraček, R.; Marage, P.; Marks, J.; Marshall, R.; Martens, J.; Martin, G.; Martin, R.; Martyn, H.-U.; Martyniak, J.; Mavroidis, T.; Maxfield, S. J.; McMahon, S. J.; Mehta, A.; Meier, K.; Meyer, A.; Meyer, A.; Meyer, H.; Meyer, J.; Meyer, P.-O.; Migliori, A.; Mikocki, S.; Milstead, D.; Moeck, J.; Moreau, F.; Morris, J. V.; Mroczko, E.; Müller, D.; Müller, G.; Müller, K.; Müller, M.; Murín, P.; Nagovizin, V.; Nahnhauer, R.; Naroska, B.; Naumann, Th.; Négri, I.; Newman, P. R.; Newton, D.; Nguyen, H. K.; Nicholls, T. C.; Niebergall, F.; Niebuhr, C.; Niedzballa, Ch.; Niggli, H.; Nisius, R.; Nowak, G.; Noyes, G. W.; Nyberg-Werther, M.; Oakden, M.; Oberlack, H.; Olsson, J. E.; Ozerov, D.; Palmen, P.; Panaro, E.; Panitch, A.; Pascaud, C.; Patel, G. D.; Pawletta, H.; Peppel, E.; Perez, E.; Phillips, J. 
P.; Pieuchot, A.; Pitzl, D.; Pope, G.; Prell, S.; Rabbertz, K.; Rädel, G.; Reimer, P.; Reinshagen, S.; Rick, H.; Riech, V.; Riedlberger, J.; Riepenhausen, F.; Riess, S.; Rizvi, E.; Robertson, S. M.; Robmann, P.; Roloff, H. E.; Roosen, R.; Rosenbauer, K.; Rostovtsev, A.; Rouse, F.; Royon, C.; Rüter, K.; Rusakov, S.; Rybicki, K.; Sankey, D. P. C.; Schacht, P.; Schiek, S.; Schleif, S.; Schleper, P.; von Schlippe, W.; Schmidt, D.; Schmidt, G.; Schöning, A.; Schröder, V.; Schuhmann, E.; Schwab, B.; Sefkow, F.; Seidel, M.; Sell, R.; Semenov, A.; Shekelyan, V.; Sheviakov, I.; Shtarkov, L. N.; Siegmon, G.; Siewert, U.; Sirois, Y.; Skillicorn, I. O.; Smirnov, P.; Smith, J. R.; Solochenko, V.; Soloviev, Y.; Specka, A.; Spiekermann, J.; Spielman, S.; Spitzer, H.; Squinabol, F.; Steenbock, M.; Steffen, P.; Steinberg, R.; Steiner, H.; Steinhart, J.; Stella, B.; Stellberger, A.; Stier, J.; Stiewe, J.; Stößlein, U.; Stolze, K.; Straumann, U.; Struczinski, W.; Sutton, J. P.; Tapprogge, S.; Taševský, M.; Tchernyshov, V.; Tchetchelnitski, S.; Theissen, J.; Thiebaux, C.; Thompson, G.; Truöl, P.; Tsipolitis, G.; Turnau, J.; Tutas, J.; Uelkes, P.; Usik, A.; Valkár, S.; Valkárová, A.; Vallée, C.; Vandenplas, D.; van Esch, P.; van Mechelen, P.; Vazdik, Y.; Verrecchia, P.; Villet, G.; Wacker, K.; Wagener, A.; Wagener, M.; Walther, A.; Waugh, B.; Weber, G.; Weber, M.; Wegener, D.; Wegner, A.; Wengler, T.; Werner, M.; West, L. R.; Wilksen, T.; Willard, S.; Winde, M.; Winter, G.-G.; Wittek, C.; Wobisch, M.; Wünsch, E.; Žáček, J.; Zarbock, D.; Zhang, Z.; Zhokin, A.; Zini, P.; Zomer, F.; Zsembery, J.; Zuber, K.; Zurnedden, M.

1996-12-01

Using the H1 detector at HERA, charged particle multiplicity distributions in deep inelastic e⁺p scattering have been measured over a large kinematical region. The evolution with W and Q² of the multiplicity distribution and of the multiplicity moments in pseudorapidity domains of varying size is studied in the current fragmentation region of the hadronic centre-of-mass frame. The results are compared with data from fixed target lepton-nucleon interactions, e⁺e⁻ annihilations and hadron-hadron collisions as well as with expectations from QCD based parton models. Fits to the Negative Binomial and Lognormal distributions are presented.
QNB: differential RNA methylation analysis for count-based small-sample sequencing data with a quad-negative binomial model.

PubMed

Liu, Lian; Zhang, Shao-Wu; Huang, Yufei; Meng, Jia

2017-08-31

As a newly emerged research area, RNA epigenetics has drawn increasing attention recently for the participation of RNA methylation and other modifications in a number of crucial biological processes. Thanks to high-throughput sequencing techniques such as MeRIP-Seq, transcriptome-wide RNA methylation profiles are now available in the form of count-based data, with which it is often of interest to study the dynamics at the epitranscriptomic layer. However, the sample size of an RNA methylation experiment is usually very small due to its cost; additionally, there usually exist a large number of genes whose methylation level cannot be accurately estimated due to their low expression level, making differential RNA methylation analysis a difficult task. We present QNB, a statistical approach for differential RNA methylation analysis with count-based small-sample sequencing data. Compared with previous approaches such as the DRME model, which is based on a statistical test covering the IP samples only with 2 negative binomial distributions, QNB is based on 4 independent negative binomial distributions with their variances and means linked by local regressions, and in this way the input control samples are also properly taken care of. In addition, unlike the DRME approach, which relies only on the input control sample for estimating the background, QNB uses a more robust estimator for gene expression by combining information from both input and IP samples, which could largely improve the testing performance for very lowly expressed genes. QNB showed improved performance on both simulated and real MeRIP-Seq datasets when compared with competing algorithms. The QNB model is also applicable to other datasets related to RNA modifications, including but not limited to RNA bisulfite sequencing, m1A-Seq, Par-CLIP, RIP-Seq, etc.
[Evaluation of estimation of prevalence ratio using bayesian log-binomial regression model].

PubMed

Gao, W L; Lin, H; Liu, X N; Ren, X W; Li, J S; Shen, X P; Zhu, S L

2017-03-10

To evaluate the estimation of the prevalence ratio (PR) using a Bayesian log-binomial regression model and its application, we estimated the PR of medical care-seeking prevalence to caregivers' recognition of risk signs of diarrhea in their infants by using a Bayesian log-binomial regression model in OpenBUGS software. The results showed that caregivers' recognition of their infants' risk signs of diarrhea was significantly associated with a 13% increase in medical care-seeking. Meanwhile, we compared the point and interval estimates of the PR of medical care-seeking prevalence to caregivers' recognition of risk signs of diarrhea, and the convergence of three models (model 1: not adjusting for the covariates; model 2: adjusting for duration of caregivers' education; model 3: adjusting for distance between village and township and child month-age based on model 2), between the Bayesian log-binomial regression model and the conventional log-binomial regression model. The results showed that all three Bayesian log-binomial regression models converged and the estimated PRs were 1.130 (95% CI: 1.005-1.265), 1.128 (95% CI: 1.001-1.264) and 1.132 (95% CI: 1.004-1.267), respectively. Conventional log-binomial regression models 1 and 2 converged and their PRs were 1.130 (95% CI: 1.055-1.206) and 1.126 (95% CI: 1.051-1.203), respectively, but model 3 failed to converge, so the COPY method was used to estimate the PR, which was 1.125 (95% CI: 1.051-1.200). In addition, the point and interval estimates of the PRs from the three Bayesian log-binomial regression models differed slightly from those from the conventional log-binomial regression models, but they showed good consistency in estimating the PR. Therefore, the Bayesian log-binomial regression model can effectively estimate the PR with fewer convergence failures and has more advantages in application compared with the conventional log-binomial regression model.
Football fever: goal distributions and non-Gaussian statistics

NASA Astrophysics Data System (ADS)

Bittner, E.; Nußbaumer, A.; Janke, W.; Weigel, M.

2009-02-01

Analyzing football score data with statistical techniques, we investigate how the not purely random, but highly co-operative nature of the game is reflected in averaged properties such as the probability distributions of scored goals for the home and away teams. As it turns out, especially the tails of the distributions are not well described by the Poissonian or binomial model resulting from the assumption of uncorrelated random events. Instead, a good effective description of the data is provided by less basic distributions such as the negative binomial one or the probability densities of extreme value statistics. To understand this behavior from a microscopic point of view, however, no waiting time problem or extremal process need be invoked. Instead, modifying the Bernoulli random process underlying the Poissonian model to include a simple component of self-affirmation seems to describe the data surprisingly well and allows one to understand the observed deviation from Gaussian statistics. The phenomenological distributions used before can be understood as special cases within this framework. We analyzed historical football score data from many leagues in Europe as well as from international tournaments, including data from all past tournaments of the “FIFA World Cup” series, and found the proposed models to be applicable rather universally. In particular, here we analyze the results of the German women’s premier football league and consider the two separate German men’s premier leagues in the East and West during the cold war times as well as the unified league after 1990 to see how scoring in football and the component of self-affirmation depend on cultural and political circumstances.
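The idea of a Bernoulli process with self-affirmation can be sketched as follows. The abstract does not state the exact update rule, so the additive form used here (scoring probability p0 + kappa × goals so far), the function names, and the parameter values in the usage note are all assumptions made for illustration. The sketch computes the exact goal-count distribution by propagating probabilities step by step rather than by simulation.

```python
import math


def goal_distribution(steps, p0, kappa):
    """Exact goal-count pmf when the per-step scoring probability is
    p0 + kappa * (goals so far); kappa = 0 recovers the binomial model."""
    pmf = [1.0] + [0.0] * steps
    for _ in range(steps):
        new = [0.0] * (steps + 1)
        for n, prob in enumerate(pmf):
            if prob == 0.0:
                continue
            p = min(1.0, p0 + kappa * n)  # assumed additive self-affirmation
            new[n] += prob * (1 - p)
            if n < steps:
                new[n + 1] += prob * p
        pmf = new
    return pmf


def mean_variance(pmf):
    """Mean and variance of a pmf given as a list indexed by count."""
    mean = sum(n * p for n, p in enumerate(pmf))
    var = sum((n - mean) ** 2 * p for n, p in enumerate(pmf))
    return mean, var


def binomial_pmf(n, steps, p):
    """Reference binomial pmf for the kappa = 0 case."""
    return math.comb(steps, n) * p ** n * (1 - p) ** (steps - n)
```

With, say, 90 steps and p0 = 0.015, setting kappa = 0 reproduces the binomial distribution, whose variance is below its mean, while kappa = 0.01 raises the mean and pushes the variance above it, giving the heavier tail that the abstract associates with negative-binomial-like behavior.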
On performance of parametric and distribution-free models for zero-inflated and over-dispersed count responses.

PubMed

Tang, Wan; Lu, Naiji; Chen, Tian; Wang, Wenjuan; Gunzler, Douglas David; Han, Yu; Tu, Xin M

2015-10-30

Zero-inflated Poisson (ZIP) and negative binomial (ZINB) models are widely used to model zero-inflated count responses. These models extend the Poisson and negative binomial (NB) to address excessive zeros in the count response. By adding a degenerate distribution centered at 0 and interpreting it as describing a non-risk group in the population, the ZIP (ZINB) models a two-component population mixture. As in applications of Poisson and NB, the key difference between ZIP and ZINB is the allowance for overdispersion by the ZINB in its NB component in modeling the count response for the at-risk group. Overdispersion arising in practice too often does not follow the NB, and applications of ZINB to such data yield invalid inference. If sources of overdispersion are known, other parametric models may be used to directly model the overdispersion. Such models too are subject to assumed distributions. Further, this approach may not be applicable if information about the sources of overdispersion is unavailable. In this paper, we propose a distribution-free alternative and compare its performance with these popular parametric models as well as a moment-based approach proposed by Yu et al. [Statistics in Medicine 2013; 32: 2390-2405]. Like the generalized estimating equations, the proposed approach requires no elaborate distribution assumptions. Compared with the approach of Yu et al., it is more robust to overdispersed zero-inflated responses. We illustrate our approach with both simulated and real study data. Copyright © 2015 John Wiley & Sons, Ltd.
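The two-component mixture described in the abstract above can be written out in a few lines. This is a minimal sketch of the standard ZIP probability mass function, not the distribution-free method the paper proposes; the function names and the parameter values in the usage note (rate 3, non-risk fraction 0.4) are assumptions for illustration.

```python
import math


def poisson_pmf(y, lam):
    """Poisson pmf for the at-risk group."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))


def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson: a point mass at 0 with weight pi (the
    non-risk group) mixed with a Poisson(lam) count for the at-risk group."""
    structural_zero = pi if y == 0 else 0.0
    return structural_zero + (1 - pi) * poisson_pmf(y, lam)


def zip_mean_variance(lam, pi):
    """Closed-form mean and variance of the ZIP mixture."""
    mean = (1 - pi) * lam
    var = (1 - pi) * lam * (1 + pi * lam)
    return mean, var
```

For a rate of 3 and a non-risk fraction of 0.4, the mixture has mean 1.8 and variance 3.96, so zero inflation alone already makes the variance exceed the mean even though the at-risk component is Poisson.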
Generalized empirical Bayesian methods for discovery of differential data in high-throughput biology.

PubMed

Hardcastle, Thomas J

2016-01-15

High-throughput data are now commonplace in biological research. Rapidly changing technologies and applications mean that novel methods for detecting differential behaviour that account for a 'large P, small n' setting are required at an increasing rate. The development of such methods is, in general, being done on an ad hoc basis, requiring further development cycles and resulting in a lack of standardization between analyses. We present here a generalized method for identifying differential behaviour within high-throughput biological data through empirical Bayesian methods. This approach is based on our baySeq algorithm for identification of differential expression in RNA-seq data based on a negative binomial distribution, and in paired data based on a beta-binomial distribution. Here we show how the same empirical Bayesian approach can be applied to any parametric distribution, removing the need for lengthy development of novel methods for differently distributed data. Comparisons with existing methods developed to address specific problems in high-throughput biological data show that these generic methods can achieve equivalent or better performance. A number of enhancements to the basic algorithm are also presented to increase flexibility and reduce computational costs. The methods are implemented in the R baySeq (v2) package, available on Bioconductor (http://www.bioconductor.org/packages/release/bioc/html/baySeq.html). Contact: tjh48@cam.ac.uk. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Frequency distribution of Echinococcus multilocularis and other helminths of foxes in Kyrgyzstan

PubMed Central

Ziadinov, I.; Deplazes, P.; Mathis, A.; Mutunova, B.; Abdykerimov, K.; Nurgaziev, R.; Torgerson, P.R.

2010-01-01

Echinococcosis is a major emerging zoonosis in central Asia. A study of the helminth fauna of foxes from Naryn Oblast in central Kyrgyzstan was undertaken to investigate the abundance of Echinococcus multilocularis in a district where a high prevalence of this parasite had previously been detected in dogs. A total of 151 foxes (Vulpes vulpes) were investigated in a necropsy study. Of these, 96 (64%) were infected with E. multilocularis, with a mean abundance of 8669 parasites per fox. This indicates that red foxes are a major definitive host of E. multilocularis in this country. This also demonstrates that the abundance and prevalence of E. multilocularis in the natural definitive host are likely to be high in geographical regions where there is a concomitant high prevalence in alternative definitive hosts such as dogs. In addition, Mesocestoides spp., Dipylidium caninum, Taenia spp., Toxocara canis, Toxascaris leonina, Capillaria and Acanthocephala spp. were found in 99 (66%), 50 (33%), 48 (32%), 46 (30%), 9 (6%), 34 (23%) and 2 (1%) of foxes, respectively. The prevalence but not the abundance of E. multilocularis decreased with age. The abundance of Dipylidium caninum also decreased with age. The frequency distribution of E. multilocularis and Mesocestoides spp. followed a zero-inflated negative binomial distribution, whilst all other helminths had a negative binomial distribution. This demonstrates that the frequency distribution of positive counts, and not just the frequency of zeros in the data set, can determine whether a zero-inflated or non-zero-inflated model is more appropriate. This is because the prevalences of E. multilocularis and Mesocestoides spp. were the highest (and hence had the fewest zero counts), yet the parasite distribution nevertheless gave a better fit to the zero-inflated models. PMID:20434845
Assessing historical rate changes in global tsunami occurrence

USGS Publications Warehouse

Geist, E.L.; Parsons, T.

2011-01-01

The global catalogue of tsunami events is examined to determine if transient variations in tsunami rates are consistent with a Poisson process commonly assumed for tsunami hazard assessments. The primary data analyzed are tsunamis with maximum sizes >1m. The record of these tsunamis appears to be complete since approximately 1890. A secondary data set of tsunamis >0.1m is also analyzed that appears to be complete since approximately 1960. Various kernel density estimates used to determine the rate distribution with time indicate a prominent rate change in global tsunamis during the mid-1990s. Less prominent rate changes occur in the early- and mid-20th century. To determine whether these rate fluctuations are anomalous, the distribution of annual event numbers for the tsunami catalogue is compared to Poisson and negative binomial distributions, the latter of which includes the effects of temporal clustering. Compared to a Poisson distribution, the negative binomial distribution model provides a consistent fit to tsunami event numbers for the >1m data set, but the Poisson null hypothesis cannot be falsified for the shorter duration >0.1m data set. Temporal clustering of tsunami sources is also indicated by the distribution of interevent times for both data sets. Tsunami event clusters consist only of two to four events, in contrast to protracted sequences of earthquakes that make up foreshock-main shock-aftershock sequences. From past studies of seismicity, it is likely that there is a physical triggering mechanism responsible for events within the tsunami source 'mini-clusters'. In conclusion, prominent transient rate increases in the occurrence of global tsunamis appear to be caused by temporal grouping of geographically distinct mini-clusters, in addition to the random preferential location of global M >7 earthquakes along offshore fault zones.
Students' Informal Inference about the Binomial Distribution of "Bunny Hops": A Dialogic Perspective

ERIC Educational Resources Information Center

Kazak, Sibel; Fujita, Taro; Wegerif, Rupert

2016-01-01

The study explores the development of 11-year-old students' informal inference about random bunny hops through student talk and use of computer simulation tools. Our aim in this paper is to draw on dialogic theory to explain how students make shifts in perspective, from intuition-based reasoning to more powerful, formal ways of using probabilistic…
A Model Comparison for Count Data with a Positively Skewed Distribution with an Application to the Number of University Mathematics Courses Completed

ERIC Educational Resources Information Center

Liou, Pey-Yan

2009-01-01

The current study examines three regression models for analyzing count data: OLS (ordinary least squares) linear regression, Poisson regression, and negative binomial regression. Simulation results show that the OLS regression model performed better than the others, since it did not produce more false statistically significant relationships than…
A Negative Binomial Regression Model for Accuracy Tests

ERIC Educational Resources Information Center

Hung, Lai-Fa

2012-01-01

Rasch used a Poisson model to analyze errors and speed in reading tests. An important property of the Poisson distribution is that the mean and variance are equal. However, in social science research, it is very common for the variance to be greater than the mean (i.e., the data are overdispersed). This study embeds the Rasch model within an…
Togetherness among Plasmodium falciparum gametocytes: interpretation through simulation and consequences for malaria transmission.

PubMed

Gaillard, F O; Boudin, C; Chau, N P; Robert, V; Pichon, G

2003-11-01

Previous experimental gametocyte infections of Anopheles arabiensis on 3 volunteers naturally infected with Plasmodium falciparum were conducted in Senegal. They showed that gametocyte counts in the mosquitoes are, like macroparasite intakes, heterogeneous (overdispersed). They followed a negative binomial distribution, the overdispersion coefficient seeming constant (k = 3.1). To try to explain this heterogeneity, we used an individual-based model (IBM), simulating the behaviour of gametocytes in the human blood circulation and their ingestion by mosquitoes. The hypothesis was that there exists a clustering of the gametocytes in the capillaries. From a series of simulations, in the case of clustering the following results were obtained: (i) the distribution of the gametocytes ingested by the mosquitoes followed a negative binomial, (ii) the k coefficient significantly increased with the density of circulating gametocytes. To validate this model result, 2 more experiments were conducted in Cameroon. Pooled experiments showed a distinct density dependency of the k-values. The simulation results and the experimental results were thus in agreement and suggested that an aggregation process at the microscopic level might produce the density-dependent overdispersion at the macroscopic level. Simulations also suggested that the clustering of gametocytes might facilitate fertilization of gametes.
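The overdispersion coefficient k mentioned in the abstract above ties the variance of a negative binomial count to its mean through variance = mean + mean²/k. The following minimal sketch shows that relation and its inversion (the method-of-moments estimate of k). The function names are hypothetical, the mean of 10 in the usage note is an assumed value, and this is not necessarily how the authors estimated k.

```python
import math


def nb_variance(mean, k):
    """Variance of a negative binomial count with the given mean and k."""
    return mean + mean ** 2 / k


def moment_k(mean, variance):
    """Method-of-moments estimate of k; infinite when there is no
    overdispersion (variance <= mean), which is the Poisson limit."""
    if variance <= mean:
        return math.inf
    return mean ** 2 / (variance - mean)
```

For example, with k = 3.1 and an assumed mean of 10 gametocytes per mosquito, the variance is roughly 42.3, more than four times the Poisson value; feeding that mean and variance back into `moment_k` recovers k = 3.1. Smaller k means stronger aggregation.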
Void probability as a function of the void's shape and scale-invariant models

NASA Technical Reports Server (NTRS)

Elizalde, E.; Gaztanaga, E.

1991-01-01

The dependence of counts in cells on the shape of the cell for the large scale galaxy distribution is studied. A very concrete prediction can be made concerning the void distribution for scale invariant models. The prediction is tested on a sample of the CfA catalog, and good agreement is found. It is observed that the probability that a cell is occupied is larger for some elongated cells. A phenomenological scale invariant model for the observed distribution of the counts in cells, an extension of the negative binomial distribution, is presented in order to illustrate how this dependence can be quantitatively determined. An original, intuitive derivation of this model is presented.
Modeling the distribution of colonial species to improve estimation of plankton concentration in ballast water

NASA Astrophysics Data System (ADS)

Rajakaruna, Harshana; VandenByllaardt, Julie; Kydd, Jocelyn; Bailey, Sarah

2018-03-01

The International Maritime Organization (IMO) has set limits on allowable plankton concentrations in ballast water discharge to minimize aquatic invasions globally. Previous guidance on ballast water sampling and compliance decision thresholds was based on the assumption that probability distributions of plankton are Poisson when spatially homogenous, or negative binomial when heterogeneous. We propose a hierarchical probability model, which incorporates distributions at the level of particles (i.e., discrete individuals plus colonies per unit volume) and also within particles (i.e., individuals per particle) to estimate the average plankton concentration in ballast water. We examined the performance of the models using data for plankton in the size class ≥ 10 μm and < 50 μm, collected from five different depths of a ballast tank of a commercial ship in three independent surveys. We show that the data fit to the negative binomial and the hierarchical probability models equally well, with both models performing better than the Poisson model at the scale of our sampling. The hierarchical probability model, which accounts for both the individuals and the colonies in a sample, reduces the uncertainty associated with the concentration estimation, and improves the power of rejecting the decision on ship's compliance when a ship does not truly comply with the standard. We show examples of how to test ballast water compliance using the above models.

A comparison of observation-level random effect and Beta-Binomial models for modelling overdispersion in Binomial data in ecology & evolution.

PubMed

Harrison, Xavier A

2015-01-01

Overdispersion is a common feature of models of biological data, but researchers often fail to model the excess variation driving the overdispersion, resulting in biased parameter estimates and standard errors. Quantifying and modeling overdispersion when it is present is therefore critical for robust biological inference. One means to account for overdispersion is to add an observation-level random effect (OLRE) to a model, where each data point receives a unique level of a random effect that can absorb the extra-parametric variation in the data. Although some studies have investigated the utility of OLRE to model overdispersion in Poisson count data, studies doing so for Binomial proportion data are scarce. Here I use a simulation approach to investigate the ability of both OLRE models and Beta-Binomial models to recover unbiased parameter estimates in mixed effects models of Binomial data under various degrees of overdispersion. In addition, as ecologists often fit random intercept terms to models when the random effect sample size is low (<5 levels), I investigate the performance of both model types under a range of random effect sample sizes when overdispersion is present. Simulation results revealed that the efficacy of OLRE depends on the process that generated the overdispersion; OLRE failed to cope with overdispersion generated from a Beta-Binomial mixture model, leading to biased slope and intercept estimates, but performed well for overdispersion generated by adding random noise to the linear predictor. Comparison of parameter estimates from an OLRE model with those from its corresponding Beta-Binomial model readily identified when OLRE were performing poorly due to disagreement between effect sizes, and this strategy should be employed whenever OLRE are used for Binomial data to assess their reliability. Beta-Binomial models performed well across all contexts, but showed a tendency to underestimate effect sizes when modelling non-Beta-Binomial data. 
Finally, both OLRE and Beta-Binomial models performed poorly when models contained <5 levels of the random intercept term, especially for estimating variance components, and this effect appeared independent of total sample size. These results suggest that OLRE are a useful tool for modelling overdispersion in Binomial data, but that they do not perform well in all circumstances and researchers should take care to verify the robustness of parameter estimates of OLRE models.
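One way to see what "overdispersion generated from a Beta-Binomial mixture" means is through the closed-form variance: if the success probability is drawn from Beta(a, b) and n trials follow, the variance is the binomial variance multiplied by 1 + (n − 1)ρ, with ρ = 1/(a + b + 1). The sketch below evaluates that factor. The parameter values and function name are illustrative assumptions and do not reproduce the paper's simulations.

```python
def beta_binomial_moments(n, a, b):
    """Mean, variance and variance-inflation factor of a Beta-Binomial(n, a, b).

    The inflation factor is relative to a plain binomial with p = a / (a + b).
    """
    p = a / (a + b)                    # mean success probability
    rho = 1 / (a + b + 1)              # intra-class correlation
    binomial_var = n * p * (1 - p)     # variance without overdispersion
    inflation = 1 + (n - 1) * rho
    return n * p, binomial_var * inflation, inflation


# Illustrative values: 10 trials per observation, Beta(2, 2) mixing distribution.
mean_y, var_y, inflation = beta_binomial_moments(n=10, a=2.0, b=2.0)
print(f"mean = {mean_y}, variance = {var_y:.2f}, inflation = {inflation:.2f}")
```

With these values the variance is 2.8 times the binomial variance, which is the excess variation that either an OLRE or a Beta-Binomial model has to absorb.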
Void probability as a function of the void's shape and scale-invariant models. [in studies of spacial galactic distribution]

NASA Technical Reports Server (NTRS)

Elizalde, E.; Gaztanaga, E.

1992-01-01

The dependence of counts in cells on the shape of the cell for the large scale galaxy distribution is studied. A very concrete prediction can be made concerning the void distribution for scale invariant models. The prediction is tested on a sample of the CfA catalog, and good agreement is found. It is observed that the probability that a cell is occupied is larger for some elongated cells. A phenomenological scale invariant model for the observed distribution of the counts in cells, an extension of the negative binomial distribution, is presented in order to illustrate how this dependence can be quantitatively determined. An original, intuitive derivation of this model is presented.
Assessment of some important factors affecting the singing-ground survey

USGS Publications Warehouse

Tautin, J.

1982-01-01

A brief history of the procedures used to analyze singing-ground survey data is outlined. Some weaknesses associated with the analytical procedures are discussed, and preliminary results of efforts to improve the procedures are presented. The most significant finding to date is that counts made by new observers need not be omitted when calculating an index of the woodcock population. Also, the distribution of woodcock heard singing, with respect to time after sunset, affirms the appropriateness of recommended starting times for counting woodcock. Woodcock count data fit the negative binomial probability distribution.
M-Bonomial Coefficients and Their Identities

ERIC Educational Resources Information Center

Asiru, Muniru A.

2010-01-01

In this note, we introduce M-bonomial coefficients (or M-bonacci binomial coefficients). These are similar to the binomial and the Fibonomial (or Fibonacci-binomial) coefficients and can be displayed in a triangle similar to Pascal's triangle from which some identities become obvious.
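By analogy with Fibonomial coefficients, such a coefficient can be sketched as the ratio F_n F_{n-1} ... F_{n-k+1} / (F_k ... F_1) over an M-bonacci sequence. The code below is only an illustration under stated assumptions: the seeding convention (F_1 = 1, each later term the sum of the previous M terms, with missing terms counted as 0) and the function names are hypothetical, and the note's own definition may differ. Exact fractions are used because integrality is not assumed for M > 2.

```python
from fractions import Fraction


def m_bonacci(count, m=2):
    """Return F_1..F_count, where each term is the sum of the previous m terms.

    Assumed convention: F_1 = 1 and terms before F_1 count as 0,
    so m=2 gives 1, 1, 2, 3, 5, 8 and m=3 gives 1, 1, 2, 4, 7, 13.
    """
    seq = []
    for _ in range(count):
        seq.append(sum(seq[-m:]) if seq else 1)
    return seq


def m_bonomial(n, k, m=2):
    """Analogue of C(n, k) with each integer j replaced by F_j."""
    if not 0 <= k <= n:
        return Fraction(0)
    f = m_bonacci(n, m)
    result = Fraction(1)
    for i in range(1, k + 1):
        result *= Fraction(f[n - i], f[i - 1])  # F_{n-i+1} / F_i
    return result


# Print the first rows of the Fibonomial triangle (m = 2).
for n in range(7):
    print(" ".join(str(m_bonomial(n, k)) for k in range(n + 1)))
```

For m = 2 the rows are symmetric, like the rows of Pascal's triangle.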
Performance and structure of single-mode bosonic codes

NASA Astrophysics Data System (ADS)

Albert, Victor V.; Noh, Kyungjoo; Duivenvoorden, Kasper; Young, Dylan J.; Brierley, R. T.; Reinhold, Philip; Vuillot, Christophe; Li, Linshu; Shen, Chao; Girvin, S. M.; Terhal, Barbara M.; Jiang, Liang

2018-03-01

The early Gottesman, Kitaev, and Preskill (GKP) proposal for encoding a qubit in an oscillator has recently been followed by cat- and binomial-code proposals. Numerically optimized codes have also been proposed, and we introduce codes of this type here. These codes have yet to be compared using the same error model; we provide such a comparison by determining the entanglement fidelity of all codes with respect to the bosonic pure-loss channel (i.e., photon loss) after the optimal recovery operation. We then compare achievable communication rates of the combined encoding-error-recovery channel by calculating the channel's hashing bound for each code. Cat and binomial codes perform similarly, with binomial codes outperforming cat codes at small loss rates. Despite not being designed to protect against the pure-loss channel, GKP codes significantly outperform all other codes for most values of the loss rate. We show that the performance of GKP and some binomial codes increases monotonically with increasing average photon number of the codes. In order to corroborate our numerical evidence of the cat-binomial-GKP order of performance occurring at small loss rates, we analytically evaluate the quantum error-correction conditions of those codes. For GKP codes, we find an essential singularity in the entanglement fidelity in the limit of vanishing loss rate. In addition to comparing the codes, we draw parallels between binomial codes and discrete-variable systems. First, we characterize one- and two-mode binomial as well as multiqubit permutation-invariant codes in terms of spin-coherent states. Such a characterization allows us to introduce check operators and error-correction procedures for binomial codes. Second, we introduce a generalization of spin-coherent states, extending our characterization to qudit binomial codes and yielding a multiqudit code.
Logit and probit model in toll sensitivity analysis of Solo-Ngawi, Kartasura-Palang Joglo segment based on Willingness to Pay (WTP)

NASA Astrophysics Data System (ADS)

Handayani, Dewi; Cahyaning Putri, Hera; Mahmudah, AMH

2017-12-01

The Solo-Ngawi toll road project is part of the Trans Java toll road mega project initiated by the government and is still under construction. PT Solo Ngawi Jaya (SNJ), the Solo-Ngawi toll management company, needs to determine a toll fare that accords with its business plan. Setting appropriate toll rates will affect regional economic sustainability and reduce traffic congestion. These policy instruments are crucial for achieving environmentally sustainable transport. Therefore, the objective of this research is to determine the toll fare sensitivity of the Solo-Ngawi toll road based on Willingness To Pay (WTP). Primary data were obtained by distributing stated preference questionnaires to four-wheeled vehicle users on the Kartasura-Palang Joglo arterial road segment. The data were then analysed with logit and probit models. The analysis finds that, under the same travel conditions, WTP is more sensitive to fare changes in the binomial logit model than in the probit model. The range of tariff change against values of WTP in the binomial logit model is 20% greater than the range of values in the probit model. On the other hand, the probability results of the binomial logit model and the binary probit model show no significant difference (less than 1%).
A comparison of different ways of including baseline counts in negative binomial models for data from falls prevention trials.

PubMed

Zheng, Han; Kimber, Alan; Goodwin, Victoria A; Pickering, Ruth M

2018-01-01

A common design for a falls prevention trial is to assess falling at baseline, randomize participants into an intervention or control group, and ask them to record the number of falls they experience during a follow-up period of time. This paper addresses how best to include the baseline count in the analysis of the follow-up count of falls in negative binomial (NB) regression. We examine the performance of various approaches in simulated datasets where both counts are generated from a mixed Poisson distribution with shared random subject effect. Including the baseline count after log-transformation as a regressor in NB regression (NB-logged) or as an offset (NB-offset) resulted in greater power than including the untransformed baseline count (NB-unlogged). Cook and Wei's conditional negative binomial (CNB) model replicates the underlying process generating the data. In our motivating dataset, a statistically significant intervention effect resulted from the NB-logged, NB-offset, and CNB models, but not from NB-unlogged, and large, outlying baseline counts were overly influential in NB-unlogged but not in NB-logged. We conclude that there is little to lose by including the log-transformed baseline count in standard NB regression compared to CNB for moderate to larger sized datasets. © 2017 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Robust inference in the negative binomial regression model with an application to falls data.

PubMed

Aeberhard, William H; Cantoni, Eva; Heritier, Stephane

2014-12-01

A popular way to model overdispersed count data, such as the number of falls reported during intervention studies, is by means of the negative binomial (NB) distribution. Classical estimating methods are well known to be sensitive to model misspecifications, taking the form of patients falling much more than expected in such intervention studies where the NB regression model is used. We extend in this article two approaches for building robust M-estimators of the regression parameters in the class of generalized linear models to the NB distribution. The first approach achieves robustness in the response by applying a bounded function on the Pearson residuals arising in the maximum likelihood estimating equations, while the second approach achieves robustness by bounding the unscaled deviance components. For both approaches, we explore different choices for the bounding functions. Through a unified notation, we show how close these approaches may actually be as long as the bounding functions are chosen and tuned appropriately, and provide the asymptotic distributions of the resulting estimators. Moreover, we introduce a robust weighted maximum likelihood estimator for the overdispersion parameter, specific to the NB distribution. Simulations under various settings show that redescending bounding functions yield estimates with smaller biases under contamination while keeping high efficiency at the assumed model, and this for both approaches. We present an application to a recent randomized controlled trial measuring the effectiveness of an exercise program at reducing the number of falls among people suffering from Parkinson's disease to illustrate the diagnostic use of such robust procedures and their need for reliable inference. © 2014, The International Biometric Society.
Dispersion models and sampling of cacao mirid bug Sahlbergella singularis (Hemiptera: Miridae) on Theobroma Cacao in southern Cameroon.

PubMed

Bisseleua, D H B; Vidal, Stefan

2011-02-01

The spatio-temporal distribution of Sahlbergella singularis Haglund, a major pest of cacao trees (Theobroma cacao) (Malvaceae), was studied for 2 yr in traditional cacao forest gardens in the humid forest area of southern Cameroon. The first objective was to analyze the dispersion of this insect on cacao trees. The second objective was to develop sampling plans based on fixed levels of precision for estimating S. singularis populations. The following models were used to analyze the data: Taylor's power law, Iwao's patchiness regression, the Nachman model, and the negative binomial distribution. Our results document that Taylor's power law was a better fit for the data than the Iwao and Nachman models. Taylor's b and Iwao's β were both significantly >1, indicating that S. singularis aggregated on specific trees. This result was further supported by the calculated common k of 1.75444. Iwao's α was significantly <0, indicating that the basic distribution component of S. singularis was the individual insect. Comparison of negative binomial (NBD) and Nachman models indicated that the NBD model was appropriate for studying S. singularis distribution. Optimal sample sizes for fixed precision levels of 0.10, 0.15, and 0.25 were estimated with Taylor's regression coefficients. Required sample sizes increased dramatically with increasing levels of precision. This is the first study on S. singularis dispersion in cacao plantations. Sampling plans, presented here, should be a tool for research on population dynamics and pest management decisions of mirid bugs on cacao. © 2011 Entomological Society of America
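A fixed-precision sample size follows directly from Taylor's power law. If the variance is s² = a·m^b and precision D is defined as the standard error divided by the mean, then n = a·m^(b−2)/D². The sketch below evaluates this for the three precision levels named in the abstract. The coefficients a and b, the mean density and the function name are hypothetical placeholders, not the study's estimates.

```python
def taylor_sample_size(mean_density, a, b, precision):
    """Sample size n such that SE / mean equals `precision`.

    Assumes Taylor's power law, variance = a * mean**b, so that
    n = a * mean**(b - 2) / precision**2. Round up before use in the field.
    """
    if mean_density <= 0 or precision <= 0:
        raise ValueError("mean density and precision must be positive")
    return a * mean_density ** (b - 2) / precision ** 2


# Hypothetical Taylor coefficients and mean density (insects per tree).
a, b, mean_density = 2.0, 1.5, 4.0
sizes = {d: taylor_sample_size(mean_density, a, b, d) for d in (0.25, 0.15, 0.10)}
for d, n in sizes.items():
    print(f"precision {d:.2f}: about {n:.1f} samples")
```

Because n scales with 1/D², tightening the precision from 0.25 to 0.10 multiplies the required sample size by 6.25 whatever the coefficients are.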
Problems on Divisibility of Binomial Coefficients

ERIC Educational Resources Information Center

Osler, Thomas J.; Smoak, James

2004-01-01

Twelve unusual problems involving divisibility of the binomial coefficients are presented in this article. The problems are listed in "The Problems" section. All twelve problems have short solutions which are listed in "The Solutions" section. These problems could be assigned to students in any course in which the binomial theorem and Pascal's…
Application of binomial-edited CPMG to shale characterization

USGS Publications Warehouse

Washburn, Kathryn E.; Birdwell, Justin E.

2014-01-01

Unconventional shale resources may contain a significant amount of hydrogen in organic solids such as kerogen, but it is not possible to directly detect these solids with many NMR systems. Binomial-edited pulse sequences capitalize on magnetization transfer between solids, semi-solids, and liquids to provide an indirect method of detecting solid organic materials in shales. When the organic solids can be directly measured, binomial-editing helps distinguish between different phases. We applied a binomial-edited CPMG pulse sequence to a range of natural and experimentally-altered shale samples. The most substantial signal loss is seen in shales rich in organic solids while fluids associated with inorganic pores seem essentially unaffected. This suggests that binomial-editing is a potential method for determining fluid locations, solid organic content, and kerogen–bitumen discrimination.
Jump-and-return sandwiches: A new family of binomial-like selective inversion sequences with improved performance

NASA Astrophysics Data System (ADS)

Brenner, Tom; Chen, Johnny; Stait-Gardner, Tim; Zheng, Gang; Matsukawa, Shingo; Price, William S.

2018-03-01

A new family of binomial-like inversion sequences, named jump-and-return sandwiches (JRS), has been developed by inserting a binomial-like sequence into a standard jump-and-return sequence, discovered through use of a stochastic Genetic Algorithm optimisation. Compared to currently used binomial-like inversion sequences (e.g., 3-9-19 and W5), the new sequences afford wider inversion bands and narrower non-inversion bands with an equal number of pulses. As an example, two jump-and-return sandwich 10-pulse sequences achieved 95% inversion at offsets corresponding to 9.4% and 10.3% of the non-inversion band spacing, compared to 14.7% for the binomial-like W5 inversion sequence, i.e., they afforded non-inversion bands about two thirds the width of the W5 non-inversion band.
Limits, discovery and cut optimization for a Poisson process with uncertainty in background and signal efficiency: TRolke 2.0

NASA Astrophysics Data System (ADS)

Lundberg, J.; Conrad, J.; Rolke, W.; Lopez, A.

2010-03-01

A C++ class was written for the calculation of frequentist confidence intervals using the profile likelihood method. Seven combinations of Binomial, Gaussian, Poissonian and Binomial uncertainties are implemented. The package provides routines for the calculation of upper and lower limits, sensitivity and related properties. It also supports hypothesis tests which take uncertainties into account. It can be used in compiled C++ code, in Python or interactively via the ROOT analysis framework.
Program summary
Program title: TRolke version 2.0
Catalogue identifier: AEFT_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEFT_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: MIT license
No. of lines in distributed program, including test data, etc.: 3431
No. of bytes in distributed program, including test data, etc.: 21 789
Distribution format: tar.gz
Programming language: ISO C++
Computer: Unix, GNU/Linux, Mac
Operating system: Linux 2.6 (Scientific Linux 4 and 5, Ubuntu 8.10), Darwin 9.0 (Mac-OS X 10.5.8)
RAM: ~20 MB
Classification: 14.13
External routines: ROOT (http://root.cern.ch/drupal/)
Nature of problem: The problem is to calculate a frequentist confidence interval on the parameter of a Poisson process with statistical or systematic uncertainties in signal efficiency or background.
Solution method: Profile likelihood method, Analytical
Running time: <10 seconds per extracted limit.
Application of the Hyper-Poisson Generalized Linear Model for Analyzing Motor Vehicle Crashes.

PubMed

Khazraee, S Hadi; Sáez-Castillo, Antonio Jose; Geedipally, Srinivas Reddy; Lord, Dominique

2015-05-01

The hyper-Poisson distribution can handle both over- and underdispersion, and its generalized linear model formulation allows the dispersion of the distribution to be observation-specific and dependent on model covariates. This study's objective is to examine the potential applicability of a newly proposed generalized linear model framework for the hyper-Poisson distribution in analyzing motor vehicle crash count data. The hyper-Poisson generalized linear model was first fitted to intersection crash data from Toronto, characterized by overdispersion, and then to crash data from railway-highway crossings in Korea, characterized by underdispersion. The results of this study are promising. When fitted to the Toronto data set, the goodness-of-fit measures indicated that the hyper-Poisson model with a variable dispersion parameter provided a statistical fit as good as the traditional negative binomial model. The hyper-Poisson model was also successful in handling the underdispersed data from Korea; the model performed as well as the gamma probability model and the Conway-Maxwell-Poisson model previously developed for the same data set. The advantages of the hyper-Poisson model studied in this article are noteworthy. Unlike the negative binomial model, which has difficulties in handling underdispersed data, the hyper-Poisson model can handle both over- and underdispersed crash data. Although not a major issue for the Conway-Maxwell-Poisson model, the effect of each variable on the expected mean of crashes is easily interpretable in the case of this new model. © 2014 Society for Risk Analysis.
Linear algebra of the permutation invariant Crow-Kimura model of prebiotic evolution.

PubMed

Bratus, Alexander S; Novozhilov, Artem S; Semenov, Yuri S

2014-10-01

A particular case of the famous quasispecies model - the Crow-Kimura model with a permutation invariant fitness landscape - is investigated. Using the fact that the mutation matrix in the case of a permutation invariant fitness landscape has a special tridiagonal form, a change of the basis is suggested such that in the new coordinates a number of analytical results can be obtained. In particular, using the eigenvectors of the mutation matrix as the new basis, we show that the quasispecies distribution approaches a binomial one and give simple estimates for the speed of convergence. Another consequence of the suggested approach is a parametric solution to the system of equations determining the quasispecies. Using this parametric solution we show that our approach leads to exact asymptotic results in some cases, which are not covered by the existing methods. In particular, we are able to present not only the limit behavior of the leading eigenvalue (mean population fitness), but also the exact formulas for the limit quasispecies eigenvector for special cases. For instance, this eigenvector has a geometric distribution in the case of the classical single peaked fitness landscape. On the biological side, we propose a mathematical definition, based on the closeness of the quasispecies to the binomial distribution, which can be used as an operational definition of the notorious error threshold. Using this definition, we suggest two approximate formulas to estimate the critical mutation rate after which the quasispecies delocalization occurs. Copyright © 2014 Elsevier Inc. All rights reserved.
Sequential sampling of ribes populations in the control of white pine blister rust (Cronartium ribicola Fischer) in California

Treesearch

Harold R. Offord

1966-01-01

Sequential sampling based on a negative binomial distribution of ribes populations required less than half the time taken by regular systematic line transect sampling in a comparison test. It gave the same control decision as the regular method in 9 of 13 field trials. A computer program that permits sequential plans to be built readily for other white pine regions is...
Design and analysis of three-arm trials with negative binomially distributed endpoints.

PubMed

Mütze, Tobias; Munk, Axel; Friede, Tim

2016-02-20

A three-arm clinical trial design with an experimental treatment, an active control, and a placebo control, commonly referred to as the gold standard design, enables testing of non-inferiority or superiority of the experimental treatment compared with the active control. In this paper, we propose methods for designing and analyzing three-arm trials with negative binomially distributed endpoints. In particular, we develop a Wald-type test with a restricted maximum-likelihood variance estimator for testing non-inferiority or superiority. For this test, sample size and power formulas as well as optimal sample size allocations will be derived. The performance of the proposed test will be assessed in an extensive simulation study with regard to type I error rate, power, sample size, and sample size allocation. For the purpose of comparison, Wald-type statistics with a sample variance estimator and an unrestricted maximum-likelihood estimator are included in the simulation study. We found that the proposed Wald-type test with a restricted variance estimator performed well across the considered scenarios and is therefore recommended for application in clinical trials. The methods proposed are motivated and illustrated by a recent clinical trial in multiple sclerosis. The R package ThreeArmedTrials, which implements the methods discussed in this paper, is available on CRAN. Copyright © 2015 John Wiley & Sons, Ltd.
Statistical procedures for analyzing mental health services data.

PubMed

Elhai, Jon D; Calhoun, Patrick S; Ford, Julian D

2008-08-15

In mental health services research, analyzing service utilization data often poses serious problems, given the presence of substantially skewed data distributions. This article presents a non-technical introduction to statistical methods specifically designed to handle the complexly distributed datasets that represent mental health service use, including Poisson, negative binomial, zero-inflated, and zero-truncated regression models. A flowchart is provided to assist the investigator in selecting the most appropriate method. Finally, a dataset of mental health service use reported by medical patients is described, and a comparison of results across several different statistical methods is presented. Implications of matching data analytic techniques appropriately with the often complexly distributed datasets of mental health services utilization variables are discussed.
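As a small illustration of one model family named above, a zero-truncated Poisson describes counts that can only be observed when they are at least 1 (for example, visits among people who used a service at all). Its probability mass function is the Poisson probability rescaled by 1 − e^(−λ). The sketch below is a minimal example with an arbitrary rate; the function names are hypothetical and nothing here comes from the article's dataset.

```python
import math


def zero_truncated_poisson_pmf(k, lam):
    """P(Y = k | Y >= 1) for a Poisson(lam) count Y."""
    if k < 1:
        return 0.0
    poisson_pk = math.exp(-lam) * lam ** k / math.factorial(k)
    return poisson_pk / (1 - math.exp(-lam))


def zero_truncated_poisson_mean(lam):
    """Mean of the zero-truncated Poisson: lam / (1 - exp(-lam))."""
    return lam / (1 - math.exp(-lam))


lam = 2.0  # arbitrary illustrative rate
total = sum(zero_truncated_poisson_pmf(k, lam) for k in range(1, 60))
print(f"probabilities sum to {total:.6f}")
print(f"truncated mean = {zero_truncated_poisson_mean(lam):.4f} (untruncated: {lam})")
```

The truncated mean exceeds λ because the zeros have been removed, which is why fitting an ordinary Poisson model to such data would bias the rate upward.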
Estimating safety effects of pavement management factors utilizing Bayesian random effect models.

PubMed

Jiang, Ximiao; Huang, Baoshan; Zaretzki, Russell L; Richards, Stephen; Yan, Xuedong

2013-01-01

Previous studies of pavement management factors that relate to the occurrence of traffic-related crashes are rare. Traditional research has mostly employed summary statistics of bidirectional pavement quality measurements in extended longitudinal road segments over a long time period, which may cause a loss of important information and result in biased parameter estimates. The research presented in this article focuses on crash risk of roadways with overall fair to good pavement quality. Real-time and location-specific data were employed to estimate the effects of pavement management factors on the occurrence of crashes. This research is based on the crash data and corresponding pavement quality data for the Tennessee state route highways from 2004 to 2009. The potential temporal and spatial correlations among observations caused by unobserved factors were considered. Overall 6 models were built accounting for no correlation, temporal correlation only, and both the temporal and spatial correlations. These models included Poisson, negative binomial (NB), one random effect Poisson and negative binomial (OREP, ORENB), and two random effect Poisson and negative binomial (TREP, TRENB) models. The Bayesian method was employed to construct these models. The inference is based on the posterior distribution from the Markov chain Monte Carlo (MCMC) simulation. These models were compared using the deviance information criterion. Analysis of the posterior distribution of parameter coefficients indicates that the pavement management factors indexed by Present Serviceability Index (PSI) and Pavement Distress Index (PDI) had significant impacts on the occurrence of crashes, whereas the variable rutting depth was not significant. Among other factors, lane width, median width, type of terrain, and posted speed limit were significant in affecting crash frequency. 
The findings of this study indicate that a reduction in pavement roughness would reduce the likelihood of traffic-related crashes. Hence, maintaining a low level of pavement roughness is strongly suggested. In addition, the results suggested that the temporal correlation among observations was significant and that the ORENB model outperformed all other models.
Revealing Word Order: Using Serial Position in Binomials to Predict Properties of the Speaker

ERIC Educational Resources Information Center

Iliev, Rumen; Smirnova, Anastasia

2016-01-01

Three studies test the link between word order in binomials and psychological and demographic characteristics of a speaker. While linguists have already suggested that psychological, cultural and societal factors are important in choosing word order in binomials, the vast majority of relevant research was focused on general factors and on broadly…

Univariate and bivariate likelihood-based meta-analysis methods performed comparably when marginal sensitivity and specificity were the targets of inference.

PubMed

Dahabreh, Issa J; Trikalinos, Thomas A; Lau, Joseph; Schmid, Christopher H

2017-03-01

To compare statistical methods for meta-analysis of sensitivity and specificity of medical tests (e.g., diagnostic or screening tests). We constructed a database of PubMed-indexed meta-analyses of test performance from which 2 × 2 tables for each included study could be extracted. We reanalyzed the data using univariate and bivariate random effects models fit with inverse variance and maximum likelihood methods. Analyses were performed using both normal and binomial likelihoods to describe within-study variability. The bivariate model using the binomial likelihood was also fit using a fully Bayesian approach. We use two worked examples-thoracic computerized tomography to detect aortic injury and rapid prescreening of Papanicolaou smears to detect cytological abnormalities-to highlight that different meta-analysis approaches can produce different results. We also present results from reanalysis of 308 meta-analyses of sensitivity and specificity. Models using the normal approximation produced sensitivity and specificity estimates closer to 50% and smaller standard errors compared to models using the binomial likelihood; absolute differences of 5% or greater were observed in 12% and 5% of meta-analyses for sensitivity and specificity, respectively. Results from univariate and bivariate random effects models were similar, regardless of estimation method. Maximum likelihood and Bayesian methods produced almost identical summary estimates under the bivariate model; however, Bayesian analyses indicated greater uncertainty around those estimates. Bivariate models produced imprecise estimates of the between-study correlation of sensitivity and specificity. Differences between methods were larger with increasing proportion of studies that were small or required a continuity correction. The binomial likelihood should be used to model within-study variability. Univariate and bivariate models give similar estimates of the marginal distributions for sensitivity and specificity. 
Bayesian methods fully quantify uncertainty and their ability to incorporate external evidence may be useful for imprecisely estimated parameters. Copyright © 2017 Elsevier Inc. All rights reserved.
The Binomial Model in Fluctuation Analysis of Quantal Neurotransmitter Release

PubMed Central

Quastel, D. M. J.

1997-01-01

The mathematics of the binomial model for quantal neurotransmitter release is considered in general terms, to explore what information might be extractable from statistical aspects of data. For an array of N statistically independent release sites, each with a release probability p, the compound binomial always pertains, with ⟨m⟩ = N⟨p⟩, p′ ≡ 1 - var(m)/⟨m⟩ = ⟨p⟩(1 + cv_p²) and n′ ≡ ⟨m⟩/p′ = N/(1 + cv_p²), where m is the output/stimulus and cv_p² is var(p)/⟨p⟩². Unless n′ is invariant with ambient conditions or stimulation paradigms, the simple binomial (cv_p = 0) is untenable and n′ is neither N nor the number of “active” sites or sites with a quantum available. At each site p = p_o·p_A, where p_o is the output probability if a site is “eligible” or “filled” despite previous quantal discharge, and p_A (eligibility probability) depends at least on the replenishment rate, p_o, and interstimulus time. Assuming stochastic replenishment, a simple algorithm allows calculation of the full statistical composition of outputs for any hypothetical combinations of p_o values and refill rates, for any stimulation paradigm and spontaneous release. A rise in n′ (reduced cv_p) tends to occur whenever p_o varies widely between sites, with a raised stimulation frequency or factors tending to increase p_o values. Unlike ⟨m⟩ and var(m) at equilibrium, output changes early in trains of stimuli, and covariances, potentially provide information about whether changes in reflect change in or in . Formulae are derived for variance and third moments of postsynaptic responses, which depend on the quantal mix in the signals. A new, easily computed function, the area product, gives noise-unbiased variance of a series of synaptic signals and its peristimulus time distribution, which is modified by the unit channel composition of quantal responses and if the signals reflect mixed responses from synapses with different quantal time course. PMID:9017200
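The compound-binomial relations in this abstract can be checked numerically. The following is a minimal Python sketch, assuming a hypothetical list of per-site release probabilities; the function name `apparent_binomial` and its inputs are illustrative and do not come from the paper.

```python
from statistics import fmean, pvariance


def apparent_binomial(site_probs):
    """Return (mean_m, var_m, p_prime, n_prime, cv2) for independent release sites.

    site_probs holds one release probability per site (hypothetical input).
    """
    n_sites = len(site_probs)
    mean_p = fmean(site_probs)
    var_p = pvariance(site_probs, mu=mean_p)       # variance of p across sites
    cv2 = var_p / mean_p ** 2                      # cv_p^2 = var(p) / <p>^2
    mean_m = n_sites * mean_p                      # <m> = N<p>
    var_m = sum(p * (1 - p) for p in site_probs)   # sum of per-site Bernoulli variances
    p_prime = 1 - var_m / mean_m                   # equals <p>(1 + cv_p^2)
    n_prime = mean_m / p_prime                     # equals N / (1 + cv_p^2)
    return mean_m, var_m, p_prime, n_prime, cv2


# Four sites with unequal release probabilities: n' falls below N = 4.
print(apparent_binomial([0.2, 0.4, 0.6, 0.8]))
```

With equal probabilities at every site, cv_p is zero and the sketch returns n′ = N and p′ = p, which is the simple binomial case.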
Some characteristics of repeated sickness absence

PubMed Central

Ferguson, David

1972-01-01

Ferguson, D. (1972). Brit. J. industr. Med., 29, 420-431. Some characteristics of repeated sickness absence. Several studies have shown that frequency of absence attributed to sickness is not distributed randomly but tends to follow the negative binomial distribution, and this has been taken to support the concept of 'proneness' to such absence. Thus, the distribution of sickness absence resembles that of minor injury at work demonstrated over 50 years ago. Because the investigation of proneness to absence does not appear to have been reported by others in Australia, the opportunity was taken, during a wider study of health among telegraphists in a large communications undertaking, to analyse some characteristics of repeated sickness absence. The records of medically certified and uncertified sickness absence of all 769 telegraphists continuously employed in all State capitals over a two-and-a-half-year period were compared with those of 411 clerks and 415 mechanics and, in Sydney, 380 mail sorters and 80 of their supervisors. All telegraphists in Sydney, Melbourne, and Brisbane, and all mail sorters in Sydney, who were available and willing were later medically examined. From their absence pattern repeaters (employees who had had eight or more certified absences in two and a half years) were separated into three types based on a presumptive origin in chance, recurrent disease and symptomatic non-specific disorder. The observed distribution of individual frequency of certified absence over the full two-and-a-half-year period of study followed that expected from the univariate negative binomial, using maximum likelihood estimators, rather than the Poisson distribution, in three of the four occupational groups in Sydney. Limited correlational and bivariate analysis supported the interpretation of proneness ascribed to the univariate fit.
In the two groups studied, frequency of uncertified absence could not be fitted by the negative binomial, although the numbers of such absences in individuals in successive years were relatively highly correlated. All types of repeater were commoner in Sydney than in the other capital city offices, which differed little from each other. Repeaters were more common among those whose absence was attributed to neurosis, alimentary and upper respiratory tract disorder, and injury. Out of more than 90 health, personal, social, and industrial attributes determined at examination, only two (ethanol habit and adverse attitude to pay) showed any statistically significant association when telegraphist repeaters in Sydney were compared with employees who were rarely absent. Though repeating tended to be associated with chronic or recurrent ill health revealed at examination, one quarter of repeaters had little such ill health and one quarter of rarely absent employees had much. It was concluded that, in the population studied, the fitting of the negative binomial to frequency of certified sickness absence could, in the circumstances of the study, reasonably be given an interpretation of proneness. In that population also repeating varies geographically and occupationally, and is poorly associated with disease and other attributes uncovered at examination, with the exception of the ethanol habit. Repeaters are more often neurotic than employees who are rarely absent but also are more often stable double jobbers. The repeater should be identified for what help may be given him, if needed, otherwise it would seem more profitable to attack those features in work design and organization which influence motivation to come to work. Social factors which predispose to repeated absence are less amenable to modification. PMID:4636662
Earthquake number forecasts testing

NASA Astrophysics Data System (ADS)

Kagan, Yan Y.

2017-10-01

We study the distributions of earthquake numbers in two global earthquake catalogues: Global Centroid-Moment Tensor and Preliminary Determinations of Epicenters. The properties of these distributions are especially required to develop the number test for our forecasts of future seismic activity rate, tested by the Collaboratory for Study of Earthquake Predictability (CSEP). A common assumption, as used in the CSEP tests, is that the numbers are described by the Poisson distribution. It is clear, however, that the Poisson assumption for the earthquake number distribution is incorrect, especially for the catalogues with a lower magnitude threshold. In contrast to the one-parameter Poisson distribution so widely used to describe earthquake occurrences, the negative-binomial distribution (NBD) has two parameters. The second parameter can be used to characterize the clustering or overdispersion of a process. We also introduce and study a more complex three-parameter beta negative-binomial distribution. We investigate the dependence of parameters for both Poisson and NBD distributions on the catalogue magnitude threshold and on temporal subdivision of catalogue duration. First, we study whether the Poisson law can be statistically rejected for various catalogue subdivisions. We find that for most cases of interest, the Poisson distribution can be shown to be rejected statistically at a high significance level in favour of the NBD. Thereafter, we investigate whether these distributions fit the observed distributions of seismicity. For this purpose, we study upper statistical moments of earthquake numbers (skewness and kurtosis) and compare them to the theoretical values for both distributions. Empirical values for the skewness and the kurtosis increase for the smaller magnitude threshold and increase with even greater intensity for small temporal subdivision of catalogues. 
The Poisson distribution for large rate values approaches the Gaussian law, therefore its skewness and kurtosis both tend to zero for large earthquake rates: for the Gaussian law, these values are identically zero. A calculation of the NBD skewness and kurtosis levels based on the values of the first two statistical moments of the distribution, shows rapid increase of these upper moments levels. However, the observed catalogue values of skewness and kurtosis are rising even faster. This means that for small time intervals, the earthquake number distribution is even more heavy-tailed than the NBD predicts. Therefore for small time intervals, we propose using empirical number distributions appropriately smoothed for testing forecasted earthquake numbers.
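The comparison of upper moments described in this abstract can be illustrated with standard closed-form expressions. This is a minimal Python sketch, assuming the NBD is matched to a given mean and variance by the method of moments; the function names and the example values (mean 10, variance 30) are hypothetical and are not taken from the catalogues.

```python
import math


def poisson_shape(mean):
    """Skewness and excess kurtosis of a Poisson distribution with this mean."""
    return 1 / math.sqrt(mean), 1 / mean


def nbd_shape(mean, var):
    """Skewness and excess kurtosis of an NBD matched to the first two moments."""
    if var <= mean:
        raise ValueError("the NBD needs var > mean (overdispersion)")
    r = mean ** 2 / (var - mean)   # second (clustering) parameter
    p = mean / var                 # success probability in the r/p parameterization
    skew = (2 - p) / math.sqrt(r * (1 - p))
    excess_kurtosis = 6 / r + p ** 2 / (r * (1 - p))
    return skew, excess_kurtosis


# Same mean, but the overdispersed NBD keeps much larger upper moments.
print(poisson_shape(10.0))
print(nbd_shape(10.0, 30.0))
```

Both Poisson values shrink toward zero as the mean grows, matching the Gaussian limit mentioned in the abstract, while the NBD values depend on how strongly the variance exceeds the mean.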
Modeling species-abundance relationships in multi-species collections

USGS Publications Warehouse

Peng, S.; Yin, Z.; Ren, H.; Guo, Q.

2003-01-01

The species-abundance relationship is one of the most fundamental aspects of community ecology. Since Motomura first developed the geometric series model to describe community structure, ecologists have developed many other models to fit the species-abundance data in communities. These models can be classified into empirical and theoretical ones, including (1) statistical models, i.e., negative binomial distribution (and its extension), log-series distribution (and its extension), geometric distribution, lognormal distribution, Poisson-lognormal distribution, (2) niche models, i.e., geometric series, broken stick, overlapping niche, particulate niche, random assortment, dominance pre-emption, dominance decay, random fraction, weighted random fraction, composite niche, Zipf or Zipf-Mandelbrot model, and (3) dynamic models describing community dynamics and the restrictive function of the environment on the community. These models have different characteristics and fit species-abundance data in various communities or collections. Among them, the log-series distribution, lognormal distribution, geometric series, and broken stick model have been most widely used.
Finding consistent patterns: A nonparametric approach for identifying differential expression in RNA-Seq data

PubMed Central

Li, Jun; Tibshirani, Robert

2015-01-01

We discuss the identification of features that are associated with an outcome in RNA-Sequencing (RNA-Seq) and other sequencing-based comparative genomic experiments. RNA-Seq data takes the form of counts, so models based on the normal distribution are generally unsuitable. The problem is especially challenging because different sequencing experiments may generate quite different total numbers of reads, or ‘sequencing depths’. Existing methods for this problem are based on Poisson or negative binomial models: they are useful but can be heavily influenced by ‘outliers’ in the data. We introduce a simple, nonparametric method with resampling to account for the different sequencing depths. The new method is more robust than parametric methods. It can be applied to data with quantitative, survival, two-class or multiple-class outcomes. We compare our proposed method to Poisson and negative binomial-based methods in simulated and real data sets, and find that our method discovers more consistent patterns than competing methods. PMID:22127579
Modeling abundance using multinomial N-mixture models

USGS Publications Warehouse

Royle, Andy

2016-01-01

Multinomial N-mixture models are a generalization of the binomial N-mixture models described in Chapter 6 to allow for more complex and informative sampling protocols beyond simple counts. Many commonly used protocols such as multiple observer sampling, removal sampling, and capture-recapture produce a multivariate count frequency that has a multinomial distribution and for which multinomial N-mixture models can be developed. Such protocols typically result in more precise estimates than binomial mixture models because they provide direct information about parameters of the observation process. We demonstrate the analysis of these models in BUGS using several distinct formulations that afford great flexibility in the types of models that can be developed, and we demonstrate likelihood analysis using the unmarked package. Spatially stratified capture-recapture models are one class of models that fall into the multinomial N-mixture framework, and we discuss analysis of stratified versions of classical models such as model Mb, Mh and other classes of models that are only possible to describe within the multinomial N-mixture framework.
Dynamic equilibrium of reconstituting hematopoietic stem cell populations.

PubMed

O'Quigley, John

2010-12-01

Clonal dominance in hematopoietic stem cell populations is an important question of interest but not one we can directly answer. Any estimates are based on indirect measurement. For marked populations, we can equate empirical and theoretical moments for binomial sampling; in particular, we can use the well-known formula for the sampling variation of a binomial proportion. The empirical variance itself cannot always be reliably estimated and some caution is needed. We describe the difficulties here and identify ready solutions which only require appropriate use of variance-stabilizing transformations. From these we obtain estimators for the steady state, or dynamic equilibrium, of the number of hematopoietic stem cells involved in repopulating the marrow. The calculations themselves are not too involved. We give the distribution theory for the estimator as well as simple approximations for practical application. As an illustration, we rework data recently gathered to address the question of whether reconstitution of marrow grafts in the clinical setting might be considered to be oligoclonal.
Jump-and-return sandwiches: A new family of binomial-like selective inversion sequences with improved performance.

PubMed

Brenner, Tom; Chen, Johnny; Stait-Gardner, Tim; Zheng, Gang; Matsukawa, Shingo; Price, William S

2018-03-01

A new family of binomial-like inversion sequences, named jump-and-return sandwiches (JRS), has been developed by inserting a binomial-like sequence into a standard jump-and-return sequence, discovered through use of a stochastic Genetic Algorithm optimisation. Compared to currently used binomial-like inversion sequences (e.g., 3-9-19 and W5), the new sequences afford wider inversion bands and narrower non-inversion bands with an equal number of pulses. As an example, two jump-and-return sandwich 10-pulse sequences achieved 95% inversion at offsets corresponding to 9.4% and 10.3% of the non-inversion band spacing, compared to 14.7% for the binomial-like W5 inversion sequence, i.e., they afforded non-inversion bands about two thirds the width of the W5 non-inversion band. Copyright © 2018 Elsevier Inc. All rights reserved.
Adjusted Wald Confidence Interval for a Difference of Binomial Proportions Based on Paired Data

ERIC Educational Resources Information Center

Bonett, Douglas G.; Price, Robert M.

2012-01-01

Adjusted Wald intervals for binomial proportions in one-sample and two-sample designs have been shown to perform about as well as the best available methods. The adjusted Wald intervals are easy to compute and have been incorporated into introductory statistics courses. An adjusted Wald interval for paired binomial proportions is proposed here and…
Censored Hurdle Negative Binomial Regression (Case Study: Neonatorum Tetanus Case in Indonesia)

NASA Astrophysics Data System (ADS)

Yuli Rusdiana, Riza; Zain, Ismaini; Wulan Purnami, Santi

2017-06-01

Hurdle negative binomial regression is a method for discrete dependent variables with excess zeros and under- or overdispersion. It uses a two-part approach. The first part, the zero hurdle model, models the zero elements of the dependent variable; the second part, the truncated negative binomial model, models the nonzero elements (positive integers). The discrete dependent variable in such cases is censored for some values. The type of censoring studied in this research is right censoring. This study aims to obtain the parameter estimator of hurdle negative binomial regression for a right-censored dependent variable. Parameters are estimated by the maximum likelihood estimator (MLE). Hurdle negative binomial regression for a right-censored dependent variable is applied to the number of neonatorum tetanus cases in Indonesia. The data are count data that contain zero values in some observations and a variety of other values. This study also aims to obtain the parameter estimator and test statistic of the censored hurdle negative binomial model. Based on the regression results, the factors that influence neonatorum tetanus cases in Indonesia are the percentage of baby health care coverage and neonatal visits.
Inhomogeneity of the density of Parascaris spp. eggs in faeces of individual foals and the use of hypothesis testing for treatment decision making.

PubMed

Wilkes, E J A; Cowling, A; Woodgate, R G; Hughes, K J

2016-10-15

Faecal egg counts (FEC) are used widely for monitoring of parasite infection in animals, treatment decision-making and estimation of anthelmintic efficacy. When a single count or sample mean is used as a point estimate of the expectation of the egg distribution over some time interval, the variability in the egg density is not accounted for. Although variability, including quantifying sources, of egg count data has been described, the spatiotemporal distribution of nematode eggs in faeces is not well understood. We believe that statistical inference about the mean egg count for treatment decision-making has not been used previously. The aim of this study was to examine the density of Parascaris eggs in solution and faeces and to describe the use of hypothesis testing for decision-making. Faeces from two foals with Parascaris burdens were mixed with magnesium sulphate solution and 30 McMaster chambers were examined to determine the egg distribution in a well-mixed solution. To examine the distribution of eggs in faeces from an individual animal, three faecal piles from a foal with a known Parascaris burden were obtained, from which 81 counts were performed. A single faecal sample was also collected daily from 20 foals on three consecutive days and a FEC was performed on three separate portions of each sample. As appropriate, Poisson or negative binomial confidence intervals for the distribution mean were calculated. Parascaris eggs in a well-mixed solution conformed to a homogeneous Poisson process, while the egg density in faeces was not homogeneous, but aggregated. This study provides an extension from homogeneous to inhomogeneous Poisson processes, leading to an understanding of why Poisson and negative binomial distributions correspondingly provide a good fit for egg count data. The application of one-sided hypothesis tests for decision-making is presented. Copyright © 2016 Elsevier B.V. All rights reserved.
Node degree distribution in spanning trees

NASA Astrophysics Data System (ADS)

Pozrikidis, C.

2016-03-01

A method is presented for computing the number of spanning trees involving one link or a specified group of links, and excluding another link or a specified group of links, in a network described by a simple graph in terms of derivatives of the spanning-tree generating function defined with respect to the eigenvalues of the Kirchhoff (weighted Laplacian) matrix. The method is applied to deduce the node degree distribution in a complete or randomized set of spanning trees of an arbitrary network. An important feature of the proposed method is that the explicit construction of spanning trees is not required. It is shown that the node degree distribution in the spanning trees of the complete network is described by the binomial distribution. Numerical results are presented for the node degree distribution in square, triangular, and honeycomb lattices.
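The binomial result for the complete network can be checked on a small case by brute force. This minimal Python sketch does not use the paper's generating-function method; it assumes the classical Prüfer-sequence correspondence instead, in which every labelled spanning tree of the complete graph on n nodes maps to one sequence of length n - 2 and a node's degree is one plus its number of occurrences. The function names are hypothetical.

```python
from collections import Counter
from itertools import product
from math import comb


def degree_distribution_complete_graph(n, node=0):
    """Exact degree distribution of `node` over all spanning trees of K_n."""
    counts = Counter()
    for seq in product(range(n), repeat=n - 2):   # one Pruefer sequence per tree
        counts[1 + seq.count(node)] += 1          # degree = 1 + occurrences
    total = n ** (n - 2)                          # Cayley's formula
    return {d: c / total for d, c in sorted(counts.items())}


def shifted_binomial(n):
    """P(degree = d) when degree - 1 follows Binomial(n - 2, 1/n)."""
    return {
        d: comb(n - 2, d - 1) * (1 / n) ** (d - 1) * (1 - 1 / n) ** (n - 1 - d)
        for d in range(1, n)
    }


print(degree_distribution_complete_graph(5))
print(shifted_binomial(5))
```

Enumeration grows as n^(n-2), so this check is only practical for small n; avoiding such explicit construction is the point of the method described in the abstract.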
CROSSER - CUMULATIVE BINOMIAL PROGRAMS

NASA Technical Reports Server (NTRS)

Bowerman, P. N.

1994-01-01

The cumulative binomial program, CROSSER, is one of a set of three programs which calculate cumulative binomial probability distributions for arbitrary inputs. The three programs, CROSSER, CUMBIN (NPO-17555), and NEWTONP (NPO-17556), can be used independently of one another. CROSSER can be used by statisticians and users of statistical procedures, test planners, designers, and numerical analysts. The program has been used for reliability/availability calculations. CROSSER calculates the point at which the reliability of a k-out-of-n system equals the common reliability of the n components. It is designed to work well with all integer values 0 < k <= n. To run the program, the user simply runs the executable version and inputs the information requested by the program. The program is not designed to weed out incorrect inputs, so the user must take care to make sure the inputs are correct. Once all input has been entered, the program calculates and lists the result. It also lists the number of iterations of Newton's method required to calculate the answer within the given error. The CROSSER program is written in C. It was developed on an IBM AT with a numeric co-processor using Microsoft C 5.0. Because the source code is written using standard C structures and functions, it should compile correctly with most C compilers. The program format is interactive. It has been implemented under DOS 3.2 and has a memory requirement of 26K. CROSSER was developed in 1988.
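The calculation CROSSER performs can be sketched as follows. This is a minimal Python illustration, not the original C program: the function names, starting guess, tolerance, and the restriction to 1 < k < n (where the crossing point lies strictly between 0 and 1) are all assumptions of this sketch.

```python
from math import comb


def k_out_of_n_reliability(k, n, p):
    """Probability that at least k of n independent components work."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k, n + 1))


def crossing_point(k, n, tol=1e-12, max_iter=100):
    """Newton's method for R(p) = p; returns (p, number of iterations)."""
    if not 1 < k < n:
        raise ValueError("this sketch only handles 1 < k < n")
    p = (k - 1) / (n - 1)   # start where R(p) is steepest (assumed choice)
    for iteration in range(1, max_iter + 1):
        f = k_out_of_n_reliability(k, n, p) - p
        # d/dp of R(p) is k * C(n, k) * p^(k-1) * (1-p)^(n-k); subtract 1 for f'.
        slope = k * comb(n, k) * p ** (k - 1) * (1 - p) ** (n - k) - 1
        step = f / slope
        p -= step
        if abs(step) < tol:
            return p, iteration
    raise RuntimeError("Newton iteration did not converge")


print(crossing_point(2, 4))
```

Like the program described above, the sketch reports the iteration count alongside the result and does little input validation beyond the range check on k.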
Ecological effects of the invasive giant Madagascar day gecko on endemic Mauritian geckos: applications of binomial-mixture and species distribution models.

PubMed

Buckland, Steeves; Cole, Nik C; Aguirre-Gutiérrez, Jesús; Gallagher, Laura E; Henshaw, Sion M; Besnard, Aurélien; Tucker, Rachel M; Bachraz, Vishnu; Ruhomaun, Kevin; Harris, Stephen

2014-01-01

The invasion of the giant Madagascar day gecko Phelsuma grandis has increased the threats to the four endemic Mauritian day geckos (Phelsuma spp.) that have survived on mainland Mauritius. We had two main aims: (i) to predict the spatial distribution and overlap of P. grandis and the endemic geckos at a landscape level; and (ii) to investigate the effects of P. grandis on the abundance and risks of extinction of the endemic geckos at a local scale. An ensemble forecasting approach was used to predict the spatial distribution and overlap of P. grandis and the endemic geckos. We used hierarchical binomial mixture models and repeated visual estimate surveys to calculate the abundance of the endemic geckos in sites with and without P. grandis. The predicted range of each species varied from 85 km² to 376 km². Sixty percent of the predicted range of P. grandis overlapped with the combined predicted ranges of the four endemic geckos; 15% of the combined predicted ranges of the four endemic geckos overlapped with P. grandis. Levins' niche breadth varied from 0.140 to 0.652 between P. grandis and the four endemic geckos. The abundance of endemic geckos was 89% lower in sites with P. grandis compared to sites without P. grandis, and the endemic geckos had been extirpated at four of ten sites we surveyed with P. grandis. Species Distribution Modelling, together with the breadth metrics, predicted that P. grandis can partly share the equivalent niche with endemic species and survive in a range of environmental conditions. We provide strong evidence that smaller endemic geckos are unlikely to survive in sympatry with P. grandis. This is a cause of concern in both Mauritius and other countries with endemic species of Phelsuma.
Demonstration of fundamental statistics by studying timing of electronics signals in a physics-based laboratory

NASA Astrophysics Data System (ADS)

Beach, Shaun E.; Semkow, Thomas M.; Remling, David J.; Bradt, Clayton J.

2017-07-01

We have developed accessible methods to demonstrate fundamental statistics in several phenomena, in the context of teaching electronic signal processing in a physics-based college-level curriculum. A relationship between the exponential time-interval distribution and Poisson counting distribution for a Markov process with constant rate is derived in a novel way and demonstrated using nuclear counting. Negative binomial statistics is demonstrated as a model for overdispersion and justified by the effect of electronic noise in nuclear counting. The statistics of digital packets on a computer network are shown to be compatible with the fractal-point stochastic process leading to a power-law as well as generalized inverse Gaussian density distributions of time intervals between packets.
Identifiability in N-mixture models: a large-scale screening test with bird data.

PubMed

Kéry, Marc

2018-02-01

Binomial N-mixture models have proven very useful in ecology, conservation, and monitoring: they allow estimation and modeling of abundance separately from detection probability using simple counts. Recently, doubts about parameter identifiability have been voiced. I conducted a large-scale screening test with 137 bird data sets from 2,037 sites. I found virtually no identifiability problems for Poisson and zero-inflated Poisson (ZIP) binomial N-mixture models, but negative-binomial (NB) models had problems in 25% of all data sets. The corresponding multinomial N-mixture models had no problems. Parameter estimates under Poisson and ZIP binomial and multinomial N-mixture models were extremely similar. Identifiability problems became a little more frequent with smaller sample sizes (267 and 50 sites), but were unaffected by whether the models did or did not include covariates. Hence, binomial N-mixture model parameters with Poisson and ZIP mixtures typically appeared identifiable. In contrast, NB mixtures were often unidentifiable, which is worrying since these were often selected by Akaike's information criterion. Identifiability of binomial N-mixture models should always be checked. If problems are found, simpler models, integrated models that combine different observation models or the use of external information via informative priors or penalized likelihoods, may help. © 2017 by the Ecological Society of America.
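For readers unfamiliar with the model structure, the likelihood of a binomial N-mixture model with a Poisson mixture can be sketched for a single site. This minimal Python example assumes repeated counts at one site, constant abundance rate `lam` and detection probability `p`, and a finite upper summation bound `n_max`; these names are hypothetical and the sketch is not the code used in the study.

```python
from math import comb, exp, lgamma, log


def poisson_pmf(n, lam):
    """Poisson probability of abundance n, computed on the log scale."""
    return exp(n * log(lam) - lam - lgamma(n + 1))


def site_likelihood(counts, lam, p, n_max=200):
    """Marginal likelihood of repeated counts at one site.

    Latent abundance N ~ Poisson(lam); each count y ~ Binomial(N, p).
    N is summed out from max(counts) up to n_max (a truncation assumption).
    """
    total = 0.0
    for n in range(max(counts), n_max + 1):
        detection = 1.0
        for y in counts:
            detection *= comb(n, y) * p ** y * (1 - p) ** (n - y)
        total += detection * poisson_pmf(n, lam)
    return total


# Three repeated counts at one site, evaluated at trial parameter values.
print(site_likelihood([3, 2, 4], lam=5.0, p=0.4))
```

With a single visit the counts carry information only about the product of `lam` and `p`, so the two parameters cannot be separated; repeated counts are what allow abundance and detection to be estimated separately in this model class.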
Variability in results from negative binomial models for Lyme disease measured at different spatial scales.

PubMed

Tran, Phoebe; Waller, Lance

2015-01-01

Lyme disease has been the subject of many studies due to increasing incidence rates year after year and the severe complications that can arise in later stages of the disease. Negative binomial models have been used to model Lyme disease in the past with some success. However, there has been little focus on the reliability and consistency of these models when they are used to study Lyme disease at multiple spatial scales. This study seeks to explore how sensitive/consistent negative binomial models are when they are used to study Lyme disease at different spatial scales (at the regional and sub-regional levels). The study area includes the thirteen states in the Northeastern United States with the highest Lyme disease incidence during the 2002-2006 period. Lyme disease incidence at county level for the period of 2002-2006 was linked with several previously identified key landscape and climatic variables in a negative binomial regression model for the Northeastern region and two smaller sub-regions (the New England sub-region and the Mid-Atlantic sub-region). This study found that negative binomial models, indeed, were sensitive/inconsistent when used at different spatial scales. We discuss various plausible explanations for such behavior of negative binomial models. Further investigation of the inconsistency and sensitivity of negative binomial models when used at different spatial scales is important for not only future Lyme disease studies and Lyme disease risk assessment/management but any study that requires use of this model type in a spatial context. Copyright © 2014 Elsevier Inc. All rights reserved.
Analysis of dengue fever risk using geostatistics model in Bone Regency

NASA Astrophysics Data System (ADS)

Amran, Stang, Mallongi, Anwar

2017-03-01

This research aims to analyze dengue fever risk in Bone Regency using a geostatistical model. Risk levels of dengue fever are represented by the parameter of a binomial distribution. The effects of temperature, rainfall, elevation, and larvae abundance are investigated through the geostatistical model. A Bayesian hierarchical method is used for estimation. Using dengue fever data from eleven locations, this research shows that temperature and rainfall have a significant effect on dengue fever risk in Bone Regency.
A Lower Bound to the Probability of Choosing the Optimal Passing Score for a Mastery Test When There is an External Criterion [and] Estimating the Parameters of the Beta-Binomial Distribution.

ERIC Educational Resources Information Center

Wilcox, Rand R.

A mastery test is frequently described as follows: an examinee responds to n dichotomously scored test items. Depending upon the examinee's observed (number correct) score, a mastery decision is made and the examinee is advanced to the next level of instruction. Otherwise, a nonmastery decision is made and the examinee is given remedial work. This…

Measurement error and outcome distributions: Methodological issues in regression analyses of behavioral coding data.

PubMed

Holsclaw, Tracy; Hallgren, Kevin A; Steyvers, Mark; Smyth, Padhraic; Atkins, David C

2015-12-01

Behavioral coding is increasingly used for studying mechanisms of change in psychosocial treatments for substance use disorders (SUDs). However, behavioral coding data typically include features that can be problematic in regression analyses, including measurement error in independent variables, non-normal distributions of count outcome variables, and conflation of predictor and outcome variables with third variables, such as session length. Methodological research in econometrics has shown that these issues can lead to biased parameter estimates, inaccurate standard errors, and increased Type I and Type II error rates, yet these statistical issues are not widely known within SUD treatment research, or more generally, within psychotherapy coding research. Using minimally technical language intended for a broad audience of SUD treatment researchers, the present paper illustrates the nature in which these data issues are problematic. We draw on real-world data and simulation-based examples to illustrate how these data features can bias estimation of parameters and interpretation of models. A weighted negative binomial regression is introduced as an alternative to ordinary linear regression that appropriately addresses the data characteristics common to SUD treatment behavioral coding data. We conclude by demonstrating how to use and interpret these models with data from a study of motivational interviewing. SPSS and R syntax for weighted negative binomial regression models is included in online supplemental materials. (c) 2016 APA, all rights reserved.
Measurement error and outcome distributions: Methodological issues in regression analyses of behavioral coding data

PubMed Central

Holsclaw, Tracy; Hallgren, Kevin A.; Steyvers, Mark; Smyth, Padhraic; Atkins, David C.

2015-01-01

Behavioral coding is increasingly used for studying mechanisms of change in psychosocial treatments for substance use disorders (SUDs). However, behavioral coding data typically include features that can be problematic in regression analyses, including measurement error in independent variables, non-normal distributions of count outcome variables, and conflation of predictor and outcome variables with third variables, such as session length. Methodological research in econometrics has shown that these issues can lead to biased parameter estimates, inaccurate standard errors, and increased type-I and type-II error rates, yet these statistical issues are not widely known within SUD treatment research, or more generally, within psychotherapy coding research. Using minimally-technical language intended for a broad audience of SUD treatment researchers, the present paper illustrates the nature in which these data issues are problematic. We draw on real-world data and simulation-based examples to illustrate how these data features can bias estimation of parameters and interpretation of models. A weighted negative binomial regression is introduced as an alternative to ordinary linear regression that appropriately addresses the data characteristics common to SUD treatment behavioral coding data. We conclude by demonstrating how to use and interpret these models with data from a study of motivational interviewing. SPSS and R syntax for weighted negative binomial regression models is included in supplementary materials. PMID:26098126
[Spatial epidemiological study on malaria epidemics in Hainan province].

PubMed

Wen, Liang; Shi, Run-He; Fang, Li-Qun; Xu, De-Zhong; Li, Cheng-Yi; Wang, Yong; Yuan, Zheng-Quan; Zhang, Hui

2008-06-01

The aims were to better understand the spatial distribution of malaria epidemics in Hainan province, to explore the relationship between malaria epidemics and environmental factors, and to develop a prediction model for malaria epidemics. Data on malaria and meteorological factors were collected in all 19 counties in Hainan province from May to October 2000, and the proportions of land use types in these counties during this period were extracted from a digital land use map of Hainan province. Land surface temperatures (LST) were extracted from MODIS images, and elevations of these counties were extracted from a DEM of Hainan province. Correlation coefficients between malaria incidence and these environmental factors were then calculated with SPSS 13.0, and negative binomial regression analysis was done using SAS 9.0. The incidence of malaria showed (1) positive correlations with elevation and with the proportions of forest land and grassland area; (2) negative correlations with the proportions of cultivated area, urban and rural residential area, and industrial enterprise area, and with LST; (3) no correlation with meteorological factors, proportion of water area, or unused land area. The prediction model of malaria derived from the negative binomial regression analysis was: I (monthly, unit: 1/1,000,000) = exp(-1.672 - 0.399 × LST). The spatial distribution of malaria epidemics was associated with some environmental factors, and a prediction model of malaria epidemics could be developed with indices extracted from satellite remote sensing images.
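As a worked illustration of the fitted model quoted in this abstract, the sketch below evaluates I = exp(-1.672 - 0.399 × LST). The function name and the LST values passed to it are assumptions made for illustration; the abstract does not state the units or scaling of LST, so the absolute numbers should not be read as real predictions.

```python
import math

# Coefficients quoted in the abstract; the names below are hypothetical.
INTERCEPT = -1.672
LST_SLOPE = -0.399

def predicted_incidence(lst):
    """Monthly incidence (per 1,000,000) under the quoted log-linear model."""
    return math.exp(INTERCEPT + LST_SLOPE * lst)

# In a log-linear model, a one-unit rise in LST multiplies the predicted
# incidence by exp(slope), whatever the starting LST value is.
rate_ratio = predicted_incidence(1.0) / predicted_incidence(0.0)
print(round(rate_ratio, 3))
```

Because the model is log-linear, the coefficient is most naturally read as a rate ratio: each additional unit of LST scales the predicted incidence by exp(-0.399), independent of the baseline.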
Binomial probability distribution model-based protein identification algorithm for tandem mass spectrometry utilizing peak intensity information.

PubMed

Xiao, Chuan-Le; Chen, Xiao-Zhou; Du, Yang-Li; Sun, Xuesong; Zhang, Gong; He, Qing-Yu

2013-01-04

Mass spectrometry has become one of the most important technologies in proteomic analysis. Tandem mass spectrometry (LC-MS/MS) is a major tool for the analysis of peptide mixtures from protein samples. The key step of MS data processing is the identification of peptides from experimental spectra by searching public sequence databases. Although a number of algorithms to identify peptides from MS/MS data have been already proposed, e.g. Sequest, OMSSA, X!Tandem, Mascot, etc., they are mainly based on statistical models considering only peak-matches between experimental and theoretical spectra, but not peak intensity information. Moreover, different algorithms gave different results from the same MS data, implying their probable incompleteness and questionable reproducibility. We developed a novel peptide identification algorithm, ProVerB, based on a binomial probability distribution model of protein tandem mass spectrometry combined with a new scoring function, making full use of peak intensity information and, thus, enhancing the ability of identification. Compared with Mascot, Sequest, and SQID, ProVerB identified significantly more peptides from LC-MS/MS data sets than the current algorithms at 1% False Discovery Rate (FDR) and provided more confident peptide identifications. ProVerB is also compatible with various platforms and experimental data sets, showing its robustness and versatility. The open-source program ProVerB is available at http://bioinformatics.jnu.edu.cn/software/proverb/ .
nbCNV: a multi-constrained optimization model for discovering copy number variants in single-cell sequencing data.

PubMed

Zhang, Changsheng; Cai, Hongmin; Huang, Jingying; Song, Yan

2016-09-17

Variations in DNA copy number make an important contribution to the development of several diseases, including autism, schizophrenia and cancer. Single-cell sequencing technology allows the dissection of genomic heterogeneity at the single-cell level, thereby providing important evolutionary information about cancer cells. In contrast to traditional bulk sequencing, single-cell sequencing requires the amplification of the whole genome of a single cell to accumulate enough samples for sequencing. However, the amplification process inevitably introduces amplification bias, resulting in an over-dispersed portion of the sequencing data. A recent study has shown that the over-dispersed portion of single-cell sequencing data can be well modelled by negative binomial distributions. We developed a read-depth based method, nbCNV, to detect copy number variants (CNVs). The nbCNV method uses two constraints, sparsity and smoothness, to fit the CNV patterns under the assumption that the read signals follow a negative binomial distribution. The problem of CNV detection was formulated as a quadratic optimization problem and was solved by an efficient numerical scheme based on the classical alternating direction minimization method. Extensive experiments to compare nbCNV with existing benchmark models were conducted on both simulated data and empirical single-cell sequencing data. The results of those experiments demonstrate that nbCNV achieves superior performance and high robustness for the detection of CNVs in single-cell sequencing data.
Seasonal changes in spatial patterns of two annual plants in the Chihuahuan Desert, USA

USGS Publications Warehouse

Yin, Z.-Y.; Guo, Q.; Ren, H.; Peng, S.-L.

2005-01-01

Spatial pattern of a biotic population may change over time as its component individuals grow or die out, but whether this is the case for desert annual plants is largely unknown. Here we examined seasonal changes in spatial patterns of two annuals, Eriogonum abertianum and Haplopappus gracilis, in initial (winter) and final (summer) densities. The density was measured as the number of individuals from 384 permanent quadrats (each 0.5 m × 0.5 m) in the Chihuahuan Desert near Portal, Arizona, USA. We used three probability distributions (binomial, Poisson, and negative binomial or NB) that represent three basic spatial patterns (regular, random, and clumped) to fit the observed frequency distributions of densities of the two annuals. Both species showed clear clumped patterns as characterized by the NB and had similar inverse J-shaped frequency distribution curves in two density categories. Also, both species displayed a reduced degree of aggregation from winter to summer after the spring drought (massive die-off), as indicated by the increased k-parameter of the NB and decreased values of another NB parameter p, variance/mean ratio, Lloyd’s Index of Patchiness, and David and Moore’s Index of Clumping. Further, we hypothesized that while the NB (i.e., Poisson-logarithmic) well fits the distribution of individuals per quadrat, its components, the Poisson and logarithmic, may describe the distributions of clumps per quadrat and of individuals per clump, respectively. We thus obtained the means and variances for (1) individuals per quadrat, (2) clumps per quadrat, and (3) individuals per clump. The results showed that the decrease of the density from winter to summer for each plant resulted from the decrease of individuals per clump, rather than from the decrease of clumps per quadrat. The great similarities between the two annuals indicate that our observed temporal changes in spatial patterns may be common among desert annual plants.
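The aggregation measures named in this abstract can be computed from per-quadrat counts with standard moment formulas. The sketch below is an illustration under that assumption, not the authors' fitting procedure (they may have used maximum likelihood for k); the function name and the example counts are hypothetical.

```python
from statistics import mean, variance

def aggregation_indices(counts):
    """Moment-based aggregation indices for counts of individuals per quadrat."""
    m = mean(counts)
    s2 = variance(counts)  # sample variance (n - 1 denominator)
    if s2 <= m:
        raise ValueError("no overdispersion: moment estimate of k is undefined")
    return {
        "variance_mean_ratio": s2 / m,
        "index_of_clumping": s2 / m - 1,             # David and Moore
        "index_of_patchiness": 1 + (s2 - m) / m**2,  # Lloyd
        "nb_k": m**2 / (s2 - m),                     # method-of-moments k
    }

# Hypothetical counts of individuals in eight quadrats.
example = aggregation_indices([0, 0, 0, 1, 1, 2, 5, 7])
print(example)
```

With these definitions Lloyd's index equals 1 + 1/k, so a larger k goes with smaller values of the other indices, which matches the direction of change the abstract reports as aggregation weakens from winter to summer.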
Genetic parameters for racing records in trotters using linear and generalized linear models.

PubMed

Suontama, M; van der Werf, J H J; Juga, J; Ojala, M

2012-09-01

Heritability and repeatability and genetic and phenotypic correlations were estimated for trotting race records with linear and generalized linear models using 510,519 records on 17,792 Finnhorses and 513,161 records on 25,536 Standardbred trotters. Heritability and repeatability were estimated for single racing time and earnings traits with linear models, and logarithmic scale was used for racing time and fourth-root scale for earnings to correct for nonnormality. Generalized linear models with a gamma distribution were applied for single racing time and with a multinomial distribution for single earnings traits. In addition, genetic parameters for annual earnings were estimated with linear models on the observed and fourth-root scales. Racing success traits of single placings, winnings, breaking stride, and disqualifications were analyzed using generalized linear models with a binomial distribution. Estimates of heritability were greatest for racing time, which ranged from 0.32 to 0.34. Estimates of heritability were low for single earnings with all distributions, ranging from 0.01 to 0.09. Annual earnings were closer to normal distribution than single earnings. Heritability estimates were moderate for annual earnings on the fourth-root scale, 0.19 for Finnhorses and 0.27 for Standardbred trotters. Heritability estimates for binomial racing success variables ranged from 0.04 to 0.12, being greatest for winnings and least for breaking stride. Genetic correlations among racing traits were high, whereas phenotypic correlations were mainly low to moderate, except correlations between racing time and earnings were high. On the basis of a moderate heritability and moderate to high repeatability for racing time and annual earnings, selection of horses for these traits is effective when based on a few repeated records. Because of high genetic correlations, direct selection for racing time and annual earnings would also result in good genetic response in racing success.
Multiplicity distributions of charged hadrons in νp and ν̄p charged current interactions

NASA Astrophysics Data System (ADS)

Jones, G. T.; Jones, R. W. L.; Kennedy, B. W.; Morrison, D. R. O.; Mobayyen, M. M.; Wainstein, S.; Aderholz, M.; Hantke, D.; Katz, U. F.; Kern, J.; Schmitz, N.; Wittek, W.; Borner, H. P.; Myatt, G.; Radojicic, D.; Burke, S.

1992-03-01

Using data on νp and ν̄p charged current interactions from a bubble chamber experiment with BEBC at CERN, the multiplicity distributions of charged hadrons are investigated. The analysis is based on ~20000 events with incident ν and ~10000 events with incident ν̄. The invariant mass W of the total hadronic system ranges from 3 GeV to ~14 GeV. The experimental multiplicity distributions are fitted by the binomial function (for different intervals of W and in different intervals of the rapidity y), by the Levy function and the lognormal function. All three parametrizations give acceptable values for χ². For fixed W, forward and backward multiplicities are found to be uncorrelated. The normalized moments of the charged multiplicity distributions are measured as a function of W. They show a violation of KNO scaling.
New Class of Quantum Error-Correcting Codes for a Bosonic Mode

NASA Astrophysics Data System (ADS)

Michael, Marios H.; Silveri, Matti; Brierley, R. T.; Albert, Victor V.; Salmilehto, Juha; Jiang, Liang; Girvin, S. M.

2016-07-01

We construct a new class of quantum error-correcting codes for a bosonic mode, which are advantageous for applications in quantum memories, communication, and scalable computation. These "binomial quantum codes" are formed from a finite superposition of Fock states weighted with binomial coefficients. The binomial codes can exactly correct errors that are polynomial up to a specific degree in bosonic creation and annihilation operators, including amplitude damping and displacement noise as well as boson addition and dephasing errors. For realistic continuous-time dissipative evolution, the codes can perform approximate quantum error correction to any given order in the time step between error detection measurements. We present an explicit approximate quantum error recovery operation based on projective measurements and unitary operations. The binomial codes are tailored for detecting boson loss and gain errors by means of measurements of the generalized number parity. We discuss optimization of the binomial codes and demonstrate that by relaxing the parity structure, codes with even lower unrecoverable error rates can be achieved. The binomial codes are related to existing two-mode bosonic codes, but offer the advantage of requiring only a single bosonic mode to correct amplitude damping as well as the ability to correct other errors. Our codes are similar in spirit to "cat codes" based on superpositions of the coherent states but offer several advantages such as smaller mean boson number, exact rather than approximate orthonormality of the code words, and an explicit unitary operation for repumping energy into the bosonic mode. The binomial quantum codes are realizable with current superconducting circuit technology, and they should prove useful in other quantum technologies, including bosonic quantum memories, photonic quantum communication, and optical-to-microwave up- and down-conversion.
Estimating spatial and temporal components of variation in count data using negative binomial mixed models

USGS Publications Warehouse

Irwin, Brian J.; Wagner, Tyler; Bence, James R.; Kepler, Megan V.; Liu, Weihai; Hayes, Daniel B.

2013-01-01

Partitioning total variability into its component temporal and spatial sources is a powerful way to better understand time series and elucidate trends. The data available for such analyses of fish and other populations are usually nonnegative integer counts of the number of organisms, often dominated by many low values with few observations of relatively high abundance. These characteristics are not well approximated by the Gaussian distribution. We present a detailed description of a negative binomial mixed-model framework that can be used to model count data and quantify temporal and spatial variability. We applied these models to data from four fishery-independent surveys of Walleyes Sander vitreus across the Great Lakes basin. Specifically, we fitted models to gill-net catches from Wisconsin waters of Lake Superior; Oneida Lake, New York; Saginaw Bay in Lake Huron, Michigan; and Ohio waters of Lake Erie. These long-term monitoring surveys varied in overall sampling intensity, the total catch of Walleyes, and the proportion of zero catches. Parameter estimation included the negative binomial scaling parameter, and we quantified the random effects as the variations among gill-net sampling sites, the variations among sampled years, and site × year interactions. This framework (i.e., the application of a mixed model appropriate for count data in a variance-partitioning context) represents a flexible approach that has implications for monitoring programs (e.g., trend detection) and for examining the potential of individual variance components to serve as response metrics to large-scale anthropogenic perturbations or ecological changes.
Comparison of robustness to outliers between robust poisson models and log-binomial models when estimating relative risks for common binary outcomes: a simulation study.

PubMed

Chen, Wansu; Shi, Jiaxiao; Qian, Lei; Azen, Stanley P

2014-06-26

To estimate relative risks or risk ratios for common binary outcomes, the most popular model-based methods are the robust (also known as modified) Poisson and the log-binomial regression. Of the two methods, it is believed that the log-binomial regression yields more efficient estimators because it is maximum likelihood based, while the robust Poisson model may be less affected by outliers. Evidence to support the robustness of robust Poisson models in comparison with log-binomial models is very limited. In this study a simulation was conducted to evaluate the performance of the two methods in several scenarios where outliers existed. The findings indicate that for data coming from a population where the relationship between the outcome and the covariate was in a simple form (e.g. log-linear), the two models yielded comparable biases and mean square errors. However, if the true relationship contained a higher order term, the robust Poisson models consistently outperformed the log-binomial models even when the level of contamination is low. The robust Poisson models are more robust (or less sensitive) to outliers compared to the log-binomial models when estimating relative risks or risk ratios for common binary outcomes. Users should be aware of the limitations when choosing appropriate models to estimate relative risks or risk ratios.
Inference for binomial probability based on dependent Bernoulli random variables with applications to meta‐analysis and group level studies

PubMed Central

Bakbergenuly, Ilyas; Morgenthaler, Stephan

2016-01-01

We study bias arising as a result of nonlinear transformations of random variables in random or mixed effects models and its effect on inference in group‐level studies or in meta‐analysis. The findings are illustrated on the example of overdispersed binomial distributions, where we demonstrate considerable biases arising from standard log‐odds and arcsine transformations of the estimated probability p̂, both for single‐group studies and in combining results from several groups or studies in meta‐analysis. Our simulations confirm that these biases are linear in ρ, for small values of ρ, the intracluster correlation coefficient. These biases do not depend on the sample sizes or the number of studies K in a meta‐analysis and result in abysmal coverage of the combined effect for large K. We also propose bias‐correction for the arcsine transformation. Our simulations demonstrate that this bias‐correction works well for small values of the intraclass correlation. The methods are applied to two examples of meta‐analyses of prevalence. PMID:27192062
Mixture models for estimating the size of a closed population when capture rates vary among individuals

USGS Publications Warehouse

Dorazio, R.M.; Royle, J. Andrew

2003-01-01

We develop a parameterization of the beta-binomial mixture that provides sensible inferences about the size of a closed population when probabilities of capture or detection vary among individuals. Three classes of mixture models (beta-binomial, logistic-normal, and latent-class) are fitted to recaptures of snowshoe hares for estimating abundance and to counts of bird species for estimating species richness. In both sets of data, rates of detection appear to vary more among individuals (animals or species) than among sampling occasions or locations. The estimates of population size and species richness are sensitive to model-specific assumptions about the latent distribution of individual rates of detection. We demonstrate using simulation experiments that conventional diagnostics for assessing model adequacy, such as deviance, cannot be relied on for selecting classes of mixture models that produce valid inferences about population size. Prior knowledge about sources of individual heterogeneity in detection rates, if available, should be used to help select among classes of mixture models that are to be used for inference.
Speech-discrimination scores modeled as a binomial variable.

PubMed

Thornton, A R; Raffin, M J

1978-09-01

Many studies have reported variability data for tests of speech discrimination, and the disparate results of these studies have not been given a simple explanation. Arguments over the relative merits of 25- vs 50-word tests have ignored the basic mathematical properties inherent in the use of percentage scores. The present study models performance on clinical tests of speech discrimination as a binomial variable. A binomial model was developed, and some of its characteristics were tested against data from 4120 scores obtained on the CID Auditory Test W-22. A table for determining significant deviations between scores was generated and compared to observed differences in half-list scores for the W-22 tests. Good agreement was found between predicted and observed values. Implications of the binomial characteristics of speech-discrimination scores are discussed.
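The point about 25- versus 50-word tests follows from the binomial standard deviation of a percentage score. The sketch below assumes each word is an independent trial with the same probability of a correct response; the function name and the example true score are hypothetical, and this is not the authors' table of critical differences.

```python
import math

def score_sd_percent(true_score, n_words):
    """Standard deviation, in percentage points, of a percent-correct score
    when each of n_words items is an independent Bernoulli trial."""
    p = true_score / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_words)

# Compare the two list lengths discussed in the abstract at a true score of 50%.
for n in (25, 50):
    print(n, round(score_sd_percent(50, n), 2))
```

Under this model the variability depends on both the list length and the score itself: it is largest near 50% and shrinks toward 0% and 100%, and doubling the list length reduces it only by a factor of the square root of 2.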
Possibility and Challenges of Conversion of Current Virus Species Names to Linnaean Binomials

DOE Office of Scientific and Technical Information (OSTI.GOV)

Postler, Thomas S.; Clawson, Anna N.; Amarasinghe, Gaya K.

Botanical, mycological, zoological, and prokaryotic species names follow the Linnaean format, consisting of an italicized Latinized binomen with a capitalized genus name and a lower case species epithet (e.g., Homo sapiens). Virus species names, however, do not follow a uniform format, and, even when binomial, are not Linnaean in style. In this thought exercise, we attempted to convert all currently official names of species included in the virus family Arenaviridae and the virus order Mononegavirales to Linnaean binomials, and to identify and address associated challenges and concerns. Surprisingly, this endeavor was not as complicated or time-consuming as even the authors of this article expected when conceiving the experiment. [Arenaviridae; binomials; ICTV; International Committee on Taxonomy of Viruses; Mononegavirales; virus nomenclature; virus taxonomy.]
[Individual variation in the frequency of chromosome aberrations under the influence of chemical mutagens. I. Inter-cultural and inter-individual variations in the effect of mutagens on human lymphocytes].

PubMed

Iakovenko, K N; Tarusina, T O

1976-01-01

The distribution of human peripheral blood cultures with respect to their sensitivity to thiophosphamide was studied. In the first experiment blood from one person was used; in the second, blood from different persons was used. "The percent of aberrant cells" and "the number of chromosome breaks per 100 cells" were scored. The distribution of the cultures was found to be normal in all experiments. Analysis of the variances of the percent of aberrant cells showed that the distribution of cultures obtained from one donor corresponded to the binomial distribution, and that of cultures obtained from different donors to the Poisson distribution.
Maximally Informative Stimuli and Tuning Curves for Sigmoidal Rate-Coding Neurons and Populations

NASA Astrophysics Data System (ADS)

McDonnell, Mark D.; Stocks, Nigel G.

2008-08-01

A general method for deriving maximally informative sigmoidal tuning curves for neural systems with small normalized variability is presented. The optimal tuning curve is a nonlinear function of the cumulative distribution function of the stimulus and depends on the mean-variance relationship of the neural system. The derivation is based on a known relationship between Shannon’s mutual information and Fisher information, and the optimality of Jeffrey’s prior. It relies on the existence of closed-form solutions to the converse problem of optimizing the stimulus distribution for a given tuning curve. It is shown that maximum mutual information corresponds to constant Fisher information only if the stimulus is uniformly distributed. As an example, the case of sub-Poisson binomial firing statistics is analyzed in detail.
Radial basis function and its application in tourism management

NASA Astrophysics Data System (ADS)

Hu, Shan-Feng; Zhu, Hong-Bin; Zhao, Lei

2018-05-01

In this work, several applications and the performance of the radial basis function (RBF) are first briefly reviewed. The binomial function combined with three different RBFs, namely the multiquadric (MQ), inverse quadric (IQ) and inverse multiquadric (IMQ) distributions, is then adopted to model the tourism data of Huangshan in China. Simulation results show that all the models match the sample data very well. Among the three models, the IMQ-RBF model is found to be the most suitable for forecasting the tourist flow.
Predictive accuracy of particle filtering in dynamic models supporting outbreak projections.

PubMed

Safarishahrbijari, Anahita; Teyhouee, Aydin; Waldner, Cheryl; Liu, Juxin; Osgood, Nathaniel D

2017-09-26

While a new generation of computational statistics algorithms and availability of data streams raises the potential for recurrently regrounding dynamic models with incoming observations, the effectiveness of such arrangements can be highly subject to specifics of the configuration (e.g., frequency of sampling and representation of behaviour change), and there has been little attempt to identify effective configurations. Combining dynamic models with particle filtering, we explored a solution focusing on creating quickly formulated models regrounded automatically and recurrently as new data becomes available. Given a latent underlying case count, we assumed that observed incident case counts followed a negative binomial distribution. In accordance with the condensation algorithm, each such observation led to updating of particle weights. We evaluated the effectiveness of various particle filtering configurations against each other and against an approach without particle filtering according to the accuracy of the model in predicting future prevalence, given data to a certain point and a norm-based discrepancy metric. We examined the effectiveness of particle filtering under varying times between observations, negative binomial dispersion parameters, and rates with which the contact rate could evolve. We observed that more frequent observations of empirical data yielded super-linearly improved accuracy in model predictions. We further found that for the data studied here, the most favourable assumptions to make regarding the parameters associated with the negative binomial distribution and changes in contact rate were robust across observation frequency and the observation point in the outbreak. Combining dynamic models with particle filtering can perform well in projecting future evolution of an outbreak. 
Most importantly, the remarkable improvements in predictive accuracy resulting from more frequent sampling suggest that investments to achieve efficient reporting mechanisms may be more than paid back by improved planning capacity. The robustness of the results on particle filter configuration in this case study suggests that it may be possible to formulate effective standard guidelines and regularized approaches for such techniques in particular epidemiological contexts. Most importantly, the work tentatively suggests potential for health decision makers to secure strong guidance when anticipating outbreak evolution for emerging infectious diseases by combining even very rough models with particle filtering methods.
Modeling Tetanus Neonatorum case using the regression of negative binomial and zero-inflated negative binomial

NASA Astrophysics Data System (ADS)

Amaliana, Luthfatul; Sa'adah, Umu; Wayan Surya Wardhani, Ni

2017-12-01

Tetanus Neonatorum is an infectious disease that can be prevented by immunization. Up to 2015, the number of Tetanus Neonatorum cases in East Java Province was the highest in Indonesia. Tetanus Neonatorum data exhibit overdispersion and a fairly large proportion of zero-inflation. Negative Binomial (NB) regression is an alternative method when overdispersion occurs in Poisson regression. However, data containing both overdispersion and zero-inflation are more appropriately analyzed using Zero-Inflated Negative Binomial (ZINB) regression. The purposes of this study are: (1) to model Tetanus Neonatorum cases in East Java Province, with a 71.05 percent proportion of zero-inflation, using NB and ZINB regression, and (2) to obtain the best model. The results indicate that ZINB regression is better than NB regression, with a smaller AIC.
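To make the NB versus ZINB comparison concrete, the sketch below writes out the two probability mass functions and the AIC used to rank models. It assumes the common mean and dispersion parameterization of the negative binomial; the function names and the parameter values in the example are hypothetical and are not estimates from the study.

```python
import math

def nb_pmf(y, mu, k):
    """Negative binomial pmf with mean mu and dispersion (size) k."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

def zinb_pmf(y, mu, k, pi_zero):
    """Zero-inflated NB: a structural zero with probability pi_zero,
    otherwise an ordinary NB count (which may itself be zero)."""
    base = (1.0 - pi_zero) * nb_pmf(y, mu, k)
    return pi_zero + base if y == 0 else base

def aic(log_likelihood, n_params):
    """Akaike information criterion; the smaller value indicates the preferred model."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical parameters: the extra mixing weight raises the probability of zero.
print(nb_pmf(0, mu=2.0, k=0.5), zinb_pmf(0, mu=2.0, k=0.5, pi_zero=0.3))
```

The ZINB model spends one extra parameter (the structural-zero probability) relative to NB, so it earns a smaller AIC only when the gain in log-likelihood outweighs that penalty.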

Binomial leap methods for simulating stochastic chemical kinetics.

PubMed

Tian, Tianhai; Burrage, Kevin

2004-12-01

This paper discusses efficient simulation methods for stochastic chemical kinetics. Based on the tau-leap and midpoint tau-leap methods of Gillespie [D. T. Gillespie, J. Chem. Phys. 115, 1716 (2001)], binomial random variables are used in these leap methods rather than Poisson random variables. The motivation for this approach is to improve the efficiency of the Poisson leap methods by using larger stepsizes. Unlike Poisson random variables whose range of sample values is from zero to infinity, binomial random variables have a finite range of sample values. This probabilistic property has been used to restrict possible reaction numbers and to avoid negative molecular numbers in stochastic simulations when larger stepsize is used. In this approach a binomial random variable is defined for a single reaction channel in order to keep the reaction number of this channel below the numbers of molecules that undergo this reaction channel. A sampling technique is also designed for the total reaction number of a reactant species that undergoes two or more reaction channels. Samples for the total reaction number are not greater than the molecular number of this species. In addition, probability properties of the binomial random variables provide stepsize conditions for restricting reaction numbers in a chosen time interval. These stepsize conditions are important properties of robust leap control strategies. Numerical results indicate that the proposed binomial leap methods can be applied to a wide range of chemical reaction systems with very good accuracy and significant improvement on efficiency over existing approaches. (c) 2004 American Institute of Physics.
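The core idea, replacing an unbounded Poisson draw with a bounded binomial draw so that molecule counts cannot go negative, can be sketched for a single first-order channel A -> B. This is a simplified illustration and not the authors' algorithm: the function names, the fixed stepsize, and the choice of success probability c * tau per molecule are assumptions made here.

```python
import random

def binomial_sample(rng, n, p):
    """Draw from Binomial(n, p) by summing n Bernoulli trials (simple, not fast)."""
    return sum(1 for _ in range(n) if rng.random() < p)

def binomial_leap_decay(x0, c, tau, n_steps, seed=0):
    """Simulate A -> B with rate constant c using fixed leaps of size tau."""
    if not 0.0 <= c * tau <= 1.0:
        raise ValueError("stepsize condition violated: need 0 <= c * tau <= 1")
    rng = random.Random(seed)
    x, path = x0, [x0]
    for _ in range(n_steps):
        # The number of firings in one leap is at most the x molecules available.
        fired = binomial_sample(rng, x, c * tau)
        x -= fired
        path.append(x)
    return path

path = binomial_leap_decay(x0=100, c=0.5, tau=0.4, n_steps=10)
print(path)
```

The check on c * tau plays the role of a stepsize condition: it keeps the per-molecule firing probability valid, while the binomial's finite range guarantees the count stays non-negative even for large leaps.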
Assumptions of acceptance sampling and the implications for lot contamination: Escherichia coli O157 in lots of Australian manufacturing beef.

PubMed

Kiermeier, Andreas; Mellor, Glen; Barlow, Robert; Jenson, Ian

2011-04-01

The aims of this work were to determine the distribution and concentration of Escherichia coli O157 in lots of beef destined for grinding (manufacturing beef) that failed to meet Australian requirements for export, to use these data to better understand the performance of sampling plans based on the binomial distribution, and to consider alternative approaches for evaluating sampling plans. For each of five lots from which E. coli O157 had been detected, 900 samples from the external carcass surface were tested. E. coli O157 was not detected in three lots, whereas in two lots E. coli O157 was detected in 2 and 74 samples. For lots in which E. coli O157 was not detected in the present study, the E. coli O157 level was estimated to be <12 cells per 27.2-kg carton. For the most contaminated carton, the total number of E. coli O157 cells was estimated to be 813. In the two lots in which E. coli O157 was detected, the pathogen was detected in 1 of 12 and 2 of 12 cartons. The use of acceptance sampling plans based on a binomial distribution can provide a falsely optimistic view of the value of sampling as a control measure when applied to assessment of E. coli O157 contamination in manufacturing beef. Alternative approaches to understanding sampling plans, which do not assume homogeneous contamination throughout the lot, appear more realistic. These results indicate that despite the application of stringent sampling plans, sampling and testing approaches are inefficient for controlling microbiological quality.
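The binomial reasoning behind such sampling plans can be sketched as follows. The sketch assumes every sample is independently positive with the same probability, which is the homogeneous-contamination assumption the abstract calls into question; the function names and the example prevalence are hypothetical.

```python
import math

def prob_detect(prevalence, n_samples):
    """P(at least one positive among n_samples) under a binomial model."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must lie in [0, 1]")
    return 1.0 - (1.0 - prevalence) ** n_samples

def samples_needed(prevalence, target_prob):
    """Smallest number of samples whose detection probability reaches target_prob."""
    if not (0.0 < prevalence < 1.0 and 0.0 < target_prob < 1.0):
        raise ValueError("prevalence and target_prob must lie strictly in (0, 1)")
    return math.ceil(math.log(1.0 - target_prob) / math.log(1.0 - prevalence))

# Hypothetical lot in which 5% of sampling units are contaminated.
print(samples_needed(0.05, 0.95), prob_detect(0.05, 60))
```

When contamination is clustered in a few cartons, as in the lots described above, samples are not exchangeable and the true detection probability can be well below what this calculation suggests.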
GENE-Counter: A Computational Pipeline for the Analysis of RNA-Seq Data for Gene Expression Differences

PubMed Central

Di, Yanming; Schafer, Daniel W.; Wilhelm, Larry J.; Fox, Samuel E.; Sullivan, Christopher M.; Curzon, Aron D.; Carrington, James C.; Mockler, Todd C.; Chang, Jeff H.

2011-01-01

GENE-counter is a complete Perl-based computational pipeline for analyzing RNA-Sequencing (RNA-Seq) data for differential gene expression. In addition to its use in studying transcriptomes of eukaryotic model organisms, GENE-counter is applicable for prokaryotes and non-model organisms without an available genome reference sequence. For alignments, GENE-counter is configured for CASHX, Bowtie, and BWA, but an end user can use any Sequence Alignment/Map (SAM)-compliant program of preference. To analyze data for differential gene expression, GENE-counter can be run with any one of three statistics packages that are based on variations of the negative binomial distribution. The default method is a new and simple statistical test we developed based on an over-parameterized version of the negative binomial distribution. GENE-counter also includes three different methods for assessing differentially expressed features for enriched gene ontology (GO) terms. Results are transparent and data are systematically stored in a MySQL relational database to facilitate additional analyses as well as quality assessment. We used next generation sequencing to generate a small-scale RNA-Seq dataset derived from the heavily studied defense response of Arabidopsis thaliana and used GENE-counter to process the data. Collectively, the support from analysis of microarrays as well as the observed and substantial overlap in results from each of the three statistics packages demonstrates that GENE-counter is well suited for handling the unique characteristics of small sample sizes and high variability in gene counts. PMID:21998647
Are star formation rates of galaxies bimodal?

NASA Astrophysics Data System (ADS)

Feldmann, Robert

2017-09-01

Star formation rate (SFR) distributions of galaxies are often assumed to be bimodal with modes corresponding to star-forming and quiescent galaxies, respectively. Both classes of galaxies are typically studied separately, and SFR distributions of star-forming galaxies are commonly modelled as lognormals. Using both observational data and results from numerical simulations, I argue that this division into star-forming and quiescent galaxies is unnecessary from a theoretical point of view and that the SFR distributions of the whole population can be well fitted by zero-inflated negative binomial distributions. This family of distributions has three parameters that determine the average SFR of the galaxies in the sample, the scatter relative to the star-forming sequence and the fraction of galaxies with zero SFRs, respectively. The proposed distributions naturally account for (I) the discrete nature of star formation, (II) the presence of 'dead' galaxies with zero SFRs and (III) asymmetric scatter. Excluding 'dead' galaxies, the distribution of log SFR is unimodal with a peak at the star-forming sequence and an extended tail towards low SFRs. However, uncertainties and biases in the SFR measurements can create the appearance of a bimodal distribution.
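A minimal sketch of a zero-inflated negative binomial probability mass function, using three parameters that loosely mirror those named in the abstract (a mean, a dispersion and a zero fraction). The parameterization by `mean` and `size` is an assumption made for illustration; the paper's own parameterization may differ.

```python
import math


def nb_pmf(y: int, mean: float, size: float) -> float:
    """Negative binomial pmf with the given mean and dispersion `size`.

    Variance is mean + mean**2 / size. Requires mean > 0 and size > 0.
    """
    log_p = (
        math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
        + size * math.log(size / (size + mean))
        + y * math.log(mean / (size + mean))
    )
    return math.exp(log_p)


def zinb_pmf(y: int, mean: float, size: float, zero_fraction: float) -> float:
    """Zero-inflated negative binomial pmf.

    With probability `zero_fraction` the count is a structural zero;
    otherwise it is drawn from the negative binomial component.
    """
    base = (1.0 - zero_fraction) * nb_pmf(y, mean, size)
    return zero_fraction + base if y == 0 else base
```

Under this parameterization the overall mean is `(1 - zero_fraction) * mean`, so the zero fraction and the mean of the non-zero component can be read off separately.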
Abstract knowledge versus direct experience in processing of binomial expressions

PubMed Central

Morgan, Emily; Levy, Roger

2016-01-01

We ask whether word order preferences for binomial expressions of the form A and B (e.g. bread and butter) are driven by abstract linguistic knowledge of ordering constraints referencing the semantic, phonological, and lexical properties of the constituent words, or by prior direct experience with the specific items in question. Using forced-choice and self-paced reading tasks, we demonstrate that online processing of never-before-seen binomials is influenced by abstract knowledge of ordering constraints, which we estimate with a probabilistic model. In contrast, online processing of highly frequent binomials is primarily driven by direct experience, which we estimate from corpus frequency counts. We propose a trade-off wherein processing of novel expressions relies upon abstract knowledge, while reliance upon direct experience increases with increased exposure to an expression. Our findings support theories of language processing in which both compositional generation and direct, holistic reuse of multi-word expressions play crucial roles. PMID:27776281
On the robustness of the q-Gaussian family

NASA Astrophysics Data System (ADS)

Sicuro, Gabriele; Tempesta, Piergiulio; Rodríguez, Antonio; Tsallis, Constantino

2015-12-01

We introduce three deformations, called α-, β- and γ-deformation respectively, of a N-body probabilistic model, first proposed by Rodríguez et al. (2008), having q-Gaussians as N → ∞ limiting probability distributions. The proposed α- and β-deformations are asymptotically scale-invariant, whereas the γ-deformation is not. We prove that, for both α- and β-deformations, the resulting deformed triangles still have q-Gaussians as limiting distributions, with a value of q independent (dependent) on the deformation parameter in the α-case (β-case). In contrast, the γ-case, where we have used the celebrated Q-numbers and the Gauss binomial coefficients, yields other limiting probability distribution functions, outside the q-Gaussian family. These results suggest that scale-invariance might play an important role regarding the robustness of the q-Gaussian family.
Remote sensing of earth terrain

NASA Technical Reports Server (NTRS)

Kong, J. A.

1988-01-01

Two monographs and 85 journal and conference papers on remote sensing of earth terrain have been published, sponsored by NASA Contract NAG5-270. A multivariate K-distribution is proposed to model the statistics of fully polarimetric data from earth terrain with polarizations HH, HV, VH, and VV. In this approach, correlated polarizations of radar signals, as characterized by a covariance matrix, are treated as the sum of N n-dimensional random vectors; N obeys the negative binomial distribution with a parameter α and mean N̄. Subsequently, an n-dimensional K-distribution, with either zero or non-zero mean, is developed in the limit of infinite N̄ or illuminated area. The probability density function (PDF) of the K-distributed vector normalized by its Euclidean norm is independent of the parameter α and is the same as that derived from a zero-mean Gaussian-distributed random vector. The above model is well supported by experimental data provided by MIT Lincoln Laboratory and the Jet Propulsion Laboratory in the form of polarimetric measurements.
Self-affirmation model for football goal distributions

NASA Astrophysics Data System (ADS)

Bittner, E.; Nußbaumer, A.; Janke, W.; Weigel, M.

2007-06-01

Analyzing football score data with statistical techniques, we investigate how the highly co-operative nature of the game is reflected in averaged properties such as the distributions of scored goals for the home and away teams. It turns out that in particular the tails of the distributions are not well described by independent Bernoulli trials, but rather well modeled by negative binomial or generalized extreme value distributions. To understand this behavior from first principles, we suggest modifying the Bernoulli random process to include a simple component of self-affirmation, which seems to describe the data surprisingly well and allows us to interpret the observed deviation from Gaussian statistics. The phenomenological distributions used before can be understood as special cases within this framework. We analyzed historical football score data from many leagues in Europe as well as from international tournaments and found the proposed models to be applicable rather universally. In particular, here we compare men's and women's leagues and the separate German leagues during the Cold War and find some remarkable differences.
On extinction time of a generalized endemic chain-binomial model.

PubMed

Aydogmus, Ozgur

2016-09-01

We considered a chain-binomial epidemic model not conferring immunity after infection. Mean field dynamics of the model has been analyzed and conditions for the existence of a stable endemic equilibrium are determined. The behavior of the chain-binomial process is probabilistically linked to the mean field equation. As a result of this link, we were able to show that the mean extinction time of the epidemic increases at least exponentially as the population size grows. We also present simulation results for the process to validate our analytical findings. Copyright © 2016 Elsevier Inc. All rights reserved.
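The abstract does not spell out the model, so the following is only a generic Reed-Frost style chain-binomial simulation without immunity (infectives return to the susceptible pool after one generation). It is not the paper's generalized model; the function name and all parameters are hypothetical.

```python
import random


def simulate_sis_chain_binomial(population, initial_infected, contact_prob,
                                steps, seed=None):
    """Simulate a simple chain-binomial epidemic without immunity.

    In each generation, every susceptible escapes infection from each
    infective independently with probability 1 - contact_prob. Returns the
    number of infectives in each generation, starting with the initial one.
    """
    rng = random.Random(seed)
    infected = initial_infected
    history = [infected]
    for _ in range(steps):
        susceptible = population - infected
        # Probability that a given susceptible is infected by at least one infective.
        p_infection = 1.0 - (1.0 - contact_prob) ** infected
        # Binomial(susceptible, p_infection) draw via independent Bernoulli trials.
        new_infected = sum(rng.random() < p_infection for _ in range(susceptible))
        # No immunity: the previous infectives become susceptible again.
        infected = new_infected
        history.append(infected)
    return history
```

Once the chain reaches zero infectives it stays there, so the first index at which `history` is zero gives one simulated extinction time.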
Solar San Diego: The Impact of Binomial Rate Structures on Real PV Systems; Preprint

DOE Office of Scientific and Technical Information (OSTI.GOV)

VanGeet, O.; Brown, E.; Blair, T.

2008-05-01

There is confusion in the marketplace regarding the impact of solar photovoltaics (PV) on the user's actual electricity bill under California Net Energy Metering, particularly with binomial tariffs (those that include both demand and energy charges) and time-of-use (TOU) rate structures. The City of San Diego has extensive real-time electrical metering on most of its buildings and PV systems, with interval data for overall consumption and PV electrical production available for multiple years. This paper uses 2007 PV-system data from two city facilities to illustrate the impacts of binomial rate designs. The analysis will determine the energy and demand savings that the PV systems are achieving relative to the absence of systems. A financial analysis of PV-system performance under various rate structures is presented. The data revealed that actual demand and energy use benefits of binomial tariffs increase in summer months, when solar resources allow for maximized electricity production. In a binomial tariff system, varying on- and semi-peak times can result in approximately $1,100 change in demand charges per month over not having a PV system in place, an approximate 30% cost savings. The PV systems are also shown to have a 30%-50% reduction in facility energy charges in 2007.
Time analysis of volcanic activity on Io by means of plasma observations

NASA Technical Reports Server (NTRS)

Mekler, Y.; Eviatar, A.

1980-01-01

A model of Io volcanism in which the probability of activity obeys a binomial distribution is presented. Observed values of the electron density obtained over a 3-year period by ground-based spectroscopy are fitted to such a distribution. The best fit is found for a total number of 15 volcanoes with a probability of individual activity at any time of 0.143. The Pioneer 10 ultraviolet observations are reinterpreted as emissions of sulfur and oxygen ions and are found to be consistent with a plasma much less dense than that observed by the Voyager spacecraft. Late 1978 and the first half of 1979 are shown to be periods of anomalous volcanicity. Rapid variations in electron density are related to enhanced radial diffusion.
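For orientation, the best-fit values quoted above (15 volcanoes, each active with probability 0.143) define a binomial distribution for the number of simultaneously active volcanoes. The sketch below tabulates it; how the number of active volcanoes maps to electron density is not stated in the abstract and is not modeled here.

```python
from math import comb

N_VOLCANOES = 15   # best-fit total number of volcanoes quoted in the abstract
P_ACTIVE = 0.143   # best-fit probability that a single volcano is active


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


# Distribution of the number of simultaneously active volcanoes.
pmf = [binomial_pmf(k, N_VOLCANOES, P_ACTIVE) for k in range(N_VOLCANOES + 1)]

# Expected number active at any time, which equals n * p.
expected_active = sum(k * q for k, q in enumerate(pmf))

if __name__ == "__main__":
    for k, q in enumerate(pmf[:6]):
        print(k, round(q, 4))
    print("expected active:", round(expected_active, 3))
```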
ATM Quality of Service Tests for Digitized Video Using ATM Over Satellite: Laboratory Tests

NASA Technical Reports Server (NTRS)

Ivancic, William D.; Brooks, David E.; Frantz, Brian D.

1997-01-01

A digitized video application was used to help determine minimum quality of service parameters for asynchronous transfer mode (ATM) over satellite. For these tests, binomially distributed and other errors were digitally inserted in an intermediate frequency link via a satellite modem and a commercial Gaussian noise generator. In this paper, the relationship between the ATM cell error and cell loss parameter specifications is discussed with regard to this application. In addition, the video-encoding algorithms, test configurations, and results are presented in detail.
Population Characteristics and the Nature of Egg Shells of two Phthirapteran Species Parasitizing Indian Cattle Egrets

PubMed Central

Ahmad, Aftab; Khan, Vikram; Badola, Smita; Arya, Gaurav; Bansal, Nayanci; Saxena, A. K.

2010-01-01

The prevalence, intensities of infestation, range of infestation and population composition of two phthirapteran species, Ardeicola expallidus Blagoveshtchensky (Phthiraptera: Philopteridae) and Ciconiphilus decimfasciatus Boisduval and Lacordaire (Menoponidae), on seventy cattle egrets were recorded from August 2004 to March 2005 in India. The frequency distribution patterns of both species were skewed but did not correspond to the negative binomial model. The oviposition sites, egg laying patterns and the nature of the eggs of the two species were markedly different. PMID:21067416
Inference for binomial probability based on dependent Bernoulli random variables with applications to meta-analysis and group level studies.

PubMed

Bakbergenuly, Ilyas; Kulinskaya, Elena; Morgenthaler, Stephan

2016-07-01

We study bias arising as a result of nonlinear transformations of random variables in random or mixed effects models and its effect on inference in group-level studies or in meta-analysis. The findings are illustrated on the example of overdispersed binomial distributions, where we demonstrate considerable biases arising from standard log-odds and arcsine transformations of the estimated probability p̂, both for single-group studies and in combining results from several groups or studies in meta-analysis. Our simulations confirm that these biases are linear in ρ, for small values of ρ, the intracluster correlation coefficient. These biases do not depend on the sample sizes or the number of studies K in a meta-analysis and result in abysmal coverage of the combined effect for large K. We also propose bias-correction for the arcsine transformation. Our simulations demonstrate that this bias-correction works well for small values of the intraclass correlation. The methods are applied to two examples of meta-analyses of prevalence. © 2016 The Authors. Biometrical Journal Published by Wiley-VCH Verlag GmbH & Co. KGaA.
Spatiotemporal hurdle models for zero-inflated count data: Exploring trends in emergency department visits.

PubMed

Neelon, Brian; Chang, Howard H; Ling, Qiang; Hastings, Nicole S

2016-12-01

Motivated by a study exploring spatiotemporal trends in emergency department use, we develop a class of two-part hurdle models for the analysis of zero-inflated areal count data. The models consist of two components: one for the probability of any emergency department use and one for the number of emergency department visits given use. Through a hierarchical structure, the models incorporate both patient- and region-level predictors, as well as spatially and temporally correlated random effects for each model component. The random effects are assigned multivariate conditionally autoregressive priors, which induce dependence between the components and provide spatial and temporal smoothing across adjacent spatial units and time periods, resulting in improved inferences. To accommodate potential overdispersion, we consider a range of parametric specifications for the positive counts, including truncated negative binomial and generalized Poisson distributions. We adopt a Bayesian inferential approach, and posterior computation is handled conveniently within standard Bayesian software. Our results indicate that the negative binomial and generalized Poisson hurdle models vastly outperform the Poisson hurdle model, demonstrating that overdispersed hurdle models provide a useful approach to analyzing zero-inflated spatiotemporal data. © The Author(s) 2014.
Emperical Tests of Acceptance Sampling Plans

NASA Technical Reports Server (NTRS)

White, K. Preston, Jr.; Johnson, Kenneth L.

2012-01-01

Acceptance sampling is a quality control procedure applied as an alternative to 100% inspection. A random sample of items is drawn from a lot to determine the fraction of items which have a required quality characteristic. Both the number of items to be inspected and the criterion for determining conformance of the lot to the requirement are given by an appropriate sampling plan with specified risks of Type I and Type II sampling errors. In this paper, we present the results of empirical tests of the accuracy of selected sampling plans reported in the literature. These plans are for measurable quality characteristics which are known to have either binomial, exponential, normal, gamma, Weibull, inverse Gaussian, or Poisson distributions. In the main, results support the accepted wisdom that variables acceptance plans are superior to attributes (binomial) acceptance plans, in the sense that these provide comparable protection against risks at reduced sampling cost. For the Gaussian and Weibull plans, however, there are ranges of the shape parameters for which the required sample sizes are in fact larger than the corresponding attributes plans, dramatically so for instances of large skew. Tests further confirm that the published inverse-Gaussian (IG) plan is flawed, as reported by White and Johnson (2011).
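To make the attributes (binomial) plan concrete, here is a small sketch of its acceptance probability: inspect n items and accept the lot if at most c are nonconforming. The function name and the notation (n, c, p) are assumptions for illustration; the specific plans tested in the paper are not reproduced.

```python
from math import comb


def acceptance_probability(n: int, c: int, p: float) -> float:
    """Probability of accepting a lot under a single attributes plan.

    n: number of items inspected
    c: acceptance number (accept if at most c items are nonconforming)
    p: true fraction of nonconforming items, assumed constant across draws
    """
    return sum(comb(n, k) * p**k * (1.0 - p) ** (n - k) for k in range(c + 1))


if __name__ == "__main__":
    # Hypothetical plan and quality levels, chosen only for illustration.
    for p in (0.01, 0.05, 0.10):
        print(p, round(acceptance_probability(50, 1, p), 4))
```

Evaluating this function over a range of p traces the plan's operating characteristic curve. The Type I risk is one minus its value at the acceptable quality level, and the Type II risk is its value at the rejectable quality level.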
Comparison and Field Validation of Binomial Sampling Plans for Oligonychus perseae (Acari: Tetranychidae) on Hass Avocado in Southern California.

PubMed

Lara, Jesus R; Hoddle, Mark S

2015-08-01

Oligonychus perseae Tuttle, Baker, & Abatiello is a foliar pest of 'Hass' avocados [Persea americana Miller (Lauraceae)]. The recommended action threshold is 50-100 motile mites per leaf, but this count range and other ecological factors associated with O. perseae infestations limit the application of enumerative sampling plans in the field. Consequently, a comprehensive modeling approach was implemented to compare the practical application of various binomial sampling models for decision-making of O. perseae in California. An initial set of sequential binomial sampling models were developed using three mean-proportion modeling techniques (i.e., Taylor's power law, maximum likelihood, and an empirical model) in combination with two-leaf infestation tally thresholds of either one or two mites. Model performance was evaluated using a robust mite count database consisting of >20,000 Hass avocado leaves infested with varying densities of O. perseae and collected from multiple locations. Operating characteristic and average sample number results for sequential binomial models were used as the basis to develop and validate a standardized fixed-size binomial sampling model with guidelines on sample tree and leaf selection within blocks of avocado trees. This final validated model requires a leaf sampling cost of 30 leaves and takes into account the spatial dynamics of O. perseae to make reliable mite density classifications for a 50-mite action threshold. Recommendations for implementing this fixed-size binomial sampling plan to assess densities of O. perseae in commercial California avocado orchards are discussed. © The Authors 2015. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Using the Binomial Series to Prove the Arithmetic Mean-Geometric Mean Inequality

ERIC Educational Resources Information Center

Persky, Ronald L.

2003-01-01

In 1968, Leon Gerber compared (1 + x)^a to its kth partial sum as a binomial series. His result is stated and, as an application of this result, a proof of the arithmetic mean-geometric mean inequality is presented.
Four Bootstrap Confidence Intervals for the Binomial-Error Model.

ERIC Educational Resources Information Center

Lin, Miao-Hsiang; Hsiung, Chao A.

1992-01-01

Four bootstrap methods are identified for constructing confidence intervals for the binomial-error model. The extent to which similar results are obtained and the theoretical foundation of each method and its relevance and ranges of modeling the true score uncertainty are discussed. (SLD)
Possibility and Challenges of Conversion of Current Virus Species Names to Linnaean Binomials.

PubMed

Postler, Thomas S; Clawson, Anna N; Amarasinghe, Gaya K; Basler, Christopher F; Bavari, Sbina; Benko, Mária; Blasdell, Kim R; Briese, Thomas; Buchmeier, Michael J; Bukreyev, Alexander; Calisher, Charles H; Chandran, Kartik; Charrel, Rémi; Clegg, Christopher S; Collins, Peter L; Juan Carlos, De La Torre; Derisi, Joseph L; Dietzgen, Ralf G; Dolnik, Olga; Dürrwald, Ralf; Dye, John M; Easton, Andrew J; Emonet, Sébastian; Formenty, Pierre; Fouchier, Ron A M; Ghedin, Elodie; Gonzalez, Jean-Paul; Harrach, Balázs; Hewson, Roger; Horie, Masayuki; Jiang, Dàohóng; Kobinger, Gary; Kondo, Hideki; Kropinski, Andrew M; Krupovic, Mart; Kurath, Gael; Lamb, Robert A; Leroy, Eric M; Lukashevich, Igor S; Maisner, Andrea; Mushegian, Arcady R; Netesov, Sergey V; Nowotny, Norbert; Patterson, Jean L; Payne, Susan L; PaWeska, Janusz T; Peters, Clarence J; Radoshitzky, Sheli R; Rima, Bertus K; Romanowski, Victor; Rubbenstroth, Dennis; Sabanadzovic, Sead; Sanfaçon, Hélène; Salvato, Maria S; Schwemmle, Martin; Smither, Sophie J; Stenglein, Mark D; Stone, David M; Takada, Ayato; Tesh, Robert B; Tomonaga, Keizo; Tordo, Noël; Towner, Jonathan S; Vasilakis, Nikos; Volchkov, Viktor E; Wahl-Jensen, Victoria; Walker, Peter J; Wang, Lin-Fa; Varsani, Arvind; Whitfield, Anna E; Zerbini, F Murilo; Kuhn, Jens H

2017-05-01

Botanical, mycological, zoological, and prokaryotic species names follow the Linnaean format, consisting of an italicized Latinized binomen with a capitalized genus name and a lower case species epithet (e.g., Homo sapiens). Virus species names, however, do not follow a uniform format, and, even when binomial, are not Linnaean in style. In this thought exercise, we attempted to convert all currently official names of species included in the virus family Arenaviridae and the virus order Mononegavirales to Linnaean binomials, and to identify and address associated challenges and concerns. Surprisingly, this endeavor was not as complicated or time-consuming as even the authors of this article expected when conceiving the experiment. [Arenaviridae; binomials; ICTV; International Committee on Taxonomy of Viruses; Mononegavirales; virus nomenclature; virus taxonomy.]. Published by Oxford University Press on behalf of Society of Systematic Biologists 2016. This work is written by a US Government employee and is in the public domain in the US.

The arcsine is asinine: the analysis of proportions in ecology.

PubMed

Warton, David I; Hui, Francis K C

2011-01-01

The arcsine square root transformation has long been standard procedure when analyzing proportional data in ecology, with applications in data sets containing binomial and non-binomial response variables. Here, we argue that the arcsine transform should not be used in either circumstance. For binomial data, logistic regression has greater interpretability and higher power than analyses of transformed data. However, it is important to check the data for additional unexplained variation, i.e., overdispersion, and to account for it via the inclusion of random effects in the model if found. For non-binomial data, the arcsine transform is undesirable on the grounds of interpretability, and because it can produce nonsensical predictions. The logit transformation is proposed as an alternative approach to address these issues. Examples are presented in both cases to illustrate these advantages, comparing various methods of analyzing proportions including untransformed, arcsine- and logit-transformed linear models and logistic regression (with or without random effects). Simulations demonstrate that logistic regression usually provides a gain in power over other methods.
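The two transformations compared above can be written down directly. This sketch uses hypothetical function names and shows one way a value predicted on the arcsine scale can become nonsensical: values above π/2 fold back when inverted, whereas any real number on the logit scale maps to a proportion strictly between 0 and 1.

```python
import math


def arcsine_sqrt(p: float) -> float:
    """Arcsine square root transform of a proportion in [0, 1]."""
    return math.asin(math.sqrt(p))


def inv_arcsine_sqrt(y: float) -> float:
    """Back-transform; only meaningful for y in [0, pi/2]."""
    return math.sin(y) ** 2


def logit(p: float) -> float:
    """Log-odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Back-transform from the logit scale, written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


if __name__ == "__main__":
    # A linear prediction of 1.8 on the arcsine scale exceeds pi/2 (about 1.571),
    # so its back-transform is lower than that of pi/2 itself.
    print(inv_arcsine_sqrt(1.8), inv_arcsine_sqrt(math.pi / 2))
    print(inv_logit(-5.0), inv_logit(5.0))
```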
On Models for Binomial Data with Random Numbers of Trials

PubMed Central

Comulada, W. Scott; Weiss, Robert E.

2010-01-01

A binomial outcome is a count s of the number of successes out of the total number of independent trials n = s + f, where f is a count of the failures. The n are random variables not fixed by design in many studies. Joint modeling of (s, f) can provide additional insight into the science and into the probability π of success that cannot be directly incorporated by the logistic regression model. Observations where n = 0 are excluded from the binomial analysis yet may be important to understanding how π is influenced by covariates. Correlation between s and f may exist and be of direct interest. We propose Bayesian multivariate Poisson models for the bivariate response (s, f), correlated through random effects. We extend our models to the analysis of longitudinal and multivariate longitudinal binomial outcomes. Our methodology was motivated by two disparate examples, one from teratology and one from an HIV tertiary intervention study. PMID:17688514
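One standard identity helps connect a Poisson model for (s, f) to the usual binomial analysis. It is stated here for independent counts with fixed rates λ_s and λ_f, symbols introduced only for this note; the paper's models additionally correlate s and f through random effects, so the identity should be read as holding conditionally at best.

```latex
\[
s \sim \mathrm{Poisson}(\lambda_s), \quad
f \sim \mathrm{Poisson}(\lambda_f) \ \text{independent}
\;\Longrightarrow\;
n = s + f \sim \mathrm{Poisson}(\lambda_s + \lambda_f), \qquad
s \mid n \sim \mathrm{Binomial}(n, \pi), \quad
\pi = \frac{\lambda_s}{\lambda_s + \lambda_f}.
\]
```

In this simple case the binomial analysis recovers π but discards the information carried by n, including the observations with n = 0.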
Spiritual and ceremonial plants in North America: an assessment of Moerman's ethnobotanical database comparing Residual, Binomial, Bayesian and Imprecise Dirichlet Model (IDM) analysis.

PubMed

Turi, Christina E; Murch, Susan J

2013-07-09

Ethnobotanical research and the study of plants used for rituals, ceremonies and to connect with the spirit world have led to the discovery of many novel psychoactive compounds such as nicotine, caffeine, and cocaine. In North America, spiritual and ceremonial uses of plants are well documented and can be accessed online via the University of Michigan's Native American Ethnobotany Database. The objective of the study was to compare Residual, Bayesian, Binomial and Imprecise Dirichlet Model (IDM) analyses of ritual, ceremonial and spiritual plants in Moerman's ethnobotanical database and to identify genera that may be good candidates for the discovery of novel psychoactive compounds. The database was queried with the following format "Family Name AND Ceremonial OR Spiritual" for 263 North American botanical families. Spiritual and ceremonial flora consisted of 86 families with 517 species belonging to 292 genera. Spiritual taxa were then grouped further into ceremonial medicines and items categories. Residual, Bayesian, Binomial and IDM analyses were performed to identify over- and under-utilized families. The 4 statistical approaches were in good agreement when identifying under-utilized families, but large families (>393 species) were underemphasized by Binomial, Bayesian and IDM approaches for over-utilization. Residual, Binomial, and IDM analysis identified similar families as over-utilized in the medium (92-392 species) and small (<92 species) classes. The families Apiaceae, Asteraceae, Ericaceae, Pinaceae and Salicaceae were identified as significantly over-utilized as ceremonial medicines in medium and large sized families. Analysis of genera within the Apiaceae and Asteraceae suggests that the genera Ligusticum and Artemisia are good candidates for facilitating the discovery of novel psychoactive compounds. The 4 statistical approaches were not consistent in the selection of over-utilization of flora. Residual analysis revealed overall trends that were supported by Binomial analysis when separated into small, medium and large families. The Bayesian, Binomial and IDM approaches identified different genera as potentially important. Species belonging to the genera Artemisia and Ligusticum were most consistently identified and may be valuable in future studies of the ethnopharmacology. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Single particle momentum and angular distributions in hadron-hadron collisions at ultrahigh energies

NASA Technical Reports Server (NTRS)

Chou, T. T.; Chen, N. Y.

1985-01-01

The forward-backward charged multiplicity distribution P(n_F, n_B) of events in the 540 GeV antiproton-proton collider has been extensively studied by the UA5 Collaboration. It was pointed out that the distribution with respect to n = n_F + n_B satisfies approximate KNO scaling and that with respect to Z = n_F - n_B is binomial. The geometrical model of hadron-hadron collision interprets the large multiplicity fluctuation as due to the widely different nature of collisions at different impact parameters b. For a single impact parameter b, the collision in the geometrical model should exhibit stochastic behavior. This separation of the stochastic and nonstochastic (KNO) aspects of multiparticle production processes gives conceptually a lucid and attractive picture of such collisions, leading to the concept of partition temperature T_p and the single particle momentum spectrum to be discussed in detail.
The Rainbow Spectrum of RNA Secondary Structures.

PubMed

Li, Thomas J X; Reidys, Christian M

2018-06-01

In this paper, we analyze the length spectrum of rainbows in RNA secondary structures. A rainbow in a secondary structure is a maximal arc with respect to the partial order induced by nesting. We show that there is a significant gap in this length spectrum. We shall prove that there asymptotically almost surely exists a unique longest rainbow of length at least [Formula: see text] and that with high probability any other rainbow has finite length. We show that the distribution of the length of the longest rainbow converges to a discrete limit law and that, for finite k, the distribution of rainbows of length k becomes for large n a negative binomial distribution. We then put the results of this paper into context, comparing the analytical results with those observed in RNA minimum free energy structures, biological RNA structures and relate our findings to the sparsification of folding algorithms.
Understanding poisson regression.

PubMed

Hayat, Matthew J; Higgins, Melinda

2014-04-01

Nurse investigators often collect study data in the form of counts. Traditional methods of data analysis have historically approached analysis of count data either as if the count data were continuous and normally distributed or with dichotomization of the counts into the categories of occurred or did not occur. These outdated methods for analyzing count data have been replaced with more appropriate statistical methods that make use of the Poisson probability distribution, which is useful for analyzing count data. The purpose of this article is to provide an overview of the Poisson distribution and its use in Poisson regression. Assumption violations for the standard Poisson regression model are addressed with alternative approaches, including addition of an overdispersion parameter or negative binomial regression. An illustrative example is presented with an application from the ENSPIRE study, and regression modeling of comorbidity data is included for illustrative purposes. Copyright 2014, SLACK Incorporated.
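A crude way to see why the alternatives mentioned above matter is to compare the variance of the counts with their mean, since a Poisson distribution has the two equal. The sketch below is a moment-based check on raw counts with hypothetical names; it is not the residual-based dispersion statistic that a fitted Poisson regression would report.

```python
from statistics import mean, variance


def dispersion_summary(counts):
    """Summarize overdispersion in a sample of non-negative counts.

    Returns the sample mean, sample variance, the variance-to-mean ratio
    (about 1 for Poisson data), and a method-of-moments estimate of the
    negative binomial size parameter k, where variance = mean + mean**2 / k.
    """
    m = mean(counts)
    if m == 0:
        raise ValueError("all counts are zero; dispersion is undefined")
    v = variance(counts)  # unbiased sample variance (divides by n - 1)
    # k is only finite when the data are overdispersed relative to Poisson.
    nb_size = m * m / (v - m) if v > m else float("inf")
    return {"mean": m, "variance": v, "index": v / m, "nb_size": nb_size}


if __name__ == "__main__":
    # Made-up counts, used only to show the output format.
    print(dispersion_summary([0, 0, 1, 1, 2, 8]))
```

A ratio well above 1 suggests that Poisson standard errors would be too small, which is the situation an overdispersion parameter or negative binomial regression is meant to address.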
How to retrieve additional information from the multiplicity distributions

NASA Astrophysics Data System (ADS)

Wilk, Grzegorz; Włodarczyk, Zbigniew

2017-01-01

Multiplicity distributions (MDs) P(N) measured in multiparticle production processes are most frequently described by the negative binomial distribution (NBD). However, with increasing collision energy some systematic discrepancies have become more and more apparent. They are usually attributed to the possible multi-source structure of the production process and described using a multi-NBD form of the MD. We investigate the possibility of keeping a single NBD but with its parameters depending on the multiplicity N. This is done by modifying the widely known clan model of particle production leading to the NBD form of P(N). This is then confronted with the approach based on the so-called cascade-stochastic formalism which is based on different types of recurrence relations defining P(N). We demonstrate that a combination of both approaches allows the retrieval of additional valuable information from the MDs, namely the oscillatory behavior of the counting statistics apparently visible in the high energy data.
Selecting Tools to Model Integer and Binomial Multiplication

ERIC Educational Resources Information Center

Pratt, Sarah Smitherman; Eddy, Colleen M.

2017-01-01

Mathematics teachers frequently provide concrete manipulatives to students during instruction; however, the rationale for using certain manipulatives in conjunction with concepts may not be explored. This article focuses on area models that are currently used in classrooms to provide concrete examples of integer and binomial multiplication. The…
On mobile wireless ad hoc IP video transports

NASA Astrophysics Data System (ADS)

Kazantzidis, Matheos

2006-05-01

Multimedia transports in wireless, ad-hoc, multi-hop or mobile networks must be capable of obtaining information about the network and adaptively tuning sending and encoding parameters to the network response. Obtaining meaningful metrics to guide a stable congestion control mechanism in the transport (i.e. passive, simple, end-to-end and network technology independent) is a complex problem. Equally difficult is obtaining a reliable QoS metric that agrees with user perception in a client/server or distributed environment. Existing metrics, objective or subjective, are commonly used before or after a transmission to test or report on it, and require access to both original and transmitted frames. In this paper, we propose that efficient and successful video delivery and the optimization of overall network QoS require innovation in a) direct measurement of available and bottleneck capacity for congestion control and b) a meaningful subjective QoS metric that is dynamically reported to the video sender. Once these are in place, a binomial (stable, fair and TCP-friendly) algorithm can be used to determine the sending rate and other packet video parameters. An adaptive MPEG codec can then continually test and fit its parameters and temporal-spatial data-error control balance using the perceived QoS dynamic feedback. We suggest a new measurement based on a packet dispersion technique that is independent of underlying network mechanisms. We then present a binomial control based on direct measurements. We implement a QoS metric that is known to agree with user perception (MPQM) in a client/server, distributed environment by using predetermined table lookups and characterization of video content.
Photon Counting Data Analysis: Application of the Maximum Likelihood and Related Methods for the Determination of Lifetimes in Mixtures of Rose Bengal and Rhodamine B

DOE PAGES

Santra, Kalyan; Smith, Emily A.; Petrich, Jacob W.; ...

2016-12-12

It is often convenient to know the minimum amount of data needed in order to obtain a result of desired accuracy and precision. It is a necessity in the case of subdiffraction-limited microscopies, such as stimulated emission depletion (STED) microscopy, owing to the limited sample volumes and the extreme sensitivity of the samples to photobleaching and photodamage. We present a detailed comparison of probability-based techniques (the maximum likelihood method and methods based on the binomial and the Poisson distributions) with residual minimization-based techniques for retrieving the fluorescence decay parameters for various two-fluorophore mixtures, as a function of the total number of photon counts, in time-correlated, single-photon counting experiments. The probability-based techniques proved to be the most robust (insensitive to initial values) in retrieving the target parameters and, in fact, performed equivalently to 2-3 significant figures. This is to be expected, as we demonstrate that the three methods are fundamentally related. Furthermore, methods based on the Poisson and binomial distributions have the desirable feature of providing a bin-by-bin analysis of a single fluorescence decay trace, which thus permits statistics to be acquired using only the one trace for not only the mean and median values of the fluorescence decay parameters but also for the associated standard deviations. Lastly, these probability-based methods lend themselves well to the analysis of the sparse data sets that are encountered in subdiffraction-limited microscopies.
Association between month of birth and melanoma risk: fact or fiction?

PubMed

Fiessler, Cornelia; Pfahlberg, Annette B; Keller, Andrea K; Radespiel-Tröger, Martin; Uter, Wolfgang; Gefeller, Olaf

2017-04-01

Evidence on the effect of ultraviolet radiation (UVR) exposure in infancy on melanoma risk in later life is scarce. Three recent studies suggest that people born in spring carry a higher melanoma risk. Our study aimed at verifying whether such a seasonal pattern of melanoma risk actually exists. Data from the population-based Cancer Registry Bavaria (CRB) on the birth months of 28 374 incident melanoma cases between 2002 and 2012 were analysed and compared with data from the Bavarian State Office for Statistics and Data Processing on the birth month distribution in the Bavarian population. Crude and adjusted analyses using negative binomial regression models were performed in the total study group and supplemented by several subgroup analyses. In the crude analysis, the birth months March-May were over-represented among melanoma cases. Negative binomial regression models adjusted only for sex and birth year revealed a seasonal association between melanoma risk and birth month with 13-21% higher relative incidence rates for March, April and May compared with the reference December. However, after additionally adjusting for the birth month distribution of the Bavarian population, these risk estimates decreased markedly and no association with the birth month was observed any more. Similar results emerged in all subgroup analyses. Our large registry-based study provides no evidence that people born in spring carry a higher risk for developing melanoma in later life and thus lends no support to the hypothesis of higher UVR susceptibility during the first months of life. © The Author 2016; all rights reserved. Published by Oxford University Press on behalf of the International Epidemiological Association
Generalized seasonal autoregressive integrated moving average models for count data with application to malaria time series with low case numbers.

PubMed

Briët, Olivier J T; Amerasinghe, Priyanie H; Vounatsou, Penelope

2013-01-01

With the renewed drive towards malaria elimination, there is a need for improved surveillance tools. While time series analysis is an important tool for surveillance, prediction and for measuring interventions' impact, approximations by commonly used Gaussian methods are prone to inaccuracies when case counts are low. Therefore, statistical methods appropriate for count data are required, especially during "consolidation" and "pre-elimination" phases. Generalized autoregressive moving average (GARMA) models were extended to generalized seasonal autoregressive integrated moving average (GSARIMA) models for parsimonious observation-driven modelling of non Gaussian, non stationary and/or seasonal time series of count data. The models were applied to monthly malaria case time series in a district in Sri Lanka, where malaria has decreased dramatically in recent years. The malaria series showed long-term changes in the mean, unstable variance and seasonality. After fitting negative-binomial Bayesian models, both a GSARIMA and a GARIMA deterministic seasonality model were selected based on different criteria. Posterior predictive distributions indicated that negative-binomial models provided better predictions than Gaussian models, especially when counts were low. The G(S)ARIMA models were able to capture the autocorrelation in the series. G(S)ARIMA models may be particularly useful in the drive towards malaria elimination, since episode count series are often seasonal and non-stationary, especially when control is increased. Although building and fitting GSARIMA models is laborious, they may provide more realistic prediction distributions than do Gaussian methods and may be more suitable when counts are low.
Maximizing Statistical Power When Verifying Probabilistic Forecasts of Hydrometeorological Events

NASA Astrophysics Data System (ADS)

DeChant, C. M.; Moradkhani, H.

2014-12-01

Hydrometeorological events (e.g. floods, droughts, precipitation) are increasingly being forecast probabilistically, owing to the uncertainties in the underlying causes of the phenomena. In these forecasts, the probability of the event, over some lead time, is estimated based on some model simulations or predictive indicators. By issuing probabilistic forecasts, agencies may communicate the uncertainty in the event occurring. Assuming that the assigned probability of the event is correct, which is referred to as a reliable forecast, the end user may perform some risk management based on the potential damages resulting from the event. Alternatively, an unreliable forecast may give false impressions of the actual risk, leading to improper decision making when protecting resources from extreme events. Due to this requisite for reliable forecasts to perform effective risk management, this study takes a renewed look at reliability assessment in event forecasts. Illustrative experiments will be presented, showing deficiencies in the commonly available approaches (Brier Score, Reliability Diagram). Overall, it is shown that the conventional reliability assessment techniques do not maximize the ability to distinguish between a reliable and unreliable forecast. In this regard, a theoretical formulation of the probabilistic event forecast verification framework will be presented. From this analysis, hypothesis testing with the Poisson-Binomial distribution is the most exact model available for the verification framework, and therefore maximizes one's ability to distinguish between a reliable and unreliable forecast. Application of this verification system was also examined within a real forecasting case study, highlighting the additional statistical power provided with the use of the Poisson-Binomial distribution.
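The Poisson-Binomial distribution named in this abstract describes the number of events that occur when each event has its own forecast probability. The following is a minimal sketch of how such a verification test could be set up, not the authors' implementation: the function names and the forecast probabilities are made up for illustration, and the forecasts are assumed to be independent.

```python
def poisson_binomial_pmf(probs):
    """Return [P(K=0), ..., P(K=n)], where K counts how many of n independent
    events occur and event i occurs with probability probs[i]."""
    pmf = [1.0]  # with zero events considered, K = 0 with certainty
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # this event does not occur
            nxt[k + 1] += mass * p       # this event occurs
        pmf = nxt
    return pmf


def tail_probabilities(probs, observed):
    """Return (P(K <= observed), P(K >= observed)) under the hypothesis
    that the forecast probabilities are reliable."""
    pmf = poisson_binomial_pmf(probs)
    lower = sum(pmf[: observed + 1])
    upper = sum(pmf[observed:])
    return lower, upper


if __name__ == "__main__":
    # Hypothetical forecast probabilities and observed event count.
    forecast_probs = [0.1, 0.3, 0.5, 0.7, 0.9, 0.2, 0.4, 0.6]
    observed_events = 7
    lower, upper = tail_probabilities(forecast_probs, observed_events)
    print(f"P(K <= {observed_events}) = {lower:.4f}")
    print(f"P(K >= {observed_events}) = {upper:.4f}")
```

A small tail probability would suggest that the observed number of events is unlikely if the issued probabilities were reliable.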
Generalized Seasonal Autoregressive Integrated Moving Average Models for Count Data with Application to Malaria Time Series with Low Case Numbers

PubMed Central

Briët, Olivier J. T.; Amerasinghe, Priyanie H.; Vounatsou, Penelope

2013-01-01

Introduction With the renewed drive towards malaria elimination, there is a need for improved surveillance tools. While time series analysis is an important tool for surveillance, prediction and for measuring interventions’ impact, approximations by commonly used Gaussian methods are prone to inaccuracies when case counts are low. Therefore, statistical methods appropriate for count data are required, especially during “consolidation” and “pre-elimination” phases. Methods Generalized autoregressive moving average (GARMA) models were extended to generalized seasonal autoregressive integrated moving average (GSARIMA) models for parsimonious observation-driven modelling of non Gaussian, non stationary and/or seasonal time series of count data. The models were applied to monthly malaria case time series in a district in Sri Lanka, where malaria has decreased dramatically in recent years. Results The malaria series showed long-term changes in the mean, unstable variance and seasonality. After fitting negative-binomial Bayesian models, both a GSARIMA and a GARIMA deterministic seasonality model were selected based on different criteria. Posterior predictive distributions indicated that negative-binomial models provided better predictions than Gaussian models, especially when counts were low. The G(S)ARIMA models were able to capture the autocorrelation in the series. Conclusions G(S)ARIMA models may be particularly useful in the drive towards malaria elimination, since episode count series are often seasonal and non-stationary, especially when control is increased. Although building and fitting GSARIMA models is laborious, they may provide more realistic prediction distributions than do Gaussian methods and may be more suitable when counts are low. PMID:23785448
[Application of negative binomial regression and modified Poisson regression in the research of risk factors for injury frequency].

PubMed

Cao, Qingqing; Wu, Zhenqiang; Sun, Ying; Wang, Tiezhu; Han, Tengwei; Gu, Chaomei; Sun, Yehuan

2011-11-01

To explore the application of negative binomial regression and modified Poisson regression in analyzing the factors that influence injury frequency and the risk factors leading to increased injury frequency. A total of 2917 primary and secondary school students were selected from Hefei by cluster random sampling and surveyed by questionnaire. The count data on injury events were fitted with modified Poisson regression and negative binomial regression models. Risk factors for increased unintentional injury frequency among juvenile students were explored, so as to assess the efficiency of these two models in studying the factors that influence injury frequency. The Poisson model showed over-dispersion (P < 0.0001) according to the Lagrange multiplier test. The over-dispersed data were therefore fitted better by the modified Poisson regression and negative binomial regression models. Both models showed that male gender, younger age, a father working outside the hometown, a guardian educated above junior high school level, and smoking might be associated with higher injury frequencies. For clustered frequency data on injury events, both modified Poisson regression and negative binomial regression can be used. However, based on our data, the modified Poisson regression fitted better and could give a more accurate interpretation of the relevant factors affecting injury frequency.
Patterns of medicinal plant use: an examination of the Ecuadorian Shuar medicinal flora using contingency table and binomial analyses.

PubMed

Bennett, Bradley C; Husby, Chad E

2008-03-28

Botanical pharmacopoeias are non-random subsets of floras, with some taxonomic groups over- or under-represented. Moerman [Moerman, D.E., 1979. Symbols and selectivity: a statistical analysis of Native American medical ethnobotany, Journal of Ethnopharmacology 1, 111-119] introduced linear regression/residual analysis to examine these patterns. However, regression, the commonly employed analysis, suffers from several statistical flaws. We use contingency table and binomial analyses to examine patterns of Shuar medicinal plant use (from Amazonian Ecuador). We first analyzed the Shuar data using Moerman's approach, modified to better meet requirements of linear regression analysis. Second, we assessed the exact randomization contingency table test for goodness of fit. Third, we developed a binomial model to test for non-random selection of plants in individual families. Modified regression models (which accommodated assumptions of linear regression) reduced R² from 0.59 to 0.38, but did not eliminate all problems associated with regression analyses. Contingency table analyses revealed that the entire flora departs from the null model of equal proportions of medicinal plants in all families. In the binomial analysis, only 10 angiosperm families (of 115) differed significantly from the null model. These 10 families are largely responsible for patterns seen at higher taxonomic levels. Contingency table and binomial analyses offer an easy and statistically valid alternative to the regression approach.
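The binomial analysis described here asks whether a family contains more or fewer medicinal species than expected if every family had the same proportion. The abstract does not give the exact test, so the sketch below is only one plausible reading: it computes exact binomial tail probabilities, and the species counts and flora-wide proportion are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1.0 - p) ** (n - k)


def binomial_tails(k, n, p):
    """Return (P(X <= k), P(X >= k)) for X ~ Binomial(n, p)."""
    lower = sum(binomial_pmf(i, n, p) for i in range(0, k + 1))
    upper = sum(binomial_pmf(i, n, p) for i in range(k, n + 1))
    return lower, upper


if __name__ == "__main__":
    # Hypothetical family: 40 species, 18 of them medicinal, compared with
    # an assumed flora-wide medicinal proportion of 0.25 (the null model).
    lower, upper = binomial_tails(18, 40, 0.25)
    print(f"P(X <= 18) = {lower:.4f}  (under-representation)")
    print(f"P(X >= 18) = {upper:.6f}  (over-representation)")
```

When many families are tested at once, as with the 115 angiosperm families mentioned above, a multiple-comparison correction would normally be considered.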
General solution of the chemical master equation and modality of marginal distributions for hierarchic first-order reaction networks.

PubMed

Reis, Matthias; Kromer, Justus A; Klipp, Edda

2018-01-20

Multimodality is a phenomenon which complicates the analysis of statistical data based exclusively on mean and variance. Here, we present criteria for multimodality in hierarchic first-order reaction networks, consisting of catalytic and splitting reactions. Those networks are characterized by independent and dependent subnetworks. First, we prove the general solvability of the Chemical Master Equation (CME) for this type of reaction network and thereby extend the class of solvable CME's. Our general solution is analytical in the sense that it allows for a detailed analysis of its statistical properties. Given Poisson/deterministic initial conditions, we then prove the independent species to be Poisson/binomially distributed, while the dependent species exhibit generalized Poisson/Khatri Type B distributions. Generalized Poisson/Khatri Type B distributions are multimodal for an appropriate choice of parameters. We illustrate our criteria for multimodality by several basic models, as well as the well-known two-stage transcription-translation network and Bateman's model from nuclear physics. For both examples, multimodality was previously not reported.
Multilevel Models for Binary Data

ERIC Educational Resources Information Center

Powers, Daniel A.

2012-01-01

The methods and models for categorical data analysis cover considerable ground, ranging from regression-type models for binary and binomial data, count data, to ordered and unordered polytomous variables, as well as regression models that mix qualitative and continuous data. This article focuses on methods for binary or binomial data, which are…
Speed congenics: accelerated genome recovery using genetic markers.

PubMed

Visscher, P M

1999-08-01

Genetic markers throughout the genome can be used to speed up 'recovery' of the recipient genome in the backcrossing phase of the construction of a congenic strain. The prediction of the genomic proportion during backcrossing depends on the assumptions regarding the distribution of chromosome segments, the population structure, the marker spacing and the selection strategy. In this study simulation was used to investigate the rate of recovery of the recipient genome for a mouse, Drosophila and Arabidopsis genome. It was shown that an incorrect assumption of a binomial distribution of chromosome segments, and failing to take account of a reduction in variance in genomic proportion due to selection, can lead to a downward bias of up to two generations in the estimation of the number of generations required for the formation of a congenic strain.
Epidemiological study of the intestinal helminths of wild boar (Sus scrofa) and mouflon (Ovis gmelini musimon) in central Italy.

PubMed

Magi, M; Bertani, M; Dell'Omodarme, M; Prati, M C

2002-12-01

Since 1995 the population of wild ungulates increased significantly in the "Parco provinciale dei Monti Livornesi" (Livorno, Tuscany, Central Italy). We studied the intestinal macroparasites of two hosts, the wild boar (Sus scrofa) and the mouflon (Ovis gmelini musimon). In the case of wild boars we found a dominant parasite species, Globocephalus urosubulatus. For this parasite the frequency distribution of the number of parasites per host agrees with a negative binomial distribution. There is not a significant correlation between the age of the animals and the parasitosis. Furthermore the mean parasite burden of male and female wild boars does not differ significantly. In the case of mouflons we found a dominant parasite species Nematodirus filicollis with Trichuris ovis as codominant species.

Bursts of Self-Conscious Emotions in the Daily Lives of Emerging Adults.

PubMed

Conroy, David E; Ram, Nilam; Pincus, Aaron L; Rebar, Amanda L

Self-conscious emotions play a role in regulating daily achievement strivings, social behavior, and health, but little is known about the processes underlying their daily manifestation. Emerging adults (n = 182) completed daily diaries for eight days and multilevel models were estimated to evaluate whether, how much, and why their emotions varied from day-to-day. Within-person variation in authentic pride was normally-distributed across people and days whereas the other emotions were burst-like and characterized by zero-inflated, negative binomial distributions. Perceiving social interactions as generally communal increased the odds of hubristic pride activation and reduced the odds of guilt activation; daily communal behavior reduced guilt intensity. Results illuminated processes through which meaning about the self-in-relation-to-others is constructed during a critical period of development.
Flood Frequency Analysis With Historical and Paleoflood Information

NASA Astrophysics Data System (ADS)

Stedinger, Jery R.; Cohn, Timothy A.

1986-05-01

An investigation is made of flood quantile estimators which can employ "historical" and paleoflood information in flood frequency analyses. Two categories of historical information are considered: "censored" data, where the magnitudes of historical flood peaks are known; and "binomial" data, where only threshold exceedance information is available. A Monte Carlo study employing the two-parameter lognormal distribution shows that maximum likelihood estimators (MLEs) can extract the equivalent of an additional 10-30 years of gage record from a 50-year period of historical observation. The MLE routines are shown to be substantially better than an adjusted-moment estimator similar to the one recommended in Bulletin 17B of the United States Water Resources Council Hydrology Committee (1982). The MLE methods performed well even when floods were drawn from other than the assumed lognormal distribution.
Possibility and Challenges of Conversion of Current Virus Species Names to Linnaean Binomials

PubMed Central

Postler, Thomas S.; Clawson, Anna N.; Amarasinghe, Gaya K.; Basler, Christopher F.; Bavari, Sbina; Benkő, Mária; Blasdell, Kim R.; Briese, Thomas; Buchmeier, Michael J.; Bukreyev, Alexander; Calisher, Charles H.; Chandran, Kartik; Charrel, Rémi; Clegg, Christopher S.; Collins, Peter L.; Juan Carlos, De La Torre; Derisi, Joseph L.; Dietzgen, Ralf G.; Dolnik, Olga; Dürrwald, Ralf; Dye, John M.; Easton, Andrew J.; Emonet, Sébastian; Formenty, Pierre; Fouchier, Ron A. M.; Ghedin, Elodie; Gonzalez, Jean-Paul; Harrach, Balázs; Hewson, Roger; Horie, Masayuki; Jiāng, Dàohóng; Kobinger, Gary; Kondo, Hideki; Kropinski, Andrew M.; Krupovic, Mart; Kurath, Gael; Lamb, Robert A.; Leroy, Eric M.; Lukashevich, Igor S.; Maisner, Andrea; Mushegian, Arcady R.; Netesov, Sergey V.; Nowotny, Norbert; Patterson, Jean L.; Payne, Susan L.; PaWeska, Janusz T.; Peters, Clarence J.; Radoshitzky, Sheli R.; Rima, Bertus K.; Romanowski, Victor; Rubbenstroth, Dennis; Sabanadzovic, Sead; Sanfaçon, Hélène; Salvato, Maria S.; Schwemmle, Martin; Smither, Sophie J.; Stenglein, Mark D.; Stone, David M.; Takada, Ayato; Tesh, Robert B.; Tomonaga, Keizo; Tordo, Noël; Towner, Jonathan S.; Vasilakis, Nikos; Volchkov, Viktor E.; Wahl-Jensen, Victoria; Walker, Peter J.; Wang, Lin-Fa; Varsani, Arvind; Whitfield, Anna E.; Zerbini, F. Murilo; Kuhn, Jens H.

2017-01-01

Abstract Botanical, mycological, zoological, and prokaryotic species names follow the Linnaean format, consisting of an italicized Latinized binomen with a capitalized genus name and a lower case species epithet (e.g., Homo sapiens). Virus species names, however, do not follow a uniform format, and, even when binomial, are not Linnaean in style. In this thought exercise, we attempted to convert all currently official names of species included in the virus family Arenaviridae and the virus order Mononegavirales to Linnaean binomials, and to identify and address associated challenges and concerns. Surprisingly, this endeavor was not as complicated or time-consuming as even the authors of this article expected when conceiving the experiment. PMID:27798405
Binomial Coefficients Modulo a Prime--A Visualization Approach to Undergraduate Research

ERIC Educational Resources Information Center

Bardzell, Michael; Poimenidou, Eirini

2011-01-01

In this article we present, as a case study, results of undergraduate research involving binomial coefficients modulo a prime "p." We will discuss how undergraduates were involved in the project, even with a minimal mathematical background beforehand. There are two main avenues of exploration described to discover these binomial…
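A common way to visualize binomial coefficients modulo a prime is to draw Pascal's triangle with each entry reduced mod p. The sketch below is an illustration of that general idea and not the authors' software; the function names and the text rendering are assumptions.

```python
def pascal_rows_mod_p(n_rows, p):
    """Return the first n_rows rows of Pascal's triangle reduced modulo p,
    built with the addition rule C(n, k) = C(n-1, k-1) + C(n-1, k)."""
    rows = []
    row = [1]
    for _ in range(n_rows):
        rows.append(row)
        row = [1] + [(a + b) % p for a, b in zip(row, row[1:])] + [1]
    return rows


def render(rows):
    """Draw the triangle as text: '#' where the residue is nonzero,
    '.' where the coefficient is divisible by p."""
    width = 2 * len(rows) - 1
    lines = []
    for r in rows:
        cells = " ".join("#" if v else "." for v in r)
        lines.append(cells.center(width))
    return "\n".join(lines)


if __name__ == "__main__":
    # p = 2 gives a Sierpinski-like pattern.
    print(render(pascal_rows_mod_p(16, 2)))
```

Changing p in the last line (for example to 3 or 5) shows how the self-similar pattern depends on the prime.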
Integer Solutions of Binomial Coefficients

ERIC Educational Resources Information Center

Gilbertson, Nicholas J.

2016-01-01

A good formula is like a good story, rich in description, powerful in communication, and eye-opening to readers. The formula presented in this article for determining the coefficients of the binomial expansion of (x + y)^n is one such "good read." The beauty of this formula is in its simplicity--both describing a quantitative situation…
Confidence Intervals for Weighted Composite Scores under the Compound Binomial Error Model

ERIC Educational Resources Information Center

Kim, Kyung Yong; Lee, Won-Chan

2018-01-01

Reporting confidence intervals with test scores helps test users make important decisions about examinees by providing information about the precision of test scores. Although a variety of estimation procedures based on the binomial error model are available for computing intervals for test scores, these procedures assume that items are randomly…
Macro-level pedestrian and bicycle crash analysis: Incorporating spatial spillover effects in dual state count models.

PubMed

Cai, Qing; Lee, Jaeyoung; Eluru, Naveen; Abdel-Aty, Mohamed

2016-08-01

This study attempts to explore the viability of dual-state models (i.e., zero-inflated and hurdle models) for traffic analysis zones (TAZs) based pedestrian and bicycle crash frequency analysis. Additionally, spatial spillover effects are explored in the models by employing exogenous variables from neighboring zones. The dual-state models such as zero-inflated negative binomial and hurdle negative binomial models (with and without spatial effects) are compared with the conventional single-state model (i.e., negative binomial). The model comparison for pedestrian and bicycle crashes revealed that the models that considered observed spatial effects perform better than the models that did not consider the observed spatial effects. Across the models with spatial spillover effects, the dual-state models especially zero-inflated negative binomial model offered better performance compared to single-state models. Moreover, the model results clearly highlighted the importance of various traffic, roadway, and sociodemographic characteristics of the TAZ as well as neighboring TAZs on pedestrian and bicycle crash frequency. Copyright © 2016 Elsevier Ltd. All rights reserved.
A comparison of different statistical methods analyzing hypoglycemia data using bootstrap simulations.

PubMed

Jiang, Honghua; Ni, Xiao; Huster, William; Heilmann, Cory

2015-01-01

Hypoglycemia has long been recognized as a major barrier to achieving normoglycemia with intensive diabetic therapies. It is a common safety concern for the diabetes patients. Therefore, it is important to apply appropriate statistical methods when analyzing hypoglycemia data. Here, we carried out bootstrap simulations to investigate the performance of the four commonly used statistical models (Poisson, negative binomial, analysis of covariance [ANCOVA], and rank ANCOVA) based on the data from a diabetes clinical trial. Zero-inflated Poisson (ZIP) model and zero-inflated negative binomial (ZINB) model were also evaluated. Simulation results showed that Poisson model inflated type I error, while negative binomial model was overly conservative. However, after adjusting for dispersion, both Poisson and negative binomial models yielded slightly inflated type I errors, which were close to the nominal level and reasonable power. Reasonable control of type I error was associated with ANCOVA model. Rank ANCOVA model was associated with the greatest power and with reasonable control of type I error. Inflated type I error was observed with ZIP and ZINB models.
Discrimination of numerical proportions: A comparison of binomial and Gaussian models.

PubMed

Raidvee, Aire; Lember, Jüri; Allik, Jüri

2017-01-01

Observers discriminated the numerical proportion of two sets of elements (N = 9, 13, 33, and 65) that differed either by color or orientation. According to the standard Thurstonian approach, the accuracy of proportion discrimination is determined by irreducible noise in the nervous system that stochastically transforms the number of presented visual elements onto a continuum of psychological states representing numerosity. As an alternative to this customary approach, we propose a Thurstonian-binomial model, which assumes discrete perceptual states, each of which is associated with a certain visual element. It is shown that the probability β with which each visual element can be noticed and registered by the perceptual system can explain data of numerical proportion discrimination at least as well as the continuous Thurstonian-Gaussian model, and better, if the greater parsimony of the Thurstonian-binomial model is taken into account using AIC model selection. We conclude that Gaussian and binomial models represent two different fundamental principles-internal noise vs. using only a fraction of available information-which are both plausible descriptions of visual perception.
Random trinomial tree models and vanilla options

NASA Astrophysics Data System (ADS)

Ganikhodjaev, Nasir; Bayram, Kamola

2013-09-01

In this paper we introduce and study the random trinomial model. The usual trinomial model is prescribed by a triple of numbers (u, d, m). We call the triple (u, d, m) an environment of the trinomial model. A triple (U_n, D_n, M_n), where {U_n}, {D_n} and {M_n} are sequences of independent, identically distributed random variables with 0 < D_n < 1 < U_n and M_n = 1 for all n, is called a random environment, and a trinomial tree model with a random environment is called a random trinomial model. The random trinomial model is considered to produce more accurate results than the random binomial model or the usual trinomial model.
Event-by-event gluon multiplicity, energy density, and eccentricities in ultrarelativistic heavy-ion collisions

NASA Astrophysics Data System (ADS)

Schenke, Björn; Tribedy, Prithwish; Venugopalan, Raju

2012-09-01

The event-by-event multiplicity distribution, the energy densities and energy density weighted eccentricity moments ε_n (up to n = 6) at early times in heavy-ion collisions at both the BNL Relativistic Heavy Ion Collider (RHIC) (√s = 200 GeV) and the CERN Large Hadron Collider (LHC) (√s = 2.76 TeV) are computed in the IP-Glasma model. This framework combines the impact parameter dependent saturation model (IP-Sat) for nucleon parton distributions (constrained by HERA deeply inelastic scattering data) with an event-by-event classical Yang-Mills description of early-time gluon fields in heavy-ion collisions. The model produces multiplicity distributions that are convolutions of negative binomial distributions without further assumptions or parameters. In the limit of large dense systems, the n-particle gluon distribution predicted by the Glasma-flux tube model is demonstrated to be nonperturbatively robust. In the general case, the effect of additional geometrical fluctuations is quantified. The eccentricity moments are compared to the MC-KLN model; a noteworthy feature is that fluctuation dominated odd moments are consistently larger than in the MC-KLN model.
Intermittency via moments and distributions in central O+Cu collisions at 14.6 A·GeV/c

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tannenbaum, M.J.

Fluctuations in pseudorapidity distributions of charged particles from central (ZCAL) collisions of ¹⁶O+Cu at 14.6 A·GeV/c have been analyzed by Ju Kang using the method of scaled factorial moments as a function of the interval δη. An apparent power-law growth of moments with decreasing interval is observed down to δη ≈ 0.1, and the measured slope parameters are found to obey two scaling rules. Previous experience with E_T distributions suggested that fluctuations of multiplicity and transverse energy can be well described by Gamma or Negative Binomial Distributions (NBD), and excellent fits to NBD were obtained in all δη bins. The k parameter of the NBD fit was found to increase linearly with the δη interval, which, due to the well-known property of the NBD under convolution, indicates that the multiplicity distributions in adjacent bins of pseudorapidity δη ≈ 0.1 are largely statistically independent.
Intermittency via moments and distributions in central O+Cu collisions at 14.6 A·GeV/c

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tannenbaum, M.J.; The E802 Collaboration

Fluctuations in pseudorapidity distributions of charged particles from central (ZCAL) collisions of ¹⁶O+Cu at 14.6 A·GeV/c have been analyzed by Ju Kang using the method of scaled factorial moments as a function of the interval δη. An apparent power-law growth of moments with decreasing interval is observed down to δη ≈ 0.1, and the measured slope parameters are found to obey two scaling rules. Previous experience with E_T distributions suggested that fluctuations of multiplicity and transverse energy can be well described by Gamma or Negative Binomial Distributions (NBD), and excellent fits to NBD were obtained in all δη bins. The k parameter of the NBD fit was found to increase linearly with the δη interval, which, due to the well-known property of the NBD under convolution, indicates that the multiplicity distributions in adjacent bins of pseudorapidity δη ≈ 0.1 are largely statistically independent.
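The convolution property invoked in this abstract is that the sum of two independent NBD variables with the same ratio of mean to k is again NBD, with the k parameters (and the means) added. That is why a k that grows linearly with the interval is consistent with statistically independent adjacent bins. The sketch below checks the property numerically; the parameter values are illustrative and are not taken from the E802 data.

```python
import math


def nbd_pmf(n, k, mu):
    """Negative binomial probability of n counts, parameterized by the
    shape k (may be non-integer) and the mean mu."""
    r = mu / k
    log_p = (
        math.lgamma(n + k) - math.lgamma(k) - math.lgamma(n + 1)
        + n * math.log(r)
        - (n + k) * math.log1p(r)
    )
    return math.exp(log_p)


def convolve(p, q):
    """Distribution of the sum of two independent counts with PMFs p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


if __name__ == "__main__":
    # Two adjacent bins with the same mu/k ratio (illustrative values).
    k1, mu1 = 1.5, 3.0
    k2, mu2 = 2.5, 5.0
    n_max = 40
    bin1 = [nbd_pmf(n, k1, mu1) for n in range(n_max)]
    bin2 = [nbd_pmf(n, k2, mu2) for n in range(n_max)]
    combined = convolve(bin1, bin2)
    # Entries below n_max are exact because they only need terms 0..n.
    worst = max(
        abs(combined[n] - nbd_pmf(n, k1 + k2, mu1 + mu2)) for n in range(n_max)
    )
    print(f"largest deviation from NBD(k1 + k2): {worst:.2e}")
```

If the bins were correlated, the fitted k of the combined interval would generally differ from the sum of the individual k values.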
Spatial distribution of citizen science casuistic observations for different taxonomic groups.

PubMed

Tiago, Patrícia; Ceia-Hasse, Ana; Marques, Tiago A; Capinha, César; Pereira, Henrique M

2017-10-16

Opportunistic citizen science databases are becoming an important way of gathering information on species distributions. These data are temporally and spatially dispersed and could have limitations regarding biases in the distribution of the observations in space and/or time. In this work, we test the influence of landscape variables in the distribution of citizen science observations for eight taxonomic groups. We use data collected through a Portuguese citizen science database (biodiversity4all.org). We use a zero-inflated negative binomial regression to model the distribution of observations as a function of a set of variables representing the landscape features plausibly influencing the spatial distribution of the records. Results suggest that the density of paths is the most important variable, having a statistically significant positive relationship with number of observations for seven of the eight taxa considered. Wetland coverage was also identified as having a significant, positive relationship, for birds, amphibians and reptiles, and mammals. Our results highlight that the distribution of species observations, in citizen science projects, is spatially biased. Higher frequency of observations is driven largely by accessibility and by the presence of water bodies. We conclude that efforts are required to increase the spatial evenness of sampling effort from volunteers.
Kinetics of low-temperature transitions and a reaction rate theory from non-equilibrium distributions

PubMed Central

Aquilanti, Vincenzo; Coutinho, Nayara Dantas; Carvalho-Silva, Valter Henrique

2017-01-01

This article surveys the empirical information which originated both by laboratory experiments and by computational simulations, and expands previous understanding of the rates of chemical processes in the low-temperature range, where deviations from linearity of Arrhenius plots were revealed. The phenomenological two-parameter Arrhenius equation requires improvement for applications where interpolation or extrapolations are demanded in various areas of modern science. Based on Tolman's theorem, the dependence of the reciprocal of the apparent activation energy as a function of reciprocal absolute temperature permits the introduction of a deviation parameter d covering uniformly a variety of rate processes, from those where quantum mechanical tunnelling is significant and d < 0, to those where d > 0, corresponding to the Pareto–Tsallis statistical weights: these generalize the Boltzmann–Gibbs weight, which is recovered for d = 0. It is shown here how the weights arise, relaxing the thermodynamic equilibrium limit, either for a binomial distribution if d > 0 or for a negative binomial distribution if d < 0, formally corresponding to Fermion-like or Boson-like statistics, respectively. The current status of the phenomenology is illustrated emphasizing case studies; specifically (i) the super-Arrhenius kinetics, where transport phenomena accelerate processes as the temperature increases; (ii) the sub-Arrhenius kinetics, where quantum mechanical tunnelling propitiates low-temperature reactivity; (iii) the anti-Arrhenius kinetics, where processes with no energetic obstacles are rate-limited by molecular reorientation requirements. Particular attention is given for case (i) to the treatment of diffusion and viscosity, for case (ii) to formulation of a transition rate theory for chemical kinetics including quantum mechanical tunnelling, and for case (iii) to the stereodirectional specificity of the dynamics of reactions strongly hindered by the increase of temperature. 
This article is part of the themed issue ‘Theoretical and computational studies of non-equilibrium and non-statistical dynamics in the gas phase, in the condensed phase and at interfaces’. PMID:28320904
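The role of the deviation parameter d can be made concrete with the deformed Arrhenius form usually associated with this approach. The expression below is a sketch under stated assumptions: the symbols A (pre-exponential factor), ε (energy parameter) and E_a (apparent activation energy, defined as −d ln k / d(1/RT)) are notation introduced here and do not appear in the abstract.

```latex
k(T) = A\left(1 - d\,\frac{\varepsilon}{RT}\right)^{1/d},
\qquad
\lim_{d\to 0} k(T) = A\,e^{-\varepsilon/(RT)},
\qquad
\frac{1}{E_a} = \frac{1}{\varepsilon} - \frac{d}{RT}.
```

In this form the reciprocal of the apparent activation energy is linear in 1/T with slope −d/R, and the ordinary Arrhenius (Boltzmann-Gibbs) law is recovered as d tends to 0.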
Kinetics of low-temperature transitions and a reaction rate theory from non-equilibrium distributions.

PubMed

Aquilanti, Vincenzo; Coutinho, Nayara Dantas; Carvalho-Silva, Valter Henrique

2017-04-28

This article surveys the empirical information which originated both by laboratory experiments and by computational simulations, and expands previous understanding of the rates of chemical processes in the low-temperature range, where deviations from linearity of Arrhenius plots were revealed. The phenomenological two-parameter Arrhenius equation requires improvement for applications where interpolation or extrapolations are demanded in various areas of modern science. Based on Tolman's theorem, the dependence of the reciprocal of the apparent activation energy as a function of reciprocal absolute temperature permits the introduction of a deviation parameter d covering uniformly a variety of rate processes, from those where quantum mechanical tunnelling is significant and d < 0, to those where d > 0, corresponding to the Pareto-Tsallis statistical weights: these generalize the Boltzmann-Gibbs weight, which is recovered for d = 0. It is shown here how the weights arise, relaxing the thermodynamic equilibrium limit, either for a binomial distribution if d > 0 or for a negative binomial distribution if d < 0, formally corresponding to Fermion-like or Boson-like statistics, respectively. The current status of the phenomenology is illustrated emphasizing case studies; specifically (i) the super-Arrhenius kinetics, where transport phenomena accelerate processes as the temperature increases; (ii) the sub-Arrhenius kinetics, where quantum mechanical tunnelling propitiates low-temperature reactivity; (iii) the anti-Arrhenius kinetics, where processes with no energetic obstacles are rate-limited by molecular reorientation requirements.
Particular attention is given for case (i) to the treatment of diffusion and viscosity, for case (ii) to formulation of a transition rate theory for chemical kinetics including quantum mechanical tunnelling, and for case (iii) to the stereodirectional specificity of the dynamics of reactions strongly hindered by the increase of temperature. This article is part of the themed issue 'Theoretical and computational studies of non-equilibrium and non-statistical dynamics in the gas phase, in the condensed phase and at interfaces'. © 2017 The Author(s).
Zero adjusted models with applications to analysing helminths count data.

PubMed

Chipeta, Michael G; Ngwira, Bagrey M; Simoonga, Christopher; Kazembe, Lawrence N

2014-11-27

It is common in public health and epidemiology that the outcome of interest is a count of event occurrences. Analysing these data using classical linear models is mostly inappropriate, even after transformation of the outcome variables, because of overdispersion. Zero-adjusted mixture count models such as zero-inflated and hurdle count models are applied to count data when over-dispersion and excess zeros exist. The main objective of the current paper is to apply such models to analyse risk factors associated with human helminths (S. haematobium), particularly in a case where there is a high proportion of zero counts. The data were collected during a community-based randomised control trial assessing the impact of mass drug administration (MDA) with praziquantel in Malawi, and a school-based cross-sectional epidemiology survey in Zambia. Count data models including traditional (Poisson and negative binomial) models, zero modified models (zero inflated Poisson and zero inflated negative binomial) and hurdle models (Poisson logit hurdle and negative binomial logit hurdle) were fitted and compared. Using Akaike information criteria (AIC), the negative binomial logit hurdle (NBLH) and zero inflated negative binomial (ZINB) showed best performance in both datasets. With regard to zero count capturing, these models performed better than other models. This paper showed that zero modified NBLH and ZINB models are more appropriate methods for the analysis of data with excess zeros. The choice between the hurdle and zero-inflated models should be based on the aim and endpoints of the study.
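The difference between a hurdle model and a zero-inflated model is easiest to see in the probability mass function. The sketch below gives a generic negative binomial hurdle pmf and the usual AIC formula. It is not the authors' code: the parameter names (mu, k, p_zero) and their values are illustrative assumptions. In a hurdle model every zero comes from the binary part and positive counts come from a zero-truncated count distribution, whereas a zero-inflated model allows zeros from both parts.

```python
from math import lgamma, exp, log

def nb_pmf(y, mu, k):
    """Negative binomial pmf with mean mu and size (dispersion) parameter k."""
    p = k / (k + mu)
    return exp(lgamma(y + k) - lgamma(y + 1) - lgamma(k)
               + k * log(p) + y * log(1.0 - p))

def nb_hurdle_pmf(y, mu, k, p_zero):
    """Hurdle model: P(Y = 0) = p_zero; positive counts follow a
    zero-truncated NB(mu, k) scaled by (1 - p_zero)."""
    if y == 0:
        return p_zero
    return (1.0 - p_zero) * nb_pmf(y, mu, k) / (1.0 - nb_pmf(0, mu, k))

def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a better trade-off
    between fit and number of parameters."""
    return 2 * n_params - 2 * log_likelihood

# Illustrative (assumed) parameter values.
mu, k, p_zero = 4.0, 0.5, 0.7
probs = [nb_hurdle_pmf(y, mu, k, p_zero) for y in range(2000)]
print("total probability:", sum(probs))
```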
Evaluation of an operational malaria outbreak identification and response system in Mpumalanga Province, South Africa.

PubMed

Coleman, Marlize; Coleman, Michael; Mabuza, Aaron M; Kok, Gerdalize; Coetzee, Maureen; Durrheim, David N

2008-04-27

To evaluate the performance of a novel malaria outbreak identification system in the epidemic-prone rural area of Mpumalanga Province, South Africa, for timely identification of malaria outbreaks and guiding integrated public health responses. Using five years of historical notification data, two binomial thresholds were determined for each primary health care facility in the highest malaria risk area of Mpumalanga province. Whenever the thresholds were exceeded at health facility level (tier 1), primary health care staff notified the malaria control programme, which then confirmed adequate stocks of malaria treatment to manage potential increased cases. The cases were followed up at household level to verify the likely source of infection. The binomial thresholds were reviewed at village/town level (tier 2) to determine whether additional response measures were required. In addition, an automated electronic outbreak identification system at town/village level (tier 2) was integrated into the case notification database (tier 3) to ensure that unexpected increases in case notification were not missed. The performance of these binomial outbreak thresholds was evaluated against other currently recommended thresholds using retrospective data. The acceptability of the system at primary health care level was evaluated through structured interviews with health facility staff. Eighty-four percent of health facilities reported outbreaks within 24 hours (n = 95), 92% (n = 104) within 48 hours and 100% (n = 113) within 72 hours. Appropriate responses to all malaria outbreaks (n = 113, tier 1; n = 46, tier 2) were achieved within 24 hours. The system was positively viewed by all health facility staff. When compared to other epidemiological systems for a specified 12 month outbreak season (June 2003 to July 2004) the binomial exact thresholds produced one false weekly outbreak, the C-sum 12 weekly outbreaks and the mean + 2 SD nine false weekly outbreaks. 
Exceeding the binomial level 1 threshold triggered an alert four weeks prior to an outbreak, but exceeding the binomial level 2 threshold identified an outbreak as it occurred. The malaria outbreak surveillance system using binomial thresholds achieved its primary goal of identifying outbreaks early facilitating appropriate local public health responses aimed at averting a possible large-scale epidemic in a low, and unstable, malaria transmission setting.
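The abstract does not state how the binomial thresholds were computed. One common construction, shown here purely as an assumed illustration, takes the smallest weekly case count whose upper-tail probability under a binomial model falls below a chosen level. The catchment size n, the baseline weekly risk p and the two alpha levels are hypothetical values, not figures from the study.

```python
from math import comb

def binomial_upper_tail(c, n, p):
    """P(X >= c) for X ~ Binomial(n, p)."""
    return sum(comb(n, x) * p**x * (1.0 - p)**(n - x) for x in range(c, n + 1))

def alert_threshold(n, p, alpha):
    """Smallest count c such that P(X >= c) <= alpha.
    c = n + 1 always qualifies because the tail is then empty."""
    for c in range(n + 2):
        if binomial_upper_tail(c, n, p) <= alpha:
            return c

# Hypothetical facility: 500 people at risk, baseline weekly risk 0.4%.
n, p = 500, 0.004
level_1 = alert_threshold(n, p, alpha=0.05)    # assumed "alert" level
level_2 = alert_threshold(n, p, alpha=0.001)   # assumed "outbreak" level
print("level 1 threshold:", level_1, "level 2 threshold:", level_2)
```

A stricter alpha gives a higher threshold, which is consistent with a two-level scheme in which the lower threshold warns early and the higher one flags an outbreak in progress.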
Ecological and pest-management implications of sex differences in scarab landing patterns on grape vines

PubMed Central

González-Chang, Mauricio; Boyer, Stéphane; Lefort, Marie-Caroline; Nboyine, Jerry; Wratten, Steve D.

2017-01-01

Background: Melolonthinae beetles, comprising different white grub species, are a globally-distributed pest group. Their larvae feed on roots of several crop and forestry species, and adults can cause severe defoliation. In New Zealand, the endemic scarab pest Costelytra zealandica (White) causes severe defoliation on different horticultural crops, including grape vines (Vitis vinifera). Understanding flight and landing behaviours of this pest can help inform pest management decisions. Methods: Adult beetles were counted and then removed from 96 grape vine plants from 21:30 until 23:00 h, every day from October 26 until December 2, during 2014 and 2015. Also, adults were removed from the grape vine foliage at dusk 5, 10, 15, 20 and 25 min after flight started in 2015. Statistical analyses were performed using generalised linear models with a beta-binomial distribution to analyse proportions and with a negative binomial distribution for beetle abundance. Results: By analysing C. zealandica sex ratios during its entire flight season, it is clear that the proportion of males is higher at the beginning of the season, gradually declining towards its end. When adults were successively removed from the grape vines at 5-min intervals after flight activity began, the mean proportion of males ranged from 6–28%. The male proportion suggests males were attracted to females that had already landed on grape vines, probably through pheromone release. Discussion: The seasonal and daily changes in adult C. zealandica sex ratio throughout its flight season are presented for the first time. Although seasonal changes in sex ratio have been reported for other melolonthines, changes during their daily flight activity have not been analysed so far. Sex-ratio changes can have important consequences for the management of this pest species, and possibly for other melolonthines, as it has been previously suggested that C. zealandica females land on plants that produce a silhouette against the sky.
Therefore, long-term management might evaluate the effect of different plant heights and architecture on female melolonthine landing patterns, with consequences for male distribution, and subsequently overall damage within horticultural areas. PMID:28462026
Ecological and pest-management implications of sex differences in scarab landing patterns on grape vines.

PubMed

González-Chang, Mauricio; Boyer, Stéphane; Lefort, Marie-Caroline; Nboyine, Jerry; Wratten, Steve D

2017-01-01

Melolonthinae beetles, comprising different white grub species, are a globally-distributed pest group. Their larvae feed on roots of several crop and forestry species, and adults can cause severe defoliation. In New Zealand, the endemic scarab pest Costelytra zealandica (White) causes severe defoliation on different horticultural crops, including grape vines (Vitis vinifera). Understanding flight and landing behaviours of this pest can help inform pest management decisions. Adult beetles were counted and then removed from 96 grape vine plants from 21:30 until 23:00 h, every day from October 26 until December 2, during 2014 and 2015. Also, adults were removed from the grape vine foliage at dusk 5, 10, 15, 20 and 25 min after flight started in 2015. Statistical analyses were performed using generalised linear models with a beta-binomial distribution to analyse proportions and with a negative binomial distribution for beetle abundance. By analysing C. zealandica sex ratios during its entire flight season, it is clear that the proportion of males is higher at the beginning of the season, gradually declining towards its end. When adults were successively removed from the grape vines at 5-min intervals after flight activity began, the mean proportion of males ranged from 6-28%. The male proportion suggests males were attracted to females that had already landed on grape vines, probably through pheromone release. The seasonal and daily changes in adult C. zealandica sex ratio throughout its flight season are presented for the first time. Although seasonal changes in sex ratio have been reported for other melolonthines, changes during their daily flight activity have not been analysed so far. Sex-ratio changes can have important consequences for the management of this pest species, and possibly for other melolonthines, as it has been previously suggested that C. zealandica females land on plants that produce a silhouette against the sky. 
Therefore, long-term management might evaluate the effect of different plant heights and architecture on female melolonthine landing patterns, with consequences for male distribution, and subsequently overall damage within horticultural areas.

DRME: Count-based differential RNA methylation analysis at small sample size scenario.

PubMed

Liu, Lian; Zhang, Shao-Wu; Gao, Fan; Zhang, Yixin; Huang, Yufei; Chen, Runsheng; Meng, Jia

2016-04-15

Differential methylation, which concerns the difference in the degree of epigenetic regulation via methylation between two conditions, has been formulated with a beta or beta-binomial distribution to address the within-group biological variability in sequencing data. However, a beta or beta-binomial model is usually difficult to infer in small-sample-size scenarios with discrete read counts in sequencing data. On the other hand, as an emerging research field, RNA methylation has drawn more and more attention recently, and the differential analysis of RNA methylation is significantly different from that of DNA methylation due to the impact of transcriptional regulation. We developed DRME to better address the differential RNA methylation problem. The proposed model can effectively describe within-group biological variability in small-sample-size scenarios and handles the impact of transcriptional regulation on RNA methylation. We tested the newly developed DRME algorithm on simulated data and four MeRIP-Seq case-control studies and compared it with Fisher's exact test. It is in principle widely applicable to several other RNA-related data types as well, including RNA Bisulfite sequencing and PAR-CLIP. The code together with an MeRIP-Seq dataset is available online (https://github.com/lzcyzm/DRME) for evaluation and reproduction of the figures shown in this article. Copyright © 2016 Elsevier Inc. All rights reserved.
Accident prediction model for public highway-rail grade crossings.

PubMed

Lu, Pan; Tolliver, Denver

2016-05-01

Considerable research has focused on roadway accident frequency analysis, but relatively little research has examined safety evaluation at highway-rail grade crossings. Highway-rail grade crossings are critical spatial locations of utmost importance for transportation safety because traffic crashes at highway-rail grade crossings are often catastrophic with serious consequences. The Poisson regression model has been employed for many years as a good starting point for analyzing vehicle accident frequency. The most commonly applied variations of the Poisson model include the negative binomial and zero-inflated Poisson models. These models are used to deal with common crash data issues such as over-dispersion (sample variance is larger than the sample mean) and preponderance of zeros (low sample mean and small sample size). On rare occasions traffic crash data have been shown to be under-dispersed (sample variance is smaller than the sample mean), and traditional distributions such as Poisson or negative binomial cannot handle under-dispersion well. The objective of this study is to investigate and compare various alternate highway-rail grade crossing accident frequency models that can handle the under-dispersion issue. The contributions of the paper are two-fold: (1) application of probability models to deal with under-dispersion issues and (2) insights regarding vehicle crashes at public highway-rail grade crossings. Copyright © 2016 Elsevier Ltd. All rights reserved.
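The definitions of over- and under-dispersion given in this abstract translate directly into a variance-to-mean ratio. The sketch below is a minimal illustration with made-up crash counts; the two sample lists are hypothetical and are not data from the study.

```python
from statistics import mean, variance

def dispersion_index(counts):
    """Sample variance divided by sample mean.
    About 1 for Poisson-like data, > 1 over-dispersed, < 1 under-dispersed."""
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion index is undefined when the mean is zero")
    return variance(counts) / m

# Hypothetical yearly crash counts at ten crossings.
over_dispersed = [0, 0, 0, 1, 0, 7, 0, 2, 0, 12]
under_dispersed = [2, 3, 2, 3, 2, 3, 2, 3, 2, 3]

print("over-dispersed sample: ", dispersion_index(over_dispersed))
print("under-dispersed sample:", dispersion_index(under_dispersed))
```

A negative binomial model can only represent ratios above 1, which is why under-dispersed crash data call for the alternative models the study compares.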
I Remember You: Independence and the Binomial Model

ERIC Educational Resources Information Center

Levine, Douglas W.; Rockhill, Beverly

2006-01-01

We focus on the problem of ignoring statistical independence. A binomial experiment is used to determine whether judges could match, based on looks alone, dogs to their owners. The experimental design introduces dependencies such that the probability of a given judge correctly matching a dog and an owner changes from trial to trial. We show how…
Possibility and challenges of conversion of current virus species names to Linnaean binomials

USGS Publications Warehouse

Postler, Thomas; Clawson, Anna N.; Amarasinghe, Gaya K.; Basler, Christopher F.; Bavari, Sina; Benko, Maria; Blasdell, Kim R.; Briese, Thomas; Buchmeier, Michael J.; Bukreyev, Alexander; Calisher, Charles H.; Chandran, Kartik; Charrel, Remi; Clegg, Christopher S.; Collins, Peter L.; De la Torre, Juan Carlos; DeRisi, Joseph L.; Dietzgen, Ralf G.; Dolnik, Olga; Durrwald, Ralf; Dye, John M.; Easton, Andrew J.; Emonet, Sebastian; Formenty, Pierre; Fouchier, Ron A. M.; Ghedin, Elodie; Gonzalez, Jean-Paul; Harrach, Balazs; Hewson, Roger; Horie, Masayuki; Jiang, Daohong; Kobinger, Gary P.; Kondo, Hideki; Kropinski, Andrew; Krupovic, Mart; Kurath, Gael; Lamb, Robert A.; Leroy, Eric M.; Lukashevich, Igor S.; Maisner, Andrea; Mushegian, Arcady; Netesov, Sergey V.; Nowotny, Norbert; Patterson, Jean L.; Payne, Susan L.; Paweska, Janusz T.; Peters, C.J.; Radoshitzky, Sheli; Rima, Bertus K.; Romanowski, Victor; Rubbenstroth, Dennis; Sabanadzovic, Sead; Sanfacon, Helene; Salvato, Maria; Schwemmle, Martin; Smither, Sophie J.; Stenglein, Mark; Stone, D.M.; Takada, Ayato; Tesh, Robert B.; Tomonaga, Keizo; Tordo, N.; Towner, Jonathan S.; Vasilakis, Nikos; Volchkov, Victor E.; Jensen, Victoria; Walker, Peter J.; Wang, Lin-Fa; Varsani, Arvind; Whitfield, Anna E.; Zerbini, Francisco Murilo; Kuhn, Jens H.

2017-01-01

Botanical, mycological, zoological, and prokaryotic species names follow the Linnaean format, consisting of an italicized Latinized binomen with a capitalized genus name and a lower case species epithet (e.g., Homo sapiens). Virus species names, however, do not follow a uniform format, and, even when binomial, are not Linnaean in style. In this thought exercise, we attempted to convert all currently official names of species included in the virus family Arenaviridae and the virus order Mononegavirales to Linnaean binomials, and to identify and address associated challenges and concerns. Surprisingly, this endeavor was not as complicated or time-consuming as even the authors of this article expected when conceiving the experiment.
Testing the anisotropy in the angular distribution of Fermi/GBM gamma-ray bursts

NASA Astrophysics Data System (ADS)

Tarnopolski, M.

2017-12-01

Gamma-ray bursts (GRBs) were confirmed to be of extragalactic origin due to their isotropic angular distribution, combined with the fact that they exhibited an intensity distribution that deviated strongly from the -3/2 power law. This finding was later confirmed with the first redshift, equal to at least z = 0.835, measured for GRB970508. Despite this result, the data from CGRO/BATSE and Swift/BAT indicate that long GRBs are indeed distributed isotropically, but the distribution of short GRBs is anisotropic. Fermi/GBM has detected 1669 GRBs up to date, and their sky distribution is examined in this paper. A number of statistical tests are applied: nearest neighbour analysis, fractal dimension, dipole and quadrupole moments of the distribution function decomposed into spherical harmonics, binomial test and the two-point angular correlation function. Monte Carlo benchmark testing of each test is performed in order to evaluate its reliability. It is found that short GRBs are distributed anisotropically in the sky, and long ones have an isotropic distribution. The probability that these results are not a chance occurrence is equal to at least 99.98 per cent and 30.68 per cent for short and long GRBs, respectively. The cosmological context of this finding and its relation to large-scale structures is discussed.
A semi-nonparametric Poisson regression model for analyzing motor vehicle crash data.

PubMed

Ye, Xin; Wang, Ke; Zou, Yajie; Lord, Dominique

2018-01-01

This paper develops a semi-nonparametric Poisson regression model to analyze motor vehicle crash frequency data collected from rural multilane highway segments in California, US. Motor vehicle crash frequency on rural highway is a topic of interest in the area of transportation safety due to higher driving speeds and the resultant severity level. Unlike the traditional Negative Binomial (NB) model, the semi-nonparametric Poisson regression model can accommodate an unobserved heterogeneity following a highly flexible semi-nonparametric (SNP) distribution. Simulation experiments are conducted to demonstrate that the SNP distribution can well mimic a large family of distributions, including normal distributions, log-gamma distributions, bimodal and trimodal distributions. Empirical estimation results show that such flexibility offered by the SNP distribution can greatly improve model precision and the overall goodness-of-fit. The semi-nonparametric distribution can provide a better understanding of crash data structure through its ability to capture potential multimodality in the distribution of unobserved heterogeneity. When estimated coefficients in empirical models are compared, SNP and NB models are found to have a substantially different coefficient for the dummy variable indicating the lane width. The SNP model with better statistical performance suggests that the NB model overestimates the effect of lane width on crash frequency reduction by 83.1%.
Risk factors related to Toxoplasma gondii seroprevalence in indoor-housed Dutch dairy goats.

PubMed

Deng, Huifang; Dam-Deisz, Cecile; Luttikholt, Saskia; Maas, Miriam; Nielen, Mirjam; Swart, Arno; Vellema, Piet; van der Giessen, Joke; Opsteegh, Marieke

2016-02-01

Toxoplasma gondii can cause disease in goats, but also has impact on human health through food-borne transmission. Our aims were to determine the seroprevalence of T. gondii infection in indoor-housed Dutch dairy goats and to identify the risk factors related to T. gondii seroprevalence. Fifty-two out of ninety approached farmers with indoor-kept goats (58%) participated by answering a standardized questionnaire and contributing 32 goat blood samples each. Serum samples were tested for T. gondii SAG1 antibodies by ELISA and results showed that the frequency distribution of the log10-transformed OD-values fitted well with a binary mixture of a shifted gamma and a shifted reflected gamma distribution. The overall animal seroprevalence was 13.3% (95% CI: 11.7–14.9%), and at least one seropositive animal was found on 61.5% (95% CI: 48.3–74.7%) of the farms. To evaluate potential risk factors on herd level, three modeling strategies (Poisson, negative binomial and zero-inflated) were compared. The negative binomial model fitted the data best with the number of cats (1–4 cats: IR: 2.6, 95% CI: 1.1–6.5; ≥5 cats: IR: 14.2, 95% CI: 3.9–51.1) and mean animal age (IR: 1.5, 95% CI: 1.1–2.1) related to herd positivity. In conclusion, the ELISA test was 100% sensitive and specific based on binary mixture analysis. T. gondii infection is prevalent in indoor-housed Dutch dairy goats but at a lower overall animal level seroprevalence than outdoor farmed goats in other European countries, and cat exposure is an important risk factor.
Building test data from real outbreaks for evaluating detection algorithms.

PubMed

Texier, Gaetan; Jackson, Michael L; Siwe, Leonel; Meynard, Jean-Baptiste; Deparis, Xavier; Chaudet, Herve

2017-01-01

Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, size and duration, is known to be very difficult. The dataset of outbreak signals generated should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach based on the use of historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by resampling processes (Binomial, Inverse Transform Sampling Method-ITSM, Metropolis-Hastings Random Walk, Metropolis-Hastings Independent, Gibbs Sampler, Hybrid Gibbs Sampler). We carried out an analysis to identify the most important input parameters for simulation quality and to evaluate performance for each of the resampling algorithms. Our analysis confirms the influence of the type of algorithm used and simulation parameters (i.e. days, number of cases, outbreak shape, overall scale factor) on the results. We show that, regardless of the outbreaks, algorithms and metrics chosen for the evaluation, simulation quality decreased with the increase in the number of days simulated and increased with the number of cases simulated. Simulating outbreaks with fewer cases than days of duration (i.e. overall scale factor less than 1) resulted in an important loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, binomial and ITSM methods are accurate. Given the constraint of keeping the simulation within a range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a large spectrum of outbreak signals.
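Of the resampling processes listed in this abstract, inverse transform sampling is the simplest to illustrate. The sketch below is a generic version and not the authors' implementation: the function name simulate_outbreak, the historical epidemic curve and the number of cases are all hypothetical, and the homothetic rescaling step described in the abstract is omitted. The idea is to treat the normalised historical curve as a distribution over days, then place each simulated case on a day by inverting its cumulative distribution.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def simulate_outbreak(historical_counts, n_cases, seed=0):
    """Draw n_cases onset days by inverse transform sampling from the
    empirical day distribution of a historical outbreak."""
    total = sum(historical_counts)
    if total <= 0:
        raise ValueError("historical outbreak must contain at least one case")
    cdf = [c / total for c in accumulate(historical_counts)]
    cdf[-1] = 1.0  # guard against floating-point round-off
    rng = random.Random(seed)
    simulated = [0] * len(historical_counts)
    for _ in range(n_cases):
        u = rng.random()             # uniform on [0, 1)
        day = bisect_right(cdf, u)   # first day whose CDF value exceeds u
        simulated[day] += 1
    return simulated

# Hypothetical historical epidemic curve (cases per day).
historical = [1, 3, 8, 12, 6, 2, 0, 1]
curve = simulate_outbreak(historical, n_cases=50, seed=42)
print(curve)
```

Because each case is drawn independently, the simulated curve follows the historical shape only on average, which fits the abstract's remark that simulation quality improves as the number of simulated cases grows.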
Building test data from real outbreaks for evaluating detection algorithms

PubMed Central

Texier, Gaetan; Jackson, Michael L.; Siwe, Leonel; Meynard, Jean-Baptiste; Deparis, Xavier; Chaudet, Herve

2017-01-01

Benchmarking surveillance systems requires realistic simulations of disease outbreaks. However, obtaining these data in sufficient quantity, with a realistic shape and covering a sufficient range of agents, size and duration, is known to be very difficult. The dataset of outbreak signals generated should reflect the likely distribution of authentic situations faced by the surveillance system, including very unlikely outbreak signals. We propose and evaluate a new approach based on the use of historical outbreak data to simulate tailored outbreak signals. The method relies on a homothetic transformation of the historical distribution followed by resampling processes (Binomial, Inverse Transform Sampling Method-ITSM, Metropolis-Hastings Random Walk, Metropolis-Hastings Independent, Gibbs Sampler, Hybrid Gibbs Sampler). We carried out an analysis to identify the most important input parameters for simulation quality and to evaluate performance for each of the resampling algorithms. Our analysis confirms the influence of the type of algorithm used and simulation parameters (i.e. days, number of cases, outbreak shape, overall scale factor) on the results. We show that, regardless of the outbreaks, algorithms and metrics chosen for the evaluation, simulation quality decreased with the increase in the number of days simulated and increased with the number of cases simulated. Simulating outbreaks with fewer cases than days of duration (i.e. overall scale factor less than 1) resulted in an important loss of information during the simulation. We found that Gibbs sampling with a shrinkage procedure provides a good balance between accuracy and data dependency. If dependency is of little importance, binomial and ITSM methods are accurate. Given the constraint of keeping the simulation within a range of plausible epidemiological curves faced by the surveillance system, our study confirms that our approach can be used to generate a large spectrum of outbreak signals. PMID:28863159
Modelling road accident blackspots data with the discrete generalized Pareto distribution.

PubMed

Prieto, Faustino; Gómez-Déniz, Emilio; Sarabia, José María

2014-10-01

This study shows how road traffic networks events, in particular road accidents on blackspots, can be modelled with simple probabilistic distributions. We considered the number of crashes and the number of fatalities on Spanish blackspots in the period 2003-2007, from Spanish General Directorate of Traffic (DGT). We modelled those datasets, respectively, with the discrete generalized Pareto distribution (a discrete parametric model with three parameters) and with the discrete Lomax distribution (a discrete parametric model with two parameters, and particular case of the previous model). For that, we analyzed the basic properties of both parametric models: cumulative distribution, survival, probability mass, quantile and hazard functions, genesis and rth-order moments; applied two estimation methods of their parameters: the μ and (μ+1) frequency method and the maximum likelihood method; used two goodness-of-fit tests: Chi-square test and discrete Kolmogorov-Smirnov test based on bootstrap resampling; and compared them with the classical negative binomial distribution in terms of absolute probabilities and in models including covariates. We found that those probabilistic models can be useful to describe the road accident blackspots datasets analyzed. Copyright © 2014 Elsevier Ltd. All rights reserved.
Flood return level analysis of Peaks over Threshold series under changing climate

NASA Astrophysics Data System (ADS)

Li, L.; Xiong, L.; Hu, T.; Xu, C. Y.; Guo, S.

2016-12-01

Obtaining insights into future flood estimation is of great significance for water planning and management. Traditional flood return level analysis with the stationarity assumption has been challenged by changing environments. A method that takes the nonstationarity context into consideration has been extended to derive flood return levels for Peaks over Threshold (POT) series. In applications to POT series, a Poisson distribution is normally assumed to describe the arrival rate of exceedance events, but this distribution assumption has at times been reported as invalid. The Negative Binomial (NB) distribution is therefore proposed as an alternative to the Poisson distribution assumption. Flood return levels were extrapolated in a nonstationarity context for the POT series of the Weihe basin, China, under future climate scenarios. The results show that the flood return levels estimated under nonstationarity can differ depending on whether a Poisson or an NB distribution is assumed. The difference is found to be related to the threshold value of the POT series. The study indicates the importance of distribution selection in flood return level analysis under nonstationarity and provides a reference on the impact of climate change on flood estimation in the Weihe basin for the future.
[Monitoring microbiological safety of small systems of water distribution. Comparison of two sampling programs in a town in central Italy].

PubMed

Papini, Paolo; Faustini, Annunziata; Manganello, Rosa; Borzacchi, Giancarlo; Spera, Domenico; Perucci, Carlo A

2005-01-01

To determine the frequency of sampling in small water distribution systems (<5,000 inhabitants) and compare the results according to different hypotheses in bacteria distribution. We carried out two sampling programs to monitor the water distribution system in a town in Central Italy between July and September 1992; the Poisson distribution assumption implied 4 water samples, the assumption of negative binomial distribution implied 21 samples. Coliform organisms were used as indicators of water safety. The network consisted of two pipe rings and two wells fed by the same water source. The number of summer customers varied considerably from 3,000 to 20,000. The mean density was 2.33 coliforms/100 ml (sd = 5.29) for 21 samples and 3 coliforms/100 ml (sd = 6) for four samples. However, the hypothesis of homogeneity was rejected (p-value <0.001) and the probability of type II error with the assumption of heterogeneity was higher with 4 samples (beta = 0.24) than with 21 (beta = 0.05). For this small network, determining the sample size according to the heterogeneity hypothesis strengthens the statement that water is drinkable compared with the homogeneity assumption.
A Bayesian method for inferring transmission chains in a partially observed epidemic.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Marzouk, Youssef M.; Ray, Jaideep

2008-10-01

We present a Bayesian approach for estimating transmission chains and rates in the Abakaliki smallpox epidemic of 1967. The epidemic affected 30 individuals in a community of 74; only the dates of appearance of symptoms were recorded. Our model assumes stochastic transmission of the infections over a social network. Distinct binomial random graphs model intra- and inter-compound social connections, while disease transmission over each link is treated as a Poisson process. Link probabilities and rate parameters are objects of inference. Dates of infection and recovery comprise the remaining unknowns. Distributions for smallpox incubation and recovery periods are obtained from historical data. Using Markov chain Monte Carlo, we explore the joint posterior distribution of the scalar parameters and provide an expected connectivity pattern for the social graph and infection pathway.
Bursts of Self-Conscious Emotions in the Daily Lives of Emerging Adults

PubMed Central

Conroy, David E.; Ram, Nilam; Pincus, Aaron L.; Rebar, Amanda L.

2015-01-01

Self-conscious emotions play a role in regulating daily achievement strivings, social behavior, and health, but little is known about the processes underlying their daily manifestation. Emerging adults (n = 182) completed daily diaries for eight days and multilevel models were estimated to evaluate whether, how much, and why their emotions varied from day-to-day. Within-person variation in authentic pride was normally-distributed across people and days whereas the other emotions were burst-like and characterized by zero-inflated, negative binomial distributions. Perceiving social interactions as generally communal increased the odds of hubristic pride activation and reduced the odds of guilt activation; daily communal behavior reduced guilt intensity. Results illuminated processes through which meaning about the self-in-relation-to-others is constructed during a critical period of development. PMID:25859164
Improving removal-based estimates of abundance by sampling a population of spatially distinct subpopulations

USGS Publications Warehouse

Dorazio, R.M.; Jelks, H.L.; Jordan, F.

2005-01-01

A statistical modeling framework is described for estimating the abundances of spatially distinct subpopulations of animals surveyed using removal sampling. To illustrate this framework, hierarchical models are developed using the Poisson and negative-binomial distributions to model variation in abundance among subpopulations and using the beta distribution to model variation in capture probabilities. These models are fitted to the removal counts observed in a survey of a federally endangered fish species. The resulting estimates of abundance have similar or better precision than those computed using the conventional approach of analyzing the removal counts of each subpopulation separately. Extension of the hierarchical models to include spatial covariates of abundance is straightforward and may be used to identify important features of an animal's habitat or to predict the abundance of animals at unsampled locations.
Raw and Central Moments of Binomial Random Variables via Stirling Numbers

ERIC Educational Resources Information Center

Griffiths, Martin

2013-01-01

We consider here the problem of calculating the moments of binomial random variables. It is shown how formulae for both the raw and the central moments of such random variables may be obtained in a recursive manner utilizing Stirling numbers of the first kind. Suggestions are also provided as to how students might be encouraged to explore this…
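As a rough illustration of the kind of calculation this record describes, the sketch below computes raw moments of a Binomial(n, p) variable from Stirling numbers. It relies on the standard identity E[X^k] = sum over j of S(k, j) n(n-1)...(n-j+1) p^j, where S(k, j) are Stirling numbers of the second kind; this is an assumption of the example and is not necessarily the recursive formulation (stated in the abstract in terms of the first kind) that the article develops. The function names are hypothetical.

```python
from math import comb


def stirling2_table(k):
    """Table S[i][j] of Stirling numbers of the second kind for 0 <= j <= i <= k."""
    S = [[0] * (k + 1) for _ in range(k + 1)]
    S[0][0] = 1
    for i in range(1, k + 1):
        for j in range(1, i + 1):
            # Standard recurrence: S(i, j) = j * S(i-1, j) + S(i-1, j-1)
            S[i][j] = j * S[i - 1][j] + S[i - 1][j - 1]
    return S


def falling_factorial(n, j):
    """n * (n - 1) * ... * (n - j + 1); equals 1 when j == 0."""
    out = 1
    for i in range(j):
        out *= n - i
    return out


def raw_moment(n, p, k):
    """k-th raw moment of Binomial(n, p) via the Stirling-number identity."""
    S = stirling2_table(k)
    return sum(S[k][j] * falling_factorial(n, j) * p ** j for j in range(k + 1))


def raw_moment_direct(n, p, k):
    """Same moment by brute-force summation over the probability mass function."""
    return sum(x ** k * comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1))


if __name__ == "__main__":
    for k in range(1, 5):
        print(k, raw_moment(10, 0.3, k), raw_moment_direct(10, 0.3, k))
```

The brute-force version is included only as a cross-check; the two should agree up to floating-point error.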
A Mixed-Effects Heterogeneous Negative Binomial Model for Postfire Conifer Regeneration in Northeastern California, USA

Treesearch

Justin S. Crotteau; Martin W. Ritchie; J. Morgan Varner

2014-01-01

Many western USA fire regimes are typified by mixed-severity fire, which compounds the variability inherent to natural regeneration densities in associated forests. Tree regeneration data are often discrete and nonnegative; accordingly, we fit a series of Poisson and negative binomial variation models to conifer seedling counts across four distinct burn severities and...
Performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data.

PubMed

Yelland, Lisa N; Salter, Amy B; Ryan, Philip

2011-10-15

Modified Poisson regression, which combines a log Poisson regression model with robust variance estimation, is a useful alternative to log binomial regression for estimating relative risks. Previous studies have shown both analytically and by simulation that modified Poisson regression is appropriate for independent prospective data. This method is often applied to clustered prospective data, despite a lack of evidence to support its use in this setting. The purpose of this article is to evaluate the performance of the modified Poisson regression approach for estimating relative risks from clustered prospective data, by using generalized estimating equations to account for clustering. A simulation study is conducted to compare log binomial regression and modified Poisson regression for analyzing clustered data from intervention and observational studies. Both methods generally perform well in terms of bias, type I error, and coverage. Unlike log binomial regression, modified Poisson regression is not prone to convergence problems. The methods are contrasted by using example data sets from 2 large studies. The results presented in this article support the use of modified Poisson regression as an alternative to log binomial regression for analyzing clustered prospective data when clustering is taken into account by using generalized estimating equations.
Quasi-equilibrium theory for the distribution of rare alleles in a subdivided population: justification and implications.

PubMed

Burr, T L

2000-05-01

This paper examines a quasi-equilibrium theory of rare alleles for subdivided populations that follow an island-model version of the Wright-Fisher model of evolution. All mutations are assumed to create new alleles. We present four results: (1) conditions for the theory to apply are formally established using properties of the moments of the binomial distribution; (2) approximations currently in the literature can be replaced with exact results that are in better agreement with our simulations; (3) a modified maximum likelihood estimator of migration rate exhibits the same good performance on island-model data or on data simulated from the multinomial mixed with the Dirichlet distribution, and (4) a connection between the rare-allele method and the Ewens Sampling Formula for the infinite-allele mutation model is made. This introduces a new and simpler proof for the expected number of alleles implied by the Ewens Sampling Formula. Copyright 2000 Academic Press.
Confidence of compliance: a Bayesian approach for percentile standards.

PubMed

McBride, G B; Ellis, J C

2001-04-01

Rules for assessing compliance with percentile standards commonly limit the number of exceedances permitted in a batch of samples taken over a defined assessment period. Such rules are commonly developed using classical statistical methods. Results from alternative Bayesian methods are presented (using beta-distributed prior information and a binomial likelihood), resulting in "confidence of compliance" graphs. These allow simple reading of the consumer's risk and the supplier's risks for any proposed rule. The influence of the prior assumptions required by the Bayesian technique on the confidence results is demonstrated, using two reference priors (uniform and Jeffreys') and also using optimistic and pessimistic user-defined priors. All four give less pessimistic results than does the classical technique, because interpreting classical results as "confidence of compliance" actually invokes a Bayesian approach with an extreme prior distribution. Jeffreys' prior is shown to be the most generally appropriate choice of prior distribution. Cost savings can be expected using rules based on this approach.
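A minimal sketch of the beta-binomial calculation mentioned in this record, assuming a uniform Beta(1, 1) prior on the exceedance probability and n samples with k exceedances. The posterior is then Beta(k + 1, n - k + 1), and for integer parameters its cumulative distribution can be written as a binomial tail sum, so no special-function library is needed. The function name and the example numbers are hypothetical and are not taken from the article.

```python
from math import comb


def confidence_of_compliance(n, k, limit):
    """Posterior probability that the exceedance probability is <= limit.

    Assumes a uniform Beta(1, 1) prior and a binomial likelihood with
    k exceedances in n samples, giving a Beta(k + 1, n - k + 1) posterior.
    For integer parameters the Beta CDF equals P(Binomial(n + 1, limit) >= k + 1).
    """
    m = n + 1
    return sum(
        comb(m, j) * limit ** j * (1 - limit) ** (m - j)
        for j in range(k + 1, m + 1)
    )


if __name__ == "__main__":
    # Hypothetical 95th-percentile standard: exceedance probability at most 0.05.
    for k in range(0, 4):
        print(k, round(confidence_of_compliance(60, k, 0.05), 4))
```

Other priors, such as Jeffreys' Beta(1/2, 1/2), give non-integer parameters and would need an incomplete beta function instead of this finite sum.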

Statistical guides to estimating the number of undiscovered mineral deposits: an example with porphyry copper deposits

USGS Publications Warehouse

Singer, Donald A.; Menzie, W.D.; Cheng, Qiuming; Bonham-Carter, G. F.

2005-01-01

Estimating numbers of undiscovered mineral deposits is a fundamental part of assessing mineral resources. Some statistical tools can act as guides to low variance, unbiased estimates of the number of deposits. The primary guide is that the estimates must be consistent with the grade and tonnage models. Another statistical guide is the deposit density (i.e., the number of deposits per unit area of permissive rock in well-explored control areas). Preliminary estimates and confidence limits of the number of undiscovered deposits in a tract of given area may be calculated using linear regression and refined using frequency distributions with appropriate parameters. A Poisson distribution leads to estimates having lower relative variances than the regression estimates and implies a random distribution of deposits. Coefficients of variation are used to compare uncertainties of negative binomial, Poisson, or MARK3 empirical distributions that have the same expected number of deposits as the deposit density. Statistical guides presented here allow simple yet robust estimation of the number of undiscovered deposits in permissive terranes.
Negative Binomial Fits to Multiplicity Distributions from Central Collisions of (16)O+Cu at 14.6A GeV/c and Intermittency

NASA Technical Reports Server (NTRS)

Tannenbaum, M. J.

1994-01-01

The concept of "Intermittency" was introduced by Bialas and Peschanski to try to explain the "large" fluctuations of multiplicity in restricted intervals of rapidity or pseudorapidity. A formalism was proposed to study non-statistical (more precisely, non-Poisson) fluctuations as a function of the size of the rapidity interval, and it was further suggested that the "spikes" in the rapidity fluctuations were evidence of fractal or intermittent behavior, in analogy to turbulence in fluid dynamics, which is characterized by self-similar fluctuations at all scales (the absence of a well-defined scale of length).
Definite Integrals, Some Involving Residue Theory Evaluated by Maple Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bowman, Kimiko O.

2010-01-01

The calculus of residues is applied to evaluate certain integrals over the range $(-\infty, \infty)$ using the Maple symbolic code. These integrals are of the form $\int_{-\infty}^{\infty} \frac{\cos(x)}{(x^{2} + a^{2})(x^{2} + b^{2})(x^{2} + c^{2})}\,dx$ and similar extensions. The Maple code is also applied to expressions in maximum likelihood estimator moments when sampling from the negative binomial distribution. In general the Maple code approach to the integrals gives correct answers to specified decimal places, but the symbolic result may be extremely long and complex.
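For orientation, the integral named above has a compact closed form when $a$, $b$, $c$ are distinct and positive. The derivation below is a standard textbook route (partial fractions plus the one-factor residue result), offered here as a sketch and not as the Maple output reported in the record.

$$\int_{-\infty}^{\infty} \frac{\cos x}{x^{2} + a^{2}}\,dx = \frac{\pi}{a}\,e^{-a}, \qquad a > 0,$$

$$\frac{1}{(x^{2}+a^{2})(x^{2}+b^{2})(x^{2}+c^{2})} = \frac{1}{(b^{2}-a^{2})(c^{2}-a^{2})}\cdot\frac{1}{x^{2}+a^{2}} + \frac{1}{(a^{2}-b^{2})(c^{2}-b^{2})}\cdot\frac{1}{x^{2}+b^{2}} + \frac{1}{(a^{2}-c^{2})(b^{2}-c^{2})}\cdot\frac{1}{x^{2}+c^{2}},$$

$$\int_{-\infty}^{\infty} \frac{\cos x}{(x^{2}+a^{2})(x^{2}+b^{2})(x^{2}+c^{2})}\,dx = \pi\left[\frac{e^{-a}}{a\,(b^{2}-a^{2})(c^{2}-a^{2})} + \frac{e^{-b}}{b\,(a^{2}-b^{2})(c^{2}-b^{2})} + \frac{e^{-c}}{c\,(a^{2}-c^{2})(b^{2}-c^{2})}\right].$$

Repeated values among $a$, $b$, $c$ require a limiting argument or higher-order residues and are not covered by this formula.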
Estimating cavity tree and snag abundance using negative binomial regression models and nearest neighbor imputation methods

Treesearch

Bianca N.I. Eskelson; Hailemariam Temesgen; Tara M. Barrett

2009-01-01

Cavity tree and snag abundance data are highly variable and contain many zero observations. We predict cavity tree and snag abundance from variables that are readily available from forest cover maps or remotely sensed data using negative binomial (NB), zero-inflated NB, and zero-altered NB (ZANB) regression models as well as nearest neighbor (NN) imputation methods....
Spatial distribution of parasitism on Phyllocnistis citrella Stainton, 1856 (Lepidoptera: Gracillariidae) in citrus orchards.

PubMed

Jahnke, S M; Redaelli, L R; Diefenbach, L M G; Efrom, C F

2008-11-01

Many species of microhymenopterous parasitoids have been registered on Phyllocnistis citrella, the citrus leafminer. The present study aimed to identify the spatial distribution pattern of the native and introduced parasitoids of P. citrella in two citrus orchards in Montenegro, RS. The new shoots from 24 randomly selected trees in each orchard were inspected at the bottom (0-1.5 m) and top (1.5-2.5 m) stratum and had their position relative to the quadrants (North, South, East and West) registered every 15 days from July 2002 to June 2003. The leaves with pupae were collected and kept isolated until the emergence of parasitoids or of the leafminer; the sampling was therefore biased towards parasitoids that emerge in the host pupal phase. The horizontal spatial distribution was evaluated by testing the fit of the data to the Poisson and negative binomial distributions. In Montenegrina, there was no significant difference in the number of parasitoids and in the mean number of pupae found in the top and bottom strata (chi2 = 0.66; df = 1; P > 0.05) (chi2 = 0.27; df = 1; P > 0.05), respectively. In relation to the quadrants, the highest average numbers of the leafminer pupae and of parasitoids were registered at the East quadrant (chi2 = 11.81; df = 3; P < 0.05), (chi2 = 10.36; df = 3; P < 0.05). In the Murcott orchard, a higher number of parasitoids was found at the top stratum (63.5%) (chi2 = 7.24; df = 1; P < 0.05), the same occurring with the average number of P. citrella pupae (62.9%) (chi2 = 6.66; df = 1; P < 0.05). The highest number of parasitoids and of miners was registered at the North quadrant (chi2 = 19.29; df = 3; P < 0.05), (chi2 = 4.39; df = 3; P < 0.05). In both orchards, there was no difference between the numbers of shoots relative to either the strata or the quadrants. As the number of shoots did not vary much relative to the quadrants, it is possible that the higher number of miners and parasitoids in the East and West quadrants would be influenced by the higher solar exposure of these quadrants. The data of the horizontal spatial distribution of the parasitism fit the negative binomial distribution on all sampling occasions, indicating an aggregated pattern.
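One common quick check for an aggregated (negative binomial) pattern in count data is to compare the sample variance with the sample mean and, when the variance is larger, to estimate the aggregation parameter k by the method of moments, k = mean^2 / (variance - mean). The sketch below illustrates that calculation on made-up counts; it is not the goodness-of-fit procedure used in the study, and the function name and data are hypothetical.

```python
from statistics import mean, variance


def nb_moment_k(counts):
    """Method-of-moments estimate of the negative binomial aggregation parameter k.

    Returns None when the sample variance does not exceed the sample mean,
    i.e. when the counts show no overdispersion relative to a Poisson model.
    """
    m = mean(counts)
    s2 = variance(counts)  # sample variance (n - 1 denominator)
    if s2 <= m:
        return None
    return m * m / (s2 - m)


if __name__ == "__main__":
    # Hypothetical counts per tree: most trees empty, a few heavily occupied.
    counts = [0, 0, 0, 1, 0, 7, 0, 2, 0, 10]
    print("variance/mean ratio:", variance(counts) / mean(counts))
    print("moment estimate of k:", nb_moment_k(counts))
```

Small values of k indicate strong aggregation; as k grows large the negative binomial approaches the Poisson distribution.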
Bayesian analysis of volcanic eruptions

NASA Astrophysics Data System (ADS)

Ho, Chih-Hsiang

1990-10-01

The simple Poisson model generally gives a good fit to many volcanoes for volcanic eruption forecasting. Nonetheless, empirical evidence suggests that volcanic activity in successive equal time-periods tends to be more variable than a simple Poisson with constant eruptive rate. An alternative model is therefore examined in which the eruptive rate (λ) for a given volcano or cluster(s) of volcanoes is described by a gamma distribution (prior) rather than treated as a constant value as in the assumptions of a simple Poisson model. Bayesian analysis is performed to link the two distributions together to give the aggregate behavior of the volcanic activity. When the Poisson process is expanded to accommodate a gamma mixing distribution on λ, a consequence of this mixed (or compound) Poisson model is that the frequency distribution of eruptions in any given time-period of equal length follows the negative binomial distribution (NBD). Applications of the proposed model and comparisons between the generalized model and simple Poisson model are discussed based on the historical eruptive count data of volcanoes Mauna Loa (Hawaii) and Etna (Italy). Several relevant facts lead to the conclusion that the generalized model is preferable for practical use both in space and time.
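The gamma-mixed Poisson result stated in this record can be checked numerically. The sketch below assumes a gamma distribution on λ with hypothetical shape and scale parameters, integrates the Poisson probability against that gamma density with a simple midpoint rule, and compares the result with the closed-form negative binomial probability. The parameter values and function names are illustrative only and are not taken from the article.

```python
from math import exp, lgamma, log


def nb_pmf(k, shape, scale):
    """Closed-form negative binomial pmf implied by a Gamma(shape, scale) mix on lambda."""
    log_p = (
        lgamma(k + shape) - lgamma(k + 1) - lgamma(shape)
        + k * log(scale / (1 + scale))
        - shape * log(1 + scale)
    )
    return exp(log_p)


def mixed_poisson_pmf(k, shape, scale, upper=80.0, steps=40000):
    """Midpoint-rule integral of Poisson(k | lam) * Gamma(lam; shape, scale) over lam.

    Intended for shape >= 1, where the integrand is bounded near lam = 0;
    the upper limit is assumed large enough for the tail to be negligible.
    """
    h = upper / steps
    total = 0.0
    for i in range(steps):
        lam = (i + 0.5) * h
        log_poisson = k * log(lam) - lam - lgamma(k + 1)
        log_gamma = (
            (shape - 1) * log(lam) - lam / scale
            - lgamma(shape) - shape * log(scale)
        )
        total += exp(log_poisson + log_gamma)
    return total * h


if __name__ == "__main__":
    shape, scale = 2.0, 1.5  # hypothetical gamma prior on the eruptive rate
    for k in range(5):
        print(k, nb_pmf(k, shape, scale), mixed_poisson_pmf(k, shape, scale))
```

Under this parameterization the mixture has mean shape × scale and variance shape × scale × (1 + scale), so the variance always exceeds the mean, which matches the extra variability the abstract describes.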
Narrow log-periodic modulations in non-Markovian random walks

NASA Astrophysics Data System (ADS)

Diniz, R. M. B.; Cressoni, J. C.; da Silva, M. A. A.; Mariz, A. M.; de Araújo, J. M.

2017-12-01

What are the necessary ingredients for log-periodicity to appear in the dynamics of a random walk model? Can they be subtle enough to be overlooked? Previous studies suggest that long-range damaged memory and negative feedback together are necessary conditions for the emergence of log-periodic oscillations. The role of negative feedback would then be crucial, forcing the system to change direction. In this paper we show that small-amplitude log-periodic oscillations can emerge when the system is driven by positive feedback. Due to their very small amplitude, these oscillations can easily be mistaken for numerical finite-size effects. The models we use consist of discrete-time random walks with strong memory correlations where the decision process is taken from memory profiles based either on a binomial distribution or on a delta distribution. Anomalous superdiffusive behavior and log-periodic modulations are shown to arise in the large time limit for convenient choices of the model parameters.
It's time to move on from the bell curve.

PubMed

Robinson, Lawrence R

2017-11-01

The bell curve was first described in the 18th century by de Moivre and Gauss to depict the distribution of binomial events, such as coin tossing, or repeated measures of physical objects. In the 19th and 20th centuries, the bell curve was appropriated, or perhaps misappropriated, to apply to biologic and social measures across people. For many years we used it to derive reference values for our electrophysiologic studies. There is, however, no reason to believe that electrophysiologic measures should approximate a bell-curve distribution, and empiric evidence suggests they do not. The concept of using mean ± 2 standard deviations should be abandoned. Reference values are best derived by using non-parametric analyses, such as percentile values. This proposal aligns with the recommendation of the recent normative data task force of the American Association of Neuromuscular & Electrodiagnostic Medicine and follows sound statistical principles. Muscle Nerve 56: 859-860, 2017. © 2017 Wiley Periodicals, Inc.
Host nutrition alters the variance in parasite transmission potential

PubMed Central

Vale, Pedro F.; Choisy, Marc; Little, Tom J.

2013-01-01

The environmental conditions experienced by hosts are known to affect their mean parasite transmission potential. How different conditions may affect the variance of transmission potential has received less attention, but is an important question for disease management, especially if specific ecological contexts are more likely to foster a few extremely infectious hosts. Using the obligate-killing bacterium Pasteuria ramosa and its crustacean host Daphnia magna, we analysed how host nutrition affected the variance of individual parasite loads, and, therefore, transmission potential. Under low food, individual parasite loads showed similar mean and variance, following a Poisson distribution. By contrast, among well-nourished hosts, parasite loads were right-skewed and overdispersed, following a negative binomial distribution. Abundant food may, therefore, yield individuals causing potentially more transmission than the population average. Measuring both the mean and variance of individual parasite loads in controlled experimental infections may offer a useful way of revealing risk factors for potential highly infectious hosts. PMID:23407498
Host nutrition alters the variance in parasite transmission potential.

PubMed

Vale, Pedro F; Choisy, Marc; Little, Tom J

2013-04-23

The environmental conditions experienced by hosts are known to affect their mean parasite transmission potential. How different conditions may affect the variance of transmission potential has received less attention, but is an important question for disease management, especially if specific ecological contexts are more likely to foster a few extremely infectious hosts. Using the obligate-killing bacterium Pasteuria ramosa and its crustacean host Daphnia magna, we analysed how host nutrition affected the variance of individual parasite loads, and, therefore, transmission potential. Under low food, individual parasite loads showed similar mean and variance, following a Poisson distribution. By contrast, among well-nourished hosts, parasite loads were right-skewed and overdispersed, following a negative binomial distribution. Abundant food may, therefore, yield individuals causing potentially more transmission than the population average. Measuring both the mean and variance of individual parasite loads in controlled experimental infections may offer a useful way of revealing risk factors for potential highly infectious hosts.
Discrete epidemic models with arbitrary stage distributions and applications to disease control.

PubMed

Hernandez-Ceron, Nancy; Feng, Zhilan; Castillo-Chavez, Carlos

2013-10-01

W.O. Kermack and A.G. McKendrick introduced in their fundamental paper, A Contribution to the Mathematical Theory of Epidemics, published in 1927, a deterministic model that captured the qualitative dynamic behavior of single infectious disease outbreaks. A Kermack–McKendrick discrete-time general framework, motivated by the emergence of a multitude of models used to forecast the dynamics of epidemics, is introduced in this manuscript. Results that allow us to measure quantitatively the role of classical and general distributions on disease dynamics are presented. The case of the geometric distribution is used to evaluate the impact of waiting-time distributions on epidemiological processes or public health interventions. In short, the geometric distribution is used to set up the baseline or null epidemiological model used to test the relevance of realistic stage-period distribution on the dynamics of single epidemic outbreaks. A final size relationship involving the control reproduction number, a function of transmission parameters and the means of distributions used to model disease or intervention control measures, is computed. Model results and simulations highlight the inconsistencies in forecasting that emerge from the use of specific parametric distributions. Examples, using the geometric, Poisson and binomial distributions, are used to highlight the impact of the choices made in quantifying the risk posed by single outbreaks and the relative importance of various control measures.
Spatial Dependence and Sampling of Phytoseiid Populations on Hass Avocados in Southern California.

PubMed

Lara, Jesús R; Amrich, Ruth; Saremi, Naseem T; Hoddle, Mark S

2016-04-22

Research on phytoseiid mites has been critical for developing an effective biocontrol strategy for suppressing Oligonychus perseae Tuttle, Baker, and Abatiello (Acari: Tetranychidae) in California avocado orchards. However, basic understanding of the spatial ecology of natural populations of phytoseiids in relation to O. perseae infestations and the validation of research-based strategies for assessing densities of these predators has been limited. To address these shortcomings, cross-sectional and longitudinal observations consisting of >3,000 phytoseiids and 500,000 O. perseae counted on 11,341 leaves were collected across 10 avocado orchards during a 10-yr period. Subsets of these data were analyzed statistically to characterize the spatial distribution of phytoseiids in avocado orchards and to evaluate the merits of developing binomial and enumerative sampling strategies for these predators. Spatial correlation of phytoseiids between trees was detected at one site, and a strong association of phytoseiids with elevated O. perseae densities was detected at four sites. Sampling simulations revealed that enumeration-based sampling performed better than binomial sampling for estimating phytoseiid densities. The ecological implications of these findings and potential for developing a custom sampling plan to estimate densities of phytoseiids inhabiting sampled trees in avocado orchards in California are discussed. © The Authors 2016. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved.
Poisson and negative binomial item count techniques for surveys with sensitive question.

PubMed

Tian, Guo-Liang; Tang, Man-Lai; Wu, Qin; Liu, Yin

2017-04-01

Although the item count technique is useful in surveys with sensitive questions, the privacy of respondents who possess the sensitive characteristic of interest may not be well protected due to a defect in its original design. In this article, we propose two new survey designs (namely the Poisson item count technique and the negative binomial item count technique) which replace the several independent Bernoulli random variables required by the original item count technique with a single Poisson or negative binomial random variable, respectively. The proposed models not only provide a closed-form variance estimate and a confidence interval within [0, 1] for the sensitive proportion, but also simplify the survey design of the original item count technique. Most importantly, the new designs do not leak respondents' privacy. Empirical results show that the proposed techniques perform satisfactorily in the sense that they yield accurate parameter estimates and confidence intervals.
On the Relationship between Molecular Hit Rates in High-Throughput Screening and Molecular Descriptors.

PubMed

Hansson, Mari; Pemberton, John; Engkvist, Ola; Feierberg, Isabella; Brive, Lars; Jarvis, Philip; Zander-Balderud, Linda; Chen, Hongming

2014-06-01

High-throughput screening (HTS) is widely used in the pharmaceutical industry to identify novel chemical starting points for drug discovery projects. The current study focuses on the relationship between molecular hit rate in recent in-house HTS and four common molecular descriptors: lipophilicity (ClogP), size (heavy atom count, HEV), fraction of sp(3)-hybridized carbons (Fsp3), and fraction of molecular framework (f(MF)). The molecular hit rate is defined as the fraction of times the molecule has been assigned as active in the HTS campaigns where it has been screened. Beta-binomial statistical models were built to model the molecular hit rate as a function of these descriptors. The advantage of the beta-binomial statistical models is that the correlation between the descriptors is taken into account. Higher degree polynomial terms of the descriptors were also added into the beta-binomial statistic model to improve the model quality. The relative influence of different molecular descriptors on molecular hit rate has been estimated, taking into account that the descriptors are correlated to each other through applying beta-binomial statistical modeling. The results show that ClogP has the largest influence on the molecular hit rate, followed by Fsp3 and HEV. f(MF) has only a minor influence besides its correlation with the other molecular descriptors. © 2013 Society for Laboratory Automation and Screening.
Data mining of tree-based models to analyze freeway accident frequency.

PubMed

Chang, Li-Yen; Chen, Wen-Chieh

2005-01-01

Statistical models, such as Poisson or negative binomial regression models, have been employed to analyze vehicle accident frequency for many years. However, these models have their own model assumptions and pre-defined underlying relationship between dependent and independent variables. If these assumptions are violated, the model could lead to erroneous estimation of accident likelihood. Classification and Regression Tree (CART), one of the most widely applied data mining techniques, has been commonly employed in business administration, industry, and engineering. CART does not require any pre-defined underlying relationship between target (dependent) variable and predictors (independent variables) and has been shown to be a powerful tool, particularly for dealing with prediction and classification problems. This study collected the 2001-2002 accident data of National Freeway 1 in Taiwan. A CART model and a negative binomial regression model were developed to establish the empirical relationship between traffic accidents and highway geometric variables, traffic characteristics, and environmental factors. The CART findings indicated that the average daily traffic volume and precipitation variables were the key determinants for freeway accident frequencies. By comparing the prediction performance between the CART and the negative binomial regression models, this study demonstrates that CART is a good alternative method for analyzing freeway accident frequencies.
The Sequential Probability Ratio Test: An efficient alternative to exact binomial testing for Clean Water Act 303(d) evaluation.

PubMed

Chen, Connie; Gribble, Matthew O; Bartroff, Jay; Bay, Steven M; Goldstein, Larry

2017-05-01

The United States's Clean Water Act stipulates in section 303(d) that states must identify impaired water bodies for which total maximum daily loads (TMDLs) of pollution inputs into water bodies are developed. Decision-making procedures about how to list, or delist, water bodies as impaired, or not, per Clean Water Act 303(d) differ across states. In states such as California, whether or not a particular monitoring sample suggests that water quality is impaired can be regarded as a binary outcome variable, and California's current regulatory framework invokes a version of the exact binomial test to consolidate evidence across samples and assess whether the overall water body complies with the Clean Water Act. Here, we contrast the performance of California's exact binomial test with one potential alternative, the Sequential Probability Ratio Test (SPRT). The SPRT uses a sequential testing framework, testing samples as they become available and evaluating evidence as it emerges, rather than measuring all the samples and calculating a test statistic at the end of the data collection process. Through simulations and theoretical derivations, we demonstrate that the SPRT on average requires fewer samples to be measured to have comparable Type I and Type II error rates as the current fixed-sample binomial test. Policymakers might consider efficient alternatives such as SPRT to current procedure. Copyright © 2017 Elsevier Ltd. All rights reserved.
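To make the sequential idea concrete, the sketch below implements Wald's classical SPRT for binary (exceedance / no exceedance) samples. The hypothesized exceedance rates `p0` and `p1`, the error rates, and the decision labels are hypothetical choices for illustration; they are not the values or the exact procedure evaluated in the article or used by any regulator.

```python
from math import log


def sprt(samples, p0, p1, alpha=0.05, beta=0.05):
    """Wald's sequential probability ratio test for Bernoulli observations.

    samples: iterable of 0/1 values (1 = sample exceeds the water-quality limit).
    p0, p1:  exceedance rates under H0 (acceptable) and H1 (impaired), p0 < p1.
    Returns (decision, number_of_samples_used).
    """
    upper = log((1 - beta) / alpha)   # cross this boundary: decide for H1
    lower = log(beta / (1 - alpha))   # cross this boundary: decide for H0
    llr = 0.0                         # running log-likelihood ratio
    used = 0
    for x in samples:
        used += 1
        llr += log(p1 / p0) if x else log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "impaired", used
        if llr <= lower:
            return "not impaired", used
    return "undecided", used


if __name__ == "__main__":
    print(sprt([1, 1, 1, 0, 0], p0=0.1, p1=0.3))
    print(sprt([0] * 20, p0=0.1, p1=0.3))
```

The test stops as soon as the accumulated evidence crosses either boundary, which is why it can need fewer samples on average than a fixed-sample test; if the data run out first, the result is left undecided.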
A new framework of statistical inferences based on the valid joint sampling distribution of the observed counts in an incomplete contingency table.

PubMed

Tian, Guo-Liang; Li, Hui-Qiong

2017-08-01

Some existing confidence interval methods and hypothesis testing methods in the analysis of a contingency table with incomplete observations in both margins entirely depend on an underlying assumption that the sampling distribution of the observed counts is a product of independent multinomial/binomial distributions for complete and incomplete counts. However, it can be shown that this independency assumption is incorrect and can result in unreliable conclusions because of the under-estimation of the uncertainty. Therefore, the first objective of this paper is to derive the valid joint sampling distribution of the observed counts in a contingency table with incomplete observations in both margins. The second objective is to provide a new framework for analyzing incomplete contingency tables based on the derived joint sampling distribution of the observed counts by developing a Fisher scoring algorithm to calculate maximum likelihood estimates of parameters of interest, the bootstrap confidence interval methods, and the bootstrap testing hypothesis methods. We compare the differences between the valid sampling distribution and the sampling distribution under the independency assumption. Simulation studies showed that average/expected confidence-interval widths of parameters based on the sampling distribution under the independency assumption are shorter than those based on the new sampling distribution, yielding unrealistic results. A real data set is analyzed to illustrate the application of the new sampling distribution for incomplete contingency tables and the analysis results again confirm the conclusions obtained from the simulation studies.
A flexible count data model to fit the wide diversity of expression profiles arising from extensively replicated RNA-seq experiments

PubMed Central

2013-01-01

Background High-throughput RNA sequencing (RNA-seq) offers unprecedented power to capture the real dynamics of gene expression. Experimental designs with extensive biological replication present a unique opportunity to exploit this feature and distinguish expression profiles with higher resolution. RNA-seq data analysis methods so far have been mostly applied to data sets with few replicates and their default settings try to provide the best performance under this constraint. These methods are based on two well-known count data distributions: the Poisson and the negative binomial. The way to properly calibrate them with large RNA-seq data sets is not trivial for the non-expert bioinformatics user. Results Here we show that expression profiles produced by extensively-replicated RNA-seq experiments lead to a rich diversity of count data distributions beyond the Poisson and the negative binomial, such as Poisson-Inverse Gaussian or Pólya-Aeppli, which can be captured by a more general family of count data distributions called the Poisson-Tweedie. The flexibility of the Poisson-Tweedie family enables a direct fitting of emerging features of large expression profiles, such as heavy-tails or zero-inflation, without the need to alter a single configuration parameter. We provide a software package for R called tweeDEseq implementing a new test for differential expression based on the Poisson-Tweedie family. Using simulations on synthetic and real RNA-seq data we show that tweeDEseq yields P-values that are equally or more accurate than competing methods under different configuration parameters. By surveying the tiny fraction of sex-specific gene expression changes in human lymphoblastoid cell lines, we also show that tweeDEseq accurately detects differentially expressed genes in a real large RNA-seq data set with improved performance and reproducibility over the previously compared methodologies. 
Finally, we compared the results with those obtained from microarrays in order to check for reproducibility. Conclusions RNA-seq data with many replicates leads to a handful of count data distributions which can be accurately estimated with the statistical model illustrated in this paper. This method provides a better fit to the underlying biological variability; this may be critical when comparing groups of RNA-seq samples with markedly different count data distributions. The tweeDEseq package forms part of the Bioconductor project and it is available for download at http://www.bioconductor.org. PMID:23965047
Coronary artery calcium distributions in older persons in the AGES-Reykjavik study

PubMed Central

Gudmundsson, Elias Freyr; Gudnason, Vilmundur; Sigurdsson, Sigurdur; Launer, Lenore J.; Harris, Tamara B.; Aspelund, Thor

2013-01-01

Coronary Artery Calcium (CAC) is a sign of advanced atherosclerosis and an independent risk factor for cardiac events. Here, we describe CAC-distributions in an unselected aged population and compare modelling methods to characterize CAC-distribution. CAC is difficult to model because it has a skewed and zero inflated distribution with over-dispersion. Data are from the AGES-Reykjavik sample, a large population based study [2002-2006] in Iceland of 5,764 persons aged 66-96 years. Linear regressions using logarithmic- and Box-Cox transformations on CAC+1, quantile regression and a Zero-Inflated Negative Binomial model (ZINB) were applied. Methods were compared visually and with the PRESS-statistic, R2 and number of detected associations with concurrently measured variables. There were pronounced differences in CAC according to sex, age, history of coronary events and presence of plaque in the carotid artery. Associations with conventional coronary artery disease (CAD) risk factors varied between the sexes. The ZINB model provided the best results with respect to the PRESS-statistic, R2, and predicted proportion of zero scores. The ZINB model detected similar numbers of associations as the linear regression on ln(CAC+1) and usually with the same risk factors. PMID:22990371
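For readers unfamiliar with the ZINB model mentioned in this abstract, its probability mass function can be sketched as follows. The mean/dispersion parameterization (`mu`, `k`) and the zero-inflation weight `pi_zero` are naming assumptions for this illustration; the study's covariate structure and fitted values are not reproduced here.

```python
import math

def nb_pmf(y, mu, k):
    """Negative binomial pmf with mean mu > 0 and dispersion k > 0 (variance mu + mu**2 / k)."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

def zinb_pmf(y, mu, k, pi_zero):
    """Zero-inflated NB: a structural zero with probability pi_zero, otherwise an NB draw."""
    base = (1.0 - pi_zero) * nb_pmf(y, mu, k)
    return pi_zero + base if y == 0 else base
```

The extra point mass at zero lets the model reproduce a large share of zero scores while the negative binomial part absorbs the over-dispersion among the counts.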
Measurement of higher cumulants of net-charge multiplicity distributions in Au+Au collisions at $\sqrt{s_{NN}}$ = 7.7-200 GeV

NASA Astrophysics Data System (ADS)

Adare, A.; Afanasiev, S.; Aidala, C.; Ajitanand, N. N.; Akiba, Y.; Akimoto, R.; Al-Bataineh, H.; Alexander, J.; Al-Ta'Ani, H.; Angerami, A.; Aoki, K.; Apadula, N.; Aramaki, Y.; Asano, H.; Aschenauer, E. C.; Atomssa, E. T.; Averbeck, R.; Awes, T. C.; Azmoun, B.; Babintsev, V.; Bai, M.; Baksay, G.; Baksay, L.; Bannier, B.; Barish, K. N.; Bassalleck, B.; Basye, A. T.; Bathe, S.; Baublis, V.; Baumann, C.; Baumgart, S.; Bazilevsky, A.; Belikov, S.; Belmont, R.; Bennett, R.; Berdnikov, A.; Berdnikov, Y.; Bickley, A. A.; Black, D.; Blau, D. S.; Bok, J. S.; Boyle, K.; Brooks, M. L.; Bryslawskyj, J.; Buesching, H.; Bumazhnov, V.; Bunce, G.; Butsyk, S.; Camacho, C. M.; Campbell, S.; Castera, P.; Chen, C.-H.; Chi, C. Y.; Chiu, M.; Choi, I. J.; Choi, J. B.; Choi, S.; Choudhury, R. K.; Christiansen, P.; Chujo, T.; Chung, P.; Chvala, O.; Cianciolo, V.; Citron, Z.; Cole, B. A.; Connors, M.; Constantin, P.; Cronin, N.; Crossette, N.; Csanád, M.; Csörgő, T.; Dahms, T.; Dairaku, S.; Danchev, I.; Das, K.; Datta, A.; Daugherity, M. S.; David, G.; Dehmelt, K.; Denisov, A.; Deshpande, A.; Desmond, E. J.; Dharmawardane, K. V.; Dietzsch, O.; Ding, L.; Dion, A.; Do, J. H.; Donadelli, M.; D'Orazio, L.; Drapier, O.; Drees, A.; Drees, K. A.; Durham, J. M.; Durum, A.; Dutta, D.; Edwards, S.; Efremenko, Y. V.; Ellinghaus, F.; Engelmore, T.; Enokizono, A.; En'yo, H.; Esumi, S.; Eyser, K. O.; Fadem, B.; Fields, D. E.; Finger, M.; Finger, M.; Fleuret, F.; Fokin, S. L.; Fraenkel, Z.; Frantz, J. E.; Franz, A.; Frawley, A. D.; Fujiwara, K.; Fukao, Y.; Fusayasu, T.; Gainey, K.; Gal, C.; Garg, P.; Garishvili, A.; Garishvili, I.; Giordano, F.; Glenn, A.; Gong, H.; Gong, X.; Gonin, M.; Goto, Y.; Granier de Cassagnac, R.; Grau, N.; Greene, S. V.; Grosse Perdekamp, M.; Gu, Y.; Gunji, T.; Guo, L.; Gustafsson, H.-Å.; Hachiya, T.; Haggerty, J. S.; Hahn, K. I.; Hamagaki, H.; Hamblen, J.; Han, R.; Hanks, J.; Hartouni, E. P.; Hashimoto, K.; Haslum, E.; Hayano, R.; Hayashi, S.; He, X.; Heffner, M.; Hemmick, T. 
K.; Hester, T.; Hill, J. C.; Hohlmann, M.; Hollis, R. S.; Holzmann, W.; Homma, K.; Hong, B.; Horaguchi, T.; Hori, Y.; Hornback, D.; Huang, S.; Ichihara, T.; Ichimiya, R.; Ide, J.; Iinuma, H.; Ikeda, Y.; Imai, K.; Imazu, Y.; Imrek, J.; Inaba, M.; Iordanova, A.; Isenhower, D.; Ishihara, M.; Isinhue, A.; Isobe, T.; Issah, M.; Isupov, A.; Ivanishchev, D.; Jacak, B. V.; Javani, M.; Jia, J.; Jiang, X.; Jin, J.; Johnson, B. M.; Joo, K. S.; Jouan, D.; Jumper, D. S.; Kajihara, F.; Kametani, S.; Kamihara, N.; Kamin, J.; Kaneti, S.; Kang, B. H.; Kang, J. H.; Kang, J. S.; Kapustinsky, J.; Karatsu, K.; Kasai, M.; Kawall, D.; Kawashima, M.; Kazantsev, A. V.; Kempel, T.; Key, J. A.; Khandai, P. K.; Khanzadeev, A.; Kijima, K. M.; Kim, B. I.; Kim, C.; Kim, D. H.; Kim, D. J.; Kim, E.; Kim, E.-J.; Kim, H. J.; Kim, K.-B.; Kim, S. H.; Kim, Y.-J.; Kim, Y. K.; Kinney, E.; Kiriluk, K.; Kiss, Á.; Kistenev, E.; Klatsky, J.; Kleinjan, D.; Kline, P.; Kochenda, L.; Komatsu, Y.; Komkov, B.; Konno, M.; Koster, J.; Kotchetkov, D.; Kotov, D.; Kozlov, A.; Král, A.; Kravitz, A.; Krizek, F.; Kunde, G. J.; Kurita, K.; Kurosawa, M.; Kwon, Y.; Kyle, G. S.; Lacey, R.; Lai, Y. S.; Lajoie, J. G.; Lebedev, A.; Lee, B.; Lee, D. M.; Lee, J.; Lee, K.; Lee, K. B.; Lee, K. S.; Lee, S. H.; Lee, S. R.; Leitch, M. J.; Leite, M. A. L.; Leitgab, M.; Leitner, E.; Lenzi, B.; Lewis, B.; Li, X.; Liebing, P.; Lim, S. H.; Linden Levy, L. A.; Liška, T.; Litvinenko, A.; Liu, H.; Liu, M. X.; Love, B.; Luechtenborg, R.; Lynch, D.; Maguire, C. F.; Makdisi, Y. I.; Makek, M.; Malakhov, A.; Malik, M. D.; Manion, A.; Manko, V. I.; Mannel, E.; Mao, Y.; Maruyama, T.; Masui, H.; Masumoto, S.; Matathias, F.; McCumber, M.; McGaughey, P. L.; McGlinchey, D.; McKinney, C.; Means, N.; Meles, A.; Mendoza, M.; Meredith, B.; Miake, Y.; Mibe, T.; Midori, J.; Mignerey, A. C.; Mikeš, P.; Miki, K.; Milov, A.; Mishra, D. K.; Mishra, M.; Mitchell, J. T.; Miyachi, Y.; Miyasaka, S.; Mohanty, A. K.; Mohapatra, S.; Moon, H. 
J.; Morino, Y.; Morreale, A.; Morrison, D. P.; Moskowitz, M.; Motschwiller, S.; Moukhanova, T. V.; Murakami, T.; Murata, J.; Mwai, A.; Nagae, T.; Nagamiya, S.; Nagle, J. L.; Naglis, M.; Nagy, M. I.; Nakagawa, I.; Nakamiya, Y.; Nakamura, K. R.; Nakamura, T.; Nakano, K.; Nattrass, C.; Nederlof, A.; Netrakanti, P. K.; Newby, J.; Nguyen, M.; Nihashi, M.; Niida, T.; Nouicer, R.; Novitzky, N.; Nukariya, A.; Nyanin, A. S.; Obayashi, H.; O'Brien, E.; Oda, S. X.; Ogilvie, C. A.; Oka, M.; Okada, K.; Onuki, Y.; Oskarsson, A.; Ouchida, M.; Ozawa, K.; Pak, R.; Pantuev, V.; Papavassiliou, V.; Park, B. H.; Park, I. H.; Park, J.; Park, S.; Park, S. K.; Park, W. J.; Pate, S. F.; Patel, L.; Pei, H.; Peng, J.-C.; Pereira, H.; Perepelitsa, D. V.; Peresedov, V.; Peressounko, D. Yu.; Petti, R.; Pinkenburg, C.; Pisani, R. P.; Proissl, M.; Purschke, M. L.; Purwar, A. K.; Qu, H.; Rak, J.; Rakotozafindrabe, A.; Ravinovich, I.; Read, K. F.; Reygers, K.; Reynolds, D.; Riabov, V.; Riabov, Y.; Richardson, E.; Riveli, N.; Roach, D.; Roche, G.; Rolnick, S. D.; Rosati, M.; Rosen, C. A.; Rosendahl, S. S. E.; Rosnet, P.; Rukoyatkin, P.; Ružička, P.; Ryu, M. S.; Sahlmueller, B.; Saito, N.; Sakaguchi, T.; Sakashita, K.; Sako, H.; Samsonov, V.; Sano, M.; Sano, S.; Sarsour, M.; Sato, S.; Sato, T.; Sawada, S.; Sedgwick, K.; Seele, J.; Seidl, R.; Semenov, A. Yu.; Sen, A.; Seto, R.; Sett, P.; Sharma, D.; Shein, I.; Shibata, T.-A.; Shigaki, K.; Shimomura, M.; Shoji, K.; Shukla, P.; Sickles, A.; Silva, C. L.; Silvermyr, D.; Silvestre, C.; Sim, K. S.; Singh, B. K.; Singh, C. P.; Singh, V.; Skolnik, M.; Slunečka, M.; Solano, S.; Soltz, R. A.; Sondheim, W. E.; Sorensen, S. P.; Sourikova, I. V.; Sparks, N. A.; Stankus, P. W.; Steinberg, P.; Stenlund, E.; Stepanov, M.; Ster, A.; Stoll, S. P.; Sugitate, T.; Sukhanov, A.; Sun, J.; Sziklai, J.; Takagui, E. M.; Takahara, A.; Taketani, A.; Tanabe, R.; Tanaka, Y.; Taneja, S.; Tanida, K.; Tannenbaum, M. 
J.; Tarafdar, S.; Taranenko, A.; Tarján, P.; Tennant, E.; Themann, H.; Thomas, T. L.; Todoroki, T.; Togawa, M.; Toia, A.; Tomášek, L.; Tomášek, M.; Torii, H.; Towell, R. S.; Tserruya, I.; Tsuchimoto, Y.; Tsuji, T.; Vale, C.; Valle, H.; van Hecke, H. W.; Vargyas, M.; Vazquez-Zambrano, E.; Veicht, A.; Velkovska, J.; Vértesi, R.; Vinogradov, A. A.; Virius, M.; Voas, B.; Vossen, A.; Vrba, V.; Vznuzdaev, E.; Wang, X. R.; Watanabe, D.; Watanabe, K.; Watanabe, Y.; Watanabe, Y. S.; Wei, F.; Wei, R.; Wessels, J.; Whitaker, S.; White, S. N.; Winter, D.; Wolin, S.; Wood, J. P.; Woody, C. L.; Wright, R. M.; Wysocki, M.; Xia, B.; Xie, W.; Yamaguchi, Y. L.; Yamaura, K.; Yang, R.; Yanovich, A.; Ying, J.; Yokkaichi, S.; You, Z.; Young, G. R.; Younus, I.; Yushmanov, I. E.; Zajc, W. A.; Zelenski, A.; Zhang, C.; Zhou, S.; Zolin, L.; Phenix Collaboration

2016-01-01

We report the measurement of cumulants ($C_n$, $n = 1, \ldots, 4$) of the net-charge distributions measured within pseudorapidity $|\eta| < 0.35$ in Au+Au collisions at $\sqrt{s_{NN}}$ = 7.7-200 GeV with the PHENIX experiment at the Relativistic Heavy Ion Collider. The ratios of cumulants (e.g., $C_1/C_2$, $C_3/C_1$) of the net-charge distributions, which can be related to volume independent susceptibility ratios, are studied as a function of centrality and energy. These quantities are important to understand the quantum-chromodynamics phase diagram and possible existence of a critical end point. The measured values are very well described by expectation from negative binomial distributions. We do not observe any nonmonotonic behavior in the ratios of the cumulants as a function of collision energy. The measured values of $C_1/C_2$ and $C_3/C_1$ can be directly compared to lattice quantum-chromodynamics calculations and thus allow extraction of both the chemical freeze-out temperature and the baryon chemical potential at each center-of-mass energy. The extracted baryon chemical potentials are in excellent agreement with a thermal-statistical analysis model.
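As a rough illustration of what a negative binomial expectation for such cumulants looks like, the sketch below computes the first four cumulants of a negative binomial distribution (NBD) with mean `mu` and shape `k`, and combines two of them into net-charge cumulants. It assumes, purely for illustration, that positive and negative multiplicities are independent NBDs; the function names and any parameter values are hypothetical and are not taken from the measurement.

```python
def nbd_cumulants(mu, k):
    """First four cumulants (C1..C4) of an NBD with mean mu and shape parameter k."""
    r = mu / k
    c1 = mu
    c2 = mu * (1 + r)
    c3 = c2 * (1 + 2 * r)
    c4 = c2 * (1 + 6 * r + 6 * r * r)
    return (c1, c2, c3, c4)

def net_cumulants(pos, neg):
    """Cumulants of N+ - N- for independent N+ and N-.

    Odd orders subtract and even orders add: C_n = C_n(+) + (-1)**n * C_n(-).
    """
    return tuple(p + (-1) ** n * m
                 for n, (p, m) in enumerate(zip(pos, neg), start=1))

def cumulant_ratios(c):
    """Return the two ratios named in the abstract: (C1/C2, C3/C1)."""
    return (c[0] / c[1], c[2] / c[0])
```

In the limit of large `k` the NBD approaches a Poisson distribution and all four cumulants approach `mu`, which gives a quick sanity check on the formulas.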

Distributional assumptions in food and feed commodities- development of fit-for-purpose sampling protocols.

PubMed

Paoletti, Claudia; Esbensen, Kim H

2015-01-01

Material heterogeneity influences the effectiveness of sampling procedures. Most sampling guidelines used for assessment of food and/or feed commodities are based on classical statistical distribution requirements (the normal, binomial, and Poisson distributions) and almost universally rely on the assumption of randomness. However, this is unrealistic. The scientific food and feed community recognizes a strong preponderance of non-random distribution within commodity lots, which should be a more realistic prerequisite for definition of effective sampling protocols. Nevertheless, these heterogeneity issues are overlooked as the prime focus is often placed only on financial, time, equipment, and personnel constraints instead of mandating acquisition of documented representative samples under realistic heterogeneity conditions. This study shows how the principles promulgated in the Theory of Sampling (TOS) and practically tested over 60 years provide an effective framework for dealing with the complete set of adverse aspects of both compositional and distributional heterogeneity (material sampling errors), as well as with the errors incurred by the sampling process itself. The results of an empirical European Union study on genetically modified soybean heterogeneity, Kernel Lot Distribution Assessment, are summarized, as they have a strong bearing on the issue of proper sampling protocol development. TOS principles apply universally in the food and feed realm and must therefore be considered the only basis for development of valid sampling protocols free from distributional constraints.
Initial conditions in high-energy collisions

NASA Astrophysics Data System (ADS)

Petreska, Elena

This thesis is focused on the initial stages of high-energy collisions in the saturation regime. We start by extending the McLerran-Venugopalan distribution of color sources in the initial wave-function of nuclei in heavy-ion collisions. We derive a fourth-order operator in the action and discuss its relevance for the description of color charge distributions in protons in high-energy experiments. We calculate the dipole scattering amplitude in proton-proton collisions with the quartic action and find an agreement with experimental data. We also obtain a modification to the fluctuation parameter of the negative binomial distribution of particle multiplicities in proton-proton experiments. The result implies an advancement of the fourth-order action towards Gaussian when the energy is increased. Finally, we calculate perturbatively the expectation value of the magnetic Wilson loop operator in the first moments of heavy-ion collisions. For the magnetic flux we obtain a first non-trivial term that is proportional to the square of the area of the loop. The result is close to numerical calculations for small area loops.
Characterization of trapped charges distribution in terms of mirror plot curve.

PubMed

Al-Obaidi, Hassan N; Mahdi, Ali S; Khaleel, Imad H

2018-01-01

Accumulation of charges (electrons) at the specimen surface in a scanning electron microscope (SEM) generates an electrostatic potential. Using the method of image charges, this potential is defined in the space of the microscope chamber. The deduced formula is expressed in terms of a general volumetric distribution, which is proposed to be an infinitesimal spherical extension. With the aid of the binomial theorem, the potential is expanded into a multipolar form. The resulting formula is then adopted to modify a novel mirror plot equation so as to detect the real distribution of trapped charges. Simulation results reveal that trapped charges may take various arrangements, such as monopole, quadrupole, and octupole. However, none of these arrangements exists alone; rather, the charges form a mixture of them. The influence of each of these profiles depends on the distance between the incident electron and the surface of the sample. The results also show that the amount of trapped charge can indicate a threshold at which the point charge approximation fails. Copyright © 2017 Elsevier B.V. All rights reserved.
Site occupancy models with heterogeneous detection probabilities

USGS Publications Warehouse

Royle, J. Andrew

2006-01-01

Models for estimating the probability of occurrence of a species in the presence of imperfect detection are important in many ecological disciplines. In these "site occupancy" models, the possibility of heterogeneity in detection probabilities among sites must be considered because variation in abundance (and other factors) among sampled sites induces variation in detection probability (p). In this article, I develop occurrence probability models that allow for heterogeneous detection probabilities by considering several common classes of mixture distributions for p. For any mixing distribution, the likelihood has the general form of a zero-inflated binomial mixture for which inference based upon integrated likelihood is straightforward. A recent paper by Link (2003, Biometrics 59, 1123-1130) demonstrates that in closed population models used for estimating population size, different classes of mixture distributions are indistinguishable from data, yet can produce very different inferences about population size. I demonstrate that this problem can also arise in models for estimating site occupancy in the presence of heterogeneous detection probabilities. The implications of this are discussed in the context of an application to avian survey data and the development of animal monitoring programs.
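The zero-inflated binomial form of the likelihood mentioned above can be sketched for a single site. The first function uses a constant detection probability; the second averages the binomial term over a finite mixture for p, which is used here only as one simple example of a mixing distribution. Function and parameter names (`psi`, `p_values`, `weights`) are assumptions for this illustration.

```python
import math

def occupancy_site_likelihood(y, n_visits, psi, p):
    """Likelihood of y detections in n_visits at one site.

    psi is the occupancy probability and p the per-visit detection probability.
    A zero count arises either from an unoccupied site or from missed detections.
    """
    detect = math.comb(n_visits, y) * p ** y * (1 - p) ** (n_visits - y)
    return psi * detect + ((1 - psi) if y == 0 else 0.0)

def mixture_site_likelihood(y, n_visits, psi, p_values, weights):
    """Same likelihood, with p drawn from a finite mixture (integrated over p)."""
    detect = sum(w * math.comb(n_visits, y) * p ** y * (1 - p) ** (n_visits - y)
                 for p, w in zip(p_values, weights))
    return psi * detect + ((1 - psi) if y == 0 else 0.0)
```

Summing the logarithm of either function over sites gives a log-likelihood that could be passed to a numerical optimizer to estimate `psi` and the detection parameters.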
Does the name really matter? The importance of botanical nomenclature and plant taxonomy in biomedical research.

PubMed

Bennett, Bradley C; Balick, Michael J

2014-03-28

Medical research on plant-derived compounds requires a breadth of expertise from field to laboratory and clinical skills. Too often basic botanical skills are evidently lacking, especially with respect to plant taxonomy and botanical nomenclature. Binomial and familial names, synonyms and author citations are often misconstrued. The correct botanical name, linked to a vouchered specimen, is the sine qua non of phytomedical research. Without the unique identifier of a proper binomial, research cannot accurately be linked to the existing literature. Perhaps more significant is the ambiguity of species determinations that ensues from poor taxonomic practices. This uncertainty, not surprisingly, obstructs reproducibility of results, the cornerstone of science. Based on our combined six decades of experience with medicinal plants, we discuss the problems of inaccurate taxonomy and botanical nomenclature in biomedical research. These problems appear all too frequently in manuscripts and grant applications that we review and they extend to the published literature. We also review the literature on the importance of taxonomy in other disciplines that relate to medicinal plant research. In most cases, questions regarding orthography, synonymy, author citations, and current family designations of most plant binomials can be resolved using widely-available online databases and other electronic resources. Some complex problems require consultation with a professional plant taxonomist, which also is important for accurate identification of voucher specimens. Researchers should provide the currently accepted binomial and complete author citation, provide relevant synonyms, and employ the Angiosperm Phylogeny Group III family name. Taxonomy is a vital adjunct not only to plant-medicine research but to virtually every field of science. 
Medicinal plant researchers can increase the precision and utility of their investigations by following sound practices with respect to botanical nomenclature. Correct spellings, accepted binomials, author citations, synonyms, and current family designations can readily be found on reliable online databases. When questions arise, researchers should consult plant taxonomists. © 2013 Published by Elsevier Ireland Ltd.
Exploring the effects of roadway characteristics on the frequency and severity of head-on crashes: case studies from Malaysian federal roads.

PubMed

Hosseinpour, Mehdi; Yahaya, Ahmad Shukri; Sadullah, Ahmad Farhan

2014-01-01

Head-on crashes are among the most severe collision types and of great concern to road safety authorities. This justifies greater efforts to reduce both the frequency and severity of this collision type. To this end, it is necessary to first identify factors associated with crash occurrence. This can be done by developing crash prediction models that relate crash outcomes to a set of contributing factors. This study intends to identify the factors affecting both the frequency and severity of head-on crashes that occurred on 448 segments of five federal roads in Malaysia. Data on road characteristics and crash history were collected on the study segments during a 4-year period between 2007 and 2010. The frequency of head-on crashes was fitted by developing and comparing seven count-data models including Poisson, standard negative binomial (NB), random-effect negative binomial, hurdle Poisson, hurdle negative binomial, zero-inflated Poisson, and zero-inflated negative binomial models. To model crash severity, a random-effect generalized ordered probit model (REGOPM) was used given that a head-on crash had occurred. With respect to the crash frequency, the random-effect negative binomial (RENB) model was found to outperform the other models according to goodness of fit measures. Based on the results of the model, the variables horizontal curvature, terrain type, heavy-vehicle traffic, and access points were found to be positively related to the frequency of head-on crashes, while posted speed limit and shoulder width decreased the crash frequency. With regard to the crash severity, the results of REGOPM showed that horizontal curvature, paved shoulder width, terrain type, and side friction were associated with more severe crashes, whereas land use, access points, and presence of median reduced the probability of severe crashes. Based on the results of this study, some potential countermeasures were proposed to minimize the risk of head-on crashes. 
Copyright © 2013 Elsevier Ltd. All rights reserved.
Binomial test statistics using Psi functions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bowman, Kimiko O.

2007-01-01

For the negative binomial model (probability generating function $(p + 1 - pt)^{-k}$) a logarithmic derivative is the Psi function difference $\psi(k + x) - \psi(k)$; this and its derivatives lead to a test statistic to decide on the validity of a specified model. The test statistic uses a data base so there exists a comparison available between theory and application. Note that the test function is not dominated by outliers. Applications to (i) Fisher's tick data, (ii) accidents data, (iii) Weldon's dice data are included.
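To make the Psi function difference concrete: expanding the stated generating function gives the probability mass function $P(x) = \frac{\Gamma(k+x)}{x!\,\Gamma(k)} \, p^x (1+p)^{-(k+x)}$, so the derivative of $\log P(x)$ with respect to $k$ is $\psi(k+x) - \psi(k) - \ln(1+p)$. For integer $x$ the difference reduces to a finite sum through the digamma recurrence. The sketch below is an illustration of that identity only, with hypothetical function names; it does not reproduce the report's test statistic.

```python
import math

def psi_difference(k, x):
    """psi(k + x) - psi(k) for integer x >= 0, using psi(z + 1) = psi(z) + 1/z."""
    return sum(1.0 / (k + j) for j in range(x))

def log_pmf(x, k, p):
    """Log pmf of the negative binomial with generating function (p + 1 - p*t)**(-k)."""
    return (math.lgamma(k + x) - math.lgamma(k) - math.lgamma(x + 1)
            + x * math.log(p / (1 + p)) - k * math.log(1 + p))

def score_k(x, k, p):
    """Derivative of log_pmf with respect to k."""
    return psi_difference(k, x) - math.log(1 + p)
```

Because the difference is a sum of terms 1/(k + j), it grows only logarithmically in x, which is consistent with the abstract's remark that the test function is not dominated by outliers.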
A modified chain binomial model to analyse the ongoing measles epidemic in Greece, July 2017 to February 2018

PubMed Central

Lytras, Theodore; Georgakopoulou, Theano; Tsiodras, Sotirios

2018-01-01

Greece is currently experiencing a large measles outbreak, in the context of multiple similar outbreaks across Europe. We devised and applied a modified chain-binomial epidemic model, requiring very simple data, to estimate the transmission parameters of this outbreak. Model results indicate sustained measles transmission among the Greek Roma population, necessitating a targeted mass vaccination campaign to halt further spread of the epidemic. Our model may be useful for other countries facing similar measles outbreaks. PMID:29717695
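For context, a classical (unmodified) chain-binomial epidemic of the Reed-Frost type can be simulated in a few lines. This sketch is not the authors' modified model; the function name, the escape probability `q`, and the generation structure are assumptions used only to show how each generation's new cases are drawn binomially from the remaining susceptibles.

```python
import random

def reed_frost(s0, i0, q, generations, rng=None):
    """Simulate a chain-binomial epidemic and return [(S_t, I_t), ...].

    q is the probability that a susceptible escapes infection from one
    infective during one generation, so each susceptible is infected with
    probability 1 - q**I_t.
    """
    rng = rng or random.Random()
    s, i = s0, i0
    history = [(s, i)]
    for _ in range(generations):
        risk = 1.0 - q ** i
        # Binomial(s, risk) draw, built from s independent Bernoulli trials
        new_i = sum(1 for _ in range(s) if rng.random() < risk)
        s, i = s - new_i, new_i
        history.append((s, i))
    return history
```

Passing a seeded generator, for example `reed_frost(1000, 5, 0.998, 20, random.Random(42))`, makes a run reproducible; fitting `q` to observed case counts would require a likelihood built from the same binomial steps.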
A modified chain binomial model to analyse the ongoing measles epidemic in Greece, July 2017 to February 2018.

PubMed

Lytras, Theodore; Georgakopoulou, Theano; Tsiodras, Sotirios

2018-04-01

Greece is currently experiencing a large measles outbreak, in the context of multiple similar outbreaks across Europe. We devised and applied a modified chain-binomial epidemic model, requiring very simple data, to estimate the transmission parameters of this outbreak. Model results indicate sustained measles transmission among the Greek Roma population, necessitating a targeted mass vaccination campaign to halt further spread of the epidemic. Our model may be useful for other countries facing similar measles outbreaks.
A brief history of numbers and statistics with cytometric applications.

PubMed

Watson, J V

2001-02-15

A brief history of numbers and statistics traces the development of numbers from prehistory to completion of our current system of numeration with the introduction of the decimal fraction by Viete, Stevin, Burgi, and Galileo at the turn of the 16th century. This was followed by the development of what we now know as probability theory by Pascal, Fermat, and Huygens in the mid-17th century which arose in connection with questions in gambling with dice and can be regarded as the origin of statistics. The three main probability distributions on which statistics depend were introduced and/or formalized between the mid-17th and early 19th centuries: the binomial distribution by Pascal; the normal distribution by de Moivre, Gauss, and Laplace, and the Poisson distribution by Poisson. The formal discipline of statistics commenced with the works of Pearson, Yule, and Gosset at the turn of the 19th century when the first statistical tests were introduced. Elementary descriptions of the statistical tests most likely to be used in conjunction with cytometric data are given and it is shown how these can be applied to the analysis of difficult immunofluorescence distributions when there is overlap between the labeled and unlabeled cell populations. Copyright 2001 Wiley-Liss, Inc.
Estimating the prevalence and intensity of Schistosoma mansoni infection among rural communities in Western Tanzania: The influence of sampling strategy and statistical approach

PubMed Central

Bakuza, Jared S.; Denwood, Matthew J.; Nkwengulila, Gamba

2017-01-01

Background Schistosoma mansoni is a parasite of major public health importance in developing countries, where it causes a neglected tropical disease known as intestinal schistosomiasis. However, the distribution of the parasite within many endemic regions is currently unknown, which hinders effective control. The purpose of this study was to characterize the prevalence and intensity of infection of S. mansoni in a remote area of western Tanzania. Methodology/Principal findings Stool samples were collected from 192 children and 147 adults residing in Gombe National Park and four nearby villages. Children were actively sampled in local schools, and adults were sampled passively by voluntary presentation at the local health clinics. The two datasets were therefore analysed separately. Faecal worm egg count (FWEC) data were analysed using negative binomial and zero-inflated negative binomial (ZINB) models with explanatory variables of site, sex, and age. The ZINB models indicated that a substantial proportion of the observed zero FWEC reflected a failure to detect eggs in truly infected individuals, meaning that the estimated true prevalence was much higher than the apparent prevalence as calculated based on the simple proportion of non-zero FWEC. For the passively sampled data from adults, the data were consistent with close to 100% true prevalence of infection. Both the prevalence and intensity of infection differed significantly between sites, but there were no significant associations with sex or age. Conclusions/Significance Overall, our data suggest a more widespread distribution of S. mansoni in this part of Tanzania than was previously thought. The apparent prevalence estimates substantially under-estimated the true prevalence as determined by the ZINB models, and the two types of sampling strategies also resulted in differing conclusions regarding prevalence of infection. 
We therefore recommend that future surveillance programmes designed to assess risk factors should use active sampling whenever possible, in order to avoid the self-selection bias associated with passive sampling. PMID:28934206
Enrollment Management in Medical School Admissions: A Novel Evidence-Based Approach at One Institution.

PubMed

Burkhardt, John C; DesJardins, Stephen L; Teener, Carol A; Gay, Steven E; Santen, Sally A

2016-11-01

In higher education, enrollment management has been developed to accurately predict the likelihood of enrollment of admitted students. This allows evidence to dictate numbers of interviews scheduled, offers of admission, and financial aid package distribution. The applicability of enrollment management techniques for use in medical education was tested through creation of a predictive enrollment model at the University of Michigan Medical School (U-M). U-M and American Medical College Application Service data (2006-2014) were combined to create a database including applicant demographics, academic application scores, institutional financial aid offer, and choice of school attended. Binomial logistic regression and multinomial logistic regression models were estimated in order to study factors related to enrollment at the local institution versus elsewhere and to groupings of competing peer institutions. A predictive analytic "dashboard" was created for practical use. Both models were significant at P < .001 and had similar predictive performance. In the binomial model female, underrepresented minority students, grade point average, Medical College Admission Test score, admissions committee desirability score, and most individual financial aid offers were significant (P < .05). The significant covariates were similar in the multinomial model (excluding female) and provided separate likelihoods of students enrolling at different institutional types. An enrollment-management-based approach would allow medical schools to better manage the number of students they admit and target recruitment efforts to improve their likelihood of success. It also performs a key institutional research function for understanding failed recruitment of highly desirable candidates.
A preliminary investigation of the relationships between historical crash and naturalistic driving.

PubMed

Pande, Anurag; Chand, Sai; Saxena, Neeraj; Dixit, Vinayak; Loy, James; Wolshon, Brian; Kent, Joshua D

2017-04-01

This paper describes a project that was undertaken using naturalistic driving data collected via Global Positioning System (GPS) devices to demonstrate a proof-of-concept for proactive safety assessments of crash-prone locations. The main hypothesis for the study is that the segments where drivers have to apply hard braking (higher jerks) more frequently might be the "unsafe" segments with more crashes over the long term. The linear referencing methodology in ArcMap was used to link the GPS data with roadway characteristic data of US Highway 101 northbound (NB) and southbound (SB) in San Luis Obispo, California. The process used to merge GPS data with quarter-mile freeway segments for traditional crash frequency analysis is also discussed in the paper. A negative binomial regression analysis showed that the proportion of high magnitude jerks while decelerating on freeway segments (from the driving data) was significantly related with the long-term crash frequency of those segments. A random parameter negative binomial model with a uniformly distributed parameter for ADT and a fixed parameter for jerk provided a statistically significant estimate for quarter-mile segments. The results also indicated that roadway curvature and the presence of an auxiliary lane are not significantly related with crash frequency for the highway segments under consideration. The results from this exploration are promising since the data used to derive the explanatory variable(s) can be collected using most off-the-shelf GPS devices, including many smartphones. Copyright © 2017 Elsevier Ltd. All rights reserved.
Analysis of railroad tank car releases using a generalized binomial model.

PubMed

Liu, Xiang; Hong, Yili

2015-11-01

The United States is experiencing an unprecedented boom in shale oil production, leading to a dramatic growth in petroleum crude oil traffic by rail. In 2014, U.S. railroads carried over 500,000 tank carloads of petroleum crude oil, up from 9500 in 2008 (a 5300% increase). In light of continual growth in crude oil by rail, there is an urgent national need to manage this emerging risk. This need has been underscored in the wake of several recent crude oil release incidents. In contrast to highway transport, which usually involves a tank trailer, a crude oil train can carry a large number of tank cars, having the potential for a large, multiple-tank-car release incident. Previous studies exclusively assumed that railroad tank car releases in the same train accident are mutually independent, thereby estimating the number of tank cars releasing given the total number of tank cars derailed based on a binomial model. This paper specifically accounts for dependent tank car releases within a train accident. We estimate the number of tank cars releasing given the number of tank cars derailed based on a generalized binomial model. The generalized binomial model provides a significantly better description for the empirical tank car accident data through our numerical case study. This research aims to provide a new methodology and new insights regarding the further development of risk management strategies for improving railroad crude oil transportation safety. Copyright © 2015 Elsevier Ltd. All rights reserved.
Diagnostic test accuracy and prevalence inferences based on joint and sequential testing with finite population sampling.

PubMed

Su, Chun-Lung; Gardner, Ian A; Johnson, Wesley O

2004-07-30

The two-test two-population model, originally formulated by Hui and Walter, for estimation of test accuracy and prevalence assumes conditionally independent tests, constant accuracy across populations and binomial sampling. The binomial assumption is incorrect if all individuals in a population (e.g. a child-care centre, a village in Africa, or a cattle herd) are sampled or if the sample size is large relative to population size. In this paper, we develop statistical methods for evaluating diagnostic test accuracy and prevalence estimation based on finite sample data in the absence of a gold standard. Moreover, two tests are often applied simultaneously for the purpose of obtaining a 'joint' testing strategy that has either higher overall sensitivity or specificity than either of the two tests considered singly. Sequential versions of such strategies are often applied in order to reduce the cost of testing. We thus discuss joint (simultaneous and sequential) testing strategies and inference for them. Using the developed methods, we analyse two real and one simulated data sets, and we compare 'hypergeometric' and 'binomial-based' inferences. Our findings indicate that the posterior standard deviations for prevalence (but not sensitivity and specificity) based on finite population sampling tend to be smaller than their counterparts for infinite population sampling. Finally, we make recommendations about how small the sample size should be relative to the population size to warrant use of the binomial model for prevalence estimation. Copyright 2004 John Wiley & Sons, Ltd.
Solar San Diego: The Impact of Binomial Rate Structures on Real PV-Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Van Geet, O.; Brown, E.; Blair, T.

2008-01-01

There is confusion in the marketplace regarding the impact of solar photovoltaics (PV) on the user's actual electricity bill under California Net Energy Metering, particularly with binomial tariffs (those that include both demand and energy charges) and time-of-use (TOU) rate structures. The City of San Diego has extensive real-time electrical metering on most of its buildings and PV systems, with interval data for overall consumption and PV electrical production available for multiple years. This paper uses 2007 PV-system data from two city facilities to illustrate the impacts of binomial rate designs. The analysis will determine the energy and demand savings that the PV systems are achieving relative to the absence of systems. A financial analysis of PV-system performance under various rate structures is presented. The data revealed that actual demand and energy use benefits of binomial tariffs increase in summer months, when solar resources allow for maximized electricity production. In a binomial tariff system, varying on- and semi-peak times can result in approximately $1,100 change in demand charges per month over not having a PV system in place, an approximate 30% cost savings. The PV systems are also shown to have a 30%-50% reduction in facility energy charges in 2007. Future work will include combining demand and electricity charges and increasing the breadth of rate structures tested, including the impacts of non-coincident demand charges.
Valuating Privacy with Option Pricing Theory

NASA Astrophysics Data System (ADS)

Berthold, Stefan; Böhme, Rainer

One of the key challenges in the information society is responsible handling of personal data. An often-cited reason why people fail to make rational decisions regarding their own informational privacy is the high uncertainty about future consequences of information disclosures today. This chapter builds an analogy to financial options and draws on principles of option pricing to account for this uncertainty in the valuation of privacy. For this purpose, the development of a data subject's personal attributes over time and the development of the attribute distribution in the population are modeled as two stochastic processes, which fit into the Binomial Option Pricing Model (BOPM). Possible applications of such valuation methods to guide decision support in future privacy-enhancing technologies (PETs) are sketched.
A review on models for count data with extra zeros

NASA Astrophysics Data System (ADS)

Zamri, Nik Sarah Nik; Zamzuri, Zamira Hasanah

2017-04-01

Zero-inflated models are typically used to model count data with excess zeros. The extra zeros may be structural, or they may be random zeros that occur by chance. Data of this type are commonly found in disciplines such as finance, insurance, biomedicine, econometrics, ecology, and health sciences. In the literature, the most popular zero-inflated models are the zero-inflated Poisson and the zero-inflated negative binomial. Recently, more complex models have been developed to account for overdispersion and unobserved heterogeneity. In addition, more extended distributions are also considered in modelling data with this feature. In this paper, we review the related literature and provide a summary of recent developments in models for count data with extra zeros.
An empirical study of statistical properties of variance partition coefficients for multi-level logistic regression models

USGS Publications Warehouse

Li, Ji; Gray, B.R.; Bates, D.M.

2008-01-01

Partitioning the variance of a response by design levels is challenging for binomial and other discrete outcomes. Goldstein (2003) proposed four definitions for variance partitioning coefficients (VPC) under a two-level logistic regression model. In this study, we explicitly derived formulae for multi-level logistic regression models and subsequently studied the distributional properties of the calculated VPCs. Using simulations and a vegetation dataset, we demonstrated associations between different VPC definitions, the importance of methods for estimating VPCs (by comparing VPCs obtained using Laplace and penalized quasilikelihood methods), and bivariate dependence between VPCs calculated at different levels. Such an empirical study lends immediate support to wider applications of VPC in scientific data analysis.
Assessment of NDE reliability data

NASA Technical Reports Server (NTRS)

Yee, B. G. W.; Couchman, J. C.; Chang, F. H.; Packman, D. F.

1975-01-01

Twenty sets of relevant nondestructive test (NDT) reliability data were identified, collected, compiled, and categorized. A criterion for the selection of data for statistical analysis considerations was formulated, and a model to grade the quality and validity of the data sets was developed. Data input formats, which record the pertinent parameters of the defect/specimen and inspection procedures, were formulated for each NDE method. A comprehensive computer program was written and debugged to calculate the probability of flaw detection at several confidence limits by the binomial distribution. This program also selects the desired data sets for pooling and tests the statistical pooling criteria before calculating the composite detection reliability. An example of the calculated reliability of crack detection in bolt holes by an automatic eddy current method is presented.

Bayesian sample size calculations in phase II clinical trials using a mixture of informative priors.

PubMed

Gajewski, Byron J; Mayo, Matthew S

2006-08-15

A number of researchers have discussed phase II clinical trials from a Bayesian perspective. A recent article by Mayo and Gajewski focuses on sample size calculations, which they determine by specifying an informative prior distribution and then calculating a posterior probability that the true response will exceed a prespecified target. In this article, we extend these sample size calculations to include a mixture of informative prior distributions. The mixture comes from several sources of information. For example, consider information from two (or more) clinicians. The first clinician is pessimistic about the drug and the second clinician is optimistic. We tabulate the results for sample size design using the fact that the simple mixture of Betas is a conjugate family for the Beta-Binomial model. We discuss the theoretical framework for these types of Bayesian designs and show that the Bayesian designs in this paper approximate this theoretical framework. Copyright 2006 John Wiley & Sons, Ltd.
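The conjugacy mentioned in this abstract can be sketched in a few lines of Python. The sketch assumes a prior that is a finite mixture of Beta(a, b) components and binomial data; each component is updated as usual and the mixture weights are rescaled by each component's marginal likelihood. The function names and the example priors are hypothetical, and the sample size search itself is not reproduced here.

```python
from math import exp, lgamma, log


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def update_beta_mixture(components, successes, n):
    """Posterior of a Beta-mixture prior after `successes` out of `n` trials.

    components: list of (weight, a, b) with positive weights summing to 1.
    Returns a list of (posterior_weight, a + successes, b + failures).
    """
    failures = n - successes
    # Each weight is scaled by B(a + s, b + f) / B(a, b); the binomial
    # coefficient is common to all components and cancels on normalizing.
    log_w = [
        log(w) + log_beta(a + successes, b + failures) - log_beta(a, b)
        for w, a, b in components
    ]
    shift = max(log_w)
    unnorm = [exp(lw - shift) for lw in log_w]
    total = sum(unnorm)
    return [
        (u / total, a + successes, b + failures)
        for u, (_, a, b) in zip(unnorm, components)
    ]


def mixture_mean(components):
    """Mean response rate under a Beta mixture."""
    return sum(w * a / (a + b) for w, a, b in components)
```

With a pessimistic prior Beta(2, 8) and an optimistic prior Beta(8, 2) weighted equally, `update_beta_mixture([(0.5, 2, 8), (0.5, 8, 2)], successes=8, n=10)` shifts most of the weight to the optimistic component.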
Making sense of sparse rating data in collaborative filtering via topographic organization of user preference patterns.

PubMed

Polcicová, Gabriela; Tino, Peter

2004-01-01

We introduce topographic versions of two latent class models (LCM) for collaborative filtering. Latent classes are topologically organized on a square grid. Topographic organization of latent classes makes orientation in rating/preference patterns captured by the latent classes easier and more systematic. The variation in film rating patterns is modelled by multinomial and binomial distributions with varying independence assumptions. In the first stage of topographic LCM construction, self-organizing maps with neural field organized according to the LCM topology are employed. We apply our system to a large collection of user ratings for films. The system can provide useful visualization plots unveiling user preference patterns buried in the data, without losing the potential to be a good recommender model. It appears that the multinomial distribution is most adequate if the model is regularized by tight grid topologies. Since we deal with probabilistic models of the data, we can readily use tools from probability and information theories to interpret and visualize information extracted by our system.
Improved confidence intervals when the sample is counted an integer times longer than the blank.

PubMed

Potter, William Edward; Strzelczyk, Jadwiga Jodi

2011-05-01

Past computer solutions for confidence intervals in paired counting are extended to the case where the ratio of the sample count time to the blank count time is taken to be an integer, IRR. Previously, confidence intervals have been named Neyman-Pearson confidence intervals; more correctly they should have been named Neyman confidence intervals or simply confidence intervals. The technique utilized mimics a technique used by Pearson and Hartley to tabulate confidence intervals for the expected value of the discrete Poisson and Binomial distributions. The blank count and the contribution of the sample to the gross count are assumed to be Poisson distributed. The expected value of the blank count, in the sample count time, is assumed known. The net count, OC, is taken to be the gross count minus the product of IRR with the blank count. The probability density function (PDF) for the net count can be determined in a straightforward manner.
The spatial distribution of fixed mutations within genes coding for proteins

NASA Technical Reports Server (NTRS)

Holmquist, R.; Goodman, M.; Conroy, T.; Czelusniak, J.

1983-01-01

An examination has been conducted of the extensive amino acid sequence data now available for five protein families - the alpha crystallin A chain, myoglobin, alpha and beta hemoglobin, and the cytochromes c - with the goal of estimating the true spatial distribution of base substitutions within genes that code for proteins. In every case the commonly used Poisson density failed to even approximate the experimental pattern of base substitution. For the 87 species of beta hemoglobin examined, for example, the probability that the observed results were from a Poisson process was the minuscule 10 to the -44th. Analogous results were obtained for the other functional families. All the data were reasonably, but not perfectly, described by the negative binomial density. In particular, most of the data were described by one of the very simple limiting forms of this density, the geometric density. The implications of this for evolutionary inference are discussed. It is evident that most estimates of total base substitutions between genes are badly in need of revision.
Taxonomy of the order Mononegavirales: update 2017.

PubMed

Amarasinghe, Gaya K; Bào, Yīmíng; Basler, Christopher F; Bavari, Sina; Beer, Martin; Bejerman, Nicolás; Blasdell, Kim R; Bochnowski, Alisa; Briese, Thomas; Bukreyev, Alexander; Calisher, Charles H; Chandran, Kartik; Collins, Peter L; Dietzgen, Ralf G; Dolnik, Olga; Dürrwald, Ralf; Dye, John M; Easton, Andrew J; Ebihara, Hideki; Fang, Qi; Formenty, Pierre; Fouchier, Ron A M; Ghedin, Elodie; Harding, Robert M; Hewson, Roger; Higgins, Colleen M; Hong, Jian; Horie, Masayuki; James, Anthony P; Jiāng, Dàohóng; Kobinger, Gary P; Kondo, Hideki; Kurath, Gael; Lamb, Robert A; Lee, Benhur; Leroy, Eric M; Li, Ming; Maisner, Andrea; Mühlberger, Elke; Netesov, Sergey V; Nowotny, Norbert; Patterson, Jean L; Payne, Susan L; Paweska, Janusz T; Pearson, Michael N; Randall, Rick E; Revill, Peter A; Rima, Bertus K; Rota, Paul; Rubbenstroth, Dennis; Schwemmle, Martin; Smither, Sophie J; Song, Qisheng; Stone, David M; Takada, Ayato; Terregino, Calogero; Tesh, Robert B; Tomonaga, Keizo; Tordo, Noël; Towner, Jonathan S; Vasilakis, Nikos; Volchkov, Viktor E; Wahl-Jensen, Victoria; Walker, Peter J; Wang, Beibei; Wang, David; Wang, Fei; Wang, Lin-Fa; Werren, John H; Whitfield, Anna E; Yan, Zhichao; Ye, Gongyin; Kuhn, Jens H

2017-08-01

In 2017, the order Mononegavirales was expanded by the inclusion of a total of 69 novel species. Five new rhabdovirus genera and one new nyamivirus genus were established to harbor 41 of these species, whereas the remaining new species were assigned to already established genera. Furthermore, non-Latinized binomial species names replaced all paramyxovirus and pneumovirus species names, thereby accomplishing application of binomial species names throughout the entire order. This article presents the updated taxonomy of the order Mononegavirales as now accepted by the International Committee on Taxonomy of Viruses (ICTV).
Categorical Data Analysis Using a Skewed Weibull Regression Model

NASA Astrophysics Data System (ADS)

Caron, Renault; Sinha, Debajyoti; Dey, Dipak; Polpo, Adriano

2018-03-01

In this paper, we present a Weibull link (skewed) model for categorical response data arising from binomial as well as multinomial models. We show that, for such types of categorical data, the most commonly used models (logit, probit and complementary log-log) can be obtained as limiting cases. We further compare the proposed model with some other asymmetrical models. The Bayesian as well as frequentist estimation procedures for binomial and multinomial data responses are presented in detail. Two data sets are analysed to show the efficiency of the proposed model.
QMRA for Drinking Water: 2. The Effect of Pathogen Clustering in Single-Hit Dose-Response Models.

PubMed

Nilsen, Vegard; Wyller, John

2016-01-01

Spatial and/or temporal clustering of pathogens will invalidate the commonly used assumption of Poisson-distributed pathogen counts (doses) in quantitative microbial risk assessment. In this work, the theoretically predicted effect of spatial clustering in conventional "single-hit" dose-response models is investigated by employing the stuttering Poisson distribution, a very general family of count distributions that naturally models pathogen clustering and contains the Poisson and negative binomial distributions as special cases. The analysis is facilitated by formulating the dose-response models in terms of probability generating functions. It is shown formally that the theoretical single-hit risk obtained with a stuttering Poisson distribution is lower than that obtained with a Poisson distribution, assuming identical mean doses. A similar result holds for mixed Poisson distributions. Numerical examples indicate that the theoretical single-hit risk is fairly insensitive to moderate clustering, though the effect tends to be more pronounced for low mean doses. Furthermore, using Jensen's inequality, an upper bound on risk is derived that tends to better approximate the exact theoretical single-hit risk for highly overdispersed dose distributions. The bound holds with any dose distribution (characterized by its mean and zero inflation index) and any conditional dose-response model that is concave in the dose variable. Its application is exemplified with published data from Norovirus feeding trials, for which some of the administered doses were prepared from an inoculum of aggregated viruses. The potential implications of clustering for dose-response assessment as well as practical risk characterization are discussed. © 2016 Society for Risk Analysis.
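The probability-generating-function formulation in this abstract can be illustrated with a short Python sketch. It assumes the simplest single-hit model, in which each ingested organism independently causes infection with a fixed probability r, so the risk is 1 - G(1 - r) for a dose distribution with PGF G. The function names and the negative binomial parameterisation (mean dose and dispersion k) are assumptions made for this sketch, not the authors' notation.

```python
from math import exp


def single_hit_risk(pgf, r):
    """Single-hit risk 1 - G(1 - r) for a dose distribution with PGF `pgf`."""
    return 1.0 - pgf(1.0 - r)


def poisson_pgf(mean_dose):
    """PGF of a Poisson dose with the given mean."""
    return lambda s: exp(-mean_dose * (1.0 - s))


def negbin_pgf(mean_dose, k):
    """PGF of a negative binomial dose with the given mean and dispersion k.

    Smaller k means stronger clustering; k -> infinity recovers the Poisson.
    """
    return lambda s: (1.0 + mean_dose * (1.0 - s) / k) ** (-k)
```

Comparing `single_hit_risk(negbin_pgf(1.0, 0.5), 0.5)` with `single_hit_risk(poisson_pgf(1.0), 0.5)` gives a lower risk for the clustered dose at the same mean, in line with the result stated in the abstract for the Poisson versus overdispersed case.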
Golay Complementary Waveforms in Reed–Müller Sequences for Radar Detection of Nonzero Doppler Targets

PubMed Central

Wang, Xuezhi; Huang, Xiaotao; Suvorova, Sofia; Moran, Bill

2018-01-01

Golay complementary waveforms can, in theory, yield radar returns of high range resolution with essentially zero sidelobes. In practice, when deployed conventionally, while high signal-to-noise ratios can be achieved for static target detection, significant range sidelobes are generated by target returns of nonzero Doppler, causing unreliable detection. We consider signal processing techniques using Golay complementary waveforms to improve radar detection performance in scenarios involving multiple nonzero Doppler targets. A signal processing procedure based on an existing, so-called Binomial Design algorithm that alters the transmission order of Golay complementary waveforms and weights the returns is proposed in an attempt to achieve an enhanced illumination performance. The procedure applies one of three proposed waveform transmission ordering algorithms, followed by a pointwise nonlinear processor combining the outputs of the Binomial Design algorithm and one of the ordering algorithms. The computational complexities of the Binomial Design algorithm and the three ordering algorithms are compared, and a statistical analysis of the performance of the pointwise nonlinear processing is given. Estimation of the areas in the Delay–Doppler map occupied by significant range sidelobes for given targets is also discussed. Numerical simulations for the comparison of the performances of the Binomial Design algorithm and the three ordering algorithms are presented for both fixed and randomized target locations. The simulation results demonstrate that the proposed signal processing procedure has a better detection performance in terms of lower sidelobes and higher Doppler resolution in the presence of multiple nonzero Doppler targets compared to existing methods. PMID:29324708
Some considerations for excess zeroes in substance abuse research.

PubMed

Bandyopadhyay, Dipankar; DeSantis, Stacia M; Korte, Jeffrey E; Brady, Kathleen T

2011-09-01

Count data collected in substance abuse research often come with an excess of "zeroes," which are typically handled using zero-inflated regression models. However, there is a need to consider the design aspects of those studies before using such a statistical model to ascertain the sources of zeroes. We sought to illustrate hurdle models as alternatives to zero-inflated models to validate a two-stage decision-making process in situations of "excess zeroes." We use data from a study of 45 cocaine-dependent subjects where the primary scientific question was to evaluate whether study participation influences drug-seeking behavior. The outcome, "the frequency (count) of cocaine use days per week," is bounded (ranging from 0 to 7). We fit and compare binomial, Poisson, negative binomial, and the hurdle version of these models to study the effect of gender, age, time, and study participation on cocaine use. The hurdle binomial model provides the best fit. Gender and time are not predictive of use. Higher odds of use versus no use are associated with age; however once use is experienced, odds of further use decrease with increase in age. Participation was associated with higher odds of no-cocaine use; once there is use, participation reduced the odds of further use. Age and study participation are significantly predictive of cocaine-use behavior. The two-stage decision process as modeled by a hurdle binomial model (appropriate for bounded count data with excess zeroes) provides interesting insights into the study of covariate effects on count responses of substance use, when all enrolled subjects are believed to be "at-risk" of use.
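A hurdle binomial distribution for a bounded count, such as use days per week, can be sketched in Python as follows. The sketch assumes a two-part model: a probability of zero, and a zero-truncated binomial for the positive counts. It has no covariates, whereas the study models both parts through regression; the function name and parameter names are hypothetical.

```python
from math import comb


def hurdle_binomial_pmf(y, p_zero, p, n=7):
    """P(Y = y) under a hurdle binomial model on 0..n.

    p_zero: probability of no use at all (the hurdle part).
    p: per-day use probability given that the hurdle is crossed (0 < p <= 1).
    n: upper bound of the count (7 days per week by default).
    """
    if y == 0:
        return p_zero
    binom = comb(n, y) * p**y * (1 - p) ** (n - y)
    # Rescale so the positive counts follow a zero-truncated binomial.
    return (1 - p_zero) * binom / (1 - (1 - p) ** n)
```

Unlike a zero-inflated model, all zeros here come from the first stage, which matches the two-stage decision process (any use versus no use, then how much use) described in the abstract.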
Measurement of higher cumulants of net-charge multiplicity distributions in Au + Au collisions at √s_NN = 7.7–200 GeV

DOE PAGES

Adare, A.; Afanasiev, S.; Aidala, C.; ...

2016-01-19

Our report presents the measurement of cumulants (C_n, n = 1, ..., 4) of the net-charge distributions measured within pseudorapidity (|η| < 0.35) in Au+Au collisions at √s_NN = 7.7–200 GeV with the PHENIX experiment at the Relativistic Heavy Ion Collider. The ratios of cumulants (e.g., C_1/C_2, C_3/C_1) of the net-charge distributions, which can be related to volume independent susceptibility ratios, are studied as a function of centrality and energy. These quantities are important to understand the quantum-chromodynamics phase diagram and possible existence of a critical end point. The measured values are very well described by expectation from negative binomial distributions. We do not observe any nonmonotonic behavior in the ratios of the cumulants as a function of collision energy. These measured values of C_1/C_2 and C_3/C_1 can be directly compared to lattice quantum-chromodynamics calculations and thus allow extraction of both the chemical freeze-out temperature and the baryon chemical potential at each center-of-mass energy. Moreover, the extracted baryon chemical potentials are in excellent agreement with a thermal-statistical analysis model.
Finite-key analysis for quantum key distribution with weak coherent pulses based on Bernoulli sampling

NASA Astrophysics Data System (ADS)

Kawakami, Shun; Sasaki, Toshihiko; Koashi, Masato

2017-07-01

An essential step in quantum key distribution is the estimation of parameters related to the leaked amount of information, which is usually done by sampling of the communication data. When the data size is finite, the final key rate depends on how the estimation process handles statistical fluctuations. Many of the present security analyses are based on the method with simple random sampling, where hypergeometric distribution or its known bounds are used for the estimation. Here we propose a concise method based on Bernoulli sampling, which is related to binomial distribution. Our method is suitable for the Bennett-Brassard 1984 (BB84) protocol with weak coherent pulses [C. H. Bennett and G. Brassard, Proceedings of the IEEE Conference on Computers, Systems and Signal Processing (IEEE, New York, 1984), Vol. 175], reducing the number of estimated parameters to achieve a higher key generation rate compared to the method with simple random sampling. We also apply the method to prove the security of the differential-quadrature-phase-shift (DQPS) protocol in the finite-key regime. The result indicates that the advantage of the DQPS protocol over the phase-encoding BB84 protocol in terms of the key rate, which was previously confirmed in the asymptotic regime, persists in the finite-key regime.
Estimating relative risks for common outcome using PROC NLP.

PubMed

Yu, Binbing; Wang, Zhuoqiao

2008-05-01

In cross-sectional or cohort studies with binary outcomes, it is biologically interpretable and of interest to estimate the relative risk or prevalence ratio, especially when the response rates are not rare. Several methods have been used to estimate the relative risk, among which the log-binomial models yield the maximum likelihood estimate (MLE) of the parameters. Because of restrictions on the parameter space, the log-binomial models often run into convergence problems. Some remedies, e.g., the Poisson and Cox regressions, have been proposed. However, these methods may give out-of-bound predicted response probabilities. In this paper, a new computation method using the SAS Nonlinear Programming (NLP) procedure is proposed to find the MLEs. The proposed NLP method was compared to the COPY method, a modified method to fit the log-binomial model. Issues in the implementation are discussed. For illustration, both methods were applied to data on the prevalence of microalbuminuria (micro-protein leakage into urine) for kidney disease patients from the Diabetes Control and Complications Trial. The sample SAS macro for calculating relative risk is provided in the appendix.
Collective Human Mobility Pattern from Taxi Trips in Urban Area

PubMed Central

Peng, Chengbin; Jin, Xiaogang; Wong, Ka-Chun; Shi, Meixia; Liò, Pietro

2012-01-01

We analyze the passengers' traffic pattern for 1.58 million taxi trips in Shanghai, China. By employing non-negative matrix factorization and optimization methods, we find that people travel on workdays mainly for three purposes: commuting between home and workplace, traveling from workplace to workplace, and others such as leisure activities. Therefore, traffic flow in one area or between any pair of locations can be approximated by a linear combination of three basis flows, corresponding to the three purposes respectively. We name the coefficients in the linear combination as traffic powers, each of which indicates the strength of each basis flow. The traffic powers on different days are typically different even for the same location, due to the uncertainty of human motion. Therefore, we provide a probability distribution function for the relative deviation of the traffic power. This distribution function is in terms of a series of functions for normalized binomial distributions. It can be well explained by statistical theories and is verified by empirical data. These findings are applicable in predicting the road traffic, tracing the traffic pattern and diagnosing traffic-related abnormal events. These results can also be used to infer land uses of urban areas quite parsimoniously. PMID:22529917
Variable selection for distribution-free models for longitudinal zero-inflated count responses.

PubMed

Chen, Tian; Wu, Pan; Tang, Wan; Zhang, Hui; Feng, Changyong; Kowalski, Jeanne; Tu, Xin M

2016-07-20

Zero-inflated count outcomes arise quite often in research and practice. Parametric models such as the zero-inflated Poisson and zero-inflated negative binomial are widely used to model such responses. Like most parametric models, they are quite sensitive to departures from assumed distributions. Recently, new approaches have been proposed to provide distribution-free, or semi-parametric, alternatives. These methods extend the generalized estimating equations to provide robust inference for population mixtures defined by zero-inflated count outcomes. In this paper, we propose methods to extend smoothly clipped absolute deviation (SCAD)-based variable selection methods to these new models. Variable selection has been gaining popularity in modern clinical research studies, as determining differential treatment effects of interventions for different subgroups has become the norm, rather than the exception, in the era of patient-centered outcome research. Such moderation analysis in general creates many explanatory variables in regression analysis, and the advantages of SCAD-based methods over their traditional counterparts render them a great choice for addressing this important and timely issue in clinical research. We illustrate the proposed approach with both simulated and real study data. Copyright © 2016 John Wiley & Sons, Ltd.
Binomial tree method for pricing a regime-switching volatility stock loans

NASA Astrophysics Data System (ADS)

Putri, Endah R. M.; Zamani, Muhammad S.; Utomo, Daryono B.

2018-03-01

A binomial model with regime switching may represent the price of a stock loan that follows a stochastic process. A stock loan is one alternative that appeals to investors who want liquidity without selling the stock. The stock loan mechanism resembles that of an American call option, in that the holder can exercise at any time during the contract period. Because the two mechanisms are similar, the price of a stock loan can be determined from the model of an American call option. The simulation results show the behavior of the stock loan price under regime switching with respect to various interest rates and maturities.
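As a rough illustration of the binomial tree idea behind the abstract above, the sketch below prices a plain American call on a Cox-Ross-Rubinstein tree with a single, constant volatility. It is not the authors' regime-switching stock loan model; the function name `american_call_crr` and all parameter values are assumptions chosen for the example.

```python
import math


def american_call_crr(s0, strike, rate, sigma, maturity, steps):
    """Price an American call on a Cox-Ross-Rubinstein binomial tree."""
    dt = maturity / steps
    up = math.exp(sigma * math.sqrt(dt))
    down = 1.0 / up
    disc = math.exp(-rate * dt)
    # Risk-neutral probability of an up move.
    q = (math.exp(rate * dt) - down) / (up - down)

    # Payoffs at maturity, indexed by the number of up moves j.
    values = [max(s0 * up**j * down**(steps - j) - strike, 0.0)
              for j in range(steps + 1)]

    # Backward induction: at each node keep the larger of the
    # discounted continuation value and the immediate exercise value.
    for n in range(steps - 1, -1, -1):
        for j in range(n + 1):
            continuation = disc * (q * values[j + 1] + (1 - q) * values[j])
            exercise = s0 * up**j * down**(n - j) - strike
            values[j] = max(continuation, exercise)
    return values[0]


if __name__ == "__main__":
    # Hypothetical inputs: spot 100, strike 100, 5% rate, 20% volatility, 1 year.
    print(american_call_crr(100.0, 100.0, 0.05, 0.2, 1.0, 200))
```

A regime-switching version would need one tree (or one value vector) per volatility regime plus transition probabilities between regimes, which the abstract does not specify.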
Umbral Calculus and Holonomic Modules in Positive Characteristic

NASA Astrophysics Data System (ADS)

Kochubei, Anatoly N.

2006-03-01

In the framework of analysis over local fields of positive characteristic, we develop algebraic tools for introducing and investigating various polynomial systems. In this survey paper we describe a function field version of umbral calculus developed on the basis of a relation of binomial type satisfied by the Carlitz polynomials. We consider modules over the Weyl-Carlitz ring, a function field counterpart of the Weyl algebra. It is shown that some basic objects of function field arithmetic, like the Carlitz module, Thakur's hypergeometric polynomials, and analogs of binomial coefficients arising in the positive characteristic version of umbral calculus, generate holonomic modules.
Modelling parasite aggregation: disentangling statistical and ecological approaches.

PubMed

Yakob, Laith; Soares Magalhães, Ricardo J; Gray, Darren J; Milinovich, Gabriel; Wardrop, Nicola; Dunning, Rebecca; Barendregt, Jan; Bieri, Franziska; Williams, Gail M; Clements, Archie C A

2014-05-01

The overdispersion in macroparasite infection intensity among host populations is commonly simulated using a constant negative binomial aggregation parameter. We describe an alternative to utilising the negative binomial approach and demonstrate important disparities in intervention efficacy projections that can come about from opting for pattern-fitting models that are not process-explicit. We present model output in the context of the epidemiology and control of soil-transmitted helminths due to the significant public health burden imposed by these parasites, but our methods are applicable to other infections with demonstrable aggregation in parasite numbers among hosts. Copyright © 2014. Published by Elsevier Ltd.
A Binomial Modeling Approach for Upscaling Colloid Transport Under Unfavorable Attachment Conditions: Emergent Prediction of Nonmonotonic Retention Profiles

NASA Astrophysics Data System (ADS)

Hilpert, Markus; Johnson, William P.

2018-01-01

We used a recently developed simple mathematical network model to upscale pore-scale colloid transport information determined under unfavorable attachment conditions. Classical log-linear and nonmonotonic retention profiles, both well-reported under favorable and unfavorable attachment conditions, respectively, emerged from our upscaling. The primary attribute of the network is colloid transfer between bulk pore fluid, the near-surface fluid domain (NSFD), and attachment (treated as irreversible). The network model accounts for colloid transfer to the NSFD of downgradient grains and for reentrainment to bulk pore fluid via diffusion or via expulsion at rear flow stagnation zones (RFSZs). The model describes colloid transport by a sequence of random trials in a one-dimensional (1-D) network of Happel cells, which contain a grain and a pore. Using combinatorial analysis that capitalizes on the binomial coefficient, we derived from the pore-scale information the theoretical residence time distribution of colloids in the network. The transition from log-linear to nonmonotonic retention profiles occurs when the conditions underlying classical filtration theory are not fulfilled, i.e., when an NSFD colloid population is maintained. Then, nonmonotonic retention profiles result potentially both for attached and NSFD colloids. The concentration maxima shift downgradient depending on specific parameter choice. The concentration maxima were also shown to shift downgradient temporally (with continued elution) under conditions where attachment is negligible, explaining experimentally observed downgradient transport of retained concentration maxima of adhesion-deficient bacteria. For the case of zero reentrainment, we develop closed-form, analytical expressions for the shape, and the maximum of the colloid retention profile.
A binomial modeling approach for upscaling colloid transport under unfavorable conditions: Emergent prediction of extended tailing

NASA Astrophysics Data System (ADS)

Hilpert, Markus; Rasmuson, Anna; Johnson, William P.

2017-07-01

Colloid transport in saturated porous media is significantly influenced by colloidal interactions with grain surfaces. Near-surface fluid domain colloids experience relatively low fluid drag and relatively strong colloidal forces that slow their downgradient translation relative to colloids in bulk fluid. Near-surface fluid domain colloids may reenter into the bulk fluid via diffusion (nanoparticles) or expulsion at rear flow stagnation zones, they may immobilize (attach) via primary minimum interactions, or they may move along a grain-to-grain contact to the near-surface fluid domain of an adjacent grain. We introduce a simple model that accounts for all possible permutations of mass transfer within a dual pore and grain network. The primary phenomena thereby represented in the model are mass transfer of colloids between the bulk and near-surface fluid domains and immobilization. Colloid movement is described by a Markov chain, i.e., a sequence of trials in a 1-D network of unit cells, which contain a pore and a grain. Using combinatorial analysis, which utilizes the binomial coefficient, we derive the residence time distribution, i.e., an inventory of the discrete colloid travel times through the network and of their probabilities to occur. To parameterize the network model, we performed mechanistic pore-scale simulations in a single unit cell that determined the likelihoods and timescales associated with the above colloid mass transfer processes. We found that intergrain transport of colloids in the near-surface fluid domain can cause extended tailing, which has traditionally been attributed to hydrodynamic dispersion emanating from flow tortuosity of solute trajectories.

The Effect of Two Receivers on Broadcast Molecular Communication Systems.

PubMed

Lu, Yi; Higgins, Matthew D; Noel, Adam; Leeson, Mark S; Chen, Yunfei

2016-12-01

Molecular communication is a paradigm that utilizes molecules to exchange information between nano-machines. When considering such systems where multiple receivers are present, prior work has assumed for simplicity that they do not interfere with each other. This paper aims to address this issue and shows to what extent an interfering receiver, [Formula: see text], will have an impact on the target receiver, [Formula: see text], with respect to Bit Error Rate (BER) and capacity. Furthermore, approximations of the Binomial distribution are applied to reduce the complexity of calculations. Results show the sensitivity in communication performance due to the relative location of the interfering receiver. Critically, placing [Formula: see text] between the transmitter [Formula: see text] and [Formula: see text] causes a significant increase in BER or decrease in capacity.
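The abstract above does not say which approximations of the binomial distribution were used. As one common example, the sketch below compares the exact binomial probabilities with the Poisson approximation for many trials and a small success probability. The values of `n` and `p` are hypothetical and are not taken from the paper.

```python
import math


def binomial_pmf(k, n, p):
    """Exact P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


# Hypothetical setting: many emitted molecules, each with a small
# probability of being absorbed by the receiver.
n, p = 1000, 0.002

# Largest pointwise gap between the exact and approximate probabilities.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, n * p))
              for k in range(21))

if __name__ == "__main__":
    print("largest pointwise gap:", max_gap)
```

The Poisson form avoids large binomial coefficients, which is one reason such approximations reduce computational cost when n is large.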
Misconduct Within the "Four Walls": Does Organizational Justice Matter in Explaining Prison Officers' Misconduct and Job Stress?

PubMed

Boateng, Francis D; Hsieh, Ming-Li

2018-06-01

Primarily, this article examines the role of organizational justice in understanding prison officers' behavior. The authors surveyed 169 correctional officers across five correctional facilities in Ghana to explore the role of three organizational justice dimensions in prison misconduct and job stress. Results from the negative binomial and ordinal logistic analyses revealed the significant contributions of two dimensions of organizational justice in explaining misconduct and stress among officers. Officers who had higher perceptions of distributive fairness and interaction in the organization had lower odds of receiving misconduct-related complaints. Also, greater interaction was found to be associated with reduced job stress among prison officers. In addition, several officers' characteristics were found to predict the number of times officers received misconduct complaints.
Financial Data Analysis by means of Coupled Continuous-Time Random Walk in Rachev-Rüschendorf Model

NASA Astrophysics Data System (ADS)

Jurlewicz, A.; Wyłomańska, A.; Żebrowski, P.

2008-09-01

We adapt the continuous-time random walk formalism to describe asset price evolution. We expand the idea proposed by Rachev and Rüschendorf, who analyzed the binomial pricing model in discrete time with randomization of the number of price changes. As a result, in the framework of the proposed model we obtain a mixture of the Gaussian and a generalized arcsine law as the limiting distribution of log-returns. Moreover, we derive a European-call-option price that is an extension of the Black-Scholes formula. We apply the obtained theoretical results to model actual financial data and try to show that the continuous-time random walk offers alternative tools to deal with several complex issues of financial markets.
Statistical distributions of earthquake numbers: consequence of branching process

NASA Astrophysics Data System (ADS)

Kagan, Yan Y.

2010-03-01

We discuss various statistical distributions of earthquake numbers. Previously, we derived several discrete distributions to describe earthquake numbers for the branching model of earthquake occurrence: these distributions are the Poisson, geometric, logarithmic and the negative binomial (NBD). The theoretical model is the `birth and immigration' population process. The first three distributions above can be considered special cases of the NBD. In particular, a point branching process along the magnitude (or log seismic moment) axis with independent events (immigrants) explains the magnitude/moment-frequency relation and the NBD of earthquake counts in large time/space windows, as well as the dependence of the NBD parameters on the magnitude threshold (magnitude of an earthquake catalogue completeness). We discuss applying these distributions, especially the NBD, to approximate event numbers in earthquake catalogues. There are many different representations of the NBD. Most can be traced either to the Pascal distribution or to the mixture of the Poisson distribution with the gamma law. We discuss advantages and drawbacks of both representations for statistical analysis of earthquake catalogues. We also consider applying the NBD to earthquake forecasts and describe the limits of the application for the given equations. In contrast to the one-parameter Poisson distribution so widely used to describe earthquake occurrence, the NBD has two parameters. The second parameter can be used to characterize clustering or overdispersion of a process. We determine the parameter values and their uncertainties for several local and global catalogues, and their subdivisions in various time intervals, magnitude thresholds, spatial windows, and tectonic categories. The theoretical model of how the clustering parameter depends on the corner (maximum) magnitude can be used to predict future earthquake number distribution in regions where very large earthquakes have not yet occurred.
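The relationship between the NBD and its special cases described in the abstract above can be checked numerically. The sketch below uses one common parameterization (number of failures before the r-th success, with r allowed to be non-integer); the function names and parameter values are assumptions for illustration, and the paper itself discusses several alternative representations.

```python
import math


def nbd_pmf(k, r, p):
    """P(K = k) for the negative binomial with shape r and success probability p."""
    log_coeff = math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
    return math.exp(log_coeff + r * math.log(p) + k * math.log1p(-p))


def nbd_pmf_from_mean(k, mean, r):
    """Same distribution, parameterized by its mean and shape r."""
    return nbd_pmf(k, r, r / (r + mean))


def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def moments(pmf, k_max=400):
    """Mean and variance of a count distribution, summed up to k_max."""
    mean = sum(k * pmf(k) for k in range(k_max + 1))
    second = sum(k * k * pmf(k) for k in range(k_max + 1))
    return mean, second - mean * mean


if __name__ == "__main__":
    # r = 1 gives the geometric distribution p * (1 - p)**k.
    print(nbd_pmf(3, 1.0, 0.4), 0.4 * 0.6**3)
    # Very large r with a fixed mean approaches the Poisson distribution.
    print(nbd_pmf_from_mean(2, 3.0, 1e6), poisson_pmf(2, 3.0))
    # The second parameter adds variance: var = mean + mean**2 / r.
    print(moments(lambda k: nbd_pmf_from_mean(k, 3.0, 2.0)))
```

In this parameterization, a smaller r means stronger clustering (overdispersion) for the same mean, which matches the role the abstract gives to the second NBD parameter.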
An examination of sources of sensitivity of consumer surplus estimates in travel cost models.

PubMed

Blaine, Thomas W; Lichtkoppler, Frank R; Bader, Timothy J; Hartman, Travis J; Lucente, Joseph E

2015-03-15

We examine sensitivity of estimates of recreation demand using the Travel Cost Method (TCM) to four factors. Three of the four have been routinely and widely discussed in the TCM literature: a) Poisson versus negative binomial regression; b) application of Englin correction to account for endogenous stratification; c) truncation of the data set to eliminate outliers. A fourth issue we address has not been widely modeled: the potential effect on recreation demand of the interaction between income and travel cost. We provide a straightforward comparison of all four factors, analyzing the impact of each on regression parameters and consumer surplus estimates. Truncation has a modest effect on estimates obtained from the Poisson models but a radical effect on the estimates obtained by way of the negative binomial. Inclusion of an income-travel cost interaction term generally produces a more conservative but not a statistically significantly different estimate of consumer surplus in both Poisson and negative binomial models. It also generates broader confidence intervals. Application of truncation, the Englin correction and the income-travel cost interaction produced the most conservative estimates of consumer surplus and eliminated the statistical difference between the Poisson and the negative binomial. Use of the income-travel cost interaction term reveals that for visitors who face relatively low travel costs, the relationship between income and travel demand is negative, while it is positive for those who face high travel costs. This provides an explanation of the ambiguities on the findings regarding the role of income widely observed in the TCM literature. Our results suggest that policies that reduce access to publicly owned resources inordinately impact local low income recreationists and are contrary to environmental justice. Copyright © 2014 Elsevier Ltd. All rights reserved.
Partitioning Detectability Components in Populations Subject to Within-Season Temporary Emigration Using Binomial Mixture Models

PubMed Central

O’Donnell, Katherine M.; Thompson, Frank R.; Semlitsch, Raymond D.

2015-01-01

Detectability of individual animals is highly variable and nearly always < 1; imperfect detection must be accounted for to reliably estimate population sizes and trends. Hierarchical models can simultaneously estimate abundance and effective detection probability, but there are several different mechanisms that cause variation in detectability. Neglecting temporary emigration can lead to biased population estimates because availability and conditional detection probability are confounded. In this study, we extend previous hierarchical binomial mixture models to account for multiple sources of variation in detectability. The state process of the hierarchical model describes ecological mechanisms that generate spatial and temporal patterns in abundance, while the observation model accounts for the imperfect nature of counting individuals due to temporary emigration and false absences. We illustrate our model’s potential advantages, including the allowance of temporary emigration between sampling periods, with a case study of southern red-backed salamanders Plethodon serratus. We fit our model and a standard binomial mixture model to counts of terrestrial salamanders surveyed at 40 sites during 3–5 surveys each spring and fall 2010–2012. Our models generated similar parameter estimates to standard binomial mixture models. Aspect was the best predictor of salamander abundance in our case study; abundance increased as aspect became more northeasterly. Increased time-since-rainfall strongly decreased salamander surface activity (i.e. availability for sampling), while higher amounts of woody cover objects and rocks increased conditional detection probability (i.e. probability of capture, given an animal is exposed to sampling). By explicitly accounting for both components of detectability, we increased congruence between our statistical modeling and our ecological understanding of the system. 
We stress the importance of choosing survey locations and protocols that maximize species availability and conditional detection probability to increase population parameter estimate reliability. PMID:25775182
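To make the structure of a standard binomial mixture (N-mixture) model concrete, the sketch below evaluates the likelihood of repeated counts at a single site, assuming Poisson abundance and a constant detection probability. This is the basic model only, not the extension with temporary emigration described in the abstract above; the function name `site_likelihood`, the truncation point `n_max`, and the example counts are assumptions for illustration.

```python
import math


def site_likelihood(counts, lam, p, n_max=200):
    """Likelihood of repeated counts at one site under a basic N-mixture model.

    Abundance N ~ Poisson(lam); each count y_j ~ Binomial(N, p).
    The unknown N is summed out from max(counts) up to n_max.
    """
    total = 0.0
    for n in range(max(counts), n_max + 1):
        # Poisson probability that true abundance equals n.
        prior = math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1))
        # Probability of every observed count given abundance n.
        detect = 1.0
        for y in counts:
            detect *= math.comb(n, y) * p**y * (1 - p)**(n - y)
        total += prior * detect
    return total


if __name__ == "__main__":
    # Hypothetical counts from three surveys of one site.
    print(site_likelihood([2, 3, 1], lam=5.0, p=0.4))
```

With a single survey the model reduces to a Poisson distribution with mean lam * p, so abundance and detection cannot be separated; repeated surveys are what make both identifiable.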
A big data approach to the development of mixed-effects models for seizure count data.

PubMed

Tharayil, Joseph J; Chiang, Sharon; Moss, Robert; Stern, John M; Theodore, William H; Goldenholz, Daniel M

2017-05-01

Our objective was to develop a generalized linear mixed model for predicting seizure count that is useful in the design and analysis of clinical trials. This model also may benefit the design and interpretation of seizure-recording paradigms. Most existing seizure count models do not include children, and there is currently no consensus regarding the most suitable model that can be applied to children and adults. Therefore, an additional objective was to develop a model that accounts for both adult and pediatric epilepsy. Using data from SeizureTracker.com, a patient-reported seizure diary tool with >1.2 million recorded seizures across 8 years, we evaluated the appropriateness of Poisson, negative binomial, zero-inflated negative binomial, and modified negative binomial models for seizure count data based on minimization of the Bayesian information criterion. Generalized linear mixed-effects models were used to account for demographic and etiologic covariates and for autocorrelation structure. Holdout cross-validation was used to evaluate predictive accuracy in simulating seizure frequencies. For both adults and children, we found that a negative binomial model with autocorrelation over 1 day was optimal. Using holdout cross-validation, the proposed model was found to provide accurate simulation of seizure counts for patients with up to four seizures per day. The optimal model can be used to generate more realistic simulated patient data with very few input parameters. The availability of a parsimonious, realistic virtual patient model can be of great utility in simulations of phase II/III clinical trials, epilepsy monitoring units, outpatient biosensors, and mobile Health (mHealth) applications. Wiley Periodicals, Inc. © 2017 International League Against Epilepsy.
Operating characteristics of full count and binomial sampling plans for green peach aphid (Hemiptera: Aphididae) in potato.

PubMed

Kabaluk, J Todd; Binns, Michael R; Vernon, Robert S

2006-06-01

Counts of green peach aphid, Myzus persicae (Sulzer) (Hemiptera: Aphididae), in potato, Solanum tuberosum L., fields were used to evaluate the performance of the sampling plan from a pest management company. The counts were further used to develop a binomial sampling method, and both full count and binomial plans were evaluated using operating characteristic curves. Taylor's power law provided a good fit of the data (r² = 0.95), with the relationship between the variance (s²) and mean (m) as ln(s²) = 1.81 (± 0.02) + 1.55 (± 0.01) ln(m). A binomial sampling method was developed using the empirical model ln(m) = c + d ln(-ln(1 - P(T))), to which the data fit well for tally numbers (T) of 0, 1, 3, 5, 7, and 10. Although T = 3 was considered the most reasonable given its operating characteristics and presumed ease of classification above or below critical densities (i.e., action thresholds) of one and 10 M. persicae per leaf, the full count method is shown to be superior. The mean number of sample sites per field visit by the pest management company was 42 ± 19, with more than one-half (54%) of the field visits involving sampling 31-50 sample sites, which was acceptable in the context of operating characteristic curves for a critical density of 10 M. persicae per leaf. Based on operating characteristics, actual sample sizes used by the pest management company can be reduced by at least 50%, on average, for a critical density of 10 M. persicae per leaf. For a critical density of one M. persicae per leaf used to avert the spread of potato leaf roll virus, sample sizes from 50 to 100 were considered more suitable.
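The two empirical relationships quoted in the abstract above can be evaluated directly. The sketch below uses the reported Taylor's power law coefficients (1.81 and 1.55) for the variance, and the reported functional form for the binomial sampling model. The abstract does not report c and d, so the values passed to `mean_from_proportion` are hypothetical placeholders, as are the function names.

```python
import math

# Point estimates reported in the abstract for ln(variance) = A + B * ln(mean).
TAYLOR_INTERCEPT = 1.81
TAYLOR_SLOPE = 1.55


def taylor_variance(mean):
    """Variance predicted by Taylor's power law for a given mean density."""
    return math.exp(TAYLOR_INTERCEPT + TAYLOR_SLOPE * math.log(mean))


def mean_from_proportion(p_t, c, d):
    """Mean density from the proportion p_t of leaves with more than T aphids.

    Implements ln(m) = c + d * ln(-ln(1 - p_t)); c and d depend on the
    tally number T and are NOT given in the abstract.
    """
    return math.exp(c + d * math.log(-math.log(1.0 - p_t)))


if __name__ == "__main__":
    for m in (1.0, 10.0):
        print("mean", m, "predicted variance", taylor_variance(m))
    # Hypothetical c and d, for illustration only.
    print(mean_from_proportion(0.5, c=0.5, d=1.2))
```

Because the fitted slope exceeds 1, the predicted variance grows faster than the mean, which is the usual signature of aggregated (overdispersed) insect counts.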
Solving the problem of negative populations in approximate accelerated stochastic simulations using the representative reaction approach.

PubMed

Kadam, Shantanu; Vanka, Kumar

2013-02-15

Methods based on the stochastic formulation of chemical kinetics have the potential to accurately reproduce the dynamical behavior of various biochemical systems of interest. However, the computational expense makes them impractical for the study of real systems. Attempts to render these methods practical have led to the development of accelerated methods, where the reaction numbers are modeled by Poisson random numbers. However, for certain systems, such methods give rise to physically unrealistic negative numbers for species populations. The methods which make use of binomial variables, in place of Poisson random numbers, have since become popular, and have been partially successful in addressing this problem. In this manuscript, the development of two new computational methods, based on the representative reaction approach (RRA), has been discussed. The new methods endeavor to solve the problem of negative numbers, by making use of tools like the stochastic simulation algorithm and the binomial method, in conjunction with the RRA. It is found that these newly developed methods perform better than other binomial methods used for stochastic simulations, in resolving the problem of negative populations. Copyright © 2012 Wiley Periodicals, Inc.
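The reason binomial draws avoid negative populations, as mentioned in the abstract above, can be seen in a toy example. The sketch below applies a binomial leap to a single first-order decay reaction, where the number of firings in a step can never exceed the number of molecules present. It is not the representative reaction approach from the paper; the function names, the rate, the step size, and the seed are all assumptions for illustration.

```python
import random


def binomial_draw(n, p, rng):
    """Draw from Binomial(n, p) as a sum of n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def binomial_leap_decay(x0, rate, tau, steps, seed=0):
    """Leap simulation of the decay reaction S -> (nothing).

    Each of the x molecules present decays during a step with probability
    min(1, rate * tau), so the firings are Binomial(x, p) and bounded by x.
    A Poisson draw with mean rate * x * tau has no such upper bound.
    """
    rng = random.Random(seed)
    x = x0
    path = [x]
    for _ in range(steps):
        p = min(1.0, rate * tau)
        fired = binomial_draw(x, p, rng)
        x -= fired
        path.append(x)
    return path


if __name__ == "__main__":
    # Deliberately coarse step so that a Poisson leap could overshoot.
    print(binomial_leap_decay(x0=50, rate=2.0, tau=0.3, steps=20))
```

For systems with several reactions that share a species, the bound on each reaction has to account for the others, which is where the more elaborate schemes discussed in the abstract come in.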
Estimating abundance while accounting for rarity, correlated behavior, and other sources of variation in counts

USGS Publications Warehouse

Dorazio, Robert M.; Martin, Julien; Edwards, Holly H.

2013-01-01

The class of N-mixture models allows abundance to be estimated from repeated, point count surveys while adjusting for imperfect detection of individuals. We developed an extension of N-mixture models to account for two commonly observed phenomena in point count surveys: rarity and lack of independence induced by unmeasurable sources of variation in the detectability of individuals. Rarity increases the number of locations with zero detections in excess of those expected under simple models of abundance (e.g., Poisson or negative binomial). Correlated behavior of individuals and other phenomena, though difficult to measure, increases the variation in detection probabilities among surveys. Our extension of N-mixture models includes a hurdle model of abundance and a beta-binomial model of detectability that accounts for additional (extra-binomial) sources of variation in detections among surveys. As an illustration, we fit this model to repeated point counts of the West Indian manatee, which was observed in a pilot study using aerial surveys. Our extension of N-mixture models provides increased flexibility. The effects of different sets of covariates may be estimated for the probability of occurrence of a species, for its mean abundance at occupied locations, and for its detectability.
Estimating abundance while accounting for rarity, correlated behavior, and other sources of variation in counts.

PubMed

Dorazio, Robert M; Martin, Julien; Edwards, Holly H

2013-07-01

The class of N-mixture models allows abundance to be estimated from repeated, point count surveys while adjusting for imperfect detection of individuals. We developed an extension of N-mixture models to account for two commonly observed phenomena in point count surveys: rarity and lack of independence induced by unmeasurable sources of variation in the detectability of individuals. Rarity increases the number of locations with zero detections in excess of those expected under simple models of abundance (e.g., Poisson or negative binomial). Correlated behavior of individuals and other phenomena, though difficult to measure, increases the variation in detection probabilities among surveys. Our extension of N-mixture models includes a hurdle model of abundance and a beta-binomial model of detectability that accounts for additional (extra-binomial) sources of variation in detections among surveys. As an illustration, we fit this model to repeated point counts of the West Indian manatee, which was observed in a pilot study using aerial surveys. Our extension of N-mixture models provides increased flexibility. The effects of different sets of covariates may be estimated for the probability of occurrence of a species, for its mean abundance at occupied locations, and for its detectability.
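The extra-binomial variation mentioned in the abstract above is what a beta-binomial distribution adds over a plain binomial. The sketch below gives the standard beta-binomial probability mass function and compares its variance with that of a binomial with the same mean. The function names and the shape parameters `a` and `b` are assumptions for illustration and are not estimates from the manatee study.

```python
import math


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(K = k) when K ~ Binomial(n, p) and p ~ Beta(a, b)."""
    return math.comb(n, k) * math.exp(log_beta(k + a, n - k + b) - log_beta(a, b))


def variance(pmf, n):
    """Variance of a distribution on 0..n given its pmf."""
    mean = sum(k * pmf(k) for k in range(n + 1))
    return sum((k - mean) ** 2 * pmf(k) for k in range(n + 1))


if __name__ == "__main__":
    n, a, b = 10, 2.0, 3.0          # hypothetical values
    p_mean = a / (a + b)            # mean detection probability
    bb_var = variance(lambda k: beta_binomial_pmf(k, n, a, b), n)
    bin_var = n * p_mean * (1 - p_mean)
    print("beta-binomial variance:", bb_var, "binomial variance:", bin_var)
```

Letting the detection probability vary between surveys in this way inflates the variance of the counts without changing their mean.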
High precision and high yield fabrication of dense nanoparticle arrays onto DNA origami at statistically independent binding sites

NASA Astrophysics Data System (ADS)

Takabayashi, Sadao; Klein, William P.; Onodera, Craig; Rapp, Blake; Flores-Estrada, Juan; Lindau, Elias; Snowball, Lejmarc; Sam, Joseph T.; Padilla, Jennifer E.; Lee, Jeunghoon; Knowlton, William B.; Graugnard, Elton; Yurke, Bernard; Kuang, Wan; Hughes, William L.

2014-10-01

High precision, high yield, and high density self-assembly of nanoparticles into arrays is essential for nanophotonics. Spatial deviations as small as a few nanometers can alter the properties of near-field coupled optical nanostructures. Several studies have reported assemblies of few nanoparticle structures with controlled spacing using DNA nanostructures with variable yield. Here, we report multi-tether design strategies and attachment yields for homo- and hetero-nanoparticle arrays templated by DNA origami nanotubes. Nanoparticle attachment yield via DNA hybridization is comparable with streptavidin-biotin binding. Independent of the number of binding sites, >97% site-occupation was achieved with four tethers and 99.2% site-occupation is theoretically possible with five tethers. The interparticle distance was within 2 nm of all design specifications and the nanoparticle spatial deviations decreased with interparticle spacing. Modified geometric, binomial, and trinomial distributions indicate that site-bridging, steric hindrance, and electrostatic repulsion were not dominant barriers to self-assembly and both tethers and binding sites were statistically independent at high particle densities. Electronic supplementary information (ESI) available. See DOI: 10.1039/c4nr03069a
Relation of lineaments to sulfide deposits: Bald Eagle Mountain, Centre County, Pennsylvania

NASA Technical Reports Server (NTRS)

Mcmurtry, G. J.; Petersen, G. W. (Principal Investigator); Krohn, M. D.; Gold, D. P.

1975-01-01

The author has identified the following significant results. Discrete areas of finely fractured and brecciated sandstone float are present along the crest of Bald Eagle Mountain and are commonly sites of sulfide mineralization, as evidenced by the presence of barite and limonite gossans. The fit of the frequency distribution of the brecciated float to the negative binomial distribution supports the interpretation of a separate population of intensely fractured material. Such zones of concentrated breccia float have an average width of one kilometer, with a range from 0.4 to 1.6 kilometers, and were observed in a quarry face to have subvertical dips. Direct spatial correlation of the Landsat-derived lineaments to the fractured areas on the ridge is low; however, the mineralized and fracture zones are commonly asymmetrical to the lineament positions. Such a systematic dislocation might result from an inherent bias in the float population or could be the product of the relative erosional resistance of the silicified material in the mineralized areas in relation to the erosionally weak material at the stream gaps.
Geographic Distribution of Healthy Resources and Adverse Pregnancy Outcomes.

PubMed

Young, Christopher; Laurent, Olivier; Chung, Judith H; Wu, Jun

2016-08-01

Objective: To determine the risk of gestational diabetes (GDM) and preeclampsia associated with various community resources. Methods: An ecological study was performed in Los Angeles and Orange counties in California. Fast food restaurants, supermarkets, grocery stores, gyms, health clubs and green space were identified using Google © Maps Extractor and through the Southern California Association of Governments. California Birth Certificate data were used to identify cases of GDM and preeclampsia. Unadjusted and adjusted risk ratios were calculated using negative binomial regression. Results: There were 9692 cases of GDM and 6288 cases of preeclampsia, corresponding to incidences of 2.5% and 1.4%, respectively. The adjusted risk of GDM was reduced in zip codes with greater concentration of grocery stores [relative risk (RR) 0.95, 95% confidence interval (CI) 0.92-0.99] and supermarkets (RR 0.94, 95% CI 0.90-0.98). There were no significant relationships between preeclampsia and the concentration of fast food restaurants, grocery stores, supermarkets or the amount of green space. Conclusion: The distribution of community resources has a significant association with the risk of developing GDM but not preeclampsia.
Analytical workflow profiling gene expression in murine macrophages

PubMed Central

Nixon, Scott E.; González-Peña, Dianelys; Lawson, Marcus A.; McCusker, Robert H.; Hernandez, Alvaro G.; O’Connor, Jason C.; Dantzer, Robert; Kelley, Keith W.

2015-01-01

Comprehensive and simultaneous analysis of all genes in a biological sample is a capability of RNA-Seq technology. Analysis of the entire transcriptome benefits from summarization of genes at the functional level. As a cellular response of interest not previously explored with RNA-Seq, peritoneal macrophages from mice under two conditions (control and immunologically challenged) were analyzed for gene expression differences. Quantification of individual transcripts modeled RNA-Seq read distribution and uncertainty (using a Beta Negative Binomial distribution), then tested for differential transcript expression (False Discovery Rate-adjusted p-value < 0.05). Enrichment of functional categories utilized the list of differentially expressed genes. A total of 2079 differentially expressed transcripts representing 1884 genes were detected. Enrichment of 92 categories from Gene Ontology Biological Processes and Molecular Functions, and KEGG pathways were grouped into 6 clusters. Clusters included defense and inflammatory response (Enrichment Score = 11.24) and ribosomal activity (Enrichment Score = 17.89). Our work provides a context to the fine detail of individual gene expression differences in murine peritoneal macrophages during immunological challenge with high throughput RNA-Seq. PMID:25708305
Automated segmentation of linear time-frequency representations of marine-mammal sounds.

PubMed

Dadouchi, Florian; Gervaise, Cedric; Ioana, Cornel; Huillery, Julien; Mars, Jérôme I

2013-09-01

Many marine mammals produce highly nonlinear frequency modulations. Determining the time-frequency support of these sounds offers various applications, which include recognition, localization, and density estimation. This study introduces a low parameterized automated spectrogram segmentation method that is based on a theoretical probabilistic framework. In the first step, the background noise in the spectrogram is fitted with a Chi-squared distribution and thresholded using a Neyman-Pearson approach. In the second step, the number of false detections in time-frequency regions is modeled as a binomial distribution, and then through a Neyman-Pearson strategy, the time-frequency bins are gathered into regions of interest. The proposed method is validated on real data of large sequences of whistles from common dolphins, collected in the Bay of Biscay (France). The proposed method is also compared with two alternative approaches: the first is smoothing and thresholding of the spectrogram; the second is thresholding of the spectrogram followed by the use of morphological operators to gather the time-frequency bins and to remove false positives. This method is shown to increase the probability of detection for the same probability of false alarms.
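The second step described above, keeping a time-frequency region only when it holds more detections than noise alone would plausibly produce, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the region size `n_bins`, the per-bin false-alarm probability `p_false`, and the region-level false-alarm rate `alpha` are all assumed values.

```python
from math import comb


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def detection_threshold(n_bins, p_false, alpha):
    """Smallest count k such that P(X >= k | noise only) <= alpha.

    A region with at least k detected bins is kept as a region of interest.
    k = n_bins + 1 means no count can pass at this alpha.
    """
    for k in range(n_bins + 2):
        if binomial_tail(n_bins, k, p_false) <= alpha:
            return k


# Hypothetical setting: 5 x 5 region, 5% per-bin false alarms, 1% per region.
n_bins, p_false, alpha = 25, 0.05, 0.01
k = detection_threshold(n_bins, p_false, alpha)
print(f"keep regions with at least {k} of {n_bins} bins above threshold")
```

Fixing the false-alarm probability and choosing the threshold from the noise-only distribution is the Neyman-Pearson idea named in the abstract; the binomial model assumes false detections in different bins are independent.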
Exponential Family Functional data analysis via a low-rank model.

PubMed

Li, Gen; Huang, Jianhua Z; Shen, Haipeng

2018-05-08

In many applications, non-Gaussian data such as binary or count are observed over a continuous domain and there exists a smooth underlying structure for describing such data. We develop a new functional data method to deal with this kind of data when the data are regularly spaced on the continuous domain. Our method, referred to as Exponential Family Functional Principal Component Analysis (EFPCA), assumes the data are generated from an exponential family distribution, and the matrix of the canonical parameters has a low-rank structure. The proposed method flexibly accommodates not only the standard one-way functional data, but also two-way (or bivariate) functional data. In addition, we introduce a new cross validation method for estimating the latent rank of a generalized data matrix. We demonstrate the efficacy of the proposed methods using a comprehensive simulation study. The proposed method is also applied to a real application of the UK mortality study, where data are binomially distributed and two-way functional across age groups and calendar years. The results offer novel insights into the underlying mortality pattern. © 2018, The International Biometric Society.
Sequence-based predictive modeling to identify cancerlectins

PubMed Central

Lai, Hong-Yan; Chen, Xin-Xin; Chen, Wei; Tang, Hua; Lin, Hao

2017-01-01

Lectins are a diverse type of glycoproteins or carbohydrate-binding proteins that have a wide distribution to various species. They can specially identify and exclusively bind to a certain kind of saccharide groups. Cancerlectins are a group of lectins that are closely related to cancer and play a major role in the initiation, survival, growth, metastasis and spread of tumor. Several computational methods have emerged to discriminate cancerlectins from non-cancerlectins, which promote the study on pathogenic mechanisms and clinical treatment of cancer. However, the predictive accuracies of most of these techniques are very limited. In this work, by constructing a benchmark dataset based on the CancerLectinDB database, a new amino acid sequence-based strategy for feature description was developed, and then the binomial distribution was applied to screen the optimal feature set. Ultimately, an SVM-based predictor was performed to distinguish cancerlectins from non-cancerlectins, and achieved an accuracy of 77.48% with AUC of 85.52% in jackknife cross-validation. The results revealed that our prediction model could perform better comparing with published predictive tools. PMID:28423655
Inclusive photon production at forward rapidities in proton–proton collisions at √s = 0.9, 2.76 and 7 TeV

DOE PAGES

Abelev, B.; Adam, J.; Adamová, D.; ...

2015-04-09

The multiplicity and pseudorapidity distributions of inclusive photons have been measured at forward rapidities (2.3 < η < 3.9) in proton–proton collisions at three center-of-mass energies, √s = 0.9, 2.76 and 7 TeV using the ALICE detector. It is observed that the increase in the average photon multiplicity as a function of beam energy is compatible with both a logarithmic and a power-law dependence. The relative increase in average photon multiplicity produced in inelastic pp collisions at 2.76 and 7 TeV center-of-mass energies with respect to 0.9 TeV are 37.2 ± 0.3 % (stat) ± 8.8 % (sys) and 61.2 ± 0.3 % (stat) ± 7.6 % (sys), respectively. The photon multiplicity distributions for all center-of-mass energies are well described by negative binomial distributions. The multiplicity distributions are also presented in terms of KNO variables. The results are compared to model predictions, which are found in general to underestimate the data at large photon multiplicities, in particular at the highest center-of-mass energy. As a result, limiting fragmentation behavior of photons has been explored with the data, but is not observed in the measured pseudorapidity range.
Deep-sea benthic habitats modeling and mapping in a NE Atlantic seamount (Galicia Bank)

NASA Astrophysics Data System (ADS)

Serrano, A.; González-Irusta, J. M.; Punzón, A.; García-Alegre, A.; Lourido, A.; Ríos, P.; Blanco, M.; Gómez-Ballesteros, M.; Druet, M.; Cristobo, J.; Cartes, J. E.

2017-08-01

This study presents the results of seafloor habitat identification and mapping of a NE Atlantic deep seamount. An "assemble first, predict later" approach has been followed to identify and map the benthic habitats of the Galicia Bank (NW Iberian). Biotic patterns inferred from the survey data have been used to drive the definition of benthic assemblages using multivariate tools. Eight assemblages, four hard substrates and four sedimentary ones, have been described from a matrix of structural species. Distribution of these assemblages was correlated with environmental factors (multibeam and backscatter data) using binomial GAMs. Finally, the distribution model of each assemblage was applied to produce continuous maps and pooled in a final map with the distribution of the main benthic habitats. Depth and substrate type are key factors when determining soft bottom communities, whereas rocky habitat distribution is mainly explained by rock slope and orientation. Enrichment by northern water masses (LSW) arriving to GB and possible zooplankton biomass increase at vertical-steep walls by "bottom trapping" can explain the higher diversity of habitat providing filter-feeders at slope rocky breaks. These results concerning vulnerable species and habitats, such as Lophelia and Madrepora communities and black and bamboo coral aggregations were the basis of the Spanish proposal of inclusion within the Natura 2000 network. The aim of the present study was to establish the scientific criteria needed for managing and protecting those environmental values.

Structure and plasticity potential of neural networks in the cerebral cortex

NASA Astrophysics Data System (ADS)

Fares, Tarec Edmond

In this thesis, we first described a theoretical framework for the analysis of spine remodeling plasticity. We provided a quantitative description of two models of spine remodeling in which the presence of a bouton is either required or not for the formation of a new synapse. We derived expressions for the density of potential synapses in the neuropil, the connectivity fraction, which is the ratio of actual to potential synapses, and the number of structurally different circuits attainable with spine remodeling. We calculated these parameters in mouse occipital cortex, rat CA1, monkey V1, and human temporal cortex. We found that on average a dendritic spine can choose among 4-7 potential targets in rodents and 10-20 potential targets in primates. The neuropil's potential for structural circuit remodeling is highest in rat CA1 (7.1-8.6 bits/μm³) and lowest in monkey V1 (1.3-1.5 bits/μm³). We also evaluated the lower bound of neuron selectivity in the choice of synaptic partners. Post-synaptic excitatory neurons in rodents make synaptic contacts with more than 21-30% of pre-synaptic axons encountered with new spine growth. Primate neurons appear to be more selective, making synaptic connections with more than 7-15% of encountered axons. We next studied the role neuron morphology plays in defining synaptic connectivity. As previously stated it is clear that only pairs of neurons with closely positioned axonal and dendritic branches can be synaptically coupled. For excitatory neurons in the cerebral cortex, such axo-dendritic oppositions, or potential synapses, must be bridged by dendritic spines to form synaptic connections.
To explore the rules by which synaptic connections are formed within the constraints imposed by neuron morphology, we compared the distributions of the numbers of actual and potential synapses between pre- and post-synaptic neurons forming different laminar projections in rat barrel cortex. Quantitative comparison explicitly ruled out the hypothesis that individual synapses between neurons are formed independently of each other. Instead, the data are consistent with a cooperative scheme of synapse formation, where multiple-synaptic connections between neurons are stabilized, while neurons that do not establish a critical number of synapses are not likely to remain synaptically coupled. In the above two projects, analysis of potential synapse numbers played an important role in shaping our understanding of connectivity and structural plasticity. In the third part of this thesis, we shift our attention to the study of the distribution of potential synapse numbers. This distribution is dependent on the details of neuron morphology and it defines synaptic connectivity patterns attainable with spine remodeling. To better understand how the distribution of potential synapse numbers is influenced by the overlap and the shapes of axonal and dendritic arbors, we first analyzed uniform disconnected arbors generated in silico. The resulting distributions are well described by binomial functions. We used a dataset of neurons reconstructed in 3D and generated the potential synapse distributions for neurons of different classes. Quantitative analysis showed that the binomial distribution is a good fit to this data as well. All distributions considered clustered into two categories, inhibitory to inhibitory and excitatory to excitatory projections. We showed that the distributions of potential synapse numbers are universally described by a family of single parameter (p) binomial functions, where p = 0.08 for the inhibitory and p = 0.19 for the excitatory projections.
In the last part of this thesis an attempt is made to incorporate some of the biological constraints we considered thus far, into an artificial neural network model. It became clear that several features of synaptic connectivity are ubiquitous among different cortical networks: (1) neural networks are predominately excitatory, containing roughly 80% of excitatory neurons and synapses, (2) neural networks are only sparsely interconnected, where the probabilities of finding connected neurons are always less than 50% even for neighboring cells, (3) the distribution of connection strengths has been shown to have a slow non-exponential decay. In the attempt to understand the advantage of such network architecture for learning and memory, we analyzed the associative memory capacity of a biologically constrained perceptron-like neural network model. The artificial neural network we consider consists of robust excitatory and inhibitory McCulloch and Pitts neurons with a constant firing threshold. Our theoretical results show that the capacity for associative memory storage in such networks increases with an addition of a small fraction of inhibitory neurons, while the connection probability remains below 50%. (Abstract shortened by UMI.)
Linear approximations of global behaviors in nonlinear systems with moderate or strong noise

NASA Astrophysics Data System (ADS)

Liang, Junhao; Din, Anwarud; Zhou, Tianshou

2018-03-01

While many physical or chemical systems can be modeled by nonlinear Langevin equations (LEs), dynamical analysis of these systems is challenging in the cases of moderate and strong noise. Here we develop a linear approximation scheme, which can transform an often intractable LE into a linear set of binomial moment equations (BMEs). This scheme provides a feasible way to capture nonlinear behaviors in the sense of probability distribution and is effective even when the noise is moderate or big. Based on BMEs, we further develop a noise reduction technique, which can effectively handle tough cases where traditional small-noise theories are inapplicable. The overall method not only provides an approximation-based paradigm to analysis of the local and global behaviors of nonlinear noisy systems but also has a wide range of applications.
Unicorns or Tiger Woods: are lie detection experts myths or rarities? A response to on lie detection "wizards" by Bond and Uysal.

PubMed

O'Sullivan, Maureen

2007-02-01

Bond and Uysal (this issue) complain that expert lie detectors identified by O'Sullivan and Ekman (2004) are statistical flukes. They ignore one class of experts we have identified and misrepresent the procedures we use to identify the others. They also question the psychometric validity of the measures and protocol used. Many of their points are addressed in the chapter they criticize. The fruitfulness of the O'Sullivan-Ekman protocol is illustrated with respect to improved identification of expert lie detectors, as well as a replicated pattern of errors made by experts from different professional groups. The statistical arguments offered confuse the theoretical use of the binomial with the empirical use of the normal distribution. Data are provided that may clarify this distinction.
Late winter survival of female mallards in Arkansas

USGS Publications Warehouse

Dugger, B.D.; Reinecke, K.J.; Fredrickson, L.H.

1994-01-01

Determining factors that limit winter survival of waterfowl is necessary to develop effective management plans. We radiomarked immature and adult female mallards (Anas platyrhynchos) after the 1988 and 1989 hunting seasons in eastcentral Arkansas to test whether natural mortality sources and habitat conditions during late winter limit seasonal survival. We used data from 92 females to calculate survival estimates. We observed no mortalities during 2,510 exposure days, despite differences in habitat conditions between years. We used the binomial distribution to calculate daily and 30-day survival estimates plus 95% confidence intervals of 0.9988 ≤ 0.9997 ≤ 1.00 and 0.9648 ≤ 0.9925 ≤ 1.00, respectively. Our data indirectly support the hypothesis that hunting mortality and habitat conditions during the hunting season are the major determinants of winter survival for female mallards in Arkansas.
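When no deaths are observed, a binomial lower confidence limit on daily survival has a closed form. The sketch below assumes a one-sided 95% bound, which solves p^n = 0.05 for n exposure days; the abstract does not state that this construction was used, but it matches the reported lower limits (0.9988 daily, 0.9648 over 30 days) to four decimals. The function name is illustrative.

```python
def zero_failure_lower_bound(n_trials, confidence=0.95):
    """One-sided lower bound on per-trial survival when all n trials survive.

    With zero deaths, the bound p solves p ** n_trials = 1 - confidence.
    """
    return (1.0 - confidence) ** (1.0 / n_trials)


exposure_days = 2510  # from the abstract: no mortalities in 2,510 exposure days
daily_lower = zero_failure_lower_bound(exposure_days)
thirty_day_lower = daily_lower**30  # assumes constant, independent daily survival

print(f"daily lower bound:  {daily_lower:.4f}")
print(f"30-day lower bound: {thirty_day_lower:.4f}")
```

Raising the daily bound to the 30th power treats days as independent with constant survival, which is the same assumption that lets exposure days be pooled across birds.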
Assessment of NDE Reliability Data

NASA Technical Reports Server (NTRS)

Yee, B. G. W.; Chang, F. H.; Couchman, J. C.; Lemon, G. H.; Packman, P. F.

1976-01-01

Twenty sets of relevant Nondestructive Evaluation (NDE) reliability data have been identified, collected, compiled, and categorized. A criterion for the selection of data for statistical analysis considerations has been formulated. A model to grade the quality and validity of the data sets has been developed. Data input formats, which record the pertinent parameters of the defect/specimen and inspection procedures, have been formulated for each NDE method. A comprehensive computer program has been written to calculate the probability of flaw detection at several confidence levels by the binomial distribution. This program also selects the desired data sets for pooling and tests the statistical pooling criteria before calculating the composite detection reliability. Probability of detection curves at 95 and 50 percent confidence levels have been plotted for individual sets of relevant data as well as for several sets of merged data with common sets of NDE parameters.
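A probability of detection "at a confidence level" is commonly read as a one-sided binomial lower bound: the smallest detection probability still compatible with the observed hits. The following sketch assumes that reading (an exact, Clopper-Pearson style bound found by bisection). The report's own program is not reproduced here, and the sample sizes are hypothetical.

```python
from math import comb


def prob_at_least(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def pod_lower_bound(n_flaws, n_detected, confidence=0.95):
    """One-sided lower confidence bound on the probability of detection.

    Finds p such that observing n_detected or more hits has probability
    1 - confidence; any smaller p would make the data too unlikely.
    """
    if n_detected == 0:
        return 0.0
    alpha = 1.0 - confidence
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection; prob_at_least increases with p
        mid = (lo + hi) / 2
        if prob_at_least(n_flaws, n_detected, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo


# Hypothetical inspection results: flaws detected out of flaws present.
for hits, total in [(29, 29), (28, 29), (45, 50)]:
    b95 = pod_lower_bound(total, hits, 0.95)
    b50 = pod_lower_bound(total, hits, 0.50)
    print(f"{hits}/{total}: POD >= {b95:.3f} (95%), >= {b50:.3f} (50%)")
```

The 29-of-29 case gives a bound just above 0.90, which is why 29 consecutive detections is often quoted as demonstrating 90% detection probability at 95% confidence.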
Shot-noise evidence of fractional quasiparticle creation in a local fractional quantum Hall state.

PubMed

Hashisaka, Masayuki; Ota, Tomoaki; Muraki, Koji; Fujisawa, Toshimasa

2015-02-06

We experimentally identify fractional quasiparticle creation in a tunneling process through a local fractional quantum Hall (FQH) state. The local FQH state is prepared in a low-density region near a quantum point contact in an integer quantum Hall (IQH) system. Shot-noise measurements reveal a clear transition from elementary-charge tunneling at low bias to fractional-charge tunneling at high bias. The fractional shot noise is proportional to T(1)(1-T(1)) over a wide range of T(1), where T(1) is the transmission probability of the IQH edge channel. This binomial distribution indicates that fractional quasiparticles emerge from the IQH state to be transmitted through the local FQH state. The study of this tunneling process enables us to elucidate the dynamics of Laughlin quasiparticles in FQH systems.
EM Adaptive LASSO—A Multilocus Modeling Strategy for Detecting SNPs Associated with Zero-inflated Count Phenotypes

PubMed Central

Mallick, Himel; Tiwari, Hemant K.

2016-01-01

Count data are increasingly ubiquitous in genetic association studies, where it is possible to observe excess zero counts as compared to what is expected based on standard assumptions. For instance, in rheumatology, data are usually collected in multiple joints within a person or multiple sub-regions of a joint, and it is not uncommon that the phenotypes contain enormous number of zeroes due to the presence of excessive zero counts in majority of patients. Most existing statistical methods assume that the count phenotypes follow one of these four distributions with appropriate dispersion-handling mechanisms: Poisson, Zero-inflated Poisson (ZIP), Negative Binomial, and Zero-inflated Negative Binomial (ZINB). However, little is known about their implications in genetic association studies. Also, there is a relative paucity of literature on their usefulness with respect to model misspecification and variable selection. In this article, we have investigated the performance of several state-of-the-art approaches for handling zero-inflated count data along with a novel penalized regression approach with an adaptive LASSO penalty, by simulating data under a variety of disease models and linkage disequilibrium patterns. By taking into account data-adaptive weights in the estimation procedure, the proposed method provides greater flexibility in multi-SNP modeling of zero-inflated count phenotypes. A fast coordinate descent algorithm nested within an EM (expectation-maximization) algorithm is implemented for estimating the model parameters and conducting variable selection simultaneously. Results show that the proposed method has optimal performance in the presence of multicollinearity, as measured by both prediction accuracy and empirical power, which is especially apparent as the sample size increases. Moreover, the Type I error rates become more or less uncontrollable for the competing methods when a model is misspecified, a phenomenon routinely encountered in practice. 
PMID:27066062
EM Adaptive LASSO-A Multilocus Modeling Strategy for Detecting SNPs Associated with Zero-inflated Count Phenotypes.

PubMed

Mallick, Himel; Tiwari, Hemant K

2016-01-01

Count data are increasingly ubiquitous in genetic association studies, where it is possible to observe excess zero counts as compared to what is expected based on standard assumptions. For instance, in rheumatology, data are usually collected in multiple joints within a person or multiple sub-regions of a joint, and it is not uncommon that the phenotypes contain enormous number of zeroes due to the presence of excessive zero counts in majority of patients. Most existing statistical methods assume that the count phenotypes follow one of these four distributions with appropriate dispersion-handling mechanisms: Poisson, Zero-inflated Poisson (ZIP), Negative Binomial, and Zero-inflated Negative Binomial (ZINB). However, little is known about their implications in genetic association studies. Also, there is a relative paucity of literature on their usefulness with respect to model misspecification and variable selection. In this article, we have investigated the performance of several state-of-the-art approaches for handling zero-inflated count data along with a novel penalized regression approach with an adaptive LASSO penalty, by simulating data under a variety of disease models and linkage disequilibrium patterns. By taking into account data-adaptive weights in the estimation procedure, the proposed method provides greater flexibility in multi-SNP modeling of zero-inflated count phenotypes. A fast coordinate descent algorithm nested within an EM (expectation-maximization) algorithm is implemented for estimating the model parameters and conducting variable selection simultaneously. Results show that the proposed method has optimal performance in the presence of multicollinearity, as measured by both prediction accuracy and empirical power, which is especially apparent as the sample size increases. Moreover, the Type I error rates become more or less uncontrollable for the competing methods when a model is misspecified, a phenomenon routinely encountered in practice.
Evaluation of logistic regression models and effect of covariates for case-control study in RNA-Seq analysis.

PubMed

Choi, Seung Hoan; Labadorf, Adam T; Myers, Richard H; Lunetta, Kathryn L; Dupuis, Josée; DeStefano, Anita L

2017-02-06

Next generation sequencing provides a count of RNA molecules in the form of short reads, yielding discrete, often highly non-normally distributed gene expression measurements. Although Negative Binomial (NB) regression has been generally accepted in the analysis of RNA sequencing (RNA-Seq) data, its appropriateness has not been exhaustively evaluated. We explore logistic regression as an alternative method for RNA-Seq studies designed to compare cases and controls, where disease status is modeled as a function of RNA-Seq reads using simulated and Huntington disease data. We evaluate the effect of adjusting for covariates that have an unknown relationship with gene expression. Finally, we incorporate the data adaptive method in order to compare false positive rates. When the sample size is small or the expression levels of a gene are highly dispersed, the NB regression shows inflated Type-I error rates but the Classical logistic and Bayes logistic (BL) regressions are conservative. Firth's logistic (FL) regression performs well or is slightly conservative. Large sample size and low dispersion generally make Type-I error rates of all methods close to nominal alpha levels of 0.05 and 0.01. However, Type-I error rates are controlled after applying the data adaptive method. The NB, BL, and FL regressions gain increased power with large sample size, large log2 fold-change, and low dispersion. The FL regression has comparable power to NB regression. We conclude that implementing the data adaptive method appropriately controls Type-I error rates in RNA-Seq analysis. Firth's logistic regression provides a concise statistical inference process and reduces spurious associations from inaccurately estimated dispersion parameters in the negative binomial framework.
Spatial distribution of psychotic disorders in an urban area of France: an ecological study.

PubMed

Pignon, Baptiste; Schürhoff, Franck; Baudin, Grégoire; Ferchiou, Aziz; Richard, Jean-Romain; Saba, Ghassen; Leboyer, Marion; Kirkbride, James B; Szöke, Andrei

2016-05-18

Previous analyses of neighbourhood variations of non-affective psychotic disorders (NAPD) have focused mainly on incidence. However, prevalence studies provide important insights on factors associated with disease evolution as well as for healthcare resource allocation. This study aimed to investigate the distribution of prevalent NAPD cases in an urban area in France. The number of cases in each neighbourhood was modelled as a function of potential confounders and ecological variables, namely: migrant density, economic deprivation and social fragmentation. This was modelled using statistical models of increasing complexity: frequentist models (using Poisson and negative binomial regressions), and several Bayesian models. For each model, the validity of its assumptions was checked and its fit to the data was compared, in order to test for possible spatial variation in prevalence. Data showed significant overdispersion (invalidating the Poisson regression model) and residual autocorrelation (suggesting the need to use Bayesian models). The best Bayesian model was Leroux's model (i.e. a model with both strong correlation between neighbouring areas and weaker correlation between areas further apart), with economic deprivation as an explanatory variable (OR = 1.13, 95% CI [1.02-1.25]). In comparison with frequentist methods, the Bayesian model showed a better fit. The number of cases showed non-random spatial distribution and was linked to economic deprivation.
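Overdispersion of the kind reported above can be screened for with the variance-to-mean ratio, which is 1 under a Poisson model. The sketch below is a crude illustration on made-up neighbourhood counts; it ignores covariates and is not the diagnostic used in the study, which the abstract does not specify.

```python
from statistics import mean, variance


def dispersion_index(counts):
    """Sample variance divided by sample mean (about 1 for Poisson data)."""
    return variance(counts) / mean(counts)


def moment_nb_size(counts):
    """Method-of-moments negative binomial size k, from var = m + m**2 / k.

    Returns None when the data are not overdispersed (var <= mean).
    """
    m, v = mean(counts), variance(counts)
    return m * m / (v - m) if v > m else None


# Hypothetical case counts per neighbourhood (not data from the study).
cases = [0, 1, 0, 3, 12, 2, 0, 7, 1, 25, 0, 4]
print(f"dispersion index: {dispersion_index(cases):.2f}")
print(f"NB size estimate: {moment_nb_size(cases):.2f}")
```

An index well above 1 suggests that Poisson standard errors would be too narrow, which is the usual reason for moving to a negative binomial or a hierarchical model.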
Number of infection events per cell during HIV-1 cell-free infection.

PubMed

Ito, Yusuke; Remion, Azaria; Tauzin, Alexandra; Ejima, Keisuke; Nakaoka, Shinji; Iwasa, Yoh; Iwami, Shingo; Mammano, Fabrizio

2017-07-26

HIV-1 accumulates changes in its genome through both recombination and mutation during the course of infection. For recombination to occur, a single cell must be infected by two HIV strains. These coinfection events were experimentally demonstrated to occur more frequently than would be expected for independent infection events and do not follow a random distribution. Previous mathematical modeling approaches demonstrated that differences in target cell susceptibility can explain the non-randomness, both in the context of direct cell-to-cell transmission, and in the context of free virus transmission (Q. Dang et al., Proc. Natl. Acad. Sci. USA 101:632-7, 2004; K. M. Law et al., Cell reports 15:2711-83, 2016). Here, we build on these notions and provide a more detailed and extensive quantitative framework. We developed a novel mathematical model explicitly considering the heterogeneity of target cells and analysed datasets of cell-free HIV-1 single and double infection experiments in cell culture. Particularly, in contrast to the previous studies, we took into account the different susceptibility of the target cells as a continuous distribution. Interestingly, we showed that the number of infection events per cell during cell-free HIV-1 infection follows a negative-binomial distribution, and our model reproduces these datasets.
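One textbook route from continuously distributed susceptibility to a negative binomial count is the gamma-Poisson mixture: if each cell's infection rate is gamma distributed and events given that rate are Poisson, the marginal number of events is negative binomial. The sketch below checks this numerically. The gamma assumption and the parameter values are illustrative and are not taken from the paper's model.

```python
from math import exp, lgamma, log


def neg_binomial_pmf(k, shape, mean):
    """Negative binomial P(K = k), parameterised by shape r and mean m."""
    log_p = (lgamma(k + shape) - lgamma(k + 1) - lgamma(shape)
             + shape * log(shape / (shape + mean))
             + k * log(mean / (shape + mean)))
    return exp(log_p)


def gamma_poisson_pmf(k, shape, mean, upper=40.0, steps=40000):
    """Average Poisson(k | rate) over rate ~ Gamma(shape, scale=mean/shape).

    Uses the midpoint rule on (0, upper); upper is chosen so the
    neglected tail of the integrand is negligible.
    """
    scale = mean / shape
    h = upper / steps
    total = 0.0
    for i in range(steps):
        rate = (i + 0.5) * h
        log_gamma_pdf = ((shape - 1) * log(rate) - rate / scale
                         - lgamma(shape) - shape * log(scale))
        log_poisson = k * log(rate) - rate - lgamma(k + 1)
        total += exp(log_gamma_pdf + log_poisson) * h
    return total


# Illustrative values: mean 1.5 infection events per cell, shape 2.
shape, mean_events = 2.0, 1.5
for k in range(5):
    nb = neg_binomial_pmf(k, shape, mean_events)
    mix = gamma_poisson_pmf(k, shape, mean_events)
    print(f"k={k}: negative binomial {nb:.6f}, gamma-Poisson mixture {mix:.6f}")
```

Smaller shape values correspond to more heterogeneous cells and a heavier right tail, that is, more multiply infected cells than independent infection would predict.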
A new approach to modelling schistosomiasis transmission based on stratified worm burden.

PubMed

Gurarie, D; King, C H; Wang, X

2010-11-01

Multiple factors affect schistosomiasis transmission in distributed meta-population systems including age, behaviour, and environment. The traditional approach to modelling macroparasite transmission often exploits the 'mean worm burden' (MWB) formulation for human hosts. However, typical worm distribution in humans is overdispersed, and classic models either ignore this characteristic or make ad hoc assumptions about its pattern (e.g., by assuming a negative binomial distribution). Such oversimplifications can give wrong predictions for the impact of control interventions. We propose a new modelling approach to macro-parasite transmission by stratifying human populations according to worm burden, and replacing MWB dynamics with that of 'population strata'. We developed proper calibration procedures for such multi-component systems, based on typical epidemiological and demographic field data, and implemented them using Wolfram Mathematica. Model programming and calibration proved to be straightforward. Our calibrated system provided good agreement with the individual level field data from the Msambweni region of eastern Kenya. The Stratified Worm Burden (SWB) approach offers many advantages, in that it accounts naturally for overdispersion and accommodates other important factors and measures of human infection and demographics. Future work will apply this model and methodology to evaluate innovative control intervention strategies, including expanded drug treatment programmes proposed by the World Health Organization and its partners.
Examining Potential Boundary Bias Effects in Kernel Smoothing on Equating: An Introduction for the Adaptive and Epanechnikov Kernels.

PubMed

Cid, Jaime A; von Davier, Alina A

2015-05-01

Test equating is a method of making the test scores from different test forms of the same assessment comparable. In the equating process, an important step involves continuizing the discrete score distributions. In traditional observed-score equating, this step is achieved using linear interpolation (or an unscaled uniform kernel). In the kernel equating (KE) process, this continuization process involves Gaussian kernel smoothing. It has been suggested that the choice of bandwidth in kernel smoothing controls the trade-off between variance and bias. In the literature on estimating density functions using kernels, it has also been suggested that the weight of the kernel depends on the sample size, and therefore, the resulting continuous distribution exhibits bias at the endpoints, where the samples are usually smaller. The purpose of this article is (a) to explore the potential effects of atypical scores (spikes) at the extreme ends (high and low) on the KE method in distributions with different degrees of asymmetry using the randomly equivalent groups equating design (Study I), and (b) to introduce the Epanechnikov and adaptive kernels as potential alternative approaches to reducing boundary bias in smoothing (Study II). The beta-binomial model is used to simulate observed scores reflecting a range of different skewed shapes.
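The beta-binomial model mentioned in the last sentence gives skewed discrete score distributions by letting examinee proficiency follow a beta distribution and the number-correct score follow a binomial given that proficiency. A minimal sketch of its probability mass function follows; the test length and the beta parameters are assumed for illustration and are not the values used in the article.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, a, b):
    """P(score = k) on an n-item test with Beta(a, b) proficiency."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


# Hypothetical 20-item test; a > b piles scores up near the top (left skew).
n_items, a, b = 20, 5.0, 2.0
score_probs = [beta_binomial_pmf(k, n_items, a, b) for k in range(n_items + 1)]
mean_score = sum(k * p for k, p in enumerate(score_probs))
print(f"mean score: {mean_score:.3f}")
```

Swapping `a` and `b` mirrors the distribution, so a range of skewed shapes can be produced by varying the two parameters.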
Broad distribution spectrum from Gaussian to power law appears in stochastic variations in RNA-seq data.

PubMed

Awazu, Akinori; Tanabe, Takahiro; Kamitani, Mari; Tezuka, Ayumi; Nagano, Atsushi J

2018-05-29

Gene expression levels exhibit stochastic variations among genetically identical organisms under the same environmental conditions. In many recent transcriptome analyses based on RNA sequencing (RNA-seq), variations in gene expression levels among replicates were assumed to follow a negative binomial distribution, although the physiological basis of this assumption remains unclear. In this study, RNA-seq data were obtained from Arabidopsis thaliana under eight conditions (21-27 replicates), and the characteristics of gene-dependent empirical probability density function (ePDF) profiles of gene expression levels were analyzed. For A. thaliana and Saccharomyces cerevisiae, various types of ePDF of gene expression levels were obtained that were classified as Gaussian, power law-like containing a long tail, or intermediate. These ePDF profiles were well fitted with a Gauss-power mixing distribution function derived from a simple model of a stochastic transcriptional network containing a feedback loop. The fitting function suggested that gene expression levels with long-tailed ePDFs would be strongly influenced by feedback regulation. Furthermore, the features of gene expression levels are correlated with their functions, with the levels of essential genes tending to follow a Gaussian-like ePDF while those of genes encoding nucleic acid-binding proteins and transcription factors exhibit long-tailed ePDF.
Using real options analysis to support strategic management decisions

NASA Astrophysics Data System (ADS)

Kabaivanov, Stanimir; Markovska, Veneta; Milev, Mariyan

2013-12-01

Decision making is a complex process that requires taking into consideration multiple heterogeneous sources of uncertainty. Standard valuation and financial analysis techniques often fail to properly account for all these sources of risk as well as for all sources of additional flexibility. In this paper we explore applications of a modified binomial tree method for real options analysis (ROA) in an effort to improve the decision-making process. Typical use cases of real options are analyzed, with a detailed study of the applications and the advantages that company management can derive from them. Numerical results based on extending the simple binomial tree approach to multiple sources of uncertainty are provided to demonstrate the improvement in management decisions.
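For orientation, a minimal sketch of the simple (single-source) binomial tree that such work builds on, here in the standard Cox-Ross-Rubinstein form. It is not the modified method of the paper; the function name and all inputs are hypothetical, and the option is framed as a right to invest a fixed cost in a project.

```python
import math

def crr_option_value(v0, cost, rate, sigma, years, steps, american=True):
    """Value the option to invest `cost` in a project currently worth `v0`."""
    dt = years / steps
    u = math.exp(sigma * math.sqrt(dt))       # up factor per step
    d = 1.0 / u                               # down factor per step
    q = (math.exp(rate * dt) - d) / (u - d)   # risk-neutral up probability
    disc = math.exp(-rate * dt)
    # Payoffs at the final step; node j has j up-moves.
    values = [max(v0 * u**j * d**(steps - j) - cost, 0.0) for j in range(steps + 1)]
    # Roll back through the tree, allowing early exercise if requested.
    for step in range(steps - 1, -1, -1):
        for j in range(step + 1):
            cont = disc * (q * values[j + 1] + (1 - q) * values[j])
            if american:
                exercise = max(v0 * u**j * d**(step - j) - cost, 0.0)
                cont = max(cont, exercise)
            values[j] = cont
    return values[0]

# Hypothetical project: worth 100 today, costs 100 to launch, decision window of 1 year.
value = crr_option_value(v0=100.0, cost=100.0, rate=0.05, sigma=0.2, years=1.0, steps=200)
print(f"value of the option to invest: {value:.2f}")
```

The flexibility value is the gap between this figure and the static net present value (zero in this hypothetical case).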
P-Hacking in Orthopaedic Literature: A Twist to the Tail.

PubMed

Bin Abd Razak, Hamid Rahmatullah; Ang, Jin-Guang Ernest; Attal, Hersh; Howe, Tet-Sen; Allen, John Carson

2016-10-19

"P-hacking" occurs when researchers preferentially select data or statistical analyses until nonsignificant results become significant. We wanted to evaluate if the phenomenon of p-hacking was evident in orthopaedic literature. We text-mined through all articles published in three top orthopaedic journals in 2015. For anonymity, we cipher-coded the three journals. We included all studies that reported a single p value to answer their main hypothesis. These p values were then charted and frequency graphs were generated to illustrate any evidence of p-hacking. Binomial tests were employed to look for evidence of evidential value and significance of p-hacking. Frequency plots for all three journals revealed evidence of p-hacking. Binomial tests for all three journals were significant for evidence of evidential value (p < 0.0001 for all). However, the binomial test for p-hacking was significant only for one journal (p = 0.0092). P-hacking is an evolving phenomenon that threatens to jeopardize the evidence-based practice of medicine. Although our results show that there is good evidential value for orthopaedic literature published in our top journals, there is some evidence of p-hacking of which authors and readers should be wary. Copyright © 2016 by The Journal of Bone and Joint Surgery, Incorporated.
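A minimal sketch of an exact one-sided binomial test of the kind used in p-curve style analyses. The bin boundaries and counts below are invented for illustration, and comparing two adjacent p-value bins is only one common way to set up such a test, not necessarily the authors' procedure.

```python
from math import comb

def binomial_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

# Hypothetical counts of reported p values in two adjacent bins just below 0.05.
upper_bin = 30   # 0.045 <= p < 0.050
lower_bin = 15   # 0.040 <= p < 0.045

# Under no p-hacking (and a flat local p-curve) each value is equally likely
# to land in either bin, so the upper-bin count is Binomial(n, 0.5).
p_value = binomial_tail(upper_bin, upper_bin + lower_bin)
print(f"one-sided binomial p = {p_value:.4f}")
```

An excess in the bin closest to 0.05 yields a small p value, which is the pattern usually read as a sign of p-hacking.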
Sample size determination for a three-arm equivalence trial of Poisson and negative binomial responses.

PubMed

Chang, Yu-Wei; Tsong, Yi; Zhao, Zhigen

2017-01-01

Assessing equivalence or similarity has drawn much attention recently as many drug products have lost or will lose their patents in the next few years, especially certain best-selling biologics. To claim equivalence between the test treatment and the reference treatment when assay sensitivity is well established from historical data, one has to demonstrate both superiority of the test treatment over placebo and equivalence between the test treatment and the reference treatment. Thus, there is urgency for practitioners to derive a practical way to calculate sample size for a three-arm equivalence trial. The primary endpoints of a clinical trial may not always be continuous, but may be discrete. In this paper, the authors derive power function and discuss sample size requirement for a three-arm equivalence trial with Poisson and negative binomial clinical endpoints. In addition, the authors examine the effect of the dispersion parameter on the power and the sample size by varying its coefficient from small to large. In extensive numerical studies, the authors demonstrate that required sample size heavily depends on the dispersion parameter. Therefore, misusing a Poisson model for negative binomial data may easily lose power up to 20%, depending on the value of the dispersion parameter.
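A small sketch of why the dispersion parameter matters so much. It assumes the common parameterization in which a negative binomial count with mean mu and dispersion k has variance mu + k*mu^2 (k = 0 recovers the Poisson); the mean and the dispersion values are hypothetical, and the paper's own parameterization may differ.

```python
def nb_variance(mean, dispersion):
    """Variance of a negative binomial count under Var = mean + dispersion * mean**2."""
    return mean + dispersion * mean**2

mean = 2.0  # hypothetical event rate per subject
for k in (0.0, 0.25, 0.5, 1.0):
    # Ratio of the negative binomial variance to the Poisson variance (= mean).
    inflation = nb_variance(mean, k) / mean
    print(f"dispersion {k:.2f}: variance {nb_variance(mean, k):.2f}, "
          f"{inflation:.2f}x the Poisson variance")
```

Because the sample size needed under a normal approximation scales roughly with the variance, a Poisson-based calculation understates the requirement whenever the dispersion is above zero.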
Statistical tests to compare motif count exceptionalities

PubMed Central

Robin, Stéphane; Schbath, Sophie; Vandewalle, Vincent

2007-01-01

Background: Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculation for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Just comparing the motif count p-values in each sequence is not sufficient to decide whether this motif is significantly more exceptional in one sequence than in the other. A statistical test is required. Results: We develop and analyze two statistical tests, an exact binomial one and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For that purpose, motif occurrences are modeled by Poisson processes, with special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion: The exact binomial test is particularly suited to small counts. For large counts, we advise using the likelihood ratio test, which is asymptotic but strongly correlated with the exact binomial test and very simple to use. PMID:17346349
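A minimal sketch of the idea behind an exact binomial comparison of two Poisson counts: conditional on the total, the first count is binomial with a success probability set by the expected counts. The function names and the example numbers are hypothetical, and the handling of overlapping motifs described in the paper is not reproduced here.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def compare_poisson_counts(n1, n2, e1, e2):
    """Two-sided exact test that n1/e1 and n2/e2 share the same ratio.

    n1, n2: observed motif counts in the two sequences.
    e1, e2: expected counts under each sequence's background model.
    """
    n = n1 + n2
    p0 = e1 / (e1 + e2)  # null probability that an occurrence falls in sequence 1
    observed = binom_pmf(n1, n, p0)
    # Sum the probabilities of all outcomes no more likely than the observed one.
    total = sum(binom_pmf(k, n, p0) for k in range(n + 1)
                if binom_pmf(k, n, p0) <= observed * (1 + 1e-9))
    return min(1.0, total)

# Hypothetical motif: same expected count in both sequences, very different observed counts.
print(compare_poisson_counts(n1=12, n2=2, e1=5.0, e2=5.0))
```

Conditioning on the total removes the unknown common ratio, which is why the test stays exact even for small counts.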
Interrelationships Between Receiver/Relative Operating Characteristics Display, Binomial, Logit, and Bayes' Rule Probability of Detection Methodologies

NASA Technical Reports Server (NTRS)

Generazio, Edward R.

2014-01-01

Unknown risks are introduced into failure critical systems when probability of detection (POD) capabilities are accepted without a complete understanding of the statistical method applied and the interpretation of the statistical results. The presence of this risk in the nondestructive evaluation (NDE) community is revealed in common statements about POD. These statements are often interpreted in a variety of ways and therefore, the very existence of the statements identifies the need for a more comprehensive understanding of POD methodologies. Statistical methodologies have data requirements to be met, procedures to be followed, and requirements for validation or demonstration of adequacy of the POD estimates. Risks are further enhanced due to the wide range of statistical methodologies used for determining the POD capability. Receiver/Relative Operating Characteristics (ROC) Display, simple binomial, logistic regression, and Bayes' rule POD methodologies are widely used in determining POD capability. This work focuses on Hit-Miss data to reveal the framework of the interrelationships between Receiver/Relative Operating Characteristics Display, simple binomial, logistic regression, and Bayes' Rule methodologies as they are applied to POD. Knowledge of these interrelationships leads to an intuitive and global understanding of the statistical data, procedural and validation requirements for establishing credible POD estimates.
Exact tests using two correlated binomial variables in contemporary cancer clinical trials.

PubMed

Yu, Jihnhee; Kepner, James L; Iyer, Renuka

2009-12-01

New therapy strategies for the treatment of cancer are rapidly emerging because of recent technology advances in genetics and molecular biology. Although newer targeted therapies can improve survival without measurable changes in tumor size, clinical trial conduct has remained nearly unchanged. When potentially efficacious therapies are tested, current clinical trial design and analysis methods may not be suitable for detecting therapeutic effects. We propose an exact method with respect to testing cytostatic cancer treatment using correlated bivariate binomial random variables to simultaneously assess two primary outcomes. The method is easy to implement. It does not increase the sample size over that of the univariate exact test and in most cases reduces the sample size required. Sample size calculations are provided for selected designs.

Modelling the current distribution and predicted spread of the flea species Ctenocephalides felis infesting outdoor dogs in Spain.

PubMed

Gálvez, Rosa; Musella, Vicenzo; Descalzo, Miguel A; Montoya, Ana; Checa, Rocío; Marino, Valentina; Martín, Oihane; Cringoli, Giuseppe; Rinaldi, Laura; Miró, Guadalupe

2017-09-19

The cat flea, Ctenocephalides felis, is the most prevalent flea species detected on dogs and cats in Europe and other world regions. The status of flea infestation today is an evident public health concern because of the fleas' cosmopolitan distribution and the transmission of flea-borne diseases. This study determines the spatial distribution of the cat flea C. felis infesting dogs in Spain. Using geospatial tools, models were constructed based on entomological data collected from dogs during the period 2013-2015. Bioclimatic zones, covering broad climate and vegetation ranges, were surveyed in relation to their size. The models were built by negative binomial regression on several environmental variables to show their impact on C. felis infestation prevalence: land cover, bioclimatic zone, mean summer and autumn temperature, mean summer rainfall, distance to urban settlement and normalized difference vegetation index. In the face of climate change, we also simulated the future distributions of C. felis for the global climate model (GCM) "GFDL-CM3" and for the representative concentration pathway RCP45, which predicts their spread in the country. Predictive models for current climate conditions indicated the widespread distribution of C. felis throughout Spain, mainly across the central northernmost zone of the mainland. Under predicted conditions of climate change, the risk of spread was slightly greater, especially in the north and central peninsula, than for the current situation. The data provided will be useful for local veterinarians to design effective strategies against flea infestation and the pathogens transmitted by these arthropods.
Seasonal and Geographical Variation of Dengue Vectors in Narathiwat, South Thailand

PubMed Central

Boonklong, Ornanong; Bhumiratana, Adisak

2016-01-01

Using a GIS-based land use map for the urban-rural division (the relative ratio of population density adjusted to relatively Aedes-infested land area), we demonstrated significant independent observations of seasonal and geographical variation of Aedes aegypti and Aedes albopictus vectors between Muang Narathiwat district (urban setting) and neighboring districts (rural setting) of Narathiwat, Southern Thailand, based on the binomial distribution of Aedes vectors in water-holding containers (water storage containers, discarded receptacles, miscellaneous containers, and natural containers). The distribution of Aedes vectors was influenced seasonally by breeding outdoors rather than indoors in all 4 container types. Accordingly, both urban and rural settings showed significant seasonal (wet versus dry) differences in the distribution of Ae. aegypti larvae observed in water storage containers (P = 0.001 and P = 0.002) and natural containers (P = 0.016 and P = 0.015), whereas, in the rural setting, a significant difference was observed in discarded receptacles (P = 0.028) and miscellaneous containers (P < 0.001). No notable seasonal distribution of Ae. albopictus larvae was observed in any containers in the urban setting, whereas, in the rural setting, a significant difference was observed in water storage containers (P = 0.007) and discarded receptacles (P < 0.001). Moreover, the percentages of the container index for Aedes-infested households in the dry season were significantly lower than those in the wet seasons, P = 0.034 for the urban setting and P = 0.001 for the rural setting. Findings suggest that seasonal and geographical variation of Aedes vectors affects the infestation of those containers in human habitations and surroundings. PMID:27437001
The Effect of Exposure to Ultraviolet Radiation in Infancy on Melanoma Risk.

PubMed

Gefeller, Olaf; Fiessler, Cornelia; Radespiel-Tröger, Martin; Uter, Wolfgang; Pfahlberg, Annette B

2016-01-01

Evidence on the effect of ultraviolet radiation (UVR) exposure in infancy on melanoma risk in later life is scarce. Three recent studies suffering from methodological shortcomings suggested that people born in spring carry a higher melanoma risk. Data from the Bavarian population-based cancer registry on 28374 incident melanoma cases between 2002 and 2012 were analyzed to reexamine this finding. Crude and adjusted analyses - using negative binomial regression models - were performed addressing the relationship. In the crude analysis, the birth months March - May were significantly overrepresented among melanoma cases. However, after additionally adjusting for the birth month distribution of the Bavarian population, the ostensible seasonal effect disappeared. Similar results emerged in all subgroup analyses. Our large registry-based study provides no evidence that people born in spring carry a higher risk for developing melanoma in later life and thus lends no support to the hypothesis of higher UVR-susceptibility during the first months of life.
Multi-scaling modelling in financial markets

NASA Astrophysics Data System (ADS)

Liu, Ruipeng; Aste, Tomaso; Di Matteo, T.

2007-12-01

In recent years, a new wave of interest has spurred the involvement of complexity in finance, which might provide a guideline to understand the mechanism of financial markets, and researchers with different backgrounds have made increasing contributions introducing new techniques and methodologies. In this paper, Markov-switching multifractal models (MSM) are briefly reviewed and the multi-scaling properties of different financial data are analyzed by computing the scaling exponents by means of the generalized Hurst exponent H(q). In particular we have considered H(q) for price data, absolute returns and squared returns of different empirical financial time series. We have computed H(q) for the simulated data based on the MSM models with Binomial and Lognormal distributions of the volatility components. The results demonstrate the capacity of the multifractal (MF) models to capture the stylized facts in finance, and the ability of the generalized Hurst exponents approach to detect the scaling feature of financial time series.
On Statistical Modeling of Sequencing Noise in High Depth Data to Assess Tumor Evolution

NASA Astrophysics Data System (ADS)

Rabadan, Raul; Bhanot, Gyan; Marsilio, Sonia; Chiorazzi, Nicholas; Pasqualucci, Laura; Khiabanian, Hossein

2018-07-01

One cause of cancer mortality is tumor evolution to therapy-resistant disease. First line therapy often targets the dominant clone, and drug resistance can emerge from preexisting clones that gain fitness through therapy-induced natural selection. Such mutations may be identified using targeted sequencing assays by analysis of noise in high-depth data. Here, we develop a comprehensive, unbiased model for sequencing error background. We find that noise in sufficiently deep DNA sequencing data can be approximated by aggregating negative binomial distributions. Mutations with frequencies above noise may have prognostic value. We evaluate our model with simulated exponentially expanded populations as well as data from cell line and patient sample dilution experiments, demonstrating its utility in prognosticating tumor progression. Our results may have the potential to identify significant mutations that can cause recurrence. These results are relevant in the pretreatment clinical setting to determine appropriate therapy and prepare for potential recurrence pretreatment.
Gastrointestinal parasite egg excretion in young calves in periurban livestock production in Mali.

PubMed

Wymann, Monica Natalie; Traore, Koniba; Bonfoh, Bassirou; Tembely, Saïdou; Tembely, Sékouba; Zinsstag, Jakob

2008-04-01

To acquire the information needed to improve parasite control in periurban cattle production in Mali, repeated sampling of faeces of 694 calves kept around Bamako was done in 2003/2004. The effects of season, age, breed, management type, parasite control and presence of sheep on egg and oocyst counts were determined. A Bayesian model was used with a negative binomial distribution and herd and individual effects, to account for the clustering of calves in herds and the repeated sampling. Interviews were conducted to report the current control strategies. We found eggs of Strongyloides papillosus (Age class 0-1 month: prevalence 39%, 2-3 months: 59%, 5-6 months: 42%), strongyles (14%, 24%, 36%), coccidian oocysts (37%, 68%, 64%) and at low prevalence eggs of Toxocara vitulorum, Moniezia sp., Trichuris sp. and Paramphistomum sp. Season and age effects occurred. Reported utilisation of parasite control was high (92%) but monthly recorded use was significantly lower (61%).
Optimal estimation for discrete time jump processes

NASA Technical Reports Server (NTRS)

Vaca, M. V.; Tretter, S. A.

1978-01-01

Optimum estimates of nonobservable random variables or random processes which influence the rate functions of a discrete time jump process (DTJP) are derived. The approach used is based on the a posteriori probability of a nonobservable event expressed in terms of the a priori probability of that event and of the sample function probability of the DTJP. Thus a general representation is obtained for optimum estimates, and recursive equations are derived for minimum mean-squared error (MMSE) estimates. In general, MMSE estimates are nonlinear functions of the observations. The problem is considered of estimating the rate of a DTJP when the rate is a random variable with a beta probability density function and the jump amplitudes are binomially distributed. It is shown that the MMSE estimates are linear. The class of beta density functions is rather rich and explains why there are insignificant differences between optimum unconstrained and linear MMSE estimates in a variety of problems.
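The linearity claim can be illustrated with the textbook beta-binomial conjugate pair. The sketch below covers only a single block of trials, not the recursive DTJP setting of the paper, and the function names and prior parameters are hypothetical.

```python
def posterior_mean_rate(alpha, beta, successes, trials):
    """Posterior mean of p for Binomial(trials, p) data and a Beta(alpha, beta) prior.

    This is the MMSE estimate, and it is linear in `successes`.
    """
    return (alpha + successes) / (alpha + beta + trials)

def numeric_posterior_mean(alpha, beta, successes, trials, grid=100_000):
    """Same quantity by brute-force midpoint integration, as a check."""
    num = den = 0.0
    for i in range(grid):
        p = (i + 0.5) / grid
        # Unnormalized posterior density: prior times binomial likelihood.
        weight = p ** (alpha - 1 + successes) * (1 - p) ** (beta - 1 + trials - successes)
        num += p * weight
        den += weight
    return num / den

for x in (0, 5, 10):
    closed = posterior_mean_rate(2.0, 3.0, x, 10)
    numeric = numeric_posterior_mean(2.0, 3.0, x, 10)
    print(f"successes={x}: closed form {closed:.6f}, numerical {numeric:.6f}")
```

Equal steps in the number of successes move the estimate by equal amounts, which is the sense in which the optimum estimate coincides with a linear one for this prior.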
DOE Office of Scientific and Technical Information (OSTI.GOV)

Bradonjic, Milan; Hagberg, Aric; Hengartner, Nick

We analyze component evolution in general random intersection graphs (RIGs) and give conditions on existence and uniqueness of the giant component. Our techniques generalize the existing methods for analysis on component evolution in RIGs. That is, we analyze survival and extinction properties of a dependent, inhomogeneous Galton-Watson branching process on general RIGs. Our analysis relies on bounding the branching processes and inherits the fundamental concepts from the study on component evolution in Erdős-Rényi graphs. The main challenge comes from the underlying structure of RIGs, when the number of offspring follows a binomial distribution with a different number of nodes and different rate at each step during the evolution. RIGs can be interpreted as a model for large randomly formed non-metric data sets. Besides the mathematical analysis on component evolution, which we provide in this work, we perceive RIGs as an important random structure which has already found applications in social networks, epidemic networks, blog readership, or wireless sensor networks.
LS-CAP: an algorithm for identifying cytogenetic aberrations in hepatocellular carcinoma using microarray data.

PubMed

He, Xianmin; Wei, Qing; Sun, Meiqian; Fu, Xuping; Fan, Sichang; Li, Yao

2006-05-01

Biological techniques such as array comparative genomic hybridization (CGH), fluorescent in situ hybridization (FISH) and Affymetrix single nucleotide polymorphism (SNP) arrays have been used to detect cytogenetic aberrations. However, on a genomic scale, these techniques are labor intensive and time consuming. Comparative genomic microarray analysis (CGMA) has been used to identify cytogenetic changes in hepatocellular carcinoma (HCC) using gene expression microarray data. However, the CGMA algorithm cannot give precise localization of aberrations, fails to identify small cytogenetic changes, and exhibits false negatives and positives. Locally un-weighted smoothing cytogenetic aberrations prediction (LS-CAP), based on local smoothing and the binomial distribution, can be expected to address these problems. The LS-CAP algorithm was built and used on HCC microarray profiles. Eighteen cytogenetic abnormalities were identified; among them, 5 were reported previously, and 12 were proven by CGH studies. LS-CAP effectively reduced the false negatives and positives, and precisely located small fragments with cytogenetic aberrations.
Optimizing Probability of Detection Point Estimate Demonstration

NASA Technical Reports Server (NTRS)

Koshti, Ajay M.

2017-01-01

Probability of detection (POD) analysis is used in assessing reliably detectable flaw size in nondestructive evaluation (NDE). MIL-HDBK-1823 and the associated mh1823 POD software give the most common methods of POD analysis. Real flaws such as cracks and crack-like flaws are desired to be detected using these NDE methods. A reliably detectable crack size is required for safe life analysis of fracture critical parts. The paper provides discussion on optimizing probability of detection (POD) demonstration experiments using the point estimate method. The POD point estimate method is used by NASA for qualifying special NDE procedures. The point estimate method uses the binomial distribution for probability density. Normally, a set of 29 flaws of the same size within some tolerance is used in the demonstration. The optimization is performed to provide an acceptable value for probability of passing demonstration (PPD) and to achieve an acceptable value for probability of false (POF) calls while keeping the flaw sizes in the set as small as possible.
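A minimal sketch of the binomial arithmetic behind a 29-flaw demonstration. The pass criterion used here (all 29 flaws detected, no misses) is a commonly cited convention and an assumption on our part; the abstract itself states only the set size. The function name and the example POD values are hypothetical.

```python
from math import comb

def prob_pass_demo(true_pod, n_flaws=29, max_misses=0):
    """Probability of passing a hit-miss demonstration.

    Passing means at most `max_misses` of `n_flaws` flaws go undetected,
    with each flaw detected independently with probability `true_pod`.
    """
    return sum(comb(n_flaws, m) * (1 - true_pod)**m * true_pod**(n_flaws - m)
               for m in range(max_misses + 1))

# A procedure whose true POD is only 0.90 rarely passes 29 of 29.
print(f"PPD at POD 0.90: {prob_pass_demo(0.90):.4f}")
# A better procedure still fails a meaningful share of demonstrations.
print(f"PPD at POD 0.98: {prob_pass_demo(0.98):.4f}")
```

Under this criterion a procedure with a true POD of 0.90 passes less than 5% of the time, which is the usual rationale for reading a pass as 90% POD at 95% confidence; 28 flaws would not be enough.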
Bio-Ecology of the Louse, Upupicola upupae, Infesting the Common Hoopoe, Upupa epops

PubMed Central

Agarwal, G. P; Ahmad, Aftab; Rashmi, Archna; Arya, Gaurav; Bansal, Nayanci; Saxena, A.K.

2011-01-01

The population characteristics of the louse, Upupicola upupae (Shrank) (Mallophaga: Philopteridae: Ischnocera), infesting the Common Hoopoe, Upupa epops L. (Aves: Upupiformes), were recorded during 2007–08 in District Rampur, Uttar Pradesh, India. The pattern of frequency distribution of the louse conformed to the negative binomial model. The lice and their nits were reared in vitro at 35 ± 1° C, 75–82 % RH, on a feather diet. The data obtained were used to construct the life table and to determine the intrinsic rate of natural increase (0.035 female/day); the net reproductive rate was 3.67 female eggs/female, the generation time was 37 days, and the doubling time of the population was 19 days. The chaetotaxy of the three nymphal instars has also been noted to record their diagnostic characteristics. Information on egg morphology and antennal sensilla is also presented. PMID:21861650
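The reported life-table figures can be cross-checked with the usual approximations r ≈ ln(R0)/T and doubling time = ln(2)/r. This is a sketch using those textbook relations; the abstract does not say which formulas the authors applied.

```python
import math

net_reproductive_rate = 3.67   # female eggs per female (R0), from the abstract
generation_time = 37.0         # days (T), from the abstract

# Approximate intrinsic rate of natural increase.
r = math.log(net_reproductive_rate) / generation_time
# Time for the population to double at that rate.
doubling_time = math.log(2) / r

print(f"r = {r:.4f} per day, doubling time = {doubling_time:.1f} days")
```

Both values land close to the reported 0.035 female/day and 19 days, so the published figures are mutually consistent under these approximations.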
Isotopic studies of metabolic systems by mass spectrometry: using Pascal's triangle to produce biological standards with fully controlled labeling patterns.

PubMed

Millard, Pierre; Massou, Stéphane; Portais, Jean-Charles; Létisse, Fabien

2014-10-21

Mass spectrometry (MS) is widely used for isotopic studies of metabolism in which detailed information about biochemical processes is obtained from the analysis of isotope incorporation into metabolites. The biological value of such experiments is dependent on the accuracy of the isotopic measurements. Using MS, isotopologue distributions are measured from the quantitative analysis of isotopic clusters. These measurements are prone to various biases, which can occur during the experimental workflow and/or MS analysis. The lack of relevant standards limits investigations of the quality of the measured isotopologue distributions. To meet that need, we developed a complete theoretical and experimental framework for the biological production of metabolites with fully controlled and predictable labeling patterns. This strategy is valid for different isotopes and different types of metabolisms and organisms, and was applied to two model microorganisms, Pichia augusta and Escherichia coli, cultivated on 13C-labeled methanol and acetate as sole carbon source, respectively. The isotopic composition of the substrates was designed to obtain samples in which the isotopologue distribution of all the metabolites should give the binomial coefficients found in Pascal's triangle. The strategy was validated on a liquid chromatography-tandem mass spectrometry (LC-MS/MS) platform by quantifying the complete isotopologue distributions of different intracellular metabolites, which were in close agreement with predictions. This strategy can be used to evaluate entire experimental workflows (from sampling to data processing) or different analytical platforms in the context of isotope labeling experiments.
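A minimal sketch of why Pascal's triangle appears. It assumes every carbon position is labeled independently with the same 13C probability, and that this probability is 0.5; both are our reading of the design rather than details stated in the abstract, and the function name is hypothetical.

```python
from math import comb

def isotopologue_distribution(n_carbons, p_label=0.5):
    """Fraction of molecules carrying k labeled carbons, for k = 0..n_carbons."""
    return [comb(n_carbons, k) * p_label**k * (1 - p_label)**(n_carbons - k)
            for k in range(n_carbons + 1)]

dist = isotopologue_distribution(4)
# Scale so the unlabeled isotopologue (M+0) equals 1.
relative = [round(x / dist[0]) for x in dist]
print(relative)  # [1, 4, 6, 4, 1]
```

With equal labeled and unlabeled probabilities the common factor cancels, leaving the binomial coefficients as relative intensities, so any deviation in a measured cluster points to a measurement bias.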
Explanation of the Reaction of Monoclonal Antibodies with Candida Albicans Cell Surface in Terms of Compound Poisson Process

NASA Astrophysics Data System (ADS)

Dudek, Mirosław R.; Mleczko, Józef

Surprisingly, still very little is known about the mathematical modeling of peaks in the binding affinities distribution function. In general, it is believed that the peaks represent antibodies directed towards single epitopes. In this paper, we refer to fluorescence flow cytometry experiments and show that even monoclonal antibodies can display multi-modal histograms of affinity distribution. This result takes place when some obstacles appear in the paratope-epitope reaction such that the process of reaching the specific epitope ceases to be a point Poisson process. A typical example is the large area of cell surface, which could be unreachable by antibodies, leading to the heterogeneity of the cell surface repletion. In this case the affinity of cells to bind the antibodies should be described by a more complex process than the pure Poisson point process. We suggest using a doubly stochastic Poisson process, where the points are replaced by a binomial point process, resulting in the Neyman distribution. The distribution can have a strongly multimodal character, with the number of modes depending on the concentration of antibodies and epitopes. All this means that there is a possibility to go beyond the simplified theory of one response towards one epitope. As a consequence, our description provides perspectives for describing antigen-antibody reactions, both qualitatively and quantitatively, even in the case when some peaks result from more than one binding mechanism.
Fitting Cure Rate Model to Breast Cancer Data of Cancer Research Center.

PubMed

Baghestani, Ahmad Reza; Zayeri, Farid; Akbari, Mohammad Esmaeil; Shojaee, Leyla; Khadembashi, Naghmeh; Shahmirzalou, Parviz

2015-01-01

The Cox PH model is one of the most significant statistical models in studying survival of patients. However, in the case of patients with long-term survival, it may not be the most appropriate. In such cases, a cure rate model seems more suitable. The purpose of this study was to determine clinical factors associated with cure rate of patients with breast cancer. In order to find factors affecting cure rate (response), a non-mixed cure rate model with negative binomial distribution for the latent variable was used. Variables selected were cancer recurrence, status for HER2, estrogen receptor (ER) and progesterone receptor (PR), size of tumor, grade of cancer, stage of cancer, type of surgery, age at the time of diagnosis and number of removed positive lymph nodes. All analyses were performed using PROC MCMC in SAS 9.2. The mean (SD) age of patients was 48.9 (11.1) years. For these patients, 1-, 5- and 10-year survival rates were 95, 79 and 50 percent, respectively. All of the mentioned variables had an effect on the cure fraction. The Kaplan-Meier curve supported the use of a cure model. Unlike other variables, ER and PR positivity increase the probability of cure in patients. In the present study, the Weibull distribution was used for analysing survival times. Assessing model fit with other survival distributions, such as log-normal and log-logistic, and with other distributions for the latent variable is recommended.
Copy number variants calling for single cell sequencing data by multi-constrained optimization.

PubMed

Xu, Bo; Cai, Hongmin; Zhang, Changsheng; Yang, Xi; Han, Guoqiang

2016-08-01

Variations in DNA copy number carry important information on genome evolution and regulation of DNA replication in cancer cells. The rapid development of single-cell sequencing technology allows one to explore gene expression heterogeneity among single cells, thus providing important cancer cell evolution information. Single-cell DNA/RNA sequencing data usually have low genome coverage, which requires an extra step of amplification to accumulate enough samples. However, such amplification introduces large bias and makes bioinformatics analysis challenging. Accurately modeling the distribution of sequencing data and effectively suppressing the influence of the bias is the key to successful variation analysis. Recent advances demonstrate that the technical noise introduced by amplification is more likely to follow a negative binomial distribution, of which the Poisson distribution is a limiting case. Thus, we tackle the problem of CNV detection by formulating it as a quadratic optimization problem involving two constraints, in which the underlying signals are corrupted by Poisson distributed noise. By imposing the constraints of sparsity and smoothness, the reconstructed read depth signals from single-cell sequencing data are anticipated to fit the CNV patterns more accurately. An efficient numerical solution based on the classical alternating direction minimization method (ADMM) is tailored to solve the proposed model. We demonstrate the advantages of the proposed method using both synthetic and empirical single-cell sequencing data. Our experimental results demonstrate that the proposed method achieves excellent performance and high promise of success with single-cell sequencing data. Crown Copyright © 2016. Published by Elsevier Ltd. All rights reserved.
A procedure for removing the effect of response bias errors from waterfowl hunter questionnaire responses

USGS Publications Warehouse

Atwood, E.L.

1958-01-01

Response bias errors are studied by comparing questionnaire responses from waterfowl hunters using four large public hunting areas with actual hunting data from these areas during two hunting seasons. To the extent that the data permit, the sources of the error in the responses were studied and the contribution of each type to the total error was measured. Response bias errors, including both prestige and memory bias, were found to be very large as compared to non-response and sampling errors. Good fits were obtained with the seasonal kill distribution of the actual hunting data and the negative binomial distribution and a good fit was obtained with the distribution of total season hunting activity and the semi-logarithmic curve. A comparison of the actual seasonal distributions with the questionnaire response distributions revealed that the prestige and memory bias errors are both positive. The comparisons also revealed the tendency for memory bias errors to occur at digit frequencies divisible by five and for prestige bias errors to occur at frequencies which are multiples of the legal daily bag limit. A graphical adjustment of the response distributions was carried out by developing a smooth curve from those frequency classes not included in the predictable biased frequency classes referred to above. Group averages were used in constructing the curve, as suggested by Ezekiel [1950]. The efficiency of the technique described for reducing response bias errors in hunter questionnaire responses on seasonal waterfowl kill is high in large samples. The graphical method is not as efficient in removing response bias errors in hunter questionnaire responses on seasonal hunting activity where an average of 60 percent was removed.
Interpreting carnivore scent-station surveys

USGS Publications Warehouse

Sargeant, G.A.; Johnson, D.H.; Berg, W.E.

1998-01-01

The scent-station survey method has been widely used to estimate trends in carnivore abundance. However, statistical properties of scent-station data are poorly understood, and the relation between scent-station indices and carnivore abundance has not been adequately evaluated. We assessed properties of scent-station indices by analyzing data collected in Minnesota during 1986-93. Visits to stations separated by <2 km were correlated for all species because individual carnivores sometimes visited several stations in succession. Thus, visits to stations had an intractable statistical distribution. Dichotomizing results for lines of 10 stations (0 or ≥1 visits) produced binomially distributed data that were robust to multiple visits by individuals. We abandoned 2-way comparisons among years in favor of tests for population trend, which are less susceptible to bias, and analyzed results separately for biogeographic sections of Minnesota because trends differed among sections. Before drawing inferences about carnivore population trends, we reevaluated published validation experiments. Results implicated low statistical power and confounding as possible explanations for equivocal or conflicting results of validation efforts. Long-term trends in visitation rates probably reflect real changes in populations, but poor spatial and temporal resolution, susceptibility to confounding, and low statistical power limit the usefulness of this survey method.
Consideration of species community composition in statistical ...

EPA Pesticide Factsheets

Diseases are increasing in marine ecosystems, and these increases have been attributed to a number of environmental factors including climate change, pollution, and overfishing. However, many studies pool disease prevalence into taxonomic groups, disregarding host species composition when comparing sites or assessing environmental impacts on patterns of disease presence. We used simulated data under a known environmental effect to assess the ability of standard statistical methods (binomial and linear regression, ANOVA) to detect a significant environmental effect on pooled disease prevalence with varying species abundance distributions and relative susceptibilities to disease. When one species was more susceptible to a disease and both species only partially overlapped in their distributions, models tended to produce a greater number of false positives (Type I error). Differences in disease risk between regions or along an environmental gradient tended to be underestimated, or even in the wrong direction, when highly susceptible taxa had reduced abundances in impacted sites, a situation likely to be common in nature. Including relative abundance as an additional variable in regressions improved model accuracy, but tended to be conservative, producing more false negatives (Type II error) when species abundance was strongly correlated with the environmental effect. Investigators should be cautious of underlying assumptions of species similarity in susceptib
Modeling the frequency of opposing left-turn conflicts at signalized intersections using generalized linear regression models.

PubMed

Zhang, Xin; Liu, Pan; Chen, Yuguang; Bai, Lu; Wang, Wei

2014-01-01

The primary objective of this study was to identify whether the frequency of traffic conflicts at signalized intersections can be modeled. The opposing left-turn conflicts were selected for the development of conflict predictive models. Using data collected at 30 approaches at 20 signalized intersections, the underlying distributions of the conflicts under different traffic conditions were examined. Different conflict-predictive models were developed to relate the frequency of opposing left-turn conflicts to various explanatory variables. The models considered include a linear regression model, a negative binomial model, and separate models developed for four traffic scenarios. The prediction performance of different models was compared. The frequency of traffic conflicts follows a negative binomial distribution. The linear regression model is not appropriate for the conflict frequency data. In addition, drivers behaved differently under different traffic conditions. Accordingly, the effects of conflicting traffic volumes on conflict frequency vary across different traffic conditions. The occurrences of traffic conflicts at signalized intersections can be modeled using generalized linear regression models. The use of conflict predictive models has potential to expand the uses of surrogate safety measures in safety estimation and evaluation.

Scale-free Graphs for General Aviation Flight Schedules

NASA Technical Reports Server (NTRS)

Alexandov, Natalia M. (Technical Monitor); Kincaid, Rex K.

2003-01-01

In the late 1990s a number of researchers noticed that networks in biology, sociology, and telecommunications exhibited similar characteristics unlike those of standard random networks. In particular, they found that the cumulative degree distributions of these graphs followed a power law rather than a binomial distribution and that their clustering coefficients tended to a nonzero constant as the number of nodes, n, became large, rather than decaying as O(1/n). Moreover, these networks shared an important property with traditional random graphs: as n becomes large, the average shortest path length scales with log n. This latter property has been coined the small-world property. When taken together, these three properties (small-world, power law, and constant clustering coefficient) describe what are now most commonly referred to as scale-free networks. Since 1997 at least six books and over 400 articles have been written about scale-free networks. In this manuscript an overview of the salient characteristics of scale-free networks is given. Computational experience will be provided for two mechanisms that grow (dynamic) scale-free graphs. Additional computational experience will be given for constructing (static) scale-free graphs via a tabu search optimization approach. Finally, a discussion of potential applications to general aviation networks is given.
[Sequential sampling plans to Orthezia praelonga Douglas (Hemiptera: Sternorrhyncha, Ortheziidae) in citrus].

PubMed

Costa, Marilia G; Barbosa, José C; Yamamoto, Pedro T

2007-01-01

Sequential sampling uses samples of variable size and has the advantage of reducing sampling time and costs compared with fixed-size sampling. To introduce adequate management for orthezia, sequential sampling plans were developed for orchards under low and high infestation. Data were collected in Matão, SP, in commercial stands of the orange variety 'Pêra Rio' at five, nine and 15 years of age. Twenty samplings were performed over the whole area of each stand by recording the presence or absence of scales on plants, with each plot comprising ten plants. After observing that in all three stands the scale population followed a contagious (aggregated) distribution, fitting the negative binomial distribution in most samplings, two sequential sampling plans were constructed according to the Sequential Likelihood Ratio Test (SLRT). To construct these plans, an economic threshold of 2% was adopted and the type I and type II error probabilities were fixed at alpha = beta = 0.10. Results showed that the maximum numbers of samples expected to determine the need for control were 172 and 76 for stands with low and high infestation, respectively.
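To illustrate how a sequential plan of this general kind turns running counts into a stop-or-continue decision, the following is a minimal sketch of Wald-style decision lines for a simple binomial (presence/absence) proportion. It is a simplified stand-in, not the negative-binomial-based plans of the study: the hypothesized infestation proportions `p0` and `p1` are hypothetical values chosen to straddle the 2% threshold, and only alpha = beta = 0.10 comes from the abstract.

```python
import math

def sprt_lines(p0, p1, alpha, beta):
    """Slope and intercepts of the two parallel decision lines.

    Tests H0: p = p0 against H1: p = p1 (with p1 > p0) for a proportion.
    """
    g = math.log((p1 * (1 - p0)) / (p0 * (1 - p1)))
    slope = math.log((1 - p0) / (1 - p1)) / g
    lower = -math.log((1 - alpha) / beta) / g  # "no control" line intercept
    upper = math.log((1 - beta) / alpha) / g   # "control" line intercept
    return slope, lower, upper

def decide(n, infested, p0, p1, alpha=0.10, beta=0.10):
    """Decision after n sampling units, of which `infested` were infested."""
    slope, lower, upper = sprt_lines(p0, p1, alpha, beta)
    if infested >= upper + slope * n:
        return "control"
    if infested <= lower + slope * n:
        return "no control"
    return "continue sampling"

# Hypothetical bracketing proportions around a 2% threshold (assumption).
P0, P1 = 0.01, 0.04
```

With equal error rates the two lines are symmetric about the line through the origin, and the common slope lies between `p0` and `p1`, so sampling stops early when the running count is clearly above or below the threshold region.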
Cellular Gauge Symmetry and the Li Organization Principle: A Mathematical Addendum. Quantifying energetic dynamics in physical and biological systems through a simple geometric tool and geodetic curves.

PubMed

Yurkin, Alexander; Tozzi, Arturo; Peters, James F; Marijuán, Pedro C

2017-12-01

The present Addendum complements the accompanying paper "Cellular Gauge Symmetry and the Li Organization Principle"; it illustrates a recently-developed geometrical physical model able to assess electronic movements and energetic paths in atomic shells. The model describes a multi-level system of circular, wavy and zigzag paths which can be projected onto a horizontal tape. This model ushers in a visual interpretation of the distribution of atomic electrons' energy levels and the corresponding quantum numbers through rather simple tools, such as compasses, rulers and straightforward calculations. Here we show how this geometrical model, with the due corrections, among them the use of geodetic curves, might be able to describe and quantify the structure and the temporal development of countless physical and biological systems, from Langevin equations for random paths, to symmetry breaks occurring ubiquitously in physical and biological phenomena, to the relationships among different frequencies of EEG electric spikes. Therefore, in our work we explore the possible association of binomial distribution and geodetic curves configuring a uniform approach for the research of natural phenomena, in biology, medicine or the neurosciences. Copyright © 2017 Elsevier Ltd. All rights reserved.
The Production of Hadrons in Muon Scattering on Deuterium and Xenon Nuclei at 480-GeV (in German)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Soldner-Rembold, Stefan

1992-01-01

For the present thesis the hadronic final states of 6309 muon-deuterium events and 2064 muon-xenon events in the kinematical range Q² > 1 (GeV/c)², x > 0.002, 0.1 < y < 0.85, 8 < W < 30 GeV, and θ > 3.5 mrad were studied. The multiplicity distributions of the muon-deuterium events and the muon-xenon events were described by means of the negative binomial distribution in intervals of the c.m. energy W. The two parameters n̄ (mean multiplicity) and 1/k show for the muon-deuterium events a linear dependence on ln W². The mean multiplicity n̄ on xenon (n̄ = 10.43±0.19) is distinctly higher than on deuterium (n̄ = 7.76±0.07). The rapidity distributions of the positively charged and the negatively charged hadrons from muon-deuterium events are very well described by the Monte-Carlo program LUND. In the two-particle rapidity correlation both short-range and long-range correlations can be detected. The two-particle rapidity correlations in the xenon data differ from those in the deuterium data in the backward range. This difference indicates that the intranuclear cascade takes place in a limited range of small rapidities, relatively independently of the residual fragmentation process.
Neighborhood-Level and Spatial Characteristics Associated with Lay Naloxone Reversal Events and Opioid Overdose Deaths.

PubMed

Rowe, Christopher; Santos, Glenn-Milo; Vittinghoff, Eric; Wheeler, Eliza; Davidson, Peter; Coffin, Philip O

2016-02-01

There were over 23,000 opioid overdose deaths in the USA in 2013, and opioid-related mortality is increasing. Increased access to naloxone, particularly through community-based lay naloxone distribution, is a widely supported strategy to reduce opioid overdose mortality; however, little is known about the ecological and spatial patterns of the distribution and utilization of lay naloxone. This study aims to investigate the neighborhood-level correlates and spatial relationships of lay naloxone distribution and utilization and opioid overdose deaths. We determined the locations of lay naloxone distribution sites and the number of unintentional opioid overdose deaths and reported reversal events in San Francisco census tracts (n = 195) from 2010 to 2012. We used Wilcoxon rank-sum tests to compare census tract characteristics across tracts adjacent and not adjacent to distribution sites and multivariable negative binomial regression models to assess the association between census tract characteristics, including distance to the nearest site, and counts of opioid overdose deaths and naloxone reversal events. Three hundred forty-two opioid overdose deaths and 316 overdose reversals with valid location data were included in our analysis. Census tracts including or adjacent to a distribution site had higher income inequality, lower percentage black or African American residents, more drug arrests, higher population density, more overdose deaths, and more reversal events (all p < 0.05). In multivariable analysis, greater distance to the nearest distribution site (up to a distance of 4000 m) was associated with a lower count of Naloxone reversals [incidence rate ratio (IRR) = 0.51 per 500 m increase, 95% CI 0.39-0.67, p < 0.001] but was not significantly associated with opioid overdose deaths. 
These findings affirm that locating lay naloxone distribution sites in areas with high levels of substance use and overdose risk facilitates reversals of opioid overdoses in those immediate areas but suggests that alternative delivery methods may be necessary to reach individuals in other areas with less concentrated risk.
Development of enhanced pavement deterioration curves.

DOT National Transportation Integrated Search

2016-10-01

This report describes the research performed by the Center for Sustainable Transportation Infrastructure (CSTI) at the Virginia Tech Transportation Institute (VTTI) to develop a pavement condition prediction model, using (negative binomial) regressio...
A Taxonomic Reduced-Space Pollen Model for Paleoclimate Reconstruction

NASA Astrophysics Data System (ADS)

Wahl, E. R.; Schoelzel, C.

2010-12-01

Paleoenvironmental reconstruction from fossil pollen often attempts to take advantage of the rich taxonomic diversity in such data. Here, a taxonomically "reduced-space" reconstruction model is explored that would be parsimonious in introducing parameters needing to be estimated within a Bayesian Hierarchical Modeling context. This work involves a refinement of the traditional pollen ratio method. This method is useful when one (or a few) dominant pollen type(s) in a region have a strong positive correlation with a climate variable of interest and another (or a few) dominant pollen type(s) have a strong negative correlation. When, e.g., counts of pollen taxa a and b (r > 0) are combined with pollen types c and d (r < 0) to form ratios of the form (a + b) / (a + b + c + d), an appropriate estimation form is the binomial logistic generalized linear model (GLM). The GLM can readily model this relationship in the forward form, pollen = g(climate), which is more physically realistic than inverse models often used in paleoclimate reconstruction [climate = f(pollen)]. The specification of the model is: rnum ~ Bin(n, p), where E(r|T) = p = exp(η)/[1 + exp(η)], and η = α + β(T); r is the pollen ratio formed as above, rnum is the ratio numerator, n is the ratio denominator (i.e., the sum of pollen counts), the denominator-specific count is (n - rnum), and T is the temperature at each site corresponding to a specific value of r. Ecological and empirical screening identified the model (Spruce+Birch) / (Spruce+Birch+Oak+Hickory) for use in temperate eastern N. America. α and β were estimated using both "traditional" and Bayesian GLM algorithms (in R). Although it includes only four pollen types, the ratio model yields more explained variation (~80%) in the pollen-temperature relationship of the study region than a 64-taxon modern analog technique (MAT). 
Thus, the new pollen ratio method represents an information-rich, reduced space data model that can be efficiently employed in a BHM framework. The ratio model can directly reconstruct past temperature by solving the GLM equations for T as a function of α, β, and E(r|T): T = {ln[E(r|T)/{1-E(r|T)}]-α}/β. To enable use in paleoreconstruction, the observed r values from fossil pollen data are, by assumption, treated as unbiased estimators of the true r value at each time sampled, which can be substituted for E(r|T). Uncertainty in this reconstruction is systematically evaluated in two parts: 1) the observed r values and their corresponding n values are input as parameters into the binomial distribution, Monte Carlo random pollen count draws are made, and a new ratio value is determined for each iteration; and 2) in the "traditional" GLM the estimated SEs for α and β are used with the α and β EV estimates to yield Monte Carlo random draws for each binomial draw (assuming α and β are Gaussian), in the Bayesian GLM random draws for α and β are taken directly from their estimated posterior distribution. Both methods yield nearly identical reconstructions from varved lakes in Wisconsin where the model has been tested; slightly narrower uncertainty ranges are produced by the Bayesian model. The Little Ice Age is readily identified. Pine:Oak and Fir:Oak versions of the model used in S. California show differences from MAT-based reconstructions.
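The forward model, its inversion for T, and the first part of the uncertainty scheme described above can be sketched as follows. This is an illustrative sketch only: the coefficient values `ALPHA` and `BETA` are hypothetical (the abstract reports no estimates), the function names are invented for the example, and the binomial draw is built from uniform random numbers rather than taken from any particular statistics package.

```python
import math
import random

def expected_ratio(temp, alpha, beta):
    """Forward model: E(r|T) = exp(eta) / (1 + exp(eta)), eta = alpha + beta*T."""
    eta = alpha + beta * temp
    return 1.0 / (1.0 + math.exp(-eta))

def reconstruct_temp(ratio, alpha, beta):
    """Inverse: T = (ln[r / (1 - r)] - alpha) / beta, valid for 0 < r < 1."""
    return (math.log(ratio / (1.0 - ratio)) - alpha) / beta

def resample_temps(rnum, n, alpha, beta, draws=1000, seed=0):
    """Redraw pollen counts from Bin(n, r_obs) and reconstruct T for each draw."""
    rng = random.Random(seed)
    r_obs = rnum / n
    temps = []
    for _ in range(draws):
        k = sum(rng.random() < r_obs for _ in range(n))  # one binomial draw
        if 0 < k < n:  # the logit is undefined at r = 0 or r = 1
            temps.append(reconstruct_temp(k / n, alpha, beta))
    return temps

# Hypothetical coefficients for illustration only (not from the abstract).
ALPHA, BETA = 4.0, -0.5
```

The spread of the values returned by `resample_temps` reflects only the binomial counting uncertainty; propagating uncertainty in α and β, the second part of the scheme, would need additional draws for the coefficients.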
Indicators of Terrorism Vulnerability in Africa

DTIC Science & Technology

2015-03-26

... the terror threat and vulnerabilities across Africa. Key words: Terrorism, Africa, Negative Binomial Regression, Classification Tree
Points on the Path to Probability.

ERIC Educational Resources Information Center

Kiernan, James F.

2001-01-01

Presents the problem of points and the development of the binomial triangle, or Pascal's triangle. Examines various attempts to solve this problem to give students insight into the nature of mathematical discovery. (KHR)
Application of the Conway-Maxwell-Poisson generalized linear model for analyzing motor vehicle crashes.

PubMed

Lord, Dominique; Guikema, Seth D; Geedipally, Srinivas Reddy

2008-05-01

This paper documents the application of the Conway-Maxwell-Poisson (COM-Poisson) generalized linear model (GLM) for modeling motor vehicle crashes. The COM-Poisson distribution, originally developed in 1962, has recently been re-introduced by statisticians for analyzing count data subjected to over- and under-dispersion. This innovative distribution is an extension of the Poisson distribution. The objectives of this study were to evaluate the application of the COM-Poisson GLM for analyzing motor vehicle crashes and compare the results with the traditional negative binomial (NB) model. The comparison analysis was carried out using the most common functional forms employed by transportation safety analysts, which link crashes to the entering flows at intersections or on segments. To accomplish the objectives of the study, several NB and COM-Poisson GLMs were developed and compared using two datasets. The first dataset contained crash data collected at signalized four-legged intersections in Toronto, Ont. The second dataset included data collected for rural four-lane divided and undivided highways in Texas. Several methods were used to assess the statistical fit and predictive performance of the models. The results of this study show that COM-Poisson GLMs perform as well as NB models in terms of GOF statistics and predictive performance. Given the fact the COM-Poisson distribution can also handle under-dispersed data (while the NB distribution cannot or has difficulties converging), which have sometimes been observed in crash databases, the COM-Poisson GLM offers a better alternative over the NB model for modeling motor vehicle crashes, especially given the important limitations recently documented in the safety literature about the latter type of model.
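As a brief illustration of why the COM-Poisson distribution can handle both over- and under-dispersion, the sketch below evaluates its probability mass function, commonly written P(Y = y) = λ^y / ((y!)^ν Z(λ, ν)), where Z is the normalizing sum over all counts. The parameter values are illustrative and not taken from the study, and truncating the support at `max_count` is an assumption that is adequate only for small means.

```python
import math

def com_poisson_pmf(lam, nu, max_count=200):
    """COM-Poisson probabilities for y = 0..max_count (truncated support)."""
    # Work in log space: log weight = y*log(lam) - nu*log(y!).
    log_w = [y * math.log(lam) - nu * math.lgamma(y + 1)
             for y in range(max_count + 1)]
    peak = max(log_w)
    weights = [math.exp(v - peak) for v in log_w]
    total = sum(weights)  # truncated normalizing constant, rescaled by the peak
    return [w / total for w in weights]

def mean_and_variance(pmf):
    """Mean and variance of a distribution given as a list of probabilities."""
    mean = sum(y * p for y, p in enumerate(pmf))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))
    return mean, var

# nu = 1 reduces to the Poisson; nu > 1 is under-dispersed, nu < 1 over-dispersed.
for nu in (0.5, 1.0, 2.0):
    m, v = mean_and_variance(com_poisson_pmf(3.0, nu))
    print(f"nu={nu}: mean={m:.3f}, variance={v:.3f}")
```

The negative binomial model, by contrast, has a variance that is never smaller than its mean, which is the limitation the abstract refers to for under-dispersed crash data.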
Development of tiger habitat suitability model using geospatial tools-a case study in Achankmar Wildlife Sanctuary (AMWLS), Chhattisgarh India.

PubMed

Singh, R; Joshi, P K; Kumar, M; Dash, P P; Joshi, B D

2009-08-01

Geospatial tools supported by an ancillary geo-database and extensive fieldwork on the distribution of the tiger and its prey in Achankmar Wildlife Sanctuary (AMWLS) were used to build a tiger habitat suitability model. This consists of a quantitative geographical information system (GIS) based approach using field parameters and spatial thematic information. The estimates of tiger sightings, prey sightings and predicted distribution, with the assistance of contextual environmental data including terrain, road network, settlement and drainage surfaces, were used to develop the model. Eight variables in the dataset, viz., forest cover type, forest cover density, slope, aspect, altitude, and distance from road, settlement and drainage, were seen as suitable proxies and were used as independent variables in the analysis. Principal component analysis and binomial multiple logistic regression were used for statistical treatment of the habitat parameters collected in the field and of the independent variables, respectively. The assessment showed strong agreement between the predicted and observed suitable areas. A combination of the generated information and published literature was also used while building a habitat suitability map for the tiger. The modeling approach has taken the habitat preference parameters of the tiger and the potential distribution of prey species into account. For assessing the potential distribution of prey species, independent suitability models were developed and validated with the ground truth. It is envisaged that inclusion of the prey distribution probability strengthens the model when a key species is under question. The results of the analysis indicate that tigers occur throughout the sanctuary. The results have been found to be an important input as baseline information for population modeling and natural resource management in the wildlife sanctuary. 
The development and application of similar models can help in better management of the protected areas of national interest.
Matching the Statistical Model to the Research Question for Dental Caries Indices with Many Zero Counts.

PubMed

Preisser, John S; Long, D Leann; Stamm, John W

2017-01-01

Marginalized zero-inflated count regression models have recently been introduced for the statistical analysis of dental caries indices and other zero-inflated count data as alternatives to traditional zero-inflated and hurdle models. Unlike the standard approaches, the marginalized models directly estimate overall exposure or treatment effects by relating covariates to the marginal mean count. This article discusses model interpretation and model class choice according to the research question being addressed in caries research. Two data sets, one consisting of fictional dmft counts in 2 groups and the other on DMFS among schoolchildren from a randomized clinical trial comparing 3 toothpaste formulations to prevent incident dental caries, are analyzed with negative binomial hurdle, zero-inflated negative binomial, and marginalized zero-inflated negative binomial models. In the first example, estimates of treatment effects vary according to the type of incidence rate ratio (IRR) estimated by the model. Estimates of IRRs in the analysis of the randomized clinical trial were similar despite their distinctive interpretations. The choice of statistical model class should match the study's purpose, while accounting for the broad decline in children's caries experience, such that dmft and DMFS indices more frequently generate zero counts. Marginalized (marginal mean) models for zero-inflated count data should be considered for direct assessment of exposure effects on the marginal mean dental caries count in the presence of high frequencies of zero counts. © 2017 S. Karger AG, Basel.
Matching the Statistical Model to the Research Question for Dental Caries Indices with Many Zero Counts

PubMed Central

Preisser, John S.; Long, D. Leann; Stamm, John W.

2017-01-01

Marginalized zero-inflated count regression models have recently been introduced for the statistical analysis of dental caries indices and other zero-inflated count data as alternatives to traditional zero-inflated and hurdle models. Unlike the standard approaches, the marginalized models directly estimate overall exposure or treatment effects by relating covariates to the marginal mean count. This article discusses model interpretation and model class choice according to the research question being addressed in caries research. Two datasets, one consisting of fictional dmft counts in two groups and the other on DMFS among schoolchildren from a randomized clinical trial (RCT) comparing three toothpaste formulations to prevent incident dental caries, are analysed with negative binomial hurdle (NBH), zero-inflated negative binomial (ZINB), and marginalized zero-inflated negative binomial (MZINB) models. In the first example, estimates of treatment effects vary according to the type of incidence rate ratio (IRR) estimated by the model. Estimates of IRRs in the analysis of the RCT were similar despite their distinctive interpretations. Choice of statistical model class should match the study’s purpose, while accounting for the broad decline in children’s caries experience, such that dmft and DMFS indices more frequently generate zero counts. Marginalized (marginal mean) models for zero-inflated count data should be considered for direct assessment of exposure effects on the marginal mean dental caries count in the presence of high frequencies of zero counts. PMID:28291962
Linnaean sources and concepts of orchids.

PubMed

Jarvis, Charlie; Cribb, Phillip

2009-08-01

Linnaeus developed a robust system for naming plants and a useful, if mechanical, system for classifying them. His binomial nomenclature proved the catalyst for the rapid development of our knowledge of orchids, with his work on the family dating back to 1737 in the first edition of his Genera Plantarum. His first work devoted to orchids, indeed the first monograph of the family, was published in 1740 and formed the basis for his account in Species Plantarum, published in 1753, in which he gave a binomial name to each species. Given the overwhelming number of orchids, he included surprisingly few - only 62 mostly European species - in Species Plantarum, his seminal work on the plants of the world. This reflects the European origin of modern botany and the concentration of extra-European exploration on other matters, such as conquest, gold and useful plants. Nevertheless, the scope of Linnaeus' work is broad, including plants from as far afield as India, Japan, China and the Philippines to the east, and eastern Canada, the West Indies and northern South America to the west. In his later publications he described and named a further 45 orchids, mostly from Europe, South Africa and the tropical Americas. The philosophical basis of Linnaeus' work on orchids is discussed and his contribution to our knowledge of the family assessed. His generic and species concepts are considered in the light of current systematic ideas, but his adoption of binomial nomenclature for all plants is his lasting legacy.
Development of binomial sequential sampling plans for forecasting Listronotus maculicollis (Coleoptera: Curculionidae) larvae based on the relationship to adult counts and turfgrass damage.

PubMed

McGraw, Benjamin A; Koppenhöfer, Albrecht M

2009-06-01

Binomial sequential sampling plans were developed to forecast larval damage by the weevil Listronotus maculicollis Kirby (Coleoptera: Curculionidae) to golf course turfgrass and to aid in the development of integrated pest management programs for the weevil. Populations of emerging overwintered adults were sampled over a 2-yr period to determine the relationship between adult counts, larval density, and turfgrass damage. Larval density and composition of preferred host plants (Poa annua L.) significantly affected the expression of turfgrass damage. Multiple regression indicates that damage may occur in moderately mixed P. annua stands with as few as 10 larvae per 0.09 m². However, > 150 larvae were required before damage became apparent in pure Agrostis stolonifera L. plots. Adult counts during peaks in emergence as well as cumulative counts across the emergence period were significantly correlated with future densities of larvae. Eight binomial sequential sampling plans based on two tally thresholds for classifying infestation (T = 1 and 2 adults) and four adult density thresholds (0.5, 0.85, 1.15, and 1.35 per 3.34 m²) were developed to forecast the likelihood of turfgrass damage by using adult counts during peak emergence. The Resampling for Validation of Sample Plans software was used to validate sampling plans with field-collected data sets. All sampling plans were found to deliver accurate classifications (correct decisions were made between 84.4 and 96.8% of the time) in a practical timeframe (average sampling cost < 22.7 min).
Time-dependent summary receiver operating characteristics for meta-analysis of prognostic studies.

PubMed

Hattori, Satoshi; Zhou, Xiao-Hua

2016-11-20

Prognostic studies are widely conducted to examine whether biomarkers are associated with patients' prognoses and play important roles in medical decisions. Because findings from one prognostic study may be very limited, meta-analyses may be useful to obtain sound evidence. However, prognostic studies are often analyzed by relying on a study-specific cut-off value, which can lead to difficulty in applying the standard meta-analysis techniques. In this paper, we propose two methods to estimate a time-dependent version of the summary receiver operating characteristics curve for meta-analyses of prognostic studies with a right-censored time-to-event outcome. We introduce a bivariate normal model for the pair of time-dependent sensitivity and specificity and propose a method to form inferences based on summary statistics reported in published papers. This method provides a valid inference asymptotically. In addition, we consider a bivariate binomial model. To draw inferences from this bivariate binomial model, we introduce a multiple imputation method. The multiple imputation is found to be approximately proper multiple imputation, and thus the standard Rubin's variance formula is justified from a Bayesian viewpoint. Our simulation study and application to a real dataset revealed that both methods work well with a moderate or large number of studies and that the bivariate binomial model coupled with the multiple imputation outperforms the bivariate normal model with a small number of studies. Copyright © 2016 John Wiley & Sons, Ltd.
Linnaean sources and concepts of orchids

PubMed Central

Jarvis, Charlie; Cribb, Phillip

2009-01-01

Background: Linnaeus developed a robust system for naming plants and a useful, if mechanical, system for classifying them. His binomial nomenclature proved the catalyst for the rapid development of our knowledge of orchids, with his work on the family dating back to 1737 in the first edition of his Genera Plantarum. His first work devoted to orchids, indeed the first monograph of the family, was published in 1740 and formed the basis for his account in Species Plantarum, published in 1753, in which he gave a binomial name to each species. Given the overwhelming number of orchids, he included surprisingly few – only 62 mostly European species – in Species Plantarum, his seminal work on the plants of the world. This reflects the European origin of modern botany and the concentration of extra-European exploration on other matters, such as conquest, gold and useful plants. Nevertheless, the scope of Linnaeus' work is broad, including plants from as far afield as India, Japan, China and the Philippines to the east, and eastern Canada, the West Indies and northern South America to the west. In his later publications he described and named a further 45 orchids, mostly from Europe, South Africa and the tropical Americas. Scope: The philosophical basis of Linnaeus' work on orchids is discussed and his contribution to our knowledge of the family assessed. His generic and species concepts are considered in the light of current systematic ideas, but his adoption of binomial nomenclature for all plants is his lasting legacy. PMID:19182221
The Gumbel hypothesis test for left censored observations using regional earthquake records as an example

NASA Astrophysics Data System (ADS)

Thompson, E. M.; Hewlett, J. B.; Baise, L. G.; Vogel, R. M.

2011-01-01

Annual maximum (AM) time series are incomplete (i.e., censored) when no events are included above the assumed censoring threshold (i.e., magnitude of completeness). We introduce a distributional hypothesis test for left-censored Gumbel observations based on the probability plot correlation coefficient (PPCC). Critical values of the PPCC hypothesis test statistic are computed from Monte-Carlo simulations and are a function of sample size, censoring level, and significance level. When applied to a global catalog of earthquake observations, the left-censored Gumbel PPCC tests are unable to reject the Gumbel hypothesis for 45 of 46 seismic regions. We apply four different field significance tests for combining individual tests into a collective hypothesis test. None of the field significance tests are able to reject the global hypothesis that AM earthquake magnitudes arise from a Gumbel distribution. Because the field significance levels are not conclusive, we also compute the likelihood that these field significance tests are unable to reject the Gumbel model when the samples arise from a more complex distributional alternative. A power study documents that the censored Gumbel PPCC test is unable to reject some important and viable Generalized Extreme Value (GEV) alternatives. Thus, we cannot rule out the possibility that the global AM earthquake time series could arise from a GEV distribution with a finite upper bound, also known as a reverse Weibull distribution. Our power study also indicates that the binomial and uniform field significance tests are substantially more powerful than the more commonly used Bonferroni and false discovery rate multiple comparison procedures.
Determinants of the geographic distribution of Puumala virus and Lyme borreliosis infections in Belgium.

PubMed

Linard, Catherine; Lamarque, Pénélope; Heyman, Paul; Ducoffre, Geneviève; Luyasu, Victor; Tersago, Katrien; Vanwambeke, Sophie O; Lambin, Eric F

2007-05-02

Vector-borne and zoonotic diseases generally display clear spatial patterns due to different space-dependent factors. Land cover and land use influence disease transmission by controlling both the spatial distribution of vectors or hosts, and the probability of contact with susceptible human populations. The objective of this study was to combine environmental and socio-economic factors to explain the spatial distribution of two emerging human diseases in Belgium, Puumala virus (PUUV) and Lyme borreliosis. Municipalities were taken as units of analysis. Negative binomial regressions including a correction for spatial endogeneity show that the spatial distribution of PUUV and Lyme borreliosis infections are associated with a combination of factors linked to the vector and host populations, to human behaviours, and to landscape attributes. Both diseases are associated with the presence of forests, which are the preferred habitat for vector or host populations. The PUUV infection risk is higher in remote forest areas, where the level of urbanisation is low, and among low-income populations. The Lyme borreliosis transmission risk is higher in mixed landscapes with forests and spatially dispersed houses, mostly in wealthy peri-urban areas. The spatial dependence resulting from a combination of endogenous and exogenous processes could be accounted for in the model on PUUV but not for Lyme borreliosis. A large part of the spatial variation in disease risk can be explained by environmental and socio-economic factors. The two diseases not only are most prevalent in different regions but also affect different groups of people. Combining these two criteria may increase the efficiency of information campaigns through appropriate targeting.
Effects of dynamical grouping on cooperation in N-person evolutionary snowdrift game

NASA Astrophysics Data System (ADS)

Ji, M.; Xu, C.; Hui, P. M.

2011-09-01

A population typically consists of agents that continually distribute themselves into different groups at different times. This dynamic grouping has recently been shown to be essential in explaining many features observed in human activities including social, economic, and military activities. We study the effects of dynamic grouping on the level of cooperation in a modified evolutionary N-person snowdrift game. Due to the formation of dynamical groups, the competition takes place in groups of different sizes at different times and players of different strategies are mixed by the grouping dynamics. It is found that the level of cooperation is greatly enhanced by the dynamic grouping of agents, when compared with a static population of the same size. As a parameter β, which characterizes the relative importance of the reward and cost, increases, the fraction of cooperative players fC increases and it is possible to achieve a fully cooperative state. Analytically, we present a dynamical equation that incorporates the effects of the competing game and group size distribution. The distribution of cooperators in different groups is assumed to be a binomial distribution, which is confirmed by simulations. Results from the analytic equation are in good agreement with numerical results from simulations. We also present detailed simulation results of fC over the parameter space spanned by the probabilities of group coalescence νm and group fragmentation νp in the grouping dynamics. A high νm and low νp promotes cooperation, and a favorable reward characterized by a high β would lead to a fully cooperative state.

Effects of Cognition, Function, and Behavioral and Psychological Symptoms on Medicare Expenditures and Health Care Utilization for Persons With Dementia.

PubMed

Jutkowitz, Eric; Kane, Robert L; Dowd, Bryan; Gaugler, Joseph E; MacLehose, Richard F; Kuntz, Karen M

2017-06-01

Clinical features of dementia (cognition, function, and behavioral/psychological symptoms [BPSD]) may differentially affect Medicare expenditures/health care utilization. We linked cross-sectional data from the Aging, Demographics, and Memory Study to Medicare data to evaluate the association between dementia clinical features among those with dementia and Medicare expenditures/health care utilization (n = 234). Cognition was evaluated using the Mini-Mental State Examination (MMSE). Function was evaluated as the number of functional limitations (0-10). BPSD was evaluated as the number of symptoms (0-12). Expenditures were estimated with a generalized linear model (log-link and gamma distribution). Number of hospitalizations, institutional outpatient visits, and physician visits were estimated with a negative binomial regression. Medicare covered skilled nursing days were estimated with a zero-inflated negative binomial model. Cognition and BPSD were not associated with expenditures. Among individuals with less than seven functional limitations, one additional limitation was associated with $123 (95% confidence interval: $19-$227) additional monthly Medicare spending. Better cognition and poorer function were associated with more hospitalizations among those with an MMSE less than three and less than six functional limitations, respectively. BPSD had no effect on hospitalizations. Poorer function and fewer BPSD were associated with more skilled nursing among individuals with one to seven functional limitations and more than four symptoms, respectively. Cognition had no effect on skilled nursing care. No clinical feature was associated with institutional outpatient care. Of individuals with an MMSE less than 15, poorer cognition was associated with fewer physician visits. Among those with more than six functional limitations, poorer function was associated with fewer physician visits. 
Poorer function, not cognition or BPSD, was associated with higher Medicare expenditures. © The Author 2017. Published by Oxford University Press on behalf of The Gerontological Society of America. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
A recurrent, multistate outbreak of salmonella serotype agona infections associated with dry, unsweetened cereal consumption, United States, 2008.

PubMed

Russo, Elizabeth T; Biggerstaff, Gwen; Hoekstra, R Michael; Meyer, Stephanie; Patel, Nehal; Miller, Benjamin; Quick, Rob

2013-02-01

An outbreak of Salmonella enterica serotype Agona infections associated with nationwide distribution of cereal from Company X was identified in April 2008. This outbreak was detected using PulseNet, the national molecular subtyping network for foodborne disease surveillance, which coincided with Company X's voluntary recall of unsweetened puffed rice and wheat cereals after routine product sampling yielded Salmonella Agona. A case patient was defined as being infected with the outbreak strain of Salmonella Agona, with illness onset from 1 January through 1 July 2008. Case patients were interviewed using a standard questionnaire, and the proportion of ill persons who reported eating Company X puffed rice cereal was compared with Company X's market share data using binomial testing. The Minnesota Department of Agriculture inspected the cereal production facility and collected both product and environmental swab samples. Routine surveillance identified 33 case patients in 17 states. Of 32 patients interviewed, 24 (83%) reported eating Company X puffed rice cereal. Company X puffed rice cereal represented 0.063% of the total ready-to-eat dry cereal market share in the United States at the time of the investigation. Binomial testing suggested that the proportion of exposed case patients would not likely occur by chance (P < 0.0001). Of 17 cereal samples collected from case patient homes for laboratory testing, 2 (12%) yielded Salmonella Agona indistinguishable from the outbreak strain. Twelve environmental swabs and nine product samples from the cereal plant yielded the outbreak strain of Salmonella Agona. Company X cereal was implicated in a similar outbreak of Salmonella Agona infection in 1998 with the same outbreak strain linked to the same production facility. 
We hypothesize that a recent construction project at this facility created an open wall near the cereal production area allowing reintroduction of Salmonella Agona into the product, highlighting the resilience of Salmonella in dry food production environments.
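The binomial comparison described in this abstract can be illustrated with a short sketch. It assumes a one-sided exact test in which the market share (0.063%) plays the role of the null probability that an interviewed patient ate the product; the abstract does not say which variant of the test the investigators ran, and the function name is hypothetical.

```python
import math

def binomial_upper_tail(k, n, p):
    """Exact P(X >= k) for X ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# Figures quoted in the abstract: 24 of 32 interviewed patients reported
# exposure; the null probability is the 0.063% market share (assumption).
p_value = binomial_upper_tail(24, 32, 0.00063)
print(p_value < 1e-4)
```

Under these assumptions the tail probability is far below the reported threshold of 0.0001, which is consistent with the abstract's conclusion.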
CUMPOIS- CUMULATIVE POISSON DISTRIBUTION PROGRAM

NASA Technical Reports Server (NTRS)

Bowerman, P. N.

1994-01-01

The Cumulative Poisson distribution program, CUMPOIS, is one of two programs which make calculations involving cumulative Poisson distributions. Both programs, CUMPOIS (NPO-17714) and NEWTPOIS (NPO-17715), can be used independently of one another. CUMPOIS determines the approximate cumulative binomial distribution, evaluates the cumulative distribution function (cdf) for gamma distributions with integer shape parameters, and evaluates the cdf for chi-square distributions with even degrees of freedom. It can be used by statisticians and others concerned with probabilities of independent events occurring over specific units of time, area, or volume. CUMPOIS calculates the probability that n or fewer events (i.e., cumulative) will occur within any unit when the expected number of events is given as lambda. Normally, this probability is calculated by a direct summation, from i=0 to n, of terms involving the exponential function, lambda, and inverse factorials. This approach, however, eventually fails due to underflow for sufficiently large values of n. Additionally, when the exponential term is moved outside of the summation for simplification purposes, there is a risk that the terms remaining within the summation, and the summation itself, will overflow for certain values of i and lambda. CUMPOIS eliminates these possibilities by multiplying an additional exponential factor into the summation terms and the partial sum whenever overflow/underflow situations threaten. The reciprocal of this term is then multiplied into the completed sum giving the cumulative probability. The CUMPOIS program is written in C. It was developed on an IBM AT with a numeric co-processor using Microsoft C 5.0. Because the source code is written using standard C structures and functions, it should compile correctly on most C compilers. The program format is interactive, accepting lambda and n as inputs. It has been implemented under DOS 3.2 and has a memory requirement of 26K. 
CUMPOIS was developed in 1988.
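The rescaling idea described above can be sketched in a few lines. This is not the CUMPOIS source (which is written in C); it is a minimal Python illustration, and the function name and the 1e200 rescaling threshold are assumptions chosen for the example.

```python
import math

def cumulative_poisson(n, lam):
    """P(X <= n) for X ~ Poisson(lam), using a rescaled running sum."""
    if lam == 0:
        return 1.0
    term = 1.0        # lam**i / i!, divided by the scale removed so far
    total = 1.0       # partial sum of the rescaled terms
    log_scale = 0.0   # log of the factor divided out of term and total
    for i in range(1, n + 1):
        term *= lam / i
        total += term
        if total > 1e200:            # overflow threatens: rescale both
            term /= 1e200
            total /= 1e200
            log_scale += math.log(1e200)
    # Undo the rescaling and apply exp(-lam) together, in log space.
    return min(1.0, math.exp(math.log(total) + log_scale - lam))

print(cumulative_poisson(2, 1.0))       # small case, no rescaling needed
print(cumulative_poisson(1000, 1000.0)) # direct summation would overflow
```

Keeping exp(-lam) out of the loop avoids underflow of the individual terms, and dividing the term and the partial sum by a common factor avoids overflow; the factor is restored only once, at the end.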
Mapping potential Anopheles gambiae s.l. larval distribution using remotely sensed climatic and environmental variables in Baringo, Kenya.

PubMed

Amadi, J A; Ong'amo, G O; Olago, D O; Oriaso, S O; Nyamongo, I K; Estambale, B B A

2018-06-21

Anopheles gambiae s.l. (Diptera: Culicidae) is responsible for the transmission of the devastating Plasmodium falciparum (Haemosporida: Plasmodiidae) strain of malaria in Africa. This study investigated the relationship between climate and environmental conditions and An. gambiae s.l. larvae abundance and modelled the larval distribution of this species in Baringo County, Kenya. Mosquito larvae were collected using a 350-mL dipper and a pipette once per month from December 2015 to December 2016. A random forest algorithm was used to generate vegetation cover classes. A negative binomial regression was used to model the association between remotely sensed climate (rainfall and temperature) and environmental (vegetation cover, vegetation health, topographic wetness and slope) factors and An. gambiae s.l. for December 2015. Anopheles gambiae s.l. was significantly more frequent in the riverine zone (P < 0.05, r = 0.59) compared with the lowland zone. Rainfall (b = 6.22, P < 0.001), slope (b = - 4.81, P = 0.012) and vegetation health (b = - 5.60, P = 0.038) significantly influenced the distribution of An. gambiae s.l. larvae. High An. gambiae s.l. abundance was associated with cropland and wetland environments. Effective malaria control will require zone-specific interventions such as a focused dry season vector control strategy in the riverine zone. © 2018 The Royal Entomological Society.
Decision-theoretic designs for a series of trials with correlated treatment effects using the Sarmanov multivariate beta-binomial distribution.

PubMed

Hee, Siew Wan; Parsons, Nicholas; Stallard, Nigel

2018-03-01

The motivation for the work in this article is the setting in which a number of treatments are available for evaluation in phase II clinical trials and where it may be infeasible to try them concurrently because the intended population is small. This paper introduces an extension of previous work on decision-theoretic designs for a series of phase II trials. The program encompasses a series of sequential phase II trials with interim decision making and a single two-arm phase III trial. The design is based on a hybrid approach where the final analysis of the phase III data is based on a classical frequentist hypothesis test, whereas the trials are designed using a Bayesian decision-theoretic approach in which the unknown treatment effect is assumed to follow a known prior distribution. In addition, as treatments are intended for the same population it is not unrealistic to consider treatment effects to be correlated. Thus, the prior distribution will reflect this. Data from a randomized trial of severe arthritis of the hip are used to test the application of the design. We show that the design on average requires fewer patients in phase II than when the correlation is ignored. Correspondingly, the time required to recommend an efficacious treatment for phase III is quicker. © 2017 The Author. Biometrical Journal published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
An In-Depth Analysis of the Chung-Lu Model

DOE Office of Scientific and Technical Information (OSTI.GOV)

Winlaw, M.; DeSterck, H.; Sanders, G.

2015-10-28

In the classic Erdős-Rényi random graph model [5] each edge is chosen with uniform probability and the degree distribution is binomial, limiting the number of graphs that can be modeled using the Erdős-Rényi framework [10]. The Chung-Lu model [1, 2, 3] is an extension of the Erdős-Rényi model that allows for more general degree distributions. The probability of each edge is no longer uniform and is a function of a user-supplied degree sequence, which by design is the expected degree sequence of the model. This property makes it an easy model to work with theoretically and since the Chung-Lu model is a special case of a random graph model with a given degree sequence, many of its properties are well known and have been studied extensively [2, 3, 13, 8, 9]. It is also an attractive null model for many real-world networks, particularly those with power-law degree distributions and it is sometimes used as a benchmark for comparison with other graph generators despite some of its limitations [12, 11]. We know for example, that the average clustering coefficient is too low relative to most real world networks. As well, measures of affinity are also too low relative to most real-world networks of interest. However, despite these limitations or perhaps because of them, the Chung-Lu model provides a basis for comparing new graph models.
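A minimal sketch of the edge rule may help here. It assumes the usual textbook form of the Chung-Lu model, in which vertices i and j are joined with probability w_i * w_j / sum(w), self-loops are allowed, and probabilities are clipped at 1; the function names and the weight sequence are illustrative and do not come from the report.

```python
import random

def chung_lu_probabilities(weights):
    """Edge probability matrix p[i][j] = min(1, w_i * w_j / sum(w))."""
    total = sum(weights)
    return [[min(1.0, wi * wj / total) for wj in weights] for wi in weights]

def sample_chung_lu(weights, seed=0):
    """Draw one graph; each pair (i <= j) is included independently."""
    rng = random.Random(seed)
    probs = chung_lu_probabilities(weights)
    n = len(weights)
    return [(i, j) for i in range(n) for j in range(i, n)
            if rng.random() < probs[i][j]]

weights = [3.0, 2.0, 2.0, 1.0, 1.0, 1.0]   # hypothetical expected degrees
probs = chung_lu_probabilities(weights)
row_sums = [sum(row) for row in probs]      # matches weights when unclipped
edges = sample_chung_lu(weights)
print(row_sums)
print(edges)
```

When no probability is clipped, each row of the matrix sums to the supplied weight, which is the sense in which the input sequence is the expected degree sequence.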
RnaSeqSampleSize: real data based sample size estimation for RNA sequencing.

PubMed

Zhao, Shilin; Li, Chung-I; Guo, Yan; Sheng, Quanhu; Shyr, Yu

2018-05-30

One of the most important and often neglected components of a successful RNA sequencing (RNA-Seq) experiment is sample size estimation. A few negative binomial model-based methods have been developed to estimate sample size based on the parameters of a single gene. However, thousands of genes are quantified and tested for differential expression simultaneously in RNA-Seq experiments. Thus, additional issues should be carefully addressed, including the false discovery rate for multiple statistic tests, widely distributed read counts and dispersions for different genes. To solve these issues, we developed a sample size and power estimation method named RnaSeqSampleSize, based on the distributions of gene average read counts and dispersions estimated from real RNA-seq data. Datasets from previous, similar experiments such as the Cancer Genome Atlas (TCGA) can be used as a point of reference. Read counts and their dispersions were estimated from the reference's distribution; using that information, we estimated and summarized the power and sample size. RnaSeqSampleSize is implemented in R language and can be installed from Bioconductor website. A user friendly web graphic interface is provided at http://cqs.mc.vanderbilt.edu/shiny/RnaSeqSampleSize/ . RnaSeqSampleSize provides a convenient and powerful way for power and sample size estimation for an RNAseq experiment. It is also equipped with several unique features, including estimation for interested genes or pathway, power curve visualization, and parameter optimization.
Limitations of Poisson statistics in describing radioactive decay.

PubMed

Sitek, Arkadiusz; Celler, Anna M

2015-12-01

The assumption that nuclear decays are governed by Poisson statistics is an approximation. This approximation becomes unjustified when data acquisition times longer than or even comparable with the half-lives of the radioisotope in the sample are considered. In this work, the limits of the Poisson-statistics approximation are investigated. The formalism for the statistics of radioactive decay based on binomial distribution is derived. The theoretical factor describing the deviation of variance of the number of decays predicted by the Poisson distribution from the true variance is defined and investigated for several commonly used radiotracers such as (18)F, (15)O, (82)Rb, (13)N, (99m)Tc, (123)I, and (201)Tl. The variance of the number of decays estimated using the Poisson distribution is significantly different than the true variance for a 5-minute observation time of (11)C, (15)O, (13)N, and (82)Rb. Durations of nuclear medicine studies often are relatively long; they may be even a few times longer than the half-lives of some short-lived radiotracers. Our study shows that in such situations the Poisson statistics is unsuitable and should not be applied to describe the statistics of the number of decays in radioactive samples. However, the above statement does not directly apply to counting statistics at the level of event detection. Low sensitivities of detectors which are used in imaging studies make the Poisson approximation near perfect. Copyright © 2015 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
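One way to see the effect is to compare the two variances directly. The sketch below assumes N independent nuclei, each decaying within the observation time T with probability p = 1 - 2^(-T/half-life), so the number of decays is Binomial(N, p). The ratio it computes is a plain consequence of that assumption and may not be the exact factor defined in the paper; the half-life and sample size are illustrative values, not data for any particular radiotracer.

```python
def decay_variances(n_nuclei, half_life, duration):
    """Return (Poisson variance, binomial variance) of the decay count."""
    p = 1.0 - 2.0 ** (-duration / half_life)   # per-nucleus decay probability
    mean = n_nuclei * p
    poisson_var = mean                          # Poisson: variance = mean
    binomial_var = mean * (1.0 - p)             # binomial: N p (1 - p)
    return poisson_var, binomial_var

# Hypothetical tracer: 2-minute half-life observed for 5 minutes.
poisson_var, binomial_var = decay_variances(1e6, half_life=2.0, duration=5.0)
ratio = poisson_var / binomial_var              # equals 2 ** (T / half_life)
print(ratio)
```

The ratio grows as 2^(T/half-life), so it stays close to 1 only when the observation time is short compared with the half-life.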
Access to recreational physical activities by car and bus: an assessment of socio-spatial inequalities in mainland Scotland.

PubMed

Ferguson, Neil S; Lamb, Karen E; Wang, Yang; Ogilvie, David; Ellaway, Anne

2013-01-01

Obesity and other chronic conditions linked with low levels of physical activity (PA) are associated with deprivation. One reason for this could be that it is more difficult for low-income groups to access recreational PA facilities such as swimming pools and sports centres than high-income groups. In this paper, we explore the distribution of access to PA facilities by car and bus across mainland Scotland by income deprivation at datazone level. GIS car and bus networks were created to determine the number of PA facilities accessible within travel times of 10, 20 and 30 minutes. Multilevel negative binomial regression models were then used to investigate the distribution of the number of accessible facilities, adjusting for datazone population size and local authority. Access to PA facilities by car was significantly (p<0.01) higher for the most affluent quintile of area-based income deprivation than for most other quintiles in small towns and all other quintiles in rural areas. Accessibility by bus was significantly lower for the most affluent quintile than for other quintiles in urban areas and small towns, but not in rural areas. Overall, we found that the most disadvantaged groups were those without access to a car and living in the most affluent areas or in rural areas.
Sampling Error in Relation to Cyst Nematode Population Density Estimation in Small Field Plots.

PubMed

Župunski, Vesna; Jevtić, Radivoje; Jokić, Vesna Spasić; Župunski, Ljubica; Lalošević, Mirjana; Ćirić, Mihajlo; Ćurčić, Živko

2017-06-01

Cyst nematodes are serious plant-parasitic pests which could cause severe yield losses and extensive damage. Since there is still very little information about error of population density estimation in small field plots, this study contributes to the broad issue of population density assessment. It was shown that there was no significant difference between cyst counts of five or seven bulk samples taken per each 1-m² plot, if average cyst count per examined plot exceeds 75 cysts per 100 g of soil. Goodness of fit of data to probability distribution tested with χ² test confirmed a negative binomial distribution of cyst counts for 21 out of 23 plots. The recommended measure of sampling precision of 17% expressed through coefficient of variation (cv) was achieved if the plots of 1 m² contaminated with more than 90 cysts per 100 g of soil were sampled with 10-core bulk samples taken in five repetitions. If plots were contaminated with less than 75 cysts per 100 g of soil, 10-core bulk samples taken in seven repetitions gave cv higher than 23%. This study indicates that more attention should be paid to estimation of sampling error in experimental field plots to ensure more reliable estimation of population density of cyst nematodes.
Defect formation in LaGa(Mg,Ni)O3-δ : A statistical thermodynamic analysis validated by mixed conductivity and magnetic susceptibility measurements

NASA Astrophysics Data System (ADS)

Naumovich, E. N.; Kharton, V. V.; Yaremchenko, A. A.; Patrakeev, M. V.; Kellerman, D. G.; Logvinovich, D. I.; Kozhevnikov, V. L.

2006-08-01

A statistical thermodynamic approach to analyze defect thermodynamics in strongly nonideal solid solutions was proposed and validated by a case study focused on the oxygen intercalation processes in mixed-conducting LaGa0.65Mg0.15Ni0.20O3-δ perovskite. The oxygen nonstoichiometry of Ni-doped lanthanum gallate, measured by coulometric titration and thermogravimetric analysis at 923-1223 K in the oxygen partial pressure range 5×10⁻⁵ to 0.9 atm, indicates the coexistence of Ni2+, Ni3+, and Ni4+ oxidation states. The formation of tetravalent nickel was also confirmed by the magnetic susceptibility data at 77-600 K, and by the analysis of p-type electronic conductivity and Seebeck coefficient as function of the oxygen pressure at 1023-1223 K. The oxygen thermodynamics and the partial ionic and hole conductivities are strongly affected by the point-defect interactions, primarily the Coulombic repulsion between oxygen vacancies and/or electron holes and the vacancy association with Mg2+ cations. These factors can be analyzed by introducing the defect interaction energy in the concentration-dependent part of defect chemical potentials expressed by the discrete Fermi-Dirac distribution, and taking into account the probabilities of local configurations calculated via binomial distributions.
Distribution of physical activity facilities in Scotland by small area measures of deprivation and urbanicity

PubMed Central

2010-01-01

Background The aim of this study was to examine the distribution of physical activity facilities by area-level deprivation in Scotland, adjusting for differences in urbanicity, and exploring differences between and within the four largest Scottish cities. Methods We obtained a list of all recreational physical activity facilities in Scotland. These were mapped and assigned to datazones. Poisson and negative binomial regression models were used to investigate associations between the number of physical activity facilities relative to population size and quintile of area-level deprivation. Results The results showed that prior to adjustment for urbanicity, the density of all facilities lessened with increasing deprivation from quintiles 2 to 5. After adjustment for urbanicity and local authority, the effect of deprivation remained significant but the pattern altered, with datazones in quintile 3 having the highest estimated mean density of facilities. Within-city associations were identified between the number of physical activity facilities and area-level deprivation in Aberdeen and Dundee, but not in Edinburgh or Glasgow. Conclusions In conclusion, area-level deprivation appears to have a significant association with the density of physical activity facilities and although overall no clear pattern was observed, affluent areas had fewer publicly owned facilities than more deprived areas but a greater number of privately owned facilities. PMID:20955548
MSeq-CNV: accurate detection of Copy Number Variation from Sequencing of Multiple samples.

PubMed

Malekpour, Seyed Amir; Pezeshk, Hamid; Sadeghi, Mehdi

2018-03-05

Currently a few tools are capable of detecting genome-wide Copy Number Variations (CNVs) based on sequencing of multiple samples. Although aberrations in mate pair insertion sizes provide additional hints for the CNV detection based on multiple samples, the majority of the current tools rely only on the depth of coverage. Here, we propose a new algorithm (MSeq-CNV) which allows detecting common CNVs across multiple samples. MSeq-CNV applies a mixture density for modeling aberrations in depth of coverage and abnormalities in the mate pair insertion sizes. Each component in this mixture density applies a Binomial distribution for modeling the number of mate pairs with aberration in the insertion size and also a Poisson distribution for emitting the read counts, in each genomic position. MSeq-CNV is applied on simulated data and also on real data of six HapMap individuals with high-coverage sequencing, in 1000 Genomes Project. These individuals include a CEU trio of European ancestry and a YRI trio of Nigerian ethnicity. Ancestry of these individuals is studied by clustering the identified CNVs. MSeq-CNV is also applied for detecting CNVs in two samples with low-coverage sequencing in 1000 Genomes Project and six samples from the Simons Genome Diversity Project.
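The mixture structure described above can be sketched for a single genomic position. This is not the MSeq-CNV implementation: the component weights, Binomial success probabilities, Poisson means, state labels, and function names below are all hypothetical, and the sketch only shows how a Binomial term (aberrant mate pairs) and a Poisson term (read count) combine into per-component posterior weights.

```python
import math

def log_binomial_pmf(k, n, q):
    """log P(K = k) for K ~ Binomial(n, q), with 0 < q < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(q) + (n - k) * math.log1p(-q))

def log_poisson_pmf(r, mu):
    """log P(R = r) for R ~ Poisson(mu), with mu > 0."""
    return r * math.log(mu) - mu - math.lgamma(r + 1)

def responsibilities(k, n, r, components):
    """Posterior weight of each (weight, q, mu) component for one position."""
    logs = [math.log(w) + log_binomial_pmf(k, n, q) + log_poisson_pmf(r, mu)
            for w, q, mu in components]
    top = max(logs)                              # subtract max for stability
    scaled = [math.exp(v - top) for v in logs]
    norm = sum(scaled)
    return [s / norm for s in scaled]

# Hypothetical components: (mixture weight, aberrant-pair probability, mean depth)
components = [(0.90, 0.02, 30.0),   # "normal" state
              (0.05, 0.30, 15.0),   # "deletion" state
              (0.05, 0.30, 45.0)]   # "duplication" state
# Observed: 6 of 20 mate pairs aberrant, 14 reads covering the position.
post = responsibilities(k=6, n=20, r=14, components=components)
print(post)
```

With these made-up numbers the low read count and the excess of aberrant mate pairs together favor the second component, which illustrates why the two signals are more informative jointly than depth of coverage alone.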
A Bayesian Framework for Reliability Analysis of Spacecraft Deployments

NASA Technical Reports Server (NTRS)

Evans, John W.; Gallo, Luis; Kaminsky, Mark

2012-01-01

Deployable subsystems are essential to mission success of most spacecraft. These subsystems enable critical functions including power, communications and thermal control. The loss of any of these functions will generally result in loss of the mission. These subsystems and their components often consist of unique designs and applications for which various standardized data sources are not applicable for estimating reliability and for assessing risks. In this study, a two stage sequential Bayesian framework for reliability estimation of spacecraft deployment was developed for this purpose. This process was then applied to the James Webb Space Telescope (JWST) Sunshield subsystem, a unique design intended for thermal control of the Optical Telescope Element. Initially, detailed studies of NASA deployment history, "heritage information", were conducted, extending over 45 years of spacecraft launches. This information was then coupled to a non-informative prior and a binomial likelihood function to create a posterior distribution for deployments of various subsystems using Monte Carlo Markov Chain sampling. Select distributions were then coupled to a subsequent analysis, using test data and anomaly occurrences on successive ground test deployments of scale model test articles of JWST hardware, to update the NASA heritage data. This allowed for a realistic prediction for the reliability of the complex Sunshield deployment, with credibility limits, within this two stage Bayesian framework.
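The two-stage update can be illustrated with a conjugate shortcut. The study itself reports Monte Carlo Markov Chain sampling; the sketch below swaps in the closed-form Beta-binomial update instead, assumes a Jeffreys Beta(0.5, 0.5) prior as one common non-informative choice, and uses invented counts rather than any NASA heritage or JWST test data.

```python
def beta_binomial_update(successes, trials, prior_a=0.5, prior_b=0.5):
    """Posterior Beta(a, b) parameters and mean after binomial data."""
    a = prior_a + successes
    b = prior_b + (trials - successes)
    return a, b, a / (a + b)

# Stage 1: hypothetical heritage record, 48 successful deployments in 50.
a1, b1, mean1 = beta_binomial_update(successes=48, trials=50)

# Stage 2: the stage-1 posterior becomes the prior for hypothetical
# ground-test results, 9 successes in 10 deployments.
a2, b2, mean2 = beta_binomial_update(9, 10, prior_a=a1, prior_b=b1)
print(mean1, mean2)
```

Chaining the calls mirrors the sequential structure described in the abstract: the heritage posterior is carried forward and revised by the test data.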
Control charts for monitoring accumulating adverse event count frequencies from single and multiple blinded trials.

PubMed

Gould, A Lawrence

2016-12-30

Conventional practice monitors accumulating information about drug safety in terms of the numbers of adverse events reported from trials in a drug development program. Estimates of between-treatment adverse event risk differences can be obtained readily from unblinded trials with adjustment for differences among trials using conventional statistical methods. Recent regulatory guidelines require monitoring the cumulative frequency of adverse event reports to identify possible between-treatment adverse event risk differences without unblinding ongoing trials. Conventional statistical methods for assessing between-treatment adverse event risks cannot be applied when the trials are blinded. However, CUSUM charts can be used to monitor the accumulation of adverse event occurrences. CUSUM charts for monitoring adverse event occurrence in a Bayesian paradigm are based on assumptions about the process generating the adverse event counts in a trial as expressed by informative prior distributions. This article describes the construction of control charts for monitoring adverse event occurrence based on statistical models for the processes, characterizes their statistical properties, and describes how to construct useful prior distributions. Application of the approach to two adverse events of interest in a real trial gave nearly identical results for binomial and Poisson observed event count likelihoods. Copyright © 2016 John Wiley & Sons, Ltd.
Covering Resilience: A Recent Development for Binomial Checkpointing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Walther, Andrea; Narayanan, Sri Hari Krishna

In terms of computing time, adjoint methods offer a very attractive alternative to compute gradient information, required, e.g., for optimization purposes. However, together with this very favorable temporal complexity result comes a memory requirement that is in essence proportional to the operation count of the underlying function, e.g., if algorithmic differentiation is used to provide the adjoints. For this reason, checkpointing approaches in many variants have become popular. This paper analyzes an extension of the so-called binomial approach to cover also possible failures of the computing systems. Such a measure of precaution is of special interest for massively parallel simulations and adjoint calculations where the mean time between failures of the large-scale computing system is smaller than the time needed to complete the calculation of the adjoint information. We describe the extensions of standard checkpointing approaches required for such resilience, provide a corresponding implementation and discuss first numerical results.
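The "binomial" in binomial checkpointing refers to a classical result for the failure-free case that is usually cited for this approach: with s checkpoints and at most r repeated forward evaluations of any step, up to C(s + r, s) time steps can be reversed. The sketch below evaluates only that bound. It does not model the resilience extension analyzed in the paper, and the function names are hypothetical.

```python
from math import comb


def max_steps(checkpoints, repetitions):
    """Largest number of steps reversible with the given resources."""
    return comb(checkpoints + repetitions, checkpoints)


def min_repetitions(steps, checkpoints):
    """Smallest repetition count r such that C(checkpoints + r, checkpoints) >= steps."""
    if checkpoints < 1:
        raise ValueError("at least one checkpoint is required")
    r = 0
    while max_steps(checkpoints, r) < steps:
        r += 1
    return r


print(max_steps(3, 2))         # 10
print(min_repetitions(10, 3))  # 2
print(min_repetitions(11, 3))  # 3
```

The bound grows quickly in both arguments, which is why a modest number of checkpoints can cover very long forward simulations at the price of a few recomputations per step.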
Gamma Oscillations of Spiking Neural Populations Enhance Signal Discrimination

PubMed Central

Masuda, Naoki; Doiron, Brent

2007-01-01

Selective attention is an important filter for complex environments where distractions compete with signals. Attention increases both the gamma-band power of cortical local field potentials and the spike-field coherence within the receptive field of an attended object. However, the mechanisms by which gamma-band activity enhances, if at all, the encoding of input signals are not well understood. We propose that gamma oscillations induce binomial-like spike-count statistics across noisy neural populations. Using simplified models of spiking neurons, we show how the discrimination of static signals based on the population spike-count response is improved with gamma induced binomial statistics. These results give an important mechanistic link between the neural correlates of attention and the discrimination tasks where attention is known to enhance performance. Further, they show how a rhythmicity of spike responses can enhance coding schemes that are not temporally sensitive. PMID:18052541
Application of a hurdle negative binomial count data model to demand for bass fishing in the southeastern United States.

PubMed

Bilgic, Abdulbaki; Florkowski, Wojciech J

2007-06-01

This paper identifies factors that influence the demand for a bass fishing trip taken in the southeastern United States using a hurdle negative binomial count data model. The probability of fishing for a bass is estimated in the first stage and the fishing trip frequency is estimated in the second stage for individuals reporting bass fishing trips in the Southeast. The applied approach allows the decomposition of the effects of factors responsible for the decision to take a trip and the trip number. Calculated partial and total elasticities indicate a highly inelastic demand for the number of fishing trips as trip costs increase. However, the demand can be expected to increase if anglers experience a success measured by the number of caught fish or their size. Benefit estimates based on alternative estimation methods differ substantially, suggesting the need for testing each modeling approach applied in empirical studies.
Joint multifractal analysis based on the partition function approach: analytical analysis, numerical simulation and empirical application

NASA Astrophysics Data System (ADS)

Xie, Wen-Jie; Jiang, Zhi-Qiang; Gu, Gao-Feng; Xiong, Xiong; Zhou, Wei-Xing

2015-10-01

Many complex systems generate multifractal time series which are long-range cross-correlated. Numerous methods have been proposed to characterize the multifractal nature of these long-range cross correlations. However, several important issues about these methods are not well understood and most methods consider only one moment order. We study the joint multifractal analysis based on partition function with two moment orders, which was initially invented to investigate fluid fields, and derive analytically several important properties. We apply the method numerically to binomial measures with multifractal cross correlations and bivariate fractional Brownian motions without multifractal cross correlations. For binomial multifractal measures, the explicit expressions of mass function, singularity strength and multifractal spectrum of the cross correlations are derived, which agree excellently with the numerical results. We also apply the method to stock market indexes and unveil intriguing multifractality in the cross correlations of index volatilities.
Spatiotemporal and random parameter panel data models of traffic crash fatalities in Vietnam.

PubMed

Truong, Long T; Kieu, Le-Minh; Vu, Tuan A

2016-09-01

This paper investigates factors associated with traffic crash fatalities in 63 provinces of Vietnam during the period from 2012 to 2014. Random effect negative binomial (RENB) and random parameter negative binomial (RPNB) panel data models are adopted to consider spatial heterogeneity across provinces. In addition, a spatiotemporal model with conditional autoregressive priors (ST-CAR) is utilised to account for spatiotemporal autocorrelation in the data. The statistical comparison indicates the ST-CAR model outperforms the RENB and RPNB models. Estimation results provide several significant findings. For example, traffic crash fatalities tend to be higher in provinces with greater numbers of level crossings. Passenger distance travelled and road lengths are also positively associated with fatalities. However, hospital densities are negatively associated with fatalities. The safety impact of the national highway 1A, the main transport corridor of the country, is also highlighted. Copyright © 2016 Elsevier Ltd. All rights reserved.

Temporary disaster debris management site identification using binomial cluster analysis and GIS.

PubMed

Grzeda, Stanislaw; Mazzuchi, Thomas A; Sarkani, Shahram

2014-04-01

An essential component of disaster planning and preparation is the identification and selection of temporary disaster debris management sites (DMS). However, since DMS identification is a complex process involving numerous variable constraints, many regional, county and municipal jurisdictions initiate this process during the post-disaster response and recovery phases, typically a period of severely stressed resources. Hence, a pre-disaster approach in identifying the most likely sites based on the number of locational constraints would significantly contribute to disaster debris management planning. As disasters vary in their nature, location and extent, an effective approach must facilitate scalability, flexibility and adaptability to variable local requirements, while also being generalisable to other regions and geographical extents. This study demonstrates the use of binomial cluster analysis in potential DMS identification in a case study conducted in Hamilton County, Indiana. © 2014 The Author(s). Disasters © Overseas Development Institute, 2014.
A comparative study of count models: application to pedestrian-vehicle crashes along Malaysia federal roads.

PubMed

Hosseinpour, Mehdi; Pour, Mehdi Hossein; Prasetijo, Joewono; Yahaya, Ahmad Shukri; Ghadiri, Seyed Mohammad Reza

2013-01-01

The objective of this study was to examine the effects of various roadway characteristics on the incidence of pedestrian-vehicle crashes by developing a set of crash prediction models on 543 km of Malaysia federal roads over a 4-year time span between 2007 and 2010. Four count models including the Poisson, negative binomial (NB), hurdle Poisson (HP), and hurdle negative binomial (HNB) models were developed and compared to model the number of pedestrian crashes. The results indicated the presence of overdispersion in the pedestrian crashes (PCs) and showed that it is due to excess zero rather than variability in the crash data. To handle the issue, the hurdle Poisson model was found to be the best model among the considered models in terms of comparative measures. Moreover, the variables average daily traffic, heavy vehicle traffic, speed limit, land use, and area type were significantly associated with PCs.
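A hurdle Poisson model treats zeros and positive counts as two separate processes: one probability governs whether the count is zero, and a zero-truncated Poisson describes the positive counts. The sketch below writes out that probability mass function in its standard form. The parameter values are hypothetical and do not come from the Malaysian crash data.

```python
from math import exp, factorial


def hurdle_poisson_pmf(y, p_zero, lam):
    """P(Y = y) for a hurdle Poisson with zero probability p_zero and rate lam."""
    if y == 0:
        return p_zero
    poisson = exp(-lam) * lam ** y / factorial(y)
    # Rescale the Poisson mass on y >= 1 so the positive part sums to 1 - p_zero.
    return (1.0 - p_zero) * poisson / (1.0 - exp(-lam))


# Hypothetical parameters: 80% of road segments record no pedestrian crash.
p_zero, lam = 0.8, 1.5
total = sum(hurdle_poisson_pmf(y, p_zero, lam) for y in range(60))
mean = sum(y * hurdle_poisson_pmf(y, p_zero, lam) for y in range(60))
print(round(total, 6), round(mean, 4))
```

Because the zero probability is a free parameter, the model can absorb excess zeros without inflating the variance assumed for the positive counts.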
WHAMII - An enumeration and insertion procedure with binomial bounds for the stochastic time-constrained traveling salesman problem

NASA Technical Reports Server (NTRS)

Dahl, Roy W.; Keating, Karen; Salamone, Daryl J.; Levy, Laurence; Nag, Barindra; Sanborn, Joan A.

1987-01-01

This paper presents an algorithm (WHAMII) designed to solve the Artificial Intelligence Design Challenge at the 1987 AIAA Guidance, Navigation and Control Conference. The problem under consideration is a stochastic generalization of the traveling salesman problem in which travel costs can incur a penalty with a given probability. The variability in travel costs leads to a probability constraint with respect to violating the budget allocation. Given the small size of the problem (eleven cities), an approach is considered that combines partial tour enumeration with a heuristic city insertion procedure. For computational efficiency during both the enumeration and insertion procedures, precalculated binomial probabilities are used to determine an upper bound on the actual probability of violating the budget constraint for each tour. The actual probability is calculated for the final best tour, and additional insertions are attempted until the actual probability exceeds the bound.
Is “Hit and Run” a Single Word? The Processing of Irreversible Binomials in Neglect Dyslexia

PubMed Central

Arcara, Giorgio; Lacaita, Graziano; Mattaloni, Elisa; Passarini, Laura; Mondini, Sara; Benincà, Paola; Semenza, Carlo

2012-01-01

The present study is the first neuropsychological investigation into the problem of the mental representation and processing of irreversible binomials (IBs), i.e., word pairs linked by a conjunction (e.g., “hit and run,” “dead or alive”). In order to test their lexical status, the phenomenon of neglect dyslexia is explored. People with left-sided neglect dyslexia show a clear lexical effect: they can read IBs better (i.e., by dropping the leftmost words less frequently) when their components are presented in their correct order. This may be taken as an indication that they treat these constructions as lexical, not decomposable, elements. This finding therefore constitutes strong evidence that IBs tend to be stored in the mental lexicon as a whole and that this whole form is preferably addressed in the retrieval process. PMID:22347199
Network reliability maximization for stochastic-flow network subject to correlated failures using genetic algorithm and tabu search

NASA Astrophysics Data System (ADS)

Yeh, Cheng-Ta; Lin, Yi-Kuei; Yang, Jo-Yun

2018-07-01

Network reliability is an important performance index for many real-life systems, such as electric power systems, computer systems and transportation systems. These systems can be modelled as stochastic-flow networks (SFNs) composed of arcs and nodes. Most system supervisors respect the network reliability maximization by finding the optimal multi-state resource assignment, which is one resource to each arc. However, a disaster may cause correlated failures for the assigned resources, affecting the network reliability. This article focuses on determining the optimal resource assignment with maximal network reliability for SFNs. To solve the problem, this study proposes a hybrid algorithm integrating the genetic algorithm and tabu search to determine the optimal assignment, called the hybrid GA-TS algorithm (HGTA), and integrates minimal paths, recursive sum of disjoint products and the correlated binomial distribution to calculate network reliability. Several practical numerical experiments are adopted to demonstrate that HGTA has better computational quality than several popular soft computing algorithms.
Error simulation of paired-comparison-based scaling methods

NASA Astrophysics Data System (ADS)

Cui, Chengwu

2000-12-01

Subjective image quality measurement usually resorts to psychophysical scaling. However, it is difficult to evaluate the inherent precision of these scaling methods. Without knowing the potential errors of the measurement, subsequent use of the data can be misleading. In this paper, the errors on scaled values derived from paired-comparison-based scaling methods are simulated with a randomly introduced proportion of choice errors that follow the binomial distribution. Simulation results are given for various combinations of the number of stimuli and the sampling size. The errors are presented in the form of the average standard deviation of the scaled values and can be fitted reasonably well with an empirical equation that can be used for scaling error estimation and measurement design. The simulation shows that paired-comparison-based scaling methods can have large errors on the derived scaled values when the sampling size and the number of stimuli are small. Examples are also given to show the potential errors on actually scaled values of color image prints as measured by the method of paired comparison.
Assessing Landscape Constraints on Species Abundance: Does the Neighborhood Limit Species Response to Local Habitat Conservation Programs?

PubMed Central

Jorgensen, Christopher F.; Powell, Larkin A.; Lusk, Jeffery J.; Bishop, Andrew A.; Fontaine, Joseph J.

2014-01-01

Landscapes in agricultural systems continue to undergo significant change, and the loss of biodiversity is an ever-increasing threat. Although habitat restoration is beneficial, management actions do not always result in the desired outcome. Managers must understand why management actions fail; yet, past studies have focused on assessing habitat attributes at a single spatial scale, and often fail to consider the importance of ecological mechanisms that act across spatial scales. We located survey sites across southern Nebraska, USA and conducted point counts to estimate Ring-necked Pheasant abundance, an economically important species to the region, while simultaneously quantifying landscape effects using a geographic information system. To identify suitable areas for allocating limited management resources, we assessed land cover relationships to our counts using a Bayesian binomial-Poisson hierarchical model to construct predictive Species Distribution Models of relative abundance. Our results indicated that landscape scale land cover variables severely constrained or, alternatively, facilitated the positive effects of local land management for Ring-necked Pheasants. PMID:24918779
[Multidimensional measurement of precarious employment: social distribution and its association with health in Catalonia (Spain)].

PubMed

Benach, Joan; Julià, Mireia; Tarafa, Gemma; Mir, Jordi; Molinero, Emilia; Vives, Alejandra

2015-01-01

To show the prevalence of precarious employment in Catalonia (Spain) for the first time and its association with mental and self-rated health, measured with a multidimensional scale. A cross-sectional study was conducted using data from the II Catalan Working Conditions Survey (2010) with a subsample of employed workers with a contract. The prevalence of precarious employment using a multidimensional scale and its association with health was calculated using multivariate log-binomial regression stratified by gender. The prevalence of precarious employment in Catalonia was high (42.6%). We found higher precariousness in women, youth, immigrants, and manual and less educated workers. There was a positive gradient in the association between precarious employment and poor health. Precarious employment is associated with poor health in the working population. Working conditions surveys should include questions on precarious employment and health indicators, which would allow monitoring and subsequent analyses of health inequalities. Copyright © 2015 SESPAS. Published by Elsevier Espana. All rights reserved.
Selection and quantification of infection endpoints for trials of vaccines against intestinal helminths

PubMed Central

Alexander, Neal; Cundill, Bonnie; Sabatelli, Lorenzo; Bethony, Jeffrey M.; Diemert, David; Hotez, Peter; Smith, Peter G.; Rodrigues, Laura C.; Brooker, Simon

2011-01-01

Vaccines against human helminths are being developed but the choice of optimal parasitological endpoints and effect measures to assess their efficacy has received little attention. Assuming negative binomial distributions for the parasite counts, we rank the statistical power of three measures of efficacy: ratio of mean parasite intensity at the end of the trial, the odds ratio of infection at the end of the trial, and the rate ratio of incidence of infection during the trial. We also use a modelling approach to estimate the likely impact of trial interventions on the force of infection, and hence statistical power. We conclude that (1) final mean parasite intensity is a suitable endpoint for later phase vaccine trials, and (2) mass effects of trial interventions are unlikely to appreciably reduce the force of infection in the community – and hence statistical power – unless there is a combination of high vaccine efficacy and a large proportion of the population enrolled. PMID:21435404
CAN'T MISS--conquer any number task by making important statistics simple. Part 1. Types of variables, mean, median, variance, and standard deviation.

PubMed

Hansen, John P

2003-01-01

Healthcare quality improvement professionals need to understand and use inferential statistics to interpret sample data from their organizations. In quality improvement and healthcare research studies all the data from a population often are not available, so investigators take samples and make inferences about the population by using inferential statistics. This three-part series will give readers an understanding of the concepts of inferential statistics as well as the specific tools for calculating confidence intervals for samples of data. This article, Part 1, presents basic information about data including a classification system that describes the four major types of variables: continuous quantitative variable, discrete quantitative variable, ordinal categorical variable (including the binomial variable), and nominal categorical variable. A histogram is a graph that displays the frequency distribution for a continuous variable. The article also demonstrates how to calculate the mean, median, standard deviation, and variance for a continuous variable.
Relaxed Poisson cure rate models.

PubMed

Rodrigues, Josemar; Cordeiro, Gauss M; Cancho, Vicente G; Balakrishnan, N

2016-03-01

The purpose of this article is to make the standard promotion cure rate model (Yakovlev and Tsodikov, ) more flexible by assuming that the number of lesions or altered cells after a treatment follows a fractional Poisson distribution (Laskin, ). It is proved that the well-known Mittag-Leffler relaxation function (Berberan-Santos, ) is a simple way to obtain a new cure rate model that is a compromise between the promotion and geometric cure rate models allowing for superdispersion. So, the relaxed cure rate model developed here can be considered as a natural and less restrictive extension of the popular Poisson cure rate model at the cost of an additional parameter, but a competitor to negative-binomial cure rate models (Rodrigues et al., ). Some mathematical properties of a proper relaxed Poisson density are explored. A simulation study and an illustration of the proposed cure rate model from the Bayesian point of view are finally presented. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Supply-side response to declining heroin purity: fentanyl overdose episode in New Jersey.

PubMed

Hempstead, Katherine; Yildirim, Emel O

2014-06-01

The inelastic price demand observations characteristic of illegal drug markets have led to the conclusion that the burden of a negative supply shock would be completely reflected to consumers. This paper argues that the increasing availability of prescription opioids may threaten heroin sellers' profit margin and force them to find alternative methods to compensate buyers in the event of a supply shock. We investigate the 2006 fentanyl overdose episode in New Jersey and argue that the introduction of non-pharmaceutical fentanyl, its spatial distribution, and the timing of overdose deaths may have been related to trends in heroin purity. Using medical examiner data, as well as data from the Drug Enforcement Administration, Office of Diversion Control on retail sales of prescription opioids in a negative binomial specification, we show that month-to-month fluctuations in heroin purity have a significant effect on fentanyl-related overdoses, particularly in those areas where prescription opioids are highly available. Copyright © 2013 John Wiley & Sons, Ltd.
Constituent quarks and systematic errors in mid-rapidity charged multiplicity dNch/dη distributions

NASA Astrophysics Data System (ADS)

Tannenbaum, M. J.

2018-01-01

Centrality definition in A + A collisions at colliders such as RHIC and LHC suffers from a correlated systematic uncertainty caused by the efficiency of detecting a p + p collision (50 ± 5% for PHENIX at RHIC). In A + A collisions where centrality is measured by the number of nucleon collisions, Ncoll, or the number of nucleon participants, Npart, or the number of constituent quark participants, Nqp, the error in the efficiency of the primary interaction trigger (Beam-Beam Counters) for a p + p collision leads to a correlated systematic uncertainty in Npart, Ncoll or Nqp which reduces binomially as the A + A collisions become more central. If this is not correctly accounted for in projections of A + A to p + p collisions, then mistaken conclusions can result. A recent example is presented in whether the mid-rapidity charged multiplicity per constituent quark participant (dNch/dη)/Nqp in Au + Au at RHIC was the same as the value in p + p collisions.
Assessing landscape constraints on species abundance: Does the neighborhood limit species response to local habitat conservation programs?

USGS Publications Warehouse

Jorgensen, Christopher F.; Powell, Larkin A.; Lusk, Jeffrey J.; Bishop, Andrew A.; Fontaine, Joseph J.

2014-01-01

Landscapes in agricultural systems continue to undergo significant change, and the loss of biodiversity is an ever-increasing threat. Although habitat restoration is beneficial, management actions do not always result in the desired outcome. Managers must understand why management actions fail; yet, past studies have focused on assessing habitat attributes at a single spatial scale, and often fail to consider the importance of ecological mechanisms that act across spatial scales. We located survey sites across southern Nebraska, USA and conducted point counts to estimate Ring-necked Pheasant abundance, an economically important species to the region, while simultaneously quantifying landscape effects using a geographic information system. To identify suitable areas for allocating limited management resources, we assessed land cover relationships to our counts using a Bayesian binomial-Poisson hierarchical model to construct predictive Species Distribution Models of relative abundance. Our results indicated that landscape scale land cover variables severely constrained or, alternatively, facilitated the positive effects of local land management for Ring-necked Pheasants.
Bernoulli, Darwin, and Sagan: the probability of life on other planets

NASA Astrophysics Data System (ADS)

Rossmo, D. Kim

2017-04-01

The recent discovery that billions of planets in the Milky Way Galaxy may be in circumstellar habitable zones has renewed speculation over the possibility of extraterrestrial life. The Drake equation is a probabilistic framework for estimating the number of technologically advanced civilizations in our Galaxy; however, many of the equation's component probabilities are either unknown or have large error intervals. In this paper, a different method of examining this question is explored, one that replaces the various Drake factors with the single estimate for the probability of life existing on Earth. This relationship can be described by the binomial distribution if the presence of life on a given number of planets is equated to successes in a Bernoulli trial. The question of exoplanet life may then be reformulated as follows: given the probability of one or more independent successes for a given number of trials, what is the probability of two or more successes? Some of the implications of this approach for finding life on exoplanets are discussed.
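The reformulated question can be worked through directly with the binomial distribution. In the sketch below, the per-planet success probability p is recovered from the probability q of at least one success in n independent trials, using q = 1 - (1 - p)^n, and the probability of two or more successes follows by subtracting the zero-success and one-success terms. The function name and the input values are hypothetical; the paper's own numbers are not reproduced here.

```python
def prob_two_or_more(q_at_least_one, n_trials):
    """P(X >= 2) for X ~ Binomial(n, p), with p chosen so that P(X >= 1) = q."""
    # Invert q = 1 - (1 - p)**n for the per-trial success probability.
    p = 1.0 - (1.0 - q_at_least_one) ** (1.0 / n_trials)
    p_none = (1.0 - p) ** n_trials
    p_exactly_one = n_trials * p * (1.0 - p) ** (n_trials - 1)
    return 1.0 - p_none - p_exactly_one


# Two trials with a 75% chance of at least one success imply p = 0.5.
print(prob_two_or_more(0.75, 2))  # 0.25
```

Dividing the result by q gives the conditional probability of a second success given that at least one has occurred.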
[Effects of climate and grazing on the vegetation cover change in Xilinguole League of Inner Mongolia, North China].

PubMed

Wang, Hai-Mei; Li, Zheng-Hai; Wang, Zhen

2013-01-01

Based on the monthly temperature and precipitation data of 15 meteorological stations and the statistical data of livestock density in Xilinguole League in 1981-2007, and by using ArcGIS, this paper analyzed the spatial distribution of the climate aridity and livestock density in the League, and, in combination with the ten-day data of the normalized difference vegetation index (NDVI) in 1981-2007, discussed the driving factors of the vegetation cover change in the League. In the study period, there was a satisfactory linear regression relationship between the climate aridity and the vegetation coverage. The NDVI and the livestock density had a favorable binomial regression relationship. With the increase of NDVI, the livestock density first increased and then decreased. The vegetation coverage had a complex linear relationship with livestock density and climate aridity. The NDVI had a positive correlation with climate aridity, but a negative correlation with livestock density. Compared with livestock density, climate aridity had far greater effects on the NDVI.
Meta-analysis of diagnostic tests accounting for disease prevalence: a new model using trivariate copulas.

PubMed

Hoyer, A; Kuss, O

2015-05-20

In real life and somewhat contrary to biostatistical textbook knowledge, sensitivity and specificity (and not only predictive values) of diagnostic tests can vary with the underlying prevalence of disease. In meta-analysis of diagnostic studies, accounting for this fact naturally leads to a trivariate expansion of the traditional bivariate logistic regression model with random study effects. In this paper, a new model is proposed using trivariate copulas and beta-binomial marginal distributions for sensitivity, specificity, and prevalence as an expansion of the bivariate model. Two different copulas are used, the trivariate Gaussian copula and a trivariate vine copula based on the bivariate Plackett copula. This model has a closed-form likelihood, so standard software (e.g., SAS PROC NLMIXED) can be used. The results of a simulation study have shown that the copula models perform at least as well as, and frequently better than, the standard model. The methods are illustrated by two examples. Copyright © 2015 John Wiley & Sons, Ltd.
"The Freak of Nature": On Erich Fromm's Vindication of Binomial Sexuality and the Potentials of the "Homosexual Deviation".

PubMed

Bauer, J Edgar

2017-01-01

As a Freudian revisionist and neo-Marxist, Erich Fromm (1900-1980) lessened the import of sexuality in the individual psyche but stressed the role played by the sex differential in the distribution of power throughout history and in the post-patriarchal form of matriarchy he envisioned. Seeking to reinforce the male/female divide and heteronormativity, Fromm outlined a "New Science of Man" that readily ignored not only the challenges posed to binary sexuality by post-Darwinian critical sexologies, but also the same-sex complexities evinced by key figures of his own cultural pantheon. Regardless of his declared pursuits, however, Fromm at times expressed insights suitable to undermine the cogency of his most cherished sexual convictions. As a tool for uncovering "indubitable commonsensical axioms" as sources of alienation, Fromm's conception of "idology" challenges his own sanction of sexual binarity and heterosexuality, thus facilitating an understanding of the individual's sexual difference as a unique modulation of male/female intermediariness.
Single- and multiple-pulse noncoherent detection statistics associated with partially developed speckle.

PubMed

Osche, G R

2000-08-20

Single- and multiple-pulse detection statistics are presented for aperture-averaged direct detection optical receivers operating against partially developed speckle fields. A partially developed speckle field arises when the probability density function of the received intensity does not follow negative exponential statistics. The case of interest here is the target surface that exhibits diffuse as well as specular components in the scattered radiation. An approximate expression is derived for the integrated intensity at the aperture, which leads to single- and multiple-pulse discrete probability density functions for the case of a Poisson signal in Poisson noise with an additive coherent component. In the absence of noise, the single-pulse discrete density function is shown to reduce to a generalized negative binomial distribution. The radar concept of integration loss is discussed in the context of direct detection optical systems where it is shown that, given an appropriate set of system parameters, multiple-pulse processing can be more efficient than single-pulse processing over a finite range of the integration parameter n.
Generalized binomial τ-leap method for biochemical kinetics incorporating both delay and intrinsic noise

NASA Astrophysics Data System (ADS)

Leier, André; Marquez-Lago, Tatiana T.; Burrage, Kevin

2008-05-01

The delay stochastic simulation algorithm (DSSA) by Barrio et al. [Plos Comput. Biol. 2, 117(E) (2006)] was developed to simulate delayed processes in cell biology in the presence of intrinsic noise, that is, when there are small-to-moderate numbers of certain key molecules present in a chemical reaction system. These delayed processes can faithfully represent complex interactions and mechanisms that imply a number of spatiotemporal processes often not explicitly modeled such as transcription and translation, basic in the modeling of cell signaling pathways. However, for systems with widely varying reaction rate constants or large numbers of molecules, the simulation time steps of both the stochastic simulation algorithm (SSA) and the DSSA can become very small causing considerable computational overheads. In order to overcome the limit of small step sizes, various τ-leap strategies have been suggested for improving computational performance of the SSA. In this paper, we present a binomial τ-DSSA method that extends the τ-leap idea to the delay setting and avoids drawing insufficient numbers of reactions, a common shortcoming of existing binomial τ-leap methods that becomes evident when dealing with complex chemical interactions. The resulting inaccuracies are most evident in the delayed case, even when considering reaction products as potential reactants within the same time step in which they are produced. Moreover, we extend the framework to account for multicellular systems with different degrees of intercellular communication. We apply these ideas to two important genetic regulatory models, namely, the hes1 gene, implicated as a molecular clock, and a Her1/Her 7 model for coupled oscillating cells.
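The core idea behind binomial leaping can be shown on a single first-order decay reaction X -> 0. A Poisson draw for the number of firings in a step can exceed the molecules available, whereas a binomial draw is bounded by them. The sketch below assumes a common formulation in which the number of firings is Binomial(x, c·τ), where x is the current molecule count. It omits delays and multicellular coupling entirely and is not the τ-DSSA of the paper; all names and parameter values are hypothetical.

```python
import random


def binomial_draw(rng, n, p):
    """Draw from Binomial(n, p) as a sum of n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def binomial_leap_decay(x0, rate, tau, steps, seed=0):
    """Simulate X -> 0 with fixed leaps of length tau; returns the trajectory."""
    rng = random.Random(seed)
    x = x0
    trajectory = [x]
    for _ in range(steps):
        # Per-molecule firing probability within one leap, capped at 1.
        p = min(1.0, rate * tau)
        fired = binomial_draw(rng, x, p)  # can never exceed x
        x -= fired
        trajectory.append(x)
    return trajectory


print(binomial_leap_decay(x0=100, rate=0.5, tau=0.1, steps=10))
```

Bounding the draw by the available molecules is what keeps populations non-negative without shrinking the step size.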

Modeling left-turn crash occurrence at signalized intersections by conflicting patterns.

PubMed

Wang, Xuesong; Abdel-Aty, Mohamed

2008-01-01

In order to better understand the underlying crash mechanisms, left-turn crashes occurring at 197 four-legged signalized intersections over 6 years were classified into nine patterns based on vehicle maneuvers and then were assigned to intersection approaches. Crash frequency of each pattern was modeled at the approach level by mainly using Generalized Estimating Equations (GEE) with the Negative Binomial as the link function to account for the correlation among the crash data. GEE with a binomial logit link function was also applied for patterns with fewer crashes. The Cumulative Residuals test shows that, for correlated left-turn crashes, GEE models usually outperformed basic Negative Binomial models. The estimation results show that there are obvious differences in the factors that cause the occurrence of different left-turn collision patterns. For example, for each pattern, the traffic flows to which the colliding vehicles belong are identified to be significant. The width of the crossing distance (represented by the number of through lanes on the opposing approach of the left-turning traffic) is associated with more left-turn traffic colliding with opposing through traffic (Pattern 5), but with less left-turning traffic colliding with near-side crossing through traffic (Pattern 8). The safety effectiveness of the left-turning signal is not consistent for different crash patterns; "protected" phasing is correlated with fewer Pattern 5 crashes, but with more Pattern 8 crashes. The study indicates that in order to develop efficient countermeasures for left-turn crashes and improve safety at signalized intersections, left-turn crashes should be considered in different patterns.
Analyzing crash frequency in freeway tunnels: A correlated random parameters approach.

PubMed

Hou, Qinzhong; Tarko, Andrew P; Meng, Xianghai

2018-02-01

The majority of past road safety studies focused on open road segments while only a few focused on tunnels. Moreover, the past tunnel studies produced some inconsistent results about the safety effects of the traffic patterns, the tunnel design, and the pavement conditions. The effects of these conditions therefore remain unknown, especially for freeway tunnels in China. The study presented in this paper investigated the safety effects of these various factors utilizing a four-year period (2009-2012) of data as well as three models: 1) a random effects negative binomial model (RENB), 2) an uncorrelated random parameters negative binomial model (URPNB), and 3) a correlated random parameters negative binomial model (CRPNB). Of these three, the results showed that the CRPNB model provided better goodness-of-fit and offered more insights into the factors that contribute to tunnel safety. The CRPNB was not only able to allocate the part of the otherwise unobserved heterogeneity to the individual model parameters but also was able to estimate the cross-correlations between these parameters. Furthermore, the study results showed that traffic volume, tunnel length, proportion of heavy trucks, curvature, and pavement rutting were associated with higher frequencies of traffic crashes, while the distance to the tunnel wall, distance to the adjacent tunnel, distress ratio, International Roughness Index (IRI), and friction coefficient were associated with lower crash frequencies. In addition, the effects of the heterogeneity of the proportion of heavy trucks, the curvature, the rutting depth, and the friction coefficient were identified and their inter-correlations were analyzed. Copyright © 2017 Elsevier Ltd. All rights reserved.
S-SPatt: simple statistics for patterns on Markov chains.

PubMed

Nuel, Grégory

2005-07-01

S-SPatt allows the counting of pattern occurrences in text files and, assuming these texts are generated from a random Markovian source, the computation of the P-value of a given observation using a simple binomial approximation.
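A binomial approximation of this kind can be sketched as follows. This is not the S-SPatt implementation: it assumes independent letters (an order-0 Markov source) and treats the count as Binomial(n, p) over the n candidate start positions, and all function names are hypothetical.

```python
from math import comb

def count_overlapping(text, pattern):
    """Count occurrences of pattern in text, allowing overlaps."""
    m = len(pattern)
    return sum(1 for i in range(len(text) - m + 1) if text[i:i + m] == pattern)

def binomial_upper_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k, n + 1))

def pattern_pvalue(text, pattern, letter_probs):
    """Approximate over-representation P-value of a pattern count.

    Assumption: letters are independent, so the pattern probability p is
    the product of its letter probabilities.
    """
    n = len(text) - len(pattern) + 1
    p = 1.0
    for ch in pattern:
        p *= letter_probs[ch]
    observed = count_overlapping(text, pattern)
    return observed, binomial_upper_tail(n, p, observed)

uniform = {ch: 0.25 for ch in "ACGT"}  # hypothetical letter model
observed, pval = pattern_pvalue("ACGTACGTTTACGACG", "ACG", uniform)
```

For a higher-order Markov source, p would be computed from the transition probabilities instead of a product of letter frequencies.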
Evaluation of surrogate measures for pedestrian safety in various road and roadside environments.

DOT National Transportation Integrated Search

2012-10-01

This report presents an investigation of pedestrian conflicts and crash count models to learn which exposure measures and roadway or roadside characteristics significantly influence pedestrian safety at road crossings. Negative binomial models were e...
A review of statistical estimators for risk-adjusted length of stay: analysis of the Australian and New Zealand Intensive Care Adult Patient Data-Base, 2008-2009.

PubMed

Moran, John L; Solomon, Patricia J

2012-05-16

For the analysis of length-of-stay (LOS) data, which is characteristically right-skewed, a number of statistical estimators have been proposed as alternatives to the traditional ordinary least squares (OLS) regression with log dependent variable. Using a cohort of patients identified in the Australian and New Zealand Intensive Care Society Adult Patient Database, 2008-2009, 12 different methods were used for estimation of intensive care (ICU) length of stay. These encompassed risk-adjusted regression analysis of firstly: log LOS using OLS, linear mixed model [LMM], treatment effects, skew-normal and skew-t models; and secondly: unmodified (raw) LOS via OLS, generalised linear models [GLMs] with log-link and 4 different distributions [Poisson, gamma, negative binomial and inverse-Gaussian], extended estimating equations [EEE] and a finite mixture model including a gamma distribution. A fixed covariate list and ICU-site clustering with robust variance were utilised for model fitting with split-sample determination (80%) and validation (20%) data sets, and model simulation was undertaken to establish over-fitting (Copas test). Indices of model specification using Bayesian information criterion [BIC: lower values preferred] and residual analysis as well as predictive performance (R2, concordance correlation coefficient (CCC), mean absolute error [MAE]) were established for each estimator. The data-set consisted of 111663 patients from 131 ICUs; with mean(SD) age 60.6(18.8) years, 43.0% were female, 40.7% were mechanically ventilated and ICU mortality was 7.8%. ICU length-of-stay was 3.4(5.1) (median 1.8, range (0.17-60)) days and demonstrated marked kurtosis and right skew (29.4 and 4.4 respectively). BIC showed considerable spread, from a maximum of 509801 (OLS-raw scale) to a minimum of 210286 (LMM). R2 ranged from 0.22 (LMM) to 0.17 and the CCC from 0.334 (LMM) to 0.149, with MAE 2.2-2.4. Superior residual behaviour was established for the log-scale estimators. 
There was a general tendency for over-prediction (negative residuals) and for over-fitting, the exception being the GLM negative binomial estimator. The mean-variance function was best approximated by a quadratic function, consistent with log-scale estimation; the link function was estimated (EEE) as 0.152(0.019, 0.285), consistent with a fractional-root function. For ICU length of stay, log-scale estimation, in particular the LMM, appeared to be the most consistently performing estimator(s). Neither the GLM variants nor the skew-regression estimators dominated.
Determinants of the geographic distribution of Puumala virus and Lyme borreliosis infections in Belgium

PubMed Central

Linard, Catherine; Lamarque, Pénélope; Heyman, Paul; Ducoffre, Geneviève; Luyasu, Victor; Tersago, Katrien; Vanwambeke, Sophie O; Lambin, Eric F

2007-01-01

Background Vector-borne and zoonotic diseases generally display clear spatial patterns due to different space-dependent factors. Land cover and land use influence disease transmission by controlling both the spatial distribution of vectors or hosts, and the probability of contact with susceptible human populations. The objective of this study was to combine environmental and socio-economic factors to explain the spatial distribution of two emerging human diseases in Belgium, Puumala virus (PUUV) and Lyme borreliosis. Municipalities were taken as units of analysis. Results Negative binomial regressions including a correction for spatial endogeneity show that the spatial distribution of PUUV and Lyme borreliosis infections are associated with a combination of factors linked to the vector and host populations, to human behaviours, and to landscape attributes. Both diseases are associated with the presence of forests, which are the preferred habitat for vector or host populations. The PUUV infection risk is higher in remote forest areas, where the level of urbanisation is low, and among low-income populations. The Lyme borreliosis transmission risk is higher in mixed landscapes with forests and spatially dispersed houses, mostly in wealthy peri-urban areas. The spatial dependence resulting from a combination of endogenous and exogenous processes could be accounted for in the model on PUUV but not for Lyme borreliosis. Conclusion A large part of the spatial variation in disease risk can be explained by environmental and socio-economic factors. The two diseases not only are most prevalent in different regions but also affect different groups of people. Combining these two criteria may increase the efficiency of information campaigns through appropriate targeting. PMID:17474974
Distribution of the entomopathogenic nematodes from La Rioja (Northern Spain).

PubMed

Campos-Herrera, Raquel; Escuer, Miguel; Labrador, Sonia; Robertson, Lee; Barrios, Laura; Gutiérrez, Carmen

2007-06-01

The distribution of entomopathogenic nematodes (EPNs) in natural areas and crop field edges in La Rioja (Northern Spain) was studied, taking into account environmental and physical-chemical soil factors. Five hundred soil samples from 100 sites of the most representative habitats were assayed for the presence of EPNs. The occurrence of EPNs fitted a negative binomial distribution, indicating that these nematodes are naturally distributed in aggregates in La Rioja. There were no statistical differences (p ≤ 0.05) in the abundance of EPNs with respect to environmental and physical-chemical variables, although there were statistical differences in the altitude, annual mean air temperature and rainfall, potential vegetation series and moisture percentage recovery frequency. Twenty-seven samples from 14 sites were positive for EPNs. From these samples, twenty isolates were identified to species level and fifteen strains were selected: 11 Steinernema feltiae, two S. carpocapsae and two S. kraussei strains. S. kraussei was isolated from humid soils of cool, high-altitude habitats, and S. carpocapsae was found in heavy soils of dry, temperate habitats. S. feltiae was the most common species, occurring over a wide range of altitude, temperature, rainfall, pH and soil moisture, although this species preferred sandy soils. The virulence of the nematode strains was assessed using G. mellonella as the insect host, recording the larval mortality percentage and the time to insect death, as well as the number of infective juveniles produced, to evaluate the reproductive potential, and the time taken to leave the insect cadaver, to determine the infection cycle length. The ecological trends and biological results are discussed in relation to their future use for biological control.
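The link between a negative binomial fit and aggregation can be illustrated with a method-of-moments sketch. It assumes the parameterization variance = m + m²/k, where a small dispersion parameter k indicates strong clumping; the counts below are hypothetical and are not data from the study.

```python
from statistics import mean, variance

def nb_moment_fit(counts):
    """Method-of-moments estimates (mean m, dispersion k) for a negative binomial.

    Uses variance = m + m**2 / k, so k = m**2 / (s2 - m). A small k signals
    strong aggregation; k is None when s2 <= m (no overdispersion).
    """
    m = mean(counts)
    s2 = variance(counts)  # sample variance
    k = m * m / (s2 - m) if s2 > m else None
    return m, s2, k

# Hypothetical nematode counts per soil sample: mostly zeros, a few large values.
counts = [0, 0, 0, 0, 0, 0, 1, 0, 0, 5, 0, 0, 0, 9, 0, 0, 2, 0, 0, 0]
m, s2, k = nb_moment_fit(counts)
```

A variance far above the mean, and hence k well below 1, is the pattern expected when organisms occur in aggregates rather than at random.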
General Methods for Evolutionary Quantitative Genetic Inference from Generalized Mixed Models.

PubMed

de Villemereuil, Pierre; Schielzeth, Holger; Nakagawa, Shinichi; Morrissey, Michael

2016-11-01

Methods for inference and interpretation of evolutionary quantitative genetic parameters, and for prediction of the response to selection, are best developed for traits with normal distributions. Many traits of evolutionary interest, including many life history and behavioral traits, have inherently nonnormal distributions. The generalized linear mixed model (GLMM) framework has become a widely used tool for estimating quantitative genetic parameters for nonnormal traits. However, whereas GLMMs provide inference on a statistically convenient latent scale, it is often desirable to express quantitative genetic parameters on the scale upon which traits are measured. The parameters of fitted GLMMs, despite being on a latent scale, fully determine all quantities of potential interest on the scale on which traits are expressed. We provide expressions for deriving each of such quantities, including population means, phenotypic (co)variances, variance components including additive genetic (co)variances, and parameters such as heritability. We demonstrate that fixed effects have a strong impact on those parameters and show how to deal with this by averaging or integrating over fixed effects. The expressions require integration of quantities determined by the link function, over distributions of latent values. In general cases, the required integrals must be solved numerically, but efficient methods are available and we provide an implementation in an R package, QGglmm. We show that known formulas for quantities such as heritability of traits with binomial and Poisson distributions are special cases of our expressions. Additionally, we show how fitted GLMM can be incorporated into existing methods for predicting evolutionary trajectories. We demonstrate the accuracy of the resulting method for evolutionary prediction by simulation and apply our approach to data from a wild pedigreed vertebrate population. Copyright © 2016 de Villemereuil et al.
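The integration over latent values mentioned above can be sketched numerically for the simplest quantity, the population mean on the observed scale. This is an illustrative sketch and not the QGglmm implementation; the function name, the trapezoidal rule and the parameter values are assumptions. For a log link the result can be checked against the known closed form exp(μ + σ²/2).

```python
import math

def observed_mean(inv_link, mu, var, half_width=10.0, steps=20001):
    """E[inv_link(l)] for l ~ Normal(mu, var), by the trapezoidal rule."""
    sd = math.sqrt(var)
    lo, hi = mu - half_width * sd, mu + half_width * sd
    h = (hi - lo) / (steps - 1)
    total = 0.0
    for i in range(steps):
        l = lo + i * h
        density = math.exp(-0.5 * ((l - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))
        weight = 0.5 if i in (0, steps - 1) else 1.0  # trapezoid end weights
        total += weight * inv_link(l) * density
    return total * h

mu, var = 0.3, 0.5  # hypothetical latent mean and total latent variance
numeric = observed_mean(math.exp, mu, var)   # Poisson trait, log link
closed_form = math.exp(mu + var / 2)         # lognormal mean
```

Passing a logistic inverse link instead of `math.exp` gives the corresponding mean for a binomial trait, for which no closed form exists.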
Estimation of Multinomial Probabilities.

DTIC Science & Technology

1978-11-01

1971) and Alam (1978) have shown that the maximum likelihood estimator is admissible with respect to the quadratic loss. Steinhaus (1957) and Trybula...appear). Johnson, B. Mck. (1971). On admissible estimators for certain fixed sample binomial populations. Ann. Math. Statist. 92, 1579-1587. Steinhaus , H
Various methods of determining the natural frequencies and damping of composite cantilever plates. 1. Exact solution for the binomial model of deformation

NASA Astrophysics Data System (ADS)

Skel'chik, V. S.; Ryabov, V. M.

1996-11-01

On the basis of the classical theory of thin anisotropic laminated plates, the article analyzes the free vibrations of rectangular cantilever plates made of fibrous composites. Applying Kantorovich's method to the binomial representation of the shape of the elastic surface of a plate yielded, for two unknown functions, a system of two coupled differential equations and the corresponding boundary conditions at the place of constraint and at the free edge. The exact solution for the frequencies and forms of the free vibrations was found using the Laplace transformation with respect to the space variable. The magnitudes of the first several dimensionless frequencies of the bending and torsional vibrations of the plate were calculated for a wide range of values of two dimensionless complexes, with the dimensions of the plate and the anisotropy of the elastic properties of the material taken into account. The article shows that, for torsional vibrations, the warping constraint at the fixed end explains the apparent dependence of the shear modulus of the composite on the length of the specimen that had been discovered earlier in experiments with a torsional pendulum. It examines the interaction and transformation of the second bending mode and the first torsional mode of the vibrations. It analyzes the asymptotics of the dimensionless frequencies as the length of the plate is increased, and it shows that taking into account the bending-torsion interaction in strongly anisotropic materials such as unidirectional carbon-reinforced plastic can substantially reduce the frequencies of the bending vibrations but has no effect (within the framework of the binomial model) on the frequencies of the torsional vibrations.
How conservative is Fisher's exact test? A quantitative evaluation of the two-sample comparative binomial trial.

PubMed

Crans, Gerald G; Shuster, Jonathan J

2008-08-15

The debate as to which statistical methodology is most appropriate for the analysis of the two-sample comparative binomial trial has persisted for decades. Practitioners who favor the conditional methods of Fisher, Fisher's exact test (FET), claim that only experimental outcomes containing the same amount of information should be considered when performing analyses. Hence, the total number of successes should be fixed at its observed level in hypothetical repetitions of the experiment. Using conditional methods in clinical settings can pose interpretation difficulties, since results are derived using conditional sample spaces rather than the set of all possible outcomes. Perhaps more importantly from a clinical trial design perspective, this test can be too conservative, resulting in greater resource requirements and more subjects exposed to an experimental treatment. The actual significance level attained by FET (the size of the test) has not been reported in the statistical literature. Berger (J. R. Statist. Soc. D (The Statistician) 2001; 50:79-85) proposed assessing the conservativeness of conditional methods using p-value confidence intervals. In this paper we develop a numerical algorithm that calculates the size of FET for sample sizes, n, up to 125 per group at the two-sided significance level, alpha = 0.05. Additionally, this numerical method is used to define new significance levels alpha(*) = alpha+epsilon, where epsilon is a small positive number, for each n, such that the size of the test is as close as possible to the pre-specified alpha (0.05 for the current work) without exceeding it. Lastly, a sample size and power calculation example are presented, which demonstrates the statistical advantages of implementing the adjustment to FET (using alpha(*) instead of alpha) in the two-sample comparative binomial trial. 2008 John Wiley & Sons, Ltd
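The notion of the size of FET can be made concrete with a small enumeration. This is a sketch under stated assumptions, not the authors' numerical algorithm: it uses equal group sizes, the "sum of tables no more likely than the observed one" two-sided p-value, and a grid search over the common success probability in place of an exact supremum. All function names are hypothetical.

```python
from math import comb

def fet_pvalue(x1, x2, n):
    """Two-sided Fisher exact p-value for x1/n vs x2/n (equal group sizes)."""
    s = x1 + x2
    denom = comb(2 * n, s)

    def prob(a):
        # Hypergeometric probability of a successes in group 1 given total s.
        return comb(n, a) * comb(n, s - a) / denom

    p_obs = prob(x1)
    lo, hi = max(0, s - n), min(n, s)
    return sum(prob(a) for a in range(lo, hi + 1) if prob(a) <= p_obs * (1 + 1e-9))

def fet_size(n, alpha=0.05, grid=200):
    """Largest unconditional rejection probability over a grid of common p."""
    reject = [(x1, x2) for x1 in range(n + 1) for x2 in range(n + 1)
              if fet_pvalue(x1, x2, n) <= alpha]
    size = 0.0
    for g in range(1, grid):
        p = g / grid
        pmf = [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]
        size = max(size, sum(pmf[x1] * pmf[x2] for x1, x2 in reject))
    return size

size_n10 = fet_size(10)  # n = 10 per group, chosen only for illustration
```

Because the test controls the error rate conditionally on every total number of successes, the value returned stays below the nominal alpha; the gap between the two is the conservativeness that the adjusted level alpha(*) is designed to recover.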
Zero-state Markov switching count-data models: an empirical assessment.

PubMed

Malyshkina, Nataliya V; Mannering, Fred L

2010-01-01

In this study, a two-state Markov switching count-data model is proposed as an alternative to zero-inflated models to account for the preponderance of zeros sometimes observed in transportation count data, such as the number of accidents occurring on a roadway segment over some period of time. For this accident-frequency case, zero-inflated models assume the existence of two states: one of the states is a zero-accident count state, which has accident probabilities that are so low that they cannot be statistically distinguished from zero, and the other state is a normal-count state, in which counts can be non-negative integers that are generated by some counting process, for example, a Poisson or negative binomial. While zero-inflated models have come under some criticism with regard to accident-frequency applications - one fact is undeniable - in many applications they provide a statistically superior fit to the data. The Markov switching approach we propose seeks to overcome some of the criticism associated with the zero-accident state of the zero-inflated model by allowing individual roadway segments to switch between zero and normal-count states over time. An important advantage of this Markov switching approach is that it allows for the direct statistical estimation of the specific roadway-segment state (i.e., zero-accident or normal-count state) whereas traditional zero-inflated models do not. To demonstrate the applicability of this approach, a two-state Markov switching negative binomial model (estimated with Bayesian inference) and standard zero-inflated negative binomial models are estimated using five-year accident frequencies on Indiana interstate highway segments. It is shown that the Markov switching model is a viable alternative and results in a superior statistical fit relative to the zero-inflated models.
Neighborhood educational disparities in active commuting among women: the effect of distance between the place of residence and the place of work/study (an ACTI-Cités study).

PubMed

Perchoux, Camille; Nazare, Julie-Anne; Benmarhnia, Tarik; Salze, Paul; Feuillet, Thierry; Hercberg, Serge; Hess, Franck; Menai, Mehdi; Weber, Christiane; Charreire, Hélène; Enaux, Christophe; Oppert, Jean-Michel; Simon, Chantal

2017-06-12

Active transportation has been associated with favorable health outcomes. Previous research highlighted the influence of neighborhood educational level on active transportation. However, little is known regarding the effect of commuting distance on social disparities in active commuting. In this regard, women have been poorly studied. The objective of this paper was to evaluate the relationship between neighborhood educational level and active commuting, and to assess whether the commuting distance modifies this relationship in adult women. This cross-sectional study is based on a subsample of women from the Nutrinet-Santé web-cohort (N = 1169). Binomial, log-binomial and negative binomial regressions were used to assess the associations between neighborhood education level and (i) the likelihood of reporting any active commuting time, and (ii) the share of commuting time made by active transportation modes. Potential effect measure modification of distance to work on the previous associations was assessed both on the additive and the multiplicative scales. Neighborhood education level was positively associated with the probability of reporting any active commuting time (relative risk = 1.774; p < 0.05) and the share of commuting time spent active (relative risk = 1.423; p < 0.05). The impact of neighborhood education was greater at long distances to work for both outcomes. Our results suggest that neighborhood educational disparities in active commuting tend to increase with commuting distance among women. Further research is needed to provide geographically driven guidance for health promotion intervention aiming at reducing disparities in active transportation among socioeconomic groups.
Comparative assessment of analytical approaches to quantify the risk for introduction of rare animal diseases: the example of avian influenza in Spain.

PubMed

Sánchez-Vizcaíno, Fernando; Perez, Andrés; Martínez-López, Beatriz; Sánchez-Vizcaíno, José Manuel

2012-08-01

Trade of animals and animal products imposes an uncertain and variable risk for exotic animal diseases introduction into importing countries. Risk analysis provides importing countries with an objective, transparent, and internationally accepted method for assessing that risk. Over the last decades, European Union countries have conducted probabilistic risk assessments quite frequently to quantify the risk for rare animal diseases introduction into their territories. Most probabilistic animal health risk assessments have been typically classified into one-level and multilevel binomial models. One-level models are more simple than multilevel models because they assume that animals or products originate from one single population. However, it is unknown whether such simplification may result in substantially different results compared to those obtained through the use of multilevel models. Here, data used on a probabilistic multilevel binomial model formulated to assess the risk for highly pathogenic avian influenza introduction into Spain were reanalyzed using a one-level binomial model and their outcomes were compared. An alternative ordinal model is also proposed here, which makes use of simpler assumptions and less information compared to those required by traditional one-level and multilevel approaches. Results suggest that, at least under certain circumstances, results of the one-level and ordinal approaches are similar to those obtained using multilevel models. Consequently, we argue that, when data are insufficient to run traditional probabilistic models, the ordinal approach presented here may be a suitable alternative to rank exporting countries in terms of the risk that they impose for the spread of rare animal diseases into disease-free countries. © 2012 Society for Risk Analysis.
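The basic calculation behind a one-level binomial risk model can be sketched as follows. This is a simplified illustration, not the model from the paper: it assumes independent imported units with a fixed infection probability, and the second function only pools several source populations rather than reproducing a full multilevel structure. All names and numbers are hypothetical.

```python
def prob_at_least_one(n_units, p_infected):
    """P(at least one infected unit among n independent imports)."""
    return 1.0 - (1.0 - p_infected) ** n_units

def multi_source_prob(sources):
    """Combine several source populations, each a (n_units, p_infected) pair."""
    none_infected = 1.0
    for n_units, p_infected in sources:
        none_infected *= (1.0 - p_infected) ** n_units
    return 1.0 - none_infected

# One pooled population versus two sources with the same expected number
# of infected units (hypothetical values).
one_level = prob_at_least_one(1000, 1e-4)
multi = multi_source_prob([(600, 5e-5), (400, 1.75e-4)])
```

When infection probabilities are small, the pooled and source-specific calculations give nearly the same answer, which is one setting in which the simpler model loses little.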
Regular exercise and related factors in patients with Parkinson's disease: Applying zero-inflated negative binomial modeling of exercise count data.

PubMed

Lee, JuHee; Park, Chang Gi; Choi, Moonki

2016-05-01

This study was conducted to identify risk factors that influence regular exercise among patients with Parkinson's disease in Korea. Parkinson's disease is prevalent in the elderly, and may lead to a sedentary lifestyle. Exercise can enhance physical and psychological health. However, patients with Parkinson's disease are less likely to exercise than are other populations due to physical disability. A secondary data analysis and cross-sectional descriptive study were conducted. A convenience sample of 106 patients with Parkinson's disease was recruited at an outpatient neurology clinic of a tertiary hospital in Korea. Demographic characteristics, disease-related characteristics (including disease duration and motor symptoms), self-efficacy for exercise, balance, and exercise level were investigated. Negative binomial regression and zero-inflated negative binomial regression for exercise count data were utilized to determine factors involved in exercise. The mean age of participants was 65.85 ± 8.77 years, and the mean duration of Parkinson's disease was 7.23 ± 6.02 years. Most participants indicated that they engaged in regular exercise (80.19%). Approximately half of participants exercised at least 5 days per week for 30 min, as recommended (51.9%). Motor symptoms were a significant predictor of exercise in the count model, and self-efficacy for exercise was a significant predictor of exercise in the zero model. Severity of motor symptoms was related to frequency of exercise. Self-efficacy contributed to the probability of exercise. Symptom management and improvement of self-efficacy for exercise are important to encourage regular exercise in patients with Parkinson's disease. Copyright © 2015 Elsevier Inc. All rights reserved.
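The two parts of a zero-inflated negative binomial model, the zero model and the count model, can be written out as a probability mass function. This is a minimal sketch assuming the common parameterization with mean `mu` and variance `mu + alpha * mu**2`; the parameter values are hypothetical and no covariates are included.

```python
import math

def nb_pmf(y, mu, alpha):
    """Negative binomial pmf with mean mu and variance mu + alpha * mu**2."""
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

def zinb_pmf(y, pi_zero, mu, alpha):
    """Zero-inflated NB: structural zero with prob pi_zero, else an NB count."""
    base = (1.0 - pi_zero) * nb_pmf(y, mu, alpha)
    return pi_zero + base if y == 0 else base

# Hypothetical parameters: 20% structural zeros, NB mean of 3 sessions per week.
probs = [zinb_pmf(y, pi_zero=0.2, mu=3.0, alpha=0.5) for y in range(200)]
zinb_mean = sum(y * p for y, p in enumerate(probs))
```

In a regression setting, `pi_zero` would depend on the zero-model predictors and `mu` on the count-model predictors, which is why a variable can matter in one part and not the other.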
An algorithm for computing moments-based flood quantile estimates when historical flood information is available

USGS Publications Warehouse

Cohn, T.A.; Lane, W.L.; Baier, W.G.

1997-01-01

This paper presents the expected moments algorithm (EMA), a simple and efficient method for incorporating historical and paleoflood information into flood frequency studies. EMA can utilize three types of at-site flood information: systematic stream gage record; information about the magnitude of historical floods; and knowledge of the number of years in the historical period when no large flood occurred. EMA employs an iterative procedure to compute method-of-moments parameter estimates. Initial parameter estimates are calculated from systematic stream gage data. These moments are then updated by including the measured historical peaks and the expected moments, given the previously estimated parameters, of the below-threshold floods from the historical period. The updated moments result in new parameter estimates, and the last two steps are repeated until the algorithm converges. Monte Carlo simulations compare EMA, Bulletin 17B's [United States Water Resources Council, 1982] historically weighted moments adjustment, and maximum likelihood estimators when fitting the three parameters of the log-Pearson type III distribution. These simulations demonstrate that EMA is more efficient than the Bulletin 17B method, and that it is nearly as efficient as maximum likelihood estimation (MLE). The experiments also suggest that EMA has two advantages over MLE when dealing with the log-Pearson type III distribution: It appears that EMA estimates always exist and that they are unique, although neither result has been proven. EMA can be used with binomial or interval-censored data and with any distributional family amenable to method-of-moments estimation.
Waterborne outbreak of gastroenteritis: effects on sick leaves and cost of lost workdays.

PubMed

Halonen, Jaana I; Kivimäki, Mika; Oksanen, Tuula; Virtanen, Pekka; Virtanen, Mikko J; Pentti, Jaana; Vahtera, Jussi

2012-01-01

In 2007, part of a drinking water distribution system was accidentally contaminated with waste water effluent, causing a gastroenteritis outbreak in a Finnish town. We examined the acute and cumulative effects of this incident on sick leaves among public sector employees residing in the clean and contaminated areas, and the additional costs of lost workdays due to the incident. Daily information on sick leaves of 1789 Finnish Public Sector Study participants was obtained from employers' registers. Global Positioning System coordinates were used for linking participants to the clean and contaminated areas. Prevalence ratios (PR) for weekly sickness absences were calculated using binomial regression analysis. Calculations for the costs were based on prior studies. Among those living in the contaminated areas, the prevalence of participants on sick leave was 3.54 (95% confidence interval (CI) 2.97-4.22) times higher in the week following the incident compared to the reference period. Those living and working in the clean area were essentially unaffected; the corresponding PR for sick leaves was 1.12 (95% CI 0.73-1.73). No cumulative effects on sick leaves were observed among the exposed. The estimated additional costs of lost workdays due to the incident were 1.8-2.1 million euros. The prevalence of sickness absences among public sector employees residing in affected areas increased shortly after the drinking water distribution system was contaminated, but no long-term effects were observed. The estimated costs of lost workdays were considerable; thus, the cost-benefits of better monitoring systems for water distribution systems should be evaluated.
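A prevalence ratio and its confidence interval can be sketched for the simplest case of one binary exposure. This is a crude two-group calculation with a log-scale (Katz) interval, not the binomial regression used in the study, and the counts are hypothetical rather than taken from the paper.

```python
import math

def prevalence_ratio(cases_exp, n_exp, cases_ref, n_ref, z=1.96):
    """Crude prevalence ratio with an approximate 95% CI on the log scale."""
    pr = (cases_exp / n_exp) / (cases_ref / n_ref)
    se_log = math.sqrt(1 / cases_exp - 1 / n_exp + 1 / cases_ref - 1 / n_ref)
    lower = math.exp(math.log(pr) - z * se_log)
    upper = math.exp(math.log(pr) + z * se_log)
    return pr, lower, upper

# Hypothetical counts: 30 of 100 on sick leave in the exposed week,
# 10 of 100 in the reference period.
pr, lower, upper = prevalence_ratio(30, 100, 10, 100)
```

A regression model is needed once the comparison has to be adjusted for covariates or for repeated observations on the same employees.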
Inference of R0 and Transmission Heterogeneity from the Size Distribution of Stuttering Chains

PubMed Central

Blumberg, Seth; Lloyd-Smith, James O.

2013-01-01

For many infectious disease processes such as emerging zoonoses and vaccine-preventable diseases, 0 < R0 < 1 and infections occur as self-limited stuttering transmission chains. A mechanistic understanding of transmission is essential for characterizing the risk of emerging diseases and monitoring spatio-temporal dynamics. Thus methods for inferring R0 and the degree of heterogeneity in transmission from stuttering chain data have important applications in disease surveillance and management. Previous researchers have used chain size distributions to infer R0, but estimation of the degree of individual-level variation in infectiousness (as quantified by the dispersion parameter, k) has typically required contact tracing data. Utilizing branching process theory along with a negative binomial offspring distribution, we demonstrate how maximum likelihood estimation can be applied to chain size data to infer both R0 and the dispersion parameter that characterizes heterogeneity. While the maximum likelihood value for R0 is a simple function of the average chain size, the associated confidence intervals are dependent on the inferred degree of transmission heterogeneity. As demonstrated for monkeypox data from the Democratic Republic of Congo, this impacts when a statistically significant change in R0 is detectable. In addition, by allowing for superspreading events, inference of k shifts the threshold above which a transmission chain should be considered anomalously large for a given value of R0 (thus reducing the probability of false alarms about pathogen adaptation). Our analysis of monkeypox also clarifies the various ways that imperfect observation can impact inference of transmission parameters, and highlights the need to quantitatively evaluate whether observation is likely to significantly bias results. PMID:23658504
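The chain size distribution implied by a subcritical branching process with negative binomial offspring can be sketched directly. Here `r0` denotes the mean number of secondary cases and `k` the dispersion parameter; the formula follows from the standard total-progeny result for branching processes, and the code is an illustration with hypothetical parameter values rather than the authors' implementation.

```python
import math

def chain_size_pmf(j, r0, k):
    """P(chain size = j) with NB offspring of mean r0 and dispersion k (r0 < 1)."""
    log_p = (math.lgamma(k * j + j - 1) - math.lgamma(k * j) - math.lgamma(j + 1)
             + (j - 1) * math.log(r0 / k)
             - (k * j + j - 1) * math.log(1 + r0 / k))
    return math.exp(log_p)

def r0_from_mean_chain_size(chain_sizes):
    """Estimate r0 from the average chain size, using mean size = 1 / (1 - r0)."""
    avg = sum(chain_sizes) / len(chain_sizes)
    return 1.0 - 1.0 / avg

r0, k = 0.5, 1.0  # hypothetical values
pmf = [chain_size_pmf(j, r0, k) for j in range(1, 501)]
mean_size = sum(j * p for j, p in enumerate(pmf, start=1))
estimate = r0_from_mean_chain_size([1, 1, 2, 4])  # hypothetical chain sizes
```

The estimate of `r0` depends only on the average chain size, whereas `k` changes the shape of the distribution: a smaller `k` puts more probability on both single-case chains and unusually large ones.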
Ecology of nonnative Siberian prawn (Palaemon modestus) in the lower Snake River, Washington, USA

USGS Publications Warehouse

Erhardt, John M.; Tiffan, Kenneth F.

2016-01-01

We assessed the abundance, distribution, and ecology of the nonnative Siberian prawn Palaemon modestus in the lower Snake River, Washington, USA. Analysis of prawn passage abundance at three Snake River dams showed that populations are growing at exponential rates, especially at Little Goose Dam where over 464,000 prawns were collected in 2015. Monthly beam trawling during 2011–2013 provided information on prawn abundance and distribution in Lower Granite and Little Goose Reservoirs. Zero-inflated regression predicted that the probability of prawn presence increased with decreasing water velocity and increasing depth. Negative binomial models predicted higher catch rates of prawns in deeper water and in closer proximity to dams. Temporally, prawn densities decreased slightly in the summer, likely due to the mortality of older individuals, and then increased in autumn and winter with the emergence and recruitment of young of the year. Seasonal length frequencies showed that distinct juvenile and adult size classes exist throughout the year, suggesting prawns live from 1 to 2 years and may be able to reproduce multiple times during their life. Most juvenile prawns become reproductive adults in 1 year, and peak reproduction occurs from late July through October. Mean fecundity (189 eggs) and reproductive output (11.9 %) are similar to that in their native range. The current use of deep habitats by prawns likely makes them unavailable to most predators in the reservoirs. The distribution and role of Siberian prawns in the lower Snake River food web will probably continue to change as the population grows and warrants continued monitoring and investigation.
Introducing Perception and Modelling of Spatial Randomness in Classroom

ERIC Educational Resources Information Center

De Nóbrega, José Renato

2017-01-01

A strategy to facilitate understanding of spatial randomness is described, using student activities developed in sequence: looking at spatial patterns, simulating approximate spatial randomness using a grid of equally-likely squares, using binomial probabilities for approximations and predictions and then comparing with given Poisson…
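The comparison of binomial probabilities with Poisson values mentioned in this abstract can be illustrated with a short Python sketch. It assumes n points dropped independently onto a grid of equally likely squares, so that the count in any one square is Binomial(n, 1/squares) and is approximated by a Poisson distribution with mean n/squares. The function names are hypothetical and the classroom activity itself may be organized differently.

```python
import math


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


def poisson_pmf(x, lam):
    """Probability of exactly x events for a Poisson mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


def compare(n_points, n_squares, max_count=5):
    """Rows of (count, binomial probability, Poisson probability)."""
    p = 1 / n_squares
    lam = n_points / n_squares
    return [(x, binomial_pmf(x, n_points, p), poisson_pmf(x, lam))
            for x in range(max_count + 1)]


if __name__ == "__main__":
    for count, b, q in compare(100, 100):
        print(f"{count}: binomial={b:.4f} poisson={q:.4f}")
```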
Extending the Binomial Checkpointing Technique for Resilience

DOE Office of Scientific and Technical Information (OSTI.GOV)

Walther, Andrea; Narayanan, Sri Hari Krishna

In terms of computing time, adjoint methods offer a very attractive alternative to compute gradient information, required, e.g., for optimization purposes. However, together with this very favorable temporal complexity result comes a memory requirement that is in essence proportional with the operation count of the underlying function, e.g., if algorithmic differentiation is used to provide the adjoints. For this reason, checkpointing approaches in many variants have become popular. This paper analyzes an extension of the so-called binomial approach to cover also possible failures of the computing systems. Such a measure of precaution is of special interest for massive parallel simulations and adjoint calculations where the mean time between failure of the large scale computing system is smaller than the time needed to complete the calculation of the adjoint information. We describe the extensions of standard checkpointing approaches required for such resilience, provide a corresponding implementation and discuss numerical results.
Multifractal Cross Wavelet Analysis

NASA Astrophysics Data System (ADS)

Jiang, Zhi-Qiang; Gao, Xing-Lu; Zhou, Wei-Xing; Stanley, H. Eugene

Complex systems are composed of mutually interacting components and the output values of these components usually exhibit long-range cross-correlations. Using wavelet analysis, we propose a method of characterizing the joint multifractal nature of these long-range cross correlations, a method we call multifractal cross wavelet analysis (MFXWT). We assess the performance of the MFXWT method by performing extensive numerical experiments on the dual binomial measures with multifractal cross correlations and the bivariate fractional Brownian motions (bFBMs) with monofractal cross correlations. For binomial multifractal measures, we find the empirical joint multifractality of MFXWT to be in approximate agreement with the theoretical formula. For bFBMs, MFXWT may provide spurious multifractality because of the wide spanning range of the multifractal spectrum. We also apply the MFXWT method to stock market indices, and in pairs of index returns and volatilities we find an intriguing joint multifractal behavior. The tests on surrogate series also reveal that the cross correlation behavior, particularly the cross correlation with zero lag, is the main origin of cross multifractality.
Ultrasound: a subexploited tool for sample preparation in metabolomics.

PubMed

Luque de Castro, M D; Delgado-Povedano, M M

2014-01-02

Metabolomics, one of the most recently emerged "omics", has taken advantage of ultrasound (US) to improve sample preparation (SP) steps. The metabolomics-US assisted SP step binomial has experienced a dissimilar development that has depended on the area (vegetal or animal) and the SP step. Thus, vegetal metabolomics and US assisted leaching have received the greatest attention (encompassing subdisciplines such as metallomics, xenometabolomics and, mainly, lipidomics), but liquid-liquid extraction and (bio)chemical reactions in metabolomics have also taken advantage of US energy. Clinical and animal samples have also benefited from US assisted SP in metabolomics studies, but to a lesser extent. The main effects of US have been shortening of the time required for the given step, and/or increase of its efficiency or availability for automation; nevertheless, attention paid to potential degradation caused by US has been scant or nil. Achievements and weak points of the metabolomics-US assisted SP step binomial are discussed and possible solutions to the present shortcomings are presented. Copyright © 2013 Elsevier B.V. All rights reserved.
Type I error probability spending for post-market drug and vaccine safety surveillance with binomial data.

PubMed

Silva, Ivair R

2018-01-15

Type I error probability spending functions are commonly used for designing sequential analysis of binomial data in clinical trials, and they are also quickly emerging for near-continuous sequential analysis of post-market drug and vaccine safety surveillance. It is well known that, for clinical trials, when the null hypothesis is not rejected, it is still important to minimize the sample size. In post-market drug and vaccine safety surveillance, by contrast, that is not important. In post-market safety surveillance, especially when the surveillance involves identification of potential signals, the meaningful statistical performance measure to be minimized is the expected sample size when the null hypothesis is rejected. The present paper shows that, instead of the convex Type I error spending shape conventionally used in clinical trials, a concave shape is more suitable for post-market drug and vaccine safety surveillance. This is shown for both continuous and group sequential analysis. Copyright © 2017 John Wiley & Sons, Ltd.
Environmental factors prevail over dispersal constraints in determining the distribution and assembly of Trichoptera species in mountain lakes.

PubMed

de Mendoza, Guillermo; Ventura, Marc; Catalan, Jordi

2015-07-01

Aiming to elucidate whether large-scale dispersal factors or environmental species sorting prevail in determining patterns of Trichoptera species composition in mountain lakes, we analyzed the distribution and assembly of the most common Trichoptera (Plectrocnemia laetabilis, Polycentropus flavomaculatus, Drusus rectus, Annitella pyrenaea, and Mystacides azurea) in the mountain lakes of the Pyrenees (Spain, France, Andorra) based on a survey of 82 lakes covering the geographical and environmental extremes of the lake district. Spatial autocorrelation in species composition was determined using Moran's eigenvector maps (MEM). Redundancy analysis (RDA) was applied to explore the influence of MEM variables and in-lake, and catchment environmental variables on Trichoptera assemblages. Variance partitioning analysis (partial RDA) revealed the fraction of species composition variation that could be attributed uniquely to either environmental variability or MEM variables. Finally, the distribution of individual species was analyzed in relation to specific environmental factors using binomial generalized linear models (GLM). Trichoptera assemblages showed spatial structure. However, the most relevant environmental variables in the RDA (i.e., temperature and woody vegetation in-lake catchments) were also related with spatial variables (i.e., altitude and longitude). Partial RDA revealed that the fraction of variation in species composition that was uniquely explained by environmental variability was larger than that uniquely explained by MEM variables. GLM results showed that the distribution of species with longitudinal bias is related to specific environmental factors with geographical trend. The environmental dependence found agrees with the particular traits of each species. We conclude that Trichoptera species distribution and composition in the lakes of the Pyrenees are governed predominantly by local environmental factors, rather than by dispersal constraints. 
For boreal lakes, with similar environmental conditions, a strong role of dispersal capacity has been suggested. Further investigation should address the role of spatial scaling, namely absolute geographical distances constraining dispersal and steepness of environmental gradients at short distances.
Environmental factors prevail over dispersal constraints in determining the distribution and assembly of Trichoptera species in mountain lakes

PubMed Central

de Mendoza, Guillermo; Ventura, Marc; Catalan, Jordi

2015-01-01

Aiming to elucidate whether large-scale dispersal factors or environmental species sorting prevail in determining patterns of Trichoptera species composition in mountain lakes, we analyzed the distribution and assembly of the most common Trichoptera (Plectrocnemia laetabilis, Polycentropus flavomaculatus, Drusus rectus, Annitella pyrenaea, and Mystacides azurea) in the mountain lakes of the Pyrenees (Spain, France, Andorra) based on a survey of 82 lakes covering the geographical and environmental extremes of the lake district. Spatial autocorrelation in species composition was determined using Moran’s eigenvector maps (MEM). Redundancy analysis (RDA) was applied to explore the influence of MEM variables and in-lake, and catchment environmental variables on Trichoptera assemblages. Variance partitioning analysis (partial RDA) revealed the fraction of species composition variation that could be attributed uniquely to either environmental variability or MEM variables. Finally, the distribution of individual species was analyzed in relation to specific environmental factors using binomial generalized linear models (GLM). Trichoptera assemblages showed spatial structure. However, the most relevant environmental variables in the RDA (i.e., temperature and woody vegetation in-lake catchments) were also related with spatial variables (i.e., altitude and longitude). Partial RDA revealed that the fraction of variation in species composition that was uniquely explained by environmental variability was larger than that uniquely explained by MEM variables. GLM results showed that the distribution of species with longitudinal bias is related to specific environmental factors with geographical trend. The environmental dependence found agrees with the particular traits of each species. We conclude that Trichoptera species distribution and composition in the lakes of the Pyrenees are governed predominantly by local environmental factors, rather than by dispersal constraints. 
For boreal lakes, with similar environmental conditions, a strong role of dispersal capacity has been suggested. Further investigation should address the role of spatial scaling, namely absolute geographical distances constraining dispersal and steepness of environmental gradients at short distances. PMID:26257867
Visibility graph analysis on quarterly macroeconomic series of China based on complex network theory

NASA Astrophysics Data System (ADS)

Wang, Na; Li, Dong; Wang, Qiwen

2012-12-01

The visibility graph approach and complex network theory provide a new insight into time series analysis. The inheritance of the visibility graph from the original time series was further explored in the paper. We found that degree distributions of visibility graphs extracted from Pseudo Brownian Motion series obtained by the Frequency Domain algorithm exhibit exponential behaviors, in which the exponential exponent is a binomial function of the Hurst index inherited in the time series. Our simulations presented that the quantitative relations between the Hurst indexes and the exponents of degree distribution function are different for different series and the visibility graph inherits some important features of the original time series. Further, we convert some quarterly macroeconomic series including the growth rates of value-added of three industry series and the growth rates of Gross Domestic Product series of China to graphs by the visibility algorithm and explore the topological properties of graphs associated from the four macroeconomic series, namely, the degree distribution and correlations, the clustering coefficient, the average path length, and community structure. Based on complex network analysis we find degree distributions of associated networks from the growth rates of value-added of three industry series are almost exponential and the degree distributions of associated networks from the growth rates of GDP series are scale free. We also discussed the assortativity and disassortativity of the four associated networks as they are related to the evolutionary process of the original macroeconomic series. All the constructed networks have “small-world” features. The community structures of associated networks suggest dynamic changes of the original macroeconomic series. We also detected the relationship among government policy changes, community structures of associated networks and macroeconomic dynamics. 
We find great influences of government policies in China on the changes of dynamics of GDP and the three industries adjustment. The work in our paper provides a new way to understand the dynamics of economic development.
Multilevel discretized random field models with 'spin' correlations for the simulation of environmental spatial data

NASA Astrophysics Data System (ADS)

Žukovič, Milan; Hristopulos, Dionissios T.

2009-02-01

A current problem of practical significance is how to analyze large, spatially distributed, environmental data sets. The problem is more challenging for variables that follow non-Gaussian distributions. We show by means of numerical simulations that the spatial correlations between variables can be captured by interactions between 'spins'. The spins represent multilevel discretizations of environmental variables with respect to a number of pre-defined thresholds. The spatial dependence between the 'spins' is imposed by means of short-range interactions. We present two approaches, inspired by the Ising and Potts models, that generate conditional simulations of spatially distributed variables from samples with missing data. Currently, the sampling and simulation points are assumed to be at the nodes of a regular grid. The conditional simulations of the 'spin system' are forced to respect locally the sample values and the system statistics globally. The second constraint is enforced by minimizing a cost function representing the deviation between normalized correlation energies of the simulated and the sample distributions. In the approach based on the Nc-state Potts model, each point is assigned to one of Nc classes. The interactions involve all the points simultaneously. In the Ising model approach, a sequential simulation scheme is used: the discretization at each simulation level is binomial (i.e., ± 1). Information propagates from lower to higher levels as the simulation proceeds. We compare the two approaches in terms of their ability to reproduce the target statistics (e.g., the histogram and the variogram of the sample distribution), to predict data at unsampled locations, as well as in terms of their computational complexity. The comparison is based on a non-Gaussian data set (derived from a digital elevation model of the Walker Lake area, Nevada, USA). 
We discuss the impact of relevant simulation parameters, such as the domain size, the number of discretization levels, and the initial conditions.
Distribution patterns of wintering sea ducks in relation to the North Atlantic Oscillation and local environmental characteristics

USGS Publications Warehouse

Zipkin, Elise F.; Gardner, Beth; Gilbert, Andrew T.; O'Connell, Allan F.; Royle, J. Andrew; Silverman, Emily D.

2010-01-01

Twelve species of North American sea ducks (Tribe Mergini) winter off the eastern coast of the United States and Canada. Yet, despite their seasonal proximity to urbanized areas in this region, there is limited information on patterns of wintering sea duck habitat use. It is difficult to gather information on sea ducks because of the relative inaccessibility of their offshore locations, their high degree of mobility, and their aggregated distributions. To characterize environmental conditions that affect wintering distributions, as well as their geographic ranges, we analyzed count data on five species of sea ducks (black scoters Melanitta nigra americana, surf scoters M. perspicillata, white-winged scoters M. fusca, common eiders Somateria mollissima, and long-tailed ducks Clangula hyemalis) that were collected during the Atlantic Flyway Sea Duck Survey for ten years starting in the early 1990s. We modeled count data for each species within ten-nautical-mile linear survey segments using a zero-inflated negative binomial model that included four local-scale habitat covariates (sea surface temperature, mean bottom depth, maximum bottom slope, and a variable to indicate if the segment was in a bay or not), one broad-scale covariate (the North Atlantic Oscillation), and a temporal correlation component. Our results indicate that species distributions have strong latitudinal gradients and consistency in local habitat use. The North Atlantic Oscillation was the only environmental covariate that had a significant (but variable) effect on the expected count for all five species, suggesting that broad-scale climatic conditions may be directly or indirectly important to the distributions of wintering sea ducks. 
Our results provide critical information on species-habitat associations, elucidate the complicated relationship between the North Atlantic Oscillation, sea surface temperature, and local sea duck abundances, and should be useful in assessing the impacts of climate change on seabirds.
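As a rough illustration of the zero-inflated negative binomial model named in this abstract, the Python sketch below writes out only its probability mass function. It assumes the common mean/size parameterization of the negative binomial and a single extra-zero probability. The covariates and the temporal correlation component used in the study are not represented, and all names are hypothetical.

```python
import math


def nb_pmf(y, mu, size):
    """Negative binomial probability of count y with mean mu and size (dispersion)."""
    log_p = (math.lgamma(y + size) - math.lgamma(size) - math.lgamma(y + 1)
             + size * math.log(size / (size + mu))
             + y * math.log(mu / (size + mu)))
    return math.exp(log_p)


def zinb_pmf(y, mu, size, pi_zero):
    """Zero-inflated negative binomial probability of count y.

    With probability pi_zero the count is a structural zero; otherwise
    it is drawn from the negative binomial distribution.
    """
    base = (1.0 - pi_zero) * nb_pmf(y, mu, size)
    return pi_zero + base if y == 0 else base
```

Under this parameterization the expected count is (1 - pi_zero) * mu, which is one way to see how excess zeros pull the mean below that of the underlying count process.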
Distribution patterns of wintering sea ducks in relation to the North Atlantic Oscillation and local environmental characteristics.

PubMed

Zipkin, Elise F; Gardner, Beth; Gilbert, Andrew T; O'Connell, Allan F; Royle, J Andrew; Silverman, Emily D

2010-08-01

Twelve species of North American sea ducks (Tribe Mergini) winter off the eastern coast of the United States and Canada. Yet, despite their seasonal proximity to urbanized areas in this region, there is limited information on patterns of wintering sea duck habitat use. It is difficult to gather information on sea ducks because of the relative inaccessibility of their offshore locations, their high degree of mobility, and their aggregated distributions. To characterize environmental conditions that affect wintering distributions, as well as their geographic ranges, we analyzed count data on five species of sea ducks (black scoters Melanitta nigra americana, surf scoters M. perspicillata, white-winged scoters M. fusca, common eiders Somateria mollissima, and long-tailed ducks Clangula hyemalis) that were collected during the Atlantic Flyway Sea Duck Survey for ten years starting in the early 1990s. We modeled count data for each species within ten-nautical-mile linear survey segments using a zero-inflated negative binomial model that included four local-scale habitat covariates (sea surface temperature, mean bottom depth, maximum bottom slope, and a variable to indicate if the segment was in a bay or not), one broad-scale covariate (the North Atlantic Oscillation), and a temporal correlation component. Our results indicate that species distributions have strong latitudinal gradients and consistency in local habitat use. The North Atlantic Oscillation was the only environmental covariate that had a significant (but variable) effect on the expected count for all five species, suggesting that broad-scale climatic conditions may be directly or indirectly important to the distributions of wintering sea ducks. 
Our results provide critical information on species-habitat associations, elucidate the complicated relationship between the North Atlantic Oscillation, sea surface temperature, and local sea duck abundances, and should be useful in assessing the impacts of climate change on seabirds.
Acceptance sampling for attributes via hypothesis testing and the hypergeometric distribution

NASA Astrophysics Data System (ADS)

Samohyl, Robert Wayne

2017-10-01

This paper questions some aspects of attribute acceptance sampling in light of the original concepts of hypothesis testing from Neyman and Pearson (NP). Attribute acceptance sampling in industry, as developed by Dodge and Romig (DR), generally follows the international standards of ISO 2859, and similarly the Brazilian standards NBR 5425 to NBR 5427 and the United States Standards ANSI/ASQC Z1.4. The paper evaluates and extends the area of acceptance sampling in two directions. First, by suggesting the use of the hypergeometric distribution to calculate the parameters of sampling plans avoiding the unnecessary use of approximations such as the binomial or Poisson distributions. We show that, under usual conditions, discrepancies can be large. The conclusion is that the hypergeometric distribution, ubiquitously available in commonly used software, is more appropriate than other distributions for acceptance sampling. Second, and more importantly, we elaborate the theory of acceptance sampling in terms of hypothesis testing rigorously following the original concepts of NP. By offering a common theoretical structure, hypothesis testing from NP can produce a better understanding of applications even beyond the usual areas of industry and commerce such as public health and political polling. With the new procedures, both sample size and sample error can be reduced. What is unclear in traditional acceptance sampling is the necessity of linking the acceptable quality limit (AQL) exclusively to the producer and the lot quality percent defective (LTPD) exclusively to the consumer. In reality, the consumer should also be preoccupied with a value of AQL, as should the producer with LTPD. Furthermore, we can also question why type I error is always uniquely associated with the producer as producer risk, and likewise, the same question arises with consumer risk which is necessarily associated with type II error. The resolution of these questions is new to the literature. 
The article presents R code throughout.
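The gap between the exact hypergeometric calculation and its binomial approximation can be seen in a small Python sketch (the article itself uses R; this is an independent illustration, not a translation of its code). It assumes a single sampling plan in which a lot is accepted when the sample contains at most `accept_number` defectives. The function names and the example lot are hypothetical.

```python
import math


def accept_prob_hypergeom(lot_size, defectives, sample_size, accept_number):
    """Exact acceptance probability when sampling without replacement."""
    total = math.comb(lot_size, sample_size)
    upper = min(accept_number, defectives, sample_size)
    favourable = sum(
        math.comb(defectives, d) * math.comb(lot_size - defectives, sample_size - d)
        for d in range(upper + 1)
    )
    return favourable / total


def accept_prob_binomial(lot_size, defectives, sample_size, accept_number):
    """Binomial approximation, treating draws as independent."""
    p = defectives / lot_size
    upper = min(accept_number, sample_size)
    return sum(
        math.comb(sample_size, d) * p ** d * (1 - p) ** (sample_size - d)
        for d in range(upper + 1)
    )


if __name__ == "__main__":
    # Hypothetical plan: lot of 100 with 5 defectives, sample 20, accept on 0.
    print(accept_prob_hypergeom(100, 5, 20, 0))
    print(accept_prob_binomial(100, 5, 20, 0))
```

In this example the sample is a fifth of the lot, which is the kind of situation where ignoring the finite lot size changes the acceptance probability noticeably.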
Sample size re-assessment leading to a raised sample size does not inflate type I error rate under mild conditions.

PubMed

Broberg, Per

2013-07-19

One major concern with adaptive designs, such as the sample size adjustable designs, has been the fear of inflating the type I error rate. In (Stat Med 23:1023-1038, 2004) it is however proven that when observations follow a normal distribution and the interim result shows promise, meaning that the conditional power exceeds 50%, type I error rate is protected. This bound and the distributional assumptions may seem to impose undesirable restrictions on the use of these designs. In (Stat Med 30:3267-3284, 2011) the possibility of going below 50% is explored and a region that permits an increased sample size without inflation is defined in terms of the conditional power at the interim. A criterion which is implicit in (Stat Med 30:3267-3284, 2011) is derived by elementary methods and expressed in terms of the test statistic at the interim to simplify practical use. Mathematical and computational details concerning this criterion are exhibited. Under very general conditions the type I error rate is preserved under sample size adjustable schemes that permit a raise. The main result states that for normally distributed observations raising the sample size when the result looks promising, where the definition of promising depends on the amount of knowledge gathered so far, guarantees the protection of the type I error rate. Also, in the many situations where the test statistic approximately follows a normal law, the deviation from the main result remains negligible. This article provides details regarding the Weibull and binomial distributions and indicates how one may approach these distributions within the current setting. There is thus reason to consider such designs more often, since they offer a means of adjusting an important design feature at little or no cost in terms of error rate.
Time-to-event continual reassessment method incorporating treatment cycle information with application to an oncology phase I trial.

PubMed

Huang, Bo; Kuan, Pei Fen

2014-11-01

Delayed dose limiting toxicities (i.e. beyond first cycle of treatment) are a challenge for phase I trials. The time-to-event continual reassessment method (TITE-CRM) is a Bayesian dose-finding design to address the issue of long observation time and early patient drop-out. It uses a weighted binomial likelihood with weights assigned to observations by the unknown time-to-toxicity distribution, and is open to accrual continually. To avoid dosing at overly toxic levels while retaining accuracy and efficiency for DLT evaluation that involves multiple cycles, we propose an adaptive weight function by incorporating cyclical data of the experimental treatment with parameters updated continually. This provides a reasonable estimate for the time-to-toxicity distribution by accounting for inter-cycle variability and maintains the statistical properties of consistency and coherence. A case study of a First-in-Human trial in cancer for an experimental biologic is presented using the proposed design. Design calibrations for the clinical and statistical parameters are conducted to ensure good operating characteristics. Simulation results show that the proposed TITE-CRM design with adaptive weight function yields significantly shorter trial duration, does not expose patients to additional risk, is competitive against the existing weighting methods, and possesses some desirable properties. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
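The weighted binomial likelihood at the core of TITE-CRM can be sketched as follows in Python. This is an illustration under stated assumptions, not the adaptive, cycle-based weight function proposed in the abstract. It uses the simple linear weight (follow-up time divided by the observation window), a one-parameter power dose-toxicity model, and hypothetical function and argument names.

```python
import math


def tox_prob(skeleton_p, beta):
    """Power model: prior toxicity guess raised to exp(beta)."""
    return skeleton_p ** math.exp(beta)


def tite_log_likelihood(beta, patients, window):
    """Weighted binomial log-likelihood for a TITE-CRM style design.

    patients: iterable of (skeleton_p, followup_time, had_dlt) tuples.
    Patients with a DLT contribute log F; patients without one contribute
    log(1 - w * F), where w is the fraction of the window observed so far.
    """
    total = 0.0
    for skeleton_p, followup, had_dlt in patients:
        f = tox_prob(skeleton_p, beta)
        if had_dlt:
            total += math.log(f)
        else:
            w = min(followup / window, 1.0)  # linear weight, capped at 1
            total += math.log(1.0 - w * f)
    return total
```

When every patient has completed the full window, all weights equal 1 and the expression reduces to the ordinary binomial log-likelihood. A partially followed patient without toxicity counts as only a fraction of a non-toxic observation.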
Estimating the effectiveness of further sampling in species inventories

USGS Publications Warehouse

Keating, K.A.; Quinn, J.F.; Ivie, M.A.; Ivie, L.L.

1998-01-01

Estimators of the number of additional species expected in the next Δn samples offer a potentially important tool for improving cost-effectiveness of species inventories but are largely untested. We used Monte Carlo methods to compare 11 such estimators, across a range of community structures and sampling regimes, and validated our results, where possible, using empirical data from vascular plant and beetle inventories from Glacier National Park, Montana, USA. We found that B. Efron and R. Thisted's 1976 negative binomial estimator was most robust to differences in community structure and that it was among the most accurate estimators when sampling was from model communities with structures resembling the large, heterogeneous communities that are the likely targets of major inventory efforts. Other estimators may be preferred under specific conditions, however. For example, when sampling was from model communities with highly even species-abundance distributions, estimates based on the Michaelis-Menten model were most accurate; when sampling was from moderately even model communities with S=10 species or communities with highly uneven species-abundance distributions, estimates based on Gleason's (1922) species-area model were most accurate. We suggest that use of such methods in species inventories can help improve cost-effectiveness by providing an objective basis for redirecting sampling to more-productive sites, methods, or time periods as the expectation of detecting additional species becomes unacceptably low.
Access to Recreational Physical Activities by Car and Bus: An Assessment of Socio-Spatial Inequalities in Mainland Scotland

PubMed Central

Ferguson, Neil S.; Lamb, Karen E.; Wang, Yang; Ogilvie, David; Ellaway, Anne

2013-01-01

Obesity and other chronic conditions linked with low levels of physical activity (PA) are associated with deprivation. One reason for this could be that it is more difficult for low-income groups to access recreational PA facilities such as swimming pools and sports centres than high-income groups. In this paper, we explore the distribution of access to PA facilities by car and bus across mainland Scotland by income deprivation at datazone level. GIS car and bus networks were created to determine the number of PA facilities accessible within travel times of 10, 20 and 30 minutes. Multilevel negative binomial regression models were then used to investigate the distribution of the number of accessible facilities, adjusting for datazone population size and local authority. Access to PA facilities by car was significantly (p<0.01) higher for the most affluent quintile of area-based income deprivation than for most other quintiles in small towns and all other quintiles in rural areas. Accessibility by bus was significantly lower for the most affluent quintile than for other quintiles in urban areas and small towns, but not in rural areas. Overall, we found that the most disadvantaged groups were those without access to a car and living in the most affluent areas or in rural areas. PMID:23409012
Distribution, occupancy, and habitat correlates of American martens (Martes americana) in Rocky Mountain National Park, Colorado

USGS Publications Warehouse

Baldwin, R.A.; Bender, L.C.

2008-01-01

A clear understanding of habitat associations of martens (Martes americana) is necessary to effectively manage and monitor populations. However, this information was lacking for martens in most of their southern range, particularly during the summer season. We studied the distribution and habitat correlates of martens from 2004 to 2006 in Rocky Mountain National Park (RMNP) across 3 spatial scales: site-specific, home-range, and landscape. We used remote-sensored cameras from early August through late October to inventory occurrence of martens and modeled occurrence as a function of habitat and landscape variables using binary response (BR) and binomial count (BC) logistic regression, and occupancy modeling (OM). We also assessed which was the most appropriate modeling technique for martens in RMNP. Of the 3 modeling techniques, OM appeared to be most appropriate given the explanatory power of derived models and its incorporation of detection probabilities, although the results from BR and BC provided corroborating evidence of important habitat correlates. Location of sites in the western portion of the park, riparian mixed-conifer stands, and mixed-conifer with aspen patches were most frequently positively correlated with occurrence of martens, whereas more xeric and open sites were avoided. Additionally, OM yielded unbiased occupancy values ranging from 91% to 100% and 20% to 30% for the western and eastern portions of RMNP, respectively. © 2008 American Society of Mammalogists.
Evolution of a Modified Binomial Random Graph by Agglomeration

NASA Astrophysics Data System (ADS)

Kang, Mihyun; Pachon, Angelica; Rodríguez, Pablo M.

2018-02-01

In the classical Erdős-Rényi random graph G(n, p) there are n vertices and each of the possible edges is independently present with probability p. The random graph G(n, p) is homogeneous in the sense that all vertices have the same characteristics. On the other hand, numerous real-world networks are inhomogeneous in this respect. Such an inhomogeneity of vertices may influence the connection probability between pairs of vertices. The purpose of this paper is to propose a new inhomogeneous random graph model which is obtained in a constructive way from the Erdős-Rényi random graph G(n, p). Given a configuration of n vertices arranged in N subsets of vertices (we call each subset a super-vertex), we define a random graph with N super-vertices by letting two super-vertices be connected if and only if there is at least one edge between them in G(n, p). Our main result concerns the threshold for connectedness. We also analyze the phase transition for the emergence of the giant component and the degree distribution. Even though our model begins with G(n, p), it assumes the existence of some community structure encoded in the configuration. Furthermore, under certain conditions it exhibits a power law degree distribution. Both properties are important for real-world applications.
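The construction described in this abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the example configuration `blocks`, and the value of p are hypothetical and do not come from the paper. Because edges of G(n, p) are independent, two super-vertices of sizes a and b are connected with probability 1 - (1 - p)^(ab), which `link_probability` computes.

```python
import random
from itertools import combinations

def super_vertex_graph(blocks, p, rng):
    """Sample G(n, p) on the union of `blocks` and return the set of
    connected super-vertex pairs (i, j) with i < j."""
    # Map each vertex to the index of the super-vertex that contains it.
    owner = {v: i for i, block in enumerate(blocks) for v in block}
    super_edges = set()
    for u, v in combinations(sorted(owner), 2):
        if rng.random() < p:          # edge {u, v} is present in G(n, p)
            i, j = owner[u], owner[v]
            if i != j:                # edges inside one super-vertex are ignored
                super_edges.add((min(i, j), max(i, j)))
    return super_edges

def link_probability(size_a, size_b, p):
    """Probability that two super-vertices share at least one G(n, p) edge."""
    return 1.0 - (1.0 - p) ** (size_a * size_b)

# Hypothetical configuration: n = 9 vertices arranged in N = 3 super-vertices.
blocks = [[0, 1, 2], [3, 4], [5, 6, 7, 8]]
edges = super_vertex_graph(blocks, p=0.2, rng=random.Random(0))
```

Larger super-vertices are more likely to be linked, which is one way the configuration makes the resulting graph inhomogeneous.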
The epipelagic fish community of Beaufort Sea coastal waters, Alaska

USGS Publications Warehouse

Jarvela, L.E.; Thorsteinson, L.K.

1999-01-01

A three-year study of epipelagic fishes inhabiting Beaufort Sea coastal waters in Alaska documented spatial and temporal patterns in fish distribution and abundance and examined their relationships to thermohaline features during summer. Significant interannual, seasonal, and geographical differences in surface water temperatures and salinities were observed. In 1990, sea ice was absent and marine conditions prevailed, whereas in 1988 and 1991, heavy pack ice was present and the dissolution of the brackish water mass along the coast proceeded more slowly. Arctic cod, capelin, and liparids were the most abundant marine fishes in the catches, while arctic cisco was the only abundant diadromous freshwater species. Age-0 arctic cod were exceptionally abundant and large in 1990, while age-0 capelin dominated in the other years. The alternating numerical dominances of arctic cod and age-0 capelin may represent differing species' responses to wind-driven oceanographic processes affecting growth and survival. The only captures of age-0 arctic cisco occurred during 1990. Catch patterns indicate they use a broad coastal migratory corridor and tolerate high salinities. As in the oceanographic data, geographical and temporal patterns were apparent in the fish catch data, but in most cases these patterns were not statistically testable because of excessive zero catches. The negative binomial distribution appeared to be a suitable statistical descriptor of the aggregated catch patterns for the more common species.
Sampling Outdoor, Resting Anopheles gambiae and Other Mosquitoes (Diptera: Culicidae) in Western Kenya with Clay Pots

PubMed Central

Odiere, M.; Bayoh, M. N.; Gimnig, J.; Vulule, J.; Irungu, L.; Walker, E.

2014-01-01

Clay pots were analyzed as devices for sampling the outdoor resting fraction of Anopheles gambiae Giles (Diptera: Culicidae) and other mosquito species in rural western Kenya. Clay pots (Anopheles gambiae resting pots, herein AgREPOTs), outdoor pit shelters, indoor pyrethrum spray collections (PSC), and Colombian curtain exit traps were compared in collections done biweekly for nine intervals from April to June 2005 in 20 housing compounds. Of 10,517 mosquitoes sampled, 4,668 were An. gambiae s.l., of which 63% were An. gambiae s.s. (46% female) and 37% were An. arabiensis (66% female). The clay pots were useful and practical for sampling both sexes of An. gambiae s.l. Additionally, 617 An. funestus (58% female) and 5,232 Culex spp. (males and females together) were collected. Temporal changes in abundance of An. gambiae s.l. were similarly revealed by all four sampling methods, indicating that the clay pots could be used as devices to quantify variation in mosquito population density. Dispersion patterns of the different species and sexes fit the negative binomial distribution well, indicating that the mosquitoes were aggregated in distribution. Aside from providing a useful sampling tool, the AgREPOT also may be useful as a delivery vehicle for insecticides or pathogens to males and females that enter and rest in them. PMID:17294916
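
The link between a negative binomial fit and aggregation can be illustrated with a moment estimate of the dispersion parameter k. Under a negative binomial with mean m and dispersion k the variance is m + m^2/k, so k can be estimated as m^2 / (s^2 - m); small k indicates strong aggregation. The sketch below is not the authors' procedure, and the per-pot counts are hypothetical values made up for illustration.

```python
from statistics import mean, variance

def nb_moment_k(counts):
    """Method-of-moments estimate of the negative binomial dispersion k.

    Returns None when the sample variance does not exceed the mean,
    i.e. when the counts show no overdispersion relative to Poisson.
    """
    m, v = mean(counts), variance(counts)
    if v <= m:
        return None
    return m * m / (v - m)

# Hypothetical mosquito counts per clay pot (not data from the study).
pot_counts = [0, 0, 0, 1, 1, 2, 5, 11]
k_hat = nb_moment_k(pot_counts)
```

For these made-up counts the variance is far larger than the mean, so the estimated k is below 1, the pattern usually read as aggregation.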

A Binomial Test of Group Differences with Correlated Outcome Measures

ERIC Educational Resources Information Center

Onwuegbuzie, Anthony J.; Levin, Joel R.; Ferron, John M.

2011-01-01

Building on previous arguments for why educational researchers should not provide effect-size estimates in the face of statistically nonsignificant outcomes (Robinson & Levin, 1997), Onwuegbuzie and Levin (2005) proposed a 3-step statistical approach for assessing group differences when multiple outcome measures are individually analyzed…
Comparing Environmental Influences on Coral Bleaching Across and within Species using Clustered Binomial Regression

EPA Science Inventory

Differential susceptibility among reef-building coral species can lead to community shifts and loss of diversity as a result of temperature-induced mass bleaching events. However, the influence of the local environment on species-specific bleaching susceptibilities has not been ...
Sequence Factorial and Its Applications

ERIC Educational Resources Information Center

Asiru, Muniru A.

2012-01-01

In this note, we introduce sequence factorial and use this to study generalized M-bonomial coefficients. For the sequence of natural numbers, the twin concepts of sequence factorial and generalized M-bonomial coefficients, respectively, extend the corresponding concepts of factorial of an integer and binomial coefficients. Some latent properties…
A time series model: First-order integer-valued autoregressive (INAR(1))

NASA Astrophysics Data System (ADS)

Simarmata, D. M.; Novkaniza, F.; Widyaningsih, Y.

2017-07-01

Nonnegative integer-valued time series arise in many applications. The first-order integer-valued autoregressive model, INAR(1), is constructed with the binomial thinning operator to model such series. INAR(1) depends on the value of the process one period earlier. The parameters of the model can be estimated by Conditional Least Squares (CLS). The specification of INAR(1) follows that of AR(1). Forecasting in INAR(1) uses the median or Bayesian forecasting methodology. Median forecasting finds the integer s at which the cumulative distribution function (CDF) up to s is greater than or equal to 0.5. Bayesian forecasting produces h-step-ahead forecasts by generating the model parameter and the parameter of the innovation term using Adaptive Rejection Metropolis Sampling within Gibbs sampling (ARMS), then finding the least integer s at which the CDF up to s is greater than or equal to u, where u is a value drawn from the Uniform(0,1) distribution. INAR(1) is applied to monthly pneumonia cases in Penjaringan, Jakarta Utara, from January 2008 to April 2016.
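A minimal Python sketch of the pieces named in this abstract follows: binomial thinning, the INAR(1) recursion X_t = alpha ∘ X_{t-1} + e_t, CLS estimation, and the one-step median forecast. It rests on assumptions not stated in the abstract: the innovations e_t are taken to be Poisson(lam), the function names are hypothetical, and the Bayesian (ARMS) forecast is not shown.

```python
import math
import random

def thin(x, alpha, rng):
    """Binomial thinning: each of the x counts survives with probability alpha."""
    return sum(rng.random() < alpha for _ in range(x))

def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_inar1(alpha, lam, n, rng, x0=0):
    """Simulate n values of X_t = thin(X_{t-1}) + Poisson(lam) innovation."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(thin(xs[-1], alpha, rng) + poisson(lam, rng))
    return xs

def cls_estimates(xs):
    """Conditional least squares: regress X_t on X_{t-1}.
    The slope estimates alpha, the intercept estimates the innovation mean."""
    prev, curr = xs[:-1], xs[1:]
    mp, mc = sum(prev) / len(prev), sum(curr) / len(curr)
    sxy = sum((a - mp) * (b - mc) for a, b in zip(prev, curr))
    sxx = sum((a - mp) ** 2 for a in prev)
    alpha_hat = sxy / sxx
    return alpha_hat, mc - alpha_hat * mp

def median_forecast(x_last, alpha, lam):
    """Least integer s with P(X_{t+1} <= s | X_t = x_last) >= 0.5."""
    def pmf(s):
        # Convolution of Binomial(x_last, alpha) survivors and Poisson(lam) arrivals.
        return sum(
            math.comb(x_last, j) * alpha ** j * (1 - alpha) ** (x_last - j)
            * math.exp(-lam) * lam ** (s - j) / math.factorial(s - j)
            for j in range(min(s, x_last) + 1)
        )
    s, cdf = 0, pmf(0)
    while cdf < 0.5:
        s += 1
        cdf += pmf(s)
    return s

series = simulate_inar1(alpha=0.5, lam=2.0, n=5000, rng=random.Random(1))
alpha_hat, lam_hat = cls_estimates(series)
next_median = median_forecast(series[-1], alpha_hat, lam_hat)
```

The median forecast is always an integer, which is the reason it is preferred over the conditional mean for count series.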
Meckel syndrome in different populations.

PubMed

Lurie, I W; Prytkov, A N; Meldere, L V

1984-08-01

We report on 18 infants from 13 families where the infant was affected with the Meckel syndrome. The parents belong to various national groups--Russians, Byelorussians, Poles, Ukrainians, Letts, and Tatars. One child was from an incestuous union (half-sister and half-brother), in 4 families the parents were natives of the same or neighboring villages; other parents apparently were not related. Excluding 3 couples from Central Russia, the Ukraine, and Tatary, the other 10 families were the inhabitants of the Moscow region, Byelorussia, and Latvia. In 3 of these families at least one grandparent was of Tatar descent. At the same time the frequency of Tatars in these regions is less than 1%. Using the Newton binomial distribution it was shown that the hypothesis about equal frequency of the Meckel syndrome gene among Tatars and other national groups under study may be excluded completely, and therefore the alternative hypothesis about an unusually high frequency of this gene among Tatars must be accepted. Such analysis may be useful for comparative evaluation of gene frequencies in populations which cannot be studied directly.
A Joint Model for the Kinetics of CTC Count and PSA Concentration During Treatment in Metastatic Castration-Resistant Prostate Cancer

PubMed Central

Wilbaux, M; Tod, M; De Bono, J; Lorente, D; Mateo, J; Freyer, G; You, B; Hénin, E

2015-01-01

Assessment of treatment efficacy in metastatic castration-resistant prostate cancer (mCRPC) is limited by frequent nonmeasurable bone metastases. The count of circulating tumor cells (CTCs) is a promising surrogate marker that may replace the widely used prostate-specific antigen (PSA). The purpose of this study was to quantify the dynamic relationships between the longitudinal kinetics of these markers during treatment in patients with mCRPC. Data from 223 patients with mCRPC treated by chemotherapy and/or hormonotherapy were analyzed for up to 6 months of treatment. A semimechanistic model was built, combining the following several pharmacometric advanced features: (1) Kinetic-Pharmacodynamic (K-PD) compartments for treatments (chemotherapy and hormonotherapy); (2) a latent variable linking both marker kinetics; (3) modeling of CTC kinetics with a cell lifespan model; and (4) a negative binomial distribution for the CTC random sampling. Linked with survival, this model would potentially be useful for predicting treatment efficacy during drug development or for therapeutic adjustment in treated patients. PMID:26225253
Dynamic transcriptional symmetry-breaking in pre-implantation mammalian embryo development revealed by single-cell RNA-seq.

PubMed

Shi, Junchao; Chen, Qi; Li, Xin; Zheng, Xiudeng; Zhang, Ying; Qiao, Jie; Tang, Fuchou; Tao, Yi; Zhou, Qi; Duan, Enkui

2015-10-15

During mammalian pre-implantation embryo development, when the first asymmetry emerges and how it develops to direct distinct cell fates remain longstanding questions. Here, by analyzing single-blastomere transcriptome data from mouse and human pre-implantation embryos, we revealed that the initial blastomere-to-blastomere biases emerge as early as the first embryonic cleavage division, following a binomial distribution pattern. The subsequent zygotic transcriptional activation further elevated overall blastomere-to-blastomere biases during the two- to 16-cell embryo stages. The trends of transcriptional asymmetry fell into two distinct patterns: for some genes, the extent of asymmetry was minimized between blastomeres (monostable pattern), whereas other genes, including those known to be lineage specifiers, showed ever-increasing asymmetry between blastomeres (bistable pattern), supposedly controlled by negative or positive feedbacks. Moreover, our analysis supports a scenario in which opposing lineage specifiers within an early blastomere constantly compete with each other based on their relative ratio, forming an inclined 'lineage strength' that pushes the blastomere onto a predisposed, yet flexible, lineage track before morphological distinction. © 2015. Published by The Company of Biologists Ltd.
Small area estimation for estimating the number of infant mortality in West Java, Indonesia

NASA Astrophysics Data System (ADS)

Anggreyani, Arie; Indahwati; Kurnia, Anang

2016-02-01

Demographic and Health Survey Indonesia (DHSI) is a nationally designed survey to provide information regarding birth rate, mortality rate, family planning and health. DHSI was conducted by BPS in cooperation with National Population and Family Planning Institution (BKKBN), Indonesia Ministry of Health (KEMENKES) and USAID. Based on the publication of DHSI 2012, the infant mortality rate for the five-year period before the survey was 32 per 1,000 live births. In this paper, Small Area Estimation (SAE) is used to estimate the number of infant mortality in districts of West Java. SAE is a special model of Generalized Linear Mixed Models (GLMM). In this case, the incidence of infant mortality follows a Poisson distribution, which carries an equidispersion assumption. The methods to handle overdispersion are the negative binomial and quasi-likelihood models. Based on the results of analysis, the quasi-likelihood model is the best model to overcome the overdispersion problem. The small area estimation used the basic area level model. Mean square error (MSE) based on a resampling method is used to measure the accuracy of small area estimates.
Determinants of pika population density vs. occupancy in the Southern Rocky Mountains.

PubMed

Erb, Liesl P; Ray, Chris; Guralnick, Robert

2014-04-01

Species distributions are responding rapidly to global change. While correlative studies of local extinction have been vital to understanding the ecological impacts of global change, more mechanistic lines of inquiry are needed for enhanced forecasting. The current study assesses whether the predictors of local extinction also explain population density for a species apparently impacted by climate change. We tested a suite of climatic and habitat metrics as predictors of American pika (Ochotona princeps) relative population density in the Southern Rocky Mountains, USA. Population density was indexed as the density of pika latrine sites. Negative binomial regression and AICc showed that the best predictors of pika latrine density were patch area followed by two measures of vegetation quality: the diversity and relative cover of forbs. In contrast with previous studies of habitat occupancy in the Southern Rockies, climatic factors were not among the top predictors of latrine density. Populations may be buffered from decline and ultimately from extirpation at sites with high-quality vegetation. Conversely, populations at highest risk for declining density and extirpation are likely to be those in sites with poor-quality vegetation.
Comparison of algorithms to generate event times conditional on time-dependent covariates.

PubMed

Sylvestre, Marie-Pierre; Abrahamowicz, Michal

2008-06-30

The Cox proportional hazards model with time-dependent covariates (TDC) is now a part of the standard statistical analysis toolbox in medical research. As new methods involving more complex modeling of time-dependent variables are developed, simulations could often be used to systematically assess the performance of these models. Yet, generating event times conditional on TDC requires well-designed and efficient algorithms. We compare two classes of such algorithms: permutational algorithms (PAs) and algorithms based on a binomial model. We also propose a modification of the PA to incorporate a rejection sampler. We performed a simulation study to assess the accuracy, stability, and speed of these algorithms in several scenarios. Both classes of algorithms generated data sets that, once analyzed, provided virtually unbiased estimates with comparable variances. In terms of computational efficiency, the PA with the rejection sampler reduced the time necessary to generate data by more than 50 per cent relative to alternative methods. The PAs also allowed more flexibility in the specification of the marginal distributions of event times and required less calibration.
Constituent quarks and systematic errors in mid-rapidity charged multiplicity dN ch/dη distributions

DOE PAGES

Tannenbaum, M. J.

2018-01-10

Centrality definition in A + A collisions at colliders such as RHIC and LHC suffers from a correlated systematic uncertainty caused by the efficiency of detecting a p + p collision (50 ± 5% for PHENIX at RHIC). In A + A collisions where centrality is measured by the number of nucleon collisions, N coll, or the number of nucleon participants, N part, or the number of constituent quark participants, N qp, the error in the efficiency of the primary interaction trigger (Beam–Beam Counters) for a p + p collision leads to a correlated systematic uncertainty in N part, N coll or N qp which reduces binomially as the A + A collisions become more central. If this is not correctly accounted for in projections of A + A to p + p collisions, then mistaken conclusions can result. Finally, a recent example is presented in whether the mid-rapidity charged multiplicity per constituent quark participant (dN ch/dη)/N qp in Au + Au at RHIC was the same as the value in p + p collisions.
The 'sparing phenomenon' of purpuric rash over tattooed skin.

PubMed

Pinal-Fernandez, Iago; Solans-Laqué, Roser

2014-01-01

Cutaneous complications associated with decorative tattooing are well known. However, the inhibition of a purpuric reaction by a tattoo is a fact that, as far as the authors know, has not been described before, fitting the definition of a 'sparing phenomenon', the absence of manifesting a particular skin disease in an area previously affected by another condition. From the clinical observation of purpuric lesions apparently inhibited by a tattoo in a 26-year-old patient, we performed an exact binomial test on the observed and expected proportion of purpuric lesions inside (0%, 95% confidence interval, CI, 0-2.6%) and outside (100%, 95% CI 97.4-100%) the tattooed skin, demonstrating a nonrandom distribution respecting the tattooed area (p < 0.001) and identifying the composition of the ink used in the tattoo (color pigment, glycerine, Hamamelis virginiana extract, water and alcohol). Moreover, we reviewed the cases of sparing phenomenon described in the literature. In conclusion this is the first report of a sparing phenomenon of purpuric lesions over tattooed skin. © 2013 S. Karger AG, Basel.
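The intervals quoted in this abstract (0-2.6% and 97.4-100%) have the shape of exact (Clopper-Pearson) binomial limits at the boundary, where they reduce to a closed form: with 0 successes in n trials the two-sided 95% interval is [0, 1 - 0.025^(1/n)], and with n successes in n trials it is [0.025^(1/n), 1]. The sketch below evaluates that closed form. The abstract does not report the number of lesions, so n = 140 is a hypothetical count used only for illustration, and the function name is likewise hypothetical.

```python
def clopper_pearson_zero(n, conf=0.95):
    """Two-sided exact confidence interval for a proportion
    when 0 of n trials are successes."""
    alpha = 1.0 - conf
    return 0.0, 1.0 - (alpha / 2.0) ** (1.0 / n)

# Hypothetical lesion count; the abstract does not state n.
lower, upper = clopper_pearson_zero(140)
# By symmetry, the interval for n of n successes is (1 - upper, 1.0).
```

The upper limit shrinks roughly like 3.7 / n, so an interval this narrow requires on the order of a hundred or more observed lesions.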
Constituent quarks and systematic errors in mid-rapidity charged multiplicity dN ch/dη distributions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tannenbaum, M. J.

Centrality definition in A + A collisions at colliders such as RHIC and LHC suffers from a correlated systematic uncertainty caused by the efficiency of detecting a p + p collision (50 ± 5% for PHENIX at RHIC). In A + A collisions where centrality is measured by the number of nucleon collisions, N coll, or the number of nucleon participants, N part, or the number of constituent quark participants, N qp, the error in the efficiency of the primary interaction trigger (Beam–Beam Counters) for a p + p collision leads to a correlated systematic uncertainty in N part, N coll or N qp which reduces binomially as the A + A collisions become more central. If this is not correctly accounted for in projections of A + A to p + p collisions, then mistaken conclusions can result. Finally, a recent example is presented in whether the mid-rapidity charged multiplicity per constituent quark participant (dN ch/dη)/N qp in Au + Au at RHIC was the same as the value in p + p collisions.
Critical Values for Lawshe's Content Validity Ratio: Revisiting the Original Methods of Calculation

ERIC Educational Resources Information Center

Ayre, Colin; Scally, Andrew John

2014-01-01

The content validity ratio originally proposed by Lawshe is widely used to quantify content validity and yet methods used to calculate the original critical values were never reported. Methods for original calculation of critical values are suggested along with tables of exact binomial probabilities.
School Violence: The Role of Parental and Community Involvement

ERIC Educational Resources Information Center

Lesneskie, Eric; Block, Steven

2017-01-01

This study utilizes the School Survey on Crime and Safety to identify variables that predict lower levels of violence from four domains: school security, school climate, parental involvement, and community involvement. Negative binomial regression was performed and the findings indicate that statistically significant results come from all four…
Predicting Children's Asthma Hospitalizations: Rural and Urban Differences in Texas

ERIC Educational Resources Information Center

Grineski, Sara E.

2009-01-01

Asthma is the number one chronic health condition facing children today; however, little is known about rural-urban inequalities in asthma. This "area effects on health" study examines rural-urban differences in childhood asthma hospitalizations within the state of Texas using negative binomial regression models. Effects associated with…
Comments Regarding the Binary Power Law for Heterogeneity of Disease Incidence

USDA-ARS's Scientific Manuscript database

The binary power law (BPL) has been successfully used to characterize heterogeneity (overdispersion or small-scale aggregation) of disease incidence for many plant pathosystems. With the BPL, the log of the observed variance is a linear function of the log of the theoretical variance for a binomial...
The Grammar of Fantasy.

ERIC Educational Resources Information Center

Rodari, Gianni

1998-01-01

Depicts how any word chosen by chance can function as a magical word to exhume fields of memory and excite imagination. Details several word games of invention for children (such as the "fantastic binomial," using creative errors, and "Little Red Riding Hood in a Helicopter") that juxtapose normally unrelated words and that can…
An Alternate Approach to Alternating Sums: A Method to DIE for

ERIC Educational Resources Information Center

Benjamin, Arthur T.; Quinn, Jennifer J.

2008-01-01

Positive sums count. Alternating sums match. Alternating sums of binomial coefficients, Fibonacci numbers, and other combinatorial quantities are analyzed using sign-reversing involutions. In particular, we describe the quantity being considered, match positive and negative terms through an Involution, and count the Exceptions to the matching rule…
Transportation safety data and analysis : Volume 2, Calibration of the highway safety manual and development of new safety performance functions.

DOT National Transportation Integrated Search

2011-03-01

This report documents the calibration of the Highway Safety Manual (HSM) safety performance function (SPF) for rural two-lane two-way roadway segments in Utah and the development of new models using negative binomial and hierarchical Bayesian mod...
