Sample records for random probability sample

  1. Methodology Series Module 5: Sampling Strategies.

    PubMed

    Setia, Maninder Singh

    2016-01-01

    Once the research question and the research design have been finalised, it is important to select the appropriate sample for the study. The method by which the researcher selects the sample is the 'Sampling Method'. There are essentially two types of sampling methods: 1) probability sampling - based on chance events (such as random numbers, flipping a coin, etc.); and 2) non-probability sampling - based on the researcher's choice of a population that is accessible and available. Some of the non-probability sampling methods are: purposive sampling, convenience sampling, or quota sampling. Random sampling methods (such as the simple random sample or the stratified random sample) are forms of probability sampling. It is important to understand the different sampling methods used in clinical studies and to state the method clearly in the manuscript. The researcher should not misrepresent the sampling method in the manuscript (such as using the term 'random sample' when the researcher has in fact used a convenience sample). The sampling method will depend on the research question. For instance, the researcher may want to understand an issue in greater detail for one particular population rather than worry about the 'generalizability' of the results. In such a scenario, the researcher may want to use 'purposive sampling' for the study.
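The contrast this abstract draws between probability and non-probability sampling can be sketched in a few lines of Python. The roster of patient IDs, the stratum definition, and the sample sizes below are invented for illustration:

```python
import random

random.seed(0)

# Hypothetical roster of 1,000 patient IDs (illustrative only).
population = list(range(1000))

# Probability sampling: every patient has a known, equal chance of selection.
simple_random = random.sample(population, 50)

# Stratified random sample: split the roster into strata (even/odd IDs stand
# in for a real stratum such as sex or clinic site), then sample each stratum.
strata = {"even": [p for p in population if p % 2 == 0],
          "odd":  [p for p in population if p % 2 == 1]}
stratified = [pid for group in strata.values()
              for pid in random.sample(group, 25)]

# Non-probability (convenience) sampling: take whoever is easiest to reach,
# e.g. the first 50 patients who walk in; selection chances are unknown.
convenience = population[:50]

print(len(simple_random), len(stratified), len(convenience))
```

The key difference is visible in the code: the first two samples are drawn by a chance mechanism with known inclusion probabilities, while the convenience sample is determined entirely by accessibility.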

  2. Methodology Series Module 5: Sampling Strategies

    PubMed Central

    Setia, Maninder Singh

    2016-01-01

    Once the research question and the research design have been finalised, it is important to select the appropriate sample for the study. The method by which the researcher selects the sample is the ‘Sampling Method’. There are essentially two types of sampling methods: 1) probability sampling – based on chance events (such as random numbers, flipping a coin, etc.); and 2) non-probability sampling – based on the researcher's choice of a population that is accessible and available. Some of the non-probability sampling methods are: purposive sampling, convenience sampling, or quota sampling. Random sampling methods (such as the simple random sample or the stratified random sample) are forms of probability sampling. It is important to understand the different sampling methods used in clinical studies and to state the method clearly in the manuscript. The researcher should not misrepresent the sampling method in the manuscript (such as using the term ‘random sample’ when the researcher has in fact used a convenience sample). The sampling method will depend on the research question. For instance, the researcher may want to understand an issue in greater detail for one particular population rather than worry about the ‘generalizability’ of the results. In such a scenario, the researcher may want to use ‘purposive sampling’ for the study. PMID:27688438

  3. Sampling Methods in Cardiovascular Nursing Research: An Overview.

    PubMed

    Kandola, Damanpreet; Banner, Davina; O'Keefe-McCarthy, Sheila; Jassal, Debbie

    2014-01-01

    Cardiovascular nursing research covers a wide array of topics from health services to psychosocial patient experiences. The selection of specific participant samples is an important part of the research design and process. The sampling strategy employed is of utmost importance to ensure that a representative sample of participants is chosen. There are two main categories of sampling methods: probability and non-probability. Probability sampling is the random selection of elements from the population, where each element of the population has an equal and independent chance of being included in the sample. There are five main types of probability sampling including simple random sampling, systematic sampling, stratified sampling, cluster sampling, and multi-stage sampling. Non-probability sampling methods are those in which elements are chosen through non-random methods for inclusion into the research study and include convenience sampling, purposive sampling, and snowball sampling. Each approach offers distinct advantages and disadvantages and must be considered critically. In this research column, we provide an introduction to these key sampling techniques and draw on examples from cardiovascular research. Understanding the differences in sampling techniques may aid nurses in effective appraisal of research literature and provide a reference point for nurses who engage in cardiovascular research.
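The five probability sampling types named in this abstract can each be sketched in a few lines. The sampling frame, ward boundaries, cluster size, and sample sizes are invented for illustration:

```python
import random

random.seed(1)
N = 120                      # hypothetical sampling frame of patient IDs
frame = list(range(N))
n = 12                       # desired sample size

# 1. Simple random sampling: n units, each with equal probability.
simple = random.sample(frame, n)

# 2. Systematic sampling: random start, then every k-th unit.
k = N // n
start = random.randrange(k)
systematic = frame[start::k]

# 3. Stratified sampling: sample within predefined strata (here, 3 wards).
wards = [frame[0:40], frame[40:80], frame[80:120]]
stratified = [u for ward in wards for u in random.sample(ward, n // 3)]

# 4. Cluster sampling: randomly pick whole clusters, take everyone in them.
clusters = [frame[i:i + 10] for i in range(0, N, 10)]   # 12 clusters of 10
cluster_sample = [u for c in random.sample(clusters, 2) for u in c]

# 5. Multi-stage sampling: pick clusters first, then sub-sample within each.
multistage = [u for c in random.sample(clusters, 4)
              for u in random.sample(c, 3)]

for s in (simple, systematic, stratified, cluster_sample, multistage):
    print(len(s))
```

Each method remains a probability design because every unit's chance of selection is known in advance; they differ in cost, logistics, and the variance of the resulting estimates.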

  4. HABITAT ASSESSMENT USING A RANDOM PROBABILITY BASED SAMPLING DESIGN: ESCAMBIA RIVER DELTA, FLORIDA

    EPA Science Inventory

    Smith, Lisa M., Darrin D. Dantin and Steve Jordan. In press. Habitat Assessment Using a Random Probability Based Sampling Design: Escambia River Delta, Florida (Abstract). To be presented at the SWS/GERS Fall Joint Society Meeting: Communication and Collaboration: Coastal Systems...

  5. Sampling in epidemiological research: issues, hazards and pitfalls.

    PubMed

    Tyrer, Stephen; Heyman, Bob

    2016-04-01

    Surveys of people's opinions are fraught with difficulties. It is easier to obtain information from those who respond to text messages or to emails than to attempt to obtain a representative sample. Samples of the population that are selected non-randomly in this way are termed convenience samples as they are easy to recruit. This introduces a sampling bias. Such non-probability samples have merit in many situations, but an epidemiological enquiry is of little value unless a random sample is obtained. If a sufficient number of those selected actually complete a survey, the results are likely to be representative of the population. This editorial describes probability and non-probability sampling methods and illustrates the difficulties and suggested solutions in performing accurate epidemiological research.

  6. Sampling in epidemiological research: issues, hazards and pitfalls

    PubMed Central

    Tyrer, Stephen; Heyman, Bob

    2016-01-01

    Surveys of people's opinions are fraught with difficulties. It is easier to obtain information from those who respond to text messages or to emails than to attempt to obtain a representative sample. Samples of the population that are selected non-randomly in this way are termed convenience samples as they are easy to recruit. This introduces a sampling bias. Such non-probability samples have merit in many situations, but an epidemiological enquiry is of little value unless a random sample is obtained. If a sufficient number of those selected actually complete a survey, the results are likely to be representative of the population. This editorial describes probability and non-probability sampling methods and illustrates the difficulties and suggested solutions in performing accurate epidemiological research. PMID:27087985

  7. Probability of coincidental similarity among the orbits of small bodies - I. Pairing

    NASA Astrophysics Data System (ADS)

    Jopek, Tadeusz Jan; Bronikowska, Małgorzata

    2017-09-01

    The probability of coincidental clustering among the orbits of comets, asteroids and meteoroids depends on many factors, such as the size of the orbital sample searched for clusters or the size of the identified group; it is different for groups of 2, 3, 4, … members. The probability of coincidental clustering is assessed by numerical simulation, so it also depends on the method used to generate the synthetic orbits. We have tested the impact of some of these factors. For a given size of the orbital sample, we have assessed the probability of random pairing among several orbital populations of different sizes. We have found how these probabilities vary with the size of the orbital samples. Finally, keeping the size of the orbital sample fixed, we have shown that the probability of random pairing can be significantly different for orbital samples obtained by different observation techniques. For the user's convenience, we have also obtained several formulae that, for a given size of the orbital sample, can be used to calculate the similarity threshold corresponding to a small value of the probability of coincidental similarity between two orbits.
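The simulation idea in this abstract can be illustrated with a toy Monte Carlo: generate synthetic "orbits", count how often a purely random sample already contains a pair closer than a similarity threshold. Everything below is a stand-in — the 3-element Euclidean metric replaces a real orbital D-criterion (such as Southworth-Hawkins), and the uniform element distribution replaces distributions fitted to observed populations:

```python
import random, math

random.seed(2)

# Hypothetical stand-in for an orbital similarity criterion: Euclidean
# distance between 3 normalised orbital elements.
def distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def random_orbit():
    # Synthetic "orbit": 3 elements drawn uniformly; a real simulation would
    # draw from the observed element distributions of the population.
    return tuple(random.random() for _ in range(3))

def prob_coincidental_pair(sample_size, threshold, trials=800):
    """Monte Carlo probability that a purely random sample of `sample_size`
    synthetic orbits contains at least one pair closer than `threshold`."""
    hits = 0
    for _ in range(trials):
        orbits = [random_orbit() for _ in range(sample_size)]
        if any(distance(orbits[i], orbits[j]) < threshold
               for i in range(sample_size)
               for j in range(i + 1, sample_size)):
            hits += 1
    return hits / trials

# Larger samples make a chance pairing under the same threshold more likely.
p_small = prob_coincidental_pair(10, 0.05)
p_large = prob_coincidental_pair(60, 0.05)
print(p_small, p_large)
```

The growth of the chance-pairing probability with sample size is exactly why the authors' similarity thresholds must be recalibrated for each sample size.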

  8. What Are Probability Surveys used by the National Aquatic Resource Surveys?

    EPA Pesticide Factsheets

    The National Aquatic Resource Surveys (NARS) use probability-survey designs to assess the condition of the nation’s waters. In probability surveys (also known as sample-surveys or statistical surveys), sampling sites are selected randomly.

  9. Sampling considerations for disease surveillance in wildlife populations

    USGS Publications Warehouse

    Nusser, S.M.; Clark, W.R.; Otis, D.L.; Huang, L.

    2008-01-01

    Disease surveillance in wildlife populations involves detecting the presence of a disease, characterizing its prevalence and spread, and subsequent monitoring. A probability sample of animals selected from the population and corresponding estimators of disease prevalence and detection provide estimates with quantifiable statistical properties, but this approach is rarely used. Although wildlife scientists often assume probability sampling and random disease distributions to calculate sample sizes, convenience samples (i.e., samples of readily available animals) are typically used, and disease distributions are rarely random. We demonstrate how landscape-based simulation can be used to explore properties of estimators from convenience samples in relation to probability samples. We used simulation methods to model what is known about the habitat preferences of the wildlife population, the disease distribution, and the potential biases of the convenience-sample approach. Using chronic wasting disease in free-ranging deer (Odocoileus virginianus) as a simple illustration, we show that using probability sample designs with appropriate estimators provides unbiased surveillance parameter estimates but that the selection bias and coverage errors associated with convenience samples can lead to biased and misleading results. We also suggest practical alternatives to convenience samples that mix probability and convenience sampling. For example, a sample of land areas can be selected using a probability design that oversamples areas with larger animal populations, followed by harvesting of individual animals within sampled areas using a convenience sampling method.

  10. On the use of secondary capture-recapture samples to estimate temporary emigration and breeding proportions

    USGS Publications Warehouse

    Kendall, W.L.; Nichols, J.D.; North, P.M.; Nichols, J.D.

    1995-01-01

    The use of the Cormack-Jolly-Seber model under a standard sampling scheme of one sample per time period, when the Jolly-Seber assumption that all emigration is permanent does not hold, leads to the confounding of temporary emigration probabilities with capture probabilities. This biases the estimates of capture probability when temporary emigration is a completely random process, and both capture and survival probabilities when there is a temporary trap response in temporary emigration, or it is Markovian. The use of secondary capture samples over a shorter interval within each period, during which the population is assumed to be closed (Pollock's robust design), provides a second source of information on capture probabilities. This solves the confounding problem, and thus temporary emigration probabilities can be estimated. This process can be accomplished in an ad hoc fashion for completely random temporary emigration and to some extent in the temporary trap response case, but modelling the complete sampling process provides more flexibility and permits direct estimation of variances. For the case of Markovian temporary emigration, a full likelihood is required.

  11. Deterministic multidimensional nonuniform gap sampling.

    PubMed

    Worley, Bradley; Powers, Robert

    2015-12-01

    Born from empirical observations in nonuniformly sampled multidimensional NMR data relating to gaps between sampled points, the Poisson-gap sampling method has enjoyed widespread use in biomolecular NMR. While the majority of nonuniform sampling schemes are fully randomly drawn from probability densities that vary over a Nyquist grid, the Poisson-gap scheme employs constrained random deviates to minimize the gaps between sampled grid points. We describe a deterministic gap sampling method, based on the average behavior of Poisson-gap sampling, which performs comparably to its random counterpart with the additional benefit of completely deterministic behavior. We also introduce a general algorithm for multidimensional nonuniform sampling based on a gap equation, and apply it to yield a deterministic sampling scheme that combines burst-mode sampling features with those of Poisson-gap schemes. Finally, we derive a relationship between stochastic gap equations and the expectation value of their sampling probability densities. Copyright © 2015 Elsevier Inc. All rights reserved.
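The gap-based idea behind Poisson-gap sampling can be sketched as follows. This is a deliberately simplified version of the Hyberts-Wagner scheme (one dimension, a sine weight that keeps gaps small at the start of the grid, and a crude multiplicative tuning loop); the grid size and point count are invented:

```python
import random, math

random.seed(3)

def poisson_deviate(lam):
    # Knuth's algorithm for a Poisson-distributed random integer.
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def poisson_gap(grid_size, n_points, max_tries=10000):
    """Simplified Poisson-gap schedule: walk the Nyquist grid, inserting a
    random gap after each sampled point. Gap means follow a sine weight so
    gaps stay small near the start of the signal, and the outer loop tunes
    the mean gap until exactly n_points land on the grid."""
    lam = grid_size / n_points - 1.0   # initial guess for the mean gap
    for _ in range(max_tries):
        points, i = [], 0
        while i < grid_size:
            points.append(i)
            weight = math.sin((i + 0.5) / grid_size * math.pi / 2)
            i += 1 + poisson_deviate(lam * weight)
        if len(points) == n_points:
            return points
        # Too many points -> widen gaps; too few -> narrow them.
        lam *= 1.02 if len(points) > n_points else 0.98
    raise RuntimeError("schedule did not converge")

schedule = poisson_gap(128, 32)
print(schedule[:8])
```

The deterministic variant described in the abstract replaces the Poisson deviates with their average behavior, so the same schedule is produced every time; the sketch above shows only the constrained-random baseline it builds on.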

  12. Sample Selection in Randomized Experiments: A New Method Using Propensity Score Stratified Sampling

    ERIC Educational Resources Information Center

    Tipton, Elizabeth; Hedges, Larry; Vaden-Kiernan, Michael; Borman, Geoffrey; Sullivan, Kate; Caverly, Sarah

    2014-01-01

    Randomized experiments are often seen as the "gold standard" for causal research. Despite the fact that experiments use random assignment to treatment conditions, units are seldom selected into the experiment using probability sampling. Very little research on experimental design has focused on how to make generalizations to well-defined…

  13. Polynomial chaos representation of databases on manifolds

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Soize, C., E-mail: christian.soize@univ-paris-est.fr; Ghanem, R., E-mail: ghanem@usc.edu

    2017-04-15

    Characterizing the polynomial chaos expansion (PCE) of a vector-valued random variable with probability distribution concentrated on a manifold is a relevant problem in data-driven settings. The probability distribution of such random vectors is multimodal in general, leading to potentially very slow convergence of the PCE. In this paper, we build on a recent development for estimating and sampling from probabilities concentrated on a diffusion manifold. The proposed methodology constructs a PCE of the random vector together with an associated generator that samples from the target probability distribution which is estimated from data concentrated in the neighborhood of the manifold. The method is robust and remains efficient for high dimension and large datasets. The resulting polynomial chaos construction on manifolds permits the adaptation of many uncertainty quantification and statistical tools to emerging questions motivated by data-driven queries.

  14. Randomized branch sampling

    Treesearch

    Harry T. Valentine

    2002-01-01

    Randomized branch sampling (RBS) is a special application of multistage probability sampling (see Sampling, environmental), which was developed originally by Jessen [3] to estimate fruit counts on individual orchard trees. In general, the method can be used to obtain estimates of many different attributes of trees or other branched plants. The usual objective of RBS is...
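The multistage idea behind randomized branch sampling can be sketched on a toy tree. The tree structure, the "size" variable used for selection probabilities, and the fruit counts are all invented; the estimator (terminal-path counts divided by cumulative selection probability, after Jessen) is the standard unbiased form:

```python
import random

random.seed(4)

# Hypothetical tree: each branch has a "size" (a proxy such as
# cross-sectional area), a fruit count on its own segment, and children.
tree = {
    "size": 10.0, "fruit": 2,
    "children": [
        {"size": 6.0, "fruit": 5, "children": [
            {"size": 3.0, "fruit": 7, "children": []},
            {"size": 2.0, "fruit": 4, "children": []},
        ]},
        {"size": 4.0, "fruit": 3, "children": [
            {"size": 4.0, "fruit": 9, "children": []},
        ]},
    ],
}

def true_total(node):
    return node["fruit"] + sum(true_total(c) for c in node["children"])

def rbs_estimate(node):
    """One randomized-branch-sampling path: at each fork, follow one child
    with probability proportional to its size. Each visited segment's own
    count enters the estimate divided by the probability of having reached
    it, which keeps the estimator unbiased for the whole-tree total."""
    estimate, prob = 0.0, 1.0
    while True:
        estimate += node["fruit"] / prob
        if not node["children"]:
            return estimate
        sizes = [c["size"] for c in node["children"]]
        total = sum(sizes)
        r, cum = random.random() * total, 0.0
        for child, s in zip(node["children"], sizes):
            cum += s
            if r <= cum:
                node, prob = child, prob * (s / total)
                break

mean_est = sum(rbs_estimate(tree) for _ in range(20000)) / 20000
print(true_total(tree), round(mean_est, 1))
```

Averaged over many simulated paths, the RBS estimate recovers the true fruit total, even though each individual path measures only a handful of segments.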

  15. High throughput nonparametric probability density estimation.

    PubMed

    Farmer, Jenny; Jacobs, Donald

    2018-01-01

    In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference.

  16. High throughput nonparametric probability density estimation

    PubMed Central

    Farmer, Jenny

    2018-01-01

    In high throughput applications, such as those found in bioinformatics and finance, it is important to determine accurate probability distribution functions despite only minimal information about data characteristics, and without using human subjectivity. Such an automated process for univariate data is implemented to achieve this goal by merging the maximum entropy method with single order statistics and maximum likelihood. The only required properties of the random variables are that they are continuous and that they are, or can be approximated as, independent and identically distributed. A quasi-log-likelihood function based on single order statistics for sampled uniform random data is used to empirically construct a sample size invariant universal scoring function. Then a probability density estimate is determined by iteratively improving trial cumulative distribution functions, where better estimates are quantified by the scoring function that identifies atypical fluctuations. This criterion resists under and over fitting data as an alternative to employing the Bayesian or Akaike information criterion. Multiple estimates for the probability density reflect uncertainties due to statistical fluctuations in random samples. Scaled quantile residual plots are also introduced as an effective diagnostic to visualize the quality of the estimated probability densities. Benchmark tests show that estimates for the probability density function (PDF) converge to the true PDF as sample size increases on particularly difficult test probability densities that include cases with discontinuities, multi-resolution scales, heavy tails, and singularities. These results indicate the method has general applicability for high throughput statistical inference. PMID:29750803

  17. Optimal random search for a single hidden target.

    PubMed

    Snider, Joseph

    2011-01-01

    A single target is hidden at a location chosen from a predetermined probability distribution. Then, a searcher must find a second probability distribution from which random search points are sampled such that the target is found in the minimum number of trials. Here it will be shown that if the searcher must get very close to the target to find it, then the best search distribution is proportional to the square root of the target distribution regardless of dimension. For a Gaussian target distribution, the optimum search distribution is approximately a Gaussian with a standard deviation that varies inversely with how close the searcher must be to the target to find it. For a network where the searcher randomly samples nodes and looks for the fixed target along edges, the optimum is either to sample a node with probability proportional to the square root of the out-degree plus 1 or not to do so at all.
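The square-root rule in this abstract has a clean discrete analogue: if the target hides in cell i with probability p[i], the searcher redraws a cell from q each trial, and success requires an exact match, the search is memoryless and the expected number of trials is the sum of p[i]/q[i], which Cauchy-Schwarz shows is minimised by q[i] proportional to sqrt(p[i]). The four-cell target distribution below is invented:

```python
import math

# Hypothetical target distribution over 4 hiding cells.
p = [0.5, 0.25, 0.15, 0.1]

def expected_trials(q):
    # Memoryless search: trials to find the target in cell i are geometric
    # with mean 1/q[i], so the overall expectation is sum_i p[i] / q[i].
    return sum(pi / qi for pi, qi in zip(p, q))

# Candidate strategies.
q_match = p[:]                                    # search where target likely
s = sum(math.sqrt(pi) for pi in p)
q_sqrt = [math.sqrt(pi) / s for pi in p]          # square-root rule
q_uniform = [1 / len(p)] * len(p)

print(expected_trials(q_match),
      expected_trials(q_sqrt),
      expected_trials(q_uniform))
```

Note the curious symmetry: both searching in proportion to the target distribution and searching uniformly give an expected n trials for n cells, while the square-root strategy strictly beats them, mirroring the abstract's dimension-independent result.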

  18. Point-Sampling and Line-Sampling Probability Theory, Geometric Implications, Synthesis

    Treesearch

    L.R. Grosenbaugh

    1958-01-01

    Foresters concerned with measuring tree populations on definite areas have long employed two well-known methods of representative sampling. In list or enumerative sampling the entire tree population is tallied with a known proportion being randomly selected and measured for volume or other variables. In area sampling all trees on randomly located plots or strips...

  19. Quantitative comparison of randomization designs in sequential clinical trials based on treatment balance and allocation randomness.

    PubMed

    Zhao, Wenle; Weng, Yanqiu; Wu, Qi; Palesch, Yuko

    2012-01-01

    To evaluate the performance of randomization designs under various parameter settings and trial sample sizes, and to identify optimal designs with respect to both treatment imbalance and allocation randomness, we evaluate 260 design scenarios from 14 randomization designs under 15 sample sizes ranging from 10 to 300, using three measures for imbalance and three measures for randomness. The maximum absolute imbalance and the correct guess (CG) probability are selected to assess the trade-off performance of each randomization design. As measured by the maximum absolute imbalance and the CG probability, we found that the performances of the 14 randomization designs lie in a closed region with the upper boundary (worst case) given by Efron's biased coin design (BCD) and the lower boundary (best case) given by Soares and Wu's big stick design (BSD). Designs close to the lower boundary provide a smaller imbalance and a higher randomness than designs close to the upper boundary. Our research suggested that optimization of randomization design is possible based on quantified evaluation of imbalance and randomness. Based on the maximum imbalance and CG probability, the BSD, Chen's biased coin design with imbalance tolerance method, and Chen's Ehrenfest urn design perform better than the popularly used permuted block design, EBCD, and Wei's urn design. Copyright © 2011 John Wiley & Sons, Ltd.
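The two boundary designs named in this abstract can be simulated directly. The sketch below uses Efron's biased coin with the classic p = 2/3 and a big stick tolerance of b = 3 (both parameter choices are illustrative, not taken from the paper), and scores each design by mean maximum absolute imbalance and the correct-guess probability of a guesser who always predicts the lagging arm:

```python
import random

random.seed(7)

def simulate(design, n, trials=4000, b=3):
    """Return (mean max |imbalance|, correct-guess probability) for one
    randomization design over simulated two-arm trials of size n."""
    max_imb_sum = guesses = correct = 0
    for _ in range(trials):
        d = 0                       # imbalance = (#A - #B)
        max_imb = 0
        for _ in range(n):
            if design == "efron":   # biased coin, p = 2/3 toward lagging arm
                p_A = 0.5 if d == 0 else (1 / 3 if d > 0 else 2 / 3)
            else:                   # big stick: fair coin inside tolerance b
                if d >= b:
                    p_A = 0.0
                elif d <= -b:
                    p_A = 1.0
                else:
                    p_A = 0.5
            # Best guessing strategy: guess the currently lagging arm.
            guess_A = (d < 0) if d != 0 else (random.random() < 0.5)
            a = random.random() < p_A
            correct += (guess_A == a)
            guesses += 1
            d += 1 if a else -1
            max_imb = max(max_imb, abs(d))
        max_imb_sum += max_imb
    return max_imb_sum / trials, correct / guesses

imb_efron, cg_efron = simulate("efron", 100)
imb_bsd, cg_bsd = simulate("bsd", 100)
print(imb_efron, cg_efron, imb_bsd, cg_bsd)
```

Consistent with the abstract's boundary result, the big stick design caps imbalance at its tolerance while keeping the correct-guess probability lower than Efron's coin, which tolerates larger excursions yet is also easier to guess.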

  20. 45 CFR 1356.71 - Federal review of the eligibility of children in foster care and the eligibility of foster care...

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... by ACF statistical staff from the Adoption and Foster Care Analysis and Reporting System (AFCARS... primary review utilizing probability sampling methodologies. Usually, the chosen methodology will be simple random sampling, but other probability samples may be utilized, when necessary and appropriate. (3...

  1. Performance of Random Effects Model Estimators under Complex Sampling Designs

    ERIC Educational Resources Information Center

    Jia, Yue; Stokes, Lynne; Harris, Ian; Wang, Yan

    2011-01-01

    In this article, we consider estimation of parameters of random effects models from samples collected via complex multistage designs. Incorporation of sampling weights is one way to reduce estimation bias due to unequal probabilities of selection. Several weighting methods have been proposed in the literature for estimating the parameters of…

  2. Using GIS to generate spatially balanced random survey designs for natural resource applications.

    PubMed

    Theobald, David M; Stevens, Don L; White, Denis; Urquhart, N Scott; Olsen, Anthony R; Norman, John B

    2007-07-01

    Sampling of a population is frequently required to understand trends and patterns in natural resource management because financial and time constraints preclude a complete census. A rigorous probability-based survey design specifies where to sample so that inferences from the sample apply to the entire population. Probability survey designs should be used in natural resource and environmental management situations because they provide the mathematical foundation for statistical inference. Development of long-term monitoring designs demands survey designs that achieve statistical rigor and are efficient but remain flexible to inevitable logistical or practical constraints during field data collection. Here we describe an approach to probability-based survey design, called the Reversed Randomized Quadrant-Recursive Raster, based on the concept of spatially balanced sampling and implemented in a geographic information system. This provides environmental managers with a practical tool to generate flexible and efficient survey designs for natural resource applications. Factors commonly used to modify sampling intensity, such as categories, gradients, or accessibility, can be readily incorporated into the spatially balanced sample design.

  3. Betting on Illusory Patterns: Probability Matching in Habitual Gamblers.

    PubMed

    Gaissmaier, Wolfgang; Wilke, Andreas; Scheibehenne, Benjamin; McCanney, Paige; Barrett, H Clark

    2016-03-01

    Why do people gamble? A large body of research suggests that cognitive distortions play an important role in pathological gambling. Many of these distortions are specific cases of a more general misperception of randomness, specifically of an illusory perception of patterns in random sequences. In this article, we provide further evidence for the assumption that gamblers are particularly prone to perceiving illusory patterns. In particular, we compared habitual gamblers to a matched sample of community members with regard to how much they exhibit the choice anomaly 'probability matching'. Probability matching describes the tendency to match response proportions to outcome probabilities when predicting binary outcomes. It leads to a lower expected accuracy than the maximizing strategy of predicting the most likely event on each trial. Previous research has shown that an illusory perception of patterns in random sequences fuels probability matching. So does impulsivity, which is also reported to be higher in gamblers. We therefore hypothesized that gamblers will exhibit more probability matching than non-gamblers, which was confirmed in a controlled laboratory experiment. Additionally, gamblers scored much lower than community members on the cognitive reflection task, which indicates higher impulsivity. This difference could account for the difference in probability matching between the samples. These results suggest that gamblers are more willing to bet impulsively on perceived illusory patterns.
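The accuracy penalty of probability matching described in this abstract is easy to verify by simulation. With a binary outcome occurring at rate p = 0.7 (an illustrative value), maximizing achieves accuracy p, while matching achieves p² + (1 − p)² = 0.58:

```python
import random

random.seed(8)

p = 0.7                      # probability of the more frequent binary outcome
trials = 100000

outcomes = [random.random() < p for _ in range(trials)]

# Maximizing: always predict the most likely event; accuracy equals the
# observed frequency of that event.
acc_max = sum(outcomes) / trials

# Probability matching: predict the frequent event on ~70% of trials,
# independently of the actual outcomes.
matching_preds = [random.random() < p for _ in range(trials)]
acc_match = sum(o == g for o, g in zip(outcomes, matching_preds)) / trials

# Analytically, matching accuracy is p**2 + (1-p)**2 = 0.58 < 0.70.
print(round(acc_max, 3), round(acc_match, 3))
```

The 12-point accuracy gap is the "choice anomaly": matchers behave as if the random sequence contained an exploitable pattern, which is precisely the illusory-pattern perception the study links to gambling.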

  4. On the importance of incorporating sampling weights in ...

    EPA Pesticide Factsheets

    Occupancy models are used extensively to assess wildlife-habitat associations and to predict species distributions across large geographic regions. Occupancy models were developed as a tool to properly account for imperfect detection of a species. Current guidelines on survey design requirements for occupancy models focus on the number of sample units and the pattern of revisits to a sample unit within a season. We focus on the sampling design, i.e., how the sample units are selected in geographic space (e.g., stratified, simple random, unequal probability, etc.). In a probability design, each sample unit has a sampling weight, which quantifies the number of sample units it represents in the finite (oftentimes areal) sampling frame. We demonstrate the importance of including sampling weights in occupancy model estimation when the design is not a simple random sample or equal probability design. We assume a finite areal sampling frame as proposed for a national bat monitoring program. We compare several unequal and equal probability designs and varying sampling intensity within a simulation study. We found that the traditional single season occupancy model produced biased estimates of occupancy and lower confidence interval coverage rates compared to occupancy models that accounted for the sampling design. We also discuss how our findings inform the analyses proposed for the nascent North American Bat Monitoring Program and other collaborative synthesis efforts that propose h
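The core point of this abstract — that ignoring sampling weights under an unequal-probability design biases occupancy estimates — can be shown with a toy frame. The cell counts, occupancy rates, and inclusion probabilities below are invented; the weighted estimator is a Hájek-type (weight-normalised Horvitz-Thompson) mean, not the occupancy-model likelihood itself:

```python
import random

random.seed(9)

# Hypothetical finite frame of 1,000 grid cells; occupancy is more common in
# "high" habitat, which the design deliberately oversamples.
cells = []
for i in range(1000):
    high = i < 300                        # 300 high-quality cells
    occupied = random.random() < (0.8 if high else 0.2)
    cells.append((high, occupied))

# Unequal-probability design: inclusion probability 0.5 for high cells,
# 0.1 for the rest. The sampling weight is 1 / inclusion probability.
sample = []
for high, occ in cells:
    pi = 0.5 if high else 0.1
    if random.random() < pi:
        sample.append((occ, 1.0 / pi))

true_occ = sum(occ for _, occ in cells) / len(cells)

# Naive (unweighted) estimate: biased upward, since occupied high-quality
# cells are over-represented in the sample.
naive = sum(occ for occ, _ in sample) / len(sample)

# Weighted (Hajek-type) estimate of the occupied proportion.
weighted = (sum(w * occ for occ, w in sample) /
            sum(w for _, w in sample))

print(round(true_occ, 3), round(naive, 3), round(weighted, 3))
```

The unweighted mean lands far above the true occupied proportion, while the weighted estimate recovers it, which is the same qualitative failure and fix the simulation study reports for the single-season occupancy model.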

  5. Determination of the influence of dispersion pattern of pesticide-resistant individuals on the reliability of resistance estimates using different sampling plans.

    PubMed

    Shah, R; Worner, S P; Chapman, R B

    2012-10-01

    Pesticide resistance monitoring includes resistance detection and subsequent documentation/measurement. Resistance detection would require at least one (≥1) resistant individual(s) to be present in a sample to initiate management strategies. Resistance documentation, on the other hand, would attempt to get an estimate of the entire population (≥90%) of the resistant individuals. A computer simulation model was used to compare the efficiency of simple random and systematic sampling plans to detect resistant individuals and to document their frequencies when the resistant individuals were randomly or patchily distributed. A patchy dispersion pattern of resistant individuals influenced the sampling efficiency of systematic sampling plans while the efficiency of random sampling was independent of such patchiness. When resistant individuals were randomly distributed, sample sizes required to detect at least one resistant individual (resistance detection) with a probability of 0.95 were 300 (1%) and 50 (10% and 20%); whereas, when resistant individuals were patchily distributed, using systematic sampling, sample sizes required for such detection were 6000 (1%), 600 (10%) and 300 (20%). Sample sizes of 900 and 400 would be required to detect ≥90% of resistant individuals (resistance documentation) with a probability of 0.95 when resistant individuals were randomly dispersed and present at a frequency of 10% and 20%, respectively; whereas, when resistant individuals were patchily distributed, using systematic sampling, a sample size of 3000 and 1500, respectively, was necessary. Small sample sizes either underestimated or overestimated the resistance frequency. A simple random sampling plan is, therefore, recommended for insecticide resistance detection and subsequent documentation.
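For the randomly-dispersed case, the detection sample sizes quoted in this abstract follow from the closed form P(detect ≥ 1) = 1 − (1 − f)^n, assuming a large, well-mixed population (the patchy-case sizes come from the authors' spatial simulation and are larger):

```python
import math

def detection_probability(n, freq):
    """Probability that a simple random sample of n individuals contains at
    least one resistant individual when resistance frequency is `freq`
    (binomial approximation: large, well-mixed population)."""
    return 1.0 - (1.0 - freq) ** n

def sample_size(freq, confidence=0.95):
    """Smallest n with detection probability >= confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - freq))

# Check against the randomly-dispersed detection cases quoted above.
print(sample_size(0.01))                            # ~300 at 1% frequency
print(round(detection_probability(300, 0.01), 3))
print(round(detection_probability(50, 0.10), 3))    # 50 suffices at 10%
print(round(detection_probability(50, 0.20), 3))    # and easily at 20%
```

The closed form gives 299 at a 1% frequency, matching the abstract's 300 to within the granularity of the simulated sizes, and confirms that 50 individuals comfortably exceed 0.95 detection probability at 10% and 20% frequencies.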

  6. Bayesian statistical inference enhances the interpretation of contemporary randomized controlled trials.

    PubMed

    Wijeysundera, Duminda N; Austin, Peter C; Hux, Janet E; Beattie, W Scott; Laupacis, Andreas

    2009-01-01

    Randomized trials generally use "frequentist" statistics based on P-values and 95% confidence intervals. Frequentist methods have limitations that might be overcome, in part, by Bayesian inference. To illustrate these advantages, we re-analyzed randomized trials published in four general medical journals during 2004. We used Medline to identify randomized superiority trials with two parallel arms, individual-level randomization and dichotomous or time-to-event primary outcomes. Studies with P<0.05 in favor of the intervention were deemed "positive"; otherwise, they were "negative." We used several prior distributions and exact conjugate analyses to calculate Bayesian posterior probabilities for clinically relevant effects. Of 88 included studies, 39 were positive using a frequentist analysis. Although the Bayesian posterior probabilities of any benefit (relative risk or hazard ratio<1) were high in positive studies, these probabilities were lower and variable for larger benefits. The positive studies had only moderate probabilities for exceeding the effects that were assumed for calculating the sample size. By comparison, there were moderate probabilities of any benefit in negative studies. Bayesian and frequentist analyses complement each other when interpreting the results of randomized trials. Future reports of randomized trials should include both.
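The exact conjugate analysis described in this abstract can be sketched for a single hypothetical two-arm trial with a dichotomous outcome. The event counts and the 0.8 relative-risk threshold below are invented; with a uniform Beta(1, 1) prior, each arm's event rate has a Beta posterior, and posterior probabilities of benefit are estimated by Monte Carlo:

```python
import random

random.seed(11)

# Hypothetical trial: 30/100 events on control, 20/100 on treatment.
events_c, n_c = 30, 100
events_t, n_t = 20, 100

# Conjugate update: Beta(1, 1) prior + binomial data ->
# Beta(events + 1, non-events + 1) posterior for each arm's event rate.
draws = 100000
benefit = larger_benefit = 0
for _ in range(draws):
    # random.betavariate is the stdlib Beta(alpha, beta) sampler.
    rc = random.betavariate(events_c + 1, n_c - events_c + 1)
    rt = random.betavariate(events_t + 1, n_t - events_t + 1)
    benefit += rt < rc                   # any benefit (relative risk < 1)
    larger_benefit += rt < 0.8 * rc      # larger benefit (RR < 0.8)

print(round(benefit / draws, 3), round(larger_benefit / draws, 3))
```

This is the pattern the abstract highlights: the posterior probability of *any* benefit can be high while the probability of a clinically *large* benefit is only moderate — a nuance a single frequentist P-value cannot convey.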

  7. Correcting Classifiers for Sample Selection Bias in Two-Phase Case-Control Studies

    PubMed Central

    Theis, Fabian J.

    2017-01-01

    Epidemiological studies often utilize stratified data in which rare outcomes or exposures are artificially enriched. This design can increase precision in association tests but distorts predictions when applying classifiers on nonstratified data. Several methods correct for this so-called sample selection bias, but their performance remains unclear, especially for machine learning classifiers. With an emphasis on two-phase case-control studies, we aim to assess which corrections to perform in which setting and to obtain methods suitable for machine learning techniques, especially the random forest. We propose two new resampling-based methods to resemble the original data and covariance structure: stochastic inverse-probability oversampling and parametric inverse-probability bagging. We compare all techniques for the random forest and other classifiers, both theoretically and on simulated and real data. Empirical results show that the random forest profits only from the parametric inverse-probability bagging we propose. For other classifiers, correction is mostly advantageous, and methods perform uniformly. We discuss consequences of inappropriate distribution assumptions and the reasons for the different behaviors of the random forest and other classifiers. In conclusion, we provide guidance for choosing correction methods when training classifiers on biased samples. For random forests, our method outperforms state-of-the-art procedures if distribution assumptions are roughly fulfilled. We provide our implementation in the R package sambia. PMID:29312464
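
    The simplest ancestor of these corrections is plain inverse-probability resampling: draw training records with weights 1/π_i so the class balance of the source population is restored. A toy sketch with an assumed two-phase design (the paper's stochastic and parametric variants additionally perturb or re-model the drawn points):

```python
import random

def ip_oversample(records, sel_probs, size, seed=9):
    """Resample a stratified training set back toward the source
    population: each record is drawn with probability proportional to
    1 / pi_i, the inverse of its phase-two selection probability."""
    rng = random.Random(seed)
    weights = [1.0 / p for p in sel_probs]
    return rng.choices(records, weights=weights, k=size)

# assumed two-phase design: all 50 cases kept (pi = 1), controls kept at pi = 0.1
labels = ["case"] * 50 + ["control"] * 50
probs = [1.0] * 50 + [0.1] * 50
resampled = ip_oversample(labels, probs, 5000)
control_frac = resampled.count("control") / 5000   # ~10/11, as in the population
```
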

  8. The coalescent of a sample from a binary branching process.

    PubMed

    Lambert, Amaury

    2018-04-25

    At time 0, start a time-continuous binary branching process, where particles give birth to a single particle independently (at a possibly time-dependent rate) and die independently (at a possibly time-dependent and age-dependent rate). A particular case is the classical birth-death process. Stop this process at time T>0. It is known that the tree spanned by the N tips alive at time T of the tree thus obtained (called a reduced tree or coalescent tree) is a coalescent point process (CPP), which basically means that the depths of interior nodes are independent and identically distributed (iid). Now select each of the N tips independently with probability y (Bernoulli sample). It is known that the tree generated by the selected tips, which we will call the Bernoulli sampled CPP, is again a CPP. Now instead, select exactly k tips uniformly at random among the N tips (a k-sample). We show that the tree generated by the selected tips is a mixture of Bernoulli sampled CPPs with the same parent CPP, over some explicit distribution of the sampling probability y. An immediate consequence is that the genealogy of a k-sample can be obtained by the realization of k random variables, first the random sampling probability Y and then the k-1 node depths which are iid conditional on Y=y. Copyright © 2018. Published by Elsevier Inc.

  9. Digital simulation of two-dimensional random fields with arbitrary power spectra and non-Gaussian probability distribution functions.

    PubMed

    Yura, Harold T; Hanson, Steen G

    2012-04-01

    Methods for simulation of two-dimensional signals with arbitrary power spectral densities and signal amplitude probability density functions are presented. The method relies on initially transforming a white noise sample set of random Gaussian distributed numbers into a corresponding set with the desired spectral distribution, after which this colored Gaussian probability distribution is transformed via an inverse transform into the desired probability distribution. In most cases the method provides satisfactory results and can thus be considered an engineering approach. Several illustrative examples with relevance for optics are given.
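
    A compact sketch of the two transforms described: color a white Gaussian noise field in the Fourier domain to impose a power spectrum (the spectrum below is an assumed isotropic example), then apply a memoryless CDF mapping to obtain a non-Gaussian (here exponential) marginal:

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 64
white = rng.standard_normal((n, n))

# step 1: impose an assumed isotropic power spectrum in the Fourier domain
kx = np.fft.fftfreq(n)[:, None]
ky = np.fft.fftfreq(n)[None, :]
psd = 1.0 / (0.01 + kx**2 + ky**2)
field = np.real(np.fft.ifft2(np.fft.fft2(white) * np.sqrt(psd)))
field = (field - field.mean()) / field.std()   # back to zero mean, unit variance

# step 2: memoryless transform Gaussian -> exponential marginal,
# via u = Phi(x) and F_exp^{-1}(u) = -ln(1 - u)
gauss_cdf = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))
u = np.clip(gauss_cdf(field), 1e-12, 1.0 - 1e-12)
exp_field = -np.log(1.0 - u)   # exponential marginal, spatially correlated
```

    The memoryless mapping preserves the rank structure of the colored field, so the spectral shape survives approximately, which is why the abstract hedges with "in most cases the method provides satisfactory results".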

  10. Improved high-dimensional prediction with Random Forests by the use of co-data.

    PubMed

    Te Beest, Dennis E; Mes, Steven W; Wilting, Saskia M; Brakenhoff, Ruud H; van de Wiel, Mark A

    2017-12-28

    Prediction in high dimensional settings is difficult due to the large number of variables relative to the sample size. We demonstrate how auxiliary 'co-data' can be used to improve the performance of a Random Forest in such a setting. Co-data are incorporated in the Random Forest by replacing the uniform sampling probabilities used to draw candidate variables with co-data-moderated sampling probabilities. Co-data are defined here as any type of information that is available on the variables of the primary data but does not use its response labels. These moderated sampling probabilities are, inspired by empirical Bayes, learned from the data at hand. We demonstrate the co-data moderated Random Forest (CoRF) with two examples. In the first example we aim to predict the presence of a lymph node metastasis with gene expression data. We demonstrate how a set of external p-values, a gene signature, and the correlation between gene expression and DNA copy number can improve the predictive performance. In the second example we demonstrate how the prediction of cervical (pre-)cancer with methylation data can be improved by including the location of the probe relative to the known CpG islands, the number of CpG sites targeted by a probe, and a set of p-values from a related study. The proposed method is able to utilize auxiliary co-data to improve the performance of a Random Forest.
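
    The core mechanism, replacing the uniform candidate-variable draw at each split with a weighted one, can be sketched in a few lines (the weights below are assumed to have been learned from co-data already; this is not the CoRF implementation):

```python
import random

def draw_candidate_variables(n_vars, mtry, codata_weights, seed=10):
    """One Random Forest split: draw mtry candidate variables without
    replacement, using co-data-moderated probabilities instead of
    uniform ones."""
    rng = random.Random(seed)
    available = list(range(n_vars))
    w = list(codata_weights)
    chosen = []
    for _ in range(mtry):
        idx = rng.choices(range(len(available)), weights=w, k=1)[0]
        chosen.append(available.pop(idx))
        w.pop(idx)
    return chosen

# assumed co-data: the first 10 of 100 variables look promising (weight 5 vs 1)
weights = [5.0] * 10 + [1.0] * 90
chosen = draw_candidate_variables(100, 10, weights)
```

    Over many splits the promising variables are offered to the tree far more often than under uniform sampling, which is how the co-data steers the forest without touching the response labels.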

  11. Rare event simulation in radiation transport

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kollman, Craig

    1993-10-01

    This dissertation studies methods for estimating extremely small probabilities by Monte Carlo simulation. Problems in radiation transport typically involve estimating very rare events or the expected value of a random variable which is with overwhelming probability equal to zero. These problems often have high-dimensional state spaces and irregular geometries, so analytic solutions are not possible. Monte Carlo simulation must be used to estimate the radiation dosage being transported to a particular location. If the area is well shielded, the probability of any one particular particle getting through is very small. Because of the large number of particles involved, even a tiny fraction penetrating the shield may represent an unacceptable level of radiation. It therefore becomes critical to be able to accurately estimate this extremely small probability. Importance sampling is a well-known technique for improving the efficiency of rare event calculations. Here, a new set of probabilities is used in the simulation runs. The results are multiplied by the likelihood ratio between the true and simulated probabilities so as to keep the estimator unbiased. The variance of the resulting estimator is very sensitive to which new set of transition probabilities is chosen. It is shown that a zero-variance estimator does exist, but that its computation requires exact knowledge of the solution. A simple random walk with an associated killing model for the scatter of neutrons is introduced. Large deviation results for optimal importance sampling in random walks are extended to the case where killing is present. An adaptive "learning" algorithm for implementing importance sampling is given for more general Markov chain models of neutron scatter. For finite state spaces this algorithm is shown to give, with probability one, a sequence of estimates converging exponentially fast to the true solution.
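
    A textbook instance of the idea, not the dissertation's transport model: estimating the Gaussian tail probability P(Z > 4) ≈ 3.2e-5 by sampling from a shifted ("tilted") distribution and reweighting each hit by the likelihood ratio, which keeps the estimator unbiased:

```python
import math
import random

def rare_tail_is(t, draws=50000, seed=2):
    """Importance-sampling estimate of P(Z > t) for Z ~ N(0,1): sample
    from the shifted density N(t, 1) and reweight each hit by the
    likelihood ratio phi(x) / phi(x - t) = exp(-t*x + t^2/2)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(draws):
        x = rng.gauss(t, 1.0)
        if x > t:
            total += math.exp(-t * x + 0.5 * t * t)
    return total / draws

est = rare_tail_is(4.0)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))   # the true tail probability
```

    Naive Monte Carlo would need on the order of 10^7 draws to see even a handful of such events; the tilted sampler concentrates every draw near the rare region, exactly the variance-sensitivity issue the abstract describes.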

  12. Views of United States physicians and members of the American Medical Association House of Delegates on physician-assisted suicide.

    PubMed

    Whitney, S N; Brown, B W; Brody, H; Alcser, K H; Bachman, J G; Greely, H T

    2001-05-01

    To ascertain the views of physicians and physician leaders toward the legalization of physician-assisted suicide. Confidential mail questionnaire. A nationwide random sample of physicians of all ages and specialties, and all members of the American Medical Association (AMA) House of Delegates as of April 1996. Demographic and practice characteristics and attitude toward legalization of physician-assisted suicide. Usable questionnaires were returned by 658 of 930 eligible physicians in the nationwide random sample (71%) and 315 of 390 eligible physicians in the House of Delegates (81%). In the nationwide random sample, 44.5% favored legalization (16.4% definitely and 28.1% probably), 33.9% opposed legalization (20.4% definitely and 13.5% probably), and 22% were unsure. Opposition to legalization was strongly associated with self-defined politically conservative beliefs, religious affiliation, and the importance of religion to the respondent (P <.001). Among members of the AMA House of Delegates, 23.5% favored legalization (7.3% definitely and 16.2% probably), 61.6% opposed legalization (43.5% definitely and 18.1% probably), and 15% were unsure; their views differed significantly from those of the nationwide random sample (P <.001). Given the choice, a majority of both groups would prefer no law at all, with physician-assisted suicide being neither legal nor illegal. Members of the AMA House of Delegates strongly oppose physician-assisted suicide, but rank-and-file physicians show no consensus either for or against its legalization. Although the debate is sometimes adversarial, most physicians in the United States are uncertain or endorse moderate views on assisted suicide.

  13. Revisiting sample size: are big trials the answer?

    PubMed

    Lurati Buse, Giovanna A L; Botto, Fernando; Devereaux, P J

    2012-07-18

    The superiority of the evidence generated in randomized controlled trials over observational data does not rest on randomization alone. Randomized controlled trials require proper design and implementation to provide a reliable effect estimate. Adequate random sequence generation, allocation implementation, analyses based on the intention-to-treat principle, and sufficient power are crucial to the quality of a randomized controlled trial. Power, the probability that the trial detects a difference when a real difference between treatments exists, strongly depends on sample size. The quality of orthopaedic randomized controlled trials is frequently threatened by a limited sample size. This paper reviews basic concepts and pitfalls in sample-size estimation and focuses on the importance of large trials in the generation of valid evidence.
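
    The dependence of power on sample size has a standard closed form for two proportions; a sketch using the usual normal approximation (the effect sizes are illustrative, not from the paper):

```python
import math
from statistics import NormalDist

def n_per_arm(p_control, p_treat, alpha=0.05, power=0.80):
    """Approximate per-arm sample size for a two-arm trial with a
    dichotomous outcome (normal approximation, pooled variance)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p_control + p_treat) / 2
    term = (z_a * math.sqrt(2 * p_bar * (1 - p_bar))
            + z_b * math.sqrt(p_control * (1 - p_control)
                              + p_treat * (1 - p_treat)))
    return math.ceil(term ** 2 / (p_control - p_treat) ** 2)

# detecting a 15% -> 10% event-rate reduction at 80% power
n = n_per_arm(0.15, 0.10)
```

    The quadratic dependence on the effect size is the practical point: halving the expected treatment effect roughly quadruples the required trial, which is why small orthopaedic trials are so often underpowered.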

  14. Fitting distributions to microbial contamination data collected with an unequal probability sampling design.

    PubMed

    Williams, M S; Ebel, E D; Cao, Y

    2013-01-01

    The fitting of statistical distributions to microbial sampling data is a common application in quantitative microbiology and risk assessment applications. An underlying assumption of most fitting techniques is that data are collected with simple random sampling, which is often not the case. This study develops a weighted maximum likelihood estimation framework that is appropriate for microbiological samples collected with unequal probabilities of selection. Two examples, based on the collection of food samples during processing, are provided to demonstrate the method and highlight the magnitude of biases in the maximum likelihood estimator when data are inappropriately treated as a simple random sample. Failure to properly weight samples to account for how data are collected can introduce substantial biases into inferences drawn from the data. The proposed methodology will reduce or eliminate an important source of bias in inferences drawn from the analysis of microbial data. This will also make comparisons between studies and the combination of results from different studies more reliable, which is important for risk assessment applications. © 2012 No claim to US Government works.
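
    For a distribution whose weighted MLE has a closed form, the idea is easy to demonstrate. The sketch below assumes an exponential contamination level and a hypothetical size-biased design in which heavily contaminated units are likelier to be sampled; the unweighted estimator is then biased upward, while weighting by 1/π_i recovers the population mean:

```python
import random

def weighted_mle_exp_mean(xs, sel_probs):
    """Weighted (pseudo-)maximum-likelihood estimate of an exponential
    mean: each observation carries weight 1/pi_i, the inverse of its
    selection probability; the weighted MLE is the weighted mean."""
    w = [1.0 / p for p in sel_probs]
    return sum(wi * xi for wi, xi in zip(w, xs)) / sum(w)

rng = random.Random(3)
pop = [rng.expovariate(0.5) for _ in range(20000)]   # true mean 2.0

# assumed design: selection probability roughly proportional to the level x
sample, probs = [], []
for x in pop:
    pi = min(1.0, max(0.05, x / 8.0))
    if rng.random() < pi:
        sample.append(x)
        probs.append(pi)

naive = sum(sample) / len(sample)               # treats data as a SRS: biased up
weighted = weighted_mle_exp_mean(sample, probs)  # close to the true mean 2.0
```
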

  15. Influence of item distribution pattern and abundance on efficiency of benthic core sampling

    USGS Publications Warehouse

    Behney, Adam C.; O'Shaughnessy, Ryan; Eichholz, Michael W.; Stafford, Joshua D.

    2014-01-01

    Core sampling is a commonly used method to estimate benthic item density, but little information exists about factors influencing the accuracy and time-efficiency of this method. We simulated core sampling in a Geographic Information System framework by generating points (benthic items) and polygons (core samplers) to assess how sample size (number of core samples), core sampler size (cm2), distribution of benthic items, and item density affected the bias and precision of estimates of density, the detection probability of items, and the time-costs. When items were distributed randomly versus clumped, bias decreased and precision increased with increasing sample size and increased slightly with increasing core sampler size. Bias and precision were only affected by benthic item density at very low values (500–1,000 items/m2). Detection probability (the probability of capturing ≥ 1 item in a core sample if it is available for sampling) was substantially greater when items were distributed randomly as opposed to clumped. Taking more small diameter core samples was always more time-efficient than taking fewer large diameter samples. We are unable to present a single, optimal sample size, but provide information for researchers and managers to derive optimal sample sizes dependent on their research goals and environmental conditions.
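
    For randomly (Poisson) distributed items, the detection probability the abstract defines has a closed form; a quick sketch with illustrative numbers at the low-density threshold the study identifies:

```python
import math

def core_detection_prob(items_per_m2, core_area_cm2):
    """P(a core captures >= 1 item) for Poisson-distributed items:
    1 - exp(-density * area)."""
    return 1.0 - math.exp(-items_per_m2 * core_area_cm2 / 10000.0)

p_small = core_detection_prob(500, 25)   # one 25 cm^2 core at 500 items/m^2
p_large = core_detection_prob(500, 50)   # one 50 cm^2 core, same density
```

    Under the Poisson model, two small cores match one large core of the same total area in detection terms (1 − (1 − p_small)² equals p_large), so the study's finding that many small cores are preferable comes down to time-cost rather than detection.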

  16. Improved Compressive Sensing of Natural Scenes Using Localized Random Sampling

    PubMed Central

    Barranca, Victor J.; Kovačič, Gregor; Zhou, Douglas; Cai, David

    2016-01-01

    Compressive sensing (CS) theory demonstrates that by using uniformly-random sampling, rather than uniformly-spaced sampling, higher quality image reconstructions are often achievable. Considering that the structure of sampling protocols has such a profound impact on the quality of image reconstructions, we formulate a new sampling scheme motivated by physiological receptive field structure, localized random sampling, which yields significantly improved CS image reconstructions. For each set of localized image measurements, our sampling method first randomly selects an image pixel and then measures its nearby pixels with probability depending on their distance from the initially selected pixel. We compare the uniformly-random and localized random sampling methods over a large space of sampling parameters, and show that, for the optimal parameter choices, higher quality image reconstructions can be consistently obtained by using localized random sampling. In addition, we argue that the localized random CS optimal parameter choice is stable with respect to diverse natural images, and scales with the number of samples used for reconstruction. We expect that the localized random sampling protocol helps to explain the evolutionarily advantageous nature of receptive field structure in visual systems and suggests several future research areas in CS theory and its application to brain imaging. PMID:27555464

  17. Latin Hypercube Sampling (LHS) UNIX Library/Standalone

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    2004-05-13

    The LHS UNIX Library/Standalone software provides the capability to draw random samples from over 30 distribution types. It performs the sampling by a stratified sampling method called Latin Hypercube Sampling (LHS). Multiple distributions can be sampled simultaneously, with user-specified correlations amongst the input distributions; LHS UNIX Library/Standalone thus provides a way to generate multi-variate samples. The LHS samples can be generated either via a callable library (e.g., from within the DAKOTA software framework) or as a standalone capability. LHS is a constrained Monte Carlo sampling scheme. In LHS, the range of each variable is divided into non-overlapping intervals on the basis of equal probability. A sample is selected at random with respect to the probability density in each interval. If multiple variables are sampled simultaneously, then the values obtained for each are paired in a random manner with the n values of the other variables. In some cases, the pairing is restricted to obtain specified correlations amongst the input variables. Many simulation codes have input parameters that are uncertain and can be specified by a distribution. To perform uncertainty analysis and sensitivity analysis, random values are drawn from the input parameter distributions, and the simulation is run with these values to obtain output values. If this is done repeatedly, with many input samples drawn, one can build up a distribution of the output as well as examine correlations between input and output variables.
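
    The stratify-then-pair scheme described above can be sketched in a few lines (a minimal illustration on the unit hypercube, not the LHS library itself):

```python
import random

def latin_hypercube(n_samples, n_vars, seed=4):
    """Latin hypercube sample on [0,1)^d: each variable's range is split
    into n equal-probability intervals, one point is drawn uniformly
    within each interval, and the columns are then shuffled (paired)
    independently at random."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_vars):
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    return list(zip(*columns))   # one row per sample point

pts = latin_hypercube(10, 2)
```

    Mapping each coordinate through a distribution's inverse CDF turns these uniform strata into equal-probability strata of that distribution, which is how arbitrary marginals are handled.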

  18. Multipartite nonlocality and random measurements

    NASA Astrophysics Data System (ADS)

    de Rosier, Anna; Gruca, Jacek; Parisio, Fernando; Vértesi, Tamás; Laskowski, Wiesław

    2017-07-01

    We present an exhaustive numerical analysis of violations of local realism by families of multipartite quantum states. As an indicator of nonclassicality we employ the probability of violation for randomly sampled observables. Surprisingly, it rapidly increases with the number of parties or settings, and even for relatively small values of these, local realism is violated for almost all observables. We have observed this effect to be typical in the sense that it emerged for all investigated states including some with randomly drawn coefficients. We also present the probability of violation as a witness of genuine multipartite entanglement.

  19. Views of United States Physicians and Members of the American Medical Association House of Delegates on Physician-assisted Suicide

    PubMed Central

    Whitney, Simon N; Brown, Byron W; Brody, Howard; Alcser, Kirsten H; Bachman, Jerald G; Greely, Henry T

    2001-01-01

    OBJECTIVE To ascertain the views of physicians and physician leaders toward the legalization of physician-assisted suicide. DESIGN Confidential mail questionnaire. PARTICIPANTS A nationwide random sample of physicians of all ages and specialties, and all members of the American Medical Association (AMA) House of Delegates as of April 1996. MEASUREMENTS Demographic and practice characteristics and attitude toward legalization of physician-assisted suicide. MAIN RESULTS Usable questionnaires were returned by 658 of 930 eligible physicians in the nationwide random sample (71%) and 315 of 390 eligible physicians in the House of Delegates (81%). In the nationwide random sample, 44.5% favored legalization (16.4% definitely and 28.1% probably), 33.9% opposed legalization (20.4% definitely and 13.5% probably), and 22% were unsure. Opposition to legalization was strongly associated with self-defined politically conservative beliefs, religious affiliation, and the importance of religion to the respondent (P < .001). Among members of the AMA House of Delegates, 23.5% favored legalization (7.3% definitely and 16.2% probably), 61.6% opposed legalization (43.5% definitely and 18.1% probably), and 15% were unsure; their views differed significantly from those of the nationwide random sample (P < .001). Given the choice, a majority of both groups would prefer no law at all, with physician-assisted suicide being neither legal nor illegal. CONCLUSIONS Members of the AMA House of Delegates strongly oppose physician-assisted suicide, but rank-and-file physicians show no consensus either for or against its legalization. Although the debate is sometimes adversarial, most physicians in the United States are uncertain or endorse moderate views on assisted suicide. PMID:11359546

  20. Accounting for randomness in measurement and sampling in studying cancer cell population dynamics.

    PubMed

    Ghavami, Siavash; Wolkenhauer, Olaf; Lahouti, Farshad; Ullah, Mukhtar; Linnebacher, Michael

    2014-10-01

    Knowing the expected temporal evolution of the proportion of different cell types in sample tissues gives an indication about the progression of the disease and its possible response to drugs. Such systems have been modelled using Markov processes. We here consider an experimentally realistic scenario in which transition probabilities are estimated from noisy cell population size measurements. Using aggregated data of FACS measurements, we develop MMSE and ML estimators and formulate two problems to find the minimum number of required samples and measurements to guarantee the accuracy of predicted population sizes. Our numerical results show that the convergence mechanism of transition probabilities and steady states differ widely from the real values if one uses the standard deterministic approach for noisy measurements. This provides support for our argument that for the analysis of FACS data one should consider the observed state as a random variable. The second problem we address concerns the consequences of estimating the probability of a cell being in a particular state from measurements of a small population of cells. We show how the uncertainty arising from small sample sizes can be captured by a distribution for the state probability.

  1. The Evaluation of Bias of the Weighted Random Effects Model Estimators. Research Report. ETS RR-11-13

    ERIC Educational Resources Information Center

    Jia, Yue; Stokes, Lynne; Harris, Ian; Wang, Yan

    2011-01-01

    Estimation of parameters of random effects models from samples collected via complex multistage designs is considered. One way to reduce estimation bias due to unequal probabilities of selection is to incorporate sampling weights. Many researchers have proposed various weighting methods (Korn & Graubard, 2003; Pfeffermann, Skinner,…

  2. Optimized probability sampling of study sites to improve generalizability in a multisite intervention trial.

    PubMed

    Kraschnewski, Jennifer L; Keyserling, Thomas C; Bangdiwala, Shrikant I; Gizlice, Ziya; Garcia, Beverly A; Johnston, Larry F; Gustafson, Alison; Petrovic, Lindsay; Glasgow, Russell E; Samuel-Hodge, Carmen D

    2010-01-01

    Studies of type 2 translation, the adaptation of evidence-based interventions to real-world settings, should include representative study sites and staff to improve external validity. Sites for such studies are, however, often selected by convenience sampling, which limits generalizability. We used an optimized probability sampling protocol to select an unbiased, representative sample of study sites to prepare for a randomized trial of a weight loss intervention. We invited North Carolina health departments within 200 miles of the research center to participate (N = 81). Of the 43 health departments that were eligible, 30 were interested in participating. To select a representative and feasible sample of 6 health departments that met inclusion criteria, we generated all combinations of 6 from the 30 health departments that were eligible and interested. From the subset of combinations that met inclusion criteria, we selected 1 at random. Of 593,775 possible combinations of 6 counties, 15,177 (3%) met inclusion criteria. Sites in the selected subset were similar to all eligible sites in terms of health department characteristics and county demographics. Optimized probability sampling improved generalizability by ensuring an unbiased and representative sample of study sites.
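
    The enumerate-filter-draw protocol is small enough to sketch directly. The staffing levels and inclusion criterion below are hypothetical stand-ins for the study's county-level criteria; only the counts (30 interested sites, subsets of 6, C(30,6) = 593,775 combinations) mirror the abstract:

```python
import itertools
import random

def sample_eligible_combination(site_ids, k, meets_criteria, seed=5):
    """Enumerate every k-subset of interested sites, keep those meeting
    the inclusion criteria, then draw one subset uniformly at random."""
    eligible = [c for c in itertools.combinations(site_ids, k)
                if meets_criteria(c)]
    return random.Random(seed).choice(eligible), len(eligible)

# staff[i] is an assumed staffing level for interested county i
staff = [3 + i % 10 for i in range(30)]
chosen, n_eligible = sample_eligible_combination(
    range(30), 6, lambda c: sum(staff[i] for i in c) >= 55)
```

    Because the draw is uniform over all feasible subsets, every eligible site enters with a known, nonzero selection probability, which is what distinguishes this from convenience selection.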

  3. Random function representation of stationary stochastic vector processes for probability density evolution analysis of wind-induced structures

    NASA Astrophysics Data System (ADS)

    Liu, Zhangjun; Liu, Zenghui

    2018-06-01

    This paper develops a hybrid approach of spectral representation and random function for simulating stationary stochastic vector processes. In the proposed approach, the high-dimensional random variables included in the original spectral representation (OSR) formula can be effectively reduced to only two elementary random variables by introducing random functions that serve as random constraints. Based on this, a satisfactory simulation accuracy can be guaranteed by selecting a small representative point set of the elementary random variables. The probability information of the stochastic excitations is fully captured by just a few hundred sample functions generated by the proposed approach. Therefore, combined with the probability density evolution method (PDEM), the approach can support dynamic response analysis and reliability assessment of engineering structures. For illustrative purposes, a stochastic turbulence wind velocity field acting on a frame-shear-wall structure is simulated by constructing three types of random functions to demonstrate the accuracy and efficiency of the proposed approach. Careful and in-depth studies concerning the probability density evolution analysis of the wind-induced structure have been conducted so as to better illustrate the application prospects of the proposed approach. Numerical examples also show that the proposed approach is robust.

  4. Assessing Performance Tradeoffs in Undersea Distributed Sensor Networks

    DTIC Science & Technology

    2006-09-01

    time. We refer to this process as track-before-detect (see [5] for a description), since the final determination of a target presence is not made until...expressions for probability of successful search and probability of false search for modeling the track-before-detect process. We then describe a numerical...random manner (randomly sampled from a uniform distribution). II. SENSOR NETWORK PERFORMANCE MODELS We model the process of track-before-detect by

  5. Misrepresenting random sampling? A systematic review of research papers in the Journal of Advanced Nursing.

    PubMed

    Williamson, Graham R

    2003-11-01

    This paper discusses the theoretical limitations of the use of random sampling and probability theory in the production of a significance level (or P-value) in nursing research. Potential alternatives, in the form of randomization tests, are proposed. Research papers in nursing, medicine and psychology frequently misrepresent their statistical findings, as the P-values reported assume random sampling. In this systematic review of studies published between January 1995 and June 2002 in the Journal of Advanced Nursing, 89 (68%) studies broke this assumption because they used convenience samples or entire populations. As a result, some of the findings may be questionable. The key ideas of random sampling and probability theory for statistical testing (for generating a P-value) are outlined. The result of a systematic review of research papers published in the Journal of Advanced Nursing is then presented, showing how frequently random sampling appears to have been misrepresented. Useful alternative techniques that might overcome these limitations are then discussed. REVIEW LIMITATIONS: This review is limited in scope because it is applied to one journal, and so the findings cannot be generalized to other nursing journals or to nursing research in general. However, it is possible that other nursing journals are also publishing research articles based on the misrepresentation of random sampling. The review is also limited because in several of the articles the sampling method was not completely clearly stated, and in this circumstance a judgment has been made as to the sampling method employed, based on the indications given by author(s). Quantitative researchers in nursing should be very careful that the statistical techniques they use are appropriate for the design and sampling methods of their studies. If the techniques they employ are not appropriate, they run the risk of misinterpreting findings by using inappropriate, unrepresentative and biased samples.
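
    The randomization-test alternative the review proposes can be sketched directly; the P-value comes from relabelling the observed units rather than from a random-sampling assumption, so it is valid for convenience samples (the data below are illustrative):

```python
import random

def randomization_test(group_a, group_b, n_perm=10000, seed=6):
    """Two-sample randomization test: the P-value is the proportion of
    random relabellings whose absolute mean difference is at least as
    large as the observed one."""
    rng = random.Random(seed)
    mean = lambda xs: sum(xs) / len(xs)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(mean(pooled[:n_a]) - mean(pooled[n_a:])) >= observed:
            hits += 1
    return hits / n_perm

p = randomization_test([5.1, 4.9, 5.6, 5.8, 5.2], [4.1, 4.3, 3.9, 4.4, 4.8])
```

    The inference is then about the assignment of labels within the observed sample, not about a population the sample was never randomly drawn from, which is exactly the distinction the review argues the surveyed papers blur.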

  6. Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap.

    PubMed

    Zhou, Hanzhi; Elliott, Michael R; Raghunathan, Trivellore E

    2016-06-01

    Multistage sampling is often employed in survey samples for cost and convenience. However, accounting for clustering features when generating datasets for multiple imputation is a nontrivial task, particularly when, as is often the case, cluster sampling is accompanied by unequal probabilities of selection, necessitating case weights. Thus, multiple imputation often ignores complex sample designs and assumes simple random sampling when generating imputations, even though failing to account for complex sample design features is known to yield biased estimates and confidence intervals that have incorrect nominal coverage. In this article, we extend a recently developed, weighted, finite-population Bayesian bootstrap procedure to generate synthetic populations conditional on complex sample design data that can be treated as simple random samples at the imputation stage, obviating the need to directly model design features for imputation. We develop two forms of this method: one where the probabilities of selection are known at the first and second stages of the design, and the other, more common in public use files, where only the final weight based on the product of the two probabilities is known. We show that this method has advantages in terms of bias, mean square error, and coverage properties over methods where sample designs are ignored, with little loss in efficiency, even when compared with correct fully parametric models. An application is made using the National Automotive Sampling System Crashworthiness Data System, a multistage, unequal probability sample of U.S. passenger vehicle crashes, which suffers from a substantial amount of missing data in "Delta-V," a key crash severity measure.

  7. Multiple Imputation in Two-Stage Cluster Samples Using The Weighted Finite Population Bayesian Bootstrap

    PubMed Central

    Zhou, Hanzhi; Elliott, Michael R.; Raghunathan, Trivellore E.

    2017-01-01

    Multistage sampling is often employed in survey samples for cost and convenience. However, accounting for clustering features when generating datasets for multiple imputation is a nontrivial task, particularly when, as is often the case, cluster sampling is accompanied by unequal probabilities of selection, necessitating case weights. Thus, multiple imputation often ignores complex sample designs and assumes simple random sampling when generating imputations, even though failing to account for complex sample design features is known to yield biased estimates and confidence intervals that have incorrect nominal coverage. In this article, we extend a recently developed, weighted, finite-population Bayesian bootstrap procedure to generate synthetic populations conditional on complex sample design data that can be treated as simple random samples at the imputation stage, obviating the need to directly model design features for imputation. We develop two forms of this method: one where the probabilities of selection are known at the first and second stages of the design, and the other, more common in public use files, where only the final weight based on the product of the two probabilities is known. We show that this method has advantages in terms of bias, mean square error, and coverage properties over methods where sample designs are ignored, with little loss in efficiency, even when compared with correct fully parametric models. An application is made using the National Automotive Sampling System Crashworthiness Data System, a multistage, unequal probability sample of U.S. passenger vehicle crashes, which suffers from a substantial amount of missing data in “Delta-V,” a key crash severity measure. PMID:29226161

  8. Detection of mastitis in dairy cattle by use of mixture models for repeated somatic cell scores: a Bayesian approach via Gibbs sampling.

    PubMed

    Odegård, J; Jensen, J; Madsen, P; Gianola, D; Klemetsdal, G; Heringstad, B

    2003-11-01

    The distribution of somatic cell scores could be regarded as a mixture of at least two components depending on a cow's udder health status. A heteroscedastic two-component Bayesian normal mixture model with random effects was developed and implemented via Gibbs sampling. The model was evaluated using datasets consisting of simulated somatic cell score records. Somatic cell score was simulated as a mixture representing two alternative udder health statuses ("healthy" or "diseased"). Animals were assigned randomly to the two components according to the probability of group membership (Pm). Random effects (additive genetic and permanent environment), when included, had identical distributions across mixture components. Posterior probabilities of putative mastitis were estimated for all observations, and model adequacy was evaluated using measures of sensitivity, specificity, and posterior probability of misclassification. Fitting different residual variances in the two mixture components caused some bias in estimation of parameters. When the components were difficult to disentangle, so were their residual variances, causing bias in estimation of Pm and of location parameters of the two underlying distributions. When all variance components were identical across mixture components, the mixture model analyses returned parameter estimates essentially without bias and with a high degree of precision. Including random effects in the model increased the probability of correct classification substantially. No sizable differences in probability of correct classification were found between models in which a single cow effect (ignoring relationships) was fitted and models where this effect was split into genetic and permanent environmental components, utilizing relationship information. When genetic and permanent environmental effects were fitted, the between-replicate variance of estimates of posterior means was smaller because the model accounted for random genetic drift.
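    The backbone of such an analysis is a two-component Gibbs sampler that alternates between component labels, component means, and the probability of group membership. A stripped-down homoscedastic sketch without the paper's genetic and permanent environmental effects (all numbers illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# simulated somatic-cell-score-style data: a "healthy" and a "diseased"
# component (means, variances, and mixing proportion are illustrative)
y = np.concatenate([rng.normal(3.0, 1.0, 300), rng.normal(6.0, 1.0, 100)])

mu = np.array([2.0, 7.0])   # initial component means
pm = 0.5                    # initial probability of "diseased" membership
for _ in range(500):
    # 1) sample component labels given means and mixing proportion
    d0 = (1 - pm) * np.exp(-0.5 * (y - mu[0]) ** 2)
    d1 = pm * np.exp(-0.5 * (y - mu[1]) ** 2)
    z = rng.random(len(y)) < d1 / (d0 + d1)   # True -> "diseased"
    # 2) sample means given labels (unit residual variance, flat prior)
    for k, mask in enumerate([~z, z]):
        if mask.any():
            mu[k] = rng.normal(y[mask].mean(), 1 / np.sqrt(mask.sum()))
    # 3) sample mixing proportion (uniform prior -> Beta full conditional)
    pm = rng.beta(1 + z.sum(), 1 + (~z).sum())

print(np.round(mu, 1), round(pm, 2))
```

    The posterior probability of putative mastitis for an observation corresponds to the long-run frequency with which its label z is drawn as "diseased" across iterations.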

  9. Probability machines: consistent probability estimation using nonparametric learning machines.

    PubMed

    Malley, J D; Kruppa, J; Dasgupta, A; Malley, K G; Ziegler, A

    2012-01-01

    Most machine learning approaches only provide a classification for binary responses. However, probabilities are required for risk estimation using individual patient characteristics. It has been shown recently that every statistical learning machine known to be consistent for a nonparametric regression problem is a probability machine that is provably consistent for this estimation problem. The aim of this paper is to show how random forests and nearest neighbors can be used for consistent estimation of individual probabilities. Two random forest algorithms and two nearest neighbor algorithms are described in detail for estimation of individual probabilities. We discuss the consistency of random forests, nearest neighbors and other learning machines in detail. We conduct a simulation study to illustrate the validity of the methods. We exemplify the algorithms by analyzing two well-known data sets on the diagnosis of appendicitis and the diagnosis of diabetes in Pima Indians. Simulations demonstrate the validity of the method. With the real data application, we show the accuracy and practicality of this approach. We provide sample code from R packages in which the probability estimation is already available. This means that all calculations can be performed using existing software. Random forest algorithms as well as nearest neighbor approaches are valid machine learning methods for estimating individual probabilities for binary responses. Freely available implementations are available in R and may be used for applications.
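    The paper's sample code is for R packages; the nearest-neighbor variant is simple enough to sketch directly in Python, using simulated data with a known logistic truth so the probability estimates can be checked (all settings below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
# simulated binary responses with a known logistic truth (illustrative)
X = rng.uniform(-2, 2, size=(3000, 2))
p_true = 1 / (1 + np.exp(-(X[:, 0] + X[:, 1])))
y = (rng.random(3000) < p_true).astype(float)

def knn_probability(x0, X, y, k=100):
    """Nearest-neighbor probability machine: estimate P(y=1 | x0) as the
    class frequency among the k nearest training points."""
    d = np.sum((X - x0) ** 2, axis=1)
    return y[np.argsort(d)[:k]].mean()

# compare estimated and true probabilities at fresh query points
queries = rng.uniform(-1.5, 1.5, size=(200, 2))
p_hat = np.array([knn_probability(q, X, y) for q in queries])
mae = np.abs(p_hat - 1 / (1 + np.exp(-(queries[:, 0] + queries[:, 1])))).mean()
print(round(mae, 3))
```

    The random forest version rests on the same idea: each terminal node acts as an adaptive neighborhood, and averaging class frequencies over trees yields the probability estimate.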

  10. Optimal estimation for discrete time jump processes

    NASA Technical Reports Server (NTRS)

    Vaca, M. V.; Tretter, S. A.

    1977-01-01

    Optimum estimates of nonobservable random variables or random processes which influence the rate functions of a discrete time jump process (DTJP) are obtained. The approach is based on the a posteriori probability of a nonobservable event expressed in terms of the a priori probability of that event and of the sample function probability of the DTJP. A general representation for optimum estimates and recursive equations for minimum mean squared error (MMSE) estimates are obtained. MMSE estimates are nonlinear functions of the observations. The problem of estimating the rate of a DTJP when the rate is a random variable with a probability density function of the form cx^k(1-x)^m is considered, and it is shown that the MMSE estimates are linear in this case. This class of density functions explains why there are insignificant differences between optimum unconstrained and linear MMSE estimates in a variety of problems.

  11. Optimal estimation for discrete time jump processes

    NASA Technical Reports Server (NTRS)

    Vaca, M. V.; Tretter, S. A.

    1978-01-01

    Optimum estimates of nonobservable random variables or random processes which influence the rate functions of a discrete time jump process (DTJP) are derived. The approach used is based on the a posteriori probability of a nonobservable event expressed in terms of the a priori probability of that event and of the sample function probability of the DTJP. Thus a general representation is obtained for optimum estimates, and recursive equations are derived for minimum mean-squared error (MMSE) estimates. In general, MMSE estimates are nonlinear functions of the observations. The problem of estimating the rate of a DTJP when the rate is a random variable with a beta probability density function and the jump amplitudes are binomially distributed is considered. It is shown that the MMSE estimates are linear. The class of beta density functions is rather rich and explains why there are insignificant differences between optimum unconstrained and linear MMSE estimates in a variety of problems.
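    The linearity result can be checked numerically: with a Beta(a, b) prior on the rate and a binomially distributed count of jumps, conjugacy gives the posterior mean (a + k)/(a + b + n), which is linear in the observed count k (the parameter values below are arbitrary):

```python
from fractions import Fraction

# rate p ~ Beta(a, b); k jumps observed in n binomially distributed trials
a, b, n = 2, 3, 10

def mmse_estimate(k):
    # conjugacy: the posterior is Beta(a + k, b + n - k), so the MMSE
    # estimate (posterior mean) is (a + k) / (a + b + n)
    return Fraction(a + k, a + b + n)

estimates = [mmse_estimate(k) for k in range(n + 1)]
# linear in k: the first difference is the constant 1 / (a + b + n)
first_diffs = {estimates[k + 1] - estimates[k] for k in range(n)}
print(first_diffs)
```
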

  12. Pigeons' Choices between Fixed-Interval and Random-Interval Schedules: Utility of Variability?

    ERIC Educational Resources Information Center

    Andrzejewski, Matthew E.; Cardinal, Claudia D.; Field, Douglas P.; Flannery, Barbara A.; Johnson, Michael; Bailey, Kathleen; Hineline, Philip N.

    2005-01-01

    Pigeons' choosing between fixed-interval and random-interval schedules of reinforcement was investigated in three experiments using a discrete-trial procedure. In all three experiments, the random-interval schedule was generated by sampling a probability distribution at an interval (and in multiples of the interval) equal to that of the…

  13. Early morning urine collection to improve urinary lateral flow LAM assay sensitivity in hospitalised patients with HIV-TB co-infection.

    PubMed

    Gina, Phindile; Randall, Philippa J; Muchinga, Tapuwa E; Pooran, Anil; Meldau, Richard; Peter, Jonny G; Dheda, Keertan

    2017-05-12

    Urine LAM testing has been approved by the WHO for use in hospitalised patients with advanced immunosuppression. However, sensitivity remains suboptimal. We therefore examined the incremental diagnostic sensitivity of early morning urine (EMU) versus random urine sampling using the Determine® lateral flow lipoarabinomannan assay (LF-LAM) in HIV-TB co-infected patients. Consenting HIV-infected inpatients, screened as part of a larger prospective randomized controlled trial, who were treated for TB and could donate matched random and EMU samples were included. Paired samples were thus collected from the same patient. LF-LAM was graded using the pre-January 2014 manufacturer-designated grade 1 and grade 2 cut-points (the latter designated grade 1 after January 2014). Single sputum Xpert-MTB/RIF and/or TB culture positivity served as the reference standard (definite TB). Those treated for TB but not meeting this standard were designated probable TB. 123 HIV-infected patients commenced anti-TB treatment and provided matched random and EMU samples. 33% (41/123) and 67% (82/123) had definite and probable TB, respectively. Amongst those with definite TB, LF-LAM sensitivity (95%CI), using the grade 2 cut-point, increased from 12% (5-24; 5/43) to 39% (26-54; 16/41) with random versus EMU sampling, respectively (p = 0.005). Similarly, amongst probable TB, LF-LAM sensitivity increased from 10% (5-17; 8/83) to 24% (16-34; 20/82) (p = 0.001). LF-LAM specificity was not determined. This proof of concept study indicates that EMU could improve the sensitivity of LF-LAM in hospitalised TB-HIV co-infected patients. These data have implications for clinical practice.

  14. A nonparametric method to generate synthetic populations to adjust for complex sampling design features.

    PubMed

    Dong, Qi; Elliott, Michael R; Raghunathan, Trivellore E

    2014-06-01

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs.

  15. A nonparametric method to generate synthetic populations to adjust for complex sampling design features

    PubMed Central

    Dong, Qi; Elliott, Michael R.; Raghunathan, Trivellore E.

    2017-01-01

    Outside of the survey sampling literature, samples are often assumed to be generated by a simple random sampling process that produces independent and identically distributed (IID) samples. Many statistical methods are developed largely in this IID world. Application of these methods to data from complex sample surveys without making allowance for the survey design features can lead to erroneous inferences. Hence, much time and effort have been devoted to developing statistical methods to analyze complex survey data and account for the sample design. This issue is particularly important when generating synthetic populations using finite population Bayesian inference, as is often done in missing data or disclosure risk settings, or when combining data from multiple surveys. By extending previous work in the finite population Bayesian bootstrap literature, we propose a method to generate synthetic populations from a posterior predictive distribution in a fashion that inverts the complex sampling design features and generates simple random samples from a superpopulation point of view, adjusting the complex data so that they can be analyzed as simple random samples. We consider a simulation study with a stratified, clustered unequal-probability of selection sample design, and use the proposed nonparametric method to generate synthetic populations for the 2006 National Health Interview Survey (NHIS) and the Medical Expenditure Panel Survey (MEPS), which are stratified, clustered unequal-probability of selection sample designs. PMID:29200608

  16. APPLICATION OF A MULTIPURPOSE UNEQUAL-PROBABILITY STREAM SURVEY IN THE MID-ATLANTIC COASTAL PLAIN

    EPA Science Inventory

    A stratified random sample with unequal-probability selection was used to design a multipurpose survey of headwater streams in the Mid-Atlantic Coastal Plain. Objectives for data from the survey include unbiased estimates of regional stream conditions, and adequate coverage of un...

  17. Large Deviations: Advanced Probability for Undergrads

    ERIC Educational Resources Information Center

    Rolls, David A.

    2007-01-01

    In the branch of probability called "large deviations," rates of convergence (e.g. of the sample mean) are considered. The theory makes use of the moment generating function. So, particularly for sums of independent and identically distributed random variables, the theory can be made accessible to senior undergraduates after a first course in…

  18. (I Can't Get No) Saturation: A simulation and guidelines for sample sizes in qualitative research.

    PubMed

    van Rijnsoever, Frank J

    2017-01-01

    I explore the sample size in qualitative research that is required to reach theoretical saturation. I conceptualize a population as consisting of sub-populations that contain different types of information sources that hold a number of codes. Theoretical saturation is reached after all the codes in the population have been observed once in the sample. I delineate three different scenarios to sample information sources: "random chance," which is based on probability sampling, "minimal information," which yields at least one new code per sampling step, and "maximum information," which yields the largest number of new codes per sampling step. Next, I use simulations to assess the minimum sample size for each scenario for systematically varying hypothetical populations. I show that theoretical saturation is more dependent on the mean probability of observing codes than on the number of codes in a population. Moreover, the minimal and maximal information scenarios are significantly more efficient than random chance, but yield fewer repetitions per code to validate the findings. I formulate guidelines for purposive sampling and recommend that researchers follow a minimum information scenario.
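    The "random chance" scenario is essentially a coupon-collector process, and the dependence of saturation on the mean probability of observing a code is easy to reproduce in a small simulation (the population settings below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def steps_to_saturation(code_probs, max_steps=10_000):
    """'Random chance' scenario: sample sources until every code in the
    population has been observed at least once."""
    seen = np.zeros(len(code_probs), dtype=bool)
    for step in range(1, max_steps + 1):
        seen |= rng.random(len(code_probs)) < code_probs
        if seen.all():
            return step
    return max_steps

# two hypothetical populations with the same number of codes but a
# different mean probability of observing any code in a single source
easy = np.full(30, 0.30)
hard = np.full(30, 0.05)
n_easy = np.mean([steps_to_saturation(easy) for _ in range(200)])
n_hard = np.mean([steps_to_saturation(hard) for _ in range(200)])
print(round(n_easy, 1), round(n_hard, 1))
```

    Holding the number of codes fixed while lowering the observation probability inflates the required sample size sharply, consistent with the paper's conclusion.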

  19. Evaluation of a Class of Simple and Effective Uncertainty Methods for Sparse Samples of Random Variables and Functions

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Romero, Vicente; Bonney, Matthew; Schroeder, Benjamin

    When very few samples of a random quantity are available from a source distribution of unknown shape, it is usually not possible to accurately infer the exact distribution from which the data samples come. Under-estimation of important quantities such as response variance and failure probabilities can result. For many engineering purposes, including design and risk analysis, we attempt to avoid under-estimation with a strategy to conservatively estimate (bound) these types of quantities -- without being overly conservative -- when only a few samples of a random quantity are available from model predictions or replicate experiments. This report examines a class of related sparse-data uncertainty representation and inference approaches that are relatively simple, inexpensive, and effective. Tradeoffs between the methods' conservatism, reliability, and risk versus number of data samples (cost) are quantified with multi-attribute metrics used to assess method performance for conservative estimation of two representative quantities: central 95% of response; and 10^-4 probability of exceeding a response threshold in a tail of the distribution. Each method's performance is characterized with 10,000 random trials on a large number of diverse and challenging distributions. The best method and number of samples to use in a given circumstance depends on the uncertainty quantity to be estimated, the PDF character, and the desired reliability of bounding the true value. On the basis of this large data base and study, a strategy is proposed for selecting the method and number of samples for attaining reasonable credibility levels in bounding these types of quantities when sparse samples of random variables or functions are available from experiments or simulations.

  20. THREE-PEE SAMPLING THEORY and program 'THRP' for computer generation of selection criteria

    Treesearch

    L. R. Grosenbaugh

    1965-01-01

    Theory necessary for sampling with probability proportional to prediction ('three-pee,' or '3P,' sampling) is first developed and then exemplified by numerical comparisons of several estimators. Program 'THRP' for computer generation of appropriate 3P-sample-selection criteria is described, and convenient random integer dispensers are...

  1. The Influence of Mark-Recapture Sampling Effort on Estimates of Rock Lobster Survival

    PubMed Central

    Kordjazi, Ziya; Frusher, Stewart; Buxton, Colin; Gardner, Caleb; Bird, Tomas

    2016-01-01

    Five annual capture-mark-recapture surveys on Jasus edwardsii were used to evaluate the effect of sample size and fishing effort on the precision of estimated survival probability. Datasets of different numbers of individual lobsters (ranging from 200 to 1,000 lobsters) were created by random subsampling from each annual survey. This process of random subsampling was also used to create 12 datasets of different levels of effort based on three levels of the number of traps (15, 30 and 50 traps per day) and four levels of the number of sampling-days (2, 4, 6 and 7 days). The most parsimonious Cormack-Jolly-Seber (CJS) model for estimating survival probability shifted from a constant model towards sex-dependent models with increasing sample size and effort. A sample of 500 lobsters or 50 traps used on four consecutive sampling-days was required for obtaining precise survival estimations for males and females, separately. Reduced sampling effort of 30 traps over four sampling days was sufficient if a survival estimate for both sexes combined was sufficient for management of the fishery. PMID:26990561

  2. Sampling guidelines for oral fluid-based surveys of group-housed animals.

    PubMed

    Rotolo, Marisa L; Sun, Yaxuan; Wang, Chong; Giménez-Lirola, Luis; Baum, David H; Gauger, Phillip C; Harmon, Karen M; Hoogland, Marlin; Main, Rodger; Zimmerman, Jeffrey J

    2017-09-01

    Formulas and software for calculating sample size for surveys based on individual animal samples are readily available. However, sample size formulas are not available for oral fluids and other aggregate samples that are increasingly used in production settings. Therefore, the objective of this study was to develop sampling guidelines for oral fluid-based porcine reproductive and respiratory syndrome virus (PRRSV) surveys in commercial swine farms. Oral fluid samples were collected in 9 weekly samplings from all pens in 3 barns on one production site beginning shortly after placement of weaned pigs. Samples (n=972) were tested by real-time reverse-transcription PCR (RT-rtPCR) and the binary results analyzed using a piecewise exponential survival model for interval-censored, time-to-event data with misclassification. Thereafter, simulation studies were used to study the barn-level probability of PRRSV detection as a function of sample size, sample allocation (simple random sampling vs fixed spatial sampling), assay diagnostic sensitivity and specificity, and pen-level prevalence. These studies provided estimates of the probability of detection by sample size and within-barn prevalence. Detection using fixed spatial sampling was as good as, or better than, simple random sampling. Sampling multiple barns on a site increased the probability of detection with the number of barns sampled. These results are relevant to PRRSV control or elimination projects at the herd, regional, or national levels, but the results are also broadly applicable to contagious pathogens of swine for which oral fluid tests of equivalent performance are available. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
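    The simulation logic behind such probability-of-detection curves can be sketched for the simple-random-sampling arm (the pen counts, prevalence, and test accuracy below are hypothetical, not the study's fitted values):

```python
import numpy as np

rng = np.random.default_rng(0)

def p_detect(n_sampled, n_pens=48, prev=0.10, se=0.95, sp=1.0, reps=5000):
    """Barn-level probability that at least one sampled pen's oral fluid
    tests positive, under simple random sampling of pens."""
    hits = 0
    for _ in range(reps):
        status = rng.random(n_pens) < prev               # true pen status
        pick = rng.choice(n_pens, n_sampled, replace=False)
        p_pos = np.where(status[pick], se, 1 - sp)       # per-pen P(test+)
        hits += (rng.random(n_sampled) < p_pos).any()
    return hits / reps

p6, p24 = p_detect(6), p_detect(24)
print(round(p6, 2), round(p24, 2))   # detection improves with sample size
```

    Repeating the simulation over a grid of sample sizes and prevalences yields the kind of detection-probability table the authors describe.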

  3. [Implication of inverse-probability weighting method in the evaluation of diagnostic test with verification bias].

    PubMed

    Kang, Leni; Zhang, Shaokai; Zhao, Fanghui; Qiao, Youlin

    2014-03-01

    To evaluate and adjust for the verification bias that exists in screening or diagnostic tests. The inverse-probability weighting method was used to adjust the sensitivity and specificity of the diagnostic tests, with an example of cervical cancer screening used to introduce the 'Compare Tests' package in R software, with which the method can be implemented. Sensitivity and specificity calculated by the traditional method and the maximum likelihood estimation method were compared to the results from the inverse-probability weighting method in the random-sampled example. The true sensitivity and specificity of the HPV self-sampling test were 83.53% (95%CI: 74.23-89.93) and 85.86% (95%CI: 84.23-87.36). In the analysis of data with verification by the gold standard missing at random, the sensitivity and specificity calculated by the traditional method were 90.48% (95%CI: 80.74-95.56) and 71.96% (95%CI: 68.71-75.00), respectively. The adjusted sensitivity and specificity under the inverse-probability weighting method were 82.25% (95%CI: 63.11-92.62) and 85.80% (95%CI: 85.09-86.47), respectively, whereas they were 80.13% (95%CI: 66.81-93.46) and 85.80% (95%CI: 84.20-87.41) under the maximum likelihood estimation method. The inverse-probability weighting method could effectively adjust the sensitivity and specificity of a diagnostic test when verification bias exists, especially when complex sampling is present.
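    The mechanics of the adjustment can be sketched with simulated data: verification depends on the test result, the naive estimate restricted to verified subjects is biased, and weighting each verified subject by the inverse of its verification probability removes the bias (every number below is illustrative, not from the study):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 20_000
disease = rng.random(N) < 0.15
# screening test with true Se = 0.85 and Sp = 0.90 (illustrative values)
test = np.where(disease, rng.random(N) < 0.85, rng.random(N) < 0.10)

# verification bias: test-positives get the gold standard with
# probability 0.9, test-negatives with probability 0.2
p_verify = np.where(test, 0.9, 0.2)
verified = rng.random(N) < p_verify

# naive sensitivity from verified subjects only is biased upward
naive_se = (test & verified & disease).sum() / (verified & disease).sum()
# weighting each verified subject by 1 / P(verified) removes the bias
w = 1 / p_verify
ipw_se = (w * (test & verified & disease)).sum() / (w * (verified & disease)).sum()
print(round(naive_se, 2), round(ipw_se, 2))
```

    With these settings the naive estimate overstates sensitivity because diseased test-positives are verified far more often than diseased test-negatives, while the weighted estimate recovers the true 0.85.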

  4. Verbal versus Numerical Probabilities: Does Format Presentation of Probabilistic Information regarding Breast Cancer Screening Affect Women's Comprehension?

    ERIC Educational Resources Information Center

    Vahabi, Mandana

    2010-01-01

    Objective: To test whether the format in which women receive probabilistic information about breast cancer and mammography affects their comprehension. Methods: A convenience sample of 180 women received pre-assembled randomized packages containing a breast health information brochure, with probabilities presented in either verbal or numeric…

  5. Rare Event Simulation in Radiation Transport

    NASA Astrophysics Data System (ADS)

    Kollman, Craig

    This dissertation studies methods for estimating extremely small probabilities by Monte Carlo simulation. Problems in radiation transport typically involve estimating very rare events or the expected value of a random variable which is with overwhelming probability equal to zero. These problems often have high dimensional state spaces and irregular geometries so that analytic solutions are not possible. Monte Carlo simulation must be used to estimate the radiation dosage being transported to a particular location. If the area is well shielded the probability of any one particular particle getting through is very small. Because of the large number of particles involved, even a tiny fraction penetrating the shield may represent an unacceptable level of radiation. It therefore becomes critical to be able to accurately estimate this extremely small probability. Importance sampling is a well known technique for improving the efficiency of rare event calculations. Here, a new set of probabilities is used in the simulation runs. The results are multiplied by the likelihood ratio between the true and simulated probabilities so as to keep our estimator unbiased. The variance of the resulting estimator is very sensitive to which new set of transition probabilities are chosen. It is shown that a zero variance estimator does exist, but that its computation requires exact knowledge of the solution. A simple random walk with an associated killing model for the scatter of neutrons is introduced. Large deviation results for optimal importance sampling in random walks are extended to the case where killing is present. An adaptive "learning" algorithm for implementing importance sampling is given for more general Markov chain models of neutron scatter. For finite state spaces this algorithm is shown to give, with probability one, a sequence of estimates converging exponentially fast to the true solution. 
In the final chapter, an attempt to generalize this algorithm to a continuous state space is made. This involves partitioning the space into a finite number of cells. There is a tradeoff between additional computation per iteration and variance reduction per iteration that arises in determining the optimal grid size. All versions of this algorithm can be thought of as a compromise between deterministic and Monte Carlo methods, capturing advantages of both techniques.
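    The likelihood-ratio reweighting described above can be shown on a one-dimensional toy problem, estimating a Gaussian tail probability far too small for plain Monte Carlo (the shift-to-the-threshold proposal is a standard textbook choice, not the dissertation's adaptive scheme):

```python
import numpy as np

rng = np.random.default_rng(0)
t, n = 5.0, 100_000                 # threshold and sample size
# true P(X > 5) for X ~ N(0,1) is about 2.87e-7

# plain Monte Carlo: essentially never observes the event at this n
plain = (rng.standard_normal(n) > t).mean()

# importance sampling: draw from N(t, 1), multiply by the likelihood
# ratio phi(x) / phi(x - t) = exp(-t*x + t^2 / 2) to stay unbiased
x = rng.standard_normal(n) + t
lr = np.exp(-t * x + t * t / 2)
is_est = np.where(x > t, lr, 0.0).mean()
print(plain, is_est)
```

    Shifting the sampling distribution onto the rare region makes the event common, and the likelihood ratio undoes the shift in expectation, which is why the choice of new transition probabilities drives the variance so strongly.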

  6. Estimating rare events in biochemical systems using conditional sampling.

    PubMed

    Sundar, V S

    2017-01-28

    The paper focuses on development of variance reduction strategies to estimate rare events in biochemical systems. Obtaining this probability using brute force Monte Carlo simulations in conjunction with the stochastic simulation algorithm (Gillespie's method) is computationally prohibitive. To circumvent this, importance sampling tools such as the weighted stochastic simulation algorithm and the doubly weighted stochastic simulation algorithm have been proposed. However, these strategies require an additional step of determining the important region to sample from, which is not straightforward for most of the problems. In this paper, we apply the subset simulation method, developed as a variance reduction tool in the context of structural engineering, to the problem of rare event estimation in biochemical systems. The main idea is that the rare event probability is expressed as a product of more frequent conditional probabilities. These conditional probabilities are estimated with high accuracy using Monte Carlo simulations, specifically the Markov chain Monte Carlo method with the modified Metropolis-Hastings algorithm. Generating sample realizations of the state vector using the stochastic simulation algorithm is viewed as mapping the discrete-state continuous-time random process to the standard normal random variable vector. This viewpoint opens up the possibility of applying more sophisticated and efficient sampling schemes developed elsewhere to problems in stochastic chemical kinetics. The results obtained using the subset simulation method are compared with existing variance reduction strategies for a few benchmark problems, and a satisfactory improvement in computational time is demonstrated.
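    The product-of-conditional-probabilities idea can be sketched on a scalar toy problem, estimating a Gaussian tail probability with adaptive intermediate levels and Metropolis moves (the tuning constants are illustrative, and this omits the mapping from chemical-kinetics trajectories to normal variables used in the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def subset_sim(thresh=3.5, n=2000, p0=0.1):
    """Estimate P(X > thresh) for X ~ N(0,1) as a product of conditional
    probabilities, each kept near p0 by adaptive intermediate levels."""
    x = rng.standard_normal(n)
    prob = 1.0
    while True:
        x.sort()
        level = x[int((1 - p0) * n)]        # next intermediate level
        if level >= thresh:
            return prob * (x > thresh).mean()
        prob *= p0
        seeds = x[int((1 - p0) * n):]       # samples already above the level
        chains = []
        for s in seeds:                     # grow each seed with Metropolis
            cur = s
            for _ in range(int(1 / p0) - 1):
                cand = cur + rng.normal()
                # accept with the N(0,1) ratio, conditioned on staying above
                if cand > level and rng.random() < np.exp((cur * cur - cand * cand) / 2):
                    cur = cand
                chains.append(cur)
        x = np.concatenate([seeds, np.array(chains)])

est = subset_sim()
print(est)   # true value is about 2.3e-4
```

    Each level contributes a conditional probability near p0, so a probability of order p0^m is estimated with only m moderately sized conditional simulations instead of one enormous unconditional one.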

  7. Piecewise SALT sampling for estimating suspended sediment yields

    Treesearch

    Robert B. Thomas

    1989-01-01

    A probability sampling method called SALT (Selection At List Time) has been developed for collecting and summarizing data on delivery of suspended sediment in rivers. It is based on sampling and estimating yield using a suspended-sediment rating curve for high discharges and simple random sampling for low flows. The method gives unbiased estimates of total yield and...

  8. Two Universality Properties Associated with the Monkey Model of Zipf's Law

    NASA Astrophysics Data System (ADS)

    Perline, Richard; Perline, Ron

    2016-03-01

    The distribution of word probabilities in the monkey model of Zipf's law is associated with two universality properties: (1) the power law exponent converges strongly to -1 as the alphabet size increases and the letter probabilities are specified as the spacings from a random division of the unit interval for any distribution with a bounded density function on [0,1]; and (2) on a logarithmic scale the version of the model with a finite word length cutoff and unequal letter probabilities is approximately normally distributed in the part of the distribution away from the tails. The first property is proved using a remarkably general limit theorem for the logarithm of sample spacings from Shao and Hahn, and the second property follows from Anscombe's central limit theorem for a random number of i.i.d. random variables. The finite word length model leads to a hybrid Zipf-lognormal mixture distribution closely related to work in other areas.
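    The first universality property is easy to probe numerically: draw letter probabilities as spacings of a random division of the unit interval, enumerate words up to a length cutoff, and fit the log-log slope of the rank-probability curve (alphabet size, space probability, and cutoff below are arbitrary, and a small alphabet only approximates the limiting exponent of -1):

```python
import numpy as np

rng = np.random.default_rng(0)
M, q, max_len = 10, 0.2, 6           # alphabet size, space probability, cutoff

# letter probabilities: spacings from a random division of [0, 1],
# scaled by the probability (1 - q) of not typing a space
cuts = np.sort(rng.random(M - 1))
letters = np.diff(np.concatenate([[0.0], cuts, [1.0]])) * (1 - q)

# probability of each word of length 1..max_len (word ends with a space)
word_probs, level = [], np.array([1.0])
for _ in range(max_len):
    level = np.outer(level, letters).ravel()
    word_probs.append(level * q)
p = np.sort(np.concatenate(word_probs))[::-1]

# fit the power-law exponent over mid-range ranks, away from both ends
ranks = np.arange(1, len(p) + 1)
sel = slice(10, 2000)
slope = np.polyfit(np.log(ranks[sel]), np.log(p[sel]), 1)[0]
print(round(slope, 2))
```

    Increasing the alphabet size pushes the fitted slope toward -1, which is the convergence the first property formalizes.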

  9. Adaptive detection of noise signal according to Neyman-Pearson criterion

    NASA Astrophysics Data System (ADS)

    Padiryakov, Y. A.

    1985-03-01

    Optimum detection according to the Neyman-Pearson criterion is considered in the case of a random Gaussian noise signal, stationary during measurement, and a stationary random Gaussian background interference. Detection is based on two samples, their statistics characterized by estimates of their spectral densities, it being a priori known that sample A from the signal channel is either the sum of signal and interference or interference alone and sample B from the reference interference channel is an interference with the same spectral density as that of the interference in sample A for both hypotheses. The probability of correct detection is maximized on the average, first in the 2N-dimensional space of signal spectral density and interference spectral density readings, by fixing the probability of false alarm at each point so as to stabilize it at a constant level against variation of the interference spectral density. Deterministic decision rules are established. The algorithm is then reduced to equivalent detection in the N-dimensional space of the ratio of sample A readings to sample B readings.

  10. (I Can’t Get No) Saturation: A simulation and guidelines for sample sizes in qualitative research

    PubMed Central

    2017-01-01

    I explore the sample size in qualitative research that is required to reach theoretical saturation. I conceptualize a population as consisting of sub-populations that contain different types of information sources that hold a number of codes. Theoretical saturation is reached after all the codes in the population have been observed once in the sample. I delineate three different scenarios to sample information sources: “random chance,” which is based on probability sampling, “minimal information,” which yields at least one new code per sampling step, and “maximum information,” which yields the largest number of new codes per sampling step. Next, I use simulations to assess the minimum sample size for each scenario for systematically varying hypothetical populations. I show that theoretical saturation is more dependent on the mean probability of observing codes than on the number of codes in a population. Moreover, the minimal and maximal information scenarios are significantly more efficient than random chance, but yield fewer repetitions per code to validate the findings. I formulate guidelines for purposive sampling and recommend that researchers follow a minimum information scenario. PMID:28746358

  11. What Does a Random Line Look Like: An Experimental Study

    ERIC Educational Resources Information Center

    Turner, Nigel E.; Liu, Eleanor; Toneatto, Tony

    2011-01-01

    The study examined the perception of random lines by people with gambling problems compared to people without gambling problems. The sample consisted of 67 probable pathological gamblers and 46 people without gambling problems. Participants completed a number of questionnaires about their gambling and were then presented with a series of random…

  12. The Probability of Obtaining Two Statistically Different Test Scores as a Test Index

    ERIC Educational Resources Information Center

    Muller, Jorg M.

    2006-01-01

    A new test index is defined as the probability of obtaining two randomly selected test scores (PDTS) as statistically different. After giving a concept definition of the test index, two simulation studies are presented. The first analyzes the influence of the distribution of test scores, test reliability, and sample size on PDTS within classical…

  13. Multinomial mixture model with heterogeneous classification probabilities

    USGS Publications Warehouse

    Holland, M.D.; Gray, B.R.

    2011-01-01

    Royle and Link (Ecology 86(9):2505-2512, 2005) proposed an analytical method that allowed estimation of multinomial distribution parameters and classification probabilities from categorical data measured with error. While useful, we demonstrate algebraically and by simulations that this method yields biased multinomial parameter estimates when the probabilities of correct category classifications vary among sampling units. We address this shortcoming by treating these probabilities as logit-normal random variables within a Bayesian framework. We use Markov chain Monte Carlo to compute Bayes estimates from a simulated sample from the posterior distribution. Based on simulations, this elaborated Royle-Link model yields nearly unbiased estimates of multinomial and correct-classification probabilities when classification probabilities are allowed to vary according to the normal distribution on the logit scale or according to the Beta distribution. The method is illustrated using categorical submersed aquatic vegetation data. © 2010 Springer Science+Business Media, LLC.
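    The key modelling step, treating each sampling unit's correct-classification probability as a logit-normal random variable, can be sketched in a few lines (the values of mu and sigma below are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(2)

def logit_normal_probs(mu, sigma, n_units):
    """Draw per-unit correct-classification probabilities as logit-normal
    random variables: normal on the logit scale, mapped back by the
    inverse-logit (expit) function."""
    return 1 / (1 + np.exp(-rng.normal(mu, sigma, n_units)))

# Heterogeneous classification: units share a logit-scale mean but vary around it
probs = logit_normal_probs(mu=1.5, sigma=0.8, n_units=1000)
print(probs.min(), probs.mean(), probs.max())  # all strictly inside (0, 1)
```

Fixing sigma = 0 recovers the homogeneous Royle-Link case; increasing it induces exactly the among-unit heterogeneity that biases the original estimator.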

  14. Quantum probabilistic logic programming

    NASA Astrophysics Data System (ADS)

    Balu, Radhakrishnan

    2015-05-01

    We describe a quantum mechanics based logic programming language that supports Horn clauses, random variables, and covariance matrices to express and solve problems in probabilistic logic. The Horn clauses of the language wrap random variables, including infinite-valued ones, to express probability distributions and statistical correlations, a powerful feature for capturing relationships between distributions that are not independent. The expressive power of the language is based on a mechanism to implement statistical ensembles and to solve the underlying SAT instances using quantum mechanical machinery. We exploit the fact that classical random variables have quantum decompositions to build the Horn clauses. We establish the semantics of the language in a rigorous fashion by considering an existing probabilistic logic language called PRISM with classical probability measures defined on the Herbrand base and extending it to the quantum context. In the classical case, H-interpretations form the sample space and probability measures defined on them lead to a consistent definition of probabilities for well-formed formulae. In the quantum counterpart, we define probability amplitudes on H-interpretations, facilitating model generation and verification via quantum mechanical superpositions and entanglements. We cast the well-formed formulae of the language as quantum mechanical observables, thus providing an elegant interpretation for their probabilities. We discuss several examples that combine statistical ensembles and predicates of first order logic to reason about situations involving uncertainty.

  15. Experimental Study of the Effect of the Initial Spectrum Width on the Statistics of Random Wave Groups

    NASA Astrophysics Data System (ADS)

    Shemer, L.; Sergeeva, A.

    2009-12-01

    The statistics of random water wave field determines the probability of appearance of extremely high (freak) waves. This probability is strongly related to the spectral wave field characteristics. Laboratory investigation of the spatial variation of the random wave-field statistics for various initial conditions is thus of substantial practical importance. Unidirectional nonlinear random wave groups are investigated experimentally in the 300 m long Large Wave Channel (GWK) in Hannover, Germany, which is the biggest facility of its kind in Europe. Numerous realizations of a wave field with the prescribed frequency power spectrum, yet randomly-distributed initial phases of each harmonic, were generated by a computer-controlled piston-type wavemaker. Several initial spectral shapes with identical dominant wave length but different width were considered. For each spectral shape, the total duration of sampling in all realizations was long enough to yield sufficient sample size for reliable statistics. Through all experiments, an effort had been made to retain the characteristic wave height value and thus the degree of nonlinearity of the wave field. Spatial evolution of numerous statistical wave field parameters (skewness, kurtosis and probability distributions) is studied using about 25 wave gauges distributed along the tank. It is found that, depending on the initial spectral shape, the frequency spectrum of the wave field may undergo significant modification in the course of its evolution along the tank; the values of all statistical wave parameters are strongly related to the local spectral width. A sample of the measured wave height probability functions (scaled by the variance of surface elevation) is plotted in Fig. 1 for the initially narrow rectangular spectrum. The results in Fig. 1 resemble findings obtained in [1] for the initial Gaussian spectral shape. 
The probability of large waves notably surpasses that predicted by the Rayleigh distribution and is highest at a distance of about 100 m. Acknowledgement This study is carried out in the framework of the EC-supported project "Transnational access to large-scale tests in the Large Wave Channel (GWK) of Forschungszentrum Küste" (Contract HYDRALAB III - No. 022441). [1] L. Shemer and A. Sergeeva, J. Geophys. Res. Oceans 114, C01015 (2009). Figure 1. Variation along the tank of the measured wave height distribution for rectangular initial spectral shape, the carrier wave period T0=1.5 s.

  16. USING GIS TO GENERATE SPATIALLY-BALANCED RANDOM SURVEY DESIGNS FOR NATURAL RESOURCE APPLICATIONS

    EPA Science Inventory

    Sampling of a population is frequently required to understand trends and patterns in natural resource management because financial and time constraints preclude a complete census. A rigorous probability-based survey design specifies where to sample so that inferences from the sam...

  17. Sampling large random knots in a confined space

    NASA Astrophysics Data System (ADS)

    Arsuaga, J.; Blackstone, T.; Diao, Y.; Hinson, K.; Karadayi, E.; Saito, M.

    2007-09-01

    DNA knots formed under extreme conditions of condensation, as in bacteriophage P4, are difficult to analyze experimentally and theoretically. In this paper, we propose to use the uniform random polygon model as a supplementary method to the existing methods for generating random knots in confinement. The uniform random polygon model allows us to sample knots with large crossing numbers and also to generate large diagrammatically prime knot diagrams. We show numerically that uniform random polygons sample knots with large minimum crossing numbers and certain complicated knot invariants (such as those observed experimentally). We do this in terms of the knot determinants or colorings. Our numerical results suggest that the average determinant of a uniform random polygon of n vertices grows faster than O(e^{n^2}). We also investigate the complexity of prime knot diagrams. We show rigorously that the probability that a randomly selected 2D uniform random polygon of n vertices is almost diagrammatically prime goes to 1 as n goes to infinity. Furthermore, the average number of crossings in such a diagram is of the order O(n^2). Therefore, two-dimensional uniform random polygons offer an effective way of sampling large (prime) knots, which can be useful in various applications.
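    The uniform random polygon model itself is simple to implement: n vertices drawn uniformly in a cube, joined in order and closed back to the start. A small sketch (not the authors' code) that also counts crossings in the xy-projection by brute force, illustrating the O(n^2) growth of diagram crossings:

```python
import numpy as np

rng = np.random.default_rng(3)

def uniform_random_polygon(n):
    """n vertices i.i.d. uniform in the unit cube, joined in order and
    closed back to the start: the uniform random polygon model."""
    return rng.random((n, 3))

def projected_crossings(verts):
    """Count proper crossings in the xy-projection of the closed polygon."""
    pts = verts[:, :2]
    n = len(pts)
    edges = [(pts[i], pts[(i + 1) % n]) for i in range(n)]

    def ccw(a, b, c):  # signed area test for orientation
        return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])

    count = 0
    for i in range(n):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # edges adjacent through the closure share a vertex
            p1, p2 = edges[i]
            p3, p4 = edges[j]
            if ccw(p1, p2, p3) * ccw(p1, p2, p4) < 0 and \
               ccw(p3, p4, p1) * ccw(p3, p4, p2) < 0:
                count += 1
    return count

print(projected_crossings(uniform_random_polygon(50)))  # grows ~n^2 with n
```

Each pair of non-adjacent edges crosses with some fixed probability, so the expected crossing count scales with the number of pairs, i.e. quadratically in n.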

  18. Designing efficient surveys: spatial arrangement of sample points for detection of invasive species

    Treesearch

    Ludek Berec; John M. Kean; Rebecca Epanchin-Niell; Andrew M. Liebhold; Robert G. Haight

    2015-01-01

    Effective surveillance is critical to managing biological invasions via early detection and eradication. The efficiency of surveillance systems may be affected by the spatial arrangement of sample locations. We investigate how the spatial arrangement of sample points, ranging from random to fixed grid arrangements, affects the probability of detecting a target...

  19. THE RELATIONSHIP BETWEEN TEMPERATURE, PHYSICAL HABITAT AND FISH ASSEMBLAGE DATA IN A STATE WIDE PROBABILITY SURVEY OF OREGON STREAMS

    EPA Science Inventory

    To assess the ecological condition of streams and rivers in Oregon, we sampled 146 sites
    in summer, 1997 as part of the U.S. EPA's Environmental Monitoring and Assessment Program.
    Sample reaches were selected using a systematic, randomized sample design from the blue-line n...

  20. Using known map category marginal frequencies to improve estimates of thematic map accuracy

    NASA Technical Reports Server (NTRS)

    Card, D. H.

    1982-01-01

    By means of two simple sampling plans suggested in the accuracy-assessment literature, it is shown how one can use knowledge of map-category relative sizes to improve estimates of various probabilities. The fact that the maximum likelihood estimates of cell probabilities under simple random sampling and under map-category-stratified sampling are identical permits a unified treatment of the contingency-table analysis. Results for the stratified case make possible a rigorous analysis of the effect of sampling independently within map categories. It is noted that matters such as optimal sample-size selection for achieving a desired level of precision in various estimators are irrelevant here, since the estimators derived are valid irrespective of how sample sizes are chosen.

  1. Use and interpretation of logistic regression in habitat-selection studies

    USGS Publications Warehouse

    Keating, Kim A.; Cherry, Steve

    2004-01-01

     Logistic regression is an important tool for wildlife habitat-selection studies, but the method frequently has been misapplied due to an inadequate understanding of the logistic model, its interpretation, and the influence of sampling design. To promote better use of this method, we review its application and interpretation under 3 sampling designs: random, case-control, and use-availability. Logistic regression is appropriate for habitat use-nonuse studies employing random sampling and can be used to directly model the conditional probability of use in such cases. Logistic regression also is appropriate for studies employing case-control sampling designs, but careful attention is required to interpret results correctly. Unless bias can be estimated or probability of use is small for all habitats, results of case-control studies should be interpreted as odds ratios, rather than probability of use or relative probability of use. When data are gathered under a use-availability design, logistic regression can be used to estimate approximate odds ratios if probability of use is small, at least on average. More generally, however, logistic regression is inappropriate for modeling habitat selection in use-availability studies. In particular, using logistic regression to fit the exponential model of Manly et al. (2002:100) does not guarantee maximum-likelihood estimates, valid probabilities, or valid likelihoods. We show that the resource selection function (RSF) commonly used for the exponential model is proportional to a logistic discriminant function. Thus, it may be used to rank habitats with respect to probability of use and to identify important habitat characteristics or their surrogates, but it is not guaranteed to be proportional to probability of use. Other problems associated with the exponential model also are discussed. 
We describe an alternative model based on Lancaster and Imbens (1996) that offers a method for estimating conditional probability of use in use-availability studies. Although promising, this model fails to converge to a unique solution in some important situations. Further work is needed to obtain a robust method that is broadly applicable to use-availability studies.
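    The case-control caveat above, that odds ratios are preserved while probabilities of use are not, is easy to demonstrate by simulation. A sketch under an assumed logistic use model with one binary habitat covariate and coefficient beta = 1, so the true odds ratio is e ≈ 2.72 (all names and numbers are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(8)

def odds_ratio(x, y):
    """Sample odds ratio from the 2x2 table of habitat feature x vs use y."""
    a = np.sum((x == 1) & (y == 1))
    b = np.sum((x == 1) & (y == 0))
    c = np.sum((x == 0) & (y == 1))
    d = np.sum((x == 0) & (y == 0))
    return (a * d) / (b * c)

# Population: P(use | x) = expit(-2 + 1*x), so the true odds ratio is e
n = 200_000
x = rng.integers(0, 2, n)
p_use = 1 / (1 + np.exp(-(-2.0 + 1.0 * x)))
y = rng.random(n) < p_use

# Case-control sample: all used sites plus an equal number of unused sites
cases = np.where(y)[0]
controls = rng.choice(np.where(~y)[0], size=len(cases), replace=False)
idx = np.concatenate([cases, controls])

print(odds_ratio(x, y), odds_ratio(x[idx], y[idx]))  # both ~e = 2.72
```

The two odds ratios agree, but the proportion of use in the case-control sample (50% by construction) says nothing about the population probability of use: the intercept is distorted by the design, which is exactly why results should be reported as odds ratios.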

  2. Economic Intervention and Parenting: A Randomized Experiment of Statewide Child Development Accounts

    ERIC Educational Resources Information Center

    Nam, Yunju; Wikoff, Nora; Sherraden, Michael

    2016-01-01

    Objective: We examine the effects of Child Development Accounts (CDAs) on parenting stress and practices. Methods: We use data from the SEED for Oklahoma Kids (SEED OK) experiment. SEED OK selected caregivers of infants from Oklahoma birth certificates using a probability sampling method, randomly assigned caregivers to the treatment (n = 1,132)…

  3. Estimation of distribution overlap of urn models.

    PubMed

    Hampton, Jerrad; Lladser, Manuel E

    2012-01-01

    A classical problem in statistics is estimating the expected coverage of a sample, which has had applications in gene expression, microbial ecology, optimization, and even numismatics. Here we consider a related extension of this problem to random samples of two discrete distributions. Specifically, we estimate what we call the dissimilarity probability of a sample, i.e., the probability of a draw from one distribution not being observed in [Formula: see text] draws from another distribution. We show our estimator of dissimilarity to be a [Formula: see text]-statistic and a uniformly minimum variance unbiased estimator of dissimilarity over the largest appropriate range of [Formula: see text]. Furthermore, despite the non-Markovian nature of our estimator when applied sequentially over [Formula: see text], we show it converges uniformly in probability to the dissimilarity parameter, and we present criteria when it is approximately normally distributed and admits a consistent jackknife estimator of its variance. As proof of concept, we analyze V35 16S rRNA data to discern between various microbial environments. Other potential applications concern any situation where dissimilarity of two discrete distributions may be of interest. For instance, in SELEX experiments, each urn could represent a random RNA pool and each draw a possible solution to a particular binding site problem over that pool. The dissimilarity of these pools is then related to the probability of finding binding site solutions in one pool that are absent in the other.
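    For fully specified distributions, the dissimilarity probability has a closed form: a draw from p lands on category i with probability p_i and goes unseen in n draws from q with probability (1 - q_i)^n. A quick sketch comparing the closed form with a Monte Carlo check (the two three-category distributions are made up for illustration; the paper's estimator works without knowing p and q):

```python
import numpy as np

rng = np.random.default_rng(4)

def dissimilarity_exact(p, q, n):
    """P(one draw from p is a category unseen in n draws from q)."""
    return np.sum(p * (1 - q) ** n)

def dissimilarity_mc(p, q, n, trials=20_000):
    """Monte Carlo check: draw an n-sample from q, then one draw from p."""
    k = len(p)
    hits = 0
    for _ in range(trials):
        seen = set(rng.choice(k, size=n, p=q))
        hits += rng.choice(k, p=p) not in seen
    return hits / trials

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.3, 0.5])
print(dissimilarity_exact(p, q, 5), dissimilarity_mc(p, q, 5))  # ~0.22 each
```

As n grows, the dissimilarity decays toward the total p-mass on categories absent from q, which is why it discriminates between environments that share few taxa.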

  4. How Statistics "Excel" Online.

    ERIC Educational Resources Information Center

    Chao, Faith; Davis, James

    2000-01-01

    Discusses the use of Microsoft Excel software and provides examples of its use in an online statistics course at Golden Gate University in the areas of randomness and probability, sampling distributions, confidence intervals, and regression analysis. (LRW)

  5. Fatigue crack growth model RANDOM2 user manual, appendix 1

    NASA Technical Reports Server (NTRS)

    Boyce, Lola; Lovelace, Thomas B.

    1989-01-01

    The FORTRAN program RANDOM2 is documented. RANDOM2 is based on fracture mechanics using a probabilistic fatigue crack growth model. It predicts the random lifetime of an engine component to reach a given crack size. Included in this user manual are details regarding the theoretical background of RANDOM2, input data, instructions and a sample problem illustrating the use of RANDOM2. Appendix A gives information on the physical quantities, their symbols, FORTRAN names, and both SI and U.S. Customary units. Appendix B includes photocopies of the actual computer printout corresponding to the sample problem. Appendices C and D detail the IMSL, Ver. 10(1), subroutines and functions called by RANDOM2 and a SAS/GRAPH(2) program that can be used to plot both the probability density function (p.d.f.) and the cumulative distribution function (c.d.f.).

  6. Probability Distributions for Random Quantum Operations

    NASA Astrophysics Data System (ADS)

    Schultz, Kevin

    Motivated by uncertainty quantification and inference of quantum information systems, in this work we draw connections between the notions of random quantum states and operations in quantum information with probability distributions commonly encountered in the field of orientation statistics. This approach identifies natural sample spaces and probability distributions upon these spaces that can be used in the analysis, simulation, and inference of quantum information systems. The theory of exponential families on Stiefel manifolds provides the appropriate generalization to the classical case. Furthermore, this viewpoint motivates a number of additional questions into the convex geometry of quantum operations relative to both the differential geometry of Stiefel manifolds as well as the information geometry of exponential families defined upon them. In particular, we draw on results from convex geometry to characterize which quantum operations can be represented as the average of a random quantum operation. This project was supported by the Intelligence Advanced Research Projects Activity via Department of Interior National Business Center Contract Number 2012-12050800010.

  7. Digital simulation of an arbitrary stationary stochastic process by spectral representation.

    PubMed

    Yura, Harold T; Hanson, Steen G

    2011-04-01

    In this paper we present a straightforward, efficient, and computationally fast method for creating a large number of discrete samples with an arbitrary given probability density function and a specified spectral content. The method relies on initially transforming a white noise sample set of random Gaussian distributed numbers into a corresponding set with the desired spectral distribution, after which this colored Gaussian probability distribution is transformed via an inverse transform into the desired probability distribution. In contrast to previous work, where the analyses were limited to autoregressive and/or iterative techniques to obtain satisfactory results, we find that a single application of the inverse transform method yields satisfactory results for a wide class of arbitrary probability distributions. Although a single application of the inverse transform technique does not conserve the power spectra exactly, it yields highly accurate numerical results for a wide range of probability distributions and target power spectra that are sufficient for system simulation purposes; it can thus be regarded as an accurate engineering approximation suitable for a wide range of practical applications. A sufficiency condition is presented regarding the range of parameter values where a single application of the inverse transform method yields satisfactory agreement between the simulated and target power spectra, and a series of examples relevant for the optics community are presented and discussed. Outside this parameter range the agreement gracefully degrades but does not distort in shape. Although we demonstrate the method here focusing on stationary random processes, we see no reason why the method could not be extended to simulate non-stationary random processes. © 2011 Optical Society of America
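    The single-pass recipe described can be sketched in a few lines: color white Gaussian noise in the frequency domain, then push the normalized Gaussian sample through its own CDF and the target inverse CDF. The low-pass spectrum and the exponential target marginal below are illustrative choices, not the paper's examples:

```python
import numpy as np
from math import erf

rng = np.random.default_rng(5)

def colored_nongaussian(n, spectrum, target_icdf):
    """Color white Gaussian noise to `spectrum`, then map it through the
    Gaussian CDF and the target inverse CDF (single-pass inverse transform)."""
    white = rng.standard_normal(n)
    colored = np.fft.irfft(np.fft.rfft(white) * np.sqrt(spectrum), n)
    colored /= colored.std()  # approximately unit-variance Gaussian sample
    u = 0.5 * (1 + np.vectorize(erf)(colored / np.sqrt(2)))  # Gaussian CDF -> uniform
    return target_icdf(u)

n = 4096
freqs = np.fft.rfftfreq(n)
spectrum = 1.0 / (1.0 + (freqs / 0.05) ** 2)  # illustrative low-pass target spectrum
x = colored_nongaussian(n, spectrum, lambda u: -np.log(1 - u))  # exponential marginal
print(x.mean(), x.std())  # both ~1 for a unit-rate exponential marginal
```

The nonlinear marginal transform distorts the spectrum somewhat, which is precisely the approximation the paper's sufficiency condition quantifies.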

  8. Quantifying the benefit of wellbore leakage potential estimates for prioritizing long-term MVA well sampling at a CO2 storage site.

    PubMed

    Azzolina, Nicholas A; Small, Mitchell J; Nakles, David V; Glazewski, Kyle A; Peck, Wesley D; Gorecki, Charles D; Bromhal, Grant S; Dilmore, Robert M

    2015-01-20

    This work uses probabilistic methods to simulate a hypothetical geologic CO2 storage site in a depleted oil and gas field, where the large number of legacy wells would make it cost-prohibitive to sample all wells for all measurements as part of the postinjection site care. Deep well leakage potential scores were assigned to the wells using a random subsample of 100 wells from a detailed study of 826 legacy wells that penetrate the basal Cambrian formation on the U.S. side of the U.S./Canadian border. Analytical solutions and Monte Carlo simulations were used to quantify the statistical power of selecting a leaking well. Power curves were developed as a function of (1) the number of leaking wells within the Area of Review; (2) the sampling design (random or judgmental, choosing first the wells with the highest deep leakage potential scores); (3) the number of wells included in the monitoring sampling plan; and (4) the relationship between a well’s leakage potential score and its relative probability of leakage. Cases where the deep well leakage potential scores are fully or partially informative of the relative leakage probability are compared to a noninformative base case in which leakage is equiprobable across all wells in the Area of Review. The results show that accurate prior knowledge about the probability of well leakage adds measurable value to the ability to detect a leaking well during the monitoring program, and that the loss in detection ability due to imperfect knowledge of the leakage probability can be quantified. This work underscores the importance of a data-driven, risk-based monitoring program that incorporates uncertainty quantification into long-term monitoring sampling plans at geologic CO2 storage sites.
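    The power calculation behind such curves can be sketched with a small Monte Carlo: assume each well's leakage probability is proportional to its leakage-potential score (the fully informative case) and compare a random design against a judgmental design that samples the top-scored wells first. The 100 uniform scores, 2 leaking wells, and 10-well sampling plan below are hypothetical, not the study's values:

```python
import numpy as np

rng = np.random.default_rng(6)

def detection_power(scores, n_leaks, n_sampled, judgmental, trials=5000):
    """P(at least one leaking well is in the monitoring sample), assuming
    leakage probability proportional to the deep-well leakage score."""
    n = len(scores)
    p = scores / scores.sum()
    ranked = np.argsort(scores)[::-1]  # wells ordered by descending score
    hits = 0
    for _ in range(trials):
        leaks = rng.choice(n, size=n_leaks, replace=False, p=p)
        sampled = ranked[:n_sampled] if judgmental else \
            rng.choice(n, n_sampled, replace=False)
        hits += len(np.intersect1d(leaks, sampled)) > 0
    return hits / trials

scores = rng.uniform(0.1, 1.0, 100)  # hypothetical leakage-potential scores
print(detection_power(scores, 2, 10, judgmental=False))  # random design
print(detection_power(scores, 2, 10, judgmental=True))   # score-ranked design
```

When the scores are informative, the judgmental design detects a leaking well noticeably more often for the same number of sampled wells, which is the value-of-information result the study quantifies.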

  9. Adaptive Peer Sampling with Newscast

    NASA Astrophysics Data System (ADS)

    Tölgyesi, Norbert; Jelasity, Márk

    The peer sampling service is a middleware service that provides random samples from a large decentralized network to support gossip-based applications such as multicast, data aggregation and overlay topology management. Lightweight gossip-based implementations of the peer sampling service have been shown to provide good quality random sampling while also being extremely robust to many failure scenarios, including node churn and catastrophic failure. We identify two problems with these approaches. The first problem is related to message drop failures: if a node experiences a higher-than-average message drop rate then the probability of sampling this node in the network will decrease. The second problem is that the application layer at different nodes might request random samples at very different rates which can result in very poor random sampling especially at nodes with high request rates. We propose solutions for both problems. We focus on Newscast, a robust implementation of the peer sampling service. Our solution is based on simple extensions of the protocol and an adaptive self-control mechanism for its parameters, namely—without involving failure detectors—nodes passively monitor local protocol events using them as feedback for a local control loop for self-tuning the protocol parameters. The proposed solution is evaluated by simulation experiments.

  10. Statistical description of non-Gaussian samples in the F2 layer of the ionosphere during heliogeophysical disturbances

    NASA Astrophysics Data System (ADS)

    Sergeenko, N. P.

    2017-11-01

    An adequate statistical method should be developed in order to predict probabilistically the range of ionospheric parameters. This problem is solved in this paper. Time series of the F2-layer critical frequency, foF2(t), were subjected to statistical processing. For the obtained samples {δfoF2}, statistical distributions and invariants up to the fourth order are calculated. The analysis shows that during disturbances the distributions differ from the Gaussian law. At sufficiently small probability levels, the distributions show arbitrarily large deviations from the model of a normal process. Therefore, an attempt is made to describe the statistical samples {δfoF2} with a Poisson model. For the studied samples, an exponential characteristic function is selected under the assumption that the time series are a superposition of deterministic and random processes. Using the Fourier transform, the characteristic function is transformed into a nonholomorphic, excessive-asymmetric probability density function. The statistical distributions of the samples {δfoF2} calculated for the disturbed periods are compared with the obtained model distribution function. According to the Kolmogorov criterion, the probabilities of coincidence of the a posteriori distributions with the theoretical ones are P ≈ 0.7-0.9. This analysis supports the applicability of a model based on a Poisson random process for the statistical description of the variations {δfoF2}, and for probabilistic estimates of them, during heliogeophysical disturbances.

  11. Provable classically intractable sampling with measurement-based computation in constant time

    NASA Astrophysics Data System (ADS)

    Sanders, Stephen; Miller, Jacob; Miyake, Akimasa

    We present a constant-time measurement-based quantum computation (MQC) protocol to perform a classically intractable sampling problem. We sample from the output probability distribution of a subclass of the instantaneous quantum polynomial time circuits introduced by Bremner, Montanaro and Shepherd. In contrast with the usual circuit model, our MQC implementation includes additional randomness due to byproduct operators associated with the computation. Despite this additional randomness we show that our sampling task cannot be efficiently simulated by a classical computer. We extend previous results to verify the quantum supremacy of our sampling protocol efficiently using only single-qubit Pauli measurements. Center for Quantum Information and Control, Department of Physics and Astronomy, University of New Mexico, Albuquerque, NM 87131, USA.

  12. The decline and fall of Type II error rates

    Treesearch

    Steve Verrill; Mark Durst

    2005-01-01

    For general linear models with normally distributed random errors, the probability of a Type II error decreases exponentially as a function of sample size. This potentially rapid decline reemphasizes the importance of performing power calculations.
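    For a one-sided z-test the decline is easy to see numerically: the Type II error is β(n) = Φ(z_{1-α} − δ√n), a Gaussian tail that decays exponentially (in fact faster) in the sample size n. A small sketch, with an assumed effect size of 0.5 standard deviations:

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(x / sqrt(2)))

def type_ii_error(n, effect=0.5, alpha=0.05):
    """Beta for a one-sided z-test of a mean shift `effect` (in SD units)."""
    z_crit = 1.6448536269514722  # 95th percentile of the standard normal
    return phi(z_crit - effect * sqrt(n))

for n in (10, 20, 40, 80):
    print(n, type_ii_error(n))  # beta collapses rapidly as n doubles
```

Doubling n repeatedly drives β down by ever larger factors, the "decline and fall" that makes even modest increases in sample size pay off in power.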

  13. Random Numbers and Monte Carlo Methods

    NASA Astrophysics Data System (ADS)

    Scherer, Philipp O. J.

    Many-body problems often involve the calculation of integrals of very high dimension which cannot be treated by standard methods. Monte Carlo methods, which sample the integration volume at randomly chosen points, are very useful for the calculation of thermodynamic averages. After summarizing some basic statistics, we discuss algorithms for the generation of pseudo-random numbers with a given probability distribution, which are essential for all Monte Carlo methods. We show how the efficiency of Monte Carlo integration can be improved by preferentially sampling the important configurations. Finally, the famous Metropolis algorithm is applied to classical many-particle systems. Computer experiments visualize the central limit theorem and apply the Metropolis method to the traveling salesman problem.
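    One standard algorithm for generating pseudo-random numbers with a given distribution is inverse-transform sampling: feed uniform variates through the inverse CDF. A one-line sketch for an exponential target (the rate parameter is illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)

def inverse_transform_sample(icdf, n):
    """Draw n variates of a target distribution from uniforms via its inverse CDF."""
    return icdf(rng.random(n))

# Exponential with rate 2: F(x) = 1 - exp(-2x), so F^{-1}(u) = -ln(1 - u)/2
x = inverse_transform_sample(lambda u: -np.log(1 - u) / 2.0, 100_000)
print(x.mean())  # ~0.5, the exponential mean 1/rate
```

When the inverse CDF has no closed form, the same uniforms can instead drive rejection sampling or a Metropolis chain, as the chapter goes on to discuss.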

  14. Multicenter, randomized trial of quantitative pretest probability to reduce unnecessary medical radiation exposure in emergency department patients with chest pain and dyspnea.

    PubMed

    Kline, Jeffrey A; Jones, Alan E; Shapiro, Nathan I; Hernandez, Jackeline; Hogg, Melanie M; Troyer, Jennifer; Nelson, R Darrel

    2014-01-01

    Use of pretest probability can reduce unnecessary testing. We hypothesize that quantitative pretest probability, linked to evidence-based management strategies, can reduce unnecessary radiation exposure and cost in low-risk patients with symptoms suggestive of acute coronary syndrome and pulmonary embolism. This was a prospective, 4-center, randomized controlled trial of decision support effectiveness. Subjects were adults with chest pain and dyspnea, nondiagnostic ECGs, and no obvious diagnosis. The clinician provided data needed to compute pretest probabilities from a Web-based system. Clinicians randomized to the intervention group received the pretest probability estimates for both acute coronary syndrome and pulmonary embolism and suggested clinical actions designed to lower radiation exposure and cost. The control group received nothing. Patients were followed for 90 days. The primary outcome and the sample size of 550 were predicated on a significant reduction in the proportion of healthy patients exposed to >5 mSv chest radiation. A total of 550 patients were randomized, and 541 had complete data. The proportion with >5 mSv to the chest and no significant cardiopulmonary diagnosis within 90 days was reduced from 33% to 25% (P=0.038). The intervention group had significantly lower median chest radiation exposure (0.06 versus 0.34 mSv; P=0.037, Mann-Whitney U test) and lower median costs ($934 versus $1275; P=0.018) for medical care. Adverse events occurred in 16% of controls and 11% in the intervention group (P=0.06). Provision of pretest probability and prescriptive advice reduced radiation exposure and cost of care in low-risk ambulatory patients with symptoms of acute coronary syndrome and pulmonary embolism. URL: http://www.clinicaltrials.gov. Unique identifier: NCT01059500.

  15. Modelling the spatial distribution of Fasciola hepatica in dairy cattle in Europe.

    PubMed

    Ducheyne, Els; Charlier, Johannes; Vercruysse, Jozef; Rinaldi, Laura; Biggeri, Annibale; Demeler, Janina; Brandt, Christina; De Waal, Theo; Selemetas, Nikolaos; Höglund, Johan; Kaba, Jaroslaw; Kowalczyk, Slawomir J; Hendrickx, Guy

    2015-03-26

    A harmonized sampling approach in combination with spatial modelling is required to update current knowledge of fasciolosis in dairy cattle in Europe. Within the scope of the EU project GLOWORM, samples from 3,359 randomly selected farms in 849 municipalities in Belgium, Germany, Ireland, Poland and Sweden were collected and their infection status assessed using an indirect bulk tank milk (BTM) enzyme-linked immunosorbent assay (ELISA). Dairy farms were considered exposed when the optical density ratio (ODR) exceeded the 0.3 cut-off. Two ensemble-modelling techniques, Random Forests (RF) and Boosted Regression Trees (BRT), were used to obtain the spatial distribution of the probability of exposure to Fasciola hepatica using remotely sensed environmental variables (1-km spatial resolution) and interpolated values from meteorological stations as predictors. The median ODRs amounted to 0.31, 0.12, 0.54, 0.25 and 0.44 for Belgium, Germany, Ireland, Poland and southern Sweden, respectively. Using the 0.3 threshold, 571 municipalities were categorized as positive and 429 as negative. RF was seen as capable of predicting the spatial distribution of exposure with an area under the receiver operating characteristic (ROC) curve (AUC) of 0.83 (0.96 for BRT). Both models identified rainfall and temperature as the most important factors for probability of exposure. Areas of high and low exposure were identified by both models, with BRT better at discriminating between low-probability and high-probability exposure; this model may therefore be more useful in practice. Given a harmonized sampling strategy, it should be possible to generate robust spatial models for fasciolosis in dairy cattle in Europe to be used as input for temporal models and for the detection of deviations in baseline probability. Further research is required for model output in areas outside the eco-climatic range investigated.

  16. Technical Reports Prepared Under Contract N00014-76-C-0475.

    DTIC Science & Technology

    1987-05-29

    Technical Report No.: Title (Author; Date)
    264: Approximations to Densities in Geometric Probability (H. Solomon, M.A. Stephens; 10/27/78)
    265: Sequential ...
    ...: ...Certain Multivariate Normal Probabilities (S. Iyengar; 8/12/82)
    323: EDF Statistics for Testing for the Gamma Distribution with... (M.A. Stephens; 8/13/82)
    ...: ... Nets (...; ...20-85)
    360: Random Sequential Coding By Hamming Distance (Yoshiaki Itoh, Herbert Solomon; 07-11-85)
    361: Transforming Censored Samples And Testing Fit

  17. Effect of randomness on multi-frequency aeroelastic responses resolved by Unsteady Adaptive Stochastic Finite Elements

    NASA Astrophysics Data System (ADS)

    Witteveen, Jeroen A. S.; Bijl, Hester

    2009-10-01

    The Unsteady Adaptive Stochastic Finite Elements (UASFE) method resolves the effect of randomness in numerical simulations of single-mode aeroelastic responses with a constant accuracy in time for a constant number of samples. In this paper, the UASFE framework is extended to multi-frequency responses and continuous structures by employing a wavelet decomposition pre-processing step to decompose the sampled multi-frequency signals into single-frequency components. The effect of the randomness on the multi-frequency response is then obtained by summing the results of the UASFE interpolation at constant phase for the different frequency components. Results for multi-frequency responses and continuous structures show a three orders of magnitude reduction of computational costs compared to crude Monte Carlo simulations in a harmonically forced oscillator, a flutter panel problem, and the three-dimensional transonic AGARD 445.6 wing aeroelastic benchmark subject to random fields and random parameters with various probability distributions.

  18. Does prevalence matter to physicians in estimating post-test probability of disease? A randomized trial.

    PubMed

    Agoritsas, Thomas; Courvoisier, Delphine S; Combescure, Christophe; Deom, Marie; Perneger, Thomas V

    2011-04-01

    The probability of a disease following a diagnostic test depends on the sensitivity and specificity of the test, but also on the prevalence of the disease in the population of interest (or pre-test probability). How physicians use this information is not well known. To assess whether physicians correctly estimate post-test probability according to various levels of prevalence and explore this skill across respondent groups. Randomized trial. Population-based sample of 1,361 physicians of all clinical specialties. We described a scenario of a highly accurate screening test (sensitivity 99% and specificity 99%) in which we randomly manipulated the prevalence of the disease (1%, 2%, 10%, 25%, 95%, or no information). We asked physicians to estimate the probability of disease following a positive test (categorized as <60%, 60-79%, 80-94%, 95-99.9%, and >99.9%). Each answer was correct for a different version of the scenario, and no answer was possible in the "no information" scenario. We estimated the proportion of physicians proficient in assessing post-test probability as the proportion of correct answers beyond the distribution of answers attributable to guessing. Most respondents in each of the six groups (67%-82%) selected a post-test probability of 95-99.9%, regardless of the prevalence of disease and even when no information on prevalence was provided. This answer was correct only for a prevalence of 25%. We estimated that 9.1% (95% CI 6.0-14.0) of respondents knew how to assess correctly the post-test probability. This proportion did not vary with clinical experience or practice setting. Most physicians do not take into account the prevalence of disease when interpreting a positive test result. This may cause unnecessary testing and diagnostic errors.
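
    The scenario reduces to Bayes' theorem for the positive predictive value; this sketch reproduces the scenario's numbers (sensitivity and specificity of 99%) and shows why 95-99.9% is correct only at a prevalence of 25%.

```python
# Post-test probability of disease after a positive result, by Bayes' theorem.
def post_test_probability(prevalence, sensitivity=0.99, specificity=0.99):
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# The prevalences randomly assigned in the trial:
for prev in (0.01, 0.02, 0.10, 0.25, 0.95):
    print(f"prevalence {prev:5.0%} -> post-test probability {post_test_probability(prev):.3f}")
```

    At 1% prevalence the post-test probability is only 50%, while at 25% it is about 97%, the single version of the scenario for which the modal answer of 95-99.9% was correct.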

  19. Using known populations of pronghorn to evaluate sampling plans and estimators

    USGS Publications Warehouse

    Kraft, K.M.; Johnson, D.H.; Samuelson, J.M.; Allen, S.H.

    1995-01-01

    Although sampling plans and estimators of abundance have good theoretical properties, their performance in real situations is rarely assessed because true population sizes are unknown. We evaluated widely used sampling plans and estimators of population size on 3 known clustered distributions of pronghorn (Antilocapra americana). Our criteria were accuracy of the estimate, coverage of 95% confidence intervals, and cost. Sampling plans were combinations of sampling intensities (16, 33, and 50%), sample selection (simple random sampling without replacement, systematic sampling, and probability proportional to size sampling with replacement), and stratification. We paired sampling plans with suitable estimators (simple, ratio, and probability proportional to size). We used area of the sampling unit as the auxiliary variable for the ratio and probability proportional to size estimators. All estimators were nearly unbiased, but precision was generally low (overall mean coefficient of variation [CV] = 29). Coverage of 95% confidence intervals was only 89% because of the highly skewed distribution of the pronghorn counts and small sample sizes, especially with stratification. Stratification combined with accurate estimates of optimal stratum sample sizes increased precision, reducing the mean CV from 33 without stratification to 25 with stratification; costs increased 23%. Precise results (mean CV = 13) but poor confidence interval coverage (83%) were obtained with simple and ratio estimators when the allocation scheme included all sampling units in the stratum containing most pronghorn. Although areas of the sampling units varied, ratio estimators and probability proportional to size sampling did not increase precision, possibly because of the clumped distribution of pronghorn. Managers should be cautious in using sampling plans and estimators to estimate abundance of aggregated populations.
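
    A sketch of the simple-expansion and ratio estimators under simple random sampling without replacement, with sampling-unit area as the auxiliary variable as in the study; the clumped synthetic counts below are illustrative, not the pronghorn data.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200                                    # units in the sampling frame
area = rng.uniform(5.0, 15.0, size=N)      # unit areas (auxiliary variable)
# Clumped counts whose mean is proportional to unit area:
counts = rng.negative_binomial(1, 1.0 / (1.0 + 0.4 * area))
true_total = counts.sum()

n = 66                                     # ~33% sampling intensity
idx = rng.choice(N, size=n, replace=False) # SRS without replacement
y, x = counts[idx], area[idx]

simple_total = N * y.mean()                      # simple-expansion estimator
ratio_total = area.sum() * y.sum() / x.sum()     # ratio estimator (area auxiliary)
```

    With strongly clumped counts, both estimators are nearly unbiased but imprecise, which is the behaviour the study reports.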

  20. Incidence Rates of Sexual Harassment in Mass Communications Internship Programs: An Initial Study Comparing Intern, Student, and Professional Rates.

    ERIC Educational Resources Information Center

    Bowen, Michelle; Laurion, Suzanne

    A study documented, using a telephone survey, the incidence rates of sexual harassment of mass communication interns, and compared those rates to student and professional rates. A probability sample of 44 male and 52 female mass communications professionals was generated using several random sampling techniques from among professionals who work in…

  1. Probability of assertive behaviour, interpersonal anxiety and self-efficacy of South African registered dietitians.

    PubMed

    Paterson, Marie; Green, J M; Basson, C J; Ross, F

    2002-02-01

    There is little information on the probability of assertive behaviour, interpersonal anxiety and self-efficacy in the literature regarding dietitians. The objective of this study was to establish baseline information of these attributes and the factors affecting them. Questionnaires collecting biographical information and self-assessment psychometric scales measuring levels of probability of assertiveness, interpersonal anxiety and self-efficacy were mailed to 350 subjects, who comprised a random sample of dietitians registered with the Health Professions Council of South Africa. Forty-one per cent (n=145) of the sample responded. Self-assessment inventory results were compared to test levels of probability of assertive behaviour, interpersonal anxiety and self-efficacy. The inventory results were compared with the biographical findings to establish statistical relationships between the variables. The hypotheses were formulated before data collection. Dietitians had acceptable levels of probability of assertive behaviour and interpersonal anxiety. The probability of assertive behaviour was significantly lower than the level noted in the literature and was negatively related to interpersonal anxiety and positively related to self-efficacy.

  2. On Probability Domains IV

    NASA Astrophysics Data System (ADS)

    Frič, Roman; Papčo, Martin

    2017-12-01

    Stressing a categorical approach, we continue our study of fuzzified domains of probability, in which classical random events are replaced by measurable fuzzy random events. In operational probability theory (S. Bugajski) classical random variables are replaced by statistical maps (generalized distribution maps induced by random variables) and in fuzzy probability theory (S. Gudder) the central role is played by observables (maps between probability domains). We show that to each of the two generalized probability theories there corresponds a suitable category and the two resulting categories are dually equivalent. Statistical maps and observables become morphisms. A statistical map can send a degenerated (pure) state to a non-degenerated one (a quantum phenomenon) and, dually, an observable can map a crisp random event to a genuine fuzzy random event (a fuzzy phenomenon). The dual equivalence means that the operational probability theory and the fuzzy probability theory coincide and the resulting generalized probability theory has two dual aspects: quantum and fuzzy. We close with some notes on products and coproducts in the dual categories.

  3. Radiation Transport in Random Media With Large Fluctuations

    NASA Astrophysics Data System (ADS)

    Olson, Aaron; Prinja, Anil; Franke, Brian

    2017-09-01

    Neutral particle transport in media exhibiting large and complex material property spatial variation is modeled by representing cross sections as lognormal random functions of space, generated through a nonlinear memory-less transformation of a Gaussian process with covariance uniquely determined by the covariance of the cross section. A Karhunen-Loève decomposition of the Gaussian process is implemented to efficiently generate realizations of the random cross sections, and Woodcock Monte Carlo is used to transport particles on each realization and generate benchmark solutions for the mean and variance of the particle flux as well as probability densities of the particle reflectance and transmittance. A computationally efficient stochastic collocation method is implemented to directly compute statistical moments such as the mean and variance, while a polynomial chaos expansion in conjunction with stochastic collocation provides a convenient surrogate model that also produces probability densities of output quantities of interest. Extensive numerical testing demonstrates that stochastic reduced-order modeling provides an accurate and cost-effective alternative to random sampling for particle transport in random media.
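
    A sketch of the sampling chain, assuming NumPy: a discrete Karhunen-Loève (eigen)decomposition of an assumed exponential covariance generates a Gaussian process realization, which a memoryless nonlinear transformation turns into a lognormal cross-section field. The grid, correlation length and lognormal parameters are illustrative.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 100)               # 1-D spatial grid
corr_len = 0.2                               # assumed correlation length
cov = np.exp(-np.abs(x[:, None] - x[None, :]) / corr_len)

# Discrete Karhunen-Loeve modes: eigenpairs of the covariance matrix.
eigval, eigvec = np.linalg.eigh(cov)
order = np.argsort(eigval)[::-1]
eigval = np.clip(eigval[order], 0.0, None)   # guard against tiny negatives
eigvec = eigvec[:, order]

k = 20                                       # truncation order
rng = np.random.default_rng(2)
xi = rng.standard_normal(k)                  # independent standard normals
gauss_field = eigvec[:, :k] @ (np.sqrt(eigval[:k]) * xi)

# Memoryless nonlinear transformation: Gaussian -> lognormal cross section.
mean_log, sigma_log = 0.0, 0.5
cross_section = np.exp(mean_log + sigma_log * gauss_field)
```

    Each draw of `xi` yields one realization on which a transport solve (Woodcock Monte Carlo in the paper) would be run.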

  4. Responsiveness-informed multiple imputation and inverse probability-weighting in cohort studies with missing data that are non-monotone or not missing at random.

    PubMed

    Doidge, James C

    2018-02-01

    Population-based cohort studies are invaluable to health research because of the breadth of data collection over time, and the representativeness of their samples. However, they are especially prone to missing data, which can compromise the validity of analyses when data are not missing at random. Having many waves of data collection presents opportunity for participants' responsiveness to be observed over time, which may be informative about missing data mechanisms and thus useful as an auxiliary variable. Modern approaches to handling missing data such as multiple imputation and maximum likelihood can be difficult to implement with the large numbers of auxiliary variables and large amounts of non-monotone missing data that occur in cohort studies. Inverse probability-weighting can be easier to implement but conventional wisdom has stated that it cannot be applied to non-monotone missing data. This paper describes two methods of applying inverse probability-weighting to non-monotone missing data, and explores the potential value of including measures of responsiveness in either inverse probability-weighting or multiple imputation. Simulation studies are used to compare methods and demonstrate that responsiveness in longitudinal studies can be used to mitigate bias induced by missing data, even when data are not missing at random.
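
    A toy sketch of the core idea, assuming NumPy: responsiveness is a fully observed auxiliary, the response probability is a known function of it, and inverse probability-weighting recovers the population mean that a complete-case analysis gets wrong. Real applications would estimate the weights, e.g. by logistic regression, and the paper's handling of non-monotone patterns is more involved.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000
responsiveness = rng.uniform(0.0, 1.0, n)     # auxiliary, observed for everyone
y = 2.0 + 3.0 * responsiveness + rng.normal(0.0, 1.0, n)  # outcome; true mean 3.5

p_respond = 0.2 + 0.7 * responsiveness        # response probability (known here)
observed = rng.random(n) < p_respond          # missingness pattern

naive_mean = y[observed].mean()               # complete-case mean: biased upward
weights = 1.0 / p_respond[observed]
ipw_mean = np.sum(weights * y[observed]) / np.sum(weights)  # Hajek-type IPW mean
```

    Because responsive participants have higher outcomes here, the complete-case mean overshoots 3.5 while the weighted mean does not.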

  5. [Exploration of the concept of genetic drift in genetics teaching of undergraduates].

    PubMed

    Wang, Chun-ming

    2016-01-01

    Genetic drift is one of the difficult topics in teaching genetics because its randomness and probabilistic nature easily lead to conceptual misunderstanding. The "sampling error" in its definition is often misread as error introduced by the research method of "sampling", i.e., as something that disturbs the results, rather than as the cause of random changes in allele frequency. I analyzed and compared the definitions of genetic drift in domestic and international genetics textbooks, and found that definitions containing "sampling error" are widely adopted but interpreted correctly in only a few textbooks. Here, the history of research on genetic drift, i.e., the contributions of Wright, Fisher and Kimura, is introduced. Moreover, I describe two representative articles recently published on teaching genetic drift to undergraduates, which point out that misconceptions are inevitable for undergraduates during the studying process and which provide a preliminary solution. Combined with my own teaching practice, I suggest that the definition of genetic drift containing "sampling error" can be adopted with further interpretation: "sampling error" refers to the random sampling, from the gamete pool, of the gametes that form the next generation's alleles, and has no relationship to artificial sampling in general genetics studies. This article may provide some help in genetics teaching.
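
    The intended reading of "sampling error" can be made concrete by simulating drift as binomial sampling of gametes, a standard Wright-Fisher sketch (population size and other parameters illustrative):

```python
import numpy as np

def drift(p0=0.5, pop_size=50, generations=200, seed=4):
    """Wright-Fisher drift: each generation, the 2N gametes forming the
    next generation are drawn binomially from the current allele frequency."""
    rng = np.random.default_rng(seed)
    p = p0
    freqs = [p]
    for _ in range(generations):
        p = rng.binomial(2 * pop_size, p) / (2 * pop_size)
        freqs.append(p)
    return freqs

traj = drift()
```

    The frequency wanders with no directional force and, in a small population, eventually fixes at 0 or 1; no artificial sampling by the experimenter is involved.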

  6. Probabilistic generation of random networks taking into account information on motifs occurrence.

    PubMed

    Bois, Frederic Y; Gayraud, Ghislaine

    2015-01-01

    Because of the huge number of graphs possible even with a small number of nodes, inference on network structure is known to be a challenging problem. Generating large random directed graphs with prescribed probabilities of occurrences of some meaningful patterns (motifs) is also difficult. We show how to generate such random graphs according to a formal probabilistic representation, using fast Markov chain Monte Carlo methods to sample them. As an illustration, we generate realistic graphs with several hundred nodes mimicking a gene transcription interaction network in Escherichia coli.
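
    One way to realize such sampling is a Metropolis chain over adjacency matrices whose stationary distribution is tilted toward a prescribed motif count; this toy sketch uses mutual (reciprocal) edge pairs as the motif and is far simpler than the paper's construction.

```python
import numpy as np

def mcmc_graph(n=10, beta=0.5, steps=5000, seed=5):
    """Metropolis sampling of directed graphs with target pi(A)
    proportional to exp(beta * #mutual-edge-pairs)."""
    rng = np.random.default_rng(seed)
    A = np.zeros((n, n), dtype=int)

    def motifs(M):
        return int(np.sum(M * M.T) // 2)    # mutual (reciprocal) pairs

    m = motifs(A)
    for _ in range(steps):
        i, j = rng.integers(n, size=2)
        if i == j:
            continue                        # no self-loops
        A[i, j] ^= 1                        # propose toggling edge i -> j
        m_new = motifs(A)
        if rng.random() < np.exp(beta * (m_new - m)):
            m = m_new                       # accept
        else:
            A[i, j] ^= 1                    # reject: undo the toggle
    return A, m

A, m = mcmc_graph()
```

    With beta > 0 the chain favours graphs rich in reciprocal edges; replacing the motif function (e.g. with feed-forward loop counts) changes the target without changing the sampler.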

  7. Probabilistic Generation of Random Networks Taking into Account Information on Motifs Occurrence

    PubMed Central

    Bois, Frederic Y.

    2015-01-01

    Abstract Because of the huge number of graphs possible even with a small number of nodes, inference on network structure is known to be a challenging problem. Generating large random directed graphs with prescribed probabilities of occurrences of some meaningful patterns (motifs) is also difficult. We show how to generate such random graphs according to a formal probabilistic representation, using fast Markov chain Monte Carlo methods to sample them. As an illustration, we generate realistic graphs with several hundred nodes mimicking a gene transcription interaction network in Escherichia coli. PMID:25493547

  8. fixedTimeEvents: An R package for the distribution of distances between discrete events in fixed time

    NASA Astrophysics Data System (ADS)

    Liland, Kristian Hovde; Snipen, Lars

    When a series of Bernoulli trials occurs within a fixed time frame or limited space, it is often interesting to assess whether the successful outcomes have occurred completely at random or whether they tend to group together. One example, in genetics, is detecting grouping of genes within a genome. Approximations of the distribution of successes are possible, but they become inaccurate for small sample sizes. In this article, we describe the exact distribution of time between random, non-overlapping successes in discrete time of fixed length. A complete description of the probability mass function, the cumulative distribution function, mean, variance and recurrence relation is included. We propose an associated test for the over-representation of short distances and illustrate the methodology through relevant examples. The theory is implemented in an R package including probability mass, cumulative distribution, quantile function, random number generator, simulation functions, and functions for testing.
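
    A simulation cross-check of the package's central quantity, assuming NumPy: with s successes placed uniformly at random among n discrete time points, the distance between consecutive successes has mean (n - s)/(s + 1) + 1.

```python
import numpy as np

rng = np.random.default_rng(6)
n_trials, s, reps = 100, 5, 20_000
gaps = []
for _ in range(reps):
    # s successes placed uniformly at random among n_trials time points
    pos = np.sort(rng.choice(n_trials, size=s, replace=False))
    gaps.extend(np.diff(pos))
gaps = np.array(gaps)

observed_mean = gaps.mean()
expected_mean = (n_trials - s) / (s + 1) + 1   # exact mean distance
```

    A test for clustering compares the observed frequency of short distances against this null distribution, which the package computes exactly rather than by simulation.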

  9. Semantic Importance Sampling for Statistical Model Checking

    DTIC Science & Technology

    2015-01-16

    Only fragments of this report survive in the index. Recoverable content: SIS reduces the number of SMT calls while maintaining correctness; it is implemented in a tool called osmosis, which is used to verify a number of stochastic systems. Section 2 surveys related work, Section 3 presents background definitions and concepts, Section 4 presents SIS, and Section 5 presents the osmosis tool. One fragment describes randomly selecting a cube c from C* with uniform probability, since each cube has equal probability.

  10. Data-driven probability concentration and sampling on manifold

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Soize, C., E-mail: christian.soize@univ-paris-est.fr; Ghanem, R., E-mail: ghanem@usc.edu

    2016-09-15

    A new methodology is proposed for generating realizations of a random vector with values in a finite-dimensional Euclidean space that are statistically consistent with a dataset of observations of this vector. The probability distribution of this random vector, while a priori not known, is presumed to be concentrated on an unknown subset of the Euclidean space. A random matrix is introduced whose columns are independent copies of the random vector and for which the number of columns is the number of data points in the dataset. The approach is based on the use of (i) the multidimensional kernel-density estimation method for estimating the probability distribution of the random matrix, (ii) a MCMC method for generating realizations for the random matrix, (iii) the diffusion-maps approach for discovering and characterizing the geometry and the structure of the dataset, and (iv) a reduced-order representation of the random matrix, which is constructed using the diffusion-maps vectors associated with the first eigenvalues of the transition matrix relative to the given dataset. The convergence aspects of the proposed methodology are analyzed and a numerical validation is explored through three applications of increasing complexity. The proposed method is found to be robust to noise levels and data complexity as well as to the intrinsic dimension of data and the size of experimental datasets. Both the methodology and the underlying mathematical framework presented in this paper contribute new capabilities and perspectives at the interface of uncertainty quantification, statistical data analysis, stochastic modeling and associated statistical inverse problems.

  11. Visualizing Time-Varying Distribution Data in EOS Application

    NASA Technical Reports Server (NTRS)

    Shen, Han-Wei

    2004-01-01

    In this research, we have developed several novel visualization methods for spatial probability density function data. Our focus has been on 2D spatial datasets, where each pixel is a random variable with multiple samples that are the results of experiments on that random variable. We developed novel clustering algorithms as a means to reduce the information contained in these datasets, and investigated different ways of interpreting and clustering the data.

  12. Assessing representativeness of sampling methods for reaching men who have sex with men: a direct comparison of results obtained from convenience and probability samples.

    PubMed

    Schwarcz, Sandra; Spindler, Hilary; Scheer, Susan; Valleroy, Linda; Lansky, Amy

    2007-07-01

    Convenience samples are used to determine HIV-related behaviors among men who have sex with men (MSM) without measuring the extent to which the results are representative of the broader MSM population. We compared results from a cross-sectional survey of MSM recruited from gay bars between June and October 2001 to a random digit dial telephone survey conducted between June 2002 and January 2003. The men in the probability sample were older, better educated, and had higher incomes than men in the convenience sample; the convenience sample enrolled more employed men and men of color. Substance use around the time of sex was higher in the convenience sample, but other sexual behaviors were similar. HIV testing was common among men in both samples. Periodic validation, through comparison of data collected by different sampling methods, may be useful when relying on survey data for program and policy development.

  13. Probability techniques for reliability analysis of composite materials

    NASA Technical Reports Server (NTRS)

    Wetherhold, Robert C.; Ucci, Anthony M.

    1994-01-01

    Traditional design approaches for composite materials have employed deterministic criteria for failure analysis. New approaches are required to predict the reliability of composite structures since strengths and stresses may be random variables. This report will examine and compare methods used to evaluate the reliability of composite laminae. The two types of methods that will be evaluated are fast probability integration (FPI) methods and Monte Carlo methods. In these methods, reliability is formulated as the probability that an explicit function of random variables is less than a given constant. Using failure criteria developed for composite materials, a function of design variables can be generated which defines a 'failure surface' in probability space. A number of methods are available to evaluate the integration over the probability space bounded by this surface; this integration delivers the required reliability. The methods which will be evaluated are: the first order, second moment FPI methods; second order, second moment FPI methods; the simple Monte Carlo; and an advanced Monte Carlo technique which utilizes importance sampling. The methods are compared for accuracy, efficiency, and for the conservatism of the reliability estimation. The methodology involved in determining the sensitivity of the reliability estimate to the design variables (strength distributions) and importance factors is also presented.
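
    A sketch of the simple Monte Carlo and importance-sampling variants on a one-dimensional limit state g(x) = beta - x with standard normal x, where the failure probability is known in closed form; the limit state is illustrative, not a composite failure criterion.

```python
import numpy as np
from math import erfc, sqrt

rng = np.random.default_rng(7)
beta = 3.0                                  # reliability index: failure when x > beta
N = 100_000
p_exact = 0.5 * erfc(beta / sqrt(2.0))      # P(X > beta) for X ~ N(0, 1)

# Simple Monte Carlo: average of the failure indicator.
x = rng.standard_normal(N)
p_mc = np.mean(x > beta)

# Importance sampling: draw from N(beta, 1) and reweight by the
# density ratio phi(z) / phi(z - beta) = exp(-beta*z + beta**2/2).
z = rng.standard_normal(N) + beta
w = np.exp(-beta * z + beta**2 / 2)
p_is = np.mean((z > beta) * w)
```

    Shifting the sampling density into the failure region gives the importance-sampling estimate a far smaller relative error at the same sample size, which is the motivation for the advanced technique evaluated in the report.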

  14. Computational methods for efficient structural reliability and reliability sensitivity analysis

    NASA Technical Reports Server (NTRS)

    Wu, Y.-T.

    1993-01-01

    This paper presents recent developments in efficient structural reliability analysis methods. The paper proposes an efficient, adaptive importance sampling (AIS) method that can be used to compute reliability and reliability sensitivities. The AIS approach uses a sampling density that is proportional to the joint PDF of the random variables. Starting from an initial approximate failure domain, sampling proceeds adaptively and incrementally with the goal of reaching a sampling domain that is slightly greater than the failure domain to minimize over-sampling in the safe region. Several reliability sensitivity coefficients are proposed that can be computed directly and easily from the above AIS-based failure points. These probability sensitivities can be used for identifying key random variables and for adjusting design to achieve reliability-based objectives. The proposed AIS methodology is demonstrated using a turbine blade reliability analysis problem.

  15. Bayesian Probability Theory

    NASA Astrophysics Data System (ADS)

    von der Linden, Wolfgang; Dose, Volker; von Toussaint, Udo

    2014-06-01

    Preface; Part I. Introduction: 1. The meaning of probability; 2. Basic definitions; 3. Bayesian inference; 4. Combinatorics; 5. Random walks; 6. Limit theorems; 7. Continuous distributions; 8. The central limit theorem; 9. Poisson processes and waiting times; Part II. Assigning Probabilities: 10. Transformation invariance; 11. Maximum entropy; 12. Qualified maximum entropy; 13. Global smoothness; Part III. Parameter Estimation: 14. Bayesian parameter estimation; 15. Frequentist parameter estimation; 16. The Cramer-Rao inequality; Part IV. Testing Hypotheses: 17. The Bayesian way; 18. The frequentist way; 19. Sampling distributions; 20. Bayesian vs frequentist hypothesis tests; Part V. Real World Applications: 21. Regression; 22. Inconsistent data; 23. Unrecognized signal contributions; 24. Change point problems; 25. Function estimation; 26. Integral equations; 27. Model selection; 28. Bayesian experimental design; Part VI. Probabilistic Numerical Techniques: 29. Numerical integration; 30. Monte Carlo methods; 31. Nested sampling; Appendixes; References; Index.

  16. Urn models for response-adaptive randomized designs: a simulation study based on a non-adaptive randomized trial.

    PubMed

    Ghiglietti, Andrea; Scarale, Maria Giovanna; Miceli, Rosalba; Ieva, Francesca; Mariani, Luigi; Gavazzi, Cecilia; Paganoni, Anna Maria; Edefonti, Valeria

    2018-03-22

    Recently, response-adaptive designs have been proposed in randomized clinical trials to achieve ethical and/or cost advantages by using sequential accrual information collected during the trial to dynamically update the probabilities of treatment assignments. In this context, urn models, in which the probability of assigning patients to treatments is interpreted as the proportion of balls of different colors available in a virtual urn, have been used as response-adaptive randomization rules. We propose the use of Randomly Reinforced Urn (RRU) models in a simulation study based on a published randomized clinical trial on the efficacy of home enteral nutrition in cancer patients after major gastrointestinal surgery. We compare results with the RRU design with those previously published with the non-adaptive approach. We also provide code written with the R software to implement the RRU design in practice. In detail, we simulate 10,000 trials based on the RRU model in three set-ups of different total sample sizes. We report information on the number of patients allocated to the inferior treatment and on the empirical power of the t-test for the treatment coefficient in the ANOVA model. We carry out a sensitivity analysis to assess the effect of different urn compositions. For each sample size, in approximately 75% of the simulation runs, the number of patients allocated to the inferior treatment by the RRU design is lower, as compared to the non-adaptive design. The empirical power of the t-test for the treatment effect is similar in the two designs.
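
    A toy RRU sketch, assuming NumPy: the colour drawn sets the treatment, and the urn is reinforced with a quantity of balls equal to the positive random response, so allocation drifts toward the treatment with the larger mean response. The two-colour composition and exponential responses are illustrative, not the trial's set-up.

```python
import numpy as np

rng = np.random.default_rng(8)
balls = np.array([1.0, 1.0])        # initial urn composition: colours A, B
mean_response = (1.0, 2.0)          # treatment B has the higher mean response
assignments = []
for _ in range(2000):
    arm = int(rng.random() < balls[1] / balls.sum())   # 0 = A, 1 = B
    assignments.append(arm)
    response = rng.exponential(mean_response[arm])     # positive outcome
    balls[arm] += response                             # reinforce drawn colour
share_B = np.mean(assignments)
```

    Because only the drawn colour is reinforced, and B's reinforcements are larger on average, B's share of the urn, and hence its assignment probability, tends to grow over the course of the trial.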

  17. Likelihood analysis of species occurrence probability from presence-only data for modelling species distributions

    USGS Publications Warehouse

    Royle, J. Andrew; Chandler, Richard B.; Yackulic, Charles; Nichols, James D.

    2012-01-01

    1. Understanding the factors affecting species occurrence is a pre-eminent focus of applied ecological research. However, direct information about species occurrence is lacking for many species. Instead, researchers sometimes have to rely on so-called presence-only data (i.e. when no direct information about absences is available), which often results from opportunistic, unstructured sampling. MAXENT is a widely used software program designed to model and map species distribution using presence-only data. 2. We provide a critical review of MAXENT as applied to species distribution modelling and discuss how it can lead to inferential errors. A chief concern is that MAXENT produces a number of poorly defined indices that are not directly related to the actual parameter of interest – the probability of occurrence (ψ). This focus on an index was motivated by the belief that it is not possible to estimate ψ from presence-only data; however, we demonstrate that ψ is identifiable using conventional likelihood methods under the assumptions of random sampling and constant probability of species detection. 3. The model is implemented in a convenient r package which we use to apply the model to simulated data and data from the North American Breeding Bird Survey. We demonstrate that MAXENT produces extreme under-predictions when compared to estimates produced by logistic regression which uses the full (presence/absence) data set. We note that MAXENT predictions are extremely sensitive to specification of the background prevalence, which is not objectively estimated using the MAXENT method. 4. As with MAXENT, formal model-based inference requires a random sample of presence locations. Many presence-only data sets, such as those based on museum records and herbarium collections, may not satisfy this assumption. However, when sampling is random, we believe that inference should be based on formal methods that facilitate inference about interpretable ecological quantities instead of vaguely defined indices.

  18. Secondary outcome analysis for data from an outcome-dependent sampling design.

    PubMed

    Pan, Yinghao; Cai, Jianwen; Longnecker, Matthew P; Zhou, Haibo

    2018-04-22

    Outcome-dependent sampling (ODS) is a cost-effective way to conduct a study. For a study with a continuous primary outcome, an ODS scheme can be implemented where the expensive exposure is only measured on a simple random sample and on supplemental samples selected from the 2 tails of the primary outcome variable. With the tremendous cost invested in collecting the primary exposure information, investigators often would like to use the available data to study the relationship between a secondary outcome and the obtained exposure variable. This is referred to as secondary analysis. Secondary analysis in ODS designs can be tricky, as the ODS sample is not a random sample from the general population. In this article, we use inverse probability weighted and augmented inverse probability weighted estimating equations to analyze the secondary outcome for data obtained from the ODS design. We do not make any parametric assumptions on the primary and secondary outcomes and only specify the form of the regression mean models, thus allowing an arbitrary error distribution. Our approach is robust to second- and higher-order moment misspecification. It also leads to more precise estimates of the parameters by effectively using all the available participants. Through simulation studies, we show that the proposed estimator is consistent and asymptotically normal. Data from the Collaborative Perinatal Project are analyzed to illustrate our method. Copyright © 2018 John Wiley & Sons, Ltd.

  19. Fatigue strength reduction model: RANDOM3 and RANDOM4 user manual, appendix 2

    NASA Technical Reports Server (NTRS)

    Boyce, Lola; Lovelace, Thomas B.

    1989-01-01

    The FORTRAN programs RANDOM3 and RANDOM4 are documented. They are based on fatigue strength reduction, using a probabilistic constitutive model, and predict the random lifetime of an engine component to reach a given fatigue strength. Included in this user manual are details regarding the theoretical backgrounds of RANDOM3 and RANDOM4. Appendix A gives information on the physical quantities, their symbols, FORTRAN names, and both SI and U.S. Customary units. Appendices B and C include photocopies of the actual computer printout corresponding to the sample problems. Appendices D and E detail the IMSL, Version 10(1), subroutines and functions called by RANDOM3 and RANDOM4 and the SAS/GRAPH(2) programs that can be used to plot both the probability density functions (p.d.f.) and the cumulative distribution functions (c.d.f.).

  20. Space shuttle solid rocket booster recovery system definition, volume 1

    NASA Technical Reports Server (NTRS)

    1973-01-01

    The performance requirements, preliminary designs, and development program plans for an airborne recovery system for the space shuttle solid rocket booster are discussed. The analyses performed during the study phase of the program are presented. The basic considerations which established the system configuration are defined. A Monte Carlo statistical technique using random sampling of the probability distribution for the critical water impact parameters was used to determine the failure probability of each solid rocket booster component as functions of impact velocity and component strength capability.

  1. SAS procedures for designing and analyzing sample surveys

    USGS Publications Warehouse

    Stafford, Joshua D.; Reinecke, Kenneth J.; Kaminski, Richard M.

    2003-01-01

    Complex surveys often are necessary to estimate occurrence (or distribution), density, and abundance of plants and animals for purposes of research and conservation. Most scientists are familiar with simple random sampling, where sample units are selected from a population of interest (sampling frame) with equal probability. However, the goal of ecological surveys often is to make inferences about populations over large or complex spatial areas where organisms are not homogeneously distributed or sampling frames are inconvenient or impossible to construct. Candidate sampling strategies for such complex surveys include stratified, multistage, and adaptive sampling (Thompson 1992, Buckland 1994).
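
As a minimal illustration of one of the candidate strategies, the sketch below performs stratified random sampling with proportional allocation over an invented frame (stratum names, sizes, and values are all made up):

```python
import random

random.seed(1)

# Invented frame: survey units grouped into strata of different sizes
# and different means, where stratification pays off.
frame = {
    "stratum_a": [random.gauss(10, 2) for _ in range(500)],
    "stratum_b": [random.gauss(40, 5) for _ in range(300)],
    "stratum_c": [random.gauss(5, 1) for _ in range(200)],
}
N = sum(len(units) for units in frame.values())
n_total = 100

estimate = 0.0
for units in frame.values():
    n_h = round(n_total * len(units) / N)  # proportional allocation
    picks = random.sample(units, n_h)      # SRS within the stratum
    estimate += (len(units) / N) * (sum(picks) / n_h)
```

The stratum means are combined with weights equal to each stratum's share of the frame, which is what makes the estimator unbiased for the population mean.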

  2. POF-Darts: Geometric adaptive sampling for probability of failure

    DOE PAGES

    Ebeida, Mohamed S.; Mitchell, Scott A.; Swiler, Laura P.; ...

    2016-06-18

    We introduce a novel technique, POF-Darts, to estimate the Probability Of Failure based on random disk-packing in the uncertain parameter space. POF-Darts uses hyperplane sampling to explore the unexplored part of the uncertain space. We use the function evaluation at a sample point to determine whether it belongs to failure or non-failure regions, and surround it with a protection sphere region to avoid clustering. We decompose the domain into Voronoi cells around the function evaluations as seeds and choose the radius of the protection sphere depending on the local Lipschitz continuity. As sampling proceeds, the regions not covered by spheres shrink, improving the estimation accuracy. After exhausting the function evaluation budget, we build a surrogate model using the function evaluations associated with the sample points and estimate the probability of failure by exhaustive sampling of that surrogate. In comparison to other similar methods, our algorithm has the advantages of decoupling the sampling step from the surrogate construction one, the ability to reach target POF values with fewer samples, and the capability of estimating the number and locations of disconnected failure regions, not just the POF value. Furthermore, we present various examples to demonstrate the efficiency of our novel approach.
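
The sketch below is not the POF-Darts algorithm (there is no disk-packing, hyperplane sampling, or Lipschitz-based radius choice); it only illustrates the final step the abstract describes, estimating the probability of failure by exhaustive sampling of a cheap surrogate built from a limited evaluation budget, using an invented limit-state function and a nearest-neighbor surrogate:

```python
import random

random.seed(23)

def g(x, y):
    """Invented limit-state function: negative values mean failure."""
    return x * x + y * y - 1.5

# Spend a limited evaluation budget on sample points in the box [-2, 2]^2
pts = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(300)]
vals = [g(x, y) for x, y in pts]

def surrogate(x, y):
    """Nearest-neighbor stand-in for the expensive function."""
    i = min(range(len(pts)),
            key=lambda k: (pts[k][0] - x) ** 2 + (pts[k][1] - y) ** 2)
    return vals[i]

# Estimate the probability of failure by exhaustive cheap sampling of
# the surrogate instead of the expensive function.
m = 5_000
fails = sum(1 for _ in range(m)
            if surrogate(random.uniform(-2, 2), random.uniform(-2, 2)) < 0)
pof = fails / m
```

Here the true failure region is a disk of area 1.5π inside a 16-unit box, so the surrogate-based estimate should land near 0.29.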

  3. University Students’ Conceptual Knowledge of Randomness and Probability in the Contexts of Evolution and Mathematics

    PubMed Central

    Fiedler, Daniela; Tröbst, Steffen; Harms, Ute

    2017-01-01

    Students of all ages face severe conceptual difficulties regarding key aspects of evolution—the central, unifying, and overarching theme in biology. Aspects strongly related to abstract “threshold” concepts like randomness and probability appear to pose particular difficulties. A further problem is the lack of an appropriate instrument for assessing students’ conceptual knowledge of randomness and probability in the context of evolution. To address this problem, we have developed two instruments, Randomness and Probability Test in the Context of Evolution (RaProEvo) and Randomness and Probability Test in the Context of Mathematics (RaProMath), that include both multiple-choice and free-response items. The instruments were administered to 140 university students in Germany, then the Rasch partial-credit model was applied to assess them. The results indicate that the instruments generate reliable and valid inferences about students’ conceptual knowledge of randomness and probability in the two contexts (which are separable competencies). Furthermore, RaProEvo detected significant differences in knowledge of randomness and probability, as well as evolutionary theory, between biology majors and preservice biology teachers. PMID:28572180

  4. A test of geographic assignment using isotope tracers in feathers of known origin

    USGS Publications Warehouse

    Wunder, Michael B.; Kester, C.L.; Knopf, F.L.; Rye, R.O.

    2005-01-01

    We used feathers of known origin collected from across the breeding range of a migratory shorebird to test the use of isotope tracers for assigning breeding origins. We analyzed δD, δ13C, and δ15N in feathers from 75 mountain plover (Charadrius montanus) chicks sampled in 2001 and from 119 chicks sampled in 2002. We estimated parameters for continuous-response inverse regression models and for discrete-response Bayesian probability models from data for each year independently. We evaluated model predictions with both the training data and by using the alternate year as an independent test dataset. Our results provide weak support for modeling latitude and isotope values as monotonic functions of one another, especially when data are pooled over known sources of variation such as sample year or location. We were unable to make even qualitative statements, such as north versus south, about the likely origin of birds using both δD and δ13C in inverse regression models; results were no better than random assignment. Probability models provided better results and a more natural framework for the problem. Correct assignment rates were highest when considering all three isotopes in the probability framework, but the use of even a single isotope was better than random assignment. The method appears relatively robust to temporal effects and is most sensitive to the isotope discrimination gradients over which samples are taken. We offer that the problem of using isotope tracers to infer geographic origin is best framed as one of assignment, rather than prediction.
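
A minimal sketch of the discrete-response Bayesian assignment idea, with invented region means and standard deviation rather than the study's calibration data:

```python
import math

# Invented calibration values: mean feather deltaD by hypothetical
# breeding region, with a shared within-region standard deviation.
REGION_MEANS = {"north": -120.0, "central": -100.0, "south": -80.0}
SD = 10.0

def assign(delta_d, prior=None):
    """Posterior probability of each origin given one isotope value,
    using Gaussian likelihoods and (by default) a uniform prior."""
    regions = list(REGION_MEANS)
    prior = prior or {r: 1.0 / len(regions) for r in regions}
    like = {r: math.exp(-0.5 * ((delta_d - mu) / SD) ** 2)
            for r, mu in REGION_MEANS.items()}
    z = sum(like[r] * prior[r] for r in regions)
    return {r: like[r] * prior[r] / z for r in regions}

post = assign(-115.0)
```

Adding further isotopes just multiplies in one likelihood term per tracer, which is why correct-assignment rates improve with all three isotopes.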

  5. Bipartite discrimination of independently prepared quantum states as a counterexample to a parallel repetition conjecture

    NASA Astrophysics Data System (ADS)

    Akibue, Seiseki; Kato, Go

    2018-04-01

    For distinguishing quantum states sampled from a fixed ensemble, the gap in bipartite and single-party distinguishability can be interpreted as a nonlocality of the ensemble. In this paper, we consider bipartite state discrimination in a composite system consisting of N subsystems, where each subsystem is shared between two parties and the state of each subsystem is randomly sampled from a particular ensemble comprising the Bell states. We show that the success probability of perfectly identifying the state converges to 1 as N → ∞ if the entropy of the probability distribution associated with the ensemble is less than 1, even if the success probability is less than 1 for any finite N. In other words, the nonlocality of the N-fold ensemble asymptotically disappears if the probability distribution associated with each ensemble is concentrated. Furthermore, we show that the disappearance of the nonlocality can be regarded as a remarkable counterexample of a fundamental open question in theoretical computer science, called a parallel repetition conjecture of interactive games with two classically communicating players. Measurements for the discrimination task include a projective measurement of one party represented by stabilizer states, which enables the other party to perfectly distinguish states that are sampled with high probability.

  6. Point of data saturation was assessed using resampling methods in a survey with open-ended questions.

    PubMed

    Tran, Viet-Thi; Porcher, Raphael; Falissard, Bruno; Ravaud, Philippe

    2016-12-01

    To describe methods to determine sample sizes in surveys using open-ended questions and to assess how resampling methods can be used to determine data saturation in these surveys. We searched the literature for surveys with open-ended questions and assessed the methods used to determine sample size in 100 studies selected at random. Then, we used Monte Carlo simulations on data from a previous study on the burden of treatment to assess the probability of identifying new themes as a function of the number of patients recruited. In the literature, 85% of researchers used a convenience sample, with a median size of 167 participants (interquartile range [IQR] = 69-406). In our simulation study, the probability of identifying at least one new theme for the next included subject was 32%, 24%, and 12% after the inclusion of 30, 50, and 100 subjects, respectively. The inclusion of 150 participants at random resulted in the identification of 92% (IQR = 91-93%) of the themes identified in the original study. In our study, data saturation was most certainly reached for samples of more than 150 participants. Our method may be used to determine when to continue the study to find new themes or stop because of futility. Copyright © 2016 Elsevier Inc. All rights reserved.
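
The resampling idea, estimating the probability that the next respondent contributes an unseen theme, can be sketched as below; the theme pool and its frequencies are invented, and each respondent is simplified to contributing a single theme:

```python
import random

random.seed(7)

# Invented theme pool with skewed frequencies, mimicking open-ended
# survey answers where a few themes dominate and many are rare.
THEMES = list(range(30))
WEIGHTS = [1.0 / (k + 1) for k in THEMES]

def p_new_theme(n, reps=2_000):
    """Monte Carlo probability that respondent n + 1 raises a theme
    not seen among the first n respondents."""
    hits = 0
    for _ in range(reps):
        seen = set(random.choices(THEMES, weights=WEIGHTS, k=n))
        nxt = random.choices(THEMES, weights=WEIGHTS, k=1)[0]
        if nxt not in seen:
            hits += 1
    return hits / reps
```

The probability of a new theme falls as n grows, which is exactly the saturation curve the study estimates from its own data.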

  7. The Self-Adapting Focused Review System. Probability sampling of medical records to monitor utilization and quality of care.

    PubMed

    Ash, A; Schwartz, M; Payne, S M; Restuccia, J D

    1990-11-01

    Medical record review is increasing in importance as the need to identify and monitor utilization and quality of care problems grows. To conserve resources, reviews are usually performed on a subset of cases. If judgment is used to identify subgroups for review, this raises the following questions: How should subgroups be determined, particularly since the locus of problems can change over time? What standard of comparison should be used in interpreting rates of problems found in subgroups? How can population problem rates be estimated from observed subgroup rates? How can the bias be avoided that arises because reviewers know that selected cases are suspected of having problems? How can changes in problem rates over time be interpreted when evaluating intervention programs? Simple random sampling, an alternative to subgroup review, overcomes the problems implied by these questions but is inefficient. The Self-Adapting Focused Review System (SAFRS), introduced and described here, provides an adaptive approach to record selection that is based upon model-weighted probability sampling. It retains the desirable inferential properties of random sampling while allowing reviews to be concentrated on cases currently thought most likely to be problematic. Model development and evaluation are illustrated using hospital data to predict inappropriate admissions.
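
A toy version of model-weighted probability sampling with valid population inference (not the SAFRS model itself; all scores and rates are invented) might look like:

```python
import random

random.seed(3)

# Invented records: a risk model scores each case; true problem status
# is generated so that riskier cases fail more often.
N = 5_000
records = []
for _ in range(N):
    risk = random.random()
    problem = random.random() < 0.8 * risk
    records.append((risk, problem))

# Model-weighted probability sampling: review probability rises with the
# model score but stays bounded away from zero, so every record has a
# known, nonzero chance of review and inference remains valid.
reviewed = [(problem, 0.02 + 0.48 * risk)
            for risk, problem in records
            if random.random() < 0.02 + 0.48 * risk]

# Horvitz-Thompson estimate of the population problem rate from the
# non-uniformly sampled reviews
ht_rate = sum(problem / p for problem, p in reviewed) / N
true_rate = sum(problem for _, problem in records) / N
```

Reviews concentrate on high-risk cases, yet the inverse-probability weights recover an approximately unbiased population problem rate.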

  8. Sampling in health geography: reconciling geographical objectives and probabilistic methods. An example of a health survey in Vientiane (Lao PDR)

    PubMed Central

    Vallée, Julie; Souris, Marc; Fournet, Florence; Bochaton, Audrey; Mobillion, Virginie; Peyronnie, Karine; Salem, Gérard

    2007-01-01

    Background Geographical objectives and probabilistic methods are difficult to reconcile in a unique health survey. Probabilistic methods focus on individuals to provide estimates of a variable's prevalence with a certain precision, while geographical approaches emphasise the selection of specific areas to study interactions between spatial characteristics and health outcomes. A sample selected from a small number of specific areas creates statistical challenges: the observations are not independent at the local level, and this results in poor statistical validity at the global level. Therefore, it is difficult to construct a sample that is appropriate for both geographical and probability methods. Methods We used a two-stage selection procedure with a first non-random stage of selection of clusters. Instead of randomly selecting clusters, we deliberately chose a group of clusters, which as a whole would contain all the variation in health measures in the population. As there was no health information available before the survey, we selected a priori determinants that can influence the spatial homogeneity of the health characteristics. This method yields a distribution of variables in the sample that closely resembles that in the overall population, something that cannot be guaranteed with randomly-selected clusters, especially if the number of selected clusters is small. In this way, we were able to survey specific areas while minimising design effects and maximising statistical precision. Application We applied this strategy in a health survey carried out in Vientiane, Lao People's Democratic Republic. We selected well-known health determinants with unequal spatial distribution within the city: nationality and literacy. We deliberately selected a combination of clusters whose distribution of nationality and literacy is similar to the distribution in the general population. 
Conclusion This paper describes the conceptual reasoning behind the construction of the survey sample and shows that it can be advantageous to choose clusters using reasoned hypotheses, based on both probability and geographical approaches, in contrast to a conventional, random cluster selection strategy. PMID:17543100

  9. Sampling in health geography: reconciling geographical objectives and probabilistic methods. An example of a health survey in Vientiane (Lao PDR).

    PubMed

    Vallée, Julie; Souris, Marc; Fournet, Florence; Bochaton, Audrey; Mobillion, Virginie; Peyronnie, Karine; Salem, Gérard

    2007-06-01

    Geographical objectives and probabilistic methods are difficult to reconcile in a unique health survey. Probabilistic methods focus on individuals to provide estimates of a variable's prevalence with a certain precision, while geographical approaches emphasise the selection of specific areas to study interactions between spatial characteristics and health outcomes. A sample selected from a small number of specific areas creates statistical challenges: the observations are not independent at the local level, and this results in poor statistical validity at the global level. Therefore, it is difficult to construct a sample that is appropriate for both geographical and probability methods. We used a two-stage selection procedure with a first non-random stage of selection of clusters. Instead of randomly selecting clusters, we deliberately chose a group of clusters, which as a whole would contain all the variation in health measures in the population. As there was no health information available before the survey, we selected a priori determinants that can influence the spatial homogeneity of the health characteristics. This method yields a distribution of variables in the sample that closely resembles that in the overall population, something that cannot be guaranteed with randomly-selected clusters, especially if the number of selected clusters is small. In this way, we were able to survey specific areas while minimising design effects and maximising statistical precision. We applied this strategy in a health survey carried out in Vientiane, Lao People's Democratic Republic. We selected well-known health determinants with unequal spatial distribution within the city: nationality and literacy. We deliberately selected a combination of clusters whose distribution of nationality and literacy is similar to the distribution in the general population. 
This paper describes the conceptual reasoning behind the construction of the survey sample and shows that it can be advantageous to choose clusters using reasoned hypotheses, based on both probability and geographical approaches, in contrast to a conventional, random cluster selection strategy.

  10. Posttraumatic stress disorder and associated risk factors in Canadian peacekeeping veterans with health-related disabilities.

    PubMed

    Richardson, J Don; Naifeh, James A; Elhai, Jon D

    2007-08-01

    This study investigates posttraumatic stress disorder (PTSD) and its associated risk factors in a random, national, Canadian sample of United Nations peacekeeping veterans with service-related disabilities. Participants included 1016 male veterans (age < 65 years) who served in the Canadian Forces from 1990 to 1999 and were selected from a larger random sample of 1968 veterans who voluntarily and anonymously completed a general health survey conducted by Veterans Affairs Canada in 1999. Survey instruments included the PTSD Checklist-Military Version (PCL-M), Center for Epidemiological Studies-Depression Scale (CES-D), and questionnaires regarding life events during the past year, current stressors, sociodemographic characteristics, and military history. We found that rates of probable PTSD (PCL-M score > 50) among veterans were 10.92% for veterans deployed once and 14.84% for those deployed more than once. The rates of probable clinical depression (CES-D score > 16) were 30.35% for veterans deployed once and 32.62% for those deployed more than once. In multivariate analyses, probable PTSD rates and PTSD severity were associated with younger age, single marital status, and deployment frequency. PTSD is an important health concern in the veteran population. Understanding such risk factors as younger age and unmarried status can help predict morbidity among trauma-exposed veterans.

  11. ENVIRONMENTAL MONITORING AND ASSESSMENT PROGRAM (EMAP): WESTERN STREAMS AND RIVERS STATISTICAL SUMMARY

    EPA Science Inventory

    This statistical summary reports data from the Environmental Monitoring and Assessment Program (EMAP) Western Pilot (EMAP-W). EMAP-W was a sample survey (or probability survey, often simply called 'random') of streams and rivers in 12 states of the western U.S. (Arizona, Californ...

  12. Calibrating random forests for probability estimation.

    PubMed

    Dankowski, Theresa; Ziegler, Andreas

    2016-09-30

    Probabilities can be consistently estimated using random forests. It is, however, unclear how random forests should be updated to make predictions for other centers or at different time points. In this work, we present two approaches for updating random forests for probability estimation. The first method has been proposed by Elkan and may be used for updating any machine learning approach yielding consistent probabilities, so-called probability machines. The second approach is a new strategy specifically developed for random forests. Using the terminal nodes, which represent conditional probabilities, the random forest is first translated to logistic regression models. These are, in turn, used for re-calibration. The two updating strategies were compared in a simulation study and are illustrated with data from the German Stroke Study Collaboration. In most simulation scenarios, both methods led to similar improvements. In the simulation scenario in which the stricter assumptions of Elkan's method were not met, the logistic regression-based re-calibration approach for random forests outperformed Elkan's method. It also performed better on the stroke data than Elkan's method. The strength of Elkan's method is its general applicability to any probability machine. However, if the strict assumptions underlying this approach are not met, the logistic regression-based approach is preferable for updating random forests for probability estimation. © 2016 The Authors. Statistics in Medicine Published by John Wiley & Sons Ltd.
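
The logistic re-calibration strategy, refitting the outcome on the logit of the raw probability score, can be sketched as follows; the data-generating process and the form of miscalibration are invented, and a hand-rolled gradient descent stands in for a proper logistic regression fit:

```python
import math
import random

random.seed(5)

# Invented setup (not the paper's data): raw scores are miscalibrated so
# that their logit is half the true logit, as can happen when a model
# trained at one center is applied at another.
def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

data = []
for _ in range(2_000):
    x = random.gauss(0.0, 2.0)
    y = 1 if random.random() < sigmoid(x) else 0
    score = sigmoid(0.5 * x)              # miscalibrated probability score
    data.append((math.log(score / (1.0 - score)), y))  # store logit(score)

# Logistic re-calibration: fit outcome ~ a + b * logit(score) by plain
# gradient descent on the log-loss.
a, b = 0.0, 0.0
for _ in range(1_000):
    ga = gb = 0.0
    for z, y in data:
        p = sigmoid(a + b * z)
        ga += p - y
        gb += (p - y) * z
    a -= 0.1 * ga / len(data)
    b -= 0.1 * gb / len(data)

def recalibrated(score):
    """Map a raw score to a re-calibrated probability."""
    return sigmoid(a + b * math.log(score / (1.0 - score)))
```

Because the invented distortion halves the logit, the fit should recover a slope near 2 and an intercept near 0, undoing the miscalibration.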

  13. Modeling Systematic Change in Stopover Duration Does Not Improve Bias in Trends Estimated from Migration Counts.

    PubMed

    Crewe, Tara L; Taylor, Philip D; Lepage, Denis

    2015-01-01

    The use of counts of unmarked migrating animals to monitor long term population trends assumes independence of daily counts and a constant rate of detection. However, migratory stopovers often last days or weeks, violating the assumption of count independence. Further, a systematic change in stopover duration will result in a change in the probability of detecting individuals once, but also in the probability of detecting individuals on more than one sampling occasion. We tested how variation in stopover duration influenced accuracy and precision of population trends by simulating migration count data with known constant rate of population change and by allowing daily probability of survival (an index of stopover duration) to remain constant, or to vary randomly, cyclically, or increase linearly over time by various levels. Using simulated datasets with a systematic increase in stopover duration, we also tested whether any resulting bias in population trend could be reduced by modeling the underlying source of variation in detection, or by subsampling data to every three or five days to reduce the incidence of recounting. Mean bias in population trend did not differ significantly from zero when stopover duration remained constant or varied randomly over time, but bias and the detection of false trends increased significantly with a systematic increase in stopover duration. Importantly, an increase in stopover duration over time resulted in a compounding effect on counts due to the increased probability of detection and of recounting on subsequent sampling occasions. Under this scenario, bias in population trend could not be modeled using a covariate for stopover duration alone. 
Rather, to improve inference drawn about long term population change using counts of unmarked migrants, analyses must include a covariate for stopover duration, as well as incorporate sampling modifications (e.g., subsampling) to reduce the probability that individuals will be detected on more than one occasion.
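
The compounding effect described above is easy to reproduce in a toy simulation (all parameters invented): a constant number of migrants per year with rising stopover duration yields rising counts even though the population is stable:

```python
import random

random.seed(19)

# Invented simulation: the same number of birds passes each year, but
# mean stopover duration rises, so daily counts recount individuals on
# more occasions and season totals inflate.
DETECT = 0.3  # daily detection probability

def season_total(n_birds, mean_stay):
    total = 0
    for _ in range(n_birds):
        stay = max(1, round(random.expovariate(1.0 / mean_stay)))
        total += sum(1 for _ in range(stay) if random.random() < DETECT)
    return total

constant_stay = [season_total(500, 3.0) for _ in range(10)]
rising_stay = [season_total(500, 3.0 + 0.3 * year) for year in range(10)]
```

A trend fitted to the rising-stay counts would report growth where none exists, which is the false-trend hazard the study quantifies.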

  14. Mode switching in volcanic seismicity: El Hierro 2011-2013

    NASA Astrophysics Data System (ADS)

    Roberts, Nick S.; Bell, Andrew F.; Main, Ian G.

    2016-05-01

    The Gutenberg-Richter b value is commonly used in volcanic eruption forecasting to infer material or mechanical properties from earthquake distributions. Such studies typically analyze discrete time windows or phases, but the choice of such windows is subjective and can introduce significant bias. Here we minimize this sample bias by iteratively sampling catalogs with randomly chosen windows and then stack the resulting probability density functions for the estimated b̃ value to determine a net probability density function. We examine data from the El Hierro seismic catalog during a period of unrest in 2011-2013 and demonstrate clear multimodal behavior. Individual modes are relatively stable in time, but the most probable b̃ value intermittently switches between modes, one of which is similar to that of tectonic seismicity. Multimodality is primarily associated with intermittent activation and cessation of activity in different parts of the volcanic system rather than with any systematic underlying process.
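
The windowed b-value stacking idea can be sketched with the standard Aki maximum-likelihood estimator on an invented two-regime catalog (this shows only the flavor of the analysis, not the paper's full iterative PDF-stacking procedure):

```python
import math
import random

random.seed(11)

# Invented two-regime catalog: magnitudes above the completeness
# magnitude Mc follow an exponential law whose rate corresponds to
# b = 1.0 in the first half and b = 1.5 in the second.
MC = 2.0
LN10 = math.log(10.0)
catalog = ([MC + random.expovariate(1.0 * LN10) for _ in range(2_000)] +
           [MC + random.expovariate(1.5 * LN10) for _ in range(2_000)])

def b_value(mags):
    """Aki maximum-likelihood b-value estimate."""
    return math.log10(math.e) / (sum(mags) / len(mags) - MC)

# Randomly positioned windows; stacking their estimates exposes both
# modes instead of committing to one subjective window choice.
starts = [random.randrange(len(catalog) - 400) for _ in range(500)]
estimates = [b_value(catalog[s:s + 400]) for s in starts]
```

Windows that fall entirely inside one regime cluster near b = 1.0 or b = 1.5, so the stacked distribution of estimates is bimodal.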

  15. Assessing relative abundance and reproductive success of shrubsteppe raptors

    USGS Publications Warehouse

    Lehman, Robert N.; Carpenter, L.B.; Steenhof, Karen; Kochert, Michael N.

    1998-01-01

    From 1991-1994, we quantified relative abundance and reproductive success of the Ferruginous Hawk (Buteo regalis), Northern Harrier (Circus cyaneus), Burrowing Owl (Speotyto cunicularia), and Short-eared Owl (Asio flammeus) on the shrubsteppe plateaus (benchlands) in and near the Snake River Birds of Prey National Conservation Area in southwestern Idaho. To assess relative abundance, we searched randomly selected plots using four sampling methods: point counts, line transects, and quadrats of two sizes. On a per-sampling-effort basis, transects were slightly more effective than point counts and quadrats for locating raptor nests (3.4 pairs detected/100 h of effort vs. 2.2-3.1 pairs). Random sampling using quadrats failed to detect a Short-eared Owl population increase from 1993 to 1994. To evaluate nesting success, we tried to determine reproductive outcome for all nesting attempts located during random, historical, and incidental nest searches. We compared nesting success estimates based on all nesting attempts, on attempts found during incubation, and on the Mayfield model. Most pairs used to evaluate success were pairs found incidentally. Visits to historical nesting areas yielded the highest number of pairs per sampling effort (14.6/100 h), but reoccupancy rates for most species decreased through time. Estimates based on all attempts had the highest sample sizes but probably overestimated success for all species except the Ferruginous Hawk. Estimates of success based on nesting attempts found during incubation had the lowest sample sizes. All three methods yielded biased nesting success estimates for the Northern Harrier and Short-eared Owl. The estimate based on pairs found during incubation probably provided the least biased estimate for the Burrowing Owl. Assessments of nesting success were hindered by difficulties in confirming egg laying and nesting success for all species except the Ferruginous Hawk.

  16. Estimating temporary emigration and breeding proportions using capture-recapture data with Pollock's robust design

    USGS Publications Warehouse

    Kendall, W.L.; Nichols, J.D.; Hines, J.E.

    1997-01-01

    Statistical inference for capture-recapture studies of open animal populations typically relies on the assumption that all emigration from the studied population is permanent. However, there are many instances in which this assumption is unlikely to be met. We define two general models for the process of temporary emigration, completely random and Markovian. We then consider effects of these two types of temporary emigration on Jolly-Seber (Seber 1982) estimators and on estimators arising from the full-likelihood approach of Kendall et al. (1995) to robust design data. Capture-recapture data arising from Pollock's (1982) robust design provide the basis for obtaining unbiased estimates of demographic parameters in the presence of temporary emigration and for estimating the probability of temporary emigration. We present a likelihood-based approach to dealing with temporary emigration that permits estimation under different models of temporary emigration and yields tests for completely random and Markovian emigration. In addition, we use the relationship between capture probability estimates based on closed and open models under completely random temporary emigration to derive three ad hoc estimators for the probability of temporary emigration, two of which should be especially useful in situations where capture probabilities are heterogeneous among individual animals. Ad hoc and full-likelihood estimators are illustrated for small mammal capture-recapture data sets. We believe that these models and estimators will be useful for testing hypotheses about the process of temporary emigration, for estimating demographic parameters in the presence of temporary emigration, and for estimating probabilities of temporary emigration. These latter estimates are frequently of ecological interest as indicators of animal movement and, in some sampling situations, as direct estimates of breeding probabilities and proportions.
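
One relationship underlying the ad hoc estimators, that under completely random temporary emigration an unconditional detection estimate equals (1 - gamma) times the closed-model capture probability, can be sketched on simulated data (all parameters invented; the "open-model" quantity is computed from the known N purely for illustration, whereas real studies estimate it):

```python
import random

random.seed(13)

# Invented robust-design primary period: each animal is in the study
# area with probability 1 - GAMMA; if present, it is caught with
# probability P on each of two secondary occasions.
N, P, GAMMA = 1_000, 0.4, 0.3
histories = []
for _ in range(N):
    present = random.random() > GAMMA
    c1 = present and random.random() < P
    c2 = present and random.random() < P
    histories.append((c1, c2))

# Closed-model capture probability, conditional on presence: among the
# animals caught on occasion 1 (hence present), the fraction recaught
# on occasion 2.
n1 = sum(1 for c1, _ in histories if c1)
p_closed = sum(1 for c1, c2 in histories if c1 and c2) / n1

# Unconditional per-occasion detection, which conflates temporary
# absence with non-capture.
p_open = sum(1 for _, c2 in histories if c2) / N

gamma_hat = 1.0 - p_open / p_closed
```

Comparing the two detection scales recovers the temporary-emigration probability, which is the intuition behind the closed/open ad hoc estimators.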

  17. Ignition probability of polymer-bonded explosives accounting for multiple sources of material stochasticity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kim, S.; Barua, A.; Zhou, M., E-mail: min.zhou@me.gatech.edu

    2014-05-07

    Accounting for the combined effect of multiple sources of stochasticity in material attributes, we develop an approach that computationally predicts the probability of ignition of polymer-bonded explosives (PBXs) under impact loading. The probabilistic nature of the specific ignition processes is assumed to arise from two sources of stochasticity. The first source involves random variations in material microstructural morphology; the second source involves random fluctuations in grain-binder interfacial bonding strength. The effect of the first source of stochasticity is analyzed with multiple sets of statistically similar microstructures and constant interfacial bonding strength. Subsequently, each of the microstructures in the multiple sets is assigned multiple instantiations of randomly varying grain-binder interfacial strengths to analyze the effect of the second source of stochasticity. Critical hotspot size-temperature states reaching the threshold for ignition are calculated through finite element simulations that explicitly account for microstructure and bulk and interfacial dissipation to quantify the time to criticality (t{sub c}) of individual samples, allowing the probability distribution of the time to criticality that results from each source of stochastic variation for a material to be analyzed. Two probability superposition models are considered to combine the effects of the multiple sources of stochasticity. The first is a parallel and series combination model, and the second is a nested probability function model. Results show that the nested Weibull distribution provides an accurate description of the combined ignition probability. The approach developed here represents a general framework for analyzing the stochasticity in the material behavior that arises out of multiple types of uncertainty associated with the structure, design, synthesis and processing of materials.

  18. Identifying Chinese Microblog Users With High Suicide Probability Using Internet-Based Profile and Linguistic Features: Classification Model

    PubMed Central

    Guan, Li; Hao, Bibo; Cheng, Qijin; Yip, Paul SF

    2015-01-01

    Background Traditional offline assessment of suicide probability is time consuming, and it is difficult to convince at-risk individuals to participate. Identifying individuals with high suicide probability through online social media has an advantage in its efficiency and potential to reach out to hidden individuals, yet little research has focused on this specific field. Objective The objective of this study was to apply two classification models, Simple Logistic Regression (SLR) and Random Forest (RF), to examine the feasibility and effectiveness of identifying microblog users with high suicide possibility in China through profile and linguistic features extracted from Internet-based data. Methods Nine hundred and nine Chinese microblog users completed an Internet survey; those scoring one SD above the mean of the total Suicide Probability Scale (SPS) score, as well as one SD above the mean in each of the four subscale scores in the participant sample, were labeled as high-risk individuals, respectively. Profile and linguistic features were fed into two machine learning algorithms (SLR and RF) to train models that aim to identify high-risk individuals in general suicide probability and in its four dimensions. Models were trained and then tested by 5-fold cross validation, in which both training set and test set were generated under the stratified random sampling rule from the whole sample. Three classic performance metrics (Precision, Recall, F1 measure) and a specifically defined metric, “Screening Efficiency”, were adopted to evaluate model effectiveness. Results Classification performance was generally matched between SLR and RF. Given the best performance of the classification models, we were able to retrieve over 70% of the labeled high-risk individuals in overall suicide probability as well as in the four dimensions. Screening Efficiency of most models varied from 1/4 to 1/2. 
Precision of the models was generally below 30%. Conclusions Individuals in China with high suicide probability are recognizable by profile and text-based information from microblogs. Although there is still much space to improve the performance of classification models in the future, this study may shed light on preliminary screening of risky individuals via machine learning algorithms, which can work side-by-side with expert scrutiny to increase efficiency in large-scale-surveillance of suicide probability from online social media. PMID:26543921
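
The stratified random sampling rule used to build the 5-fold cross-validation splits can be sketched in plain Python (the label counts are invented, and a real pipeline would typically rely on a library routine for this):

```python
import random

random.seed(17)

# Invented labels: roughly 15% "high-risk" users, as when thresholding a
# scale score at one SD above the mean.
labels = [1] * 150 + [0] * 850
random.shuffle(labels)

def stratified_folds(labels, k=5):
    """Assign sample indices to k folds so that every fold preserves
    the class ratio of the full sample."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        random.shuffle(members)         # randomize within each class
        for j, i in enumerate(members):
            folds[j % k].append(i)      # deal class members round-robin
    return folds

folds = stratified_folds(labels)
```

Each fold then serves once as the test set while the rest form the training set, with the minority class represented equally in every split.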

  19. Identifying Chinese Microblog Users With High Suicide Probability Using Internet-Based Profile and Linguistic Features: Classification Model.

    PubMed

    Guan, Li; Hao, Bibo; Cheng, Qijin; Yip, Paul Sf; Zhu, Tingshao

    2015-01-01

    Traditional offline assessment of suicide probability is time-consuming, and it is difficult to convince at-risk individuals to participate. Identifying individuals with high suicide probability through online social media has an advantage in its efficiency and its potential to reach hidden individuals, yet little research has focused on this specific field. The objective of this study was to apply two classification models, Simple Logistic Regression (SLR) and Random Forest (RF), to examine the feasibility and effectiveness of identifying microblog users in China with high suicide probability through profile and linguistic features extracted from Internet-based data. A total of 909 Chinese microblog users completed an Internet survey; those scoring one SD above the sample mean of the total Suicide Probability Scale (SPS) score, or one SD above the mean of each of the four subscale scores, were labeled as high-risk individuals for the corresponding outcome. Profile and linguistic features were fed into the two machine learning algorithms (SLR and RF) to train models that identify high-risk individuals in general suicide probability and in each of its four dimensions. Models were trained and tested by 5-fold cross-validation, in which both training and test sets were generated from the whole sample by stratified random sampling. Three classic performance metrics (precision, recall, F1 measure) and a purpose-defined metric, "Screening Efficiency," were adopted to evaluate model effectiveness. Classification performance was generally comparable between SLR and RF. Given the best-performing classification models, we were able to retrieve over 70% of the labeled high-risk individuals in overall suicide probability as well as in the four dimensions. Screening Efficiency of most models varied from 1/4 to 1/2. Precision of the models was generally below 30%. 
Individuals in China with high suicide probability are recognizable from profile and text-based information on microblogs. Although there is still much room to improve the performance of the classification models, this study may shed light on preliminary screening of at-risk individuals via machine learning algorithms, which can work side-by-side with expert scrutiny to increase the efficiency of large-scale surveillance of suicide probability from online social media.
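    The labeling rule and evaluation metrics described above (one SD above the mean marks high risk; precision, recall, and F1 score the classifier) can be sketched in a few lines. The synthetic score, the toy thresholded "classifier," and the definition of screening efficiency as the flagged fraction of the sample are illustrative assumptions, not the paper's actual features, models, or metric definition.

```python
# Sketch of the labeling rule and metrics; data and "model" are synthetic.
import random

random.seed(0)
n = 909
score = [random.gauss(0, 1) for _ in range(n)]            # synthetic SPS-like score
mean = sum(score) / n
sd = (sum((s - mean) ** 2 for s in score) / n) ** 0.5
y_true = [1 if s > mean + sd else 0 for s in score]       # 1 SD above mean -> high risk

# Toy "model": a noisy copy of the score thresholded at the same cutoff.
noisy = [s + random.gauss(0, 0.5) for s in score]
y_pred = [1 if s > mean + sd else 0 for s in noisy]

tp = sum(1 for p, t in zip(y_pred, y_true) if p and t)
fp = sum(1 for p, t in zip(y_pred, y_true) if p and not t)
fn = sum(1 for p, t in zip(y_pred, y_true) if not p and t)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)
screening_efficiency = sum(y_pred) / n   # assumed: fraction of users flagged

print(precision, recall, f1, screening_efficiency)
```

    In the paper's setting the same computation is repeated per fold of the stratified 5-fold cross-validation and the metrics averaged.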

  20. Sampling designs matching species biology produce accurate and affordable abundance indices

    PubMed Central

    Farley, Sean; Russell, Gareth J.; Butler, Matthew J.; Selinger, Jeff

    2013-01-01

    Wildlife biologists often use grid-based designs to sample animals and generate abundance estimates. Although sampling in grids is theoretically sound, in application the method can be logistically difficult and expensive when sampling elusive species inhabiting extensive areas. These factors make it challenging to sample animals and meet the statistical assumption that all individuals have an equal probability of capture. Violating this assumption biases results. Does an alternative exist? Sampling only where resources attract animals (i.e., targeted sampling) might provide accurate abundance estimates more efficiently and affordably. However, this approach would also be biased if individuals have an unequal probability of capture, especially if some never visit the sampling area. Since most biological programs are resource-limited, and abundance data drive many conservation and management applications, it is imperative to identify economical and informative sampling designs. We therefore evaluated abundance estimates generated from grid and targeted sampling designs using simulations based on global positioning system (GPS) data from 42 Alaskan brown bears (Ursus arctos). Migratory salmon drew brown bears from the wider landscape, concentrating them at anadromous streams, which provided a scenario for testing the targeted approach. Grid and targeted sampling varied by trap number, trap location (placed randomly, systematically, or by expert opinion), and whether traps were stationary or moved between capture sessions. We began by identifying when to sample and whether bears had equal probability of capture. We compared abundance estimates against seven criteria: bias, precision, accuracy, effort, encounter rates, and capture and recapture probabilities. One grid (49 km2 cells) and one targeted configuration provided the most accurate results. 
Both placed traps by expert opinion and moved traps between capture sessions, which raised capture probabilities. The grid design was least biased (−10.5%) but imprecise (CV 21.2%) and used the most effort (16,100 trap-nights). The targeted configuration was more biased (−17.3%) but most precise (CV 12.3%), with the least effort (7,000 trap-nights). Targeted sampling generated encounter rates four times higher, and capture and recapture probabilities 11% and 60% higher, than grid sampling, in a sampling frame 88% smaller. Bears had unequal probability of capture with both sampling designs, partly because some bears never had traps available to sample them. Hence, grid and targeted sampling generated abundance indices, not estimates. Overall, targeted sampling provided the most accurate and affordable design to index abundance. Targeted sampling may offer an alternative method to index the abundance of other species inhabiting expansive and inaccessible landscapes elsewhere, provided they are attracted to resource concentrations. PMID:24392290
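    The core point above, that unequal capture probability biases abundance estimation, can be illustrated with the simplest mark-recapture estimator. The two-session Lincoln-Petersen estimator below is a stand-in for the paper's more elaborate models, and the population size and capture rates are invented.

```python
# Toy demonstration: unequal capture probability biases mark-recapture estimates.
import random

random.seed(8)
N = 500                                    # true abundance (synthetic)

def lincoln_petersen(p_capture):
    # Two capture sessions; animal i is caught with probability p_capture(i)
    marked = {i for i in range(N) if random.random() < p_capture(i)}
    second = {i for i in range(N) if random.random() < p_capture(i)}
    recaptured = marked & second
    return len(marked) * len(second) / max(1, len(recaptured))

# Equal capture probability: the estimator is roughly unbiased
equal = lincoln_petersen(lambda i: 0.3)
# Unequal capture: half the animals never visit the sampled area (p = 0),
# so the "estimate" indexes only the exposed subpopulation
unequal = lincoln_petersen(lambda i: 0.3 if i < N // 2 else 0.0)

print(round(equal), round(unequal))
```

    When some individuals are never exposed to traps, the estimator converges on the exposed subpopulation rather than total abundance, which is why the paper calls the resulting quantities indices, not estimates.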

  1. Random sampling of elementary flux modes in large-scale metabolic networks.

    PubMed

    Machado, Daniel; Soons, Zita; Patil, Kiran Raosaheb; Ferreira, Eugénio C; Rocha, Isabel

    2012-09-15

    The description of a metabolic network in terms of elementary (flux) modes (EMs) provides an important framework for metabolic pathway analysis. However, their application to large networks has been hampered by the combinatorial explosion in the number of modes. In this work, we develop a method for generating random samples of EMs without computing the whole set. Our algorithm is an adaptation of the canonical basis approach, where we add an additional filtering step which, at each iteration, selects a random subset of the new combinations of modes. In order to obtain an unbiased sample, all candidates are assigned the same probability of getting selected. This approach avoids the exponential growth of the number of modes during computation, thus generating a random sample of the complete set of EMs within reasonable time. We generated samples of different sizes for a metabolic network of Escherichia coli, and observed that they preserve several properties of the full EM set. It is also shown that EM sampling can be used for rational strain design. A well distributed sample, that is representative of the complete set of EMs, should be suitable to most EM-based methods for analysis and optimization of metabolic networks. Source code for a cross-platform implementation in Python is freely available at http://code.google.com/p/emsampler. Contact: dmachado@deb.uminho.pt. Supplementary data are available at Bioinformatics online.
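    The filtering step described above can be sketched generically. The "modes" and the combination rule below are placeholders (the real algorithm combines elementary modes via the canonical basis approach), but the key idea carries over: at each iteration, keep only a fixed-size uniform random subset of the newly generated candidates, so the working set cannot grow exponentially.

```python
# Sketch of iteration-wise uniform subsampling; "combine" is a placeholder.
import random

random.seed(1)
SAMPLE_SIZE = 50

def combine(a, b):
    return a | b                    # stand-in for combining two modes

modes = [{i} for i in range(10)]    # initial "modes" (placeholder objects)
for _ in range(5):                  # iterations of the enumeration
    candidates = [combine(a, b) for a in modes for b in modes if a is not b]
    if len(candidates) > SAMPLE_SIZE:
        # every candidate has the same probability of being selected,
        # which keeps the retained sample unbiased
        candidates = random.sample(candidates, SAMPLE_SIZE)
    modes = candidates

print(len(modes))                   # bounded by SAMPLE_SIZE at every iteration
```

    Without the `random.sample` filter, the candidate list here would grow combinatorially from one iteration to the next; with it, memory use stays constant.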

  2. Influence of the random walk finite step on the first-passage probability

    NASA Astrophysics Data System (ADS)

    Klimenkova, Olga; Menshutin, Anton; Shchur, Lev

    2018-01-01

    The well-known connection between the first-passage probability of a random walk and the distribution of electrical potential described by the Laplace equation is studied. We simulate a random walk in the plane numerically as a discrete-time process with fixed step length and measure the first-passage probability to touch an absorbing sphere of radius R in 2D. We find a systematic deviation of the first-passage probability from the exact function, which we attribute to the finite step length of the random walk.
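    A toy version of such a simulation is easy to write: a fixed-step walk started off-centre inside an absorbing circle, recording where it first touches the boundary. The radius, step length, and starting point below are illustrative choices, not the paper's parameters.

```python
# Fixed-step 2D random walk absorbed on a circle of radius R; record exit angles.
import math
import random

random.seed(2)
R, step = 10.0, 0.5
exits = []
for _ in range(2000):
    x, y = 5.0, 0.0                      # start halfway to the boundary
    while x * x + y * y < R * R:
        phi = random.uniform(0, 2 * math.pi)
        x += step * math.cos(phi)
        y += step * math.sin(phi)
    exits.append(math.atan2(y, x))       # first-passage angle

# Fraction of first passages on the near half of the circle (|angle| < pi/2);
# the continuum Laplace solution predicts this exceeds 1/2 for an off-centre start.
near = sum(abs(a) < math.pi / 2 for a in exits) / len(exits)
print("P(exit on near half) ~", round(near, 2))
```

    Comparing the empirical exit-angle histogram against the exact harmonic measure as the step length is varied exposes the finite-step deviation the paper reports.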

  3. Sampled-Data Consensus of Linear Multi-agent Systems With Packet Losses.

    PubMed

    Zhang, Wenbing; Tang, Yang; Huang, Tingwen; Kurths, Jurgen

    In this paper, the consensus problem is studied for a class of multi-agent systems with sampled data and packet losses, where random and deterministic packet losses are considered, respectively. For random packet losses, a Bernoulli-distributed white sequence is used to describe packet dropouts among agents in a stochastic way. For deterministic packet losses, a switched system with stable and unstable subsystems is employed to model packet dropouts in a deterministic way. The purpose of this paper is to derive consensus criteria, such that linear multi-agent systems with sampled-data and packet losses can reach consensus. By means of the Lyapunov function approach and the decomposition method, the design problem of a distributed controller is solved in terms of convex optimization. The interplay among the allowable bound of the sampling interval, the probability of random packet losses, and the rate of deterministic packet losses is explicitly derived to characterize consensus conditions. The obtained criteria are closely related to the maximum eigenvalue of the Laplacian matrix versus the second minimum eigenvalue of the Laplacian matrix, which reveals the intrinsic effect of communication topologies on consensus performance. Finally, simulations are given to show the effectiveness of the proposed results.

  4. Fixation probability in a two-locus intersexual selection model.

    PubMed

    Durand, Guillermo; Lessard, Sabin

    2016-06-01

    We study a two-locus model of intersexual selection in a finite haploid population reproducing according to a discrete-time Moran model with a trait locus expressed in males and a preference locus expressed in females. We show that the probability of ultimate fixation of a single mutant allele for a male ornament introduced at random at the trait locus given any initial frequency state at the preference locus is increased by weak intersexual selection and recombination, weak or strong. Moreover, this probability exceeds the initial frequency of the mutant allele even in the case of a costly male ornament if intersexual selection is not too weak. On the other hand, the probability of ultimate fixation of a single mutant allele for a female preference towards a male ornament introduced at random at the preference locus is increased by weak intersexual selection and weak recombination if the female preference is not costly, and is strong enough in the case of a costly male ornament. The analysis relies on an extension of the ancestral recombination-selection graph for samples of haplotypes to take into account events of intersexual selection, while the symbolic calculation of the fixation probabilities is made possible in a reasonable time by an optimizing algorithm. Copyright © 2016 Elsevier Inc. All rights reserved.

  5. Effect of alignment of easy axes on dynamic magnetization of immobilized magnetic nanoparticles

    NASA Astrophysics Data System (ADS)

    Yoshida, Takashi; Matsugi, Yuki; Tsujimura, Naotaka; Sasayama, Teruyoshi; Enpuku, Keiji; Viereck, Thilo; Schilling, Meinhard; Ludwig, Frank

    2017-04-01

    In some biomedical applications of magnetic nanoparticles (MNPs), the particles are physically immobilized. In this study, we explore the effect of the alignment of the magnetic easy axes on the dynamic magnetization of immobilized MNPs under an AC excitation field. We prepared three immobilized MNP samples: (1) a sample in which easy axes are randomly oriented, (2) a parallel-aligned sample in which easy axes are parallel to the AC field, and (3) an orthogonally aligned sample in which easy axes are perpendicular to the AC field. First, we show that the parallel-aligned sample has the largest hysteresis in the magnetization curve and the largest harmonic magnetization spectra, followed by the randomly oriented and orthogonally aligned samples. For example, a 1.6-fold increase was observed in the area of the hysteresis loop of the parallel-aligned sample compared to that of the randomly oriented sample. To quantitatively discuss the experimental results, we perform a numerical simulation based on a Fokker-Planck equation, in which probability distributions for the directions of the easy axes are taken into account in simulating the prepared MNP samples. We obtained quantitative agreement between experiment and simulation. These results indicate that the dynamic magnetization of immobilized MNPs is significantly affected by the alignment of the easy axes.

  6. Quantum-inspired algorithm for estimating the permanent of positive semidefinite matrices

    NASA Astrophysics Data System (ADS)

    Chakhmakhchyan, L.; Cerf, N. J.; Garcia-Patron, R.

    2017-08-01

    We construct a quantum-inspired classical algorithm for computing the permanent of Hermitian positive semidefinite matrices by exploiting a connection between these mathematical structures and the boson sampling model. Specifically, the permanent of a Hermitian positive semidefinite matrix can be expressed in terms of the expected value of a random variable, which stands for a specific photon-counting probability when measuring a linear-optically evolved random multimode coherent state. Our algorithm then approximates the matrix permanent from the corresponding sample mean and is shown to run in polynomial time for various sets of Hermitian positive semidefinite matrices, achieving a precision that improves over known techniques. This work illustrates how quantum optics may benefit algorithm development.

  7. Accounting for treatment by center interaction in sample size determinations and the use of surrogate outcomes in the pessary for the prevention of preterm birth trial: a simulation study.

    PubMed

    Willan, Andrew R

    2016-07-05

    The Pessary for the Prevention of Preterm Birth Study (PS3) is an international, multicenter, randomized clinical trial designed to examine the effectiveness of the Arabin pessary in preventing preterm birth in pregnant women with a short cervix. During the design of the study, two methodological issues regarding power and sample size were raised. Since treatment in the Standard Arm will vary between centers, it is anticipated that so too will the probability of preterm birth in that arm. This will likely result in a treatment by center interaction, and the issue of how this will affect the sample size requirements was raised. The sample size requirements to examine the effect of the pessary on the baby's clinical outcome were prohibitively high, so the second issue was how best to examine the effect on clinical outcome. The approaches taken to address these issues are presented. Simulation and sensitivity analysis were used to address the sample size issue. The probability of preterm birth in the Standard Arm was assumed to vary between centers following a Beta distribution with a mean of 0.3 and a coefficient of variation of 0.3. To address the second issue a Bayesian decision model is proposed that combines the information regarding the between-treatment difference in the probability of preterm birth from PS3 with the data from the Multiple Courses of Antenatal Corticosteroids for Preterm Birth Study that relate preterm birth and perinatal mortality/morbidity. The approach provides a between-treatment comparison with respect to the probability of a bad clinical outcome. The performance of the approach was assessed using simulation and sensitivity analysis. Accounting for a possible treatment by center interaction increased the sample size from 540 to 700 patients per arm for the base case. The sample size requirements increase with the coefficient of variation and decrease with the number of centers. 
Under the same assumptions used for determining the sample size requirements, the simulated mean probability that pessary reduces the risk of perinatal mortality/morbidity is 0.98. The simulated mean decreased with coefficient of variation and increased with the number of clinical sites. Employing simulation and sensitivity analysis is a useful approach for determining sample size requirements while accounting for the additional uncertainty due to a treatment by center interaction. Using a surrogate outcome in conjunction with a Bayesian decision model is an efficient way to compare important clinical outcomes in a randomized clinical trial in situations where the direct approach requires a prohibitively high sample size.
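    The center-to-center variability assumed above (a Beta-distributed control-arm risk with mean 0.3 and coefficient of variation 0.3) is straightforward to simulate: solve the Beta moment equations for the shape parameters, then draw a center-specific risk per center. The number of centers, center size, and treatment effect below are illustrative, not the trial's design values.

```python
# Simulate center-specific preterm-birth risk with Beta(mean=0.3, CV=0.3).
import random

random.seed(3)
mean, cv = 0.3, 0.3
var = (cv * mean) ** 2
# Solve the Beta(a, b) moment equations for the given mean and variance
common = mean * (1 - mean) / var - 1
a, b = mean * common, (1 - mean) * common

n_centers, n_per_center = 20, 35           # illustrative trial layout
risk_ratio = 0.7                           # assumed pessary effect

events_ctrl = events_trt = 0
for _ in range(n_centers):
    p = random.betavariate(a, b)           # center-specific control-arm risk
    events_ctrl += sum(random.random() < p for _ in range(n_per_center))
    events_trt += sum(random.random() < p * risk_ratio for _ in range(n_per_center))

print(events_ctrl, events_trt)             # pooled preterm births per arm
```

    Repeating such a simulation many times at candidate sample sizes, and checking how often the between-arm test rejects, is the essence of the simulation-based power analysis described above.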

  8. Translational Genomics Research Institute: Identification of Pathways Enriched with Condition-Specific Statistical Dependencies Across Four Subtypes of Glioblastoma Multiforme | Office of Cancer Genomics

    Cancer.gov

    Evaluation of Differential DependencY (EDDY) is a statistical test for the differential dependency relationship of a set of genes between two given conditions. For each condition, possible dependency network structures are enumerated and their likelihoods are computed to represent a probability distribution of dependency networks. The difference between the probability distributions of dependency networks is computed between conditions, and its statistical significance is evaluated with random permutations of condition labels on the samples.  

  9. Translational Genomics Research Institute (TGen): Identification of Pathways Enriched with Condition-Specific Statistical Dependencies Across Four Subtypes of Glioblastoma Multiforme | Office of Cancer Genomics

    Cancer.gov

    Evaluation of Differential DependencY (EDDY) is a statistical test for the differential dependency relationship of a set of genes between two given conditions. For each condition, possible dependency network structures are enumerated and their likelihoods are computed to represent a probability distribution of dependency networks. The difference between the probability distributions of dependency networks is computed between conditions, and its statistical significance is evaluated with random permutations of condition labels on the samples.  
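    The label-permutation step used by EDDY is a generic significance test and can be sketched as follows. The mean-difference statistic below merely stands in for EDDY's divergence between dependency-network probability distributions, and the synthetic data are illustrative.

```python
# Generic permutation test: compare an observed statistic against its
# distribution under random permutations of condition labels.
import random

random.seed(4)
cond_a = [random.gauss(1.0, 1.0) for _ in range(30)]   # samples, condition A
cond_b = [random.gauss(0.0, 1.0) for _ in range(30)]   # samples, condition B

def statistic(a, b):
    # Stand-in for EDDY's divergence between network distributions
    return abs(sum(a) / len(a) - sum(b) / len(b))

observed = statistic(cond_a, cond_b)
pooled = cond_a + cond_b
count = 0
n_perm = 1000
for _ in range(n_perm):
    random.shuffle(pooled)                 # random permutation of condition labels
    count += statistic(pooled[:30], pooled[30:]) >= observed

p_value = (count + 1) / (n_perm + 1)       # permutation p-value with +1 correction
print("observed =", round(observed, 2), " p =", round(p_value, 3))
```

    Swapping in the actual divergence between dependency-network distributions, computed per gene set, recovers the structure of the EDDY test.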

  10. SELWAY-BITTERROOT WILDERNESS, IDAHO AND MONTANA.

    USGS Publications Warehouse

    Toth, Margo I.; Zilka, Nicholas T.

    1984-01-01

    Mineral-resource studies of the Selway-Bitterroot Wilderness in Idaho County, Idaho, and Missoula and Ravalli Counties, Montana, were carried out. Four areas with probable and one small area of substantiated mineral-resource potential were recognized. The areas of the Running Creek, Painted Rocks, and Whistling Pig plutons of Tertiary age have probable resource potential for molybdenum, although detailed geochemical sampling and surface investigations failed to recognize mineralized systems at the surface. Randomly distributed breccia zones along a fault in the vicinity of the Cliff mine have a substantiated potential for small silver-copper-lead resources.

  11. Using a Calendar and Explanatory Instructions to Aid Within-Household Selection in Mail Surveys

    ERIC Educational Resources Information Center

    Stange, Mathew; Smyth, Jolene D.; Olson, Kristen

    2016-01-01

    Although researchers can easily select probability samples of addresses using the U.S. Postal Service's Delivery Sequence File, randomly selecting respondents within households for surveys remains challenging. Researchers often place within-household selection instructions, such as the next or last birthday methods, in survey cover letters to…

  12. Think They're Drunk? Alcohol Servers and the Identification of Intoxication.

    ERIC Educational Resources Information Center

    Burns, Edward D.; Nusbaumer, Michael R.; Reiling, Denise M.

    2003-01-01

    Examines practices used by servers to assess intoxication. The analysis was based upon questionnaires mailed to a random probability sample of licensed servers from one state (N = 822). Indicators found to be most important were examined in relation to a variety of occupational characteristics. Implications for training curricula, policy…

  13. Relationships among Taiwanese Children's Computer Game Use, Academic Achievement and Parental Governing Approach

    ERIC Educational Resources Information Center

    Yeh, Duen-Yian; Cheng, Ching-Hsue

    2016-01-01

    This study examined the relationships among children's computer game use, academic achievement and parental governing approach to propose probable answers for the doubts of Taiwanese parents. 355 children (ages 11-14) were randomly sampled from 20 elementary schools in a typically urbanised county in Taiwan. Questionnaire survey (five questions)…

  14. Nonparametric probability density estimation by optimization theoretic techniques

    NASA Technical Reports Server (NTRS)

    Scott, D. W.

    1976-01-01

    Two nonparametric probability density estimators are considered. The first is the kernel estimator. The problem of choosing the kernel scaling factor based solely on a random sample is addressed. An interactive mode is discussed and an algorithm proposed to choose the scaling factor automatically. The second nonparametric probability density estimator uses penalty-function techniques with the maximum likelihood criterion. A discrete maximum penalized likelihood estimator is proposed and is shown to be consistent in the mean square error. A numerical implementation technique for the discrete solution is discussed and examples displayed. An extensive simulation study compares the integrated mean square error of the discrete and kernel estimators. The robustness of the discrete estimator is demonstrated graphically.
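    The first estimator above, a kernel density estimate whose scaling factor must be chosen from the sample alone, can be sketched briefly. Silverman's rule-of-thumb bandwidth is used here as one common automatic choice; it is not the paper's interactive/automatic algorithm, and the Gaussian kernel and synthetic data are assumptions.

```python
# Gaussian kernel density estimate with a rule-of-thumb scaling factor.
import math
import random

random.seed(5)
data = [random.gauss(0, 1) for _ in range(200)]        # synthetic sample

n = len(data)
m = sum(data) / n
sd = (sum((x - m) ** 2 for x in data) / (n - 1)) ** 0.5
h = 1.06 * sd * n ** (-1 / 5)                          # Silverman's bandwidth

def kde(x):
    # Average of Gaussian kernels centred at the sample points
    return sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data) / (
        n * h * math.sqrt(2 * math.pi))

print("h =", round(h, 3), " f(0) ~", round(kde(0.0), 3))
```

    For a standard normal sample, kde(0.0) should land near the true density value 1/sqrt(2*pi) ≈ 0.399; how far it lands depends directly on the choice of the scaling factor h, which is exactly the problem the paper addresses.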

  15. On Convergent Probability of a Random Walk

    ERIC Educational Resources Information Center

    Lee, Y.-F.; Ching, W.-K.

    2006-01-01

    This note introduces an interesting random walk on a straight path with cards of random numbers. The method of recurrent relations is used to obtain the convergent probability of the random walk with different initial positions.
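    The recurrence-relation method mentioned above has a classic concrete instance: for a symmetric walk on the integer path 0..N with absorbing ends, the probability u[i] of eventually being absorbed at 0 from position i satisfies u[i] = (u[i-1] + u[i+1]) / 2, which can be solved by simple relaxation and matches the closed form 1 - i/N. The card-drawing variant in the note leads to analogous recurrences.

```python
# Absorption probabilities for a symmetric walk on 0..N via the recurrence
# u[i] = (u[i-1] + u[i+1]) / 2 with u[0] = 1, u[N] = 0, solved by relaxation.
N = 10
u = [0.0] * (N + 1)
u[0] = 1.0                      # already absorbed at the left end
for _ in range(10000):          # relax until the recurrence converges
    for i in range(1, N):
        u[i] = 0.5 * (u[i - 1] + u[i + 1])

print([round(x, 3) for x in u])  # matches the closed form u[i] = 1 - i/N
```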

  16. Efficiency of analytical and sampling-based uncertainty propagation in intensity-modulated proton therapy

    NASA Astrophysics Data System (ADS)

    Wahl, N.; Hennig, P.; Wieser, H. P.; Bangert, M.

    2017-07-01

    The sensitivity of intensity-modulated proton therapy (IMPT) treatment plans to uncertainties can be quantified and mitigated with robust/min-max and stochastic/probabilistic treatment analysis and optimization techniques. Those methods usually rely on sparse random, importance, or worst-case sampling. Inevitably, this imposes a trade-off between computational speed and accuracy of the uncertainty propagation. Here, we investigate analytical probabilistic modeling (APM) as an alternative for uncertainty propagation and minimization in IMPT that does not rely on scenario sampling. APM propagates probability distributions over range and setup uncertainties via a Gaussian pencil-beam approximation into moments of the probability distributions over the resulting dose in closed form. It supports arbitrary correlation models and allows for efficient incorporation of fractionation effects regarding random and systematic errors. We evaluate the trade-off between run-time and accuracy of APM uncertainty computations on three patient datasets. Results are compared against reference computations facilitating importance and random sampling. Two approximation techniques to accelerate uncertainty propagation and minimization based on probabilistic treatment plan optimization are presented. Runtimes are measured on CPU and GPU platforms, dosimetric accuracy is quantified in comparison to a sampling-based benchmark (5000 random samples). APM accurately propagates range and setup uncertainties into dose uncertainties at competitive run-times (GPU ≤ 5 min). The resulting standard deviation (expectation value) of dose shows average global γ (3%/3 mm) pass rates between 94.2% and 99.9% (98.4% and 100.0%). All investigated importance sampling strategies provided less accuracy at higher run-times considering only a single fraction. 
Considering fractionation, APM uncertainty propagation and treatment plan optimization was proven to be possible at constant time complexity, while run-times of sampling-based computations are linear in the number of fractions. Using sum sampling within APM, uncertainty propagation can only be accelerated at the cost of reduced accuracy in variance calculations. For probabilistic plan optimization, we were able to approximate the necessary pre-computations within seconds, yielding treatment plans of similar quality as gained from exact uncertainty propagation. APM is suited to enhance the trade-off between speed and accuracy in uncertainty propagation and probabilistic treatment plan optimization, especially in the context of fractionation. This brings fully-fledged APM computations within reach of clinical application.

  17. Efficiency of analytical and sampling-based uncertainty propagation in intensity-modulated proton therapy.

    PubMed

    Wahl, N; Hennig, P; Wieser, H P; Bangert, M

    2017-06-26

    The sensitivity of intensity-modulated proton therapy (IMPT) treatment plans to uncertainties can be quantified and mitigated with robust/min-max and stochastic/probabilistic treatment analysis and optimization techniques. Those methods usually rely on sparse random, importance, or worst-case sampling. Inevitably, this imposes a trade-off between computational speed and accuracy of the uncertainty propagation. Here, we investigate analytical probabilistic modeling (APM) as an alternative for uncertainty propagation and minimization in IMPT that does not rely on scenario sampling. APM propagates probability distributions over range and setup uncertainties via a Gaussian pencil-beam approximation into moments of the probability distributions over the resulting dose in closed form. It supports arbitrary correlation models and allows for efficient incorporation of fractionation effects regarding random and systematic errors. We evaluate the trade-off between run-time and accuracy of APM uncertainty computations on three patient datasets. Results are compared against reference computations facilitating importance and random sampling. Two approximation techniques to accelerate uncertainty propagation and minimization based on probabilistic treatment plan optimization are presented. Runtimes are measured on CPU and GPU platforms, dosimetric accuracy is quantified in comparison to a sampling-based benchmark (5000 random samples). APM accurately propagates range and setup uncertainties into dose uncertainties at competitive run-times (GPU ≤ 5 min). The resulting standard deviation (expectation value) of dose shows average global γ (3%/3 mm) pass rates between 94.2% and 99.9% (98.4% and 100.0%). All investigated importance sampling strategies provided less accuracy at higher run-times considering only a single fraction. 
Considering fractionation, APM uncertainty propagation and treatment plan optimization was proven to be possible at constant time complexity, while run-times of sampling-based computations are linear in the number of fractions. Using sum sampling within APM, uncertainty propagation can only be accelerated at the cost of reduced accuracy in variance calculations. For probabilistic plan optimization, we were able to approximate the necessary pre-computations within seconds, yielding treatment plans of similar quality as gained from exact uncertainty propagation. APM is suited to enhance the trade-off between speed and accuracy in uncertainty propagation and probabilistic treatment plan optimization, especially in the context of fractionation. This brings fully-fledged APM computations within reach of clinical application.

  18. Underestimating extreme events in power-law behavior due to machine-dependent cutoffs

    NASA Astrophysics Data System (ADS)

    Radicchi, Filippo

    2014-11-01

    Power-law distributions are typical macroscopic features occurring in almost all complex systems observable in nature. As a result, researchers in quantitative analyses must often generate random synthetic variates obeying power-law distributions. The task is usually performed through standard methods that map uniform random variates into the desired probability space. Whereas all these algorithms are theoretically solid, in this paper we show that they are subject to severe machine-dependent limitations. As a result, two dramatic consequences arise: (i) the sampling in the tail of the distribution is not random but deterministic; (ii) the moments of the sample distribution, which are theoretically expected to diverge as functions of the sample sizes, converge instead to finite values. We provide quantitative indications for the range of distribution parameters that can be safely handled by standard libraries used in computational analyses. Whereas our findings indicate possible reinterpretations of numerical results obtained through flawed sampling methodologies, they also pave the way for the search for a concrete solution to this central issue shared by all quantitative sciences dealing with complexity.
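    The standard inverse-transform generator discussed above, and the machine-dependent cutoff it hides, can be made explicit. For a continuous power law p(x) ∝ x^(-alpha) with x ≥ xmin, the variate is x = xmin * (1 - u)^(-1/(alpha - 1)); the largest returnable value is then fixed by the smallest nonzero value of (1 - u). The exponent and xmin below are arbitrary, and the 2**-53 resolution reflects CPython's random.random(), which returns multiples of 2**-53.

```python
# Inverse-transform power-law sampling and its floating-point tail cutoff.
import random

random.seed(6)
alpha, xmin = 2.5, 1.0

def power_law_variate(u):
    # Inverse transform for p(x) ~ x^(-alpha), x >= xmin
    return xmin * (1.0 - u) ** (-1.0 / (alpha - 1.0))

sample = [power_law_variate(random.random()) for _ in range(100_000)]

# random.random() returns multiples of 2**-53, so (1 - u) can never be smaller
# than 2**-53: the tail is cut off deterministically at x_cut.
x_cut = xmin * (2.0 ** -53) ** (-1.0 / (alpha - 1.0))
print(max(sample), x_cut)
```

    Extreme events beyond x_cut are simply unreachable by this generator, which is the deterministic tail truncation the paper quantifies.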

  19. Comonotonic bounds on the survival probabilities in the Lee-Carter model for mortality projection

    NASA Astrophysics Data System (ADS)

    Denuit, Michel; Dhaene, Jan

    2007-06-01

    In the Lee-Carter framework, future survival probabilities are random variables with an intricate distribution function. In large homogeneous portfolios of life annuities, value-at-risk or conditional tail expectation of the total yearly payout of the company are approximately equal to the corresponding quantities involving random survival probabilities. This paper aims to derive some bounds in the increasing convex (or stop-loss) sense on these random survival probabilities. These bounds are obtained with the help of comonotonic upper and lower bounds on sums of correlated random variables.
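    The comonotonic construction behind such bounds can be illustrated with two standard normal margins, a deliberate simplification of the paper's survival-probability setting: the comonotonic sum evaluates both quantile functions at the same uniform, and its stop-loss premium E[(S - d)+] dominates that of the actual, only moderately correlated sum, consistent with the increasing convex (stop-loss) order. The correlation and retention below are arbitrary.

```python
# Monte Carlo check: the comonotonic sum dominates in the stop-loss sense.
import math
import random

random.seed(9)
n = 20000
d = 1.0                                    # stop-loss retention

def stop_loss(samples, d):
    # Monte Carlo estimate of E[(S - d)+]
    return sum(max(s - d, 0.0) for s in samples) / len(samples)

actual, comono = [], []
for _ in range(n):
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    x = z1
    y = 0.5 * z1 + math.sqrt(0.75) * z2    # corr(X, Y) = 0.5
    actual.append(x + y)
    # Comonotonic counterpart: both margins driven by the same quantile,
    # so F_X^{-1}(U) + F_Y^{-1}(U) = 2 * z1 for standard normal margins
    comono.append(2 * z1)

print(stop_loss(actual, d), stop_loss(comono, d))
```

    In the paper's setting, the same principle gives tractable upper and lower bounds on sums involving the intricately distributed Lee-Carter survival probabilities.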

  20. Boosting association rule mining in large datasets via Gibbs sampling.

    PubMed

    Qian, Guoqi; Rao, Calyampudi Radhakrishna; Sun, Xiaoying; Wu, Yuehua

    2016-05-03

    Current algorithms for association rule mining from transaction data are mostly deterministic and enumerative. They can be computationally intractable even for mining a dataset containing just a few hundred transaction items, if no action is taken to constrain the search space. In this paper, we develop a Gibbs-sampling-induced stochastic search procedure to randomly sample association rules from the itemset space, and perform rule mining from the reduced transaction dataset generated by the sample. A general rule-importance measure is also proposed to direct the stochastic search so that, because the randomly generated association rules constitute an ergodic Markov chain, the overall most important rules in the itemset space can be uncovered from the reduced dataset with probability 1 in the limit. In a simulation study and a real genomic data example, we show how to boost association rule mining by an integrated use of the stochastic search and the Apriori algorithm.
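    A generic sketch of stochastic search over the itemset space follows; the actual procedure is a Gibbs sampler directed by a tailored rule-importance measure, whereas the toy chain below toggles one item at a time and favours itemsets with higher support in synthetic transaction data.

```python
# Toy stochastic search over itemsets, weighted by support (illustrative only).
import random

random.seed(7)
items = list(range(6))
transactions = [set(random.sample(items, k=3)) for _ in range(200)]

def support(itemset):
    if not itemset:
        return 0.0
    return sum(itemset <= t for t in transactions) / len(transactions)

state = {0}                      # current itemset, encoded as a set of items
visited = {}
for _ in range(2000):
    i = random.choice(items)     # coordinate-wise update: toggle one item
    proposal = state ^ {i}
    s_new, s_old = support(proposal), support(state)
    # accept with probability given by the support ratio (Metropolis-style)
    if s_old == 0 or random.random() < min(1.0, s_new / s_old):
        state = proposal
    if state:
        visited[frozenset(state)] = support(state)

best = max(visited, key=visited.get)
print(sorted(best), round(visited[best], 2))
```

    The visited itemsets play the role of the sampled rules from which a reduced dataset is built; the ergodicity of the chain is what guarantees, in the limit, that the most important rules are eventually uncovered.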

  1. THE RHETORICAL USE OF RANDOM SAMPLING: CRAFTING AND COMMUNICATING THE PUBLIC IMAGE OF POLLS AS A SCIENCE (1935-1948).

    PubMed

    Lusinchi, Dominic

    2017-03-01

    The scientific pollsters (Archibald Crossley, George H. Gallup, and Elmo Roper) emerged onto the American news media scene in 1935. Much of what they did in the following years (1935-1948) was to promote both the political and scientific legitimacy of their enterprise. They sought to be recognized as the sole legitimate producers of public opinion. In this essay I examine the mostly overlooked rhetorical work deployed by the pollsters to publicize the scientific credentials of their polling activities, and the central role the concept of sampling played in that pursuit. First, they distanced themselves from the failed straw poll by claiming that their quota-based sampling methodology was informed by science. Second, although in practice they did not use random sampling, they relied on it rhetorically to derive the symbolic benefits of being associated with the "laws of probability." © 2017 Wiley Periodicals, Inc.

  2. Estimating juvenile Chinook salmon (Oncorhynchus tshawytscha) abundance from beach seine data collected in the Sacramento–San Joaquin Delta and San Francisco Bay, California

    USGS Publications Warehouse

    Perry, Russell W.; Kirsch, Joseph E.; Hendrix, A. Noble

    2016-06-17

    Resource managers rely on abundance or density metrics derived from beach seine surveys to make vital decisions that affect fish population dynamics and assemblage structure. However, abundance and density metrics may be biased by imperfect capture and lack of geographic closure during sampling. Currently, there is considerable uncertainty about the capture efficiency of juvenile Chinook salmon (Oncorhynchus tshawytscha) by beach seines. Heterogeneity in capture can occur through unrealistic assumptions of closure and from variation in the probability of capture caused by environmental conditions. We evaluated the assumptions of closure and the influence of environmental conditions on capture efficiency and abundance estimates of Chinook salmon from beach seining within the Sacramento–San Joaquin Delta and the San Francisco Bay. Beach seine capture efficiency was measured using a stratified random sampling design combined with open and closed replicate depletion sampling. A total of 56 samples were collected during the spring of 2014. To assess variability in capture probability and the absolute abundance of juvenile Chinook salmon, beach seine capture efficiency data were fitted to the paired depletion design using modified N-mixture models. These models allowed us to explicitly test the closure assumption and estimate environmental effects on the probability of capture. We determined that our updated method allowing for lack of closure between depletion samples drastically outperformed traditional data analysis that assumes closure among replicate samples. The best-fit model (lowest-valued Akaike Information Criterion model) included the probability of fish being available for capture (relaxed closure assumption), capture probability modeled as a function of water velocity and percent coverage of fine sediment, and abundance modeled as a function of sample area, temperature, and water velocity. 
Given that beach seining is a ubiquitous sampling technique for many species, our improved sampling design and analysis could provide significant improvements in density and abundance estimation.

  3. Honest Importance Sampling with Multiple Markov Chains

    PubMed Central

    Tan, Aixin; Doss, Hani; Hobert, James P.

    2017-01-01

    Importance sampling is a classical Monte Carlo technique in which a random sample from one probability density, π1, is used to estimate an expectation with respect to another, π. The importance sampling estimator is strongly consistent and, as long as two simple moment conditions are satisfied, it obeys a central limit theorem (CLT). Moreover, there is a simple consistent estimator for the asymptotic variance in the CLT, which makes for routine computation of standard errors. Importance sampling can also be used in the Markov chain Monte Carlo (MCMC) context. Indeed, if the random sample from π1 is replaced by a Harris ergodic Markov chain with invariant density π1, then the resulting estimator remains strongly consistent. There is a price to be paid however, as the computation of standard errors becomes more complicated. First, the two simple moment conditions that guarantee a CLT in the iid case are not enough in the MCMC context. Second, even when a CLT does hold, the asymptotic variance has a complex form and is difficult to estimate consistently. In this paper, we explain how to use regenerative simulation to overcome these problems. Actually, we consider a more general set up, where we assume that Markov chain samples from several probability densities, π1, …, πk, are available. We construct multiple-chain importance sampling estimators for which we obtain a CLT based on regeneration. We show that if the Markov chains converge to their respective target distributions at a geometric rate, then under moment conditions similar to those required in the iid case, the MCMC-based importance sampling estimator obeys a CLT. Furthermore, because the CLT is based on a regenerative process, there is a simple consistent estimator of the asymptotic variance. We illustrate the method with two applications in Bayesian sensitivity analysis. The first concerns one-way random effects models under different priors. 
The second involves Bayesian variable selection in linear regression, and for this application, importance sampling based on multiple chains enables an empirical Bayes approach to variable selection. PMID:28701855
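
    In the iid setting the estimator and its CLT-based standard error are compact; a minimal sketch with hypothetical densities (target π = N(0, 1), proposal π1 = N(0, 2)):

```python
import math, random

def importance_sampling(h, target_pdf, proposal_pdf, draw, n=100_000):
    """Estimate E_pi[h(X)] from iid draws of pi1 via weights pi/pi1,
    returning the estimate and its CLT-based standard error."""
    vals = []
    for _ in range(n):
        x = draw()
        vals.append(target_pdf(x) / proposal_pdf(x) * h(x))
    mean = sum(vals) / n
    var = sum((v - mean) ** 2 for v in vals) / (n - 1)
    return mean, math.sqrt(var / n)

random.seed(0)
normal_pdf = lambda x, s: math.exp(-x * x / (2 * s * s)) / (s * math.sqrt(2 * math.pi))
est, se = importance_sampling(h=lambda x: x * x,
                              target_pdf=lambda x: normal_pdf(x, 1.0),
                              proposal_pdf=lambda x: normal_pdf(x, 2.0),
                              draw=lambda: random.gauss(0.0, 2.0))
assert abs(est - 1.0) < 0.05      # true E[X^2] under N(0,1) is 1
```

    Replacing `draw` with successive states of a Harris ergodic Markov chain keeps the estimate strongly consistent, but, as the abstract stresses, this naive standard error is then no longer valid without regeneration-based corrections.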

  4. Honest Importance Sampling with Multiple Markov Chains.

    PubMed

    Tan, Aixin; Doss, Hani; Hobert, James P

    2015-01-01

    Importance sampling is a classical Monte Carlo technique in which a random sample from one probability density, π1, is used to estimate an expectation with respect to another, π. The importance sampling estimator is strongly consistent and, as long as two simple moment conditions are satisfied, it obeys a central limit theorem (CLT). Moreover, there is a simple consistent estimator for the asymptotic variance in the CLT, which makes for routine computation of standard errors. Importance sampling can also be used in the Markov chain Monte Carlo (MCMC) context. Indeed, if the random sample from π1 is replaced by a Harris ergodic Markov chain with invariant density π1, then the resulting estimator remains strongly consistent. There is a price to be paid however, as the computation of standard errors becomes more complicated. First, the two simple moment conditions that guarantee a CLT in the iid case are not enough in the MCMC context. Second, even when a CLT does hold, the asymptotic variance has a complex form and is difficult to estimate consistently. In this paper, we explain how to use regenerative simulation to overcome these problems. Actually, we consider a more general set up, where we assume that Markov chain samples from several probability densities, π1, …, πk, are available. We construct multiple-chain importance sampling estimators for which we obtain a CLT based on regeneration. We show that if the Markov chains converge to their respective target distributions at a geometric rate, then under moment conditions similar to those required in the iid case, the MCMC-based importance sampling estimator obeys a CLT. Furthermore, because the CLT is based on a regenerative process, there is a simple consistent estimator of the asymptotic variance. We illustrate the method with two applications in Bayesian sensitivity analysis. The first concerns one-way random effects models under different priors. The second involves Bayesian variable selection in linear regression, and for this application, importance sampling based on multiple chains enables an empirical Bayes approach to variable selection.

  5. Binomial leap methods for simulating stochastic chemical kinetics.

    PubMed

    Tian, Tianhai; Burrage, Kevin

    2004-12-01

    This paper discusses efficient simulation methods for stochastic chemical kinetics. Based on the tau-leap and midpoint tau-leap methods of Gillespie [D. T. Gillespie, J. Chem. Phys. 115, 1716 (2001)], binomial random variables are used in these leap methods rather than Poisson random variables. The motivation for this approach is to improve the efficiency of the Poisson leap methods by using larger stepsizes. Unlike Poisson random variables, whose range of sample values is from zero to infinity, binomial random variables have a finite range of sample values. This probabilistic property has been used to restrict possible reaction numbers and to avoid negative molecular numbers in stochastic simulations when a larger stepsize is used. In this approach, a binomial random variable is defined for a single reaction channel in order to keep the reaction number of this channel below the number of molecules that undergo this reaction channel. A sampling technique is also designed for the total reaction number of a reactant species that undergoes two or more reaction channels. Samples for the total reaction number are not greater than the molecular number of this species. In addition, probability properties of the binomial random variables provide stepsize conditions for restricting reaction numbers in a chosen time interval. These stepsize conditions are important properties of robust leap control strategies. Numerical results indicate that the proposed binomial leap methods can be applied to a wide range of chemical reaction systems with very good accuracy and a significant improvement in efficiency over existing approaches. (c) 2004 American Institute of Physics.
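
    For the single-channel case, the capping property is easy to see in a pure-decay example (the rate and stepsize values are hypothetical; this is a sketch, not the paper's full multi-channel scheme):

```python
import math, random

def binomial(n, p, rng=random):
    """Binomial(n, p) sampled as a sum of Bernoulli trials (stdlib only)."""
    return sum(rng.random() < p for _ in range(n))

def binomial_leap_decay(n0, c, tau, steps, rng=random):
    """Binomial tau-leap for the decay reaction S -> 0 with rate c: the
    number of firings per leap is Binomial(n, p), so it can never exceed
    the current molecule count n, unlike a Poisson(c*n*tau) draw."""
    n = n0
    p = 1.0 - math.exp(-c * tau)      # per-molecule firing probability in tau
    for _ in range(steps):
        n -= binomial(n, p, rng)
    return n

random.seed(42)
n_final = binomial_leap_decay(n0=1000, c=0.1, tau=0.5, steps=20)
assert 0 <= n_final <= 1000           # count stays non-negative by construction
# Expected count after t = 10 is 1000 * exp(-1), i.e. roughly 368.
```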

  6. Co-occurrence of Pacific sleeper sharks Somniosus pacificus and harbor seals Phoca vitulina in Glacier Bay

    USGS Publications Warehouse

    Taggart, S. James; Andrews, A.G.; Mondragon, Jennifer; Mathews, E.A.

    2005-01-01

    We present evidence that Pacific sleeper sharks Somniosus pacificus co-occur with harbor seals Phoca vitulina in Glacier Bay, Alaska, and that these sharks scavenge or prey on marine mammals. In 2002, 415 stations were fished throughout Glacier Bay on a systematic sampling grid. Pacific sleeper sharks were caught at 3 of the 415 stations, and at one station a Pacific halibut Hippoglossus stenolepis was caught with a fresh bite, identified as the bite of a sleeper shark. All 3 sharks and the shark-bitten halibut were caught at stations near the mouth of Johns Hopkins Inlet, a glacial fjord with the highest concentration of seals in Glacier Bay. Using a bootstrap technique, we estimated the probability of sampling the sharks (and the shark-bitten halibut) in the vicinity of Johns Hopkins Inlet. If sharks were randomly distributed in Glacier Bay, the probability of sampling all 4 pots at the mouth of Johns Hopkins Inlet was very low (P = 0.00002). The highly non-random distribution of the sleeper sharks located near the largest harbor seal pupping and breeding colony in Glacier Bay suggests that these 2 species co-occur and may interact ecologically in or near Johns Hopkins Inlet.
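
    The resampling logic behind such a P-value can be sketched as follows; the region size of 30 stations is a hypothetical stand-in, since the actual number of stations near the inlet is not given here:

```python
import random

def prob_all_in_region(n_stations, n_region, n_hits, trials=100_000, rng=random):
    """Monte Carlo null test: if n_hits 'positive' stations were placed at
    random among n_stations, how often would all fall inside the region?"""
    hits = 0
    for _ in range(trials):
        draw = rng.sample(range(n_stations), n_hits)
        if all(s < n_region for s in draw):   # first n_region indices = region
            hits += 1
    return hits / trials

random.seed(0)
p = prob_all_in_region(n_stations=415, n_region=30, n_hits=4)
assert p < 0.01   # clustering this tight is very unlikely under randomness
```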

  7. A random walk rule for phase I clinical trials.

    PubMed

    Durham, S D; Flournoy, N; Rosenberger, W F

    1997-06-01

    We describe a family of random walk rules for the sequential allocation of dose levels to patients in a dose-response study, or phase I clinical trial. Patients are sequentially assigned the next higher, same, or next lower dose level according to some probability distribution, which may be determined by ethical considerations as well as the patient's response. It is shown that one can choose these probabilities in order to center dose level assignments unimodally around any target quantile of interest. Estimation of the quantile is discussed; the maximum likelihood estimator and its variance are derived under a two-parameter logistic distribution, and the maximum likelihood estimator is compared with other nonparametric estimators. Random walk rules have clear advantages: they are simple to implement, and finite and asymptotic distribution theory is completely worked out. For a specific random walk rule, we compute finite and asymptotic properties and give examples of its use in planning studies. Having the finite distribution theory available and tractable obviates the need for elaborate simulation studies to analyze the properties of the design. The small sample properties of our rule, as determined by exact theory, compare favorably to those of the continual reassessment method, determined by simulation.
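
    One well-known member of this family is a biased-coin walk targeting the dose whose toxicity probability is a chosen Γ < 0.5: de-escalate after a toxicity, otherwise escalate with probability Γ/(1 − Γ). A sketch under a hypothetical dose-toxicity curve:

```python
import random

def biased_coin_walk(tox_probs, gamma, n_patients, start=0, rng=random):
    """Random-walk dose allocation: step down after a toxicity, step up
    with probability gamma/(1-gamma) after a non-toxic response, so that
    assignments center unimodally near the dose with P(toxicity) = gamma."""
    level, history = start, []
    up = gamma / (1.0 - gamma)
    for _ in range(n_patients):
        history.append(level)
        if rng.random() < tox_probs[level]:
            level = max(0, level - 1)                  # de-escalate
        elif rng.random() < up:
            level = min(len(tox_probs) - 1, level + 1) # escalate
        # otherwise treat the next patient at the same dose
    return history

random.seed(7)
curve = [0.05, 0.10, 0.20, 0.35, 0.55]   # hypothetical toxicity per dose level
walk = biased_coin_walk(curve, gamma=0.20, n_patients=200)
assert len(walk) == 200 and all(0 <= d < len(curve) for d in walk)
```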

  8. Inventory of forest resources (including water) by multi-level sampling. [nine northern Virginia coastal plain counties

    NASA Technical Reports Server (NTRS)

    Aldrich, R. C.; Dana, R. W.; Roberts, E. H. (Principal Investigator)

    1977-01-01

    The author has identified the following significant results. A stratified random sample using LANDSAT band 5 and 7 panchromatic prints resulted in estimates of water in counties with sampling errors less than ±9% (67% probability level). A forest inventory using a four-band LANDSAT color composite resulted in estimates of forest area by counties that were within ±6.7% and ±3.7%, respectively (67% probability level). Estimates of forest area for counties by computer-assisted techniques were within ±21% of operational forest survey figures, and for all counties the difference was only one percent. Correlations of airborne terrain reflectance measurements with LANDSAT radiance verified a linear atmospheric model with an additive (path radiance) term and a multiplicative (transmittance) term. Coefficients of determination for 28 of the 32 modeling attempts, not adversely affected by a rain shower occurring between the times of LANDSAT passage and aircraft overflights, exceeded 0.83.

  9. Optimized nested Markov chain Monte Carlo sampling: theory

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Coe, Joshua D; Shaw, M Sam; Sewell, Thomas D

    2009-01-01

    Metropolis Monte Carlo sampling of a reference potential is used to build a Markov chain in the isothermal-isobaric ensemble. At the endpoints of the chain, the energy is reevaluated at a different level of approximation (the 'full' energy) and a composite move encompassing all of the intervening steps is accepted on the basis of a modified Metropolis criterion. By manipulating the thermodynamic variables characterizing the reference system we maximize the average acceptance probability of composite moves, lengthening significantly the random walk made between consecutive evaluations of the full energy at a fixed acceptance probability. This provides maximally decorrelated samples of the full potential, thereby lowering the total number required to build ensemble averages of a given variance. The efficiency of the method is illustrated using model potentials appropriate to molecular fluids at high pressure. Implications for ab initio or density functional theory (DFT) treatment are discussed.
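
    A one-dimensional sketch of the composite-move construction (the potentials, stepsize, and inner-chain length below are hypothetical; the paper works with molecular potentials in the isothermal-isobaric ensemble):

```python
import math, random

def metropolis_accept(delta, beta, rng):
    """Standard Metropolis test on an energy difference."""
    return delta <= 0.0 or rng.random() < math.exp(-beta * delta)

def nested_mcmc(u_full, u_ref, x0, step, n_inner, n_outer, beta=1.0, rng=random):
    """Nested chain: n_inner cheap Metropolis steps on the reference
    potential, then one composite accept/reject that corrects with the
    difference between full and reference energy changes."""
    x, samples = x0, []
    for _ in range(n_outer):
        y = x
        for _ in range(n_inner):              # inner walk sees only u_ref
            z = y + rng.uniform(-step, step)
            if metropolis_accept(u_ref(z) - u_ref(y), beta, rng):
                y = z
        delta = (u_full(y) - u_full(x)) - (u_ref(y) - u_ref(x))
        if metropolis_accept(delta, beta, rng):   # modified criterion
            x = y
        samples.append(x)
    return samples

random.seed(5)
xs = nested_mcmc(u_full=lambda x: 0.5 * x * x,    # 'expensive' harmonic well
                 u_ref=lambda x: 0.6 * x * x,     # cheap approximation of it
                 x0=0.0, step=1.0, n_inner=10, n_outer=2000)
assert abs(sum(xs) / len(xs)) < 0.3               # samples center on the well
```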

  10. Can we estimate molluscan abundance and biomass on the continental shelf?

    NASA Astrophysics Data System (ADS)

    Powell, Eric N.; Mann, Roger; Ashton-Alcox, Kathryn A.; Kuykendall, Kelsey M.; Chase Long, M.

    2017-11-01

    Few empirical studies have focused on the effect of sample density on the estimate of abundance of the dominant carbonate-producing fauna of the continental shelf. Here, we present such a study and consider the implications of suboptimal sampling design on estimates of abundance and size-frequency distribution. We focus on a principal carbonate producer of the U.S. Atlantic continental shelf, the Atlantic surfclam, Spisula solidissima. To evaluate the degree to which the results are typical, we analyze a dataset for the principal carbonate producer of Mid-Atlantic estuaries, the Eastern oyster Crassostrea virginica, obtained from Delaware Bay. These two species occupy different habitats and display different lifestyles, yet demonstrate similar challenges to survey design and similar trends with sampling density. The median of a series of simulated survey mean abundances, the central tendency obtained over a large number of surveys of the same area, always underestimated true abundance at low sample densities. More dramatic were the trends in the probability of a biased outcome. As sample density declined, the probability of a survey availability event, defined as a survey yielding indices >125% or <75% of the true population abundance, increased and that increase was disproportionately biased towards underestimates. For these cases where a single sample accessed about 0.001-0.004% of the domain, 8-15 random samples were required to reduce the probability of a survey availability event below 40%. The problem of differential bias, in which the probabilities of a biased-high and a biased-low survey index were distinctly unequal, was resolved with fewer samples than the problem of overall bias. These trends suggest that the influence of sampling density on survey design comes with a series of incremental challenges. At woefully inadequate sampling density, the probability of a biased-low survey index will substantially exceed the probability of a biased-high index. 
The survey time series on the average will return an estimate of the stock that underestimates true stock abundance. If sampling intensity is increased, the frequency of biased indices balances between high and low values. Incrementing sample number from this point steadily reduces the likelihood of a biased survey; however, the number of samples necessary to drive the probability of survey availability events to a preferred level of infrequency may be daunting. Moreover, certain size classes will be disproportionately susceptible to such events and the impact on size frequency will be species specific, depending on the relative dispersion of the size classes.

  11. Baseline adjustments for binary data in repeated cross-sectional cluster randomized trials.

    PubMed

    Nixon, R M; Thompson, S G

    2003-09-15

    Analysis of covariance models, which adjust for a baseline covariate, are often used to compare treatment groups in a controlled trial in which individuals are randomized. Such analysis adjusts for any baseline imbalance and usually increases the precision of the treatment effect estimate. We assess the value of such adjustments in the context of a cluster randomized trial with repeated cross-sectional design and a binary outcome. In such a design, a new sample of individuals is taken from the clusters at each measurement occasion, so that baseline adjustment has to be at the cluster level. Logistic regression models are used to analyse the data, with cluster level random effects to allow for different outcome probabilities in each cluster. We compare the estimated treatment effect and its precision in models that incorporate a covariate measuring the cluster level probabilities at baseline and those that do not. In two data sets, taken from a cluster randomized trial in the treatment of menorrhagia, the value of baseline adjustment is only evident when the number of subjects per cluster is large. We assess the generalizability of these findings by undertaking a simulation study, and find that increased precision of the treatment effect requires both large cluster sizes and substantial heterogeneity between clusters at baseline, but baseline imbalance arising by chance in a randomized study can always be effectively adjusted for. Copyright 2003 John Wiley & Sons, Ltd.

  12. Texas School Survey of Substance Abuse: Grades 7-12. 1992.

    ERIC Educational Resources Information Center

    Liu, Liang Y.; Fredlund, Eric V.

    The 1992 Texas School Survey results for secondary students are based on data collected from a sample of 73,073 students in grades 7 through 12. Students were randomly selected from school districts throughout the state using a multi-stage probability design. The procedure ensured that students living in metropolitan and rural areas of Texas are…

  13. Bertrand's Paradox: A Physical Way out along the Lines of Buffon's Needle Throwing Experiment

    ERIC Educational Resources Information Center

    Di Porto, P.; Crosignani, B.; Ciattoni, A.; Liu, H. C.

    2011-01-01

    Bertrand's paradox (Bertrand 1889 "Calcul des Probabilites" (Paris: Gauthier-Villars)) can be considered as a cautionary memento, to practitioners and students of probability calculus alike, of the possible ambiguous meaning of the term "at random" when the sample space of events is continuous. It deals with the existence of different possible…

  14. Randomness, Sample Size, Imagination and Metacognition: Making Judgments about Differences in Data Sets

    ERIC Educational Resources Information Center

    Stack, Sue; Watson, Jane

    2013-01-01

    There is considerable research on the difficulties students have in conceptualising individual concepts of probability and statistics (see for example, Bryant & Nunes, 2012; Jones, 2005). The unit of work developed for the action research project described in this article is specifically designed to address some of these in order to help…

  15. Adapted random sampling patterns for accelerated MRI.

    PubMed

    Knoll, Florian; Clason, Christian; Diwoky, Clemens; Stollberger, Rudolf

    2011-02-01

    Variable density random sampling patterns have recently become increasingly popular for accelerated imaging strategies, as they lead to incoherent aliasing artifacts. However, the design of these sampling patterns is still an open problem. Current strategies use model assumptions like polynomials of different order to generate a probability density function that is then used to generate the sampling pattern. This approach relies on the optimization of design parameters which is very time consuming and therefore impractical for daily clinical use. This work presents a new approach that generates sampling patterns by making use of power spectra of existing reference data sets and hence requires neither parameter tuning nor an a priori mathematical model of the density of sampling points. The approach is validated with downsampling experiments, as well as with accelerated in vivo measurements. The proposed approach is compared with established sampling patterns, and the generalization potential is tested by using a range of reference images. Quantitative evaluation is performed for the downsampling experiments using RMS differences to the original, fully sampled data set. Our results demonstrate that the image quality of the method presented in this paper is comparable to that of an established model-based strategy when optimization of the model parameter is carried out and yields superior results to non-optimized model parameters. However, no random sampling pattern showed superior performance when compared to conventional Cartesian subsampling for the considered reconstruction strategy.
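
    The core idea, weighting sample positions by a reference power spectrum rather than a fitted polynomial density, can be sketched in one dimension (the spectrum below is a made-up stand-in for real reference data):

```python
import random

def pattern_from_spectrum(power, n_samples, rng=random):
    """Pick k-space indices without replacement, with probability
    proportional to the reference power spectrum."""
    total = sum(power)
    probs = [p / total for p in power]
    chosen = set()
    while len(chosen) < n_samples:
        r, acc = rng.random(), 0.0
        for i, p in enumerate(probs):
            acc += p
            if r <= acc:
                chosen.add(i)
                break
        else:
            chosen.add(len(probs) - 1)   # guard against float rounding
    return sorted(chosen)

random.seed(2)
spectrum = [1.0 / (1 + abs(i - 64)) for i in range(128)]  # energy at low frequencies
mask = pattern_from_spectrum(spectrum, n_samples=32)
assert len(mask) == 32 and all(0 <= i < 128 for i in mask)
```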

  16. Models of multidimensional discrete distribution of probabilities of random variables in information systems

    NASA Astrophysics Data System (ADS)

    Gromov, Yu Yu; Minin, Yu V.; Ivanova, O. G.; Morozova, O. N.

    2018-03-01

    Multidimensional discrete probability distributions of independent random variables were obtained. Their one-dimensional analogues are widely used in probability theory. Generating functions of these multidimensional distributions were also derived.

  17. Computer Simulation Results for the Two-Point Probability Function of Composite Media

    NASA Astrophysics Data System (ADS)

    Smith, P.; Torquato, S.

    1988-05-01

    Computer simulation results are reported for the two-point matrix probability function S2 of two-phase random media composed of disks distributed with an arbitrary degree of impenetrability λ. The novel technique employed to sample S2(r) (which gives the probability of finding the endpoints of a line segment of length r in the matrix) is very accurate and has a fast execution time. Results for the limiting cases λ = 0 (fully penetrable disks) and λ = 1 (hard disks), respectively, compare very favorably with theoretical predictions made by Torquato and Beasley and by Torquato and Lado. Results are also reported for several values of λ that lie between these two extremes: cases which heretofore have not been examined.

  18. Reducing seed dependent variability of non-uniformly sampled multidimensional NMR data

    NASA Astrophysics Data System (ADS)

    Mobli, Mehdi

    2015-07-01

    The application of NMR spectroscopy to study the structure, dynamics and function of macromolecules requires the acquisition of several multidimensional spectra. The one-dimensional NMR time-response from the spectrometer is extended to additional dimensions by introducing incremented delays in the experiment that cause oscillation of the signal along "indirect" dimensions. For a given dimension the delay is incremented at twice the rate of the maximum frequency (Nyquist rate). To achieve high-resolution requires acquisition of long data records sampled at the Nyquist rate. This is typically a prohibitive step due to time constraints, resulting in sub-optimal data records to the detriment of subsequent analyses. The multidimensional NMR spectrum itself is typically sparse, and it has been shown that in such cases it is possible to use non-Fourier methods to reconstruct a high-resolution multidimensional spectrum from a random subset of non-uniformly sampled (NUS) data. For a given acquisition time, NUS has the potential to improve the sensitivity and resolution of a multidimensional spectrum, compared to traditional uniform sampling. The improvements in sensitivity and/or resolution achieved by NUS are heavily dependent on the distribution of points in the random subset acquired. Typically, random points are selected from a probability density function (PDF) weighted according to the NMR signal envelope. In extreme cases as little as 1% of the data is subsampled. The heavy under-sampling can result in poor reproducibility, i.e. when two experiments are carried out where the same number of random samples is selected from the same PDF but using different random seeds. Here, a jittered sampling approach is introduced that is shown to improve random seed dependent reproducibility of multidimensional spectra generated from NUS data, compared to commonly applied NUS methods. 
It is shown that this is achieved due to the low variability of the inherent sensitivity of the random subset chosen from a given PDF. Finally, it is demonstrated that metrics used to find optimal NUS distributions are heavily dependent on the inherent sensitivity of the random subset, and such optimisation is therefore less critical when using the proposed sampling scheme.
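
    In the uniform-density case, jittering reduces to stratifying the sampling grid and drawing one point per stratum, which bounds how far any single seed's pattern can drift from the target coverage; a minimal sketch (the paper applies the same idea to an envelope-weighted PDF):

```python
import random

def jittered_schedule(n_grid, n_samples, rng=random):
    """Jittered NUS: split the n_grid increments into n_samples equal
    strata and pick one increment uniformly inside each, so every random
    seed yields roughly the same coverage of the schedule."""
    width = n_grid / n_samples
    picks = []
    for k in range(n_samples):
        lo = int(k * width)
        hi = min(n_grid, max(lo + 1, int((k + 1) * width)))
        picks.append(rng.randrange(lo, hi))
    return picks

random.seed(11)
picks = jittered_schedule(n_grid=256, n_samples=32)
assert len(set(picks)) == 32                 # one point per stratum, no repeats
assert picks == sorted(picks)                # disjoint strata keep picks ordered
```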

  19. Exploring the Connection Between Sampling Problems in Bayesian Inference and Statistical Mechanics

    NASA Technical Reports Server (NTRS)

    Pohorille, Andrew

    2006-01-01

    The Bayesian and statistical mechanical communities often share the same objective in their work - estimating and integrating probability distribution functions (pdfs) describing stochastic systems, models or processes. Frequently, these pdfs are complex functions of random variables exhibiting multiple, well-separated local minima. Conventional strategies for sampling such pdfs are inefficient, sometimes leading to apparently non-ergodic behavior. Several recently developed techniques for handling this problem have been successfully applied in statistical mechanics. In the multicanonical and Wang-Landau Monte Carlo (MC) methods, the correct pdfs are recovered from uniform sampling of the parameter space by iteratively establishing proper weighting factors connecting these distributions. Trivial generalizations allow for sampling from any chosen pdf. The closely related transition matrix method relies on estimating transition probabilities between different states. All these methods proved to generate estimates of pdfs with high statistical accuracy. In another MC technique, parallel tempering, several random walks, each corresponding to a different value of a parameter (e.g. "temperature"), are generated and occasionally exchanged using the Metropolis criterion. This method can be considered as a statistically correct version of simulated annealing. An alternative approach is to represent the set of independent variables as a Hamiltonian system. Considerable progress has been made in understanding how to ensure that the system obeys the equipartition theorem or, equivalently, that coupling between the variables is correctly described. Then a host of techniques developed for dynamical systems can be used. Among them, probably the most powerful is the Adaptive Biasing Force method, in which thermodynamic integration and biased sampling are combined to yield very efficient estimates of pdfs. 
The third class of methods deals with transitions between states described by rate constants. These problems are isomorphic with chemical kinetics problems. Recently, several efficient techniques for this purpose have been developed based on the approach originally proposed by Gillespie. Although the utility of the techniques mentioned above for Bayesian problems has not been determined, further research along these lines is warranted.

  20. Validation of Statistical Sampling Algorithms in Visual Sample Plan (VSP): Summary Report

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Nuffer, Lisa L; Sego, Landon H.; Wilson, John E.

    2009-02-18

    The U.S. Department of Homeland Security, Office of Technology Development (OTD) contracted with a set of U.S. Department of Energy national laboratories, including the Pacific Northwest National Laboratory (PNNL), to write a Remediation Guidance for Major Airports After a Chemical Attack. The report identifies key activities and issues that should be considered by a typical major airport following an incident involving release of a toxic chemical agent. Four experimental tasks were identified that would require further research in order to supplement the Remediation Guidance. One of the tasks, Task 4, OTD Chemical Remediation Statistical Sampling Design Validation, dealt with statistical sampling algorithm validation. This report documents the results of the sampling design validation conducted for Task 4. In 2005, the Government Accountability Office (GAO) performed a review of the past U.S. responses to Anthrax terrorist cases. Part of the motivation for this PNNL report was a major GAO finding that there was a lack of validated sampling strategies in the U.S. response to Anthrax cases. The report (GAO 2005) recommended that probability-based methods be used for sampling design in order to address confidence in the results, particularly when all sample results showed no remaining contamination. The GAO also expressed a desire that the methods be validated, which is the main purpose of this PNNL report. The objective of this study was to validate probability-based statistical sampling designs and the algorithms pertinent to within-building sampling that allow the user to prescribe or evaluate confidence levels of conclusions based on data collected as guided by the statistical sampling designs. Specifically, the designs found in the Visual Sample Plan (VSP) software were evaluated. VSP was used to calculate the number of samples and the sample location for a variety of sampling plans applied to an actual release site. 
Most of the sampling designs validated are probability based, meaning samples are located randomly (or on a randomly placed grid) so no bias enters into the placement of samples, and the number of samples is calculated such that IF the amount and spatial extent of contamination exceeds levels of concern, at least one of the samples would be taken from a contaminated area, at least X% of the time. Hence, "validation" of the statistical sampling algorithms is defined herein to mean ensuring that the "X%" (confidence) is actually met.

  1. Continuous time random walk model with asymptotical probability density of waiting times via inverse Mittag-Leffler function

    NASA Astrophysics Data System (ADS)

    Liang, Yingjie; Chen, Wen

    2018-04-01

    The mean squared displacement (MSD) of the traditional ultraslow diffusion is a logarithmic function of time. Recently, the continuous time random walk model is employed to characterize this ultraslow diffusion dynamics by connecting the heavy-tailed logarithmic function and its variation as the asymptotical waiting time density. In this study we investigate the limiting waiting time density of a general ultraslow diffusion model via the inverse Mittag-Leffler function, whose special case includes the traditional logarithmic ultraslow diffusion model. The MSD of the general ultraslow diffusion model is analytically derived as an inverse Mittag-Leffler function, and is observed to increase even more slowly than that of the logarithmic function model. The occurrence of very long waiting time in the case of the inverse Mittag-Leffler function has the largest probability compared with the power law model and the logarithmic function model. The Monte Carlo simulations of one dimensional sample path of a single particle are also performed. The results show that the inverse Mittag-Leffler waiting time density is effective in depicting the general ultraslow random motion.
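    A minimal Monte Carlo sketch of such a CTRW conveys the ultraslow scaling. The waiting-time law below (survival probability decaying like 1/ln t, sampled as t = exp(1/U)) is a simple stand-in for the logarithmic model, not the paper's inverse Mittag-Leffler density:

```python
import numpy as np

rng = np.random.default_rng(0)

def msd_ultraslow(T, n_particles=4000):
    """Monte Carlo MSD of a 1-D CTRW whose waiting-time survival function
    decays like 1/ln(t): waiting times are sampled as t = exp(1/U) with
    U uniform on (0, 1], so jumps become ever rarer as time passes."""
    disp = np.empty(n_particles)
    for i in range(n_particles):
        t, x = 0.0, 0
        while True:
            t += np.exp(1.0 / (1.0 - rng.uniform()))  # heavy-tailed wait
            if t > T:
                break
            x += 1 if rng.uniform() < 0.5 else -1     # unbiased unit jump
        disp[i] = x
    return np.mean(disp ** 2)

m_short, m_long = msd_ultraslow(1e4), msd_ultraslow(1e8)
print(m_short, m_long)  # MSD grows only logarithmically with T
```

Stretching the horizon by four decades in time roughly doubles the MSD, the signature of logarithmic (ultraslow) diffusion.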

  2. Inference for binomial probability based on dependent Bernoulli random variables with applications to meta‐analysis and group level studies

    PubMed Central

    Bakbergenuly, Ilyas; Morgenthaler, Stephan

    2016-01-01

    We study bias arising as a result of nonlinear transformations of random variables in random or mixed effects models and its effect on inference in group‐level studies or in meta‐analysis. The findings are illustrated on the example of overdispersed binomial distributions, where we demonstrate considerable biases arising from standard log‐odds and arcsine transformations of the estimated probability p̂, both for single‐group studies and in combining results from several groups or studies in meta‐analysis. Our simulations confirm that these biases are linear in ρ, for small values of ρ, the intracluster correlation coefficient. These biases do not depend on the sample sizes or the number of studies K in a meta‐analysis and result in abysmal coverage of the combined effect for large K. We also propose bias‐correction for the arcsine transformation. Our simulations demonstrate that this bias‐correction works well for small values of the intraclass correlation. The methods are applied to two examples of meta‐analyses of prevalence. PMID:27192062
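    The arcsine-transform bias under intracluster correlation is easy to reproduce by simulation. The sketch below uses a beta-binomial model with ICC ρ; the parameter values (p = 0.1, groups of 50) are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def arcsine_bias(p=0.1, n=50, rho=0.2, reps=20000):
    """Monte Carlo bias of arcsin(sqrt(p_hat)) for one beta-binomial
    group of size n with mean p and intracluster correlation rho."""
    s = (1.0 - rho) / rho                # a + b of the beta mixing law
    p_i = rng.beta(p * s, (1.0 - p) * s, size=reps)
    k = rng.binomial(n, p_i)
    return np.mean(np.arcsin(np.sqrt(k / n))) - np.arcsin(np.sqrt(p))

b_small, b_large = arcsine_bias(rho=0.01), arcsine_bias(rho=0.2)
print(b_small, b_large)  # bias magnitude grows with rho
```

Because arcsin(sqrt(p)) is concave for p < 0.5, the extra between-cluster variance pulls the mean of the transformed estimate downward, and the effect grows with ρ as the abstract describes.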

  3. Generation of Stationary Non-Gaussian Time Histories with a Specified Cross-spectral Density

    DOE PAGES

    Smallwood, David O.

    1997-01-01

    The paper reviews several methods for the generation of stationary realizations of sampled time histories with non-Gaussian distributions and introduces a new method which can be used to control the cross-spectral density matrix and the probability density functions (pdfs) of the multiple input problem. Discussed first are two methods for the specialized case of matching the auto (power) spectrum, the skewness, and kurtosis using generalized shot noise and using polynomial functions. It is then shown that the skewness and kurtosis can also be controlled by the phase of a complex frequency domain description of the random process. The general case of matching a target probability density function using a zero memory nonlinear (ZMNL) function is then covered. Next methods for generating vectors of random variables with a specified covariance matrix for a class of spherically invariant random vectors (SIRV) are discussed. Finally the general case of matching the cross-spectral density matrix of a vector of inputs with non-Gaussian marginal distributions is presented.

  4. Predicting longitudinal trajectories of health probabilities with random-effects multinomial logit regression.

    PubMed

    Liu, Xian; Engel, Charles C

    2012-12-20

    Researchers often encounter longitudinal health data characterized with three or more ordinal or nominal categories. Random-effects multinomial logit models are generally applied to account for potential lack of independence inherent in such clustered data. When parameter estimates are used to describe longitudinal processes, however, random effects, both between and within individuals, need to be retransformed for correctly predicting outcome probabilities. This study attempts to go beyond existing work by developing a retransformation method that derives longitudinal growth trajectories of unbiased health probabilities. We estimated variances of the predicted probabilities by using the delta method. Additionally, we transformed the covariates' regression coefficients on the multinomial logit function, not substantively meaningful, to the conditional effects on the predicted probabilities. The empirical illustration uses the longitudinal data from the Asset and Health Dynamics among the Oldest Old. Our analysis compared three sets of the predicted probabilities of three health states at six time points, obtained from, respectively, the retransformation method, the best linear unbiased prediction, and the fixed-effects approach. The results demonstrate that neglect of retransforming random errors in the random-effects multinomial logit model results in severely biased longitudinal trajectories of health probabilities as well as overestimated effects of covariates on the probabilities. Copyright © 2012 John Wiley & Sons, Ltd.
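    The core point, that random effects must be averaged over on the probability scale rather than plugged in at zero, can be sketched for a toy three-category logit with a shared random intercept (all parameter values below are made up):

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax_probs(eta1, eta2):
    """Category probabilities for a 3-category multinomial logit
    (third category is the baseline with linear predictor 0)."""
    e = np.exp(np.stack([eta1, eta2, np.zeros_like(eta1)]))
    return e / e.sum(axis=0)

eta = np.array([0.5, -0.3])   # toy fixed-effect linear predictors
s = 1.5                       # SD of a shared random intercept u ~ N(0, s^2)

# naive plug-in: evaluate probabilities at u = 0
p_plugin = softmax_probs(eta[0], eta[1])

# retransformation: average the probabilities over the random effect
u = rng.normal(0.0, s, size=50_000)
p_avg = softmax_probs(eta[0] + u, eta[1] + u).mean(axis=1)

print(p_plugin, p_avg)  # the plug-in and averaged probabilities differ
```

Because the logit link is nonlinear, E[softmax(η + u)] is not softmax(η + E[u]); ignoring that difference is exactly the bias the retransformation method is designed to remove.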

  5. Under-sampling trajectory design for compressed sensing based DCE-MRI.

    PubMed

    Liu, Duan-duan; Liang, Dong; Zhang, Na; Liu, Xin; Zhang, Yuan-ting

    2013-01-01

    Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) needs high temporal and spatial resolution to accurately estimate quantitative parameters and characterize tumor vasculature. Compressed Sensing (CS) has the potential to satisfy both demands. However, the randomness in a CS under-sampling trajectory designed using the traditional variable density (VD) scheme may translate into uncertainty in kinetic parameter estimation when high reduction factors are used. Accurate parameter estimation with the VD scheme therefore usually requires multiple adjustments to the parameters of the Probability Density Function (PDF), and multiple reconstructions even with a fixed PDF, which is impractical for DCE-MRI. In this paper, an under-sampling trajectory design that is robust both to changes in the PDF parameters and to the randomness under a fixed PDF is studied. The strategy is to adaptively segment k-space into low- and high-frequency domains, and to apply the VD scheme only in the high-frequency domain. Simulation results demonstrate high accuracy and robustness compared with the VD design.

  6. Dynamic probability of reinforcement for cooperation: Random game termination in the centipede game.

    PubMed

    Krockow, Eva M; Colman, Andrew M; Pulford, Briony D

    2018-03-01

    Experimental games have previously been used to study principles of human interaction. Many such games are characterized by iterated or repeated designs that model dynamic relationships, including reciprocal cooperation. To enable the study of infinite game repetitions and to avoid endgame effects of lower cooperation toward the final game round, investigators have introduced random termination rules. This study extends previous research that has focused narrowly on repeated Prisoner's Dilemma games by conducting a controlled experiment of two-player, random termination Centipede games involving probabilistic reinforcement and characterized by the longest decision sequences reported in the empirical literature to date (24 decision nodes). Specifically, we assessed mean exit points and cooperation rates, and compared the effects of four different termination rules: no random game termination, random game termination with constant termination probability, random game termination with increasing termination probability, and random game termination with decreasing termination probability. We found that although mean exit points were lower for games with shorter expected game lengths, the subjects' cooperativeness was significantly reduced only in the most extreme condition with decreasing computer termination probability and an expected game length of two decision nodes. © 2018 Society for the Experimental Analysis of Behavior.
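    The termination rules compared here are straightforward to simulate. The sketch below uses a hypothetical 24-node game with a constant per-node termination probability of 1/12; under that rule the number of nodes reached is a geometric variable capped at 24:

```python
import random

random.seed(42)

def game_length(term_probs):
    """Number of decision nodes reached before random termination ends
    the game; term_probs[i] is the termination probability at node i+1."""
    for node, q in enumerate(term_probs, start=1):
        if random.random() < q:
            return node
    return len(term_probs)

# constant rule: 24 nodes, termination probability 1/12 at every node
lengths = [game_length([1 / 12] * 24) for _ in range(10000)]
mean_len = sum(lengths) / len(lengths)
print(mean_len)  # analytic mean is 12 * (1 - (11/12)**24), about 10.5
```

Swapping in an increasing or decreasing sequence for `term_probs` reproduces the other termination rules in the study's comparison.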

  7. Effects of an Employee Wellness Program on Physiological Risk Factors, Job Satisfaction, and Monetary Savings in a South Texas University

    ERIC Educational Resources Information Center

    Hamilton, Jacqueline

    2009-01-01

    An experimental study was conducted to investigate the effects of an Employee Wellness Program on physiological risk factors, job satisfaction, and monetary savings in a South Texas University. The non-probability sample consisted of 31 employees from lower income level positions. The employees were randomly assigned to the treatment group which…

  8. Sexual Intercourse and Pregnancy among African-American Adolescent Girls in High-Poverty Neighborhoods: The Role of Family and Perceived Community Environment. JCPR Working Paper.

    ERIC Educational Resources Information Center

    Moore, Mignon R.; Chase-Lansdale, P. Lindsay

    This study used data from a random sample of African American families living in poor urban communities to examine: how well socialization, supervision, and marital transition hypotheses explained the relationship between family structure and the probability of sexual debut and pregnancy for African American adolescents in disadvantaged…

  9. Ensembles of Spiking Neurons with Noise Support Optimal Probabilistic Inference in a Dynamically Changing Environment

    PubMed Central

    Legenstein, Robert; Maass, Wolfgang

    2014-01-01

    It has recently been shown that networks of spiking neurons with noise can emulate simple forms of probabilistic inference through “neural sampling”, i.e., by treating spikes as samples from a probability distribution of network states that is encoded in the network. Deficiencies of the existing model are its reliance on single neurons for sampling from each random variable, and the resulting limitation in representing quickly varying probabilistic information. We show that both deficiencies can be overcome by moving to a biologically more realistic encoding of each salient random variable through the stochastic firing activity of an ensemble of neurons. The resulting model demonstrates that networks of spiking neurons with noise can easily track and carry out basic computational operations on rapidly varying probability distributions, such as the odds of getting rewarded for a specific behavior. We demonstrate the viability of this new approach towards neural coding and computation, which makes use of the inherent parallelism of generic neural circuits, by showing that this model can explain experimentally observed firing activity of cortical neurons for a variety of tasks that require rapid temporal integration of sensory information. PMID:25340749

  10. Nonlinearity and seasonal bias in an index of brushtail possum abundance

    USGS Publications Warehouse

    Forsyth, D.M.; Link, W.A.; Webster, R.; Nugent, G.; Warburton, B.

    2005-01-01

    Introduced brushtail possums (Trichosurus vulpecula) are a widespread pest of conservation and agriculture in New Zealand, and considerable effort has been expended controlling populations to low densities. A national protocol for monitoring the abundance of possums, termed trap catch index (TCI), was adopted in 1996. The TCI requires that lines of leghold traps set at 20-m spacing are randomly located in a management area. The traps are set for 3 fine nights and checked daily, and possums are killed and traps reset. The TCI is the mean percentage of trap nights that possums were caught, corrected for sprung traps and nontarget captures, with trap line as the sampling unit. We studied 1 forest and 1 farmland area in the North Island, New Zealand, to address concerns that TCI estimates may not be readily comparable because of seasonal changes in the capture probability of possums. We located blocks of 6 trap lines at each area and randomly trapped 1 line in each block in 3 seasons (summer, winter, and spring) in 2000 and 2001. We developed a model to allow for variation in local population size and nightly capture probability, and fitted the model using the Bayesian analysis software BUGS. Capture probability declined with increasing abundance of possums, generating a nonlinear TCI. Capture probability in farmland was lower during spring relative to winter and summer, and to forest during summer. In the absence of a proven and cost-effective alternative, our results support the continued use of the TCI for monitoring the abundance of possums in New Zealand. Seasonal biases in the TCI should be minimized by conducting repeat sampling in the same season.

  11. Finding SDSS Galaxy Clusters in 4-dimensional Color Space Using the False Discovery Rate

    NASA Astrophysics Data System (ADS)

    Nichol, R. C.; Miller, C. J.; Reichart, D.; Wasserman, L.; Genovese, C.; SDSS Collaboration

    2000-12-01

    We describe a recently developed statistical technique that provides a meaningful cut-off in probability-based decision making. We are concerned with multiple testing, where each test produces a well-defined probability (or p-value). By well-defined, we mean that the null hypothesis used to determine the p-value is fully understood and appropriate. The method is entitled False Discovery Rate (FDR) and its largest advantage over other measures is that it allows one to specify a maximal amount of acceptable error. As an example of this tool, we apply FDR to a four-dimensional clustering algorithm using SDSS data. For each galaxy (or test galaxy), we count the number of neighbors that fit within one standard deviation of a four dimensional Gaussian centered on that test galaxy. The mean and standard deviation of that Gaussian are determined from the colors and errors of the test galaxy. We then take that same Gaussian and place it on a random selection of n galaxies and make a similar count. In the limit of large n, we expect the median count around these random galaxies to represent a typical field galaxy. For every test galaxy we determine the probability (or p-value) that it is a field galaxy based on these counts. A low p-value implies that the test galaxy is in a cluster environment. Once we have a p-value for every galaxy, we use FDR to determine at what level we should make our probability cut-off. Once this cut-off is made, we have a final sample of galaxies that are cluster-like galaxies. Using FDR, we also know the maximum amount of field contamination in our cluster galaxy sample. We present our preliminary galaxy clustering results using these methods.
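    The FDR cut-off itself is the Benjamini-Hochberg step-up rule, which can be stated in a few lines (the p-values below are invented for illustration):

```python
def fdr_cutoff(pvalues, alpha=0.10):
    """Benjamini-Hochberg step-up rule: largest sorted p-value p_(i)
    satisfying p_(i) <= alpha * i / m; rejecting every test at or below
    it keeps the expected false discovery rate <= alpha."""
    m = len(pvalues)
    passing = [p for i, p in enumerate(sorted(pvalues), start=1)
               if p <= alpha * i / m]
    return max(passing) if passing else None

pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.2, 0.5, 0.8, 0.9]
print(fdr_cutoff(pvals))  # 0.06
```

Note that 0.039 and 0.041 fail their own thresholds but are still rejected, because a later p-value (0.06 at rank 6) passes; that "step-up" behavior is what distinguishes FDR control from a fixed per-test cut-off.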

  12. Algorithmic universality in F-theory compactifications

    NASA Astrophysics Data System (ADS)

    Halverson, James; Long, Cody; Sung, Benjamin

    2017-12-01

    We study universality of geometric gauge sectors in the string landscape in the context of F-theory compactifications. A finite time construction algorithm is presented for 4/3 × 2.96 × 10^755 F-theory geometries that are connected by a network of topological transitions in a connected moduli space. High probability geometric assumptions uncover universal structures in the ensemble without explicitly constructing it. For example, non-Higgsable clusters of seven-branes with intricate gauge sectors occur with a probability above 1 − 1.01 × 10^−755, and the geometric gauge group rank is above 160 with probability 0.999995. In the latter case there are at least 10 E8 factors, the structure of which fixes the gauge groups on certain nearby seven-branes. Visible sectors may arise from E6 or SU(3) seven-branes, which occur in certain random samples with probability ≃ 1/200.

  13. Microbial Performance of Food Safety Control and Assurance Activities in a Fresh Produce Processing Sector Measured Using a Microbial Assessment Scheme and Statistical Modeling.

    PubMed

    Njage, Patrick Murigu Kamau; Sawe, Chemutai Tonui; Onyango, Cecilia Moraa; Habib, I; Njagi, Edmund Njeru; Aerts, Marc; Molenberghs, Geert

    2017-01-01

    Current approaches such as inspections, audits, and end product testing cannot detect the distribution and dynamics of microbial contamination. Despite the implementation of current food safety management systems, foodborne outbreaks linked to fresh produce continue to be reported. A microbial assessment scheme and statistical modeling were used to systematically assess the microbial performance of core control and assurance activities in five Kenyan fresh produce processing and export companies. Generalized linear mixed models and correlated random-effects joint models for multivariate clustered data followed by empirical Bayes estimates enabled the analysis of the probability of contamination across critical sampling locations (CSLs) and factories as a random effect. Salmonella spp. and Listeria monocytogenes were not detected in the final products. However, none of the processors attained the maximum safety level for environmental samples. Escherichia coli was detected in five of the six CSLs, including the final product. Among the processing-environment samples, the hand or glove swabs of personnel revealed a higher level of predicted contamination with E. coli, and 80% of the factories were E. coli positive at this CSL. End products showed higher predicted probabilities of having the lowest level of food safety compared with raw materials. The final products were E. coli positive despite the raw materials being E. coli negative for 60% of the processors. There was a higher probability of contamination with coliforms in water at the inlet than in the final rinse water. Four (80%) of the five assessed processors had poor to unacceptable counts of Enterobacteriaceae on processing surfaces. Personnel-, equipment-, and product-related hygiene measures to improve the performance of preventive and intervention measures are recommended.

  14. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ebeida, Mohamed S.; Mitchell, Scott A.; Swiler, Laura P.

    We introduce a novel technique, POF-Darts, to estimate the Probability Of Failure based on random disk-packing in the uncertain parameter space. POF-Darts uses hyperplane sampling to explore the unexplored part of the uncertain space. We use the function evaluation at a sample point to determine whether it belongs to failure or non-failure regions, and surround it with a protection sphere region to avoid clustering. We decompose the domain into Voronoi cells around the function evaluations as seeds and choose the radius of the protection sphere depending on the local Lipschitz continuity. As sampling proceeds, regions uncovered with spheres will shrink, improving the estimation accuracy. After exhausting the function evaluation budget, we build a surrogate model using the function evaluations associated with the sample points and estimate the probability of failure by exhaustive sampling of that surrogate. In comparison to other similar methods, our algorithm has the advantages of decoupling the sampling step from the surrogate construction one, the ability to reach target POF values with fewer samples, and the capability of estimating the number and locations of disconnected failure regions, not just the POF value. Furthermore, we present various examples to demonstrate the efficiency of our novel approach.
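    For contrast, the baseline that POF-Darts aims to improve on, plain Monte Carlo estimation of a probability of failure, looks like this (the limit-state function here is a toy one-dimensional example, not from the report):

```python
import random

random.seed(7)

def pof_monte_carlo(limit_state, sampler, n=100_000):
    """Plain Monte Carlo probability-of-failure estimate: the fraction
    of sampled points where the limit-state function goes negative."""
    return sum(limit_state(sampler()) < 0 for _ in range(n)) / n

# toy limit state: fails when a standard-normal load exceeds capacity 2
pof = pof_monte_carlo(lambda x: 2.0 - x, lambda: random.gauss(0.0, 1.0))
print(pof)  # near the true P(X > 2), about 0.0228
```

The weakness that motivates methods like POF-Darts is visible in the call count: every one of the 100,000 samples is a full model evaluation, which is unaffordable when each evaluation is an expensive simulation.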

  15. Testing for variation in taxonomic extinction probabilities: a suggested methodology and some results

    USGS Publications Warehouse

    Conroy, M.J.; Nichols, J.D.

    1984-01-01

    Several important questions in evolutionary biology and paleobiology involve sources of variation in extinction rates. In all cases of which we are aware, extinction rates have been estimated from data in which the probability that an observation (e.g., a fossil taxon) will occur is related both to extinction rates and to what we term encounter probabilities. Any statistical method for analyzing fossil data should at a minimum permit separate inferences on these two components. We develop a method for estimating taxonomic extinction rates from stratigraphic range data and for testing hypotheses about variability in these rates. We use this method to estimate extinction rates and to test the hypothesis of constant extinction rates for several sets of stratigraphic range data. The results of our tests support the hypothesis that extinction rates varied over the geologic time periods examined. We also present a test that can be used to identify periods of high or low extinction probabilities and provide an example using Phanerozoic invertebrate data. Extinction rates should be analyzed using stochastic models, in which it is recognized that stratigraphic samples are random variates and that sampling is imperfect.

  16. Improvements in sub-grid, microphysics averages using quadrature based approaches

    NASA Astrophysics Data System (ADS)

    Chowdhary, K.; Debusschere, B.; Larson, V. E.

    2013-12-01

    Sub-grid variability in microphysical processes plays a critical role in atmospheric climate models. In order to account for this sub-grid variability, Larson and Schanen (2013) propose placing a probability density function on the sub-grid cloud microphysics quantities, e.g. autoconversion rate, essentially interpreting the cloud microphysics quantities as a random variable in each grid box. Random sampling techniques, e.g. Monte Carlo and Latin Hypercube, can be used to calculate statistics, e.g. averages, on the microphysics quantities, which then feed back into the model dynamics on the coarse scale. We propose an alternate approach using numerical quadrature methods based on deterministic sampling points to compute the statistical moments of microphysics quantities in each grid box. We have performed a preliminary test on the Kessler autoconversion formula, and, upon comparison with Latin Hypercube sampling, our approach shows an increased level of accuracy with a reduction in sample size by almost two orders of magnitude. Application to other microphysics processes is the subject of ongoing research.
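    The quadrature idea can be sketched on a Kessler-like thresholded rate max(q − q0, 0) averaged over a normal sub-grid distribution, where Gauss-Hermite quadrature with a handful of deterministic nodes reproduces the closed-form mean (the parameter values are illustrative, and this is a stand-in for the paper's scheme, not a reimplementation):

```python
import numpy as np
from math import erf, exp, pi, sqrt

def kessler_mean_quadrature(mu, sigma, q0, order=40):
    """Gauss-Hermite estimate of E[max(q - q0, 0)] for q ~ N(mu, sigma**2):
    a thresholded autoconversion-like rate averaged over sub-grid
    variability using deterministic nodes instead of random samples."""
    x, w = np.polynomial.hermite.hermgauss(order)
    q = mu + sqrt(2.0) * sigma * x            # change of variables
    return np.sum(w * np.maximum(q - q0, 0.0)) / sqrt(pi)

mu, sigma, q0 = 1.0, 0.5, 1.2
quad = kessler_mean_quadrature(mu, sigma, q0)

# closed form: (mu - q0) * Phi(z) + sigma * phi(z), with z = (mu - q0) / sigma
z = (mu - q0) / sigma
exact = (mu - q0) * 0.5 * (1 + erf(z / sqrt(2))) \
        + sigma * exp(-z * z / 2) / sqrt(2 * pi)
print(quad, exact)  # the two values agree closely
```

A Latin Hypercube estimate of the same mean would need far more (random) evaluations to reach comparable accuracy, which is the sample-size reduction the abstract reports.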

  17. Modeling Training Site Vegetation Coverage Probability with a Random Optimizing Procedure: An Artificial Neural Network Approach.

    DTIC Science & Technology

    1998-05-01

    Biing T. Guan, George Z. Gertner, and Alan B... The study models training site vegetation coverage probability based on past coverage. Approach: A literature survey was conducted to identify artificial neural network analysis techniques applicable for...

  18. Designing a monitoring program to estimate estuarine survival of anadromous salmon smolts: simulating the effect of sample design on inference

    USGS Publications Warehouse

    Romer, Jeremy D.; Gitelman, Alix I.; Clements, Shaun; Schreck, Carl B.

    2015-01-01

    A number of researchers have attempted to estimate salmonid smolt survival during outmigration through an estuary. However, it is currently unclear how the design of such studies influences the accuracy and precision of survival estimates. In this simulation study we consider four patterns of smolt survival probability in the estuary, and test the performance of several different sampling strategies for estimating estuarine survival assuming perfect detection. The four survival probability patterns each incorporate a systematic component (constant, linearly increasing, increasing and then decreasing, and two pulses) and a random component to reflect daily fluctuations in survival probability. Generally, spreading sampling effort (tagging) across the season resulted in more accurate estimates of survival. All sampling designs in this simulation tended to under-estimate the variation in the survival estimates because seasonal and daily variation in survival probability are not incorporated in the estimation procedure. This under-estimation results in poorer performance of estimates from larger samples. Thus, tagging more fish may not result in better estimates of survival if important components of variation are not accounted for. The results of our simulation incorporate survival probabilities and run distribution data from previous studies to help illustrate the tradeoffs among sampling strategies in terms of the number of tags needed and distribution of tagging effort. This information will assist researchers in developing improved monitoring programs and encourage discussion regarding issues that should be addressed prior to implementation of any telemetry-based monitoring plan. We believe implementation of an effective estuary survival monitoring program will strengthen the robustness of life cycle models used in recovery plans by providing missing data on where and how much mortality occurs in the riverine and estuarine portions of smolt migration. These data could result in better informed management decisions and assist in guidance for more effective estuarine restoration projects.
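    The design tradeoff, spreading tagging effort across the season versus concentrating it, can be illustrated with a toy simulation assuming perfect detection (the seasonal survival curve and tag counts below are invented, not the study's patterns):

```python
import numpy as np

rng = np.random.default_rng(3)

def estimate_survival(daily_s, release_days, tags_per_day, reps=2000):
    """Mean and SD of pooled survival estimates when tags_per_day smolts
    are released on each of release_days, each batch surviving with that
    day's probability (perfect detection assumed)."""
    total = tags_per_day * len(release_days)
    ests = [sum(rng.binomial(tags_per_day, daily_s[d])
                for d in release_days) / total
            for _ in range(reps)]
    return np.mean(ests), np.std(ests)

season = 0.5 + 0.3 * np.sin(np.linspace(0.0, np.pi, 90))  # survival pulse
spread = estimate_survival(season, range(0, 90, 9), 10)   # across season
lumped = estimate_survival(season, range(45, 55), 10)     # mid-season only
print(spread, lumped)  # the lumped design tracks peak-season survival
```

With the same total of 100 tags, the concentrated release estimates only the (higher) mid-season survival, while the spread design centers on the season-wide mean, mirroring the simulation study's conclusion about distributing tagging effort.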

  19. A critical analysis of high-redshift, massive, galaxy clusters. Part I

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hoyle, Ben; Jimenez, Raul; Verde, Licia

    2012-02-01

    We critically investigate current statistical tests applied to high redshift clusters of galaxies in order to test the standard cosmological model and describe their range of validity. We carefully compare a sample of high-redshift, massive, galaxy clusters with realistic Poisson sample simulations of the theoretical mass function, which include the effect of Eddington bias. We compare the observations and simulations using the following statistical tests: the distributions of ensemble and individual existence probabilities (in the > M, > z sense), the redshift distributions, and the 2d Kolmogorov-Smirnov test. Using seemingly rare clusters from Hoyle et al. (2011) and Jee et al. (2011), and assuming the same survey geometry as in Jee et al. (2011, which is less conservative than Hoyle et al. 2011), we find that the (> M, > z) existence probabilities of all clusters are fully consistent with ΛCDM. However assuming the same survey geometry, we use the 2d K-S test probability to show that the observed clusters are not consistent with being the least probable clusters from simulations at > 95% confidence, and are also not consistent with being a random selection of clusters, which may be caused by the non-trivial selection function and survey geometry. Tension can be removed if we examine only an X-ray selected subsample, with simulations performed assuming a modified survey geometry.

  20. Application of a multipurpose unequal probability stream survey in the Mid-Atlantic Coastal Plain

    USGS Publications Warehouse

    Ator, S.W.; Olsen, A.R.; Pitchford, A.M.; Denver, J.M.

    2003-01-01

    A stratified, spatially balanced sample with unequal probability selection was used to design a multipurpose survey of headwater streams in the Mid-Atlantic Coastal Plain. Objectives for the survey include unbiased estimates of regional stream conditions, and adequate coverage of unusual but significant environmental settings to support empirical modeling of the factors affecting those conditions. The design and field application of the survey are discussed in light of these multiple objectives. A probability (random) sample of 175 first-order nontidal streams was selected for synoptic sampling of water chemistry and benthic and riparian ecology during late winter and spring 2000. Twenty-five streams were selected within each of seven hydrogeologic subregions (strata) that were delineated on the basis of physiography and surficial geology. In each subregion, unequal inclusion probabilities were used to provide an approximately even distribution of streams along a gradient of forested to developed (agricultural or urban) land in the contributing watershed. Alternate streams were also selected. Alternates were included in groups of five in each subregion when field reconnaissance demonstrated that primary streams were inaccessible or otherwise unusable. Despite the rejection and replacement of a considerable number of primary streams during reconnaissance (up to 40 percent in one subregion), the desired land use distribution was maintained within each hydrogeologic subregion without sacrificing the probabilistic design.
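    The key design device, unequal inclusion probabilities that even out the land-use gradient, can be sketched as follows (the frame of 500 streams and the beta-distributed land-use fractions are hypothetical, and this is a simplification of the actual stratified, spatially balanced design):

```python
import numpy as np

rng = np.random.default_rng(11)

# hypothetical frame: 500 candidate streams, most heavily forested
developed = rng.beta(1.0, 2.0, size=500)   # fraction of developed land

# unequal inclusion probabilities: weight each stream inversely to how
# crowded its land-use bin is, so total selection probability is spread
# evenly along the forested-to-developed gradient
bins = np.minimum((developed * 5).astype(int), 4)
counts = np.bincount(bins, minlength=5)
probs = 1.0 / counts[bins]
probs /= probs.sum()

per_bin = np.bincount(bins, weights=probs, minlength=5)
print(per_bin)  # each land-use bin carries total selection probability 0.2

sample = rng.choice(500, size=25, replace=False, p=probs)
```

Under equal-probability selection most of the 25 draws would come from the crowded forested bins; the inverse-frequency weights give rarer, heavily developed settings enough coverage to support the empirical modeling objective.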

  1. Probabilistic modelling of overflow, surcharge and flooding in urban drainage using the first-order reliability method and parameterization of local rain series.

    PubMed

    Thorndahl, S; Willems, P

    2008-01-01

    Failure of urban drainage systems may occur due to surcharge or flooding at specific manholes in the system, or due to overflows from combined sewer systems to receiving waters. To quantify the probability or return period of failure, standard approaches make use of the simulation of design storms or long historical rainfall series in a hydrodynamic model of the urban drainage system. In this paper, an alternative probabilistic method is investigated: the first-order reliability method (FORM). To apply this method, a long rainfall time series was divided in rainstorms (rain events), and each rainstorm conceptualized to a synthetic rainfall hyetograph by a Gaussian shape with the parameters rainstorm depth, duration and peak intensity. Probability distributions were calibrated for these three parameters and used on the basis of the failure probability estimation, together with a hydrodynamic simulation model to determine the failure conditions for each set of parameters. The method takes into account the uncertainties involved in the rainstorm parameterization. Comparison is made between the failure probability results of the FORM method, the standard method using long-term simulations and alternative methods based on random sampling (Monte Carlo direct sampling and importance sampling). It is concluded that without crucial influence on the modelling accuracy, the FORM is very applicable as an alternative to traditional long-term simulations of urban drainage systems.
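    For a linear limit state in standard normal space, FORM reduces to evaluating Φ(−β) at the reliability index β, which a brute-force Monte Carlo run confirms (the limit state below is a toy example, not an urban-drainage model):

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(5)

# linear limit state in standard normal space: failure when g(u) < 0,
# with the design point a distance beta = 3 from the origin
beta = 3.0
g = lambda u: beta - (u[:, 0] + u[:, 1]) / sqrt(2.0)

# FORM result: for a linear g the failure probability is exactly Phi(-beta)
pof_form = 0.5 * (1.0 - erf(beta / sqrt(2.0)))   # Phi(-3), about 1.35e-3

# brute-force Monte Carlo check (the cost FORM avoids for expensive models)
u = rng.standard_normal((2_000_000, 2))
pof_mc = float(np.mean(g(u) < 0.0))
print(pof_form, pof_mc)
```

The contrast in effort is the paper's point: FORM needs only the design-point search (here trivial), while direct Monte Carlo needs millions of limit-state evaluations, each of which would be a full hydrodynamic simulation in the drainage application.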

  2. Nurse Family Partnership: Comparing Costs per Family in Randomized Trials Versus Scale-Up.

    PubMed

    Miller, Ted R; Hendrie, Delia

    2015-12-01

    The literature that addresses cost differences between randomized trials and full-scale replications is quite sparse. This paper examines how costs differed among three randomized trials and six statewide scale-ups of nurse family partnership (NFP) intensive home visitation to low income first-time mothers. A literature review provided data on pertinent trials. At our request, six well-established programs reported their total expenditures. We adjusted the costs to national prices based on mean hourly wages for registered nurses and then inflated them to 2010 dollars. A centralized data system provided utilization. Replications had fewer home visits per family than trials (25 vs. 31, p = .05), lower costs per client ($8860 vs. $12,398, p = .01), and lower costs per visit ($354 vs. $400, p = .30). Sample size limited the significance of these differences. In this type of labor intensive program, costs probably were lower in scale-up than in randomized trials. Key cost drivers were attrition and the stable caseload size possible in an ongoing program. Our estimates reveal a wide variation in cost per visit across six state programs, which suggests that those planning replications should not expect a simple rule to guide cost estimations for scale-ups. Nevertheless, NFP replications probably achieved some economies of scale.

  3. TemperSAT: A new efficient fair-sampling random k-SAT solver

    NASA Astrophysics Data System (ADS)

    Fang, Chao; Zhu, Zheng; Katzgraber, Helmut G.

    The set membership problem is of great importance to many applications and, in particular, database searches for target groups. Recently, an approach to speed up set membership searches based on the NP-hard constraint-satisfaction problem (random k-SAT) has been developed. However, the bottleneck of the approach lies in finding the solution to a large SAT formula efficiently and, in particular, a large number of independent solutions is needed to reduce the probability of false positives. Unfortunately, traditional random k-SAT solvers such as WalkSAT are biased when seeking solutions to the Boolean formulas. By porting parallel tempering Monte Carlo to the sampling of binary optimization problems, we introduce a new algorithm (TemperSAT) whose performance is comparable to current state-of-the-art SAT solvers for large k with the added benefit that theoretically it can find many independent solutions quickly. We illustrate our results by comparing to the currently fastest implementation of WalkSAT, WalkSATlm.

  4. Statistical Characterization of the Mechanical Parameters of Intact Rock Under Triaxial Compression: An Experimental Proof of the Jinping Marble

    NASA Astrophysics Data System (ADS)

    Jiang, Quan; Zhong, Shan; Cui, Jie; Feng, Xia-Ting; Song, Leibo

    2016-12-01

    We investigated the statistical characteristics and probability distribution of the mechanical parameters of natural rock using triaxial compression tests. Twenty cores of Jinping marble were tested at each of five levels of confining stress (5, 10, 20, 30, and 40 MPa). From these full stress-strain data, we summarized the numerical characteristics and determined the probability distribution form of several important mechanical parameters, including deformational parameters, characteristic strength, characteristic strains, and failure angle. The statistical analysis of the mechanical parameters of rock presented new information about the marble's probabilistic distribution characteristics. The normal and log-normal distributions were appropriate for describing random strengths of rock; the coefficients of variation of the peak strengths had no relationship to the confining stress; the only acceptable random distribution for both Young's elastic modulus and Poisson's ratio was the log-normal function; and the cohesive strength had a different probability distribution pattern than the frictional angle. The triaxial tests and statistical analysis also provided experimental evidence for deciding the minimum reliable number of experimental samples and for picking appropriate parameter distributions to use in reliability calculations for rock engineering.

  5. Comparing population size estimators for plethodontid salamanders

    USGS Publications Warehouse

    Bailey, L.L.; Simons, T.R.; Pollock, K.H.

    2004-01-01

    Despite concern over amphibian declines, few studies estimate absolute abundances because of logistic and economic constraints and previously poor estimator performance. Two estimation approaches recommended for amphibian studies are mark-recapture and depletion (or removal) sampling. We compared abundance estimation via various mark-recapture and depletion methods, using data from a three-year study of terrestrial salamanders in Great Smoky Mountains National Park. Our results indicate that short-term closed-population, robust design, and depletion methods estimate the surface population of salamanders (i.e., those near the surface and available for capture during a given sampling occasion). In longer-duration studies, temporary emigration violates assumptions of both open- and closed-population mark-recapture estimation models. However, if the temporary emigration is completely random, these models should yield unbiased estimates of the total population (superpopulation) of salamanders in the sampled area. We recommend using Pollock's robust design in mark-recapture studies because of its flexibility to incorporate variation in capture probabilities and to estimate temporary emigration probabilities.

  6. Probability Theory Plus Noise: Descriptive Estimation and Inferential Judgment.

    PubMed

    Costello, Fintan; Watts, Paul

    2018-01-01

    We describe a computational model of two central aspects of people's probabilistic reasoning: descriptive probability estimation and inferential probability judgment. This model assumes that people's reasoning follows standard frequentist probability theory, but it is subject to random noise. This random noise has a regressive effect in descriptive probability estimation, moving probability estimates away from normative probabilities and toward the center of the probability scale. This random noise has an anti-regressive effect in inferential judgment, however. These regressive and anti-regressive effects explain various reliable and systematic biases seen in people's descriptive probability estimation and inferential probability judgment. This model predicts that these contrary effects will tend to cancel out in tasks that involve both descriptive estimation and inferential judgment, leading to unbiased responses in those tasks. We test this model by applying it to one such task, described by Gallistel et al. Participants' median responses in this task were unbiased, agreeing with normative probability theory over the full range of responses. Our model captures the pattern of unbiased responses in this task, while simultaneously explaining systematic biases away from normatively correct probabilities seen in other tasks. Copyright © 2018 Cognitive Science Society, Inc.
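
    The regressive effect in descriptive estimation has a simple closed form under a noise model of this kind: if each remembered event is read with flip probability d, the expected estimate is (1 - 2d)p + d, which pulls estimates toward 0.5. The sketch below simulates that mechanism; the parameter values are invented for illustration and are not taken from the paper.

```python
import random

random.seed(1)

def noisy_estimate(p, d, n_events=1000):
    """Estimate probability p from n_events remembered outcomes, each
    read incorrectly (flipped) with probability d."""
    flags = [(random.random() < p) != (random.random() < d)   # XOR flip
             for _ in range(n_events)]
    return sum(flags) / n_events

# Regression toward 0.5: low p is overestimated, high p underestimated.
trials = 200
low = sum(noisy_estimate(0.1, 0.25) for _ in range(trials)) / trials
high = sum(noisy_estimate(0.9, 0.25) for _ in range(trials)) / trials
print(low, high)   # expected values (1 - 2d)p + d: 0.30 and 0.70
```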

  7. Impulsive synchronization of Markovian jumping randomly coupled neural networks with partly unknown transition probabilities via multiple integral approach.

    PubMed

    Chandrasekar, A; Rakkiyappan, R; Cao, Jinde

    2015-10-01

    This paper studies the impulsive synchronization of Markovian jumping randomly coupled neural networks with partly unknown transition probabilities via a multiple integral approach. The array of neural networks is coupled in a random fashion governed by a Bernoulli random variable. The aim of this paper is to obtain synchronization criteria that are suitable for both exactly known and partly unknown transition probabilities, such that the coupled neural network is synchronized with mixed time-delay. The considered impulsive effects permit synchronization even when the transition probabilities are only partly known. In addition, a multiple integral approach is proposed to strengthen the results for Markovian jumping randomly coupled neural networks with partly unknown transition probabilities. By making use of the Kronecker product and some useful integral inequalities, a novel Lyapunov-Krasovskii functional is designed for handling the coupled neural network with mixed delay, and the impulsive synchronization criteria are then expressed as a set of linear matrix inequalities. Finally, numerical examples are presented to illustrate the effectiveness and advantages of the theoretical results. Copyright © 2015 Elsevier Ltd. All rights reserved.

  8. Estimating the Burden of Leptospirosis among Febrile Subjects Aged below 20 Years in Kampong Cham Communities, Cambodia, 2007-2009

    PubMed Central

    Hem, Sopheak; Ly, Sowath; Votsi, Irene; Vogt, Florian; Asgari, Nima; Buchy, Philippe; Heng, Seiha; Picardeau, Mathieu; Sok, Touch; Ly, Sovann; Huy, Rekol; Guillard, Bertrand; Cauchemez, Simon; Tarantola, Arnaud

    2016-01-01

    Background Leptospirosis is an emerging but neglected public health challenge in the Asia/Pacific Region, with an annual incidence estimated at 10–100 per 100,000 population. No accurate data, however, are available for at-risk rural Cambodian communities. Method We conducted anonymous, unlinked testing for IgM antibodies to Leptospira spp. on paired sera of Cambodian patients <20 years of age collected between 2007 and 2009 through active, community-based surveillance for febrile illnesses in a convenience sample of 27 rural and semi-rural villages in four districts of Kampong Cham province, Cambodia. Leptospirosis testing was done on randomly selected paired serological samples negative for Dengue, Japanese encephalitis and Chikungunya viruses. Convalescent samples found positive while initial samples were negative were considered as proof of acute infection. We then applied a mathematical model to estimate the risk of fever caused by leptospirosis, dengue or other causes in rural Cambodia. Results A total of 630 samples out of a randomly selected subset of 2358 samples were IgM positive on the convalescent serum sample, among which 100 (15.8%) samples were IgM negative on an earlier sample. Seventeen of these 100 seroconversions were confirmed using a Microagglutination Test. We estimated the probability of having a fever due to leptospirosis at 1.03% (95% credible interval, CI: 0.95%–1.22%) per semester. In comparison, this probability was 2.61% (95% CI: 2.55%, 2.83%) for dengue and 17.65% (95% CI: 17.49%, 18.08%) for other causes. Conclusion Our data from febrile cases aged below 20 years suggest that the burden of leptospirosis is high in rural Cambodian communities. This is especially true during the rainy season, even in the absence of identified epidemics. PMID:27043016

  9. Methodological considerations in using complex survey data: an applied example with the Head Start Family and Child Experiences Survey.

    PubMed

    Hahs-Vaughn, Debbie L; McWayne, Christine M; Bulotsky-Shearer, Rebecca J; Wen, Xiaoli; Faria, Ann-Marie

    2011-06-01

    Complex survey data are collected by means other than simple random samples. This creates two analytical issues: nonindependence and unequal selection probability. Failing to address these issues results in underestimated standard errors and biased parameter estimates. Using data from the nationally representative Head Start Family and Child Experiences Survey (FACES; 1997 and 2000 cohorts), three diverse multilevel models are presented that illustrate differences in results depending on addressing or ignoring the complex sampling issues. Limitations of using complex survey data are reported, along with recommendations for reporting complex sample results. © The Author(s) 2011
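
    As a minimal sketch of the unequal-selection-probability issue described here (not the FACES data or the multilevel models in the paper), the following compares an unweighted mean with an inverse-probability-weighted mean under a two-stratum design with assumed selection probabilities:

```python
import random

random.seed(7)

# Population: two strata with different outcomes; stratum B is oversampled,
# so an unweighted sample mean is biased toward B.
pop_a = [random.gauss(10, 2) for _ in range(9000)]   # selection prob 0.01
pop_b = [random.gauss(20, 2) for _ in range(1000)]   # selection prob 0.10

sample = ([(y, 0.01) for y in pop_a if random.random() < 0.01]
          + [(y, 0.10) for y in pop_b if random.random() < 0.10])

unweighted = sum(y for y, _ in sample) / len(sample)

# Inverse-probability (Horvitz-Thompson style) weighting: each sampled
# unit represents 1/pi population members.
weighted = (sum(y / pi for y, pi in sample)
            / sum(1 / pi for _, pi in sample))

true_mean = (sum(pop_a) + sum(pop_b)) / 10000
print(unweighted, weighted, true_mean)   # unweighted overshoots the truth
```

    The second issue the abstract names, nonindependence from clustering, additionally requires design-based standard errors or multilevel modeling, which this sketch does not address.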

  10. People's Intuitions about Randomness and Probability: An Empirical Study

    ERIC Educational Resources Information Center

    Lecoutre, Marie-Paule; Rovira, Katia; Lecoutre, Bruno; Poitevineau, Jacques

    2006-01-01

    What people mean by randomness should be taken into account when teaching statistical inference. This experiment explored subjective beliefs about randomness and probability through two successive tasks. Subjects were asked to categorize 16 familiar items: 8 real items from everyday life experiences, and 8 stochastic items involving a repeatable…

  11. Are quantitative trait-dependent sampling designs cost-effective for analysis of rare and common variants?

    PubMed

    Yilmaz, Yildiz E; Bull, Shelley B

    2011-11-29

    Use of trait-dependent sampling designs in whole-genome association studies of sequence data can reduce total sequencing costs with modest losses of statistical efficiency. In a quantitative trait (QT) analysis of data from the Genetic Analysis Workshop 17 mini-exome for unrelated individuals in the Asian subpopulation, we investigate alternative designs that sequence only 50% of the entire cohort. In addition to a simple random sampling design, we consider extreme-phenotype designs that are of increasing interest in genetic association analysis of QTs, especially in studies concerned with the detection of rare genetic variants. We also evaluate a novel sampling design in which all individuals have a nonzero probability of being selected into the sample but in which individuals with extreme phenotypes have a proportionately larger probability. We take differential sampling of individuals with informative trait values into account by inverse probability weighting using standard survey methods which thus generalizes to the source population. In replicate 1 data, we applied the designs in association analysis of Q1 with both rare and common variants in the FLT1 gene, based on knowledge of the generating model. Using all 200 replicate data sets, we similarly analyzed Q1 and Q4 (which is known to be free of association with FLT1) to evaluate relative efficiency, type I error, and power. Simulation study results suggest that the QT-dependent selection designs generally yield greater than 50% relative efficiency compared to using the entire cohort, implying cost-effectiveness of 50% sample selection and worthwhile reduction of sequencing costs.

  12. Random-Forest Classification of High-Resolution Remote Sensing Images and Ndsm Over Urban Areas

    NASA Astrophysics Data System (ADS)

    Sun, X. F.; Lin, X. G.

    2017-09-01

    As an intermediate step between raw remote sensing data and digital urban maps, remote sensing data classification has been a challenging and long-standing research problem in the remote sensing community. In this work, an effective classification method is proposed for classifying high-resolution remote sensing data over urban areas. Starting from high-resolution multi-spectral images and 3D geometry data, our method proceeds in three main stages: feature extraction, classification, and refinement of the classified result. First, we extract color, vegetation index and texture features from the multi-spectral image and compute the height, elevation texture and differential morphological profile (DMP) features from the 3D geometry data. Then, in the classification stage, multiple random forest (RF) classifiers are trained separately and combined to form an RF ensemble that estimates each sample's category probabilities. Finally, the probabilities, along with the feature importance indicators output by the RF ensemble, are used to construct a fully connected conditional random field (FCCRF) graph model, by which the classification results are refined through mean-field-based statistical inference. Experiments on the ISPRS Semantic Labeling Contest dataset show that our proposed 3-stage method achieves 86.9% overall accuracy on the test data.

  13. Inference for binomial probability based on dependent Bernoulli random variables with applications to meta-analysis and group level studies.

    PubMed

    Bakbergenuly, Ilyas; Kulinskaya, Elena; Morgenthaler, Stephan

    2016-07-01

    We study bias arising as a result of nonlinear transformations of random variables in random or mixed effects models and its effect on inference in group-level studies or in meta-analysis. The findings are illustrated on the example of overdispersed binomial distributions, where we demonstrate considerable biases arising from standard log-odds and arcsine transformations of the estimated probability p̂, both for single-group studies and in combining results from several groups or studies in meta-analysis. Our simulations confirm that these biases are linear in ρ, for small values of ρ, the intracluster correlation coefficient. These biases do not depend on the sample sizes or the number of studies K in a meta-analysis and result in abysmal coverage of the combined effect for large K. We also propose bias-correction for the arcsine transformation. Our simulations demonstrate that this bias-correction works well for small values of the intraclass correlation. The methods are applied to two examples of meta-analyses of prevalence. © 2016 The Authors. Biometrical Journal Published by Wiley-VCH Verlag GmbH & Co. KGaA.
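
    The bias mechanism described in this abstract can be reproduced with a small simulation: an overdispersed (beta-binomial) proportion is passed through the arcsine transformation, and the transformed-scale bias grows with the intracluster correlation ρ. The distributions and constants below are illustrative stand-ins, not the paper's settings.

```python
import math
import random

random.seed(3)

def beta_binomial_phat(p, rho, n):
    """One estimated proportion from an overdispersed study: the
    study-level probability is Beta-distributed with mean p and
    intracluster correlation rho."""
    a = p * (1 - rho) / rho
    b = (1 - p) * (1 - rho) / rho
    pi = random.betavariate(a, b)
    return sum(random.random() < pi for _ in range(n)) / n

def arcsine(x):
    return math.asin(math.sqrt(x))

p, n, reps = 0.2, 50, 20_000
biases = {}
for rho in (0.01, 0.1):
    est = [arcsine(beta_binomial_phat(p, rho, n)) for _ in range(reps)]
    biases[rho] = sum(est) / reps - arcsine(p)
print(biases)   # bias is negative here and grows in magnitude with rho
```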

  14. On random field Completely Automated Public Turing Test to Tell Computers and Humans Apart generation.

    PubMed

    Kouritzin, Michael A; Newton, Fraser; Wu, Biao

    2013-04-01

    Herein, we propose generating CAPTCHAs through random field simulation and give a novel, effective and efficient algorithm to do so. Indeed, we demonstrate that sufficient information about word tests for easy human recognition is contained in the site marginal probabilities and the site-to-nearby-site covariances and that these quantities can be embedded directly into certain conditional probabilities, designed for effective simulation. The CAPTCHAs are then partial random realizations of the random CAPTCHA word. We start with an initial random field (e.g., randomly scattered letter pieces) and use Gibbs resampling to re-simulate portions of the field repeatedly using these conditional probabilities until the word becomes human-readable. The residual randomness from the initial random field together with the random implementation of the CAPTCHA word provide significant resistance to attack. This results in a CAPTCHA, which is unrecognizable to modern optical character recognition but is recognized about 95% of the time in a human readability study.

  15. Nonstationary envelope process and first excursion probability.

    NASA Technical Reports Server (NTRS)

    Yang, J.-N.

    1972-01-01

    The definition of the stationary random envelope proposed by Cramér and Leadbetter is extended to the envelope of nonstationary random processes possessing evolutionary power spectral densities. The density function, the joint density function, the moment function, and the level-crossing rate of the nonstationary envelope process are derived. Based on the envelope statistics, approximate solutions to the first excursion probability of nonstationary random processes are obtained. In particular, applications of the first excursion probability to earthquake engineering problems are demonstrated in detail.
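
    A common way to use crossing rates like those derived here is the Poisson (independent-crossings) approximation to the first excursion probability: crossings of the barrier are treated as a nonhomogeneous Poisson process with rate ν(t). The sketch below integrates an assumed, decaying crossing rate; both the rate function and its constants are invented for illustration.

```python
import math

def first_excursion_prob(nu, T, steps=10_000):
    """Poisson approximation: P(excursion in [0, T]) = 1 - exp(-∫ nu dt),
    computed with a simple left-endpoint Riemann sum."""
    dt = T / steps
    integral = sum(nu(i * dt) * dt for i in range(steps))
    return 1.0 - math.exp(-integral)

# Hypothetical decaying crossing rate, e.g. an earthquake-like excitation.
nu = lambda t: 0.5 * math.exp(-0.3 * t)
p = first_excursion_prob(nu, T=10.0)
print(p)   # ~0.79 for these illustrative constants
```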

  16. Quantifying errors without random sampling.

    PubMed

    Phillips, Carl V; LaPole, Luwanna M

    2003-06-12

    All quantifications of mortality, morbidity, and other health measures involve numerous sources of error. The routine quantification of random sampling error makes it easy to forget that other sources of error can and should be quantified. When a quantification does not involve sampling, error is almost never quantified and results are often reported in ways that dramatically overstate their precision. We argue that the precision implicit in typical reporting is problematic and sketch methods for quantifying the various sources of error, building up from simple examples that can be solved analytically to more complex cases. There are straightforward ways to partially quantify the uncertainty surrounding a parameter that is not characterized by random sampling, such as limiting reported significant figures. We present simple methods for doing such quantifications, and for incorporating them into calculations. More complicated methods become necessary when multiple sources of uncertainty must be combined. We demonstrate that Monte Carlo simulation, using available software, can estimate the uncertainty resulting from complicated calculations with many sources of uncertainty. We apply the method to the current estimate of the annual incidence of foodborne illness in the United States. Quantifying uncertainty from systematic errors is practical. Reporting this uncertainty would more honestly represent study results, help show the probability that estimated values fall within some critical range, and facilitate better targeting of further research.
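
    The Monte Carlo simulation the authors describe can be sketched in a few lines: each uncertain input gets a judgment-based distribution, and the calculation is repeated over random draws to yield an uncertainty interval. The triangular distributions and numbers below are placeholders, not the paper's foodborne-illness inputs.

```python
import random
import statistics

random.seed(11)

def incidence_draw():
    # Judgment-based triangular(low, high, mode) distributions for two
    # uncertain inputs (hypothetical values).
    cases_reported = random.triangular(40_000, 60_000, 50_000)
    underreporting = random.triangular(10, 40, 20)   # multiplier
    return cases_reported * underreporting

draws = sorted(incidence_draw() for _ in range(50_000))
median = statistics.median(draws)
lo, hi = draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))]
print(f"median ~ {median:,.0f}, 95% interval ~ ({lo:,.0f}, {hi:,.0f})")
```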

  17. Accounting for selection bias in association studies with complex survey data.

    PubMed

    Wirth, Kathleen E; Tchetgen Tchetgen, Eric J

    2014-05-01

    Obtaining representative information from hidden and hard-to-reach populations is fundamental to describe the epidemiology of many sexually transmitted diseases, including HIV. Unfortunately, simple random sampling is impractical in these settings, as no registry of names exists from which to sample the population at random. However, complex sampling designs can be used, as members of these populations tend to congregate at known locations, which can be enumerated and sampled at random. For example, female sex workers may be found at brothels and street corners, whereas injection drug users often come together at shooting galleries. Despite the logistical appeal, complex sampling schemes lead to unequal probabilities of selection, and failure to account for this differential selection can result in biased estimates of population averages and relative risks. However, standard techniques to account for selection can lead to substantial losses in efficiency. Consequently, researchers implement a variety of strategies in an effort to balance validity and efficiency. Some researchers fully or partially account for the survey design, whereas others do nothing and treat the sample as a realization of the population of interest. We use directed acyclic graphs to show how certain survey sampling designs, combined with subject-matter considerations unique to individual exposure-outcome associations, can induce selection bias. Finally, we present a novel yet simple maximum likelihood approach for analyzing complex survey data; this approach optimizes statistical efficiency at no cost to validity. We use simulated data to illustrate this method and compare it with other analytic techniques.

  18. Actual distribution of Cronobacter spp. in industrial batches of powdered infant formula and consequences for performance of sampling strategies.

    PubMed

    Jongenburger, I; Reij, M W; Boer, E P J; Gorris, L G M; Zwietering, M H

    2011-11-15

    The actual spatial distribution of microorganisms within a batch of food influences the results of sampling for microbiological testing when this distribution is non-homogeneous. In the case of pathogens being non-homogeneously distributed, it markedly influences public health risk. This study investigated the spatial distribution of Cronobacter spp. in powdered infant formula (PIF) at industrial batch scale for a recalled batch as well as a reference batch. Additionally, the local spatial occurrence of clusters of Cronobacter cells was assessed, as well as the performance of typical sampling strategies to determine the presence of the microorganisms. The concentration of Cronobacter spp. was assessed over the course of the filling time of each batch, by taking samples of 333 g using the most probable number (MPN) enrichment technique. The occurrence of clusters of Cronobacter spp. cells was investigated by plate counting. From the recalled batch, 415 MPN samples were drawn. The expected heterogeneous distribution of Cronobacter spp. could be quantified from these samples: 58% of samples showed no detectable level (detection limit of -2.52 log CFU/g), whilst concentrations in the remainder were between -2.52 and 2.75 log CFU/g. The estimated average concentration in the recalled batch was -2.78 log CFU/g, with a standard deviation of 1.10 log CFU/g. The estimated average concentration in the reference batch was -4.41 log CFU/g, with 99% of the 93 samples being below the detection limit. In the recalled batch, clusters of cells occurred sporadically in 8 out of 2290 samples of 1 g taken. The two largest clusters contained 123 (2.09 log CFU/g) and 560 (2.75 log CFU/g) cells. Various sampling strategies were evaluated for the recalled batch. Taking more and smaller samples while keeping the total sampling weight constant considerably improved the performance of the sampling plans in detecting this type of contaminated batch. Compared to random sampling, stratified random sampling improved the probability of detecting the heterogeneous contamination. Copyright © 2011 Elsevier B.V. All rights reserved.
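
    The sampling-plan result quoted above, that more and smaller samples at constant total weight detect a heterogeneous contamination better, follows from treating each sampling location as an independent detection opportunity. Under that simplified model (the fraction f below is hypothetical, not estimated from the batch data, and this is not the study's MPN protocol):

```python
# If a fraction f of independent batch locations carries detectable cells,
# splitting a fixed total mass over n locations gives
# P(detect) = 1 - (1 - f)^n.
f = 0.08   # hypothetical fraction of contaminated locations

detect = {n: 1 - (1 - f) ** n for n in (1, 5, 30)}
for n, p in detect.items():
    print(f"{n:3d} sample(s): P(detect) = {p:.3f}")
```

    Stratified random sampling improves on this further when contamination is concentrated in known portions of the filling time, because it guarantees coverage of every stratum.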

  19. Assessment and Implication of Prognostic Imbalance in Randomized Controlled Trials with a Binary Outcome – A Simulation Study

    PubMed Central

    Chu, Rong; Walter, Stephen D.; Guyatt, Gordon; Devereaux, P. J.; Walsh, Michael; Thorlund, Kristian; Thabane, Lehana

    2012-01-01

    Background Chance imbalance in the baseline prognosis of a randomized controlled trial can lead to over- or underestimation of treatment effects, particularly in trials with small sample sizes. Our study aimed to (1) evaluate the probability of imbalance in a binary prognostic factor (PF) between two treatment arms, (2) investigate the impact of prognostic imbalance on the estimation of a treatment effect, and (3) examine the effect of sample size (n) in relation to the first two objectives. Methods We simulated data from parallel-group trials evaluating a binary outcome by varying the risk of the outcome, the effect of the treatment, the power and prevalence of the PF, and n. Logistic regression models with and without adjustment for the PF were compared in terms of bias, standard error, coverage of the confidence interval, and statistical power. Results For a PF with a prevalence of 0.5, the probability of a difference in the frequency of the PF ≥ 5% reaches 0.42 with 125 patients/arm. Ignoring a strong PF (relative risk, RR = 5) leads to underestimating the strength of a moderate treatment effect, and the underestimate is independent of n when n is >50/arm. Adjusting for such a PF increases statistical power. If the PF is weak (RR = 2), adjustment makes little difference to statistical inference. Conditional on a 5% imbalance of a powerful PF, adjustment reduces the likelihood of large bias. If an absolute measure of imbalance ≥5% is deemed important, including 1000 patients/arm provides sufficient protection against such an imbalance. Two thousand patients/arm may provide adequate control against large random deviations in treatment effect estimation in the presence of a powerful PF. Conclusions The probability of prognostic imbalance in small trials can be substantial. Covariate adjustment improves estimation accuracy and statistical power, and hence should be performed when strong PFs are observed. PMID:22629322
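
    The headline figure in this abstract, a probability of about 0.42 of a ≥5% imbalance in a prevalence-0.5 prognostic factor with 125 patients per arm, can be checked with a direct simulation of simple randomization (a sketch, not the authors' simulation code):

```python
import random

random.seed(5)

def imbalance_prob(n_per_arm, prevalence, threshold, reps):
    """Probability that the two arms' prevalence of a binary prognostic
    factor differs by at least `threshold` under pure randomization."""
    hits = 0
    for _ in range(reps):
        a = sum(random.random() < prevalence for _ in range(n_per_arm))
        b = sum(random.random() < prevalence for _ in range(n_per_arm))
        if abs(a - b) / n_per_arm >= threshold:
            hits += 1
    return hits / reps

p_small = imbalance_prob(125, 0.5, 0.05, reps=10_000)
p_large = imbalance_prob(1000, 0.5, 0.05, reps=4_000)
print(p_small, p_large)   # the abstract reports ~0.42 for 125/arm
```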

  20. Hybrid computer technique yields random signal probability distributions

    NASA Technical Reports Server (NTRS)

    Cameron, W. D.

    1965-01-01

    Hybrid computer determines the probability distributions of instantaneous and peak amplitudes of random signals. This combined digital and analog computer system reduces the errors and delays of manual data analysis.

  1. Improving diagnostic accuracy of prostate carcinoma by systematic random map-biopsy.

    PubMed

    Szabó, J; Hegedûs, G; Bartók, K; Kerényi, T; Végh, A; Romics, I; Szende, B

    2000-01-01

    Systematic random, rectal-ultrasound-directed map biopsy of the prostate was performed in 77 RDE (rectal digital examination)-positive and 25 RDE-negative cases, where applicable. Hypoechoic areas were found in 30% of RDE-positive and in 16% of RDE-negative cases. The detection rate for carcinoma in the hypoechoic areas was 6.5% in RDE-positive and 0% in RDE-negative cases, whereas systematic map biopsy detected 62% carcinomas in RDE-positive and 16% carcinomas in RDE-negative patients. The probability of a positive diagnosis of prostate carcinoma increased in parallel with the number of biopsy samples per case. The importance of systematic map biopsy is emphasized.

  2. The Relationship between Credit Card Use Behavior and Household Well-Being during the Great Recession: Implications for the Ethics of Credit Use

    ERIC Educational Resources Information Center

    Hunter, Jennifer L.; Heath, Claudia J.

    2017-01-01

    This article uses a random digit dial probability sample (N = 328) to examine the relationship between credit card use behaviors and household well-being during a period of severe economic recession: The Great Recession. The ability to measure the role of credit card use during a period of recession provides unique insights to the study of credit…

  3. Cannabis use in children with individualized risk profiles: Predicting the effect of universal prevention intervention.

    PubMed

    Miovský, Michal; Vonkova, Hana; Čablová, Lenka; Gabrhelík, Roman

    2015-11-01

    To study the effect of a universal prevention intervention targeting cannabis use in individual children with different risk profiles. A school-based randomized controlled prevention trial was conducted over a period of 33 months (n=1874 sixth-graders, baseline mean age 11.82). We used a two-level random intercept logistic model for panel data to predict the probabilities of cannabis use for each child. Specifically, we used eight risk/protective factors to characterize each child and then predicted two probabilities of cannabis use for each child if the child had the intervention or not. Using the two probabilities, we calculated the absolute and relative effect of the intervention for each child. According to the two probabilities, we also divided the sample into a low-risk group (the quarter of the children with the lowest probabilities), a moderate-risk group, and a high-risk group (the quarter of the children with the highest probabilities) and showed the average effect of the intervention on these groups. The differences between the intervention group and the control group were statistically significant in each risk group. The average predicted probabilities of cannabis use for a child from the low-risk group were 4.3% if the child had the intervention and 6.53% if no intervention was provided. The corresponding probabilities for a child from the moderate-risk group were 10.91% and 15.34% and for a child from the high-risk group 25.51% and 32.61%. School grades, thoughts of hurting oneself, and breaking the rules were the three most important factors distinguishing high-risk and low-risk children. We predicted the effect of the intervention on individual children, characterized by their risk/protective factors. The predicted absolute effect and relative effect of any intervention for any selected risk/protective profile of a given child may be utilized in both prevention practice and research. Copyright © 2015 Elsevier Ltd. All rights reserved.
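
    The per-child prediction step described here can be sketched with a plain logistic model: predict the probability of cannabis use for the same child with and without the intervention, then take the difference (absolute effect) and that difference divided by the control probability (relative effect). The coefficients and risk scores below are hypothetical, and the sketch omits the random intercept of the paper's two-level model.

```python
import math

# Hypothetical fitted coefficients: intercept, intervention effect, and a
# combined risk-factor score (placeholders, not estimates from the study).
b0, b_trt, b_risk = -3.0, -0.5, 0.8

def p_use(risk_score, treated):
    """Predicted probability of cannabis use for one child."""
    return 1 / (1 + math.exp(-(b0 + b_trt * treated + b_risk * risk_score)))

for label, risk in (("low-risk", 0.5), ("high-risk", 2.5)):
    p_ctrl, p_trt = p_use(risk, 0), p_use(risk, 1)
    abs_eff = p_ctrl - p_trt          # absolute effect of the intervention
    rel_eff = abs_eff / p_ctrl        # relative effect
    print(f"{label}: {p_trt:.3f} vs {p_ctrl:.3f} "
          f"(abs {abs_eff:.3f}, rel {rel_eff:.2%})")
```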

  4. Continuous-Time Classical and Quantum Random Walk on Direct Product of Cayley Graphs

    NASA Astrophysics Data System (ADS)

    Salimi, S.; Jafarizadeh, M. A.

    2009-06-01

    In this paper we define the direct product of graphs and give a recipe for obtaining the probability of observing the particle on the vertices in continuous-time classical and quantum random walks. In the recipe, the probability of observing the particle on the direct product graph is obtained by multiplying the probabilities on the corresponding subgraphs; this method is useful for determining walk probabilities on complicated graphs. Using this method, we calculate the probability of continuous-time classical and quantum random walks on many finite direct-product Cayley graphs (complete cycle, complete Kn, charter and n-cube). We also find that in the classical walk the stationary uniform distribution is reached as t → ∞, whereas for the quantum walk this is not always the case.
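
    The multiplication rule stated in this abstract can be written compactly; assuming the walk generator on the product graph splits additively across the factors (a sketch of the general mechanism, not the paper's notation):

```latex
H = H_1 \otimes I + I \otimes H_2
\;\Longrightarrow\;
e^{-tH} = e^{-tH_1} \otimes e^{-tH_2},
\qquad
P_{G_1 \times G_2}\bigl((v_1, v_2), t\bigr) = P_{G_1}(v_1, t)\, P_{G_2}(v_2, t).
```

    For the quantum walk the same factorization holds for the amplitudes with $e^{-itH}$, and since $|a_1 a_2|^2 = |a_1|^2 |a_2|^2$, the observation probabilities again multiply.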

  5. Statistical considerations in evaluating pharmacogenomics-based clinical effect for confirmatory trials.

    PubMed

    Wang, Sue-Jane; O'Neill, Robert T; Hung, Hm James

    2010-10-01

    The current practice for seeking genomically favorable patients in randomized controlled clinical trials using genomic convenience samples. To discuss the extent of imbalance, confounding, bias, design efficiency loss, type I error, and type II error that can occur in the evaluation of the convenience samples, particularly when they are small samples. To articulate statistical considerations for a reasonable sample size to minimize the chance of imbalance, and, to highlight the importance of replicating the subgroup finding in independent studies. Four case examples reflecting recent regulatory experiences are used to underscore the problems with convenience samples. Probability of imbalance for a pre-specified subgroup is provided to elucidate sample size needed to minimize the chance of imbalance. We use an example drug development to highlight the level of scientific rigor needed, with evidence replicated for a pre-specified subgroup claim. The convenience samples evaluated ranged from 18% to 38% of the intent-to-treat samples with sample size ranging from 100 to 5000 patients per arm. The baseline imbalance can occur with probability higher than 25%. Mild to moderate multiple confounders yielding the same directional bias in favor of the treated group can make treatment group incomparable at baseline and result in a false positive conclusion that there is a treatment difference. Conversely, if the same directional bias favors the placebo group or there is loss in design efficiency, the type II error can increase substantially. Pre-specification of a genomic subgroup hypothesis is useful only for some degree of type I error control. Complete ascertainment of genomic samples in a randomized controlled trial should be the first step to explore if a favorable genomic patient subgroup suggests a treatment effect when there is no clear prior knowledge and understanding about how the mechanism of a drug target affects the clinical outcome of interest. 
When stratified randomization based on genomic biomarker status cannot be implemented in designing a pharmacogenomics confirmatory clinical trial, and there is one genomic biomarker prognostic for clinical response, then, as a general rule of thumb, a sample size of at least 100 patients may need to be considered for the lower-prevalence genomic subgroup to keep small the chance of an imbalance of 20% or more in the prevalence of the genomic marker between arms. The sample size may need to be at least 150, 350, or 1350, respectively, if an imbalance of 15%, 10%, or 5% is of concern.
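    The imbalance probabilities behind these rules of thumb can be checked with a small Monte Carlo calculation. A stdlib Python sketch (the function and parameter names are ours, not the authors'; it assumes simple 1:1 randomization with a binary marker):

```python
import random

def imbalance_probability(n_per_arm, prevalence, delta, trials=20000, seed=1):
    """Monte Carlo estimate of the probability that the observed marker
    prevalence differs between two equally sized arms by at least delta."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Count marker-positive patients in each arm.
        a = sum(rng.random() < prevalence for _ in range(n_per_arm))
        b = sum(rng.random() < prevalence for _ in range(n_per_arm))
        if abs(a - b) >= delta * n_per_arm:
            hits += 1
    return hits / trials
```

For example, `imbalance_probability(20, 0.3, 0.20)` is substantial, while `imbalance_probability(200, 0.3, 0.20)` is close to zero, which is consistent with the qualitative message that larger subgroups protect against chance imbalance.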

  6. Nonrecurrence and Bell-like inequalities

    NASA Astrophysics Data System (ADS)

    Danforth, Douglas G.

    2017-12-01

    The general class, Λ, of Bell hidden variables is composed of two subclasses ΛR and ΛN such that ΛR ∪ ΛN = Λ and ΛR ∩ ΛN = {}. The class ΛN is very large and contains random variables whose domain is the continuum, the reals. There are uncountably many reals. Every instance of a real random variable is unique. The probability of two instances being equal is zero, exactly zero. ΛN induces sample independence. All correlations are context dependent but not in the usual sense. There is no "spooky action at a distance". Random variables belonging to ΛN are independent from one experiment to the next. The existence of the class ΛN makes it impossible to derive any of the standard Bell inequalities used to define quantum entanglement.

  7. Designing a national soil erosion monitoring network for England and Wales

    NASA Astrophysics Data System (ADS)

    Lark, Murray; Rawlins, Barry; Anderson, Karen; Evans, Martin; Farrow, Luke; Glendell, Miriam; James, Mike; Rickson, Jane; Quine, Timothy; Quinton, John; Brazier, Richard

    2014-05-01

    Although soil erosion is recognised as a significant threat to sustainable land use and may be a priority for action in any forthcoming EU Soil Framework Directive, those responsible for setting national policy with respect to erosion are constrained by a lack of robust, representative data at large spatial scales. This reflects the process-orientated nature of much soil erosion research. Recognising this limitation, the UK Department for Environment, Food and Rural Affairs (Defra) established a project to pilot a cost-effective framework for monitoring of soil erosion in England and Wales (E&W). The pilot will compare different soil erosion monitoring methods at a site scale and provide statistical information for the final design of the full national monitoring network that will: (i) provide unbiased estimates of the spatial mean soil erosion rate across E&W (tonnes ha-1 yr-1) for each of three land-use classes (arable and horticultural; grassland; upland and semi-natural habitats); and (ii) quantify the uncertainty of these estimates with confidence intervals. Probability (design-based) sampling provides the most efficient unbiased estimates of spatial means. In this study, a 16-hectare area (a square of 400 x 400 m) positioned at the centre of a 1-km grid cell, selected at random from mapped land use across E&W, provided the sampling support for measurement of erosion rates, with at least 94% of the support area corresponding to the target land use classes. Very small or zero erosion rates likely to be encountered at many sites reduce the sampling efficiency and make it difficult to compare different methods of soil erosion monitoring. Therefore, to increase the proportion of samples with larger erosion rates without biasing our estimates, we increased the inclusion probability density in areas where the erosion rate is likely to be large by using stratified random sampling. First, each sampling domain (land use class in E&W) was divided into strata; e.g. 
two sub-domains within which, respectively, small or no erosion rates, and moderate or larger erosion rates are expected. Each stratum was then sampled independently and at random. The sample density need not be equal in the two strata, but is known and is accounted for in the estimation of the mean and its standard error. To divide the domains into strata we used information on slope angle, previous interpretation of the erosion susceptibility of the soil associations that correspond to the soil map of E&W at 1:250 000 (Soil Survey of England and Wales, 1983), and visual interpretation of evidence of erosion from aerial photography. While each domain could be stratified on the basis of the first two criteria, air photo interpretation across the whole country was not feasible. For this reason we used a two-phase random sampling for stratification (TPRS) design (de Gruijter et al., 2006). First, we formed an initial random sample of 1-km grid cells from the target domain. Second, each cell was then allocated to a stratum on the basis of the three criteria. A subset of the selected cells from each stratum was then selected for field survey at random, with a specified sampling density for each stratum so as to increase the proportion of cells where moderate or larger erosion rates were expected. Once measurements of erosion have been made, an estimate of the spatial mean of the erosion rate over the target domain, its standard error and associated uncertainty can be calculated by an expression which accounts for the estimated proportions of the two strata within the initial random sample. de Gruijter, J.J., Brus, D.J., Bierkens, M.F.P. & Knotters, M. 2006. Sampling for Natural Resource Monitoring. Springer, Berlin. Soil Survey of England and Wales. 1983. National Soil Map NATMAP Vector 1:250,000. National Soil Research Institute, Cranfield University.
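    The standard design-based estimator for a stratified random sample, of the kind described above, can be sketched in a few lines of stdlib Python (function name and the illustrative numbers are ours; the estimator itself is the textbook stratified mean with known stratum weights):

```python
import math

def stratified_estimate(strata):
    """Design-based estimate of the spatial mean and its standard error.
    strata: list of (weight, measurements) pairs, where weight is the
    stratum's known share of the domain and measurements are the observed
    erosion rates (each stratum needs at least two measurements)."""
    mean = sum(w * (sum(y) / len(y)) for w, y in strata)
    var = 0.0
    for w, y in strata:
        n = len(y)
        ybar = sum(y) / n
        s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)  # within-stratum sample variance
        var += w * w * s2 / n                            # stratum contribution to Var(mean)
    return mean, math.sqrt(var)
```

For instance, a low-erosion stratum covering 70% of the domain and a high-erosion stratum covering 30% are combined with their known weights, so oversampling the high-erosion stratum does not bias the domain mean.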

  8. GIS-based support vector machine modeling of earthquake-triggered landslide susceptibility in the Jianjiang River watershed, China

    NASA Astrophysics Data System (ADS)

    Xu, Chong; Dai, Fuchu; Xu, Xiwei; Lee, Yuan Hsi

    2012-04-01

    Support vector machine (SVM) modeling is based on statistical learning theory. It involves a training phase with associated input and target output values. In recent years, the method has become increasingly popular. The main purpose of this study is to evaluate the mapping power of SVM modeling in earthquake-triggered landslide-susceptibility mapping for a section of the Jianjiang River watershed using Geographic Information System (GIS) software. The river was affected by the Wenchuan earthquake of May 12, 2008. Visual interpretation of colored aerial photographs of 1-m resolution and extensive field surveys provided a detailed landslide inventory map containing 3147 landslides related to the 2008 Wenchuan earthquake. Elevation, slope angle, slope aspect, distance from seismogenic faults, distance from drainages, and lithology were used as the controlling parameters. For modeling, three groups of positive and negative training samples were used in concert with four different kernel functions. Positive training samples include the centroids of 500 large landslides, those of all 3147 landslides, and 5000 randomly selected points in landslide polygons. Negative training samples include 500, 3147, and 5000 randomly selected points on slopes that remained stable during the Wenchuan earthquake. The four kernel functions are linear, polynomial, radial basis, and sigmoid. In total, 12 cases of landslide susceptibility were mapped. Comparative analyses of landslide-susceptibility probability and area relation curves show that both the polynomial and radial basis functions suitably classified the input data as either landslide positive or negative, though the radial basis function was more successful. The 12 generated landslide-susceptibility maps were compared with known landslide centroid locations and landslide polygons to verify the success rate and predictive accuracy of each model. The 12 results were further validated using area-under-curve analysis. 
Group 3, with 5000 randomly selected points in the landslide polygons and 5000 randomly selected points along stable slopes, gave the best results, with a success rate of 79.20% and predictive accuracy of 79.13% under the radial basis function. Of all the results, the sigmoid kernel function was the least skillful when used in concert with the centroid data of all 3147 landslides as positive training samples and 3147 randomly selected points in regions of stable slope as negative training samples (success rate = 54.95%; predictive accuracy = 61.85%). This paper also provides suggestions and reference data for selecting appropriate training samples and kernel function types for earthquake-triggered landslide-susceptibility mapping using SVM modeling. Predictive landslide-susceptibility maps could be useful in hazard mitigation by helping planners understand the probability of landslides in different regions.
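    The four kernel functions compared in the study have standard closed forms. A stdlib sketch (parameter defaults here are illustrative, not the values tuned by the authors):

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, degree=3, coef0=1.0):
    return (dot(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=0.5):
    # Radial basis function: similarity decays with squared distance.
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def sigmoid_kernel(x, y, alpha=0.1, coef0=0.0):
    return math.tanh(alpha * dot(x, y) + coef0)
```

In an SVM, each kernel implicitly defines the feature space in which the classifier separates landslide-positive from landslide-negative points; the study's finding is that the radial basis choice suited this dataset best.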

  9. The fast algorithm of spark in compressive sensing

    NASA Astrophysics Data System (ADS)

    Xie, Meihua; Yan, Fengxia

    2017-01-01

    Compressed Sensing (CS) is an advanced theory of signal sampling and reconstruction. In CS theory, the reconstruction condition of a signal is an important theoretical problem, and spark is a good index for studying it. But the computation of spark is NP-hard. In this paper, we study the problem of computing spark. For some special matrices, for example, the Gaussian random matrix and the 0-1 random matrix, we obtain some conclusions. Furthermore, for a Gaussian random matrix with fewer rows than columns, we prove that its spark equals the number of its rows plus one with probability 1. For a general matrix, two methods are given to compute its spark. One is direct searching; the other is dual-tree searching. By simulating 24 Gaussian random matrices and 18 0-1 random matrices, we tested the computation time of these two methods. Numerical results showed that the dual-tree searching method had higher efficiency than direct searching, especially for matrices whose number of rows is close to their number of columns.
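    The spark of a matrix is the smallest number of its columns that are linearly dependent. A minimal stdlib sketch of the direct-searching approach (names are ours; the exhaustive subset loop is exactly why the problem is NP-hard, and the paper's dual-tree method is not shown):

```python
from fractions import Fraction
from itertools import combinations

def rank(vectors):
    """Rank of a list of equal-length vectors, via exact Gaussian elimination."""
    rows = [list(map(Fraction, v)) for v in vectors]
    r = 0
    for c in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c] / rows[r][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def spark(columns):
    """Smallest number of linearly dependent columns, by exhaustive search
    over column subsets of increasing size."""
    n = len(columns)
    for k in range(1, n + 1):
        for subset in combinations(columns, k):
            if rank(subset) < k:
                return k
    return n + 1  # all columns independent; spark is conventionally infinite
```

As a sanity check on the paper's result for wide random matrices: a generic 2 x 3 matrix has every pair of columns independent but any three columns dependent, so its spark is the number of rows plus one.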

  10. Linking of uniform random polygons in confined spaces

    NASA Astrophysics Data System (ADS)

    Arsuaga, J.; Blackstone, T.; Diao, Y.; Karadayi, E.; Saito, M.

    2007-03-01

    In this paper, we study the topological entanglement of uniform random polygons in a confined space. We derive the formula for the mean squared linking number of such polygons. For a fixed simple closed curve in the confined space, we rigorously show that the linking probability between this curve and a uniform random polygon of n vertices is at least 1 - O(1/√n). Our numerical study also indicates that the linking probability between two uniform random polygons (in a confined space), of m and n vertices respectively, is bounded below by 1 - O(1/√(mn)). In particular, the linking probability between two uniform random polygons, both of n vertices, is bounded below by 1 - O(1/n).

  11. Genealogical Properties of Subsamples in Highly Fecund Populations

    NASA Astrophysics Data System (ADS)

    Eldon, Bjarki; Freund, Fabian

    2018-03-01

    We consider some genealogical properties of nested samples. The complete sample is assumed to have been drawn from a natural population characterised by high fecundity and sweepstakes reproduction (abbreviated HFSR). The random gene genealogies of the samples are, due to our assumption of HFSR, modelled by coalescent processes which admit multiple mergers of ancestral lineages looking back in time. Among the genealogical properties we consider are the probability that the most recent common ancestor is shared between the complete sample and the subsample nested within the complete sample; we also compare the lengths of 'internal' branches of nested genealogies between different coalescent processes. The results indicate how 'informative' a subsample is about the properties of the larger complete sample, how much information is gained by increasing the sample size, and how the 'informativeness' of the subsample varies between different coalescent processes.

  12. Normal probability plots with confidence.

    PubMed

    Chantarangsi, Wanpen; Liu, Wei; Bretz, Frank; Kiatsupaibul, Seksan; Hayter, Anthony J; Wan, Fang

    2015-01-01

    Normal probability plots are widely used as a statistical tool for assessing whether an observed simple random sample is drawn from a normally distributed population. The users, however, have to judge subjectively, if no objective rule is provided, whether the plotted points fall close to a straight line. In this paper, we focus on how a normal probability plot can be augmented by intervals for all the points so that, if the population distribution is normal, then all the points should fall into the corresponding intervals simultaneously with probability 1-α. These simultaneous 1-α probability intervals therefore provide an objective means of judging whether the plotted points fall close to the straight line: the plotted points fall close to the straight line if and only if all the points fall into the corresponding intervals. The powers of several normal probability plot based (graphical) tests and the most popular nongraphical Anderson-Darling and Shapiro-Wilk tests are compared by simulation. Based on this comparison, recommendations are given in Section 3 on which graphical tests should be used in what circumstances. An example is provided to illustrate the methods. © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
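    The idea of bounding where the plotted order statistics should fall can be sketched with a Monte Carlo envelope. This simplified stdlib version produces pointwise (not the paper's simultaneous) intervals for the standardized order statistics of a normal sample; names and tuning values are ours:

```python
import random
import statistics

def normal_plot_envelopes(n, alpha=0.05, sims=2000, seed=7):
    """Monte Carlo lower/upper envelopes for each standardized order
    statistic of a normal sample of size n (pointwise intervals; the
    paper constructs simultaneous 1-alpha intervals instead)."""
    rng = random.Random(seed)
    order_stats = [[] for _ in range(n)]
    for _ in range(sims):
        xs = sorted(rng.gauss(0.0, 1.0) for _ in range(n))
        m, s = statistics.mean(xs), statistics.stdev(xs)
        for i, x in enumerate(xs):
            order_stats[i].append((x - m) / s)  # standardize the sample
    lo, hi = [], []
    for vals in order_stats:
        vals.sort()
        lo.append(vals[int(sims * alpha / 2)])
        hi.append(vals[int(sims * (1 - alpha / 2))])
    return lo, hi
```

A sample whose standardized order statistics escape these envelopes is flagged as inconsistent with normality; the paper's contribution is calibrating the envelopes so that the joint coverage is exactly 1-α.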

  13. Phonotactic Probability Effects in Children Who Stutter

    PubMed Central

    Anderson, Julie D.; Byrd, Courtney T.

    2008-01-01

    Purpose The purpose of this study was to examine the influence of phonotactic probability, the frequency of different sound segments and segment sequences, on the overall fluency with which words are produced by preschool children who stutter (CWS), as well as to determine whether it has an effect on the type of stuttered disfluency produced. Method A 500+ word language sample was obtained from 19 CWS. Each stuttered word was randomly paired with a fluently produced word that closely matched it in grammatical class, word length, familiarity, word and neighborhood frequency, and neighborhood density. Phonotactic probability values were obtained for the stuttered and fluent words from an online database. Results Phonotactic probability did not have a significant influence on the overall susceptibility of words to stuttering, but it did impact the type of stuttered disfluency produced. Specifically, single-syllable word repetitions were significantly lower in phonotactic probability than fluently produced words, part-word repetitions, and sound prolongations. Conclusions In general, the differential impact of phonotactic probability on the type of stuttering-like disfluency produced by young CWS provides some support for the notion that different disfluency types may originate in the disruption of different levels of processing. PMID:18658056

  14. Probability of stress-corrosion fracture under random loading.

    NASA Technical Reports Server (NTRS)

    Yang, J.-N.

    1972-01-01

    A method is developed for predicting the probability of stress-corrosion fracture of structures under random loadings. The formulation is based on the cumulative damage hypothesis and the experimentally determined stress-corrosion characteristics. Under both stationary and nonstationary random loadings, the mean value and the variance of the cumulative damage are obtained. The probability of stress-corrosion fracture is then evaluated using the principle of maximum entropy. It is shown that, under stationary random loadings, the standard deviation of the cumulative damage increases in proportion to the square root of time, while the coefficient of variation (dispersion) decreases in inverse proportion to the square root of time. Numerical examples are worked out to illustrate the general results.
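    The √t scaling of the standard deviation (and the 1/√t decay of the coefficient of variation) follows from summing independent damage increments, and is easy to reproduce numerically. A stdlib sketch with generic nonnegative increments (the model and names are ours, not the paper's stress-corrosion formulation):

```python
import random
import statistics

def damage_statistics(n_cycles, trials=4000, seed=3):
    """Mean, standard deviation, and coefficient of variation of damage
    accumulated as a sum of n_cycles i.i.d. nonnegative increments."""
    rng = random.Random(seed)
    totals = [sum(rng.random() for _ in range(n_cycles)) for _ in range(trials)]
    mean = statistics.mean(totals)
    sd = statistics.pstdev(totals)
    return mean, sd, sd / mean  # mean grows like t, sd like sqrt(t)
```

Quadrupling the number of cycles should roughly double the standard deviation while halving the coefficient of variation, matching the abstract's scaling laws.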

  15. Violence, abuse, alcohol and drug use, and sexual behaviors in street children of Greater Cairo and Alexandria, Egypt.

    PubMed

    Nada, Khaled H; Suliman, El Daw A

    2010-07-01

    To measure the prevalence of HIV/AIDS risk behaviors and related factors in a large, probability-based sample of boys and girls aged 12-17 years living on the streets of Egypt's largest urban centers of Greater Cairo and Alexandria. Time-location sampling (TLS) was used to recruit a cross-sectional sample of street children. Procedures entailed using key informants and field observation to create a sampling frame of locations at predetermined time intervals of the day, where street children congregate in the two cities, selecting a random sample of time-locations from the complete list, and intercepting children in the selected time-locations to assess eligibility and conduct interviews. Interviews gathered basic demographic information, life events on the street (including violence, abuse, forced sex), sexual and drug use behaviors, and HIV/AIDS knowledge. A total of 857 street children were enrolled in the two cities, with an age, sex, and time-location composition matching the sampling frame. The majority of these children had faced harassment or abuse (93%) typically by police and other street children, had used drugs (62%), and, among the older adolescents, were sexually active (67%). Among the sexually active 15-17-year-olds, most reported multiple partners (54%) and never using condoms (52%). Most girls (53% in Greater Cairo and 90% in Alexandria) had experienced sexual abuse. The majority of street children experienced more than one of these risks. Overlaps with populations at highest risk for HIV were substantial, namely men who have sex with men, commercial sex workers, and injection drug users. Our study using a randomized TLS approach produced a rigorous, diverse, probability-based sample of street children and documented very high levels of multiple concurrent risks. 
Our findings strongly advocate for multiple services including those addressing HIV and STI prevention and care, substance use, shelters, and sensitization of authorities to the plight of street children in Egypt.

  16. The random coding bound is tight for the average code.

    NASA Technical Reports Server (NTRS)

    Gallager, R. G.

    1973-01-01

    The random coding bound of information theory provides a well-known upper bound to the probability of decoding error for the best code of a given rate and block length. The bound is constructed by upper-bounding the average error probability over an ensemble of codes. The bound is known to give the correct exponential dependence of error probability on block length for transmission rates above the critical rate, but it gives an incorrect exponential dependence at rates below a second lower critical rate. Here we derive an asymptotic expression for the average error probability over the ensemble of codes used in the random coding bound. The result shows that the weakness of the random coding bound at rates below the second critical rate is due not to upper-bounding the ensemble average, but rather to the fact that the best codes are much better than the average at low rates.

  17. Prevalence of anxiety, depression and post-traumatic stress disorder in the Kashmir Valley

    PubMed Central

    Lenglet, Annick; Ariti, Cono; Shah, Showkat; Shah, Helal; Ara, Shabnum; Viney, Kerri; Janes, Simon; Pintaldi, Giovanni

    2017-01-01

    Background Following the partition of India in 1947, the Kashmir Valley has been subject to continual political insecurity and ongoing conflict; the region remains highly militarised. We conducted a representative cross-sectional population-based survey of adults to estimate the prevalence and predictors of anxiety, depression and post-traumatic stress disorder (PTSD) in the 10 districts of the Kashmir Valley. Methods Between October and December 2015, we interviewed 5519 out of 5600 invited participants, ≥18 years of age, randomly sampled using a probability-proportional-to-size cluster sampling design. We estimated the prevalence of a probable psychological disorder using the Hopkins Symptom Checklist (HSCL-25) and the Harvard Trauma Questionnaire (HTQ-16). Both screening instruments had been culturally adapted and translated. Data were weighted to account for the sampling design and multivariate logistic regression analysis was conducted to identify risk factors for developing symptoms of psychological distress. Findings The estimated prevalence of mental distress in adults in the Kashmir Valley was 45% (95% CI 42.6 to 47.0). We identified 41% (95% CI 39.2 to 43.4) of adults with probable depression, 26% (95% CI 23.8 to 27.5) with probable anxiety and 19% (95% CI 17.5 to 21.2) with probable PTSD. The three disorders were associated with the following characteristics: being female, over 55 years of age, having had no formal education, living in a rural area and being widowed/divorced or separated. A dose–response association was found between the number of traumatic events experienced or witnessed and all three mental disorders. Interpretation The implementation of mental health awareness programmes, interventions aimed at high risk groups and addressing trauma-related symptoms from all causes are needed in the Kashmir Valley. PMID:29082026
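    Probability-proportional-to-size (PPS) cluster selection, as used in the survey above, is commonly implemented with a systematic pass over the cumulative cluster sizes. A generic stdlib sketch (function name and the toy sizes are ours, not from the study):

```python
import random

def pps_systematic(sizes, n_clusters, seed=11):
    """Systematic probability-proportional-to-size selection of clusters.
    sizes[j] is the population size of cluster j; returns the indices of
    the selected clusters (a large cluster may be selected more than once)."""
    rng = random.Random(seed)
    total = sum(sizes)
    step = total / n_clusters
    start = rng.random() * step            # random start within the first step
    points = [start + k * step for k in range(n_clusters)]
    chosen, cum, i = [], 0, 0
    for idx, s in enumerate(sizes):
        cum += s
        while i < n_clusters and points[i] < cum:
            chosen.append(idx)             # the step landed inside cluster idx
            i += 1
    return chosen
```

Each cluster's inclusion probability is proportional to its size, so drawing a fixed number of respondents per selected cluster yields an approximately self-weighting sample.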

  18. Generating intrinsically disordered protein conformational ensembles from a Markov chain

    NASA Astrophysics Data System (ADS)

    Cukier, Robert I.

    2018-03-01

    Intrinsically disordered proteins (IDPs) sample a diverse conformational space. They are important to signaling and regulatory pathways in cells. An entropy penalty must be paid when an IDP becomes ordered upon interaction with another protein or a ligand. Thus, the degree of conformational disorder of an IDP is of interest. We create a dichotomic Markov model that can explore entropic features of an IDP. The Markov condition introduces local (neighbor residues in a protein sequence) rotamer dependences that arise from van der Waals and other chemical constraints. A protein sequence of length N is characterized by its (information) entropy and mutual information, MIMC, the latter providing a measure of the dependence among the random variables describing the rotamer probabilities of the residues that comprise the sequence. For a Markov chain, the MIMC is proportional to the pair mutual information MI which depends on the singlet and pair probabilities of neighbor residue rotamer sampling. All 2N sequence states are generated, along with their probabilities, and contrasted with the probabilities under the assumption of independent residues. An efficient method to generate realizations of the chain is also provided. The chain entropy, MIMC, and state probabilities provide the ingredients to distinguish different scenarios using the terminologies: MoRF (molecular recognition feature), not-MoRF, and not-IDP. A MoRF corresponds to large entropy and large MIMC (strong dependence among the residues' rotamer sampling), a not-MoRF corresponds to large entropy but small MIMC, and not-IDP corresponds to low entropy irrespective of the MIMC. We show that MoRFs are most appropriate as descriptors of IDPs. They provide a reasonable number of high-population states that reflect the dependences between neighbor residues, thus classifying them as IDPs, yet without very large entropy that might lead to a too high entropy penalty.
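    For a dichotomic (two-state) stationary Markov chain, the chain entropy and the pair mutual information between neighbours have closed forms, which makes the entropy/MI trade-off above easy to probe. A stdlib sketch (names are ours; this is the generic two-state chain, not the authors' residue-level parameterization):

```python
import math

def two_state_chain_info(p01, p10, n):
    """Entropy (in bits) of a stationary two-state Markov chain of length
    n, and the mutual information between adjacent states.
    p01 = P(next=1 | current=0), p10 = P(next=0 | current=1)."""
    pi0 = p10 / (p01 + p10)   # stationary distribution
    pi1 = 1.0 - pi0

    def h(probs):
        return -sum(p * math.log2(p) for p in probs if p > 0)

    h_cond = pi0 * h([1 - p01, p01]) + pi1 * h([p10, 1 - p10])  # H(X_{t+1}|X_t)
    entropy = h([pi0, pi1]) + (n - 1) * h_cond                  # chain entropy
    mi = h([pi0, pi1]) - h_cond                                  # I(X_t; X_{t+1})
    return entropy, mi
```

With p01 = p10 = 0.5 the residues are independent (MI = 0, maximal entropy); a "sticky" chain such as p01 = p10 = 0.1 keeps the entropy of each state but gains neighbour dependence (MI > 0), the large-entropy/large-MI regime the authors associate with MoRFs.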

  19. System reliability of randomly vibrating structures: Computational modeling and laboratory testing

    NASA Astrophysics Data System (ADS)

    Sundar, V. S.; Ammanagi, S.; Manohar, C. S.

    2015-09-01

    The problem of determination of the system reliability of randomly vibrating structures arises in many application areas of engineering. We discuss in this paper approaches based on Monte Carlo simulations and laboratory testing to tackle problems of time-variant system reliability estimation. The strategy we adopt is based on the application of Girsanov's transformation to the governing stochastic differential equations, which enables estimation of the probability of failure with a significantly smaller number of samples than is needed in a direct simulation study. Notably, we show that the ideas from Girsanov-transformation-based Monte Carlo simulations can be extended to laboratory testing to assess the system reliability of engineering structures with a reduced number of samples and hence reduced testing times. Illustrative examples include computational studies on a 10-degree-of-freedom nonlinear system model and laboratory/computational investigations of the road load response of an automotive system tested on a four-post test rig.
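    The variance-reduction idea behind the Girsanov change of measure can be illustrated on a scalar toy problem: estimating a small tail probability of a standard normal by sampling from a shifted density and reweighting each hit by the likelihood ratio. A stdlib sketch (names and the one-dimensional setting are ours; the paper works with stochastic differential equations, not a single Gaussian):

```python
import math
import random

def failure_prob_is(threshold, shift, trials=20000, seed=2):
    """Importance-sampling estimate of P(X > threshold) for X ~ N(0,1),
    drawing samples from N(shift, 1) and reweighting by the likelihood
    ratio dP/dQ, so that rare failures are hit far more often."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        x = rng.gauss(shift, 1.0)
        if x > threshold:
            # Likelihood ratio of N(0,1) against N(shift,1) at x.
            total += math.exp(-shift * x + 0.5 * shift * shift)
    return total / trials
```

With the sampling density shifted to the failure threshold, a few thousand samples suffice to estimate a probability of order 1e-5 that direct Monte Carlo would almost never observe.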

  20. A large-scale study of the random variability of a coding sequence: a study on the CFTR gene.

    PubMed

    Modiano, Guido; Bombieri, Cristina; Ciminelli, Bianca Maria; Belpinati, Francesca; Giorgi, Silvia; Georges, Marie des; Scotet, Virginie; Pompei, Fiorenza; Ciccacci, Cinzia; Guittard, Caroline; Audrézet, Marie Pierre; Begnini, Angela; Toepfer, Michael; Macek, Milan; Ferec, Claude; Claustres, Mireille; Pignatti, Pier Franco

    2005-02-01

    Coding single nucleotide substitutions (cSNSs) have been studied on hundreds of genes using small samples (n_g ≈ 100-150 genes). In the present investigation, a large random European population sample (average n_g ≈ 1500) was studied for a single gene, the CFTR (Cystic Fibrosis Transmembrane conductance Regulator). The nonsynonymous (NS) substitutions exhibited, in accordance with previous reports, a mean probability of being polymorphic (q > 0.005) much lower than that of the synonymous (S) substitutions, but they showed a similar rate of subpolymorphic (q < 0.005) variability. This indicates that, in autosomal genes that may have harmful recessive alleles (nonduplicated genes with important functions), genetic drift overwhelms selection in the subpolymorphic range of variability, making disadvantageous alleles behave as neutral. These results imply that the majority of the subpolymorphic nonsynonymous alleles of these genes are selectively negative or even pathogenic.

  1. Auxiliary Parameter MCMC for Exponential Random Graph Models

    NASA Astrophysics Data System (ADS)

    Byshkin, Maksym; Stivala, Alex; Mira, Antonietta; Krause, Rolf; Robins, Garry; Lomi, Alessandro

    2016-11-01

    Exponential random graph models (ERGMs) are a well-established family of statistical models for analyzing social networks. Computational complexity has so far limited the appeal of ERGMs for the analysis of large social networks. Efficient computational methods are highly desirable in order to extend the empirical scope of ERGMs. In this paper we report results of a research project on the development of snowball sampling methods for ERGMs. We propose an auxiliary parameter Markov chain Monte Carlo (MCMC) algorithm for sampling from the relevant probability distributions. The method is designed to decrease the number of allowed network states without worsening the mixing of the Markov chains, and suggests a new approach for the development of MCMC samplers for ERGMs. We demonstrate the method on both simulated and actual (empirical) network data and show that it reduces CPU time for parameter estimation by an order of magnitude compared to current MCMC methods.
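    The MCMC machinery underlying ERGM sampling can be sketched for the simplest case, an ERGM whose only statistic is the edge count, where the Metropolis toggle move has a closed-form acceptance ratio. A stdlib toy (names are ours; the paper's auxiliary-parameter sampler targets much richer statistics and larger networks):

```python
import math
import random

def ergm_edge_sampler(n, theta, steps=20000, seed=5):
    """Metropolis sampler for an ERGM on n nodes whose sufficient
    statistic is the edge count: P(graph) proportional to exp(theta * edges).
    Proposes toggling one random dyad per step."""
    rng = random.Random(seed)
    edges = set()
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        e = (min(i, j), max(i, j))
        delta = -1 if e in edges else 1           # change in edge count if toggled
        if rng.random() < min(1.0, math.exp(theta * delta)):
            edges ^= {e}                          # accept: toggle the dyad
    return edges
```

This edge-only model reduces to independent edges with probability exp(theta)/(1 + exp(theta)); richer ERGM statistics (triangles, stars) make the same toggle move expensive, which is the bottleneck the auxiliary-parameter method attacks.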

  2. Combinatorial Statistics on Trees and Networks

    DTIC Science & Technology

    2010-09-29

    interaction graph is drawn from the Erdős-Rényi model G(n, p), where each edge is present independently with probability p. For this model we establish a double... special interest is the behavior of Gibbs sampling on the Erdős-Rényi random graph G(n, d/n), where each edge is chosen independently with... which have no counterparts in the coloring setting. Our proof presented here exploits in novel ways the local treelike structure of Erdős-Rényi

  3. Interlaboratory Reproducibility and Proficiency Testing within the Human Papillomavirus Cervical Cancer Screening Program in Catalonia, Spain

    PubMed Central

    Ibáñez, R.; Félez-Sánchez, M.; Godínez, J. M.; Guardià, C.; Caballero, E.; Juve, R.; Combalia, N.; Bellosillo, B.; Cuevas, D.; Moreno-Crespi, J.; Pons, L.; Autonell, J.; Gutierrez, C.; Ordi, J.; de Sanjosé, S.

    2014-01-01

    In Catalonia, a screening protocol for cervical cancer, including human papillomavirus (HPV) DNA testing using the Digene Hybrid Capture 2 (HC2) assay, was implemented in 2006. In order to monitor interlaboratory reproducibility, a proficiency testing (PT) survey of the HPV samples was launched in 2008. The aim of this study was to explore the repeatability of the HC2 assay's performance. Participating laboratories provided 20 samples annually, 5 randomly chosen samples from each of the following relative light unit (RLU) intervals: <0.5, 0.5 to 0.99, 1 to 9.99, and ≥10. Kappa statistics were used to determine the agreement levels between the original and the PT readings. The nature and origin of the discrepant results were calculated by bootstrapping. A total of 946 specimens were retested. The kappa values were 0.91 for positive/negative categorical classification and 0.79 for the four RLU intervals studied. Sample retesting yielded systematically lower RLU values than the original test (P < 0.005), independently of the time elapsed between the two determinations (median, 53 days), possibly due to freeze-thaw cycles. The probability for a sample to show clinically discrepant results upon retesting was a function of the RLU value; samples with RLU values in the 0.5 to 5 interval showed 10.80% probability of yielding discrepant results (95% confidence interval [CI], 7.86 to 14.33) compared to 0.85% probability for samples outside this interval (95% CI, 0.17 to 1.69). Globally, the HC2 assay shows high interlaboratory concordance. We have identified differential confidence thresholds and suggest guidelines for future interlaboratory PT, as analytical quality assessment of HPV DNA detection remains a central component of the screening program for cervical cancer prevention. PMID:24574284
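    The agreement measure used above, Cohen's kappa, corrects observed agreement for the agreement expected by chance. A stdlib sketch on paired categorical readings (function name and example data are ours, not the study's):

```python
def cohens_kappa(pairs):
    """Cohen's kappa for two raters' paired categorical readings.
    pairs: list of (reading_a, reading_b) tuples."""
    n = len(pairs)
    categories = sorted({c for pair in pairs for c in pair})
    observed = sum(a == b for a, b in pairs) / n   # raw agreement
    # Chance agreement: product of each rater's marginal category rates.
    expected = sum(
        (sum(a == c for a, _ in pairs) / n) * (sum(b == c for _, b in pairs) / n)
        for c in categories
    )
    return (observed - expected) / (1 - expected)
```

Perfect agreement gives kappa = 1, chance-level agreement gives kappa = 0; values such as the study's 0.91 for positive/negative classification indicate near-perfect interlaboratory concordance.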

  4. Perceptions of randomized security schedules.

    PubMed

    Scurich, Nicholas; John, Richard S

    2014-04-01

    Security of infrastructure is a major concern. Traditional security schedules are unable to provide omnipresent coverage; consequently, adversaries can exploit predictable vulnerabilities to their advantage. Randomized security schedules, which randomly deploy security measures, overcome these limitations, but public perceptions of such schedules have not been examined. In this experiment, participants were asked to choose between attending a venue that employed a traditional (i.e., search everyone) or a random (i.e., a probability of being searched) security schedule. The absolute probability of detecting contraband was manipulated (i.e., 1/10, 1/4, 1/2) but equivalent between the two schedule types. In general, participants were indifferent between the two security schedules, regardless of the probability of detection. The randomized schedule was deemed more convenient, but the traditional schedule was considered fairer and safer. There were no differences between the traditional and random schedules in terms of perceived effectiveness or deterrence. Policy implications for the implementation and utilization of randomized schedules are discussed. © 2013 Society for Risk Analysis.

  5. Method and apparatus for detecting a desired behavior in digital image data

    DOEpatents

    Kegelmeyer, Jr., W. Philip

    1997-01-01

    A method for detecting stellate lesions in digitized mammographic image data includes the steps of prestoring a plurality of reference images, calculating a plurality of features for each of the pixels of the reference images, and creating a binary decision tree from features of randomly sampled pixels from each of the reference images. Once the binary decision tree has been created, a plurality of features, preferably including an ALOE feature (analysis of local oriented edges), are calculated for each of the pixels of the digitized mammographic data. Each of these plurality of features of each pixel are input into the binary decision tree and a probability is determined, for each of the pixels, corresponding to the likelihood of the presence of a stellate lesion, to create a probability image. Finally, the probability image is spatially filtered to enforce local consensus among neighboring pixels and the spatially filtered image is output.

  6. Method and apparatus for detecting a desired behavior in digital image data

    DOEpatents

    Kegelmeyer, Jr., W. Philip

    1997-01-01

    A method for detecting stellate lesions in digitized mammographic image data includes the steps of prestoring a plurality of reference images, calculating a plurality of features for each of the pixels of the reference images, and creating a binary decision tree from features of randomly sampled pixels from each of the reference images. Once the binary decision tree has been created, a plurality of features, preferably including an ALOE feature (analysis of local oriented edges), are calculated for each of the pixels of the digitized mammographic data. Each of these plurality of features of each pixel are input into the binary decision tree and a probability is determined, for each of the pixels, corresponding to the likelihood of the presence of a stellate lesion, to create a probability image. Finally, the probability image is spatially filtered to enforce local consensus among neighboring pixels and the spatially filtered image is output.

  7. Strong profiling is not mathematically optimal for discovering rare malfeasors

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Press, William H

    2008-01-01

    In a large population of individuals labeled j = 1, 2, ..., N, governments attempt to find the rare malfeasor j = j* (a terrorist, for example) by making use of priors p_j that estimate the probability of individual j being a malfeasor. Societal resources for secondary random screening such as airport search or police investigation are concentrated against individuals with the largest priors. They call this 'strong profiling' if the concentration is at least proportional to p_j for the largest values. Strong profiling often results in higher-probability, but otherwise innocent, individuals being repeatedly subjected to screening. They show here that, entirely apart from considerations of social policy, strong profiling is not mathematically optimal at finding malfeasors. Even if prior probabilities were accurate, their optimal use would be only as roughly the geometric mean between strong profiling and a completely uniform sampling of the population.
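
    The optimality claim can be checked numerically. If each screening trial selects individual j with probability q_j, the expected number of trials until the malfeasor is screened, averaged over the prior, is sum_j p_j / q_j; minimizing this subject to sum_j q_j = 1 gives q_j proportional to sqrt(p_j), the geometric-mean rule of the abstract. The priors below are made-up illustrative numbers:

```python
import math

p = [0.4, 0.3, 0.2, 0.05, 0.03, 0.02]  # hypothetical priors, sum to 1

def normalize(w):
    s = sum(w)
    return [x / s for x in w]

def expected_screenings(priors, q):
    # Each trial screens individual j with probability q_j, so the
    # expected number of trials until the malfeasor j* is screened is
    # 1 / q_{j*}; averaging over the prior gives sum_j p_j / q_j.
    return sum(pj / qj for pj, qj in zip(priors, q))

strong = normalize(p)                             # effort proportional to p_j
uniform = normalize([1.0] * len(p))               # uniform random screening
sqrt_rule = normalize([math.sqrt(x) for x in p])  # q_j proportional to sqrt(p_j)

# Strong profiling is no better than uniform screening (both give N = 6
# expected screenings here); the square-root rule is strictly better.
print(expected_screenings(p, strong),
      expected_screenings(p, uniform),
      expected_screenings(p, sqrt_rule))
```

    Note that concentrating effort exactly in proportion to the priors recovers the same expected cost as ignoring the priors entirely, which is the sense in which strong profiling is not optimal.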

  8. Exact and Approximate Probabilistic Symbolic Execution

    NASA Technical Reports Server (NTRS)

    Luckow, Kasper; Pasareanu, Corina S.; Dwyer, Matthew B.; Filieri, Antonio; Visser, Willem

    2014-01-01

    Probabilistic software analysis seeks to quantify the likelihood of reaching a target event under uncertain environments. Recent approaches compute probabilities of execution paths using symbolic execution, but do not support nondeterminism. Nondeterminism arises naturally when no suitable probabilistic model can capture a program behavior, e.g., for multithreading or distributed systems. In this work, we propose a technique, based on symbolic execution, to synthesize schedulers that resolve nondeterminism to maximize the probability of reaching a target event. To scale to large systems, we also introduce approximate algorithms to search for good schedulers, speeding up established random sampling and reinforcement learning results through the quantification of path probabilities based on symbolic execution. We implemented the techniques in Symbolic PathFinder and evaluated them on nondeterministic Java programs. We show that our algorithms significantly improve upon a state-of-the-art statistical model checking algorithm, originally developed for Markov Decision Processes.

  9. Sampling--how big a sample?

    PubMed

    Aitken, C G

    1999-07-01

    It is thought that, in a consignment of discrete units, a certain proportion of the units contain illegal material. A sample of the consignment is to be inspected. Various methods for the determination of the sample size are compared. The consignment will be considered as a random sample from some super-population of units, a certain proportion of which contain drugs. For large consignments, a probability distribution, known as the beta distribution, for the proportion of the consignment which contains illegal material is obtained. This distribution is based on prior beliefs about the proportion. Under certain specific conditions the beta distribution gives the same numerical results as an approach based on the binomial distribution. The binomial distribution provides a probability for the number of units in a sample which contain illegal material, conditional on knowing the proportion of the consignment which contains illegal material. This is in contrast to the beta distribution which provides probabilities for the proportion of a consignment which contains illegal material, conditional on knowing the number of units in the sample which contain illegal material. The interpretation when the beta distribution is used is much more intuitively satisfactory. It is also much more flexible in its ability to cater for prior beliefs which may vary given the different circumstances of different crimes. For small consignments, a distribution, known as the beta-binomial distribution, for the number of units in the consignment which are found to contain illegal material, is obtained, based on prior beliefs about the number of units in the consignment which are thought to contain illegal material. As with the beta and binomial distributions for large samples, it is shown that, in certain specific conditions, the beta-binomial and hypergeometric distributions give the same numerical results. 
However, the beta-binomial distribution, as with the beta distribution, has a more intuitively satisfactory interpretation and greater flexibility. The beta and the beta-binomial distributions provide methods for the determination of the minimum sample size to be taken from a consignment in order to satisfy a certain criterion. The criterion requires the specification of a proportion and a probability.
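
    The minimum-sample-size criterion for large consignments can be sketched in its simplest case: a uniform Beta(1, 1) prior and a sample in which every inspected unit proves positive, so the posterior for the proportion is Beta(n + 1, 1). This special case is an illustration only; the paper's beta framework is more general:

```python
def min_sample_size(theta0, prob):
    # Smallest n such that, if all n inspected units contain illegal
    # material, a uniform Beta(1, 1) prior gives
    #   P(theta > theta0 | n positives) = 1 - theta0 ** (n + 1) >= prob,
    # since the posterior for the proportion theta is Beta(n + 1, 1).
    n = 1
    while 1.0 - theta0 ** (n + 1) < prob:
        n += 1
    return n

# To be 95% sure that more than half the consignment contains illegal
# material, given that every sampled unit does:
print(min_sample_size(0.5, 0.95))  # 4
```

    Tightening either element of the criterion (a larger proportion theta0 or a larger probability prob) drives the required sample size up sharply, which is exactly the trade-off the prior beliefs are meant to moderate.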

  10. Using multilevel spatial models to understand salamander site occupancy patterns after wildfire

    USGS Publications Warehouse

    Chelgren, Nathan; Adams, Michael J.; Bailey, Larissa L.; Bury, R. Bruce

    2011-01-01

    Studies of the distribution of elusive forest wildlife have suffered from the confounding of true presence with the uncertainty of detection. Occupancy modeling, which incorporates probabilities of species detection conditional on presence, is an emerging approach for reducing observation bias. However, the current likelihood modeling framework is restrictive for handling unexplained sources of variation in the response that may occur when there are dependence structures such as smaller sampling units that are nested within larger sampling units. We used multilevel Bayesian occupancy modeling to handle dependence structures and to partition sources of variation in occupancy of sites by terrestrial salamanders (family Plethodontidae) within and surrounding an earlier wildfire in western Oregon, USA. Comparison of model fit favored a spatial N-mixture model that accounted for variation in salamander abundance over models that were based on binary detection/non-detection data. Though catch per unit effort was higher in burned areas than unburned, there was strong support that this pattern was due to a higher probability of capture for individuals in burned plots. Within the burn, the odds of capturing an individual given it was present were 2.06 times the odds outside the burn, reflecting reduced complexity of ground cover in the burn. There was weak support that true occupancy was lower within the burned area. While the odds of occupancy in the burn were 0.49 times the odds outside the burn among the five species, the magnitude of variation attributed to the burn was small in comparison to variation attributed to other landscape variables and to unexplained, spatially autocorrelated random variation. 
While ordinary occupancy models may separate the biological pattern of interest from variation in detection probability when all sources of variation are known, the addition of random effects structures for unexplained sources of variation in occupancy and detection probability may often more appropriately represent levels of uncertainty. © 2011 by the Ecological Society of America.

  11. Robustness-Based Design Optimization Under Data Uncertainty

    NASA Technical Reports Server (NTRS)

    Zaman, Kais; McDonald, Mark; Mahadevan, Sankaran; Green, Lawrence

    2010-01-01

    This paper proposes formulations and algorithms for design optimization under both aleatory (i.e., natural or physical variability) and epistemic uncertainty (i.e., imprecise probabilistic information), from the perspective of system robustness. The proposed formulations deal with epistemic uncertainty arising from both sparse and interval data without any assumption about the probability distributions of the random variables. A decoupled approach is proposed in this paper to un-nest the robustness-based design from the analysis of non-design epistemic variables to achieve computational efficiency. The proposed methods are illustrated for the upper stage design problem of a two-stage-to-orbit (TSTO) vehicle, where the information on the random design inputs is only available as sparse point and/or interval data. As collecting more data reduces uncertainty but increases cost, the effect of sample size on the optimality and robustness of the solution is also studied. A method is developed to determine the optimal sample size for sparse point data that leads to the solutions of the design problem that are least sensitive to variations in the input random variables.

  12. A Semi-Analytical Method for the PDFs of A Ship Rolling in Random Oblique Waves

    NASA Astrophysics Data System (ADS)

    Liu, Li-qin; Liu, Ya-liu; Xu, Wan-hai; Li, Yan; Tang, You-gang

    2018-03-01

    The PDFs (probability density functions) and probability of a ship rolling under the random parametric and forced excitations were studied by a semi-analytical method. The rolling motion equation of the ship in random oblique waves was established. The righting arm obtained by the numerical simulation was approximately fitted by an analytical function. The irregular waves were decomposed into two Gauss stationary random processes, and the CARMA (2, 1) model was used to fit the spectral density function of parametric and forced excitations. The stochastic energy envelope averaging method was used to solve the PDFs and the probability. The validity of the semi-analytical method was verified by the Monte Carlo method. The C11 ship was taken as an example, and the influences of the system parameters on the PDFs and probability were analyzed. The results show that the probability of ship rolling is affected by the characteristic wave height, wave length, and the heading angle. In order to provide proper advice for the ship's manoeuvring, the parametric excitations should be considered appropriately when the ship navigates in the oblique seas.

  13. Maximum-entropy probability distributions under Lp-norm constraints

    NASA Technical Reports Server (NTRS)

    Dolinar, S.

    1991-01-01

    Continuous probability density functions and discrete probability mass functions are tabulated which maximize the differential entropy or absolute entropy, respectively, among all probability distributions with a given L_p norm (i.e., a given pth absolute moment when p is a finite integer) and unconstrained or constrained value set. Expressions for the maximum entropy are evaluated as functions of the L_p norm. The most interesting results are obtained and plotted for unconstrained (real-valued) continuous random variables and for integer-valued discrete random variables. The maximum entropy expressions are obtained in closed form for unconstrained continuous random variables, and in this case there is a simple straight-line relationship between the maximum differential entropy and the logarithm of the L_p norm. Corresponding expressions for arbitrary discrete and constrained continuous random variables are given parametrically; closed-form expressions are available only for special cases. However, simpler alternative bounds on the maximum entropy of integer-valued discrete random variables are obtained by applying the differential entropy results to continuous random variables which approximate the integer-valued random variables in a natural manner. All the results are presented in an integrated framework that includes continuous and discrete random variables, constraints on the permissible value set, and all possible values of p. Understanding such as this is useful in evaluating the performance of data compression schemes.
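
    For p = 2 the unconstrained continuous result is the familiar one: the Gaussian maximizes differential entropy for a given L_2 norm, and its entropy is linear in the logarithm of that norm with slope one. The Laplace comparison below is an added illustration, not taken from the paper:

```python
import math

def gaussian_entropy(sigma):
    # Maximum-entropy density with second moment sigma**2 (p = 2 case):
    # h = 0.5 * log(2 * pi * e * sigma**2) = log(sigma) + constant,
    # a straight line in log(sigma) with slope one.
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma ** 2)

def laplace_entropy(sigma):
    # A Laplace density with the same variance sigma**2 (scale
    # b = sigma / sqrt(2)) has entropy 1 + log(2 * b), strictly below
    # the Gaussian's for every sigma.
    return 1.0 + math.log(2.0 * sigma / math.sqrt(2.0))

for s in (0.5, 1.0, 3.0):
    assert gaussian_entropy(s) > laplace_entropy(s)
print(gaussian_entropy(2.0) - gaussian_entropy(1.0))  # equals log(2)
```

    Doubling the L_2 norm raises the maximum differential entropy by exactly log 2, which is the straight-line relationship the abstract describes.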

  14. Linkage of Viral Sequences among HIV-Infected Village Residents in Botswana: Estimation of Linkage Rates in the Presence of Missing Data

    PubMed Central

    Carnegie, Nicole Bohme; Wang, Rui; Novitsky, Vladimir; De Gruttola, Victor

    2014-01-01

    Linkage analysis is useful in investigating disease transmission dynamics and the effect of interventions on them, but estimates of probabilities of linkage between infected people from observed data can be biased downward when missingness is informative. We investigate variation in the rates at which subjects' viral genotypes link across groups defined by viral load (low/high) and antiretroviral treatment (ART) status using blood samples from household surveys in the Northeast sector of Mochudi, Botswana. The probability of obtaining a sequence from a sample varies with viral load; samples with low viral load are harder to amplify. Pairwise genetic distances were estimated from aligned nucleotide sequences of HIV-1C env gp120. It is first shown that the probability that randomly selected sequences are linked can be estimated consistently from observed data. This is then used to develop estimates of the probability that a sequence from one group links to at least one sequence from another group under the assumption of independence across pairs. Furthermore, a resampling approach is developed that accounts for the presence of correlation across pairs, with diagnostics for assessing the reliability of the method. Sequences were obtained for 65% of subjects with high viral load (HVL, n = 117), 54% of subjects with low viral load but not on ART (LVL, n = 180), and 45% of subjects on ART (ART, n = 126). The probability of linkage between two individuals is highest if both have HVL, and lowest if one has LVL and the other has LVL or is on ART. Linkage across groups is high for HVL and lower for LVL and ART. Adjustment for missing data increases the group-wise linkage rates by 40–100%, and changes the relative rates between groups. Bias in inferences regarding HIV viral linkage that arise from differential ability to genotype samples can be reduced by appropriate methods for accommodating missing data. PMID:24415932

  15. Linkage of viral sequences among HIV-infected village residents in Botswana: estimation of linkage rates in the presence of missing data.

    PubMed

    Carnegie, Nicole Bohme; Wang, Rui; Novitsky, Vladimir; De Gruttola, Victor

    2014-01-01

    Linkage analysis is useful in investigating disease transmission dynamics and the effect of interventions on them, but estimates of probabilities of linkage between infected people from observed data can be biased downward when missingness is informative. We investigate variation in the rates at which subjects' viral genotypes link across groups defined by viral load (low/high) and antiretroviral treatment (ART) status using blood samples from household surveys in the Northeast sector of Mochudi, Botswana. The probability of obtaining a sequence from a sample varies with viral load; samples with low viral load are harder to amplify. Pairwise genetic distances were estimated from aligned nucleotide sequences of HIV-1C env gp120. It is first shown that the probability that randomly selected sequences are linked can be estimated consistently from observed data. This is then used to develop estimates of the probability that a sequence from one group links to at least one sequence from another group under the assumption of independence across pairs. Furthermore, a resampling approach is developed that accounts for the presence of correlation across pairs, with diagnostics for assessing the reliability of the method. Sequences were obtained for 65% of subjects with high viral load (HVL, n = 117), 54% of subjects with low viral load but not on ART (LVL, n = 180), and 45% of subjects on ART (ART, n = 126). The probability of linkage between two individuals is highest if both have HVL, and lowest if one has LVL and the other has LVL or is on ART. Linkage across groups is high for HVL and lower for LVL and ART. Adjustment for missing data increases the group-wise linkage rates by 40-100%, and changes the relative rates between groups. Bias in inferences regarding HIV viral linkage that arise from differential ability to genotype samples can be reduced by appropriate methods for accommodating missing data.

  16. Gene expression pattern recognition algorithm inferences to classify samples exposed to chemical agents

    NASA Astrophysics Data System (ADS)

    Bushel, Pierre R.; Bennett, Lee; Hamadeh, Hisham; Green, James; Ableson, Alan; Misener, Steve; Paules, Richard; Afshari, Cynthia

    2002-06-01

    We present an analysis of pattern recognition procedures used to predict the classes of samples exposed to pharmacologic agents by comparing gene expression patterns from samples treated with two classes of compounds. Rat liver mRNA samples following exposure for 24 hours with phenobarbital or peroxisome proliferators were analyzed using a 1700 rat cDNA microarray platform. Sets of genes that were consistently differentially expressed in the rat liver samples following treatment were stored in the MicroArray Project System (MAPS) database. MAPS identified 238 genes in common that possessed a low probability (P < 0.01) of being randomly detected as differentially expressed at the 95% confidence level. Hierarchical cluster analysis on the 238 genes clustered specific gene expression profiles that separated samples based on exposure to a particular class of compound.

  17. On estimating probability of presence from use-availability or presence-background data.

    PubMed

    Phillips, Steven J; Elith, Jane

    2013-06-01

    A fundamental ecological modeling task is to estimate the probability that a species is present in (or uses) a site, conditional on environmental variables. For many species, available data consist of "presence" data (locations where the species [or evidence of it] has been observed), together with "background" data, a random sample of available environmental conditions. Recently published papers disagree on whether probability of presence is identifiable from such presence-background data alone. This paper aims to resolve the disagreement, demonstrating that additional information is required. We defined seven simulated species representing various simple shapes of response to environmental variables (constant, linear, convex, unimodal, S-shaped) and ran five logistic model-fitting methods using 1000 presence samples and 10 000 background samples; the simulations were repeated 100 times. The experiment revealed a stark contrast between two groups of methods: those based on a strong assumption that species' true probability of presence exactly matches a given parametric form had highly variable predictions and much larger RMS error than methods that take population prevalence (the fraction of sites in which the species is present) as an additional parameter. For six species, the former group grossly under- or overestimated probability of presence. The cause was not model structure or choice of link function, because all methods were logistic with linear and, where necessary, quadratic terms. Rather, the experiment demonstrates that an estimate of prevalence is not just helpful, but is necessary (except in special cases) for identifying probability of presence. We therefore advise against use of methods that rely on the strong assumption, due to Lele and Keim (recently advocated by Royle et al.) and Lancaster and Imbens. The methods are fragile, and their strong assumption is unlikely to be true in practice. 
We emphasize, however, that we are not arguing against standard statistical methods such as logistic regression, generalized linear models, and so forth, none of which requires the strong assumption. If probability of presence is required for a given application, there is no panacea for lack of data. Presence-background data must be augmented with an additional datum, e.g., species' prevalence, to reliably estimate absolute (rather than relative) probability of presence.

  18. Estimation of the lower and upper bounds on the probability of failure using subset simulation and random set theory

    NASA Astrophysics Data System (ADS)

    Alvarez, Diego A.; Uribe, Felipe; Hurtado, Jorge E.

    2018-02-01

    Random set theory is a general framework which comprises uncertainty in the form of probability boxes, possibility distributions, cumulative distribution functions, Dempster-Shafer structures or intervals; in addition, the dependence between the input variables can be expressed using copulas. In this paper, the lower and upper bounds on the probability of failure are calculated by means of random set theory. In order to accelerate the calculation, a well-known and efficient probability-based reliability method known as subset simulation is employed. This method is especially useful for finding small failure probabilities in both low- and high-dimensional spaces, disjoint failure domains and nonlinear limit state functions. The proposed methodology represents a drastic reduction of the computational labor implied by plain Monte Carlo simulation for problems defined with a mixture of representations for the input variables, while delivering similar results. Numerical examples illustrate the efficiency of the proposed approach.
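
    The plain Monte Carlo baseline that subset simulation is designed to beat can be sketched directly; the limit state and input distributions below are hypothetical, chosen so that the failure probability is small (about 2e-4) and the cost of crude sampling is apparent:

```python
import random

def mc_failure_probability(limit_state, sampler, n, seed=1):
    # Crude Monte Carlo estimate of P(g(X) <= 0): draw n samples and
    # count the fraction that land in the failure domain.
    rng = random.Random(seed)
    failures = sum(1 for _ in range(n) if limit_state(sampler(rng)) <= 0.0)
    return failures / n

# Hypothetical linear limit state g(x1, x2) = 5 - x1 - x2 with
# independent standard normal inputs; the exact failure probability is
# Phi(-5 / sqrt(2)), roughly 2e-4, so crude sampling needs on the order
# of a million draws for a usable estimate.
g = lambda x: 5.0 - x[0] - x[1]
std_normal_pair = lambda rng: (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
print(mc_failure_probability(g, std_normal_pair, n=1_000_000))
```

    The coefficient of variation of this estimator grows as the failure probability shrinks, which is why conditional-sampling schemes such as subset simulation drastically reduce the required number of limit-state evaluations.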

  19. Probability of stress-corrosion fracture under random loading

    NASA Technical Reports Server (NTRS)

    Yang, J. N.

    1974-01-01

    The mathematical formulation is based on a cumulative-damage hypothesis and experimentally determined stress-corrosion characteristics. Under stationary random loading, the mean value and variance of the cumulative damage are obtained. The probability of stress-corrosion fracture is then evaluated using the principle of maximum entropy.

  20. Agricultural water demand, water quality and crop suitability in Souk-Alkhamis Al-Khums, Libya

    NASA Astrophysics Data System (ADS)

    Abunnour, Mohamed Ali; Hashim, Noorazuan Bin Md.; Jaafar, Mokhtar Bin

    2016-06-01

    Water scarcity, unequal population distribution, and intensified agricultural activity in the coastal plains have increased the probability of seawater intrusion into the groundwater. As a result, groundwater in the study area has deteriorated both in quantity and in quality, and agricultural production has declined. This paper examines the use of groundwater for irrigation and its suitability for agriculture; water quality, in turn, determines which crops can be cultivated. Sixteen random samples of groundwater were collected and analyzed chemically, and questionnaires were distributed randomly to farmers.

  1. On the number of infinite geodesics and ground states in disordered systems

    NASA Astrophysics Data System (ADS)

    Wehr, Jan

    1997-04-01

    We study first-passage percolation models and their higher dimensional analogs—models of surfaces with random weights. We prove that under very general conditions the number of lines or, in the second case, hypersurfaces which locally minimize the sum of the random weights is with probability one equal to 0 or with probability one equal to +∞. As corollaries we show that in any dimension d≥2 the number of ground states of an Ising ferromagnet with random coupling constants equals (with probability one) 2 or +∞. Proofs employ simple large-deviation estimates and ergodic arguments.

  2. Dynamic Response of an Optomechanical System to a Stationary Random Excitation in the Time Domain

    DOE PAGES

    Palmer, Jeremy A.; Paez, Thomas L.

    2011-01-01

    Modern electro-optical instruments are typically designed with assemblies of optomechanical members that support optics such that alignment is maintained in service environments that include random vibration loads. This paper presents a nonlinear numerical analysis that calculates statistics for the peak lateral response of optics in an optomechanical sub-assembly subject to random excitation of the housing. The work is unique in that the prior art does not address peak response probability distribution for stationary random vibration in the time domain for a common lens-retainer-housing system with Coulomb damping. Analytical results are validated by using displacement response data from random vibration testing of representative prototype sub-assemblies. A comparison of predictions to experimental results yields reasonable agreement. The Type I Asymptotic form provides the cumulative distribution function for peak response probabilities. Probabilities are calculated for actual lens centration tolerances. The probability that peak response will not exceed the centration tolerance is greater than 80% for prototype configurations where the tolerance is high (on the order of 30 micrometers). Conversely, the probability is low for those where the tolerance is less than 20 micrometers. The analysis suggests a design paradigm based on the influence of lateral stiffness on the magnitude of the response.

  3. Hazard Function Estimation with Cause-of-Death Data Missing at Random.

    PubMed

    Wang, Qihua; Dinse, Gregg E; Liu, Chunling

    2012-04-01

    Hazard function estimation is an important part of survival analysis. Interest often centers on estimating the hazard function associated with a particular cause of death. We propose three nonparametric kernel estimators for the hazard function, all of which are appropriate when death times are subject to random censorship and censoring indicators can be missing at random. Specifically, we present a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. All three estimators are uniformly strongly consistent and asymptotically normal. We derive asymptotic representations of the mean squared error and the mean integrated squared error for these estimators and we discuss a data-driven bandwidth selection method. A simulation study, conducted to assess finite sample behavior, demonstrates that the proposed hazard estimators perform relatively well. We illustrate our methods with an analysis of some vascular disease data.

  4. Effects of ignition location models on the burn patterns of simulated wildfires

    USGS Publications Warehouse

    Bar-Massada, A.; Syphard, A.D.; Hawbaker, T.J.; Stewart, S.I.; Radeloff, V.C.

    2011-01-01

    Fire simulation studies that use models such as FARSITE often assume that ignition locations are distributed randomly, because spatially explicit information about actual ignition locations is difficult to obtain. However, many studies show that the spatial distribution of ignition locations, whether human-caused or natural, is non-random. Thus, predictions from fire simulations based on random ignitions may be unrealistic. However, the extent to which the assumption of ignition location affects the predictions of fire simulation models has never been systematically explored. Our goal was to assess the difference in fire simulations that are based on random versus non-random ignition location patterns. We conducted four sets of 6000 FARSITE simulations for the Santa Monica Mountains in California to quantify the influence of random and non-random ignition locations and normal and extreme weather conditions on fire size distributions and spatial patterns of burn probability. Under extreme weather conditions, fires were significantly larger for non-random ignitions compared to random ignitions (mean area of 344.5 ha and 230.1 ha, respectively), but burn probability maps were highly correlated (r = 0.83). Under normal weather, random ignitions produced significantly larger fires than non-random ignitions (17.5 ha and 13.3 ha, respectively), and the spatial correlations between burn probability maps were not high (r = 0.54), though the difference in the average burn probability was small. The results of the study suggest that the location of ignitions used in fire simulation models may substantially influence the spatial predictions of fire spread patterns. However, the spatial bias introduced by using a random ignition location model may be minimized if the fire simulations are conducted under extreme weather conditions when fire spread is greatest. © 2010 Elsevier Ltd.

  5. Probability distribution for the Gaussian curvature of the zero level surface of a random function

    NASA Astrophysics Data System (ADS)

    Hannay, J. H.

    2018-04-01

    A rather natural construction for a smooth random surface in space is the level surface of value zero, or ‘nodal’ surface f(x,y,z)  =  0, of a (real) random function f; the interface between positive and negative regions of the function. A physically significant local attribute at a point of a curved surface is its Gaussian curvature (the product of its principal curvatures) because, when integrated over the surface it gives the Euler characteristic. Here the probability distribution for the Gaussian curvature at a random point on the nodal surface f  =  0 is calculated for a statistically homogeneous (‘stationary’) and isotropic zero mean Gaussian random function f. Capitalizing on the isotropy, a ‘fixer’ device for axes supplies the probability distribution directly as a multiple integral. Its evaluation yields an explicit algebraic function with a simple average. Indeed, this average Gaussian curvature has long been known. For a non-zero level surface instead of the nodal one, the probability distribution is not fully tractable, but is supplied as an integral expression.

  6. Safety and Efficacy of Nanocurcumin as Add-On Therapy to Riluzole in Patients With Amyotrophic Lateral Sclerosis: A Pilot Randomized Clinical Trial.

    PubMed

    Ahmadi, Mona; Agah, Elmira; Nafissi, Shahriar; Jaafari, Mahmoud Reza; Harirchian, Mohammad Hossein; Sarraf, Payam; Faghihi-Kashani, Sara; Hosseini, Seyed Jalal; Ghoreishi, Abdolreza; Aghamollaii, Vajiheh; Hosseini, Mostafa; Tafakhori, Abbas

    2018-04-01

    The objective of the present study was to assess the safety and efficacy of nanocurcumin as an anti-inflammatory and antioxidant agent in adults with amyotrophic lateral sclerosis (ALS). We conducted a 12-month, double-blind, randomized, placebo-controlled trial at a neurological referral center in Iran. Eligible patients with a definite or probable ALS diagnosis were randomly assigned to receive either nanocurcumin (80 mg daily) or placebo in a 1:1 ratio. A computerized random number generator was used to prepare the randomization list. All patients and research investigators were blinded to treatment allocation. The primary outcome was survival, and the event was defined as death or mechanical ventilation dependency. Analysis was by intention-to-treat and included all patients who received at least one dose of study drug. A total of 54 patients were randomized to receive either nanocurcumin (n = 27) or placebo (n = 27). After 12 months, events occurred in 1 patient (3.7%) in the nanocurcumin group and in 6 patients (22.2%) in the placebo group. Kaplan-Meier analysis revealed a significant difference between the study groups regarding their survival curves (p = 0.036). No significant between-group differences were observed for any other outcome measures. No serious adverse events or treatment-related deaths were detected. No patients withdrew as a result of drug adverse events. The results suggest that nanocurcumin is safe and might improve the probability of survival as an add-on treatment in patients with ALS, especially in those with existing bulbar symptoms. Future studies with larger sample sizes and of longer duration are needed to confirm these findings.

  7. Computer simulation of random variables and vectors with arbitrary probability distribution laws

    NASA Technical Reports Server (NTRS)

    Bogdan, V. M.

    1981-01-01

Assume that there is given an arbitrary n-dimensional probability distribution F. A recursive construction is found for a sequence of functions x_1 = f_1(U_1, ..., U_n), ..., x_n = f_n(U_1, ..., U_n) such that if U_1, ..., U_n are independent random variables having uniform distribution over the open interval (0,1), then the joint distribution of the variables x_1, ..., x_n coincides with the distribution F. Since uniform independent random variables can be well simulated by means of a computer, this result allows one to simulate arbitrary n random variables if their joint probability distribution is known.
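    In one dimension the construction described reduces to the classical inverse-transform method. A minimal sketch, assuming an exponential target distribution; the distribution, rate, and sample count are illustrative choices, not taken from the paper:

```python
import math
import random

def inverse_transform_exponential(u, rate=1.0):
    """Map a Uniform(0,1) variate u to an Exponential(rate) variate
    via the inverse CDF: F^{-1}(u) = -ln(1 - u) / rate."""
    return -math.log(1.0 - u) / rate

# In n dimensions, the paper's recursion chains such maps: x_1 is drawn
# from the marginal F_1, x_2 from the conditional F_2(. | x_1), and so on.
random.seed(0)
samples = [inverse_transform_exponential(random.random()) for _ in range(100_000)]
mean_est = sum(samples) / len(samples)
print(mean_est)  # should be close to 1/rate = 1.0
```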

  8. Psychopathology among New York city public school children 6 months after September 11.

    PubMed

    Hoven, Christina W; Duarte, Cristiane S; Lucas, Christopher P; Wu, Ping; Mandell, Donald J; Goodwin, Renee D; Cohen, Michael; Balaban, Victor; Woodruff, Bradley A; Bin, Fan; Musa, George J; Mei, Lori; Cantor, Pamela A; Aber, J Lawrence; Cohen, Patricia; Susser, Ezra

    2005-05-01

Children exposed to a traumatic event may be at higher risk for developing mental disorders. The prevalence of child psychopathology, however, has not been assessed in a population-based sample exposed to different levels of mass trauma or across a range of disorders. To determine prevalence and correlates of probable mental disorders among New York City, NY, public school students 6 months following the September 11, 2001, World Trade Center attack. Survey. New York City public schools. A citywide, random, representative sample of 8236 students in grades 4 through 12, including oversampling in closest proximity to the World Trade Center site (ground zero) and other high-risk areas. Children were screened for probable mental disorders with the Diagnostic Interview Schedule for Children Predictive Scales. One or more of 6 probable anxiety/depressive disorders were identified in 28.6% of all children. The most prevalent were probable agoraphobia (14.8%), probable separation anxiety (12.3%), and probable posttraumatic stress disorder (10.6%). Higher levels of exposure correspond to higher prevalence for all probable anxiety/depressive disorders. Girls and children in grades 4 and 5 were the most affected. In logistic regression analyses, child's exposure (adjusted odds ratio, 1.62), exposure of a child's family member (adjusted odds ratio, 1.80), and the child's prior trauma (adjusted odds ratio, 2.01) were related to increased likelihood of probable anxiety/depressive disorders. Results were adjusted for different types of exposure, sociodemographic characteristics, and child mental health service use. A high proportion of New York City public school children had a probable mental disorder 6 months after September 11, 2001. The data suggest that there is a relationship between level of exposure to trauma and likelihood of child anxiety/depressive disorders in the community. The results support the need to apply wide-area epidemiological approaches to mental health assessment after any large-scale disaster.

  9. A risk assessment method for multi-site damage

    NASA Astrophysics Data System (ADS)

    Millwater, Harry Russell, Jr.

This research focused on developing probabilistic methods suitable for computing small probabilities of failure, e.g., 10^-6, of structures subject to multi-site damage (MSD). MSD is defined as the simultaneous development of fatigue cracks at multiple sites in the same structural element such that the fatigue cracks may coalesce to form one large crack. MSD is modeled as an array of collinear cracks with random initial crack lengths with the centers of the initial cracks spaced uniformly apart. The data used was chosen to be representative of aluminum structures. The structure is considered failed whenever any two adjacent cracks link up. A fatigue computer model is developed that can accurately and efficiently grow a collinear array of arbitrary length cracks from initial size until failure. An algorithm is developed to compute the stress intensity factors of all cracks considering all interaction effects. The probability of failure of two to 100 cracks is studied. Lower bounds on the probability of failure are developed based upon the probability of the largest crack exceeding a critical crack size. The critical crack size is based on the initial crack size that will grow across the ligament when the neighboring crack has zero length. The probability is evaluated using extreme value theory. An upper bound is based on the probability of the maximum sum of initial cracks being greater than a critical crack size. A weakest link sampling approach is developed that can accurately and efficiently compute small probabilities of failure. This methodology is based on predicting the weakest link, i.e., the two cracks to link up first, for a realization of initial crack sizes, and computing the cycles-to-failure using these two cracks. Criteria to determine the weakest link are discussed. Probability results using the weakest link sampling method are compared to Monte Carlo-based benchmark results. The results indicate that very small probabilities can be computed accurately in a few minutes using a Hewlett-Packard workstation.
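    The extreme-value lower bound described, P(max of n initial cracks > a_crit) = 1 - F(a_crit)^n, can be sketched and cross-checked by Monte Carlo. The lognormal crack-size distribution and every parameter value below are hypothetical stand-ins, not the paper's data:

```python
import math
import random

def p_max_exceeds(n_cracks, a_crit, mu=-2.0, sigma=0.5):
    """Lower-bound failure probability: the chance that the largest of
    n_cracks i.i.d. lognormal initial crack sizes exceeds a_crit,
    via the extreme-value identity P(max > a) = 1 - F(a)^n."""
    # Lognormal CDF evaluated with the error function.
    cdf = 0.5 * (1.0 + math.erf((math.log(a_crit) - mu) / (sigma * math.sqrt(2.0))))
    return 1.0 - cdf ** n_cracks

n_cracks, a_crit, trials = 50, 0.6, 20_000
analytic = p_max_exceeds(n_cracks, a_crit)

# Direct Monte Carlo cross-check of the same quantity.
random.seed(1)
hits = sum(
    max(random.lognormvariate(-2.0, 0.5) for _ in range(n_cracks)) > a_crit
    for _ in range(trials)
)
mc_est = hits / trials
print(analytic, mc_est)  # the two estimates agree closely
```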

  10. Network Sampling with Memory: A proposal for more efficient sampling from social networks.

    PubMed

    Mouw, Ted; Verdery, Ashton M

    2012-08-01

    Techniques for sampling from networks have grown into an important area of research across several fields. For sociologists, the possibility of sampling from a network is appealing for two reasons: (1) A network sample can yield substantively interesting data about network structures and social interactions, and (2) it is useful in situations where study populations are difficult or impossible to survey with traditional sampling approaches because of the lack of a sampling frame. Despite its appeal, methodological concerns about the precision and accuracy of network-based sampling methods remain. In particular, recent research has shown that sampling from a network using a random walk based approach such as Respondent Driven Sampling (RDS) can result in high design effects (DE)-the ratio of the sampling variance to the sampling variance of simple random sampling (SRS). A high design effect means that more cases must be collected to achieve the same level of precision as SRS. In this paper we propose an alternative strategy, Network Sampling with Memory (NSM), which collects network data from respondents in order to reduce design effects and, correspondingly, the number of interviews needed to achieve a given level of statistical power. NSM combines a "List" mode, where all individuals on the revealed network list are sampled with the same cumulative probability, with a "Search" mode, which gives priority to bridge nodes connecting the current sample to unexplored parts of the network. We test the relative efficiency of NSM compared to RDS and SRS on 162 school and university networks from Add Health and Facebook that range in size from 110 to 16,278 nodes. The results show that the average design effect for NSM on these 162 networks is 1.16, which is very close to the efficiency of a simple random sample (DE=1), and 98.5% lower than the average DE we observed for RDS.

  11. Network Sampling with Memory: A proposal for more efficient sampling from social networks

    PubMed Central

    Mouw, Ted; Verdery, Ashton M.

    2013-01-01

    Techniques for sampling from networks have grown into an important area of research across several fields. For sociologists, the possibility of sampling from a network is appealing for two reasons: (1) A network sample can yield substantively interesting data about network structures and social interactions, and (2) it is useful in situations where study populations are difficult or impossible to survey with traditional sampling approaches because of the lack of a sampling frame. Despite its appeal, methodological concerns about the precision and accuracy of network-based sampling methods remain. In particular, recent research has shown that sampling from a network using a random walk based approach such as Respondent Driven Sampling (RDS) can result in high design effects (DE)—the ratio of the sampling variance to the sampling variance of simple random sampling (SRS). A high design effect means that more cases must be collected to achieve the same level of precision as SRS. In this paper we propose an alternative strategy, Network Sampling with Memory (NSM), which collects network data from respondents in order to reduce design effects and, correspondingly, the number of interviews needed to achieve a given level of statistical power. NSM combines a “List” mode, where all individuals on the revealed network list are sampled with the same cumulative probability, with a “Search” mode, which gives priority to bridge nodes connecting the current sample to unexplored parts of the network. We test the relative efficiency of NSM compared to RDS and SRS on 162 school and university networks from Add Health and Facebook that range in size from 110 to 16,278 nodes. The results show that the average design effect for NSM on these 162 networks is 1.16, which is very close to the efficiency of a simple random sample (DE=1), and 98.5% lower than the average DE we observed for RDS. PMID:24159246
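    The design effect central to both records can be made concrete with a toy simulation: a population with within-cluster correlation, comparing the variance of a cluster-sample mean against simple random sampling of the same size. All numbers are illustrative, and this sketches DE in general rather than RDS or NSM specifically:

```python
import random
import statistics

random.seed(2)
# Toy population: 100 clusters of 50 people; outcome prevalence varies by
# cluster, so responses within a cluster are positively correlated.
population = []
for _ in range(100):
    p_c = random.betavariate(2, 5)  # cluster-specific prevalence
    population.append([1 if random.random() < p_c else 0 for _ in range(50)])

flat = [y for cluster in population for y in cluster]
n = 200  # total sample size for both designs

def srs_mean():
    return statistics.mean(random.sample(flat, n))

def cluster_mean():
    # Sample 4 whole clusters of 50: same n, but correlated observations.
    chosen = random.sample(population, 4)
    return statistics.mean([y for cl in chosen for y in cl])

reps = 2000
var_srs = statistics.variance([srs_mean() for _ in range(reps)])
var_cluster = statistics.variance([cluster_mean() for _ in range(reps)])
de = var_cluster / var_srs
print(de)  # well above 1: clustering inflates the variance
```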

  12. [Biometric bases: basic concepts of probability calculation].

    PubMed

    Dinya, E

    1998-04-26

The author gives an outline of the basic concepts of probability theory. The bases of event algebra, the definition of probability, the classical probability model and the random variable are presented.

  13. A Dirichlet-Multinomial Bayes Classifier for Disease Diagnosis with Microbial Compositions.

    PubMed

    Gao, Xiang; Lin, Huaiying; Dong, Qunfeng

    2017-01-01

Dysbiosis of microbial communities is associated with various human diseases, raising the possibility of using microbial compositions as biomarkers for disease diagnosis. We have developed a Bayes classifier by modeling microbial compositions with Dirichlet-multinomial distributions, which are widely used to model multicategorical count data with extra variation. The parameters of the Dirichlet-multinomial distributions are estimated from training microbiome data sets based on maximum likelihood. The posterior probability of a microbiome sample belonging to a disease or healthy category is calculated based on Bayes' theorem, using the likelihood values computed from the estimated Dirichlet-multinomial distribution, as well as a prior probability estimated from the training microbiome data set or previously published information on disease prevalence. When tested on real-world microbiome data sets, our method, called DMBC (for Dirichlet-multinomial Bayes classifier), shows better classification accuracy than the only existing Bayesian microbiome classifier based on a Dirichlet-multinomial mixture model and the popular random forest method. The advantage of DMBC is its built-in automatic feature selection, capable of identifying a subset of microbial taxa with the best classification accuracy between different classes of samples based on cross-validation. This unique ability enables DMBC to maintain and even improve its accuracy at modeling species-level taxa. The R package for DMBC is freely available at https://github.com/qunfengdong/DMBC. IMPORTANCE By incorporating prior information on disease prevalence, Bayes classifiers have the potential to estimate disease probability better than other common machine-learning methods. Thus, it is important to develop Bayes classifiers specifically tailored for microbiome data. Our method shows higher classification accuracy than the only existing Bayesian classifier and the popular random forest method, and thus provides an alternative option for using microbial compositions for disease diagnosis.
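    A minimal sketch of the core computation, not the DMBC package itself: a Dirichlet-multinomial log-likelihood combined through Bayes' theorem, with made-up three-taxon parameters (in real use the alpha vectors would be estimated from training data by maximum likelihood):

```python
import math

def dm_log_likelihood(counts, alpha):
    """Log-likelihood of taxon counts under a Dirichlet-multinomial with
    parameter vector alpha. The multinomial coefficient is omitted: it is
    identical across classes for a fixed sample, so it cancels in Bayes'
    theorem."""
    n, a_sum = sum(counts), sum(alpha)
    ll = math.lgamma(a_sum) - math.lgamma(n + a_sum)
    for x, a in zip(counts, alpha):
        ll += math.lgamma(x + a) - math.lgamma(a)
    return ll

def posterior_disease(counts, alpha_disease, alpha_healthy, prior_disease=0.5):
    """P(disease | counts) by Bayes' theorem, computed in log space."""
    ld = dm_log_likelihood(counts, alpha_disease) + math.log(prior_disease)
    lh = dm_log_likelihood(counts, alpha_healthy) + math.log(1.0 - prior_disease)
    m = max(ld, lh)  # log-sum-exp trick for numerical stability
    return math.exp(ld - m) / (math.exp(ld - m) + math.exp(lh - m))

# Hypothetical 3-taxon communities: the "disease" class is enriched in taxon 0.
alpha_d, alpha_h = [8.0, 1.0, 1.0], [2.0, 4.0, 4.0]
p_sick = posterior_disease([30, 5, 5], alpha_d, alpha_h)
p_healthy_sample = posterior_disease([5, 20, 15], alpha_d, alpha_h)
print(p_sick, p_healthy_sample)  # high for the first sample, low for the second
```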

  14. Accounting for length-bias and selection effects in estimating the distribution of menstrual cycle length

    PubMed Central

    Lum, Kirsten J.; Sundaram, Rajeshwari; Louis, Thomas A.

    2015-01-01

    Prospective pregnancy studies are a valuable source of longitudinal data on menstrual cycle length. However, care is needed when making inferences of such renewal processes. For example, accounting for the sampling plan is necessary for unbiased estimation of the menstrual cycle length distribution for the study population. If couples can enroll when they learn of the study as opposed to waiting for the start of a new menstrual cycle, then due to length-bias, the enrollment cycle will be stochastically larger than the general run of cycles, a typical property of prevalent cohort studies. Furthermore, the probability of enrollment can depend on the length of time since a woman’s last menstrual period (a backward recurrence time), resulting in selection effects. We focus on accounting for length-bias and selection effects in the likelihood for enrollment menstrual cycle length, using a recursive two-stage approach wherein we first estimate the probability of enrollment as a function of the backward recurrence time and then use it in a likelihood with sampling weights that account for length-bias and selection effects. To broaden the applicability of our methods, we augment our model to incorporate a couple-specific random effect and time-independent covariate. A simulation study quantifies performance for two scenarios of enrollment probability when proper account is taken of sampling plan features. In addition, we estimate the probability of enrollment and the distribution of menstrual cycle length for the study population of the Longitudinal Investigation of Fertility and the Environment Study. PMID:25027273
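    Length-bias is easy to demonstrate by simulation: if a cycle is intercepted with probability proportional to its length, the enrolled cycles are stochastically larger, and inverse-length (Horvitz-Thompson-style) weights recover the population mean. The cycle-length distribution below is invented for illustration, not drawn from the study:

```python
import bisect
import random
from itertools import accumulate

random.seed(3)
# Hypothetical population of menstrual cycle lengths in days (mildly right-skewed).
cycles = [random.gauss(29.0, 3.0) + random.expovariate(0.5) for _ in range(50_000)]
cum = list(accumulate(cycles))

def sample_length_biased():
    """Intercept a cycle with probability proportional to its length,
    as happens when couples enroll at an arbitrary calendar time."""
    r = random.uniform(0.0, cum[-1])
    return cycles[bisect.bisect_left(cum, r)]

enrolled = [sample_length_biased() for _ in range(5_000)]
true_mean = sum(cycles) / len(cycles)
biased_mean = sum(enrolled) / len(enrolled)
# Weighting each enrolled cycle by 1/length undoes the bias:
corrected_mean = len(enrolled) / sum(1.0 / c for c in enrolled)
print(true_mean, biased_mean, corrected_mean)
```

The enrolled-cycle mean overshoots the population mean, while the weighted estimate lands back on it.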

  15. Determining the authenticity of athlete urine in doping control by DNA analysis.

    PubMed

    Devesse, Laurence; Syndercombe Court, Denise; Cowan, David

    2015-10-01

    The integrity of urine samples collected from athletes for doping control is essential. The authenticity of samples may be contested, leading to the need for a robust sample identification method. DNA typing using short tandem repeats (STR) can be used for identification purposes, but its application to cellular DNA in urine has so far been limited. Here, a reliable and accurate method is reported for the successful identification of urine samples, using reduced final extraction volumes and the STR multiplex kit, Promega® PowerPlex ESI 17, with capillary electrophoretic characterisation of the alleles. Full DNA profiles were obtained for all samples (n = 20) stored for less than 2 days at 4 °C. The effect of different storage conditions on yield of cellular DNA and probability of obtaining a full profile were also investigated. Storage for 21 days at 4 °C resulted in allelic drop-out in some samples, but the random match probabilities obtained demonstrate the high power of discrimination achieved through targeting a large number of STRs. The best solution for long-term storage was centrifugation and removal of supernatant prior to freezing at -20 °C. The method is robust enough for incorporation into current anti-doping protocols, and was successfully applied to 44 athlete samples for anti-doping testing with 100% concordant typing. Copyright © 2015 John Wiley & Sons, Ltd.

  16. Quantum supremacy in constant-time measurement-based computation: A unified architecture for sampling and verification

    NASA Astrophysics Data System (ADS)

    Miller, Jacob; Sanders, Stephen; Miyake, Akimasa

    2017-12-01

While quantum speed-up in solving certain decision problems by a fault-tolerant universal quantum computer has been promised, a timely research interest includes how far one can reduce the resource requirement to demonstrate a provable advantage in quantum devices without demanding quantum error correction, which is crucial for prolonging the coherence time of qubits. We propose a model device made of locally interacting multiple qubits, designed such that simultaneous single-qubit measurements on it can output probability distributions whose average-case sampling is classically intractable, under similar assumptions as the sampling of noninteracting bosons and instantaneous quantum circuits. Notably, in contrast to these previous unitary-based realizations, our measurement-based implementation has two distinctive features. (i) Our implementation involves no adaptation of measurement bases, allowing output probability distributions to be generated in constant time, independent of the system size. Thus, it could be implemented in principle without quantum error correction. (ii) Verifying the classical intractability of our sampling is done by changing the Pauli measurement bases only at certain output qubits. Our usage of random commuting quantum circuits in place of computationally universal circuits allows a unique unification of sampling and verification, so they require the same physical resource requirements in contrast to the more demanding verification protocols seen elsewhere in the literature.

  17. Discrepancy-based error estimates for Quasi-Monte Carlo III. Error distributions and central limits

    NASA Astrophysics Data System (ADS)

    Hoogland, Jiri; Kleiss, Ronald

    1997-04-01

    In Quasi-Monte Carlo integration, the integration error is believed to be generally smaller than in classical Monte Carlo with the same number of integration points. Using an appropriate definition of an ensemble of quasi-random point sets, we derive various results on the probability distribution of the integration error, which can be compared to the standard Central Limit Theorem for normal stochastic sampling. In many cases, a Gaussian error distribution is obtained.
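    The contrast with normal stochastic sampling can be seen in a small experiment: integrating a smooth function with a base-2 van der Corput (quasi-random) sequence versus pseudo-random points. The integrand and point count are arbitrary choices for illustration:

```python
import random

def van_der_corput(i, base=2):
    """i-th element of the base-b van der Corput low-discrepancy sequence
    (bit reversal of i about the radix point)."""
    q, denom = 0.0, 1.0
    while i > 0:
        denom *= base
        i, r = divmod(i, base)
        q += r / denom
    return q

def integrand(x):
    return x * x  # integral over [0, 1] is exactly 1/3

n = 4096
qmc_est = sum(integrand(van_der_corput(i)) for i in range(1, n + 1)) / n

random.seed(4)
mc_est = sum(integrand(random.random()) for _ in range(n)) / n

exact = 1.0 / 3.0
qmc_err, mc_err = abs(qmc_est - exact), abs(mc_est - exact)
print(qmc_err, mc_err)  # the quasi-random error is typically far smaller
```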

  18. Population and performance analyses of four major populations with Illumina's FGx Forensic Genomics System.

    PubMed

    Churchill, Jennifer D; Novroski, Nicole M M; King, Jonathan L; Seah, Lay Hong; Budowle, Bruce

    2017-09-01

    The MiSeq FGx Forensic Genomics System (Illumina) enables amplification and massively parallel sequencing of 59 STRs, 94 identity informative SNPs, 54 ancestry informative SNPs, and 24 phenotypic informative SNPs. Allele frequency and population statistics data were generated for the 172 SNP loci included in this panel on four major population groups (Chinese, African Americans, US Caucasians, and Southwest Hispanics). Single-locus and combined random match probability values were generated for the identity informative SNPs. The average combined STR and identity informative SNP random match probabilities (assuming independence) across all four populations were 1.75E-67 and 2.30E-71 with length-based and sequence-based STR alleles, respectively. Ancestry and phenotype predictions were obtained using the ForenSeq™ Universal Analysis System (UAS; Illumina) based on the ancestry informative and phenotype informative SNP profiles generated for each sample. Additionally, performance metrics, including profile completeness, read depth, relative locus performance, and allele coverage ratios, were evaluated and detailed for the 725 samples included in this study. While some genetic markers included in this panel performed notably better than others, performance across populations was generally consistent. The performance and population data included in this study support that accurate and reliable profiles were generated and provide valuable background information for laboratories considering internal validation studies and implementation. Copyright © 2017 Elsevier B.V. All rights reserved.
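    The combined random match probabilities quoted rest on the product rule: under Hardy-Weinberg proportions and independence between loci, per-locus genotype frequencies multiply. A toy sketch with invented allele frequencies, not the panel's population data:

```python
from math import prod

def genotype_frequency(p, q=None):
    """Hardy-Weinberg genotype frequency: p^2 for a homozygote,
    2pq for a heterozygote carrying alleles with frequencies p and q."""
    return p * p if q is None else 2.0 * p * q

# Hypothetical evidence profile over three loci.
per_locus = [
    genotype_frequency(0.10, 0.20),  # heterozygote: 2 * 0.10 * 0.20 = 0.04
    genotype_frequency(0.30),        # homozygote:   0.30^2         = 0.09
    genotype_frequency(0.05, 0.15),  # heterozygote:                  0.015
]
combined_rmp = prod(per_locus)  # product rule across independent loci
print(combined_rmp)  # ≈ 5.4e-05
```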

  19. Effect of ambient temperature storage on potable water coliform population estimations.

    PubMed Central

    Standridge, J H; Delfino, J J

    1983-01-01

The effect of the length of time between sampling potable water and performing coliform analyses has been a long-standing controversial issue in environmental microbiology. The issue is of practical importance, since reducing the sample-to-analysis time may substantially increase costs for water analysis programs. Randomly selected samples (from those routinely collected throughout the State of Wisconsin) were analyzed for total coliforms after being held at room temperature (20 +/- 2 degrees C) for 24 and 48 h. Differences in results for the two holding times were compared with differences predicted by probability calculations. The study showed that storage of the potable water for up to 48 h had little effect on the public health significance of most samples containing more than two coliforms per 100 ml. PMID:6651296

  20. Chemical classification of iron meteorites. XI. Multi-element studies of 38 new irons and the high abundance of ungrouped irons from Antarctica

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wasson, J.T.; Ouyang, Xinwei; Wang, Jianmin

    1989-03-01

The authors report concentrations of 14 elements in the metal of 38 iron meteorites and a pallasite. The meteorites are classified based on these data and on structural observations. Three samples are paired with previously classified irons; thus, these additional 35 irons raise the number of well-classified, independent iron meteorites to 598. One Yamato iron contains 342 mg/g Ni, the second highest Ni content in an IAB iron after Oktibbeha County. Two small irons from Western Australia appear to be metal nodules from mesosiderites. Several of the new irons are from Antarctica. Of 24 independent irons from Antarctica, 8 are ungrouped. The fraction, 0.333, is much higher than the fraction 0.161 among all 598 classified irons. Statistical tests show that it is highly improbable (approximately 2.9% probability) that the Antarctic population is a random sample of the larger population. The difference is probably related to the fact that the median mass of Antarctic irons is about two orders of magnitude smaller than that of non-Antarctic irons. It is doubtful that the difference results from fragmentation patterns yielding different size distributions favoring smaller masses among ungrouped irons. More likely is the possibility that smaller meteoroids tend to sample a larger number of asteroidal source regions, perhaps because small meteoroids tend to have higher ejection velocities or because small meteoroids have random-walked a greater increment of orbital semimajor axis away from that of the parent body.
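    The quoted ~2.9% can be reproduced under a simple binomial model: the chance of 8 or more ungrouped irons among 24 if each were independently ungrouped with probability 0.161 (the paper's exact statistical test may differ in detail):

```python
import math

def binom_tail(n, k, p):
    """Exact upper-tail probability P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, j) * p**j * (1.0 - p)**(n - j) for j in range(k, n + 1))

tail = binom_tail(24, 8, 0.161)
print(tail)  # ≈ 0.029, matching the ~2.9% quoted
```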

  1. Sample size requirements for the design of reliability studies: precision consideration.

    PubMed

    Shieh, Gwowen

    2014-09-01

    In multilevel modeling, the intraclass correlation coefficient based on the one-way random-effects model is routinely employed to measure the reliability or degree of resemblance among group members. To facilitate the advocated practice of reporting confidence intervals in future reliability studies, this article presents exact sample size procedures for precise interval estimation of the intraclass correlation coefficient under various allocation and cost structures. Although the suggested approaches do not admit explicit sample size formulas and require special algorithms for carrying out iterative computations, they are more accurate than the closed-form formulas constructed from large-sample approximations with respect to the expected width and assurance probability criteria. This investigation notes the deficiency of existing methods and expands the sample size methodology for the design of reliability studies that have not previously been discussed in the literature.

  2. Prediction of penicillin resistance in Staphylococcus aureus isolates from dairy cows with mastitis, based on prior test results.

    PubMed

    Grinberg, A; Lopez-Villalobos, N; Lawrence, K; Nulsen, M

    2005-10-01

To gauge how well prior laboratory test results predict in vitro penicillin resistance of Staphylococcus aureus isolates from dairy cows with mastitis. Population-based data on the farm of origin (n=79), genotype based on pulsed-field gel electrophoresis (PFGE) results, and the penicillin-resistance status of Staph. aureus isolates (n=115) from milk samples collected from dairy cows with mastitis submitted to two diagnostic laboratories over a 6-month period were used. Data were mined stochastically using the all-possible-pairs method, binomial modelling and bootstrap simulation, to test whether prior test results enhance the accuracy of prediction of penicillin resistance on farms. Of all Staph. aureus isolates tested, 38% were penicillin resistant. A significant aggregation of penicillin-resistance status was evident within farms. The probability of random pairs of isolates from the same farm having the same penicillin-resistance status was 76%, compared with 53% for random pairings of samples across all farms. Thus, the resistance status of randomly selected isolates was 1.43 times more likely to correctly predict the status of other isolates from the same farm than the random population pairwise concordance probability (p=0.011). This effect was likely due to the clonal relationship of isolates within farms, as the predictive fraction attributable to prior test results was close to nil when the effect of within-farm clonal infections was withdrawn from the model. Knowledge of the penicillin-resistance status of a prior Staph. aureus isolate significantly enhanced the predictive capability of other isolates from the same farm. In the time and space frame of this study, clinicians using previous information from a farm would have more accurately predicted the penicillin-resistance status of an isolate than they would by chance alone on farms infected with clonal Staph. aureus isolates, but not on farms infected with highly genetically heterogeneous bacterial strains.
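    The all-possible-pairs comparison can be sketched with simulated farms. Here farms are idealized as perfectly clonal (every isolate on a farm shares one resistance status), which makes the within-farm concordance exactly 1 while random population pairs agree only about half the time; all counts and probabilities are illustrative, not the study's data:

```python
import random
from itertools import combinations

random.seed(5)
# Hypothetical farms: each carries a single clone, resistant with probability 0.38.
farms = []
for _ in range(79):
    resistant = random.random() < 0.38
    farms.append([resistant] * random.randint(1, 3))  # 1-3 isolates per farm

def concordance(pairs):
    """Fraction of isolate pairs sharing the same resistance status."""
    return sum(a == b for a, b in pairs) / len(pairs)

within_pairs = [pair for farm in farms if len(farm) > 1
                for pair in combinations(farm, 2)]
all_isolates = [status for farm in farms for status in farm]
random_pairs = [tuple(random.sample(all_isolates, 2)) for _ in range(10_000)]

within_c, across_c = concordance(within_pairs), concordance(random_pairs)
print(within_c, across_c)  # within-farm pairs agree far more often
```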

3. Return probabilities and hitting times of random walks on sparse Erdős-Rényi graphs.

    PubMed

    Martin, O C; Sulc, P

    2010-03-01

We consider random walks on random graphs, focusing on return probabilities and hitting times for sparse Erdős-Rényi graphs. Using the tree approach, which is expected to be exact in the large graph limit, we show how to solve for the distribution of these quantities, and we find that these distributions exhibit a form of self-similarity.
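    Return probabilities on sparse Erdős-Rényi graphs are also easy to estimate directly; a small simulation with arbitrary size and mean degree. For a two-step walk the walker must step straight back along the edge it just used, so the return probability is roughly E[1/degree] of the first neighbour:

```python
import random

def erdos_renyi(n, avg_degree, rng):
    """Sparse G(n, p) with p = avg_degree / n, as an adjacency list."""
    p = avg_degree / n
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def return_probability(adj, t, walks, rng):
    """Monte Carlo estimate of P(walker is back at its start after t steps)."""
    nodes = [v for v in range(len(adj)) if adj[v]]  # ignore isolated vertices
    returns = 0
    for _ in range(walks):
        start = v = rng.choice(nodes)
        for _ in range(t):
            v = rng.choice(adj[v])
        returns += (v == start)
    return returns / walks

rng = random.Random(6)
adj = erdos_renyi(1000, 4.0, rng)
rp2 = return_probability(adj, 2, 10_000, rng)
print(rp2)  # ≈ (1 - e^-4)/4 ≈ 0.245 for mean degree 4
```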

  4. Public attitudes toward stuttering in Turkey: probability versus convenience sampling.

    PubMed

    Ozdemir, R Sertan; St Louis, Kenneth O; Topbaş, Seyhun

    2011-12-01

    A Turkish translation of the Public Opinion Survey of Human Attributes-Stuttering (POSHA-S) was used to compare probability versus convenience sampling to measure public attitudes toward stuttering. A convenience sample of adults in Eskişehir, Turkey was compared with two replicates of a school-based, probability cluster sampling scheme. The two replicates of the probability sampling scheme yielded similar demographic samples, both of which were different from the convenience sample. Components of subscores on the POSHA-S were significantly different in more than half of the comparisons between convenience and probability samples, indicating important differences in public attitudes. If POSHA-S users intend to generalize to specific geographic areas, results of this study indicate that probability sampling is a better research strategy than convenience sampling. The reader will be able to: (1) discuss the difference between convenience sampling and probability sampling; (2) describe a school-based probability sampling scheme; and (3) describe differences in POSHA-S results from convenience sampling versus probability sampling. Copyright © 2011 Elsevier Inc. All rights reserved.

  5. Forward and inverse uncertainty quantification using multilevel Monte Carlo algorithms for an elliptic non-local equation

    DOE PAGES

    Jasra, Ajay; Law, Kody J. H.; Zhou, Yan

    2016-01-01

Our paper considers uncertainty quantification for an elliptic nonlocal equation. In particular, it is assumed that the parameters which define the kernel in the nonlocal operator are uncertain and a priori distributed according to a probability measure. It is shown that the induced probability measure on some quantities of interest arising from functionals of the solution to the equation with random inputs is well-defined, as is the posterior distribution on parameters given observations. As the elliptic nonlocal equation cannot be solved exactly, approximate posteriors are constructed. The multilevel Monte Carlo (MLMC) and multilevel sequential Monte Carlo (MLSMC) sampling algorithms are used for a priori and a posteriori estimation, respectively, of quantities of interest. Furthermore, these algorithms reduce the amount of work to estimate posterior expectations, for a given level of error, relative to Monte Carlo and i.i.d. sampling from the posterior at a given level of approximation of the solution of the elliptic nonlocal equation.

  6. Forward and inverse uncertainty quantification using multilevel Monte Carlo algorithms for an elliptic non-local equation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jasra, Ajay; Law, Kody J. H.; Zhou, Yan

Our paper considers uncertainty quantification for an elliptic nonlocal equation. In particular, it is assumed that the parameters which define the kernel in the nonlocal operator are uncertain and a priori distributed according to a probability measure. It is shown that the induced probability measure on some quantities of interest arising from functionals of the solution to the equation with random inputs is well-defined, as is the posterior distribution on parameters given observations. As the elliptic nonlocal equation cannot be solved exactly, approximate posteriors are constructed. The multilevel Monte Carlo (MLMC) and multilevel sequential Monte Carlo (MLSMC) sampling algorithms are used for a priori and a posteriori estimation, respectively, of quantities of interest. Furthermore, these algorithms reduce the amount of work to estimate posterior expectations, for a given level of error, relative to Monte Carlo and i.i.d. sampling from the posterior at a given level of approximation of the solution of the elliptic nonlocal equation.
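    The multilevel idea can be illustrated on a much simpler problem than the nonlocal equation: estimating E[X_T] for a geometric Brownian motion, where level l uses an Euler discretization with step 2^-l and each correction term couples fine and coarse paths through the same Brownian increments. Everything below (the SDE, coefficients, sample allocations) is a standalone toy, not the paper's setup:

```python
import math
import random

def coupled_payoffs(level, rng):
    """Euler-Maruyama simulation of dX = 0.05 X dt + 0.2 X dW on [0, 1]
    at step h = 2^-level, returning (fine, coarse) endpoint values driven
    by the SAME Brownian increments so their difference has low variance."""
    steps = 2 ** level
    h = 1.0 / steps
    x_fine = x_coarse = 1.0
    dw_pair = 0.0
    for k in range(steps):
        dw = rng.gauss(0.0, math.sqrt(h))
        x_fine += 0.05 * x_fine * h + 0.2 * x_fine * dw
        dw_pair += dw
        if level > 0 and k % 2 == 1:  # one coarse step per two fine steps
            x_coarse += 0.05 * x_coarse * (2.0 * h) + 0.2 * x_coarse * dw_pair
            dw_pair = 0.0
    return x_fine, (x_coarse if level > 0 else 0.0)

def mlmc_estimate(max_level, samples_per_level, rng):
    """Telescoping sum E[P_L] = E[P_0] + sum_l E[P_l - P_(l-1)],
    with fewer samples at the expensive fine levels."""
    total = 0.0
    for level in range(max_level + 1):
        n = samples_per_level[level]
        acc = 0.0
        for _ in range(n):
            fine, coarse = coupled_payoffs(level, rng)
            acc += fine - (coarse if level > 0 else 0.0)
        total += acc / n
    return total

rng = random.Random(7)
est = mlmc_estimate(5, [4000, 2000, 1000, 500, 250, 125], rng)
print(est)  # ≈ exp(0.05) ≈ 1.051
```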

  7. Ant-inspired density estimation via random walks.

    PubMed

    Musco, Cameron; Su, Hsin-Hao; Lynch, Nancy A

    2017-10-03

    Many ant species use distributed population density estimation in applications ranging from quorum sensing, to task allocation, to appraisal of enemy colony strength. It has been shown that ants estimate local population density by tracking encounter rates: The higher the density, the more often the ants bump into each other. We study distributed density estimation from a theoretical perspective. We prove that a group of anonymous agents randomly walking on a grid are able to estimate their density within a small multiplicative error in few steps by measuring their rates of encounter with other agents. Despite dependencies inherent in the fact that nearby agents may collide repeatedly (and, worse, cannot recognize when this happens), our bound nearly matches what would be required to estimate density by independently sampling grid locations. From a biological perspective, our work helps shed light on how ants and other social insects can obtain relatively accurate density estimates via encounter rates. From a technical perspective, our analysis provides tools for understanding complex dependencies in the collision probabilities of multiple random walks. We bound the strength of these dependencies using local mixing properties of the underlying graph. Our results extend beyond the grid to more general graphs, and we discuss applications to size estimation for social networks, density estimation for robot swarms, and random walk-based sampling for sensor networks.
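    The encounter-rate mechanism is straightforward to simulate: agents random-walk on a torus grid, and each agent's per-step rate of sharing a cell with others approximates the global density n/area. The grid size, agent count, and step count below are arbitrary:

```python
import random
from collections import Counter

def encounter_rate(grid, n_agents, steps, rng):
    """Agents random-walk on a grid torus; returns the average number of
    same-cell encounters per agent per step, which estimates density."""
    pos = [(rng.randrange(grid), rng.randrange(grid)) for _ in range(n_agents)]
    encounters = 0
    for _ in range(steps):
        new_pos = []
        for x, y in pos:
            dx, dy = rng.choice(((1, 0), (-1, 0), (0, 1), (0, -1)))
            new_pos.append(((x + dx) % grid, (y + dy) % grid))
        pos = new_pos
        occupancy = Counter(pos)
        # c agents in one cell yield c * (c - 1) ordered encounter events.
        encounters += sum(c * (c - 1) for c in occupancy.values())
    return encounters / (n_agents * steps)

rng = random.Random(8)
rate = encounter_rate(grid=40, n_agents=160, steps=500, rng=rng)
print(rate)  # ≈ true density 160 / 40^2 = 0.1
```

As the paper notes, repeated collisions between nearby agents make successive counts dependent, but the expectation still tracks the density.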

  8. Validating long-term satellite-derived disturbance products: the case of burned areas

    NASA Astrophysics Data System (ADS)

    Boschetti, L.; Roy, D. P.

    2015-12-01

    The potential research, policy and management applications of satellite products place a high priority on providing statements about their accuracy. A number of NASA, ESA and EU funded global and continental burned area products have been developed using coarse spatial resolution satellite data, and have the potential to become part of a long-term fire Climate Data Record. These products have usually been validated by comparison with reference burned area maps derived by visual interpretation of Landsat or similar spatial resolution data selected on an ad hoc basis. More optimally, a design-based validation method should be adopted that is characterized by the selection of reference data via a probability sampling that can subsequently be used to compute accuracy metrics, taking into account the sampling probability. Design-based techniques have been used for annual land cover and land cover change product validation, but have not been widely used for burned area products, or for the validation of global products that are highly variable in time and space (e.g. snow, floods or other non-permanent phenomena). This has been due to the challenge of designing an appropriate sampling strategy, and to the cost of collecting independent reference data. We propose a tri-dimensional sampling grid that allows for probability sampling of Landsat data in time and in space. To sample the globe in the spatial domain with non-overlapping sampling units, the Thiessen Scene Area (TSA) tessellation of the Landsat WRS path/rows is used. The TSA grid is then combined with the 16-day Landsat acquisition calendar to provide tri-dimensional elements (voxels). This allows the implementation of a sampling design where not only the location but also the time interval of the reference data is explicitly drawn by probability sampling. The proposed sampling design is a stratified random sampling, with two-level stratification of the voxels based on biomes and fire activity (Figure 1). 
The novel validation approach, used for the validation of the MODIS and forthcoming VIIRS global burned area products, is a general one, and could be used for the validation of other global products that are highly variable in space and time, as is required to assess the accuracy of climate records. The approach is demonstrated using a 1-year dataset of MODIS fire products.

  9. A Bayesian pick-the-winner design in a randomized phase II clinical trial.

    PubMed

    Chen, Dung-Tsa; Huang, Po-Yu; Lin, Hui-Yi; Chiappori, Alberto A; Gabrilovich, Dmitry I; Haura, Eric B; Antonia, Scott J; Gray, Jhanelle E

    2017-10-24

    Many phase II clinical trials evaluate unique experimental drugs/combinations through multi-arm designs to expedite the screening process (early termination of ineffective drugs) and to identify the most effective drug (pick the winner) to warrant a phase III trial. Various statistical approaches have been developed for the pick-the-winner design but have been criticized for lack of objective comparison among the drug agents. We developed a Bayesian pick-the-winner design by integrating a Bayesian posterior probability with the Simon two-stage design in a randomized two-arm clinical trial. The Bayesian posterior probability, as the rule to pick the winner, is defined as the probability that the response rate in one arm is higher than in the other arm. The posterior probability aims to determine the winner when both arms pass the second stage of the Simon two-stage design. When both arms are competitive (i.e., both passing the second stage), the Bayesian posterior probability performs better than the Fisher exact test at correctly identifying the winner in the simulation study. In comparison to a standard two-arm randomized design, the Bayesian pick-the-winner design has a higher power to determine a clear winner. In application to two studies, the approach is able to perform statistical comparison of the two treatment arms and provides a winner probability (Bayesian posterior probability) to statistically justify the winning arm. We developed an integrated design that combines the Bayesian posterior probability, the Simon two-stage design, and randomization in a unique setting. It gives objective comparisons between the arms to determine the winner.
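
    The winner-probability rule can be sketched as a Monte Carlo computation of P(p_A > p_B) under independent Beta posteriors; the uniform Beta(1, 1) priors and the response counts (14/40 vs. 8/40) below are hypothetical illustrations, not values from the trials analyzed in the paper:

```python
import random

random.seed(1)

def prob_arm_a_beats_b(resp_a, n_a, resp_b, n_b, draws=20000):
    """Monte Carlo estimate of P(p_A > p_B) under independent
    Beta(1 + responses, 1 + failures) posteriors (uniform priors)."""
    wins = 0
    for _ in range(draws):
        pa = random.betavariate(1 + resp_a, 1 + n_a - resp_a)
        pb = random.betavariate(1 + resp_b, 1 + n_b - resp_b)
        if pa > pb:
            wins += 1
    return wins / draws

# hypothetical end-of-stage-two results: 14/40 responses vs. 8/40
winner_prob = prob_arm_a_beats_b(14, 40, 8, 40)
print(round(winner_prob, 2))
```

    A winner probability near 1 gives a quantitative justification for the winning arm, which is the role this quantity plays in the design described above.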

  10. Combined statistical analysis of landslide release and propagation

    NASA Astrophysics Data System (ADS)

    Mergili, Martin; Rohmaneo, Mohammad; Chu, Hone-Jay

    2016-04-01

    Statistical methods - often coupled with stochastic concepts - are commonly employed to relate areas affected by landslides with environmental layers, and to estimate spatial landslide probabilities by applying these relationships. However, such methods only concern the release of landslides, disregarding their motion. Conceptual models for mass flow routing are used for estimating landslide travel distances and possible impact areas. Automated approaches combining release and impact probabilities are rare. The present work attempts to fill this gap by a fully automated procedure combining statistical and stochastic elements, building on the open source GRASS GIS software: (1) The landslide inventory is subset into release and deposition zones. (2) We employ a traditional statistical approach to estimate the spatial release probability of landslides. (3) We back-calculate the probability distribution of the angle of reach of the observed landslides, employing the software tool r.randomwalk. One set of random walks is routed downslope from each pixel defined as release area. Each random walk stops when leaving the observed impact area of the landslide. (4) The cumulative probability function (cdf) derived in (3) is used as input to route a set of random walks downslope from each pixel in the study area through the DEM, assigning the probability gained from the cdf to each pixel along the path (impact probability). The impact probability of a pixel is defined as the average impact probability of all sets of random walks impacting a pixel. Further, the average release probabilities of the release pixels of all sets of random walks impacting a given pixel are stored along with the area of the possible release zone. (5) We compute the zonal release probability by increasing the release probability according to the size of the release zone - the larger the zone, the larger the probability that a landslide will originate from at least one pixel within this zone. 
We quantify this relationship by a set of empirical curves. (6) Finally, we multiply the zonal release probability with the impact probability in order to estimate the combined impact probability for each pixel. We demonstrate the model with a 167 km² study area in Taiwan, using an inventory of landslides triggered by the typhoon Morakot. Analyzing the model results leads us to a set of key conclusions: (i) The average composite impact probability over the entire study area corresponds well to the density of observed landslide pixels. Therefore we conclude that the method is valid in general, even though the concept of the zonal release probability bears some conceptual issues that have to be kept in mind. (ii) The parameters used as predictors cannot fully explain the observed distribution of landslides. The size of the release zone influences the composite impact probability to a larger degree than the pixel-based release probability. (iii) The prediction rate increases considerably when excluding the largest, deep-seated, landslides from the analysis. We conclude that such landslides are mainly related to geological features hardly reflected in the predictor layers used.

  11. Recurrence of random walks with long-range steps generated by fractional Laplacian matrices on regular networks and simple cubic lattices

    NASA Astrophysics Data System (ADS)

    Michelitsch, T. M.; Collet, B. A.; Riascos, A. P.; Nowakowski, A. F.; Nicolleau, F. C. G. A.

    2017-12-01

    We analyze a Markovian random walk strategy on undirected regular networks involving power matrix functions of the type L^{α/2}, where L indicates a 'simple' Laplacian matrix. We refer to such walks as 'fractional random walks' with admissible interval 0 < α ≤ 2. We deduce probability-generating functions (network Green's functions) for the fractional random walk. From these analytical results we establish a generalization of Polya's recurrence theorem for fractional random walks on d-dimensional infinite lattices: the fractional random walk is transient for dimensions d > α (recurrent for d ≤ α). As a consequence, for 0 < α < 1 the fractional random walk is transient for all lattice dimensions d = 1, 2, …, and in the range 1 ≤ α < 2 it is transient for dimensions d ≥ 2. Finally, for α = 2, Polya's classical recurrence theorem is recovered, namely the walk is transient only for lattice dimensions d ≥ 3. The generalization of Polya's recurrence theorem remains valid for the class of random walks with Lévy flight asymptotics for long-range steps. We also analyze the mean first passage probabilities, mean residence times, mean first passage times and global mean first passage times (Kemeny constant) for the fractional random walk. For an infinite 1D lattice (infinite ring) we obtain, for the transient regime 0 < α < 1, closed form expressions for the fractional lattice Green's function matrix containing the escape and ever-passage probabilities. The ever-passage probabilities (fractional lattice Green's functions) in the transient regime fulfil Riesz potential power-law decay asymptotics for nodes far from the departure node. The non-locality of the fractional random walk is generated by the non-diagonality of the fractional Laplacian matrix, with Lévy-type heavy-tailed inverse power-law decay for the probability of long-range moves. This non-local and asymptotic behavior of the fractional random walk introduces small-world properties with the emergence of Lévy flights on large (infinite) lattices.

  12. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zhang, Lin, E-mail: godyalin@163.com; Singh, Uttam, E-mail: uttamsingh@hri.res.in; Pati, Arun K., E-mail: akpati@hri.res.in

    Compact expressions for the average subentropy and coherence are obtained for random mixed states that are generated via various probability measures. Surprisingly, our results show that the average subentropy of random mixed states approaches the maximum value of the subentropy which is attained for the maximally mixed state as we increase the dimension. In the special case of the random mixed states sampled from the induced measure via partial tracing of random bipartite pure states, we establish the typicality of the relative entropy of coherence for random mixed states invoking the concentration of measure phenomenon. Our results also indicate that mixed quantum states are less useful compared to pure quantum states in higher dimension when we extract quantum coherence as a resource. This is because of the fact that average coherence of random mixed states is bounded uniformly, however, the average coherence of random pure states increases with the increasing dimension. As an important application, we establish the typicality of relative entropy of entanglement and distillable entanglement for a specific class of random bipartite mixed states. In particular, most of the random states in this specific class have relative entropy of entanglement and distillable entanglement equal to some fixed number (to within an arbitrary small error), thereby hugely reducing the complexity of computation of these entanglement measures for this specific class of mixed states.

  13. Investigation of estimators of probability density functions

    NASA Technical Reports Server (NTRS)

    Speed, F. M.

    1972-01-01

    Four research projects are summarized which include: (1) the generation of random numbers on the IBM 360/44, (2) statistical tests used to check out random number generators, (3) Specht density estimators, and (4) use of estimators of probability density functions in analyzing large amounts of data.

  14. Extraction of linear features on SAR imagery

    NASA Astrophysics Data System (ADS)

    Liu, Junyi; Li, Deren; Mei, Xin

    2006-10-01

    Linear features are usually extracted from SAR imagery by a few edge detectors derived from the contrast ratio edge detector with a constant probability of false alarm. On the other hand, the Hough Transform (HT) is an elegant way of extracting global features like curve segments from binary edge images. The Randomized Hough Transform can reduce the computation time and memory usage of the HT drastically. However, the Randomized Hough Transform invalidates a great number of accumulator cells during random sampling. In this paper, we propose a new approach to extract linear features from SAR imagery: an almost automatic algorithm based on edge detection and the Randomized Hough Transform. The improved method makes full use of the directional information of each candidate edge point so as to solve the invalid-accumulation problem. Applied results are in good agreement with the theoretical study, and the main linear features in SAR imagery have been extracted automatically. The method saves storage space and computational time, which shows its effectiveness and applicability.
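
    The randomized-sampling idea behind the RHT can be sketched as follows, assuming the normal-form line parameterization ρ = x·cos θ + y·sin θ and a simple dictionary accumulator; the synthetic points and the quantization steps are illustrative only, not the paper's method for SAR edges:

```python
import math
import random
from collections import Counter

random.seed(3)

# synthetic "edge image": points on the line y = 2x + 1 plus random clutter
line_pts = [(x, 2 * x + 1) for x in range(50)]
clutter = [(random.randrange(50), random.randrange(100)) for _ in range(30)]
points = line_pts + clutter

def line_params(p, q):
    """Normal form rho = x cos(theta) + y sin(theta) of the line through p, q,
    canonicalized so that theta lies in [0, pi)."""
    (x1, y1), (x2, y2) = p, q
    theta = math.atan2(x2 - x1, y1 - y2)   # direction of the line normal
    rho = x1 * math.cos(theta) + y1 * math.sin(theta)
    if theta < 0:
        theta += math.pi
        rho = -rho
    return theta, rho

acc = Counter()
for _ in range(2000):
    p, q = random.sample(points, 2)        # random pair, not a full scan
    if p == q:
        continue                           # coincident duplicates: skip
    theta, rho = line_params(p, q)
    acc[(round(theta, 1), round(rho))] += 1   # quantized accumulator cell

(best_theta, best_rho), votes = acc.most_common(1)[0]
print(best_theta, best_rho, votes)
```

    Pairs of collinear points all vote for the same (θ, ρ) cell, so the dominant line emerges from a sparse accumulator instead of the dense parameter grid of the classical HT.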

  15. Random isotropic one-dimensional XY-model

    NASA Astrophysics Data System (ADS)

    Gonçalves, L. L.; Vieira, A. P.

    1998-01-01

    The 1D isotropic s = ½ XY-model (N sites), with random exchange interaction in a transverse random field, is considered. The random variables satisfy bimodal quenched distributions. The solution is obtained by using the Jordan-Wigner fermionization and a canonical transformation, reducing the problem to diagonalizing an N × N matrix, corresponding to a system of N noninteracting fermions. The calculations are performed numerically for N = 1000, and the field-induced magnetization at T = 0 is obtained by averaging the results for the different samples. For the dilute case, in the uniform field limit, the magnetization exhibits various discontinuities, which are the consequence of the existence of disconnected finite clusters distributed along the chain. Also in this limit, for finite exchange constants J_A and J_B, as the probability of J_A varies from one to zero, the saturation field is seen to vary from Γ_A to Γ_B, where Γ_A (Γ_B) is the value of the saturation field for the pure case with exchange constant equal to J_A (J_B).

  16. Prevalence of paratuberculosis in the dairy goat and dairy sheep industries in Ontario, Canada.

    PubMed

    Bauman, Cathy A; Jones-Bitton, Andria; Menzies, Paula; Toft, Nils; Jansen, Jocelyn; Kelton, David

    2016-02-01

    A cross-sectional study was undertaken (October 2010 to August 2011) to estimate the prevalence of paratuberculosis in the small ruminant dairy industries in Ontario, Canada. Blood and feces were sampled from 580 goats and 397 sheep (lactating and 2 y of age or older) that were randomly selected from 29 randomly selected dairy goat herds and 21 convenience-selected dairy sheep flocks. Fecal samples were analyzed using bacterial culture (BD BACTEC MGIT 960) and polymerase chain reaction (Tetracore); serum samples were tested with the Prionics Parachek enzyme-linked immunosorbent assay (ELISA). Using 3-test latent class Bayesian models, true farm-level prevalence was estimated to be 83.0% [95% probability interval (PI): 62.6% to 98.1%] for dairy goats and 66.8% (95% PI: 41.6% to 91.4%) for dairy sheep. The within-farm true prevalence for dairy goats was 35.2% (95% PI: 23.0% to 49.8%) and for dairy sheep was 48.3% (95% PI: 27.6% to 74.3%). These data indicate that a paratuberculosis control program for small ruminants is needed in Ontario.

  17. Reasons for nonresponse in a web-based survey of alcohol involvement among first-year college students.

    PubMed

    Cranford, James A; McCabe, Sean Esteban; Boyd, Carol J; Slayden, Janie; Reed, Mark B; Ketchie, Julie M; Lange, James E; Scott, Marcia S

    2008-01-01

    This study conducted a follow-up telephone survey of a probability sample of college students who did not respond to a Web survey to determine correlates of and reasons for nonresponse. A stratified random sample of 2502 full-time first-year undergraduate students was invited to participate in a Web-based survey. A random sample of 221 students who did not respond to the original Web survey completed an abbreviated version of the original survey by telephone. Nonresponse did not vary by gender, but nonresponse was higher among Blacks and Hispanics compared to Whites, and Blacks compared to Asians. Nonresponders reported lower frequency of past 28 days drinking, lower levels of past-year and past 28-days heavy episodic drinking, and more time spent preparing for classes than responders. The most common reasons for nonresponse were "too busy" (45.7%), "not interested" (18.1%), and "forgot to complete survey" (18.1%). Reasons for nonresponse to Web surveys among college students are similar to reasons for nonresponse to mail and telephone surveys, and some nonresponse reasons vary as a function of alcohol involvement.

  18. Probability of loss of assured safety in temperature dependent systems with multiple weak and strong links.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Johnson, Jay Dean; Oberkampf, William Louis; Helton, Jon Craig

    2004-12-01

    Relationships to determine the probability that a weak link (WL)/strong link (SL) safety system will fail to function as intended in a fire environment are investigated. In the systems under study, failure of the WL system before failure of the SL system is intended to render the overall system inoperational and thus prevent the possible occurrence of accidents with potentially serious consequences. Formal developments of the probability that the WL system fails to deactivate the overall system before failure of the SL system (i.e., the probability of loss of assured safety, PLOAS) are presented for several WL/SL configurations: (i) one WL, one SL, (ii) multiple WLs, multiple SLs with failure of any SL before any WL constituting failure of the safety system, (iii) multiple WLs, multiple SLs with failure of all SLs before any WL constituting failure of the safety system, and (iv) multiple WLs, multiple SLs and multiple sublinks in each SL with failure of any sublink constituting failure of the associated SL and failure of all SLs before failure of any WL constituting failure of the safety system. The indicated probabilities derive from time-dependent temperatures in the WL/SL system and variability (i.e., aleatory uncertainty) in the temperatures at which the individual components of this system fail and are formally defined as multidimensional integrals. Numerical procedures based on quadrature (i.e., trapezoidal rule, Simpson's rule) and also on Monte Carlo techniques (i.e., simple random sampling, importance sampling) are described and illustrated for the evaluation of these integrals. Example uncertainty and sensitivity analyses for PLOAS involving the representation of uncertainty (i.e., epistemic uncertainty) with probability theory and also with evidence theory are presented.
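
    For the simplest configuration (one WL, one SL, and a monotonically increasing temperature curve), PLOAS reduces to the probability that the SL failure temperature falls below the WL failure temperature, and both quadrature and simple random sampling apply directly. The normal failure-temperature distributions below are illustrative assumptions, not values from the report:

```python
import math
import random

random.seed(6)

# illustrative failure-temperature distributions (degrees C)
WL_MU, WL_SD = 400.0, 20.0    # weak link is designed to fail early
SL_MU, SL_SD = 500.0, 30.0

def norm_cdf(x, mu, sd):
    return 0.5 * (1 + math.erf((x - mu) / (sd * math.sqrt(2))))

def norm_pdf(x, mu, sd):
    return math.exp(-((x - mu) / sd) ** 2 / 2) / (sd * math.sqrt(2 * math.pi))

# PLOAS = P(SL failure temp < WL failure temp) for monotone heating
# (1) quadrature: trapezoidal rule on f_SL(t) * (1 - F_WL(t))
a, b, n = 250.0, 650.0, 4000
h = (b - a) / n
vals = [norm_pdf(a + i * h, SL_MU, SL_SD) * (1 - norm_cdf(a + i * h, WL_MU, WL_SD))
        for i in range(n + 1)]
ploas_quad = h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))

# (2) simple random sampling of the same probability
draws = 200000
ploas_mc = sum(random.gauss(SL_MU, SL_SD) < random.gauss(WL_MU, WL_SD)
               for _ in range(draws)) / draws

# closed form for two normals: Phi((mu_WL - mu_SL) / sqrt(sd_WL^2 + sd_SL^2))
ploas_exact = norm_cdf(WL_MU - SL_MU, 0.0, math.hypot(WL_SD, SL_SD))
print(round(ploas_quad, 4), round(ploas_mc, 4), round(ploas_exact, 4))
```

    The agreement between the quadrature and sampling estimates mirrors the report's comparison of trapezoidal/Simpson integration against Monte Carlo evaluation of the PLOAS integrals.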

  19. A robust method using propensity score stratification for correcting verification bias for binary tests

    PubMed Central

    He, Hua; McDermott, Michael P.

    2012-01-01

    Sensitivity and specificity are common measures of the accuracy of a diagnostic test. The usual estimators of these quantities are unbiased if data on the diagnostic test result and the true disease status are obtained from all subjects in an appropriately selected sample. In some studies, verification of the true disease status is performed only for a subset of subjects, possibly depending on the result of the diagnostic test and other characteristics of the subjects. Estimators of sensitivity and specificity based on this subset of subjects are typically biased; this is known as verification bias. Methods have been proposed to correct verification bias under the assumption that the missing data on disease status are missing at random (MAR), that is, the probability of missingness depends on the true (missing) disease status only through the test result and observed covariate information. When some of the covariates are continuous, or the number of covariates is relatively large, the existing methods require parametric models for the probability of disease or the probability of verification (given the test result and covariates), and hence are subject to model misspecification. We propose a new method for correcting verification bias based on the propensity score, defined as the predicted probability of verification given the test result and observed covariates. This is estimated separately for those with positive and negative test results. The new method classifies the verified sample into several subsamples that have homogeneous propensity scores and allows correction for verification bias. Simulation studies demonstrate that the new estimators are more robust to model misspecification than existing methods, but still perform well when the models for the probability of disease and probability of verification are correctly specified. PMID:21856650
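
    The mechanics of verification bias and its correction can be sketched in a small simulation. Note this sketch uses inverse-probability weighting within test-result strata rather than the authors' propensity-score stratification, and all rates below are made-up illustrations:

```python
import random

random.seed(11)

N = 20000
PREV, SE, SP = 0.30, 0.80, 0.90     # true prevalence, sensitivity, specificity
P_VERIFY = {1: 0.9, 0: 0.2}         # verification depends only on the test (MAR)

subjects = []
for _ in range(N):
    d = random.random() < PREV                              # true disease status
    t = (random.random() < SE) if d else (random.random() >= SP)   # test result
    v = random.random() < P_VERIFY[int(t)]                  # verified?
    subjects.append((int(d), int(t), int(v)))

# naive sensitivity: computed from verified subjects only (biased upward,
# because test-positives are verified far more often)
ver = [(d, t) for d, t, v in subjects if v]
naive_se = sum(t for d, t in ver if d) / sum(1 for d, t in ver if d)

# verification propensity estimated within each test-result stratum
prop = {}
for tval in (0, 1):
    stratum = [v for d, t, v in subjects if t == tval]
    prop[tval] = sum(stratum) / len(stratum)

# inverse-probability-weighted (bias-corrected) sensitivity
num = sum(1 / prop[t] for d, t, v in subjects if v and d and t)
den = sum(1 / prop[t] for d, t, v in subjects if v and d)
corrected_se = num / den
print(round(naive_se, 3), round(corrected_se, 3))
```

    With verification driven by the test result, the naive estimate lands well above the true sensitivity of 0.80 while the weighted estimate recovers it; stratifying on an estimated propensity score plays the analogous role when continuous covariates are involved.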

  20. A chi-square goodness-of-fit test for non-identically distributed random variables: with application to empirical Bayes

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Conover, W.J.; Cox, D.D.; Martz, H.F.

    1997-12-01

    When using parametric empirical Bayes estimation methods for estimating the binomial or Poisson parameter, the validity of the assumed beta or gamma conjugate prior distribution is an important diagnostic consideration. Chi-square goodness-of-fit tests of the beta or gamma prior hypothesis are developed for use when the binomial sample sizes or Poisson exposure times vary. Nine examples illustrate the application of the methods, using real data from such diverse applications as the loss of feedwater flow rates in nuclear power plants, the probability of failure to run on demand and the failure rates of the high pressure coolant injection systems at US commercial boiling water reactors, the probability of failure to run on demand of emergency diesel generators in US commercial nuclear power plants, the rate of failure of aircraft air conditioners, baseball batting averages, the probability of testing positive for toxoplasmosis, and the probability of tumors in rats. The tests are easily applied in practice by means of corresponding Mathematica® computer programs which are provided.

  1. Probability theory plus noise: Replies to Crupi and Tentori (2016) and to Nilsson, Juslin, and Winman (2016).

    PubMed

    Costello, Fintan; Watts, Paul

    2016-01-01

    A standard assumption in much of current psychology is that people do not reason about probability using the rules of probability theory but instead use various heuristics or "rules of thumb," which can produce systematic reasoning biases. In Costello and Watts (2014), we showed that a number of these biases can be explained by a model where people reason according to probability theory but are subject to random noise. More importantly, that model also predicted agreement with probability theory for certain expressions that cancel the effects of random noise: Experimental results strongly confirmed this prediction, showing that probabilistic reasoning is simultaneously systematically biased and "surprisingly rational." In their commentaries on that paper, both Crupi and Tentori (2016) and Nilsson, Juslin, and Winman (2016) point to various experimental results that, they suggest, our model cannot explain. In this reply, we show that our probability theory plus noise model can in fact explain every one of the results identified by these authors. This gives a degree of additional support to the view that people's probability judgments embody the rational rules of probability theory and that biases in those judgments can be explained as simply effects of random noise.
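
    The cancellation prediction can be reproduced in a toy version of the model, assuming each judgment reads M stored memory flags and misreads each with probability d. Individual judgments are then biased toward 0.5, yet the expression P(A) + P(B) − P(A∧B) − P(A∨B) should average to zero because the noise terms cancel:

```python
import random

random.seed(5)

D = 0.2        # probability that a stored memory sample is misread
M = 100        # memory samples consulted per judgment
TRIALS = 4000
P_A, P_B = 0.6, 0.5   # independent events, for simplicity

def noisy_mean(flags):
    """Probability judgment: count flags, each misread with probability D."""
    return sum((1 - f) if random.random() < D else f for f in flags) / len(flags)

bias_conj = 0.0
identity = 0.0
for _ in range(TRIALS):
    a = [random.random() < P_A for _ in range(M)]
    b = [random.random() < P_B for _ in range(M)]
    conj = [x and y for x, y in zip(a, b)]
    disj = [x or y for x, y in zip(a, b)]
    pa, pb = noisy_mean(a), noisy_mean(b)
    pconj, pdisj = noisy_mean(conj), noisy_mean(disj)
    bias_conj += pconj - P_A * P_B          # bias of the conjunction judgment
    identity += pa + pb - pconj - pdisj     # noise-cancelling expression

bias_conj /= TRIALS
identity /= TRIALS
print(round(bias_conj, 3), round(identity, 3))
```

    Each noisy judgment has expectation (1 − 2d)p + d, so the conjunction is systematically overestimated, while the four d terms in the addition-law expression cancel exactly; this is the "simultaneously biased and surprisingly rational" pattern described above.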

  2. A quarter of a century of the DBQ: some supplementary notes on its validity with regard to accidents.

    PubMed

    de Winter, Joost C F; Dodou, Dimitra; Stanton, Neville A

    2015-01-01

    This article synthesises the latest information on the relationship between the Driver Behaviour Questionnaire (DBQ) and accidents. We show by means of computer simulation that correlations with accidents are necessarily small because accidents are rare events. An updated meta-analysis on the zero-order correlations between the DBQ and self-reported accidents yielded an overall r of .13 (fixed-effect and random-effects models) for violations (57,480 participants; 67 samples) and .09 (fixed-effect and random-effects models) for errors (66,028 participants; 56 samples). An analysis of a previously published DBQ dataset (975 participants) showed that by aggregating across four measurement occasions, the correlation coefficient with self-reported accidents increased from .14 to .24 for violations and from .11 to .19 for errors. Our meta-analysis also showed that DBQ violations (r = .24; 6353 participants; 20 samples) but not DBQ errors (r = - .08; 1086 participants; 16 samples) correlated with recorded vehicle speed. Practitioner Summary: The DBQ is probably the most widely used self-report questionnaire in driver behaviour research. This study shows that DBQ violations and errors correlate moderately with self-reported traffic accidents.
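
    The gain from aggregating measurement occasions can be reproduced with a small latent-variable simulation, assuming a normal latent tendency measured with unit noise on each occasion; the numbers below are illustrative, not fitted to the DBQ data:

```python
import math
import random

random.seed(8)

N = 6000
OCCASIONS = 4

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

true_tendency = [random.gauss(0, 1) for _ in range(N)]
# outcome only weakly related to the latent tendency (rare-event proxy)
outcome = [0.3 * v + random.gauss(0, math.sqrt(1 - 0.09)) for v in true_tendency]
# four noisy questionnaire administrations per driver
scores = [[v + random.gauss(0, 1) for _ in range(OCCASIONS)] for v in true_tendency]

single = corr([s[0] for s in scores], outcome)
aggregated = corr([sum(s) / OCCASIONS for s in scores], outcome)
print(round(single, 3), round(aggregated, 3))
```

    Averaging four occasions raises the reliability of the score and hence the observed correlation, mirroring the reported increase from .14 to .24 when aggregating across measurement occasions.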

  3. RANdom SAmple Consensus (RANSAC) algorithm for material-informatics: application to photovoltaic solar cells.

    PubMed

    Kaspi, Omer; Yosipof, Abraham; Senderowitz, Hanoch

    2017-06-06

    An important aspect of chemoinformatics and material-informatics is the usage of machine learning algorithms to build Quantitative Structure Activity Relationship (QSAR) models. The RANdom SAmple Consensus (RANSAC) algorithm is a predictive modeling tool widely used in the image processing field for cleaning datasets from noise. RANSAC could be used as a "one stop shop" algorithm for developing and validating QSAR models, performing outlier removal, descriptor selection, model development and prediction for test set samples using an applicability domain. For "future" predictions (i.e., for samples not included in the original test set) RANSAC provides a statistical estimate for the probability of obtaining reliable predictions, i.e., predictions within a pre-defined number of standard deviations from the true values. In this work we describe the first application of RANSAC in material informatics, focusing on the analysis of solar cells. We demonstrate that for three datasets representing different metal oxide (MO) based solar cell libraries, RANSAC-derived models select descriptors previously shown to correlate with key photovoltaic properties and lead to good predictive statistics for these properties. These models were subsequently used to predict the properties of virtual solar cell libraries, highlighting interesting dependencies of PV properties on MO compositions.
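
    A minimal RANSAC sketch for a one-descriptor regression illustrates the consensus idea (outlier removal plus model fitting). The synthetic line, noise levels, and thresholds are assumptions; a real QSAR application would use many descriptors:

```python
import random

random.seed(2)

# synthetic data: y = 3x + 2 with small noise, plus 30% gross outliers
data = []
for i in range(100):
    x = random.uniform(0, 10)
    if i < 70:
        data.append((x, 3 * x + 2 + random.gauss(0, 0.2)))
    else:
        data.append((x, random.uniform(-20, 40)))   # outlier

def ransac_line(points, n_iter=500, tol=1.0):
    """Fit y = a*x + b, keeping the model with the largest inlier set."""
    best_inliers = []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = random.sample(points, 2)   # minimal sample
        if abs(x2 - x1) < 1e-9:
            continue
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # refit by least squares on the consensus (inlier) set
    xs = [x for x, _ in best_inliers]
    n = len(xs)
    mx = sum(xs) / n
    my = sum(y for _, y in best_inliers) / n
    a = (sum((x - mx) * (y - my) for x, y in best_inliers)
         / sum((x - mx) ** 2 for x in xs))
    b = my - a * mx
    return a, b, len(best_inliers)

a, b, n_in = ransac_line(data)
print(round(a, 2), round(b, 2), n_in)
```

    The final least-squares refit uses only the consensus set, so the 30% gross outliers have no influence on the fitted coefficients; this is the "cleaning datasets from noise" behavior that motivates using RANSAC for QSAR outlier removal.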

  4. The Number of Patients and Events Required to Limit the Risk of Overestimation of Intervention Effects in Meta-Analysis—A Simulation Study

    PubMed Central

    Thorlund, Kristian; Imberger, Georgina; Walsh, Michael; Chu, Rong; Gluud, Christian; Wetterslev, Jørn; Guyatt, Gordon; Devereaux, Philip J.; Thabane, Lehana

    2011-01-01

    Background Meta-analyses including a limited number of patients and events are prone to yield overestimated intervention effect estimates. While many assume bias is the cause of overestimation, theoretical considerations suggest that random error may be an equal or more frequent cause. The independent impact of random error on meta-analyzed intervention effects has not previously been explored. It has been suggested that surpassing the optimal information size (i.e., the required meta-analysis sample size) provides sufficient protection against overestimation due to random error, but this claim has not yet been validated. Methods We simulated a comprehensive array of meta-analysis scenarios where no intervention effect existed (i.e., relative risk reduction (RRR) = 0%) or where a small but possibly unimportant effect existed (RRR = 10%). We constructed different scenarios by varying the control group risk, the degree of heterogeneity, and the distribution of trial sample sizes. For each scenario, we calculated the probability of observing overestimates of RRR>20% and RRR>30% for each cumulative 500 patients and 50 events. We calculated the cumulative number of patients and events required to reduce the probability of overestimation of intervention effect to 10%, 5%, and 1%. We calculated the optimal information size for each of the simulated scenarios and explored whether meta-analyses that surpassed their optimal information size had sufficient protection against overestimation of intervention effects due to random error. Results The risk of overestimation of intervention effects was usually high when the number of patients and events was small and this risk decreased exponentially over time as the number of patients and events increased. The number of patients and events required to limit the risk of overestimation depended considerably on the underlying simulation settings. 
Surpassing the optimal information size generally provided sufficient protection against overestimation. Conclusions Random errors are a frequent cause of overestimation of intervention effects in meta-analyses. Surpassing the optimal information size will provide sufficient protection against overestimation. PMID:22028777
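
    The role of random error can be illustrated with a stripped-down simulation, assuming a single two-arm comparison with a true RRR of 0% and a 20% control-group risk; the cut-off and sample sizes are illustrative, far simpler than the paper's full meta-analysis scenarios:

```python
import random

random.seed(9)

def observed_rrr(n_per_arm, risk=0.2):
    """Observed relative risk reduction in one two-arm comparison when the
    true effect is zero (both arms share the same event risk)."""
    e_ctrl = sum(random.random() < risk for _ in range(n_per_arm))
    e_trt = sum(random.random() < risk for _ in range(n_per_arm))
    if e_ctrl == 0:
        return 0.0
    return 1.0 - e_trt / e_ctrl   # equal arm sizes, so the risks cancel

REPS = 1500
# probability of spuriously observing RRR > 20% at two information sizes
small = sum(observed_rrr(250) > 0.20 for _ in range(REPS)) / REPS
large = sum(observed_rrr(2000) > 0.20 for _ in range(REPS)) / REPS
print(small, large)
```

    With ~50 events per arm, a spurious RRR above 20% appears in roughly a tenth of replications, while with ~400 events per arm it essentially never does, which is the exponential decay of overestimation risk with accumulating information described above.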

  5. Development of a Random Field Model for Gas Plume Detection in Multiple LWIR Images.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Heasler, Patrick G.

    This report develops a random field model that describes gas plumes in LWIR remote sensing images. The random field model serves as a prior distribution that can be combined with LWIR data to produce a posterior that determines the probability that a gas plume exists in the scene and also maps the most probable location of any plume. The random field model is intended to work with a single pixel regression estimator--a regression model that estimates gas concentration on an individual pixel basis.

  6. First-passage problems: A probabilistic dynamic analysis for degraded structures

    NASA Technical Reports Server (NTRS)

    Shiao, Michael C.; Chamis, Christos C.

    1990-01-01

    Structures subjected to random excitations with uncertain system parameters degraded by surrounding environments (a random time history) are studied. Methods are developed to determine the statistics of dynamic responses, such as the time-varying mean, the standard deviation, the autocorrelation functions, and the joint probability density function of any response and its derivative. Moreover, the first-passage problems with deterministic and stationary/evolutionary random barriers are evaluated. The time-varying (joint) mean crossing rate and the probability density function of the first-passage time for various random barriers are derived.

  7. Convergence in High Probability of the Quantum Diffusion in a Random Band Matrix Model

    NASA Astrophysics Data System (ADS)

    Margarint, Vlad

    2018-06-01

    We consider Hermitian random band matrices H in d ≥ 1 dimensions. The matrix elements H_{xy}, indexed by x, y ∈ Λ ⊂ Z^d, are independent, uniformly distributed random variables if |x − y| is less than the band width W, and zero otherwise. We strengthen previous results on quantum diffusion in a random band matrix model from convergence of the expectation to convergence in high probability. The result is uniform in the size |Λ| of the matrix.

  8. Exploration properties of biased evanescent random walkers on a one-dimensional lattice

    NASA Astrophysics Data System (ADS)

    Esguerra, Jose Perico; Reyes, Jelian

    2017-08-01

    We investigate the combined effects of bias and evanescence on the characteristics of random walks on a one-dimensional lattice. We calculate the time-dependent return probability, eventual return probability, conditional mean return time, and the time-dependent mean number of visited sites of biased immortal and evanescent discrete-time random walkers on a one-dimensional lattice. We then extend the calculations to the case of a continuous-time step-coupled biased evanescent random walk on a one-dimensional lattice with an exponential waiting time distribution.
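A Monte Carlo sketch of the setting described above (illustrative parameters, not the paper's analytic results): a biased, evanescent discrete-time walker on a 1D lattice, from which the eventual return probability and mean number of distinct visited sites are estimated.

```python
import random

def simulate_walks(p_right=0.6, p_die=0.02, steps=200, n_walkers=20000, seed=1):
    """Estimate the eventual return probability and the mean number of
    distinct visited sites for a biased, evanescent random walk on Z."""
    rng = random.Random(seed)
    returns = 0
    visited_total = 0
    for _ in range(n_walkers):
        x = 0
        visited = {0}
        returned = False
        for _ in range(steps):
            if rng.random() < p_die:      # walker evanesces (dies) this step
                break
            x += 1 if rng.random() < p_right else -1
            visited.add(x)
            if x == 0 and not returned:   # first return to the origin
                returned = True
        returns += returned
        visited_total += len(visited)
    return returns / n_walkers, visited_total / n_walkers

ret_prob, mean_sites = simulate_walks()
```

Both bias and evanescence reduce the return probability below the recurrent (unbiased, immortal) value of 1, which is the qualitative effect the abstract quantifies.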

  9. Incidence of tuberculosis among school-going adolescents in South India.

    PubMed

    Uppada, Dharma Rao; Selvam, Sumithra; Jesuraj, Nelson; Lau, Esther L; Doherty, T Mark; Grewal, Harleen M S; Vaz, Mario; Lindtjørn, Bernt

    2016-07-26

    Tuberculosis (TB) incidence data in vaccine target populations, particularly adolescents, are important for designing and powering vaccine clinical trials. Little is known about the incidence of tuberculosis among adolescents in India. The objective of the current study was to estimate the incidence of pulmonary tuberculosis (PTB) disease among adolescents attending school in South India using two different surveillance methods (active and passive) and to compare the incidence between the two groups. The study was a prospective cohort study with a 2-year follow-up period. The study was conducted in Palamaner, Chittoor District of Andhra Pradesh, South India from February 2007 to July 2010. A random sampling procedure was used to select a subset of schools to enable approximately 8000 subjects to be available for randomization in the study. A stratified randomization procedure was used to assign the selected schools to either active or passive surveillance. Participants who met the criteria for being exposed to TB were referred to the diagnostic ward for pulmonary tuberculosis confirmation. A total of 3441 males and 3202 females aged 11 to less than 18 years were enrolled into the study. Of the 3102 participants in the active surveillance group, four subjects were diagnosed with definite tuberculosis, four subjects with probable tuberculosis, and 71 subjects had non-tuberculous Mycobacteria (NTM) isolated from their sputum. Of the 3541 participants in the passive surveillance group, four subjects were diagnosed with definite tuberculosis, two subjects with probable tuberculosis, and 48 subjects had non-tuberculous Mycobacteria isolated from their sputum. The incidence of definite + probable TB was 147.60 per 100,000 person-years in the active surveillance group and 87 per 100,000 person-years in the passive surveillance group. 
The incidence of pulmonary tuberculosis among adolescents in our study is lower than similar studies conducted in South Africa and Eastern Uganda - countries with a higher incidence of tuberculosis and human immunodeficiency virus (HIV) than India. The study data will inform sample design for vaccine efficacy trials among adolescents in India.

  10. A Unifying Probability Example.

    ERIC Educational Resources Information Center

    Maruszewski, Richard F., Jr.

    2002-01-01

    Presents an example from probability and statistics that ties together several topics including the mean and variance of a discrete random variable, the binomial distribution and its particular mean and variance, the sum of independent random variables, the mean and variance of the sum, and the central limit theorem. Uses Excel to illustrate these…
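The same chain of ideas can be sketched in Python rather than Excel (a minimal illustration, not the article's worked example): a Binomial(n, p) variable as a sum of independent Bernoulli(p) variables, its mean and variance, and a central limit theorem check.

```python
import random
import statistics

# A Binomial(n, p) variable is the sum of n independent Bernoulli(p)
# variables, so its mean is n*p and its variance is n*p*(1-p); by the
# central limit theorem the sum is approximately Gaussian for large n.
n, p = 100, 0.3
rng = random.Random(42)
samples = [sum(rng.random() < p for _ in range(n)) for _ in range(20000)]

emp_mean = statistics.fmean(samples)     # theory: n*p = 30
emp_var = statistics.pvariance(samples)  # theory: n*p*(1-p) = 21

# CLT check: roughly 68% of draws fall within one standard deviation
sd = 21 ** 0.5
within_1sd = sum(abs(x - 30) <= sd for x in samples) / len(samples)
```

The empirical mean and variance land near the theoretical 30 and 21, and the one-standard-deviation fraction sits near the Gaussian 68%, tying the binomial, sums of independent variables, and the central limit theorem together in one computation.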

  11. Random errors of oceanic monthly rainfall derived from SSM/I using probability distribution functions

    NASA Technical Reports Server (NTRS)

    Chang, Alfred T. C.; Chiu, Long S.; Wilheit, Thomas T.

    1993-01-01

    Global averages and random errors associated with the monthly oceanic rain rates derived from the Special Sensor Microwave/Imager (SSM/I) data using the technique developed by Wilheit et al. (1991) are computed. Accounting for the beam-filling bias, a global annual average rain rate of 1.26 m is computed. The error estimation scheme is based on the existence of independent (morning and afternoon) estimates of the monthly mean. Calculations show overall random errors of about 50-60 percent for each 5 deg x 5 deg box. The results are insensitive to different sampling strategies (odd and even days of the month). Comparison of the SSM/I estimates with raingage data collected at the Pacific atoll stations showed a low bias of about 8 percent, a correlation of 0.7, and an rms difference of 55 percent.

  12. Hazard Function Estimation with Cause-of-Death Data Missing at Random

    PubMed Central

    Wang, Qihua; Dinse, Gregg E.; Liu, Chunling

    2010-01-01

    Hazard function estimation is an important part of survival analysis. Interest often centers on estimating the hazard function associated with a particular cause of death. We propose three nonparametric kernel estimators for the hazard function, all of which are appropriate when death times are subject to random censorship and censoring indicators can be missing at random. Specifically, we present a regression surrogate estimator, an imputation estimator, and an inverse probability weighted estimator. All three estimators are uniformly strongly consistent and asymptotically normal. We derive asymptotic representations of the mean squared error and the mean integrated squared error for these estimators and we discuss a data-driven bandwidth selection method. A simulation study, conducted to assess finite sample behavior, demonstrates that the proposed hazard estimators perform relatively well. We illustrate our methods with an analysis of some vascular disease data. PMID:22267874

  13. Erectile Dysfunction in Patients with Sleep Apnea--A Nationwide Population-Based Study.

    PubMed

    Chen, Chia-Min; Tsai, Ming-Ju; Wei, Po-Ju; Su, Yu-Chung; Yang, Chih-Jen; Wu, Meng-Ni; Hsu, Chung-Yao; Hwang, Shang-Jyh; Chong, Inn-Wen; Huang, Ming-Shyan

    2015-01-01

    Increased incidence of erectile dysfunction (ED) has been reported among patients with sleep apnea (SA). However, this association has not been confirmed in a large-scale study. We therefore performed a population-based cohort study using the Taiwan National Health Insurance (NHI) database to investigate the association of SA and ED. From the database of one million representative subjects randomly sampled from individuals enrolled in the NHI system in 2010, we identified adult patients having SA and excluded those having a diagnosis of ED prior to SA. From these suspected SA patients, those having an SA diagnosis after polysomnography were defined as probable SA patients. The dates of their first SA diagnosis were defined as their index dates. Each SA patient was matched to 30 randomly selected, age-matched control subjects without any SA diagnosis. The control subjects were assigned the same index dates as their corresponding SA patients, and were ensured to have no ED diagnosis prior to their index dates. In total, 4,835 male patients with suspected SA (including 1,946 probable SA patients) were matched to 145,050 control subjects (including 58,380 subjects matched to probable SA patients). The incidence rate of ED was significantly higher in probable SA patients as compared with the corresponding control subjects (5.7 vs. 2.3 per 1000 patient-years; adjusted incidence rate ratio = 2.0 [95% CI: 1.8-2.2], p<0.0001). The cumulative incidence was also significantly higher in the probable SA patients (p<0.0001). In multivariable Cox regression analysis, probable SA remained a significant risk factor for the development of ED after adjusting for age, residency, income level and comorbidities (hazard ratio = 2.0 [95% CI: 1.5-2.7], p<0.0001). In line with previous studies, this population-based large-scale study confirmed an increased ED incidence among SA patients in a Chinese population. Physicians need to pay attention to possible underlying SA while treating ED patients.

  14. Hubble Tarantula Treasury Project - VI. Identification of Pre-Main-Sequence Stars using Machine Learning techniques

    NASA Astrophysics Data System (ADS)

    Ksoll, Victor F.; Gouliermis, Dimitrios A.; Klessen, Ralf S.; Grebel, Eva K.; Sabbi, Elena; Anderson, Jay; Lennon, Daniel J.; Cignoni, Michele; de Marchi, Guido; Smith, Linda J.; Tosi, Monica; van der Marel, Roeland P.

    2018-05-01

    The Hubble Tarantula Treasury Project (HTTP) has provided unprecedented photometric coverage of the entire starburst region of 30 Doradus down to the half Solar mass limit. We use the deep stellar catalogue of HTTP to identify all the pre-main-sequence (PMS) stars of the region, i.e., stars that have not yet started their lives on the main sequence. The photometric distinction of these stars from the more evolved populations is not a trivial task due to several factors that alter their colour-magnitude diagram positions. The identification of PMS stars thus requires sophisticated statistical methods. We employ machine learning classification techniques on the HTTP survey of more than 800,000 sources to identify the PMS stellar content of the observed field. Our methodology consists of 1) carefully selecting the most probable low-mass PMS stellar population of the star-forming cluster NGC 2070, 2) using this sample to train classification algorithms to build a predictive model for PMS stars, and 3) applying this model to identify the most probable PMS content across the entire Tarantula Nebula. We employ Decision Tree, Random Forest and Support Vector Machine classifiers to categorise the stars as PMS and non-PMS. The Random Forest and Support Vector Machine provided the most accurate models, predicting about 20,000 sources with a candidateship probability higher than 50 percent, and almost 10,000 PMS candidates with a probability higher than 95 percent. This is the richest and most accurate photometric catalogue of extragalactic PMS candidates across the extent of a whole star-forming complex.
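The candidateship-probability idea above can be sketched with a toy bagged ensemble (this is not the HTTP pipeline; the single synthetic "colour" feature, the stump classifiers, and all parameters are illustrative assumptions): the probability assigned to a source is the fraction of ensemble members voting PMS, and candidates are kept above a chosen threshold such as 50% or 95%.

```python
import random

# Toy sketch: bootstrap-aggregated decision stumps on one synthetic
# feature; "PMS probability" = fraction of ensemble votes for class 1.
rng = random.Random(0)
train = [(rng.gauss(1.0, 0.5), 1) for _ in range(150)] + \
        [(rng.gauss(-1.0, 0.5), 0) for _ in range(150)]

def fit_stump(data):
    # best single threshold on the feature, scored by training accuracy
    best = None
    for t in [x for x, _ in data]:
        acc = sum((x > t) == bool(y) for x, y in data) / len(data)
        if best is None or acc > best[1]:
            best = (t, acc)
    return best[0]

stumps = []
for _ in range(25):                      # bagging: fit on bootstrap samples
    boot = [train[rng.randrange(len(train))] for _ in range(len(train))]
    stumps.append(fit_stump(boot))

def pms_probability(x):
    return sum(x > t for t in stumps) / len(stumps)

p_hi = pms_probability(1.5)   # deep in the class-1 region
p_lo = pms_probability(-1.5)  # deep in the class-0 region
```

Thresholding `pms_probability` at 0.5 gives an inclusive candidate list, while 0.95 keeps only sources on which nearly the whole ensemble agrees, mirroring the 50% and 95% cuts quoted in the abstract.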

  15. A design methodology for nonlinear systems containing parameter uncertainty

    NASA Technical Reports Server (NTRS)

    Young, G. E.; Auslander, D. M.

    1983-01-01

    In the present design methodology for nonlinear systems containing parameter uncertainty, a generalized sensitivity analysis is incorporated which employs parameter space sampling and statistical inference. For the case of a system with j adjustable and k nonadjustable parameters, this methodology (which includes an adaptive random search strategy) is used to determine the combination of j adjustable parameter values that maximizes the probability that the performance indices simultaneously satisfy the design criteria in spite of the uncertainty due to the k nonadjustable parameters.
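A minimal sketch of that idea (the performance index, criterion, and all parameter values below are synthetic assumptions, not the paper's system): an adjustable parameter is tuned by adaptive random search to maximize the Monte Carlo probability that the index meets its criterion despite an uncertain, nonadjustable parameter.

```python
import random

rng = random.Random(13)

def success_prob(a, n=2000):
    # P over u ~ N(0, 0.3) that the index g = 1 - (a + u - 1)^2 >= 0.5
    hits = sum(1 - (a + rng.gauss(0, 0.3) - 1) ** 2 >= 0.5 for _ in range(n))
    return hits / n

# adaptive random search: sample around the incumbent, shrinking the step
best_a, best_p, step = 0.0, success_prob(0.0), 1.0
for _ in range(60):
    cand = best_a + rng.gauss(0, step)
    p = success_prob(cand)
    if p > best_p:                     # keep the candidate if it improves
        best_a, best_p = cand, p
    step *= 0.95                       # gradually focus the search
```

For this synthetic index the optimum is near a = 1, where the criterion is met for most realizations of u; the search converges there without any gradient information, which is the appeal of random search under uncertainty.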

  16. The probability of false positives in zero-dimensional analyses of one-dimensional kinematic, force and EMG trajectories.

    PubMed

    Pataky, Todd C; Vanrenterghem, Jos; Robinson, Mark A

    2016-06-14

    A false positive is the mistake of inferring an effect when none exists, and although α controls the false positive (Type I error) rate in classical hypothesis testing, a given α value is accurate only if the underlying model of randomness appropriately reflects experimentally observed variance. Hypotheses pertaining to one-dimensional (1D) (e.g. time-varying) biomechanical trajectories are most often tested using a traditional zero-dimensional (0D) Gaussian model of randomness, but variance in these datasets is clearly 1D. The purpose of this study was to determine the likelihood that analyzing smooth 1D data with a 0D model of variance will produce false positives. We first used random field theory (RFT) to predict the probability of false positives in 0D analyses. We then validated RFT predictions via numerical simulations of smooth Gaussian 1D trajectories. Results showed that, across a range of public kinematic, force/moment and EMG datasets, the median false positive rate was 0.382 and not the assumed α=0.05, even for a simple two-sample t test involving N=10 trajectories per group. The median false positive rate for experiments involving three-component vector trajectories was p=0.764. This rate increased to p=0.945 for two three-component vector trajectories, and to p=0.999 for six three-component vectors. This implies that experiments involving vector trajectories have a high probability of yielding 0D statistical significance when there is, in fact, no 1D effect. Either (a) explicit a priori identification of 0D variables or (b) adoption of 1D methods can more tightly control α. Copyright © 2016 Elsevier Ltd. All rights reserved.
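The inflation described above is easy to reproduce in a small simulation (a sketch with assumed smoothness and trajectory length, not the paper's datasets): smooth null Gaussian trajectories are tested point-by-point with a 0D two-sample t test, and a trial counts as a false positive if any point crosses the 0D critical value.

```python
import random

# df = 18 (N = 10 per group), so the two-tailed 0.05 critical t is ~2.101.
rng = random.Random(3)
Q, N, W = 101, 10, 15   # nodes per trajectory, group size, smoothing window

def smooth_trajectory():
    noise = [rng.gauss(0, 1) for _ in range(Q + W)]
    return [sum(noise[i:i + W]) / W for i in range(Q)]   # moving average

def max_abs_t(group_a, group_b):
    tmax = 0.0
    for q in range(Q):
        a = [tr[q] for tr in group_a]
        b = [tr[q] for tr in group_b]
        ma, mb = sum(a) / N, sum(b) / N
        va = sum((x - ma) ** 2 for x in a) / (N - 1)
        vb = sum((x - mb) ** 2 for x in b) / (N - 1)
        sp = ((va + vb) / 2) ** 0.5          # pooled SD, equal group sizes
        tmax = max(tmax, abs(ma - mb) / (sp * (2 / N) ** 0.5))
    return tmax

n_sim, crit = 300, 2.101
false_pos = sum(
    max_abs_t([smooth_trajectory() for _ in range(N)],
              [smooth_trajectory() for _ in range(N)]) > crit
    for _ in range(n_sim)
)
fp_rate = false_pos / n_sim   # far above the nominal alpha = 0.05
```

Because the trajectory contains many correlated but not identical tests, the family-wise false positive rate is several times the nominal 0.05, which is exactly the mismatch the abstract quantifies with random field theory.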

  17. Stochastic Models of Emerging Infectious Disease Transmission on Adaptive Random Networks

    PubMed Central

    Pipatsart, Navavat; Triampo, Wannapong

    2017-01-01

    We presented adaptive random network models to describe human behavioral change during epidemics and performed stochastic simulations of SIR (susceptible-infectious-recovered) epidemic models on adaptive random networks. The interplay between infectious disease dynamics and network adaptation dynamics was investigated with regard to disease transmission and the cumulative number of infection cases. We found that the cumulative number of cases decreased with increasing network adaptation probability but increased with increasing disease transmission probability. The topological changes of the adaptive random networks were able to reduce the cumulative number of infections and also to delay the epidemic peak. Our results also suggest the existence of a critical value for the ratio of disease transmission and adaptation probabilities below which the epidemic cannot occur. PMID:29075314
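A sketch of the kind of simulation described (an illustrative model with assumed parameters, not the paper's exact dynamics): on a random graph, infection passes along S-I links with probability beta, infected nodes recover with probability gamma, and with probability w a susceptible endpoint of an S-I link rewires away from its infected neighbour (or simply drops the link).

```python
import random

def sir_adaptive(n=200, k=4, beta=0.1, gamma=0.05, w=0.3, seed=7):
    rng = random.Random(seed)
    edges = set()
    while len(edges) < n * k // 2:           # random graph, mean degree ~k
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            edges.add((min(a, b), max(a, b)))
    state = ['S'] * n
    for v in rng.sample(range(n), 5):        # seed five initial infections
        state[v] = 'I'
    cumulative = 5
    while 'I' in state:
        new_inf, rewires = [], []
        for (a, b) in list(edges):
            if {state[a], state[b]} == {'S', 'I'}:
                s = a if state[a] == 'S' else b
                if rng.random() < w:         # adapt: avoid infected contact
                    rewires.append((a, b, s))
                elif rng.random() < beta:    # transmit along the link
                    new_inf.append(s)
        for (a, b, s) in rewires:
            edges.discard((a, b))
            new = rng.randrange(n)
            if state[new] != 'I' and new != s:
                edges.add((min(s, new), max(s, new)))
        for s in new_inf:
            if state[s] == 'S':
                state[s] = 'I'
                cumulative += 1
        for v in range(n):
            if state[v] == 'I' and rng.random() < gamma:
                state[v] = 'R'
    return cumulative

cases_adaptive = sir_adaptive(w=0.3)   # with rewiring
cases_static = sir_adaptive(w=0.0)     # no adaptation
```

Comparing runs over many seeds and values of w reproduces the qualitative finding: higher adaptation probability tends to lower the cumulative case count.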

  18. Influence function based variance estimation and missing data issues in case-cohort studies.

    PubMed

    Mark, S D; Katki, H

    2001-12-01

    Recognizing that the efficiency in relative risk estimation for the Cox proportional hazards model is largely constrained by the total number of cases, Prentice (1986) proposed the case-cohort design in which covariates are measured on all cases and on a random sample of the cohort. Subsequent to Prentice, other methods of estimation and sampling have been proposed for these designs. We formalize an approach to variance estimation suggested by Barlow (1994), and derive a robust variance estimator based on the influence function. We consider the applicability of the variance estimator to all the proposed case-cohort estimators, and derive the influence function when known sampling probabilities in the estimators are replaced by observed sampling fractions. We discuss the modifications required when cases are missing covariate information. The missingness may occur by chance, and be completely at random; or may occur as part of the sampling design, and depend upon other observed covariates. We provide an adaptation of S-plus code that allows estimating influence function variances in the presence of such missing covariates. Using examples from our current case-cohort studies on esophageal and gastric cancer, we illustrate how our results are useful in solving design and analytic issues that arise in practice.

  19. A two-stage Monte Carlo approach to the expression of uncertainty with finite sample sizes.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Crowder, Stephen Vernon; Moyer, Robert D.

    2005-05-01

    Proposed supplement I to the GUM outlines a 'propagation of distributions' approach to deriving the distribution of a measurand for any non-linear function and for any set of random inputs. The supplement's proposed Monte Carlo approach assumes that the distributions of the random inputs are known exactly. This implies that the sample sizes are effectively infinite. In this case, the mean of the measurand can be determined precisely using a large number of Monte Carlo simulations. In practice, however, the distributions of the inputs will rarely be known exactly, but must be estimated using possibly small samples. If these approximated distributions are treated as exact, the uncertainty in estimating the mean is not properly taken into account. In this paper, we propose a two-stage Monte Carlo procedure that explicitly takes into account the finite sample sizes used to estimate parameters of the input distributions. We will illustrate the approach with a case study involving the efficiency of a thermistor mount power sensor. The performance of the proposed approach will be compared to the standard GUM approach for finite samples using simple non-linear measurement equations. We will investigate performance in terms of coverage probabilities of derived confidence intervals.
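A minimal two-stage sketch (the measurand Y = X1*X2, the normal inputs, and the sample sizes are illustrative assumptions, not the paper's thermistor case study): stage 1 draws parameter values consistent with finite-sample estimates, stage 2 propagates them, so the spread of Y reflects estimation uncertainty as well as input variability.

```python
import random

rng = random.Random(11)
n = 10                                   # small estimation samples
sample1 = [rng.gauss(2.0, 0.2) for _ in range(n)]
sample2 = [rng.gauss(3.0, 0.3) for _ in range(n)]

def estimates(data):
    m = sum(data) / len(data)
    s2 = sum((x - m) ** 2 for x in data) / (len(data) - 1)
    return m, s2

def draw_params(m, s2):
    # variance from a scaled chi-square with n-1 df, then mean from
    # N(m, sigma^2/n): draws consistent with the finite-sample estimates
    chi2 = sum(rng.gauss(0, 1) ** 2 for _ in range(n - 1))
    sig2 = (n - 1) * s2 / chi2
    return rng.gauss(m, (sig2 / n) ** 0.5), sig2 ** 0.5

m1, v1 = estimates(sample1)
m2, v2 = estimates(sample2)

ys = []
for _ in range(5000):
    mu1, sd1 = draw_params(m1, v1)       # stage 1: parameter draw
    mu2, sd2 = draw_params(m2, v2)
    ys.append(rng.gauss(mu1, sd1) * rng.gauss(mu2, sd2))   # stage 2

ys.sort()
interval = (ys[len(ys) // 40], ys[-(len(ys) // 40)])   # central ~95% interval
```

Skipping stage 1 (treating the estimated distributions as exact) yields a narrower interval whose coverage probability falls below the nominal 95%, which is the failure mode the two-stage procedure is designed to avoid.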

  20. Occupancy Modeling Species-Environment Relationships with Non-ignorable Survey Designs.

    PubMed

    Irvine, Kathryn M; Rodhouse, Thomas J; Wright, Wilson J; Olsen, Anthony R

    2018-05-26

    Statistical models supporting inferences about species occurrence patterns in relation to environmental gradients are fundamental to ecology and conservation biology. A common implicit assumption is that the sampling design is ignorable and does not need to be formally accounted for in analyses. The analyst assumes data are representative of the desired population and statistical modeling proceeds. However, if datasets from probability and non-probability surveys are combined or unequal selection probabilities are used, the design may be non-ignorable. We outline the use of pseudo-maximum likelihood estimation for site-occupancy models to account for such non-ignorable survey designs. This estimation method accounts for the survey design by properly weighting the pseudo-likelihood equation. In our empirical example, legacy and newer randomly selected locations were surveyed for bats to bridge a historic statewide effort with an ongoing nationwide program. We provide a worked example using bat acoustic detection/non-detection data and show how analysts can diagnose whether their design is ignorable. Using simulations we assessed whether our approach is viable for modeling datasets composed of sites contributed outside of a probability design. Pseudo-maximum likelihood estimates differed from the usual maximum likelihood occupancy estimates for some bat species. Using simulations we show the maximum likelihood estimator of species-environment relationships with non-ignorable sampling designs was biased, whereas the pseudo-likelihood estimator was design-unbiased. However, in our simulation study the designs composed of a large proportion of legacy or non-probability sites resulted in estimation issues for standard errors. These issues were likely a result of highly variable weights confounded by small sample sizes (5% or 10% sampling intensity and 4 revisits). 
Aggregating datasets from multiple sources logically supports larger sample sizes and potentially increases spatial extents for statistical inferences. Our results suggest that ignoring the mechanism for how locations were selected for data collection (e.g., the sampling design) could result in erroneous model-based conclusions. Therefore, in order to ensure robust and defensible recommendations for evidence-based conservation decision-making, the survey design information in addition to the data themselves must be available for analysts. Details for constructing the weights used in estimation and code for implementation are provided. This article is protected by copyright. All rights reserved.

  1. A Dynamic Bayesian Network Model for the Production and Inventory Control

    NASA Astrophysics Data System (ADS)

    Shin, Ji-Sun; Takazaki, Noriyuki; Lee, Tae-Hong; Kim, Jin-Il; Lee, Hee-Hyol

    In general, the production quantities and delivered goods change randomly, and hence the total stock also changes randomly. This paper deals with production and inventory control using the Dynamic Bayesian Network. A Bayesian Network is a probabilistic model which represents the qualitative dependence between two or more random variables by the graph structure, and indicates the quantitative relations between individual variables by the conditional probabilities. The probability distribution of the total stock is calculated through the propagation of the probabilities on the network. Moreover, an adjustment rule for the production quantities that maintains the probabilities of the total stock reaching its lower limit and ceiling at certain values is shown.

  2. Definition of the Neutrosophic Probability

    NASA Astrophysics Data System (ADS)

    Smarandache, Florentin

    2014-03-01

    Neutrosophic probability (or likelihood) [1995] is a particular case of the neutrosophic measure. It is an estimation that an event (different from indeterminacy) occurs, together with an estimation that some indeterminacy may occur, and an estimation that the event does not occur. Classical probability deals with fair dice, coins, roulettes, spinners, decks of cards and random walks, while neutrosophic probability deals with unfair or imperfect such objects and processes. For example, if we toss a regular die on an irregular surface which has cracks, then it is possible for the die to get stuck on one of its edges or vertices in a crack (an indeterminate outcome). The sample space in this case is {1, 2, 3, 4, 5, 6, indeterminacy}. So the probability of getting, for example, a 1 is less than 1/6, since there are seven outcomes. Neutrosophic probability is a generalization of classical probability because, when the chance of indeterminacy of a stochastic process is zero, the two probabilities coincide. The neutrosophic probability that an event A occurs is NP(A) = (ch(A), ch(indetA), ch(antiA)) = (T, I, F), where T, I, F are subsets of [0,1]: T is the chance that A occurs, denoted ch(A); I is the indeterminate chance related to A, denoted ch(indetA); and F is the chance that A does not occur, denoted ch(antiA). So NP is a generalization of imprecise probability as well. If T, I, and F are crisp numbers, then ⁻0 ≤ T + I + F ≤ 3⁺. We use the same notations (T, I, F) as in neutrosophic logic and set theory.

  3. Randomized central limit theorems: A unified theory.

    PubMed

    Eliazar, Iddo; Klafter, Joseph

    2010-08-01

    The central limit theorems (CLTs) characterize the macroscopic statistical behavior of large ensembles of independent and identically distributed random variables. The CLTs assert that the universal probability laws governing ensembles' aggregate statistics are either Gaussian or Lévy, and that the universal probability laws governing ensembles' extreme statistics are Fréchet, Weibull, or Gumbel. The scaling schemes underlying the CLTs are deterministic: all ensemble components are scaled by a common deterministic scale. However, there are "random environment" settings in which the underlying scaling schemes are stochastic: the ensemble components are scaled by different random scales. Examples of such settings include Holtsmark's law for gravitational fields and the Stretched Exponential law for relaxation times. In this paper we establish a unified theory of randomized central limit theorems (RCLTs), in which the deterministic CLT scaling schemes are replaced with stochastic scaling schemes, and present "randomized counterparts" to the classic CLTs. The RCLT scaling schemes are shown to be governed by Poisson processes with power-law statistics, and the RCLTs are shown to universally yield the Lévy, Fréchet, and Weibull probability laws.

  4. Randomized central limit theorems: A unified theory

    NASA Astrophysics Data System (ADS)

    Eliazar, Iddo; Klafter, Joseph

    2010-08-01

    The central limit theorems (CLTs) characterize the macroscopic statistical behavior of large ensembles of independent and identically distributed random variables. The CLTs assert that the universal probability laws governing ensembles’ aggregate statistics are either Gaussian or Lévy, and that the universal probability laws governing ensembles’ extreme statistics are Fréchet, Weibull, or Gumbel. The scaling schemes underlying the CLTs are deterministic—scaling all ensemble components by a common deterministic scale. However, there are “random environment” settings in which the underlying scaling schemes are stochastic—scaling the ensemble components by different random scales. Examples of such settings include Holtsmark’s law for gravitational fields and the Stretched Exponential law for relaxation times. In this paper we establish a unified theory of randomized central limit theorems (RCLTs)—in which the deterministic CLT scaling schemes are replaced with stochastic scaling schemes—and present “randomized counterparts” to the classic CLTs. The RCLT scaling schemes are shown to be governed by Poisson processes with power-law statistics, and the RCLTs are shown to universally yield the Lévy, Fréchet, and Weibull probability laws.

  5. The effects of the one-step replica symmetry breaking on the Sherrington-Kirkpatrick spin glass model in the presence of random field with a joint Gaussian probability density function for the exchange interactions and random fields

    NASA Astrophysics Data System (ADS)

    Hadjiagapiou, Ioannis A.; Velonakis, Ioannis N.

    2018-07-01

    The Sherrington-Kirkpatrick Ising spin glass model, in the presence of a random magnetic field, is investigated within the framework of the one-step replica symmetry breaking. The two random variables (exchange integral interaction Jij and random magnetic field hi) are drawn from a joint Gaussian probability density function characterized by a correlation coefficient ρ, assuming positive and negative values. The thermodynamic properties, the three different phase diagrams and system's parameters are computed with respect to the natural parameters of the joint Gaussian probability density function at non-zero and zero temperatures. The low temperature negative entropy controversy, a result of the replica symmetry approach, has been partly remedied in the current study, leading to a less negative result. In addition, the present system possesses two successive spin glass phase transitions with characteristic temperatures.

  6. University Students' Conceptual Knowledge of Randomness and Probability in the Contexts of Evolution and Mathematics

    ERIC Educational Resources Information Center

    Fiedler, Daniela; Tröbst, Steffen; Harms, Ute

    2017-01-01

    Students of all ages face severe conceptual difficulties regarding key aspects of evolution-- the central, unifying, and overarching theme in biology. Aspects strongly related to abstract "threshold" concepts like randomness and probability appear to pose particular difficulties. A further problem is the lack of an appropriate instrument…

  7. Generalizability of findings from randomized controlled trials: application to the National Institute of Drug Abuse Clinical Trials Network.

    PubMed

    Susukida, Ryoko; Crum, Rosa M; Ebnesajjad, Cyrus; Stuart, Elizabeth A; Mojtabai, Ramin

    2017-07-01

    To compare randomized controlled trial (RCT) sample treatment effects with the population effects of substance use disorder (SUD) treatment. Statistical weighting was used to re-compute the effects from 10 RCTs such that the participants in the trials had characteristics that resembled those of patients in the target populations. Multi-site RCTs and usual SUD treatment settings in the United States. A total of 3592 patients in 10 RCTs and 1 602 226 patients from usual SUD treatment settings between 2001 and 2009. Three outcomes of SUD treatment were examined: retention, urine toxicology and abstinence. We weighted the RCT sample treatment effects using propensity scores representing the conditional probability of participating in RCTs. Weighting the samples changed the significance of estimated sample treatment effects. Most commonly, positive effects of trials became statistically non-significant after weighting (three trials for retention and urine toxicology and one trial for abstinence); also, non-significant effects became significantly positive (one trial for abstinence) and significantly negative effects became non-significant (two trials for abstinence). There was suggestive evidence of treatment effect heterogeneity in subgroups that are under- or over-represented in the trials, some of which were consistent with the differences in average treatment effects between weighted and unweighted results. The findings of randomized controlled trials (RCTs) for substance use disorder treatment do not appear to be directly generalizable to target populations when the RCT samples do not reflect adequately the target populations and there is treatment effect heterogeneity across patient subgroups. © 2017 Society for the Study of Addiction.
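The reweighting idea can be sketched with a toy example (the single binary covariate, its prevalences, and the stratum effects are synthetic assumptions, not the Clinical Trials Network data): a covariate that modifies the treatment effect is over-represented in the trial, and weighting each participant by the population-to-trial odds of their covariate value recovers a population-level effect.

```python
import random

rng = random.Random(5)
p_trial, p_pop = 0.7, 0.3            # share with z = 1 in trial vs population
effect = {1: 0.9, 0: 0.1}            # true treatment effect within each stratum

trial = []
for _ in range(4000):
    z = 1 if rng.random() < p_trial else 0
    y = effect[z] + rng.gauss(0, 0.1)     # per-subject effect estimate
    trial.append((z, y))

# unweighted (sample) average treatment effect
naive = sum(y for _, y in trial) / len(trial)

# weight each stratum to the population composition
weights = {1: p_pop / p_trial, 0: (1 - p_pop) / (1 - p_trial)}
weighted = (sum(weights[z] * y for z, y in trial)
            / sum(weights[z] for z, _ in trial))

# naive ~ 0.7*0.9 + 0.3*0.1 = 0.66; weighted ~ 0.3*0.9 + 0.7*0.1 = 0.34
```

In practice the weights come from a propensity model of trial participation fitted on many covariates rather than a single known prevalence, but the mechanism is the same: treatment effect heterogeneity plus unrepresentative sampling makes the naive and weighted estimates diverge.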

  8. Kolmogorov complexity, statistical regularization of inverse problems, and Birkhoff's formalization of beauty

    NASA Astrophysics Data System (ADS)

    Kreinovich, Vladik; Longpre, Luc; Koshelev, Misha

    1998-09-01

    Most practical applications of statistical methods are based on the implicit assumption that if an event has a very small probability, then it cannot occur. For example, the probability that a kettle placed on a cold stove would start boiling by itself is not 0, it is positive, but it is so small that physicists conclude that such an event is simply impossible. This assumption is difficult to formalize in traditional probability theory, because this theory only describes measures on sets and does not allow us to divide functions into 'random' and non-random ones. This distinction was made possible by the idea of algorithmic randomness, introduced by Kolmogorov and his student Martin-Löf in the 1960s. We show that this idea can also be used for inverse problems. In particular, we prove that for every probability measure, the corresponding set of random functions is compact, and, therefore, the corresponding restricted inverse problem is well-defined. The resulting technique turns out to be interestingly related to the qualitative esthetic measure introduced by G. Birkhoff as order/complexity.

  9. Accounting for length-bias and selection effects in estimating the distribution of menstrual cycle length.

    PubMed

    Lum, Kirsten J; Sundaram, Rajeshwari; Louis, Thomas A

    2015-01-01

    Prospective pregnancy studies are a valuable source of longitudinal data on menstrual cycle length. However, care is needed when making inferences of such renewal processes. For example, accounting for the sampling plan is necessary for unbiased estimation of the menstrual cycle length distribution for the study population. If couples can enroll when they learn of the study as opposed to waiting for the start of a new menstrual cycle, then due to length-bias, the enrollment cycle will be stochastically larger than the general run of cycles, a typical property of prevalent cohort studies. Furthermore, the probability of enrollment can depend on the length of time since a woman's last menstrual period (a backward recurrence time), resulting in selection effects. We focus on accounting for length-bias and selection effects in the likelihood for enrollment menstrual cycle length, using a recursive two-stage approach wherein we first estimate the probability of enrollment as a function of the backward recurrence time and then use it in a likelihood with sampling weights that account for length-bias and selection effects. To broaden the applicability of our methods, we augment our model to incorporate a couple-specific random effect and time-independent covariate. A simulation study quantifies performance for two scenarios of enrollment probability when proper account is taken of sampling plan features. In addition, we estimate the probability of enrollment and the distribution of menstrual cycle length for the study population of the Longitudinal Investigation of Fertility and the Environment Study. Published by Oxford University Press 2014. This work is written by (a) US Government employee(s) and is in the public domain in the US.

  10. Meta-analysis with missing study-level sample variance data.

    PubMed

    Chowdhry, Amit K; Dworkin, Robert H; McDermott, Michael P

    2016-07-30

    We consider a study-level meta-analysis with a normally distributed outcome variable and possibly unequal study-level variances, where the object of inference is the difference in means between a treatment and control group. A common complication in such an analysis is missing sample variances for some studies. A frequently used approach is to impute the weighted (by sample size) mean of the observed variances (mean imputation). Another approach is to include only those studies with variances reported (complete case analysis). Both mean imputation and complete case analysis are only valid under the missing-completely-at-random assumption, and even then the inverse variance weights produced are not necessarily optimal. We propose a multiple imputation method employing gamma meta-regression to impute the missing sample variances. Our method takes advantage of study-level covariates that may be used to provide information about the missing data. Through simulation studies, we show that multiple imputation, when the imputation model is correctly specified, is superior to competing methods in terms of confidence interval coverage probability and type I error probability when testing a specified group difference. Finally, we describe a similar approach to handling missing variances in cross-over studies. Copyright © 2016 John Wiley & Sons, Ltd.
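The mean-imputation baseline that the abstract critiques can be sketched in a few lines: replace each missing variance with the sample-size-weighted mean of the observed variances, then pool with inverse-variance weights. The study data below are made up; this is the simple comparator, not the proposed gamma meta-regression method.

```python
import math

# Hypothetical study-level data: mean difference d, sample size n,
# and sample variance v (None = variance not reported).
studies = [
    {"d": 1.2, "n": 40, "v": 4.0},
    {"d": 0.8, "n": 60, "v": 2.5},
    {"d": 1.5, "n": 25, "v": None},  # missing sample variance
    {"d": 0.9, "n": 80, "v": 3.0},
]

# Mean imputation: sample-size-weighted mean of the observed variances
# (valid only under the missing-completely-at-random assumption).
obs = [s for s in studies if s["v"] is not None]
vbar = sum(s["n"] * s["v"] for s in obs) / sum(s["n"] for s in obs)
for s in studies:
    if s["v"] is None:
        s["v"] = vbar

# Fixed-effect inverse-variance pooling: var of d_i is roughly v_i / n_i.
weights = [s["n"] / s["v"] for s in studies]
pooled = sum(w * s["d"] for w, s in zip(weights, studies)) / sum(weights)
se = math.sqrt(1.0 / sum(weights))
print(pooled, se)
```

The proposed multiple-imputation approach would instead draw several plausible values for each missing variance from a fitted imputation model and combine the resulting estimates.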

  11. Detecting temporal trends in species assemblages with bootstrapping procedures and hierarchical models

    USGS Publications Warehouse

    Gotelli, Nicholas J.; Dorazio, Robert M.; Ellison, Aaron M.; Grossman, Gary D.

    2010-01-01

    Quantifying patterns of temporal trends in species assemblages is an important analytical challenge in community ecology. We describe methods of analysis that can be applied to a matrix of counts of individuals that is organized by species (rows) and time-ordered sampling periods (columns). We first developed a bootstrapping procedure to test the null hypothesis of random sampling from a stationary species abundance distribution with temporally varying sampling probabilities. This procedure can be modified to account for undetected species. We next developed a hierarchical model to estimate species-specific trends in abundance while accounting for species-specific probabilities of detection. We analysed two long-term datasets on stream fishes and grassland insects to demonstrate these methods. For both assemblages, the bootstrap test indicated that temporal trends in abundance were more heterogeneous than expected under the null model. We used the hierarchical model to estimate trends in abundance and identified sets of species in each assemblage that were steadily increasing, decreasing or remaining constant in abundance over more than a decade of standardized annual surveys. Our methods of analysis are broadly applicable to other ecological datasets, and they represent an advance over most existing procedures, which do not incorporate effects of incomplete sampling and imperfect detection.
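The null model in the bootstrap procedure above fixes a stationary species abundance distribution and lets only per-period sampling effort vary. A toy resampler under that null can be sketched as follows; the count matrix is invented, and the published procedure additionally accounts for undetected species.

```python
import random

# Rows: species; columns: time-ordered sampling periods (made-up counts).
rng = random.Random(3)
counts = [[30, 5, 12], [22, 9, 14], [35, 2, 18]]

species_tot = [sum(row) for row in counts]
period_tot = [sum(col) for col in zip(*counts)]
grand = sum(species_tot)
probs = [t / grand for t in species_tot]  # stationary abundance distribution

def simulate_null():
    """One bootstrap matrix: each period's total is drawn from the pooled
    relative abundances, so only sampling effort varies over time."""
    sim = [[0] * len(period_tot) for _ in probs]
    for j, n in enumerate(period_tot):
        for _ in range(n):
            r, acc = rng.random(), 0.0
            for i, p in enumerate(probs):
                acc += p
                if r < acc:
                    sim[i][j] += 1
                    break
            else:
                sim[-1][j] += 1  # guard against floating-point shortfall
    return sim

null = simulate_null()
print(null)  # column totals match the observed per-period sampling effort
```

A test statistic measuring heterogeneity of species-specific trends would then be computed on many such matrices to form the null distribution.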

  12. Overlooked Threats to Respondent Driven Sampling Estimators: Peer Recruitment Reality, Degree Measures, and Random Selection Assumption.

    PubMed

    Li, Jianghong; Valente, Thomas W; Shin, Hee-Sung; Weeks, Margaret; Zelenev, Alexei; Moothi, Gayatri; Mosher, Heather; Heimer, Robert; Robles, Eduardo; Palmer, Greg; Obidoa, Chinekwu

    2017-06-28

    Intensive sociometric network data were collected from a typical respondent driven sample (RDS) of 528 people who inject drugs residing in Hartford, Connecticut in 2012-2013. This rich dataset enabled us to analyze a large number of unobserved network nodes and ties for the purpose of assessing common assumptions underlying RDS estimators. Results show that several assumptions central to RDS estimators, such as random selection, enrollment probability proportional to degree, and recruitment occurring over recruiter's network ties, were violated. These problems stem from an overly simplistic conceptualization of peer recruitment processes and dynamics. We found nearly half of participants were recruited via coupon redistribution on the street. Non-uniform patterns occurred in multiple recruitment stages related to both recruiter behavior (choosing and reaching alters, passing coupons, etc.) and recruit behavior (accepting/rejecting coupons, failing to enter the study, passing coupons to others). Some factors associated with these patterns were also associated with HIV risk.

  13. Reward and uncertainty in exploration programs

    NASA Technical Reports Server (NTRS)

    Kaufman, G. M.; Bradley, P. G.

    1971-01-01

    A set of variables which are crucial to the economic outcome of petroleum exploration are discussed. These are treated as random variables; the values they assume indicate the number of successes that occur in a drilling program and determine, for a particular discovery, the unit production cost and net economic return if that reservoir is developed. In specifying the joint probability law for those variables, extreme and probably unrealistic assumptions are made. In particular, the different random variables are assumed to be independently distributed. Using postulated probability functions and specified parameters, values are generated for selected random variables, such as reservoir size. From this set of values the economic magnitudes of interest, net return and unit production cost, are computed. This constitutes a single trial, and the procedure is repeated many times. The resulting histograms approximate the probability density functions of the variables which describe the economic outcomes of an exploratory drilling program.

  14. University Students' Conceptual Knowledge of Randomness and Probability in the Contexts of Evolution and Mathematics.

    PubMed

    Fiedler, Daniela; Tröbst, Steffen; Harms, Ute

    2017-01-01

    Students of all ages face severe conceptual difficulties regarding key aspects of evolution-the central, unifying, and overarching theme in biology. Aspects strongly related to abstract "threshold" concepts like randomness and probability appear to pose particular difficulties. A further problem is the lack of an appropriate instrument for assessing students' conceptual knowledge of randomness and probability in the context of evolution. To address this problem, we have developed two instruments, Randomness and Probability Test in the Context of Evolution (RaProEvo) and Randomness and Probability Test in the Context of Mathematics (RaProMath), that include both multiple-choice and free-response items. The instruments were administered to 140 university students in Germany, and the Rasch partial-credit model was then applied to assess them. The results indicate that the instruments generate reliable and valid inferences about students' conceptual knowledge of randomness and probability in the two contexts (which are separable competencies). Furthermore, RaProEvo detected significant differences in knowledge of randomness and probability, as well as evolutionary theory, between biology majors and preservice biology teachers. © 2017 D. Fiedler et al. CBE—Life Sciences Education © 2017 The American Society for Cell Biology. This article is distributed by The American Society for Cell Biology under license from the author(s). It is available to the public under an Attribution–Noncommercial–Share Alike 3.0 Unported Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/3.0).

  15. Mining Rare Events Data for Assessing Customer Attrition Risk

    NASA Astrophysics Data System (ADS)

    Au, Tom; Chin, Meei-Ling Ivy; Ma, Guangqin

    Customer attrition refers to the phenomenon whereby a customer leaves a service provider. As competition intensifies, preventing customers from leaving is a major challenge to many businesses such as telecom service providers. Research has shown that retaining existing customers is more profitable than acquiring new customers due primarily to savings on acquisition costs, the higher volume of service consumption, and customer referrals. For a large enterprise whose customer base consists of tens of millions of service subscribers, events such as switching to competitors or canceling services are often large in absolute number but rare in percentage, far less than 5%. Based on a simple random sample, popular statistical procedures, such as logistic regression, tree-based methods, and neural networks, can sharply underestimate the probability of rare events and often result in a null model (no significant predictors). To improve efficiency and accuracy of event probability estimation, a case-based data collection technique is then considered. A case-based sample is formed by taking all available events and a small, but representative, fraction of nonevents from a dataset of interest. In this article we show a consistent prior correction method for event probability estimation and demonstrate the performance of the above data collection techniques in predicting customer attrition with actual telecommunications data.
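The core of prior correction for a case-based (choice-based) sample is an intercept adjustment: slopes from a logistic fit remain consistent, but the intercept must be shifted by the log odds ratio between the sample and population event rates. The coefficients and rates below are hypothetical, intended only to show the arithmetic, not the article's actual model.

```python
import math

tau = 0.03    # assumed true population event rate (rare, ~3% attrition)
ybar = 0.50   # event rate in the balanced case-based training sample

b0_sample, b1 = -0.2, 1.1  # hypothetical fitted intercept and slope

# Prior correction: only the intercept changes under case-based sampling.
b0 = b0_sample - math.log(((1 - tau) / tau) * (ybar / (1 - ybar)))

def attrition_prob(x):
    """Corrected event probability for covariate value x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Without the correction, the rare-event probability is grossly inflated.
uncorrected = 1.0 / (1.0 + math.exp(-(b0_sample + b1 * 0.0)))
print(attrition_prob(0.0), uncorrected)
```

After correction the predicted baseline probability falls back near the true event rate, while the uncorrected model predicts close to the sample's artificial 50% rate.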

  16. Complete Numerical Solution of the Diffusion Equation of Random Genetic Drift

    PubMed Central

    Zhao, Lei; Yue, Xingye; Waxman, David

    2013-01-01

    A numerical method is presented to solve the diffusion equation for the random genetic drift that occurs at a single unlinked locus with two alleles. The method was designed to conserve probability, and the resulting numerical solution represents a probability distribution whose total probability is unity. We describe solutions of the diffusion equation whose total probability is unity as complete. Thus the numerical method introduced in this work produces complete solutions, and such solutions have the property that whenever fixation and loss can occur, they are automatically included within the solution. This feature demonstrates that the diffusion approximation can describe not only internal allele frequencies, but also the boundary frequencies zero and one. The numerical approach presented here constitutes a single inclusive framework from which to perform calculations for random genetic drift. It has a straightforward implementation, allowing it to be applied to a wide variety of problems, including those with time-dependent parameters, such as changing population sizes. As tests and illustrations of the numerical method, it is used to determine: (i) the probability density and time-dependent probability of fixation for a neutral locus in a population of constant size; (ii) the probability of fixation in the presence of selection; and (iii) the probability of fixation in the presence of selection and demographic change, the latter in the form of a changing population size. PMID:23749318
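Result (i) above has a classic benchmark: under neutral drift, the probability of eventual fixation equals the initial allele frequency. The paper solves the diffusion equation numerically; the sketch below instead checks the same result by direct Monte Carlo simulation of the discrete Wright-Fisher model, with illustrative parameters.

```python
import random

def wright_fisher_fixes(N, p0, rng):
    """Simulate neutral Wright-Fisher drift at one biallelic locus in a
    diploid population of size N until the allele fixes or is lost."""
    count = int(round(2 * N * p0))  # allele copies among 2N gene copies
    while 0 < count < 2 * N:
        p = count / (2 * N)
        # Binomial resampling of 2N gene copies each generation.
        count = sum(1 for _ in range(2 * N) if rng.random() < p)
    return count == 2 * N

rng = random.Random(1)
N, p0, trials = 30, 0.2, 1500
fixed = sum(wright_fisher_fixes(N, p0, rng) for _ in range(trials))
print(fixed / trials)  # should be close to the initial frequency p0 = 0.2
```

A complete (probability-conserving) diffusion solution would recover the same fixation probability as the mass accumulating at the boundary frequency one.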

  17. Open quantum random walk in terms of quantum Bernoulli noise

    NASA Astrophysics Data System (ADS)

    Wang, Caishi; Wang, Ce; Ren, Suling; Tang, Yuling

    2018-03-01

    In this paper, we introduce an open quantum random walk, which we call the QBN-based open walk, by means of quantum Bernoulli noise, and study its properties from a random walk point of view. We prove that, with the localized ground state as its initial state, the QBN-based open walk has the same limit probability distribution as the classical random walk. We also show that the probability distributions of the QBN-based open walk include those of the unitary quantum walk recently introduced by Wang and Ye (Quantum Inf Process 15:1897-1908, 2016) as a special case.

  18. Probability sampling in legal cases: Kansas cellphone users

    NASA Astrophysics Data System (ADS)

    Kadane, Joseph B.

    2012-10-01

    Probability sampling is a standard statistical technique. This article introduces the basic ideas of probability sampling, and shows in detail how probability sampling was used in a particular legal case.
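The defining feature of probability sampling is that every unit has a known, nonzero inclusion probability, which permits design-unbiased estimation. A minimal sketch with a synthetic population and simple random sampling without replacement (inclusion probability n/N, giving a Horvitz-Thompson total) follows; it is a generic illustration, not the cellphone-case analysis itself.

```python
import random

rng = random.Random(7)
population = [rng.randint(0, 10) for _ in range(10000)]  # synthetic values
true_total = sum(population)

# Simple random sampling without replacement: inclusion probability n / N.
N, n = len(population), 500
sample = rng.sample(population, n)

# Horvitz-Thompson estimator: weight each sampled unit by 1 / (n / N).
ht_total = sum(y / (n / N) for y in sample)
print(true_total, round(ht_total))
```

Because the inclusion probabilities are known by design, the estimator is unbiased over repeated samples, which is what makes such estimates defensible in a legal setting.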

  19. Scaling behavior for random walks with memory of the largest distance from the origin

    NASA Astrophysics Data System (ADS)

    Serva, Maurizio

    2013-11-01

    We study a one-dimensional random walk with memory. The behavior of the walker is modified with respect to the simple symmetric random walk only when he or she is at the maximum distance ever reached from his or her starting point (home). In this case, having the choice to move farther or to move closer, the walker decides with different probabilities. If the probability of a forward step is higher than the probability of a backward step, the walker is bold, otherwise he or she is timorous. We investigate the asymptotic properties of this bold-timorous random walk, showing that the scaling behavior varies continuously from subdiffusive (timorous) to superdiffusive (bold). The scaling exponents are fully determined with a new mathematical approach based on a decomposition of the dynamics in active journeys (the walker is at the maximum distance) and lazy journeys (the walker is not at the maximum distance).
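The walk described above is easy to simulate: steps are symmetric except at the running maximum distance from home, where the walker steps away with some probability p. The sketch below, with arbitrary p values and run lengths, shows the qualitative bold/timorous contrast rather than the paper's exact scaling exponents.

```python
import random

def walk_max_distance(p, steps, rng):
    """Run the bold-timorous walk; return the maximum distance reached.
    p > 1/2 is a bold walker, p < 1/2 a timorous one."""
    x, m = 0, 0  # current position and maximum distance ever reached
    for _ in range(steps):
        if abs(x) == m and m > 0:
            # At the frontier: move farther with probability p, closer otherwise.
            away = rng.random() < p
            x += (1 if x > 0 else -1) * (1 if away else -1)
        else:
            # Elsewhere: simple symmetric step.
            x += 1 if rng.random() < 0.5 else -1
        m = max(m, abs(x))
    return m

rng = random.Random(42)
steps, runs = 20000, 20
bold = sum(walk_max_distance(0.8, steps, rng) for _ in range(runs)) / runs
timorous = sum(walk_max_distance(0.2, steps, rng) for _ in range(runs)) / runs
print(bold, timorous)  # bold walkers spread much farther than timorous ones
```

Even at these modest run lengths the superdiffusive/subdiffusive contrast is stark, with the bold walker's maximum distance far exceeding the timorous walker's.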

  20. Surprisingly rational: probability theory plus noise explains biases in judgment.

    PubMed

    Costello, Fintan; Watts, Paul

    2014-07-01

    The systematic biases seen in people's probability judgments are typically taken as evidence that people do not use the rules of probability theory when reasoning about probability but instead use heuristics, which sometimes yield reasonable judgments and sometimes yield systematic biases. This view has had a major impact in economics, law, medicine, and other fields; indeed, the idea that people cannot reason with probabilities has become a truism. We present a simple alternative to this view, where people reason about probability according to probability theory but are subject to random variation or noise in the reasoning process. In this account the effect of noise is canceled for some probabilistic expressions. Analyzing data from 2 experiments, we find that, for these expressions, people's probability judgments are strikingly close to those required by probability theory. For other expressions, this account produces systematic deviations in probability estimates. These deviations explain 4 reliable biases in human probabilistic reasoning (conservatism, subadditivity, conjunction, and disjunction fallacies). These results suggest that people's probability judgments embody the rules of probability theory and that biases in those judgments are due to the effects of random noise. (c) 2014 APA, all rights reserved.

  1. Sample design effects in landscape genetics

    USGS Publications Warehouse

    Oyler-McCance, Sara J.; Fedy, Bradley C.; Landguth, Erin L.

    2012-01-01

    An important research gap in landscape genetics is the impact of different field sampling designs on the ability to detect the effects of landscape pattern on gene flow. We evaluated how five different sampling regimes (random, linear, systematic, cluster, and single study site) affected the probability of correctly identifying the generating landscape process of population structure. Sampling regimes were chosen to represent a suite of designs common in field studies. We used genetic data generated from a spatially-explicit, individual-based program and simulated gene flow in a continuous population across a landscape with gradual spatial changes in resistance to movement. Additionally, we evaluated the sampling regimes using realistic and obtainable numbers of loci (10 and 20), numbers of alleles per locus (5 and 10), numbers of individuals sampled (10-300), and generational times after the landscape was introduced (20 and 400). For a simulated continuously distributed species, we found that random, linear, and systematic sampling regimes performed well with high sample sizes (>200), levels of polymorphism (10 alleles per locus), and number of molecular markers (20). The cluster and single study site sampling regimes were not able to correctly identify the generating process under any conditions and thus, are not advisable strategies for scenarios similar to our simulations. Our research emphasizes the importance of sampling data at ecologically appropriate spatial and temporal scales and suggests careful consideration for sampling near landscape components that are likely to most influence the genetic structure of the species. In addition, simulating sampling designs a priori could help guide field data collection efforts.

  2. Spatial Variation of Soil Lead in an Urban Community Garden: Implications for Risk-Based Sampling.

    PubMed

    Bugdalski, Lauren; Lemke, Lawrence D; McElmurry, Shawn P

    2014-01-01

    Soil lead pollution is a recalcitrant problem in urban areas resulting from a combination of historical residential, industrial, and transportation practices. The emergence of urban gardening movements in postindustrial cities necessitates accurate assessment of soil lead levels to ensure safe gardening. In this study, we examined small-scale spatial variability of soil lead within a 15 × 30 m urban garden plot established on two adjacent residential lots located in Detroit, Michigan, USA. Eighty samples collected using a variably spaced sampling grid were analyzed for total, fine fraction (less than 250 μm), and bioaccessible soil lead. Measured concentrations varied at sampling scales of 1-10 m and a hot spot exceeding 400 ppm total soil lead was identified in the northwest portion of the site. An interpolated map of total lead was treated as an exhaustive data set, and random sampling was simulated to generate Monte Carlo distributions and evaluate alternative sampling strategies intended to estimate the average soil lead concentration or detect hot spots. Increasing the number of individual samples decreases the probability of overlooking the hot spot (type II error). However, the practice of compositing and averaging samples decreased the probability of overestimating the mean concentration (type I error) at the expense of increasing the chance for type II error. The results reported here suggest a need to reconsider U.S. Environmental Protection Agency sampling objectives and consequent guidelines for reclaimed city lots where soil lead distributions are expected to be nonuniform. © 2013 Society for Risk Analysis.
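The type II error the abstract discusses, overlooking a hot spot by chance, can be illustrated with a small Monte Carlo experiment: place a hot spot in a fraction of grid cells and estimate how often n random sample points all miss it. The grid and hot-spot sizes below are hypothetical, not the Detroit site's values.

```python
import random

rng = random.Random(0)
cells = 30 * 15  # 1-m grid cells over a hypothetical 30 x 15 m plot
hot = 12         # cells belonging to the hot spot (> 400 ppm)

def miss_probability(n, trials=20000):
    """Monte Carlo probability that n random cells (with replacement)
    all miss the hot spot, i.e. a type II error for hot-spot detection."""
    misses = 0
    for _ in range(trials):
        if all(rng.randrange(cells) >= hot for _ in range(n)):
            misses += 1
    return misses / trials

for n in (5, 10, 20):
    print(n, miss_probability(n))  # more samples -> lower type II error
```

The estimates track the analytic value (1 - hot/cells)^n, and doubling the number of individual samples visibly shrinks the chance of overlooking the hot spot, which is the trade-off against compositing noted in the abstract.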

  3. Eliciting and Developing Teachers' Conceptions of Random Processes in a Probability and Statistics Course

    ERIC Educational Resources Information Center

    Smith, Toni M.; Hjalmarson, Margret A.

    2013-01-01

    The purpose of this study is to examine prospective mathematics specialists' engagement in an instructional sequence designed to elicit and develop their understandings of random processes. The study was conducted with two different sections of a probability and statistics course for K-8 teachers. Thirty-two teachers participated. Video analyses…

  4. Theoretical size distribution of fossil taxa: analysis of a null model.

    PubMed

    Reed, William J; Hughes, Barry D

    2007-03-22

    This article deals with the theoretical size distribution (of number of sub-taxa) of a fossil taxon arising from a simple null model of macroevolution. New species arise through speciations occurring independently and at random at a fixed probability rate, while extinctions either occur independently and at random (background extinctions) or cataclysmically. In addition new genera are assumed to arise through speciations of a very radical nature, again assumed to occur independently and at random at a fixed probability rate. The size distributions of the pioneering genus (following a cataclysm) and of derived genera are determined. Also the distribution of the number of genera is considered along with a comparison of the probability of a monospecific genus with that of a monogeneric family.

  5. Local approximation of a metapopulation's equilibrium.

    PubMed

    Barbour, A D; McVinish, R; Pollett, P K

    2018-04-18

    We consider the approximation of the equilibrium of a metapopulation model, in which a finite number of patches are randomly distributed over a bounded subset [Formula: see text] of Euclidean space. The approximation is good when a large number of patches contribute to the colonization pressure on any given unoccupied patch, and when the quality of the patches varies little over the length scale determined by the colonization radius. If this is the case, the equilibrium probability of a patch at z being occupied is shown to be close to [Formula: see text], the equilibrium occupation probability in Levins's model, at any point [Formula: see text] not too close to the boundary, if the local colonization pressure and extinction rates appropriate to z are assumed. The approximation is justified by giving explicit upper and lower bounds for the occupation probabilities, expressed in terms of the model parameters. Since the patches are distributed randomly, the occupation probabilities are also random, and we complement our bounds with explicit bounds on the probability that they are satisfied at all patches simultaneously.
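For context, the Levins equilibrium referenced above comes from the mean-field dynamics dp/dt = c p (1 - p) - e p, whose positive equilibrium is p* = 1 - e/c whenever the colonization rate c exceeds the extinction rate e. The sketch below just evaluates that formula with illustrative rates; the paper's contribution is bounding how far the spatial model's occupation probabilities can deviate from it.

```python
def levins_equilibrium(c, e):
    """Equilibrium occupied fraction in Levins's metapopulation model:
    p* = 1 - e/c if c > e, else 0 (the metapopulation goes extinct)."""
    return max(0.0, 1.0 - e / c)

print(levins_equilibrium(0.5, 0.2))  # persistence: p* = 0.6
print(levins_equilibrium(0.1, 0.2))  # extinction exceeds colonization: 0.0
```

In the paper's setting, c and e vary with location z, and the local equilibrium p*(z) approximates the occupation probability away from the boundary.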

  6. Ant-inspired density estimation via random walks

    PubMed Central

    Musco, Cameron; Su, Hsin-Hao

    2017-01-01

    Many ant species use distributed population density estimation in applications ranging from quorum sensing, to task allocation, to appraisal of enemy colony strength. It has been shown that ants estimate local population density by tracking encounter rates: The higher the density, the more often the ants bump into each other. We study distributed density estimation from a theoretical perspective. We prove that a group of anonymous agents randomly walking on a grid are able to estimate their density within a small multiplicative error in a few steps by measuring their rates of encounter with other agents. Despite dependencies inherent in the fact that nearby agents may collide repeatedly (and, worse, cannot recognize when this happens), our bound nearly matches what would be required to estimate density by independently sampling grid locations. From a biological perspective, our work helps shed light on how ants and other social insects can obtain relatively accurate density estimates via encounter rates. From a technical perspective, our analysis provides tools for understanding complex dependencies in the collision probabilities of multiple random walks. We bound the strength of these dependencies using local mixing properties of the underlying graph. Our results extend beyond the grid to more general graphs, and we discuss applications to size estimation for social networks, density estimation for robot swarms, and random walk-based sampling for sensor networks. PMID:28928146

  7. Uncertainty quantification of voice signal production mechanical model and experimental updating

    NASA Astrophysics Data System (ADS)

    Cataldo, E.; Soize, C.; Sampaio, R.

    2013-11-01

    The aim of this paper is to analyze the uncertainty quantification in a voice production mechanical model and update the probability density function corresponding to the tension parameter using the Bayes method and experimental data. Three parameters are considered uncertain in the voice production mechanical model used: the tension parameter, the neutral glottal area and the subglottal pressure. The tension parameter of the vocal folds is mainly responsible for the changing of the fundamental frequency of a voice signal, generated by a mechanical/mathematical model for producing voiced sounds. The three uncertain parameters are modeled by random variables. The probability density function related to the tension parameter is considered uniform and the probability density functions related to the neutral glottal area and the subglottal pressure are constructed using the Maximum Entropy Principle. The output of the stochastic computational model is the random voice signal and the Monte Carlo method is used to solve the stochastic equations allowing realizations of the random voice signals to be generated. For each realization of the random voice signal, the corresponding realization of the random fundamental frequency is calculated and the prior pdf of this random fundamental frequency is then estimated. Experimental data are available for the fundamental frequency and the posterior probability density function of the random tension parameter is then estimated using the Bayes method. In addition, an application is performed considering a case with a pathology in the vocal folds. The strategy developed here is important mainly for two reasons. The first is the possibility of updating the probability density function of a parameter, the tension parameter of the vocal folds, which cannot be measured directly; the second is the construction of the likelihood function. In general, the likelihood is predefined using a known pdf; here, it is constructed in a new and different manner, using the system itself.

  8. Poisson statistics of PageRank probabilities of Twitter and Wikipedia networks

    NASA Astrophysics Data System (ADS)

    Frahm, Klaus M.; Shepelyansky, Dima L.

    2014-04-01

    We use the methods of quantum chaos and Random Matrix Theory for analysis of statistical fluctuations of PageRank probabilities in directed networks. In this approach the effective energy levels are given by a logarithm of PageRank probability at a given node. After the standard energy level unfolding procedure we establish that the nearest spacing distribution of PageRank probabilities is described by the Poisson law typical for integrable quantum systems. Our studies are done for the Twitter network and three networks of Wikipedia editions in English, French and German. We argue that due to absence of level repulsion the PageRank order of nearby nodes can be easily interchanged. The obtained Poisson law implies that the nearby PageRank probabilities fluctuate as random independent variables.

  9. Quantum speedup of Monte Carlo methods.

    PubMed

    Montanaro, Ashley

    2015-09-08

    Monte Carlo methods use random sampling to estimate numerical quantities which are hard to compute deterministically. One important example is the use in statistical physics of rapidly mixing Markov chains to approximately compute partition functions. In this work, we describe a quantum algorithm which can accelerate Monte Carlo methods in a very general setting. The algorithm estimates the expected output value of an arbitrary randomized or quantum subroutine with bounded variance, achieving a near-quadratic speedup over the best possible classical algorithm. Combining the algorithm with the use of quantum walks gives a quantum speedup of the fastest known classical algorithms with rigorous performance bounds for computing partition functions, which use multiple-stage Markov chain Monte Carlo techniques. The quantum algorithm can also be used to estimate the total variation distance between probability distributions efficiently.

  10. Quantum speedup of Monte Carlo methods

    PubMed Central

    Montanaro, Ashley

    2015-01-01

    Monte Carlo methods use random sampling to estimate numerical quantities which are hard to compute deterministically. One important example is the use in statistical physics of rapidly mixing Markov chains to approximately compute partition functions. In this work, we describe a quantum algorithm which can accelerate Monte Carlo methods in a very general setting. The algorithm estimates the expected output value of an arbitrary randomized or quantum subroutine with bounded variance, achieving a near-quadratic speedup over the best possible classical algorithm. Combining the algorithm with the use of quantum walks gives a quantum speedup of the fastest known classical algorithms with rigorous performance bounds for computing partition functions, which use multiple-stage Markov chain Monte Carlo techniques. The quantum algorithm can also be used to estimate the total variation distance between probability distributions efficiently. PMID:26528079

  11. Brick tunnel randomization and the momentum of the probability mass.

    PubMed

    Kuznetsova, Olga M

    2015-12-30

    The allocation space of an unequal-allocation permuted block randomization can be quite wide. The development of unequal-allocation procedures with a narrower allocation space, however, is complicated by the need to preserve the unconditional allocation ratio at every step (the allocation ratio preserving (ARP) property). When the allocation paths are depicted on the K-dimensional unitary grid, where allocation to the l-th treatment is represented by a step along the l-th axis, l = 1 to K, the ARP property can be expressed in terms of the center of the probability mass after i allocations. Specifically, for an ARP allocation procedure that randomizes subjects to K treatment groups in w1 :⋯:wK ratio, w1 +⋯+wK =1, the coordinates of the center of the mass are (w1 i,…,wK i). In this paper, the momentum with respect to the center of the probability mass (expected imbalance in treatment assignments) is used to compare ARP procedures in how closely they approximate the target allocation ratio. It is shown that the two-arm and three-arm brick tunnel randomizations (BTR) are the ARP allocation procedures with the tightest allocation space among all allocation procedures with the same allocation ratio; the two-arm BTR is the minimum-momentum two-arm ARP allocation procedure. Resident probabilities of two-arm and three-arm BTR are analytically derived from the coordinates of the center of the probability mass; the existence of the respective transition probabilities is proven. Probability of deterministic assignments with BTR is found generally acceptable. Copyright © 2015 John Wiley & Sons, Ltd.

  12. Leveraging Genomic Annotations and Pleiotropic Enrichment for Improved Replication Rates in Schizophrenia GWAS

    PubMed Central

    Wang, Yunpeng; Thompson, Wesley K.; Schork, Andrew J.; Holland, Dominic; Chen, Chi-Hua; Bettella, Francesco; Desikan, Rahul S.; Li, Wen; Witoelar, Aree; Zuber, Verena; Devor, Anna; Nöthen, Markus M.; Rietschel, Marcella; Chen, Qiang; Werge, Thomas; Cichon, Sven; Weinberger, Daniel R.; Djurovic, Srdjan; O’Donovan, Michael; Visscher, Peter M.; Andreassen, Ole A.; Dale, Anders M.

    2016-01-01

    Most of the genetic architecture of schizophrenia (SCZ) has not yet been identified. Here, we apply a novel statistical algorithm called Covariate-Modulated Mixture Modeling (CM3), which incorporates auxiliary information (heterozygosity, total linkage disequilibrium, genomic annotations, pleiotropy) for each single nucleotide polymorphism (SNP) to enable more accurate estimation of replication probabilities, conditional on the observed test statistic (“z-score”) of the SNP. We use a multiple logistic regression on z-scores to combine information from auxiliary information to derive a “relative enrichment score” for each SNP. For each stratum of these relative enrichment scores, we obtain nonparametric estimates of posterior expected test statistics and replication probabilities as a function of discovery z-scores, using a resampling-based approach that repeatedly and randomly partitions meta-analysis sub-studies into training and replication samples. We fit a scale mixture of two Gaussians model to each stratum, obtaining parameter estimates that minimize the sum of squared differences of the scale-mixture model with the stratified nonparametric estimates. We apply this approach to the recent genome-wide association study (GWAS) of SCZ (n = 82,315), obtaining a good fit between the model-based and observed effect sizes and replication probabilities. We observed that SNPs with low enrichment scores replicate with a lower probability than SNPs with high enrichment scores, even when both are genome-wide significant (p < 5x10-8). There were 693 and 219 independent loci with model-based replication rates ≥80% and ≥90%, respectively. Compared to analyses not incorporating relative enrichment scores, CM3 increased out-of-sample yield for SNPs that replicate at a given rate. This demonstrates that replication probabilities can be more accurately estimated using prior enrichment information with CM3. PMID:26808560

  13. Health Professionals Prefer to Communicate Risk-Related Numerical Information Using "1-in-X" Ratios.

    PubMed

    Sirota, Miroslav; Juanchich, Marie; Petrova, Dafina; Garcia-Retamero, Rocio; Walasek, Lukasz; Bhatia, Sudeep

    2018-04-01

    Previous research has shown that format effects, such as the "1-in-X" effect (whereby "1-in-X" ratios lead to a higher perceived probability than "N-in-N*X" ratios), alter perceptions of medical probabilities. We do not know, however, how prevalent this effect is in practice; i.e., how often health professionals use the "1-in-X" ratio. We assembled 4 different sources of evidence, involving experimental work and corpus studies, to examine the use of "1-in-X" and other numerical formats quantifying probability. Our results revealed that the use of the "1-in-X" ratio is prevalent and that health professionals prefer this format compared with other numerical formats (i.e., the "N-in-N*X", %, and decimal formats). In Study 1, UK family physicians preferred to communicate prenatal risk using a "1-in-X" ratio (80.4%, n = 131) across different risk levels and regardless of patients' numeracy levels. In Study 2, a sample from the UK adult population (n = 203) reported that most GPs (60.6%) preferred to use "1-in-X" ratios compared with other formats. In Study 3, "1-in-X" ratios were the most commonly used format in a set of randomly sampled drug leaflets describing the risk of side effects (100%, n = 94). In Study 4, the "1-in-X" format was the most commonly used numerical expression of medical probabilities or frequencies on the UK's NHS website (45.7%, n = 2,469 sentences). The prevalent use of "1-in-X" ratios magnifies the chances of increased subjective probability. Further research should establish the clinical significance of the "1-in-X" effect.
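
    The numerical formats compared above are simple to generate from a single probability. A minimal sketch (the risk value is illustrative, not data from the studies):

```python
def to_one_in_x(p):
    """Express a probability p as a '1-in-X' ratio, rounding X to an integer."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return f"1 in {round(1 / p)}"

def to_n_in_nx(p, n):
    """Express p as an 'N-in-N*X' ratio with numerator n (e.g. 4-in-1600)."""
    return f"{n} in {round(n / p)}"

# The same 0.25% risk in the four formats compared in the studies:
p = 0.0025
print(to_one_in_x(p))    # 1 in 400
print(to_n_in_nx(p, 4))  # 4 in 1600
print(f"{p:.2%}")        # 0.25%  (percentage format)
print(p)                 # 0.0025 (decimal format)
```

    Note that "1-in-X" hides the denominator rescaling that "N-in-N*X" makes explicit, which is one account of why the former is perceived as larger.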

  14. Factorization of Observables

    NASA Astrophysics Data System (ADS)

    Eliaš, Peter; Frič, Roman

    2017-12-01

    A categorical approach to probability leads to better understanding of basic notions and constructions in generalized (fuzzy, operational, quantum) probability, where observables—dual notions to generalized random variables (statistical maps)—play a major role. First, to avoid inconsistencies, we introduce three categories L, S, and P, the objects and morphisms of which correspond to basic notions of fuzzy probability theory and operational probability theory, and describe their relationships. To illustrate the advantages of the categorical approach, we show that two categorical constructions involving observables (related to the representation of generalized random variables via products, or to the smearing of sharp observables, respectively) can be described as factorizing a morphism into a composition of two morphisms having desired properties. We close with a remark concerning products.

  15. Evaluating cost-efficiency and accuracy of hunter harvest survey designs

    USGS Publications Warehouse

    Lukacs, P.M.; Gude, J.A.; Russell, R.E.; Ackerman, B.B.

    2011-01-01

    Effective management of harvested wildlife often requires accurate estimates of the number of animals harvested annually by hunters. A variety of techniques exist to obtain harvest data, such as hunter surveys, check stations, mandatory reporting requirements, and voluntary reporting of harvest. Agencies responsible for managing harvested wildlife such as deer (Odocoileus spp.), elk (Cervus elaphus), and pronghorn (Antilocapra americana) are challenged with balancing the cost of data collection versus the value of the information obtained. We compared precision, bias, and relative cost of several common strategies, including hunter self-reporting and random sampling, for estimating hunter harvest using a realistic set of simulations. Self-reporting with a follow-up survey of hunters who did not report produces the best estimate of harvest in terms of precision and bias, but it is also, by far, the most expensive technique. Self-reporting with no follow-up survey risks very large bias in harvest estimates, and the cost increases with increased response rate. Probability-based sampling provides a substantial cost savings, though accuracy can be affected by nonresponse bias. We recommend stratified random sampling with a calibration estimator used to reweight the sample based on the proportions of hunters responding in each covariate category as the best option for balancing cost and accuracy. © 2011 The Wildlife Society.
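
    The recommended calibration reweighting can be sketched in its simplest post-stratification form, where respondents in each covariate category are reweighted so the sample expands to the known number of hunters in that category. The categories and counts below are hypothetical:

```python
def calibrated_total(frame_counts, respondents):
    """Post-stratified (calibration) estimate of total harvest.

    frame_counts: {category: number of hunters in the frame}
    respondents:  {category: list of reported harvests from responding hunters}
    Each respondent in a category receives weight N_h / n_h, so the weighted
    sample matches the known frame count in every category.
    """
    total = 0.0
    for cat, harvests in respondents.items():
        N_h, n_h = frame_counts[cat], len(harvests)
        weight = N_h / n_h  # calibration weight within the category
        total += weight * sum(harvests)
    return total

# Hypothetical example: resident vs. non-resident hunters, harvest 0/1 each.
frame = {"resident": 8000, "nonresident": 2000}
sample = {"resident": [1, 0, 0, 1], "nonresident": [1, 1]}
print(calibrated_total(frame, sample))  # 8000*0.5 + 2000*1.0 = 6000.0
```

    Reweighting by category in this way removes nonresponse bias to the extent that response propensity depends only on the covariate categories used.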

  16. Estimating the Probability of Electrical Short Circuits from Tin Whiskers. Part 2

    NASA Technical Reports Server (NTRS)

    Courey, Karim J.; Asfour, Shihab S.; Onar, Arzu; Bayliss, Jon A.; Ludwig, Larry L.; Wright, Maria C.

    2009-01-01

    To comply with lead-free legislation, many manufacturers have converted from tin-lead to pure tin finishes of electronic components. However, pure tin finishes have a greater propensity to grow tin whiskers than tin-lead finishes. Since tin whiskers present an electrical short circuit hazard in electronic components, simulations have been developed to quantify the risk of such short circuits occurring. Existing risk simulations make the assumption that when a free tin whisker has bridged two adjacent exposed electrical conductors, the result is an electrical short circuit. This conservative assumption is made because shorting is a random event that has an unknown probability associated with it. Note, however, that due to contact resistance, electrical shorts may not occur at lower voltage levels. In our first article we developed an empirical probability model for tin whisker shorting. In this paper, we develop a more comprehensive empirical model using a refined experiment with a larger sample size, in which we studied the effect of varying voltage on the breakdown of the contact resistance that leads to a short circuit. From the resulting data we estimated the probability distribution of an electrical short as a function of voltage.
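
    One simple way to turn such bench data into a shorting-probability-versus-voltage curve is a logistic model fit by maximum likelihood. The functional form, fitting routine, and data below are illustrative assumptions, not the paper's estimation procedure:

```python
import math, random

def fit_logistic(xs, ys, lr=0.05, steps=5000):
    """Fit P(short | V) = 1 / (1 + exp(-(a + b*V))) by gradient ascent on the
    log-likelihood. A sketch of one way to model shorting vs. voltage."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        ga = gb = 0.0
        for x, y in zip(xs, ys):
            p = 1 / (1 + math.exp(-(a + b * x)))
            ga += y - p          # gradient w.r.t. intercept
            gb += (y - p) * x    # gradient w.r.t. slope
        a += lr * ga / n
        b += lr * gb / n
    return a, b

# Hypothetical bench data: shorts become likely above ~10 V.
random.seed(0)
volts = [v for v in range(2, 22, 2) for _ in range(10)]
shorts = [1 if random.random() < 1 / (1 + math.exp(-(v - 10))) else 0
          for v in volts]
a, b = fit_logistic(volts, shorts)
print(b > 0)  # shorting probability increases with voltage -> True
```

    A positive fitted slope reproduces the qualitative finding: at low voltage the contact resistance often prevents a short, so the conservative "bridge implies short" assumption overstates risk there.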

  17. Phase Transition for the Large-Dimensional Contact Process with Random Recovery Rates on Open Clusters

    NASA Astrophysics Data System (ADS)

    Xue, Xiaofeng

    2016-12-01

    In this paper we are concerned with the contact process with random recovery rates on open clusters of bond percolation on Z^d. Let ξ be a random variable such that P(ξ ≥ 1) = 1, which ensures E[1/ξ] < +∞; we then assign i.i.d. copies of ξ to the vertices as the random recovery rates. Assuming that each edge is open with probability p and the infection can only spread through the open edges, we obtain that lim sup_{d→+∞} λ_d ≤ λ_c = 1/(p E[1/ξ]), where λ_d is the critical value of the process on Z^d, i.e., the maximum of the infection rates with which the infection dies out with probability one when only the origin is infected at t = 0. To prove the above main result, we show that the following phase transition occurs. Assuming that ⌈log d⌉ vertices are infected at t = 0, where these vertices can be located anywhere, then when the infection rate λ > λ_c the process survives with high probability as d → +∞, while when λ < λ_c the process dies out at time O(log d) with high probability.
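
    The critical-value bound λ_c = 1/(p · E[1/ξ]) is easy to evaluate for a given recovery-rate law. A small Monte Carlo sketch, with illustrative distributions for ξ:

```python
import random

def critical_value(p, sample_xi, n=100_000, seed=1):
    """Monte Carlo estimate of lambda_c = 1 / (p * E[1/xi]) from the paper's
    main bound, for recovery rates xi >= 1 drawn by sample_xi(rng)."""
    rng = random.Random(seed)
    mean_inv = sum(1 / sample_xi(rng) for _ in range(n)) / n
    return 1 / (p * mean_inv)

# Constant recovery rate xi = 1 recovers the classical value 1/p:
print(critical_value(0.5, lambda rng: 1.0))  # 2.0
# xi uniform on {1, 2}: E[1/xi] = 3/4, so lambda_c = 1/(p * 3/4) = 8/3:
est = critical_value(0.5, lambda rng: rng.choice([1.0, 2.0]))
print(abs(est - 8 / 3) < 0.05)  # True
```

    Heavier-tailed recovery rates lower E[1/ξ] toward its minimum, pushing λ_c up: slower-recovering vertices make survival easier at a given infection rate.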

  18. Jimsphere wind and turbulence exceedance statistics

    NASA Technical Reports Server (NTRS)

    Adelfang, S. I.; Court, A.

    1972-01-01

    Exceedance statistics of winds and gusts observed over Cape Kennedy with Jimsphere balloon sensors are described. Gust profiles containing positive and negative departures, from smoothed profiles, in the wavelength ranges 100-2500, 100-1900, 100-860, and 100-460 meters were computed from 1578 profiles with four 41-weight digital high-pass filters. Extreme values of the square root of gust speed are normally distributed. Monthly and annual exceedance probability distributions of normalized rms gust speeds in three altitude bands (2-7, 6-11, and 9-14 km) are log-normal. The rms gust speeds are largest in the 100-2500 m wavelength band between 9 and 14 km in late winter and early spring. A study of monthly and annual exceedance probabilities and the number of occurrences per kilometer of level crossings with positive slope indicates significant variability with season, altitude, and filter configuration. A decile sampling scheme is tested and an optimum approach is suggested for drawing a relatively small random sample that represents the characteristic extreme wind speeds and shears of a large parent population of Jimsphere wind profiles.

  19. Analysis of creative mathematic thinking ability in problem based learning model based on self-regulation learning

    NASA Astrophysics Data System (ADS)

    Munahefi, D. N.; Waluya, S. B.; Rochmad

    2018-03-01

    The purpose of this research was to identify the effectiveness of the Problem Based Learning (PBL) model based on Self-Regulation Learning (SRL) for the ability of mathematical creative thinking, and to analyze the mathematical creative thinking ability of high school students in solving mathematical problems. The population of this study was students of grade X at SMA N 3 Klaten. The research method used was sequential explanatory. The quantitative stage used a simple random sampling technique: two classes were selected randomly, where the experimental class was taught with the PBL model based on SRL and the control class was taught with an expository model. The selection of samples at the qualitative stage used a non-probability sampling technique, in which three students each were selected from the high, medium, and low academic levels. The PBL model with the SRL approach was effective for students' mathematical creative thinking ability. Students at the low academic level achieved the fluency and flexibility aspects. Students at the medium academic level achieved the fluency and flexibility aspects well, but their originality was not yet well structured. Students at the high academic level could reach the originality aspect.

  20. Phylogenic analysis and forensic genetic characterization of Chinese Uyghur group via autosomal multi STR markers.

    PubMed

    Jin, Xiaoye; Wei, Yuanyuan; Chen, Jiangang; Kong, Tingting; Mu, Yuling; Guo, Yuxin; Dong, Qian; Xie, Tong; Meng, Haotian; Zhang, Meng; Li, Jianfei; Li, Xiaopeng; Zhu, Bofeng

    2017-09-26

    We investigated the allelic frequencies and forensic descriptive parameters of 23 autosomal short tandem repeat loci in a randomly selected sample of 1218 unrelated healthy Uyghur individuals residing in the Xinjiang Uyghur Autonomous Region, northwest China. A total of 281 alleles at these loci were identified, and their corresponding allelic frequencies ranged from 0.0004 to 0.5390. The combined match probability and combined probability of exclusion of all loci were 5.192 × 10⁻²⁹ and 0.9999999996594, respectively. The results of the population genetic study showed that the Uyghur group had close relationships with contiguous populations, such as the Xibe and Hui groups. In summary, these autosomal short tandem repeat loci were highly informative in the Uyghur group, and the multiplex PCR system could be used as a valuable tool for forensic casework and population genetic analysis.
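
    The combined figures quoted above follow from the standard independence-based combining rules across loci. The per-locus values below are hypothetical, chosen only to show the arithmetic:

```python
from math import prod

def combined_match_probability(locus_mp):
    """Combined match probability across independent STR loci:
    the product of the per-locus match probabilities."""
    return prod(locus_mp)

def combined_power_of_exclusion(locus_pe):
    """Combined probability of exclusion: 1 - prod(1 - PE_i)."""
    return 1 - prod(1 - pe for pe in locus_pe)

# Hypothetical per-locus values for a 3-locus illustration (the study used
# 23 loci, giving CMP ~ 5.2e-29 and CPE ~ 0.9999999997):
mp = [0.05, 0.10, 0.08]
pe = [0.60, 0.55, 0.70]
print(combined_match_probability(mp))   # ~ 0.0004
print(combined_power_of_exclusion(pe))  # ~ 0.946
```

    Because each additional informative locus multiplies the match probability by a small factor, 23 loci can drive the combined value down to the 10⁻²⁹ scale reported.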

  1. Learning Probabilities From Random Observables in High Dimensions: The Maximum Entropy Distribution and Others

    NASA Astrophysics Data System (ADS)

    Obuchi, Tomoyuki; Cocco, Simona; Monasson, Rémi

    2015-11-01

    We consider the problem of learning a target probability distribution over a set of N binary variables from the knowledge of the expectation values (with this target distribution) of M observables, drawn uniformly at random. The space of all probability distributions compatible with these M expectation values within some fixed accuracy, called the version space, is studied. We introduce a biased measure over the version space, which gives a boost increasing exponentially with the entropy of the distributions and with an arbitrary inverse 'temperature' Γ. The choice of Γ allows us to interpolate smoothly between the unbiased measure over all distributions in the version space (Γ = 0) and the pointwise measure concentrated at the maximum entropy distribution (Γ → ∞). Using the replica method we compute the volume of the version space and other quantities of interest, such as the distance R between the target distribution and the center-of-mass distribution over the version space, as functions of α = (log M)/N and Γ for large N. Phase transitions at critical values of α are found, corresponding to qualitative improvements in the learning of the target distribution and to the decrease of the distance R. However, for fixed α the distance R does not vary with Γ, which means that the maximum entropy distribution is not closer to the target distribution than any other distribution compatible with the observable values. Our results are confirmed by Monte Carlo sampling of the version space for small system sizes (N ≤ 10).

  2. Zero field reversal probability in thermally assisted magnetization reversal

    NASA Astrophysics Data System (ADS)

    Prasetya, E. B.; Utari; Purnama, B.

    2017-11-01

    This paper discusses the zero-field reversal probability in thermally assisted magnetization reversal (TAMR). The appearance of a reversal probability at zero field was investigated through micromagnetic simulation by solving the stochastic Landau-Lifshitz-Gilbert (LLG) equation. A perpendicularly anisotropic magnetic dot of 50×50×20 nm³ is considered as a single-cell storage element of magnetic random access memory (MRAM). Thermally assisted magnetization reversal was performed by cooling during the writing process from near the Curie point to room temperature, over 20 runs with different randomly magnetized initial states. The results show that the reversal probability under zero magnetic field decreased with increasing energy barrier: a zero-field switching probability of 55% was attained for an energy barrier of 60 k_BT, which corresponds to a switching field of 150 Oe, and the reversal probability became zero at an energy barrier of 2348 k_BT.
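
    The qualitative trend (higher barriers suppress zero-field reversal) can be illustrated with a Néel-Arrhenius estimate. This is a far simpler picture than the stochastic-LLG micromagnetic simulation used in the paper, and the attempt frequency and wait time below are assumed values:

```python
import math

def switching_probability(barrier_kbt, attempt_freq=1e9, wait_s=1.0):
    """Neel-Arrhenius estimate of the thermal switching probability at zero
    field: P = 1 - exp(-f0 * t * exp(-dE / kBT)), with the barrier given in
    units of kBT. Illustrates only the trend that higher barriers suppress
    zero-field reversal; it is not the paper's simulation model."""
    rate = attempt_freq * math.exp(-barrier_kbt)  # thermal escape rate
    return 1 - math.exp(-rate * wait_s)

for barrier in (10, 20, 30, 40):
    print(barrier, switching_probability(barrier))
```

    The exponential dependence on the barrier explains why the simulated reversal probability collapses from 55% to zero as the barrier grows.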

  3. Simultaneous genomic identification and profiling of a single cell using semiconductor-based next generation sequencing.

    PubMed

    Watanabe, Manabu; Kusano, Junko; Ohtaki, Shinsaku; Ishikura, Takashi; Katayama, Jin; Koguchi, Akira; Paumen, Michael; Hayashi, Yoshiharu

    2014-09-01

    Combining single-cell methods and next-generation sequencing should provide a powerful means to understand single-cell biology and obviate the effects of sample heterogeneity. Here we report a single-cell identification method and seamless cancer gene profiling using semiconductor-based massively parallel sequencing. A549 cells (an adenocarcinomic human alveolar basal epithelial cell line) were used as a model. Single-cell capture was performed using laser capture microdissection (LCM) with an Arcturus® XT system, and a captured single cell and a bulk population of A549 cells (≈10⁶ cells) were subjected to whole genome amplification (WGA). For cell identification, a multiplex PCR method (AmpliSeq™ SNP HID panel) was used to enrich 136 highly discriminatory SNPs with a genotype concordance probability of 10^(31-35). For cancer gene profiling, we used mutation profiling that was performed in parallel using a hotspot panel for 50 cancer-related genes. Sequencing was performed using a semiconductor-based benchtop sequencer. The distribution of sequence reads for both HID and Cancer panel amplicons was consistent across these samples. For the bulk population of cells, the percentages of sequence covered at coverage of more than 100× were 99.04% for the HID panel and 98.83% for the Cancer panel, while for the single cell the percentages of sequence covered at coverage of more than 100× were 55.93% for the HID panel and 65.96% for the Cancer panel. Partial amplification failure or randomly distributed non-amplified regions across samples from single cells during the WGA procedures, or random allele drop-out, probably caused these differences. However, comparative analyses showed that this method successfully discriminated a single A549 cancer cell from a bulk population of A549 cells. Thus, our approach provides a powerful means to overcome tumor sample heterogeneity when searching for somatic mutations.

  4. Improved Horvitz-Thompson Estimation of Model Parameters from Two-phase Stratified Samples: Applications in Epidemiology

    PubMed Central

    Breslow, Norman E.; Lumley, Thomas; Ballantyne, Christie M; Chambless, Lloyd E.; Kulich, Michal

    2009-01-01

    The case-cohort study involves two-phase sampling: simple random sampling from an infinite super-population at phase one and stratified random sampling from a finite cohort at phase two. Standard analyses of case-cohort data involve solution of inverse probability weighted (IPW) estimating equations, with weights determined by the known phase two sampling fractions. The variance of parameter estimates in (semi)parametric models, including the Cox model, is the sum of two terms: (i) the model based variance of the usual estimates that would be calculated if full data were available for the entire cohort; and (ii) the design based variance from IPW estimation of the unknown cohort total of the efficient influence function (IF) contributions. This second variance component may be reduced by adjusting the sampling weights, either by calibration to known cohort totals of auxiliary variables correlated with the IF contributions or by their estimation using these same auxiliary variables. Both adjustment methods are implemented in the R survey package. We derive the limit laws of coefficients estimated using adjusted weights. The asymptotic results suggest practical methods for construction of auxiliary variables that are evaluated by simulation of case-cohort samples from the National Wilms Tumor Study and by log-linear modeling of case-cohort data from the Atherosclerosis Risk in Communities Study. Although not semiparametric efficient, estimators based on adjusted weights may come close to achieving full efficiency within the class of augmented IPW estimators. PMID:20174455
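
    The phase-two IPW weighting at the core of this approach can be sketched with known sampling fractions (the calibration adjustment of the weights is omitted here). The data below are hypothetical:

```python
def ipw_mean(values, sampled, probs):
    """Inverse-probability-weighted (Horvitz-Thompson style) estimate of a
    cohort mean from phase-two data: each sampled subject is weighted by
    1/pi_i, the inverse of its known selection probability."""
    num = sum(v / p for v, s, p in zip(values, sampled, probs) if s)
    den = sum(1 / p for s, p in zip(sampled, probs) if s)
    return num / den  # Hajek form: normalize by the estimated cohort size

# Hypothetical stratified phase-two sample: cases kept with probability 1.0,
# controls subsampled with probability 0.25.
values  = [3.0, 5.0, 1.0, 2.0, 4.0, 2.0]
sampled = [True, True, True, False, False, True]
probs   = [1.0, 1.0, 0.25, 0.25, 0.25, 0.25]
print(ipw_mean(values, sampled, probs))  # 2.0
```

    Adjusting these weights toward known cohort totals of auxiliary variables, as the paper does, reduces the design-based component of the variance without changing the estimand.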

  5. Music-evoked incidental happiness modulates probability weighting during risky lottery choices

    PubMed Central

    Schulreich, Stefan; Heussen, Yana G.; Gerhardt, Holger; Mohr, Peter N. C.; Binkofski, Ferdinand C.; Koelsch, Stefan; Heekeren, Hauke R.

    2014-01-01

    We often make decisions with uncertain consequences. The outcomes of the choices we make are usually not perfectly predictable but probabilistic, and the probabilities can be known or unknown. Probability judgments, i.e., the assessment of unknown probabilities, can be influenced by evoked emotional states. This suggests that also the weighting of known probabilities in decision making under risk might be influenced by incidental emotions, i.e., emotions unrelated to the judgments and decisions at issue. Probability weighting describes the transformation of probabilities into subjective decision weights for outcomes and is one of the central components of cumulative prospect theory (CPT) that determine risk attitudes. We hypothesized that music-evoked emotions would modulate risk attitudes in the gain domain and in particular probability weighting. Our experiment featured a within-subject design consisting of four conditions in separate sessions. In each condition, the 41 participants listened to a different kind of music—happy, sad, or no music, or sequences of random tones—and performed a repeated pairwise lottery choice task. We found that participants chose the riskier lotteries significantly more often in the “happy” than in the “sad” and “random tones” conditions. Via structural regressions based on CPT, we found that the observed changes in participants' choices can be attributed to changes in the elevation parameter of the probability weighting function: in the “happy” condition, participants showed significantly higher decision weights associated with the larger payoffs than in the “sad” and “random tones” conditions. Moreover, elevation correlated positively with self-reported music-evoked happiness. Thus, our experimental results provide evidence in favor of a causal effect of incidental happiness on risk attitudes that can be explained by changes in probability weighting. PMID:24432007
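
    The elevation-versus-curvature decomposition can be illustrated with a standard two-parameter weighting function. The Goldstein-Einhorn form below is a common CPT parameterization with an explicit elevation parameter; it is not necessarily the exact specification used in the paper's structural regressions:

```python
def weight(p, gamma=0.6, delta=1.0):
    """Goldstein-Einhorn probability weighting function,
    w(p) = delta * p**gamma / (delta * p**gamma + (1 - p)**gamma),
    where delta controls elevation and gamma controls curvature."""
    num = delta * p ** gamma
    return num / (num + (1 - p) ** gamma)

# Raising elevation (delta) raises decision weights across probabilities,
# which is the direction of the reported happy-music effect:
for p in (0.1, 0.5, 0.9):
    print(p, round(weight(p, delta=0.8), 3), round(weight(p, delta=1.2), 3))
```

    A uniformly higher w(p) makes risky lotteries with large payoffs more attractive, consistent with the riskier choices observed in the "happy" condition.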

  6. Music-evoked incidental happiness modulates probability weighting during risky lottery choices.

    PubMed

    Schulreich, Stefan; Heussen, Yana G; Gerhardt, Holger; Mohr, Peter N C; Binkofski, Ferdinand C; Koelsch, Stefan; Heekeren, Hauke R

    2014-01-07

    We often make decisions with uncertain consequences. The outcomes of the choices we make are usually not perfectly predictable but probabilistic, and the probabilities can be known or unknown. Probability judgments, i.e., the assessment of unknown probabilities, can be influenced by evoked emotional states. This suggests that also the weighting of known probabilities in decision making under risk might be influenced by incidental emotions, i.e., emotions unrelated to the judgments and decisions at issue. Probability weighting describes the transformation of probabilities into subjective decision weights for outcomes and is one of the central components of cumulative prospect theory (CPT) that determine risk attitudes. We hypothesized that music-evoked emotions would modulate risk attitudes in the gain domain and in particular probability weighting. Our experiment featured a within-subject design consisting of four conditions in separate sessions. In each condition, the 41 participants listened to a different kind of music-happy, sad, or no music, or sequences of random tones-and performed a repeated pairwise lottery choice task. We found that participants chose the riskier lotteries significantly more often in the "happy" than in the "sad" and "random tones" conditions. Via structural regressions based on CPT, we found that the observed changes in participants' choices can be attributed to changes in the elevation parameter of the probability weighting function: in the "happy" condition, participants showed significantly higher decision weights associated with the larger payoffs than in the "sad" and "random tones" conditions. Moreover, elevation correlated positively with self-reported music-evoked happiness. Thus, our experimental results provide evidence in favor of a causal effect of incidental happiness on risk attitudes that can be explained by changes in probability weighting.

  7. Theoretical size distribution of fossil taxa: analysis of a null model

    PubMed Central

    Reed, William J; Hughes, Barry D

    2007-01-01

    Background This article deals with the theoretical size distribution (of number of sub-taxa) of a fossil taxon arising from a simple null model of macroevolution. Model New species arise through speciations occurring independently and at random at a fixed probability rate, while extinctions either occur independently and at random (background extinctions) or cataclysmically. In addition new genera are assumed to arise through speciations of a very radical nature, again assumed to occur independently and at random at a fixed probability rate. Conclusion The size distributions of the pioneering genus (following a cataclysm) and of derived genera are determined. Also the distribution of the number of genera is considered along with a comparison of the probability of a monospecific genus with that of a monogeneric family. PMID:17376249
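
    The pure-birth (Yule) component of this null model is easy to simulate. The sketch below ignores background and cataclysmic extinctions and uses an arbitrary speciation rate, so it illustrates only the within-genus branching mechanism:

```python
import random

def genus_size(t, spec_rate=1.0, rng=random):
    """Simulate the number of species in a genus after time t under the
    pure-birth (Yule) part of the null model: each species independently
    speciates at rate spec_rate; extinctions are ignored in this sketch."""
    n, now = 1, 0.0
    while True:
        wait = rng.expovariate(n * spec_rate)  # next speciation in the genus
        if now + wait > t:
            return n
        now += wait
        n += 1

random.seed(42)
sizes = [genus_size(1.0) for _ in range(20000)]
mean = sum(sizes) / len(sizes)
# Theory: N(t) is geometric with mean e**(spec_rate * t) ~ 2.72 here.
print(round(mean, 2))
```

    The geometric size distribution this produces is the baseline against which the paper's derived-genus and pioneering-genus distributions can be compared.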

  8. Blackmail propagation on small-world networks

    NASA Astrophysics Data System (ADS)

    Shao, Zhi-Gang; Jian-Ping Sang; Zou, Xian-Wu; Tan, Zhi-Jie; Jin, Zhun-Zhi

    2005-06-01

    The dynamics of the blackmail propagation model based on small-world networks is investigated. It is found that for a given transmitting probability λ the dynamical behavior of blackmail propagation transits from a linear growth type to a logistic growth type as the network randomness p increases. The transition takes place at the critical network randomness p_c = 1/N, where N is the total number of nodes in the network. For a given network randomness p the dynamical behavior of blackmail propagation transits from an exponential decrease type to a logistic growth type as the transmitting probability λ increases. The transition occurs at the critical transmitting probability λ_c = 1/⟨k⟩, where ⟨k⟩ is the average number of nearest neighbors. The present work will be useful for understanding computer virus epidemics and other spreading phenomena on communication and social networks.

  9. Valid statistical inference methods for a case-control study with missing data.

    PubMed

    Tian, Guo-Liang; Zhang, Chi; Jiang, Xuejun

    2018-04-01

    The main objective of this paper is to derive the valid sampling distribution of the observed counts in a case-control study with missing data under the assumption of missing at random by employing the conditional sampling method and the mechanism augmentation method. The proposed sampling distribution, called the case-control sampling distribution, can be used to calculate the standard errors of the maximum likelihood estimates of parameters via the Fisher information matrix and to generate independent samples for constructing small-sample bootstrap confidence intervals. Theoretical comparisons of the new case-control sampling distribution with two existing sampling distributions exhibit a large difference. Simulations are conducted to investigate the influence of the three different sampling distributions on statistical inferences. One finding is that the conclusion by the Wald test for testing independence under the two existing sampling distributions could be completely different (even contradictory) from the Wald test for testing the equality of the success probabilities in control/case groups under the proposed distribution. A real cervical cancer data set is used to illustrate the proposed statistical methods.

  10. Significance of stress transfer in time-dependent earthquake probability calculations

    USGS Publications Warehouse

    Parsons, T.

    2005-01-01

    A sudden change in stress is seen to modify earthquake rates, but should it also revise earthquake probability? Data used to derive input parameters permit an array of forecasts; so how large a static stress change is required to cause a statistically significant earthquake probability change? To answer that question, effects of parameter and philosophical choices are examined through all phases of sample calculations. Drawing at random from distributions of recurrence-aperiodicity pairs identifies many that recreate long paleoseismic and historic earthquake catalogs. Probability density functions built from the recurrence-aperiodicity pairs give the range of possible earthquake forecasts under a point-process renewal model. Consequences of choices made in stress transfer calculations, such as different slip models, fault rake, dip, and friction, are tracked. For interactions among large faults, calculated peak stress changes may be localized, with most of the receiving fault area changed less than the mean. Thus, to avoid overstating probability change on segments, stress change values should be drawn from a distribution reflecting the spatial pattern rather than using the segment mean. Disparity resulting from interaction probability methodology is also examined. For a fault with a well-understood earthquake history, a minimum stress change to stressing rate ratio of 10:1 to 20:1 is required to significantly skew probabilities with >80-85% confidence. That ratio must be closer to 50:1 to exceed 90-95% confidence levels. Thus revision to earthquake probability is achievable when a perturbing event is very close to the fault in question or the tectonic stressing rate is low.
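
    The time-dependent (renewal) probability calculation that such forecasts rest on can be sketched as follows. A lognormal recurrence model stands in for the renewal models built from recurrence-aperiodicity pairs, and the parameter values are illustrative:

```python
import math

def lognormal_cdf(t, mean_rec, aperiodicity):
    """CDF of a lognormal recurrence model with a given mean recurrence time
    and aperiodicity (coefficient of variation). A common stand-in for the
    point-process renewal models used in time-dependent forecasts."""
    if t <= 0:
        return 0.0
    sigma2 = math.log(1 + aperiodicity ** 2)
    mu = math.log(mean_rec) - sigma2 / 2
    return 0.5 * (1 + math.erf((math.log(t) - mu) / math.sqrt(2 * sigma2)))

def conditional_prob(elapsed, window, mean_rec, aperiodicity):
    """P(event in [elapsed, elapsed + window] | no event through elapsed)."""
    F = lambda t: lognormal_cdf(t, mean_rec, aperiodicity)
    return (F(elapsed + window) - F(elapsed)) / (1 - F(elapsed))

# 30-year probability on a fault with 200-yr mean recurrence, aperiodicity 0.5:
print(round(conditional_prob(150.0, 30.0, 200.0, 0.5), 3))
```

    A stress step is then folded in by shifting the effective elapsed time (a "clock advance" of stress change divided by stressing rate), which is why the stress-change-to-stressing-rate ratio controls how much the probability moves.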

  11. Nonlinear Spatial Inversion Without Monte Carlo Sampling

    NASA Astrophysics Data System (ADS)

    Curtis, A.; Nawaz, A.

    2017-12-01

    High-dimensional, nonlinear inverse or inference problems usually have non-unique solutions. The distribution of solutions is described by a probability distribution, and this is usually found using Monte Carlo (MC) sampling methods. These take pseudo-random samples of models in parameter space, calculate the probability of each sample given available data and other information, and thus map out high or low probability values of model parameters. However, such methods would converge to the solution only as the number of samples tends to infinity; in practice, MC is found to be slow to converge, convergence is not guaranteed to be achieved in finite time, and detection of convergence requires the use of subjective criteria. We propose a method for Bayesian inversion of categorical variables such as geological facies or rock types in spatial problems, which requires no sampling at all. The method uses a 2-D Hidden Markov Model over a grid of cells, where observations represent localized data constraining the model in each cell. The data in our example application are seismic properties such as P- and S-wave impedances or rock density; our model parameters are the hidden states and represent the geological rock types in each cell. The observations at each location are assumed to depend on the facies at that location only - an assumption referred to as 'localized likelihoods'. However, the facies at a location cannot be determined solely by the observation at that location as it also depends on prior information concerning its correlation with the spatial distribution of facies elsewhere. Such prior information is included in the inversion in the form of a training image which represents a conceptual depiction of the distribution of local geologies that might be expected, but other forms of prior information can be used in the method as desired.
The method provides direct (pseudo-analytic) estimates of posterior marginal probability distributions over each variable, so these do not need to be estimated from samples as is required in MC methods. On a 2-D test example the method is shown to outperform previous methods significantly, and at a fraction of the computational cost. In many foreseeable applications there are therefore no serious impediments to extending the method to 3-D spatial models.
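
    The appeal of this approach is that posterior marginals come from exact recursions rather than sampling. A minimal 1-D analogue (a hidden Markov chain over a line of cells with localized likelihoods, rather than the paper's 2-D model) can be sketched as:

```python
def posterior_marginals(prior0, trans, emit_lik):
    """Forward-backward recursion for a 1-D hidden Markov chain: returns the
    posterior marginal P(state_i | all observations) for every cell, computed
    exactly with no Monte Carlo sampling.

    prior0:   initial state distribution, length K
    trans:    K x K transition matrix between neighbouring cells
    emit_lik: per-cell length-K likelihood vectors P(obs_i | state)
    """
    n, K = len(emit_lik), len(prior0)
    # Forward pass, normalized per step for numerical stability.
    f = [prior0[k] * emit_lik[0][k] for k in range(K)]
    fwd = []
    for i in range(n):
        s = sum(f)
        f = [x / s for x in f]
        fwd.append(f)
        if i + 1 < n:
            f = [sum(f[j] * trans[j][k] for j in range(K)) * emit_lik[i + 1][k]
                 for k in range(K)]
    # Backward pass; marginal at cell i is proportional to fwd[i] * b.
    b = [1.0] * K
    marg = [None] * n
    for i in range(n - 1, -1, -1):
        p = [fwd[i][k] * b[k] for k in range(K)]
        s = sum(p)
        marg[i] = [x / s for x in p]
        if i > 0:
            b = [sum(trans[k][j] * emit_lik[i][j] * b[j] for j in range(K))
                 for k in range(K)]
            s = sum(b)
            b = [x / s for x in b]
    return marg

# Two facies, sticky transitions, three cells with ambiguous middle data:
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [[0.8, 0.2], [0.5, 0.5], [0.2, 0.8]]
m = posterior_marginals([0.5, 0.5], trans, emit)
print([[round(x, 3) for x in row] for row in m])
```

    The ambiguous middle cell is resolved only through its correlation with the neighbouring cells, the 1-D version of the spatial prior the training image supplies in 2-D.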

  12. Optimal auxiliary-covariate-based two-phase sampling design for semiparametric efficient estimation of a mean or mean difference, with application to clinical trials.

    PubMed

    Gilbert, Peter B; Yu, Xuesong; Rotnitzky, Andrea

    2014-03-15

    To address the objective in a clinical trial to estimate the mean or mean difference of an expensive endpoint Y, one approach employs a two-phase sampling design, wherein inexpensive auxiliary variables W predictive of Y are measured in everyone, Y is measured in a random sample, and the semiparametric efficient estimator is applied. This approach is made efficient by specifying the phase two selection probabilities as optimal functions of the auxiliary variables and measurement costs. While this approach is familiar to survey samplers, it apparently has seldom been used in clinical trials, and several novel results practicable for clinical trials are developed. We perform simulations to identify settings where the optimal approach significantly improves efficiency compared to approaches in current practice. We provide proofs and R code. The optimality results are developed to design an HIV vaccine trial, with objective to compare the mean 'importance-weighted' breadth (Y) of the T-cell response between randomized vaccine groups. The trial collects an auxiliary response (W) highly predictive of Y and measures Y in the optimal subset. We show that the optimal design-estimation approach can confer anywhere between absent and large efficiency gain (up to 24% in the examples) compared to the approach with the same efficient estimator but simple random sampling, where greater variability in the cost-standardized conditional variance of Y given W yields greater efficiency gains. Accurate estimation of E[Y | W] is important for realizing the efficiency gain, which is aided by an ample phase two sample and by using a robust fitting method. Copyright © 2013 John Wiley & Sons, Ltd.
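
    The design principle (sample phase-two strata in proportion to how variable and how cheap they are) can be sketched with a textbook Neyman-type allocation. The budget-scaling step and the example strata are illustrative assumptions; the paper's optimality criterion is more general:

```python
import math

def optimal_sampling_fractions(strata, budget):
    """Neyman-type optimal phase-two selection probabilities: sample stratum w
    in proportion to sqrt(var(Y | W = w) / cost(w)), scaled so the expected
    cost meets a budget, and capped at 1.

    strata: list of dicts with keys n (size), var (cond. variance), cost
    """
    raw = [math.sqrt(s["var"] / s["cost"]) for s in strata]
    # Bisection on the scale factor c so that the expected spend,
    # sum(n * min(1, c * raw) * cost), equals the budget.
    lo, hi = 0.0, 1e9
    for _ in range(200):
        c = (lo + hi) / 2
        spend = sum(s["n"] * min(1.0, c * r) * s["cost"]
                    for s, r in zip(strata, raw))
        lo, hi = (c, hi) if spend < budget else (lo, c)
    return [min(1.0, c * r) for r in raw]

# Hypothetical strata: high conditional variance of Y given W earns a higher
# sampling fraction at equal cost (here var 4 vs 1 gives a 2:1 ratio).
strata = [{"n": 500, "var": 4.0, "cost": 1.0},
          {"n": 500, "var": 1.0, "cost": 1.0}]
pi = optimal_sampling_fractions(strata, budget=300.0)
print([round(p, 3) for p in pi])  # [0.4, 0.2]
```

    With unequal costs the square root also penalizes expensive strata, which is the "cost-standardized conditional variance" trade-off the abstract describes.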

  13. Optimal Auxiliary-Covariate Based Two-Phase Sampling Design for Semiparametric Efficient Estimation of a Mean or Mean Difference, with Application to Clinical Trials

    PubMed Central

    Gilbert, Peter B.; Yu, Xuesong; Rotnitzky, Andrea

    2014-01-01

    To address the objective in a clinical trial to estimate the mean or mean difference of an expensive endpoint Y, one approach employs a two-phase sampling design, wherein inexpensive auxiliary variables W predictive of Y are measured in everyone, Y is measured in a random sample, and the semi-parametric efficient estimator is applied. This approach is made efficient by specifying the phase-two selection probabilities as optimal functions of the auxiliary variables and measurement costs. While this approach is familiar to survey samplers, it apparently has seldom been used in clinical trials, and several novel results practicable for clinical trials are developed. Simulations are performed to identify settings where the optimal approach significantly improves efficiency compared to approaches in current practice. Proofs and R code are provided. The optimality results are developed to design an HIV vaccine trial, with objective to compare the mean “importance-weighted” breadth (Y) of the T cell response between randomized vaccine groups. The trial collects an auxiliary response (W) highly predictive of Y, and measures Y in the optimal subset. We show that the optimal design-estimation approach can confer anywhere between absent and large efficiency gain (up to 24% in the examples) compared to the approach with the same efficient estimator but simple random sampling, where greater variability in the cost-standardized conditional variance of Y given W yields greater efficiency gains. Accurate estimation of E[Y∣W] is important for realizing the efficiency gain, which is aided by an ample phase-two sample and by using a robust fitting method. PMID:24123289

  14. Geospatial techniques for developing a sampling frame of watersheds across a region

    USGS Publications Warehouse

    Gresswell, Robert E.; Bateman, Douglas S.; Lienkaemper, George; Guy, T.J.

    2004-01-01

    Current land-management decisions that affect the persistence of native salmonids are often influenced by studies of individual sites that are selected based on judgment and convenience. Although this approach is useful for some purposes, extrapolating results to areas that were not sampled is statistically inappropriate because the sampling design is usually biased. Therefore, in recent investigations of coastal cutthroat trout (Oncorhynchus clarki clarki) located above natural barriers to anadromous salmonids, we used a methodology for extending the statistical scope of inference. The purpose of this paper is to apply geospatial tools to identify a population of watersheds and develop a probability-based sampling design for coastal cutthroat trout in western Oregon, USA. The population of mid-size watersheds (500-5800 ha) west of the Cascade Range divide was derived from watershed delineations based on digital elevation models. Because a database with locations of isolated populations of coastal cutthroat trout did not exist, a sampling frame of isolated watersheds containing cutthroat trout had to be developed. After the sampling frame of watersheds was established, isolated watersheds with coastal cutthroat trout were stratified by ecoregion and erosion potential based on dominant bedrock lithology (i.e., sedimentary and igneous). A stratified random sample of 60 watersheds was selected with proportional allocation in each stratum. By comparing watershed drainage areas of streams in the general population to those in the sampling frame and the resulting sample (n = 60), we were able to evaluate how representative the subset of watersheds was relative to the population of watersheds. Geospatial tools provided a relatively inexpensive means to generate the information necessary to develop a statistically robust, probability-based sampling design.
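    Proportional allocation across strata, as used for the 60 watersheds, can be sketched as follows (the stratum sizes are invented for illustration, and largest-remainder rounding is one common convention, assumed here):

```python
def proportional_allocation(stratum_sizes, total_n):
    """Allocate total_n sample units across strata in proportion to stratum
    size, using largest-remainder rounding so the parts sum to total_n."""
    total = sum(stratum_sizes)
    quotas = [total_n * size / total for size in stratum_sizes]
    alloc = [int(q) for q in quotas]
    by_remainder = sorted(range(len(quotas)),
                          key=lambda i: quotas[i] - alloc[i], reverse=True)
    for i in by_remainder[: total_n - sum(alloc)]:
        alloc[i] += 1
    return alloc

# Invented stratum sizes (e.g. watersheds per ecoregion/lithology stratum).
alloc = proportional_allocation([120, 60, 20], total_n=60)
```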

  15. Generation of pseudo-random numbers

    NASA Technical Reports Server (NTRS)

    Howell, L. W.; Rheinfurth, M. H.

    1982-01-01

    Practical methods for generating acceptable random numbers from a variety of probability distributions which are frequently encountered in engineering applications are described. The speed, accuracy, and guarantee of statistical randomness of the various methods are discussed.
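    A staple of such method collections is inverse-transform sampling, which turns uniform random numbers into draws from a target distribution via the inverse CDF. A sketch for the exponential distribution (a modern Python illustration, not code from the 1982 report):

```python
import math
import random

def sample_exponential(rate, rng):
    """Inverse-transform sampling: if U ~ Uniform(0, 1), then
    -ln(1 - U) / rate has the Exponential(rate) distribution."""
    return -math.log1p(-rng.random()) / rate

rng = random.Random(42)
draws = [sample_exponential(2.0, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)  # expect roughly 1/rate = 0.5
```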

  16. Statistical uncertainty analysis applied to the DRAGONv4 code lattice calculations and based on JENDL-4 covariance data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hernandez-Solis, A.; Demaziere, C.; Ekberg, C.

    2012-07-01

    In this paper, multi-group microscopic cross-section uncertainty is propagated through the DRAGON (Version 4) lattice code, in order to perform uncertainty analysis on k∞ and 2-group homogenized macroscopic cross-section predictions. A statistical methodology is employed for such purposes, where cross-sections of certain isotopes of various elements belonging to the 172-group DRAGLIB library format are considered as normal random variables. This library is based on JENDL-4 data, because JENDL-4 contains the largest amount of isotopic covariance matrices among the different major nuclear data libraries. The aim is to propagate multi-group nuclide uncertainty by running the DRAGONv4 code 500 times, and to assess the output uncertainty of a test case corresponding to a 17 x 17 PWR fuel assembly segment without poison. The chosen sampling strategy for the current study is Latin Hypercube Sampling (LHS). The quasi-random LHS allows a much better coverage of the input uncertainties than simple random sampling (SRS) because it densely stratifies across the range of each input probability distribution. Output uncertainty assessment is based on the tolerance limits concept, where the sample formed by the code calculations is inferred to cover 95% of the output population with at least 95% confidence. This analysis is the first attempt to propagate parameter uncertainties of modern multi-group libraries, which are used to feed advanced lattice codes that perform state-of-the-art resonant self-shielding calculations such as DRAGONv4. (authors)
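    The stratification property that lets LHS cover input distributions more densely than SRS is easy to demonstrate in a small sketch (illustrative only, not the study's actual sampler): each of n equal-probability bins in every dimension receives exactly one point.

```python
import random

def latin_hypercube(n, dims, rng):
    """Latin hypercube sample of n points in [0, 1)^dims: in every dimension,
    each of the n equal-probability bins contains exactly one point."""
    perms = [rng.sample(range(n), n) for _ in range(dims)]
    return [[(perms[d][i] + rng.random()) / n for d in range(dims)]
            for i in range(n)]

rng = random.Random(0)
pts = latin_hypercube(10, 2, rng)
```

    An SRS of 10 points would typically leave several of the 10 bins per dimension empty; the LHS never does, which is why it stratifies the input probability distributions.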

  17. What's Missing in Teaching Probability and Statistics: Building Cognitive Schema for Understanding Random Phenomena

    ERIC Educational Resources Information Center

    Kuzmak, Sylvia

    2016-01-01

    Teaching probability and statistics is more than teaching the mathematics itself. Historically, the mathematics of probability and statistics was first developed through analyzing games of chance such as the rolling of dice. This article makes the case that the understanding of probability and statistics is dependent upon building a…

  18. A unified approach for squeal instability analysis of disc brakes with two types of random-fuzzy uncertainties

    NASA Astrophysics Data System (ADS)

    Lü, Hui; Shangguan, Wen-Bin; Yu, Dejie

    2017-09-01

    Automotive brake systems are always subjected to various types of uncertainties and two types of random-fuzzy uncertainties may exist in the brakes. In this paper, a unified approach is proposed for squeal instability analysis of disc brakes with two types of random-fuzzy uncertainties. In the proposed approach, two uncertainty analysis models with mixed variables are introduced to model the random-fuzzy uncertainties. The first one is the random and fuzzy model, in which random variables and fuzzy variables exist simultaneously and independently. The second one is the fuzzy random model, in which uncertain parameters are all treated as random variables while their distribution parameters are expressed as fuzzy numbers. Firstly, the fuzziness is discretized by using α-cut technique and the two uncertainty analysis models are simplified into random-interval models. Afterwards, by temporarily neglecting interval uncertainties, the random-interval models are degraded into random models, in which the expectations, variances, reliability indexes and reliability probabilities of system stability functions are calculated. And then, by reconsidering the interval uncertainties, the bounds of the expectations, variances, reliability indexes and reliability probabilities are computed based on Taylor series expansion. Finally, by recomposing the analysis results at each α-cut level, the fuzzy reliability indexes and probabilities can be obtained, by which the brake squeal instability can be evaluated. The proposed approach gives a general framework to deal with both types of random-fuzzy uncertainties that may exist in the brakes and its effectiveness is demonstrated by numerical examples. It will be a valuable supplement to the systematic study of brake squeal considering uncertainty.

  19. Asymptotic properties of a bold random walk

    NASA Astrophysics Data System (ADS)

    Serva, Maurizio

    2014-08-01

    In a recent paper we proposed a non-Markovian random walk model with memory of the maximum distance ever reached from the starting point (home). The behavior of the walker is different from the simple symmetric random walk only when she is at this maximum distance, where, having the choice to move either farther or closer, she decides with different probabilities. If the probability of a forward step is higher than the probability of a backward step, the walker is bold and her behavior turns out to be superdiffusive; otherwise she is timorous and her behavior turns out to be subdiffusive. The scaling behavior varies continuously from subdiffusive (timorous) to superdiffusive (bold) according to a single parameter γ ∈R. We investigate here the asymptotic properties of the bold case in the nonballistic region γ ∈[0,1/2], a problem which was left partially unsolved previously. The exact results proved in this paper require new probabilistic tools which rely on the construction of appropriate martingales of the random walk and its hitting times.
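    The walk is easy to simulate. In the sketch below, a fixed frontier probability p_forward stands in for the paper's γ-dependent rule (an illustrative simplification): the walker steps symmetrically except at her maximum distance from home, where she moves outward with probability p_forward.

```python
import random

def bold_walk(steps, p_forward, rng):
    """Walk on the integers, symmetric except at the running maximum distance
    from home (0), where the walker steps outward with probability p_forward.
    p_forward > 1/2 gives the 'bold' walker, p_forward < 1/2 the timorous one."""
    x, record = 0, 0
    for _ in range(steps):
        if abs(x) == record:  # at the frontier: biased choice
            outward = rng.random() < p_forward
            x += (1 if x >= 0 else -1) * (1 if outward else -1)
        else:  # strictly inside: simple symmetric step
            x += 1 if rng.random() < 0.5 else -1
        record = max(record, abs(x))
    return record

rng = random.Random(1)
bold = bold_walk(20_000, 0.8, rng)       # superdiffusive regime
timorous = bold_walk(20_000, 0.2, rng)   # subdiffusive regime
```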

  20. Pólya number and first return of bursty random walk: Rigorous solutions

    NASA Astrophysics Data System (ADS)

    Wan, J.; Xu, X. P.

    2012-03-01

    The recurrence properties of random walks can be characterized by the Pólya number, i.e., the probability that the walker has returned to the origin at least once. In this paper, we investigate the Pólya number and first return for a bursty random walk on a line, in which the walk has different step sizes and moving probabilities. Using the concept of the Catalan number, we obtain exact results for the first return probability, the average first return time and the Pólya number for the first time. We show that the Pólya number displays two different functional behaviors when the walk deviates from the recurrent point. By utilizing the Lagrange inversion formula, we interpret our findings by transferring the Pólya number to the closed-form solutions of an inverse function. We also calculate the Pólya number using another approach, which corroborates our results and conclusions. Finally, we consider the recurrence properties and Pólya number of two variations of the bursty random walk model.
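    For the simple symmetric walk, the non-bursty special case, the Catalan-number machinery gives the first-return probabilities in closed form, f(2n) = C_{n-1} / 2^(2n-1), and their sum recovers the Pólya number 1 (recurrence). A quick numerical check of this special case (not the bursty model itself):

```python
from math import comb

def catalan(n):
    """n-th Catalan number: C_n = binom(2n, n) / (n + 1)."""
    return comb(2 * n, n) // (n + 1)

def first_return_prob(two_n):
    """P(first return to the origin of a simple symmetric walk is at step 2n):
    f(2n) = C_{n-1} / 2**(2n - 1)."""
    n = two_n // 2
    return catalan(n - 1) / 2 ** (two_n - 1)

# Partial sums approach the 1-D Polya number 1 (the walk is recurrent),
# but slowly, since f(2n) decays like n**(-3/2).
total = sum(first_return_prob(2 * n) for n in range(1, 2000))
```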

  1. Pacific island landbird monitoring annual report, National Park of American Samoa, Ta‘u and Tutuila units, 2011

    USGS Publications Warehouse

    Judge, Seth W.; Camp, Richard J.; Vaivai, Visa; Hart, Patrick J.

    2013-01-01

    NPSA canopy and understory composition was predominantly native, and trees formed a dense closed canopy at nearly 90% of the stations sampled. More than half of the trees in both units were taller than 5 m and the majority of slopes were steeper than 20 degrees. There were no clear dominant tree species in the mixed native forests. The most common tree species documented included Syzygium spp., Dysoxylum spp., Ficus spp., Hibiscus tiliaceus and Rhus taitensis (among others). There were significant differences in the distribution of bird densities between legacy and random transects. Differences in detection probabilities cannot be definitively assessed from a single survey. We recommend that both panels be sampled in future surveys until bias in density and abundance can be evaluated and sampling can potentially be reduced.

  2. Application of Monte Carlo cross-validation to identify pathway cross-talk in neonatal sepsis.

    PubMed

    Zhang, Yuxia; Liu, Cui; Wang, Jingna; Li, Xingxia

    2018-03-01

    To explore genetic pathway cross-talk in neonates with sepsis, an integrated approach was used in this paper. Genetic profiling and biological signaling pathway data were first integrated to explore the potential relationships between pathways and genes differentially expressed between normal uninfected neonates and neonates with sepsis. For each pathway, a score was obtained from the genetic expression data by quantitatively analyzing the pathway cross-talk. The paired pathways with high cross-talk were identified by random forest classification. The purpose of the work was to find the pairs of pathways best able to discriminate sepsis samples from normal samples. Ten pairs of pathways were found that were probably able to discriminate neonates with sepsis from normal uninfected neonates. Among them, the best two paired pathways were identified through analysis of the extensive literature. Impact statement: To find the pairs of pathways best able to discriminate sepsis samples from normal samples, an RF classifier, the DS obtained from DEGs of significantly associated paired pathways, and Monte Carlo cross-validation were applied in this paper. Ten pairs of pathways were probably able to discriminate neonates with sepsis from normal uninfected neonates. Among them, the best two paired pathways ((7) IL-6 Signaling and Phospholipase C Signaling (PLC); (8) Glucocorticoid Receptor (GR) Signaling and Dendritic Cell Maturation) were identified through analysis of the extensive literature.
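    Monte Carlo cross-validation itself is simple to sketch: repeatedly draw a random train/test split and average the test-set score. The toy mean-threshold classifier and data below are illustrative stand-ins for the study's random forest and pathway scores:

```python
import random
import statistics

def monte_carlo_cv(xs, ys, n_splits, test_frac, rng):
    """Monte Carlo cross-validation: average the test accuracy of a model
    over repeated random train/test splits.  A mean-threshold rule stands in
    for the study's random-forest classifier (illustrative only)."""
    n = len(xs)
    n_test = max(1, int(n * test_frac))
    accuracies = []
    for _ in range(n_splits):
        idx = list(range(n))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        thresh = statistics.mean(xs[i] for i in train)
        hits = sum((xs[i] > thresh) == (ys[i] == 1) for i in test)
        accuracies.append(hits / n_test)
    return statistics.mean(accuracies)

rng = random.Random(0)
xs = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4]  # toy "pathway scores"
ys = [0, 0, 0, 0, 1, 1, 1, 1]                   # toy class labels
acc = monte_carlo_cv(xs, ys, n_splits=50, test_frac=0.25, rng=rng)
```

    Unlike k-fold cross-validation, the number of splits here is decoupled from the test-set size, which is why the Monte Carlo variant suits small samples.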

  3. Statistical power in parallel group point exposure studies with time-to-event outcomes: an empirical comparison of the performance of randomized controlled trials and the inverse probability of treatment weighting (IPTW) approach.

    PubMed

    Austin, Peter C; Schuster, Tibor; Platt, Robert W

    2015-10-15

    Estimating statistical power is an important component of the design of both randomized controlled trials (RCTs) and observational studies. Methods for estimating statistical power in RCTs have been well described and can be implemented simply. In observational studies, statistical methods must be used to remove the effects of confounding that can occur due to non-random treatment assignment. Inverse probability of treatment weighting (IPTW) using the propensity score is an attractive method for estimating the effects of treatment using observational data. However, sample size and power calculations have not been adequately described for these methods. We used an extensive series of Monte Carlo simulations to compare the statistical power of an IPTW analysis of an observational study with time-to-event outcomes with that of an analysis of a similarly-structured RCT. We examined the impact of four factors on the statistical power function: number of observed events, prevalence of treatment, the marginal hazard ratio, and the strength of the treatment-selection process. We found that, on average, an IPTW analysis had lower statistical power compared to an analysis of a similarly-structured RCT. The difference in statistical power increased as the magnitude of the treatment-selection model increased. The statistical power of an IPTW analysis tended to be lower than the statistical power of a similarly-structured RCT.
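    The IPTW estimator at the heart of the comparison weights each subject by the inverse probability of the treatment actually received. A minimal sketch with invented toy numbers:

```python
def iptw_weights(treated, propensity):
    """Inverse-probability-of-treatment weights: 1/e(x) for treated subjects,
    1/(1 - e(x)) for controls, where e(x) is the propensity score."""
    return [1.0 / p if t else 1.0 / (1.0 - p)
            for t, p in zip(treated, propensity)]

def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

# Invented toy cohort: treatment indicator, propensity score, outcome.
treated = [1, 1, 0, 0]
propensity = [0.8, 0.5, 0.5, 0.2]
outcome = [3.0, 2.0, 1.0, 0.0]
w = iptw_weights(treated, propensity)
# Weighted difference in mean outcomes estimates the marginal effect.
ate = weighted_mean(outcome[:2], w[:2]) - weighted_mean(outcome[2:], w[2:])
```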

  4. Does Breast Cancer Drive the Building of Survival Probability Models among States? An Assessment of Goodness of Fit for Patient Data from SEER Registries

    PubMed

    Khan, Hafiz; Saxena, Anshul; Perisetti, Abhilash; Rafiq, Aamrin; Gabbidon, Kemesha; Mende, Sarah; Lyuksyutova, Maria; Quesada, Kandi; Blakely, Summre; Torres, Tiffany; Afesse, Mahlet

    2016-12-01

    Background: Breast cancer is a worldwide public health concern and is the most prevalent type of cancer in women in the United States. This study concerned the best fit of statistical probability models on the basis of survival times for nine state cancer registries: California, Connecticut, Georgia, Hawaii, Iowa, Michigan, New Mexico, Utah, and Washington. Materials and Methods: A probability random sampling method was applied to select and extract records of 2,000 breast cancer patients from the Surveillance Epidemiology and End Results (SEER) database for each of the nine state cancer registries used in this study. EasyFit software was utilized to identify the best probability models by using goodness of fit tests, and to estimate parameters for various statistical probability distributions that fit survival data. Results: Statistical analysis for the summary of statistics is reported for each of the states for the years 1973 to 2012. Kolmogorov-Smirnov, Anderson-Darling, and Chi-squared goodness of fit test values were used for survival data, the highest values of goodness of fit statistics being considered indicative of the best fit survival model for each state. Conclusions: It was found that California, Connecticut, Georgia, Iowa, New Mexico, and Washington followed the Burr probability distribution, while the Dagum probability distribution gave the best fit for Michigan and Utah, and Hawaii followed the Gamma probability distribution. These findings highlight differences between states through selected sociodemographic variables and also demonstrate probability modeling differences in breast cancer survival times. The results of this study can be used to guide healthcare providers and researchers for further investigations into social and environmental factors in order to reduce the occurrence of and mortality due to breast cancer. Creative Commons Attribution License

  5. Small violations of Bell inequalities for multipartite pure random states

    NASA Astrophysics Data System (ADS)

    Drumond, Raphael C.; Duarte, Cristhiano; Oliveira, Roberto I.

    2018-05-01

    For any finite number of parts, measurements, and outcomes in a Bell scenario, we estimate the probability of random N-qudit pure states to substantially violate any Bell inequality with uniformly bounded coefficients. We prove that under some conditions on the local dimension, the probability to find any significant amount of violation goes to zero exponentially fast as the number of parts goes to infinity. In addition, we also prove that if the number of parts is at least 3, this probability also goes to zero as the local Hilbert space dimension goes to infinity.

  6. Predicting the probability of mortality of gastric cancer patients using decision tree.

    PubMed

    Mohammadzadeh, F; Noorkojuri, H; Pourhoseingholi, M A; Saadat, S; Baghestani, A R

    2015-06-01

    Gastric cancer is the fourth most common cancer worldwide. This motivated us to investigate and introduce gastric cancer risk factors utilizing statistical methods. The aim of this study was to identify the most important factors influencing the mortality of patients who suffer from gastric cancer and to introduce a classification approach based on a decision tree model for predicting the probability of mortality from this disease. Data on 216 patients with gastric cancer, who were registered in Taleghani hospital in Tehran, Iran, were analyzed. At first, patients were divided into two groups: dead and alive. Then, to fit the decision tree model to our data, we randomly selected 20% of the dataset as the test sample and considered the remaining data as the training sample. Finally, the validity of the model was examined with sensitivity, specificity, diagnostic accuracy and the area under the receiver operating characteristic curve. CART version 6.0 and SPSS version 19.0 software were used for the analysis of the data. Diabetes, ethnicity, tobacco, tumor size, surgery, pathologic stage, age at diagnosis, exposure to chemical weapons and alcohol consumption were determined to be factors affecting mortality from gastric cancer. The sensitivity, specificity and accuracy of the decision tree were 0.72, 0.75 and 0.74, respectively. These indices indicate that the decision tree model has acceptable accuracy for predicting the probability of mortality in gastric cancer patients, so a simple decision tree consisting of factors affecting mortality from gastric cancer may help clinicians as a reliable and practical tool to predict the probability of mortality in these patients.

  7. Evaluating impacts using a BACI design, ratios, and a Bayesian approach with a focus on restoration.

    PubMed

    Conner, Mary M; Saunders, W Carl; Bouwes, Nicolaas; Jordan, Chris

    2015-10-01

    Before-after-control-impact (BACI) designs are an effective method to evaluate natural and human-induced perturbations on ecological variables when treatment sites cannot be randomly chosen. While effect sizes of interest can be tested with frequentist methods, using Bayesian Markov chain Monte Carlo (MCMC) sampling methods, probabilities of effect sizes, such as a ≥20 % increase in density after restoration, can be directly estimated. Although BACI and Bayesian methods are used widely for assessing natural and human-induced impacts for field experiments, the application of hierarchal Bayesian modeling with MCMC sampling to BACI designs is less common. Here, we combine these approaches and extend the typical presentation of results with an easy to interpret ratio, which provides an answer to the main study question-"How much impact did a management action or natural perturbation have?" As an example of this approach, we evaluate the impact of a restoration project, which implemented beaver dam analogs, on survival and density of juvenile steelhead. Results indicated the probabilities of a ≥30 % increase were high for survival and density after the dams were installed, 0.88 and 0.99, respectively, while probabilities for a higher increase of ≥50 % were variable, 0.17 and 0.82, respectively. This approach demonstrates a useful extension of Bayesian methods that can easily be generalized to other study designs from simple (e.g., single factor ANOVA, paired t test) to more complicated block designs (e.g., crossover, split-plot). This approach is valuable for estimating the probabilities of restoration impacts or other management actions.
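    The key computational step, turning MCMC samples directly into probabilities of effect sizes, amounts to counting the fraction of posterior draws beyond a threshold. A sketch with an invented lognormal-style posterior for the after/before density ratio (the draws here are simulated, not from the study's model):

```python
import math
import random

def prob_effect_at_least(ratio_draws, threshold):
    """Posterior probability of an effect at least as large as `threshold`
    (e.g. 1.3 for a >=30% increase): the fraction of MCMC draws of the
    after/before ratio that reach the threshold."""
    return sum(r >= threshold for r in ratio_draws) / len(ratio_draws)

# Invented posterior sample: density ratio drawn as exp(N(0.4, 0.2^2)).
rng = random.Random(7)
draws = [math.exp(rng.gauss(0.4, 0.2)) for _ in range(50_000)]
p30 = prob_effect_at_least(draws, 1.3)  # P(>=30% increase | data)
p50 = prob_effect_at_least(draws, 1.5)  # P(>=50% increase | data)
```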

  8. Importance Sampling of Word Patterns in DNA and Protein Sequences

    PubMed Central

    Chan, Hock Peng; Chen, Louis H.Y.

    2010-01-01

    Abstract Monte Carlo methods can provide accurate p-value estimates of word counting test statistics and are easy to implement. They are especially attractive when an asymptotic theory is absent or when either the search sequence or the word pattern is too short for the application of asymptotic formulae. Naive direct Monte Carlo is undesirable for the estimation of small probabilities because the associated rare events of interest are seldom generated. We propose instead efficient importance sampling algorithms that use controlled insertion of the desired word patterns on randomly generated sequences. The implementation is illustrated on word patterns of biological interest: palindromes and inverted repeats, patterns arising from position-specific weight matrices (PSWMs), and co-occurrences of pairs of motifs. PMID:21128856
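    The general idea of importance sampling for small probabilities, sampling from a distribution under which the rare event is common and reweighting by the likelihood ratio, can be sketched on a classic toy problem (a Gaussian tail probability, not the word-pattern setting of the paper):

```python
import math
import random

def importance_tail_prob(threshold, n, rng):
    """Importance-sampling estimate of P(Z > threshold), Z standard normal.
    Draws come from N(threshold, 1), where the 'rare' event is routine, and
    each hit is reweighted by the likelihood ratio
    phi(x) / phi(x - threshold) = exp(-threshold*x + threshold**2 / 2)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)
        if x > threshold:
            total += math.exp(-threshold * x + threshold ** 2 / 2)
    return total / n

rng = random.Random(3)
est = importance_tail_prob(4.0, 100_000, rng)
# Naive direct Monte Carlo with the same budget would see only a handful of
# such events; the true value is about 3.17e-05.
```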

  9. Using occupancy models of forest breeding birds to prioritize conservation planning

    USGS Publications Warehouse

    De Wan, A. A.; Sullivan, P.J.; Lembo, A.J.; Smith, C.R.; Maerz, J.C.; Lassoie, J.P.; Richmond, M.E.

    2009-01-01

    As urban development continues to encroach on the natural and rural landscape, land-use planners struggle to identify high priority conservation areas for protection. Although knowing where urban-sensitive species may be occurring on the landscape would facilitate conservation planning, research efforts are often not sufficiently designed to make quality predictions at unknown locations. Recent advances in occupancy modeling allow for more precise estimates of occupancy by accounting for differences in detectability. We applied these techniques to produce robust estimates of habitat occupancy for a subset of forest breeding birds, a group that has been shown to be sensitive to urbanization, in a rapidly urbanizing yet biologically diverse region of New York State. We found that detection probability ranged widely across species, from 0.05 to 0.8. Our models suggest that detection probability declined with increasing forest fragmentation. We also found that the probability of occupancy of forest breeding birds is negatively influenced by increasing perimeter-area ratio of forest fragments and urbanization in the surrounding habitat matrix. We capitalized on our random sampling design to produce spatially explicit models that predict high priority conservation areas across the entire region, where interior species were most likely to occur. Finally, we use our predictive maps to demonstrate how a strict sampling design coupled with occupancy modeling can be a valuable tool for prioritizing biodiversity conservation in land-use planning. © 2009 Elsevier Ltd.

  10. Statistical tests for whether a given set of independent, identically distributed draws comes from a specified probability density.

    PubMed

    Tygert, Mark

    2010-09-21

    We discuss several tests for determining whether a given set of independent and identically distributed (i.i.d.) draws does not come from a specified probability density function. The most commonly used are Kolmogorov-Smirnov tests, particularly Kuiper's variant, which focus on discrepancies between the cumulative distribution function for the specified probability density and the empirical cumulative distribution function for the given set of i.i.d. draws. Unfortunately, variations in the probability density function often get smoothed over in the cumulative distribution function, making it difficult to detect discrepancies in regions where the probability density is small in comparison with its values in surrounding regions. We discuss tests without this deficiency, complementing the classical methods. The tests of the present paper are based on the plain fact that it is unlikely to draw a random number whose probability is small, provided that the draw is taken from the same distribution used in calculating the probability (thus, if we draw a random number whose probability is small, then we can be confident that we did not draw the number from the same distribution used in calculating the probability).

  11. 47 CFR 1.1623 - Probability calculation.

    Code of Federal Regulations, 2010 CFR

    2010-10-01

    ... 47 Telecommunication 1 2010-10-01 2010-10-01 false Probability calculation. 1.1623 Section 1.1623 Telecommunication FEDERAL COMMUNICATIONS COMMISSION GENERAL PRACTICE AND PROCEDURE Random Selection Procedures for Mass Media Services General Procedures § 1.1623 Probability calculation. (a) All calculations shall be...

  12. The estimation of tree posterior probabilities using conditional clade probability distributions.

    PubMed

    Larget, Bret

    2013-07-01

    In this article I introduce the idea of conditional independence of separated subtrees as a principle by which to estimate the posterior probability of trees using conditional clade probability distributions rather than simple sample relative frequencies. I describe an algorithm for these calculations and software which implements these ideas. I show that these alternative calculations are very similar to simple sample relative frequencies for high probability trees but are substantially more accurate for relatively low probability trees. The method allows the posterior probability of unsampled trees to be calculated when these trees contain only clades that are in other sampled trees. Furthermore, the method can be used to estimate the total probability of the set of sampled trees which provides a measure of the thoroughness of a posterior sample.
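    The core computation can be sketched on a toy example: estimate each conditional clade probability from sample frequencies, then score a tree by the product of the conditional probabilities of its splits (a simplified illustration of the idea, not the article's software):

```python
from collections import Counter

def ccd_probability(tree, split_counts, clade_counts):
    """Score a tree as the product over its splits of the conditional clade
    probability P(split | parent clade), estimated from sample frequencies."""
    p = 1.0
    for parent, child in tree:
        p *= split_counts[(parent, child)] / clade_counts[parent]
    return p

# Toy posterior sample of rooted 3-taxon trees; each tree is determined by
# its single non-trivial split of the root clade ABC.
sample = [("ABC", "AB")] * 6 + [("ABC", "AC")] * 3 + [("ABC", "BC")] * 1
split_counts = Counter(sample)
clade_counts = Counter(parent for parent, _ in sample)
p_ab = ccd_probability([("ABC", "AB")], split_counts, clade_counts)
p_bc = ccd_probability([("ABC", "BC")], split_counts, clade_counts)
```

    With deeper trees the product runs over several conditioned splits, which is where the method departs from simple sample relative frequencies and can assign probability to unsampled trees.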

  13. Asteroid orbital inversion using uniform phase-space sampling

    NASA Astrophysics Data System (ADS)

    Muinonen, K.; Pentikäinen, H.; Granvik, M.; Oszkiewicz, D.; Virtanen, J.

    2014-07-01

    We review statistical inverse methods for asteroid orbit computation from a small number of astrometric observations and short time intervals of observations. With the help of Markov-chain Monte Carlo methods (MCMC), we present a novel inverse method that utilizes uniform sampling of the phase space for the orbital elements. The statistical orbital ranging method (Virtanen et al. 2001, Muinonen et al. 2001) was set out to resolve the long-lasting challenges in the initial computation of orbits for asteroids. The ranging method starts from the selection of a pair of astrometric observations. Thereafter, the topocentric ranges and angular deviations in R.A. and Decl. are randomly sampled. The two Cartesian positions allow for the computation of orbital elements and, subsequently, the computation of ephemerides for the observation dates. Candidate orbital elements are included in the sample of accepted elements if the χ^2-value between the observed and computed observations is within a pre-defined threshold. The sample orbital elements obtain weights based on a certain debiasing procedure. When the weights are available, the full sample of orbital elements allows the probabilistic assessments for, e.g., object classification and ephemeris computation as well as the computation of collision probabilities. The MCMC ranging method (Oszkiewicz et al. 2009; see also Granvik et al. 2009) replaces the original sampling algorithm described above with a proposal probability density function (p.d.f.), and a chain of sample orbital elements results in the phase space. MCMC ranging is based on a bivariate Gaussian p.d.f. for the topocentric ranges, and allows for the sampling to focus on the phase-space domain with most of the probability mass. In the virtual-observation MCMC method (Muinonen et al. 2012), the proposal p.d.f. for the orbital elements is chosen to mimic the a posteriori p.d.f. for the elements: first, random errors are simulated for each observation, resulting in a set of virtual observations; second, corresponding virtual least-squares orbital elements are derived using the Nelder-Mead downhill simplex method; third, repeating the procedure two times allows for a computation of a difference for two sets of virtual orbital elements; and, fourth, this orbital-element difference constitutes a symmetric proposal in a random-walk Metropolis-Hastings algorithm, avoiding the explicit computation of the proposal p.d.f. In a discrete approximation, the allowed proposals coincide with the differences that are based on a large number of pre-computed sets of virtual least-squares orbital elements. The virtual-observation MCMC method is thus based on the characterization of the relevant volume in the orbital-element phase space. Here we utilize MCMC to map the phase-space domain of acceptable solutions. We can make use of the proposal p.d.f.s from the MCMC ranging and virtual-observation methods. The present phase-space mapping produces, upon convergence, a uniform sampling of the solution space within a pre-defined χ^2-value. The weights of the sampled orbital elements are then computed on the basis of the corresponding χ^2-values. The present method resembles the original ranging method. On one hand, MCMC mapping is insensitive to local extrema in the phase space and efficiently maps the solution space. This is somewhat contrary to the MCMC methods described above. On the other hand, MCMC mapping can suffer from producing a small number of sample elements with small χ^2-values, in resemblance to the original ranging method. We apply the methods to example near-Earth, main-belt, and transneptunian objects, and highlight the utilization of the methods in the data processing and analysis pipeline of the ESA Gaia space mission.
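    The symmetric-proposal Metropolis-Hastings accept/reject step used by the virtual-observation method can be sketched on a toy 1-D target (illustrative; real applications propose in the six-dimensional orbital-element space):

```python
import math
import random

def metropolis(log_post, x0, step, n, rng):
    """Random-walk Metropolis-Hastings with a symmetric Gaussian proposal,
    the same accept/reject structure described above (toy 1-D version)."""
    xs, x, lp = [], x0, log_post(x0)
    for _ in range(n):
        xp = x + step * rng.gauss(0.0, 1.0)
        lpp = log_post(xp)
        # Symmetric proposal: accept with probability min(1, exp(lpp - lp)).
        if lpp - lp > math.log(1.0 - rng.random()):  # 1-U avoids log(0)
            x, lp = xp, lpp
        xs.append(x)
    return xs

def log_post(x):
    """Illustrative Gaussian 'posterior' with mean 2.0 and sd 0.5."""
    return -0.5 * ((x - 2.0) / 0.5) ** 2

rng = random.Random(11)
chain = metropolis(log_post, x0=0.0, step=0.5, n=50_000, rng=rng)
post_mean = sum(chain[5_000:]) / 45_000  # discard burn-in, then average
```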

  14. The performance of sample selection estimators to control for attrition bias.

    PubMed

    Grasdal, A

    2001-07-01

    Sample attrition is a potential source of selection bias in experimental as well as non-experimental programme evaluation. For labour market outcomes, such as employment status and earnings, missing-data problems caused by attrition can be circumvented by collecting follow-up data from administrative registers. For most non-labour market outcomes, however, investigators must rely on participants' willingness to co-operate in keeping detailed follow-up records, and on statistical correction procedures to identify and adjust for attrition bias. This paper combines survey and register data from a Norwegian randomized field trial to evaluate the performance of parametric and semi-parametric sample selection estimators commonly used to correct for attrition bias. The estimators considered work well in terms of producing point estimates of treatment effects close to the experimental benchmark estimates, although results are sensitive to exclusion restrictions. The analysis also demonstrates an inherent paradox in the 'common support' approach, which prescribes excluding from the analysis observations outside the region of common support for the selection probability: the more important treatment status is as a determinant of attrition, the larger the proportion of treated individuals whose selection probabilities fall outside the range for which comparison with untreated counterparts is possible. Copyright 2001 John Wiley & Sons, Ltd.
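    The parametric workhorse among such estimators is the Heckman-style two-step correction. A minimal sketch on simulated attrition data follows; for brevity the probit first step is replaced by the true selection index (in practice it would be estimated), and all variable names and numbers are illustrative, not the paper's data.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 5000

# Simulated evaluation data: outcome observed only for non-attriters.
x = rng.normal(size=n)              # outcome covariate
z = rng.normal(size=n)              # exclusion restriction: affects
                                    # attrition but not the outcome
u, e = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], n).T
treat = rng.integers(0, 2, n)

y = 1.0 + 0.5 * treat + 0.8 * x + e                 # true effect = 0.5
observed = (0.3 + 1.0 * z + 0.5 * treat + u) > 0    # selection equation

# Step 1 (probit) skipped: plug in the true selection index; in practice
# the index comes from a probit fit of `observed` on z and treat.
index = 0.3 + 1.0 * z + 0.5 * treat
lam = norm.pdf(index) / norm.cdf(index)   # inverse Mills ratio

# Step 2: OLS on the selected sample, augmented with the Mills ratio.
sel = observed
X = np.column_stack([np.ones(sel.sum()), treat[sel], x[sel], lam[sel]])
beta, *_ = np.linalg.lstsq(X, y[sel], rcond=None)
print("corrected treatment effect:", beta[1])
```

    Because the errors of the selection and outcome equations are correlated (0.6 here), naive OLS on the non-attrited sample is biased; the Mills-ratio term absorbs that bias.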

  15. Rapid detection of coliforms in drinking water of Arak city using multiplex PCR method in comparison with the standard method of culture (Most Probable Number)

    PubMed Central

    Fatemeh, Dehghan; Reza, Zolfaghari Mohammad; Mohammad, Arjomandzadegan; Salomeh, Kalantari; Reza, Ahmari Gholam; Hossein, Sarmadian; Maryam, Sadrnia; Azam, Ahmadi; Mana, Shojapoor; Negin, Najarian; Reza, Kasravi Alii; Saeed, Falahat

    2014-01-01

    Objective To evaluate molecular detection of coliforms and to shorten the PCR time. Methods Rapid detection of coliforms by amplification of the lacZ and uidA genes in a multiplex PCR reaction was designed and performed in comparison with the most probable number (MPN) method for 16 artificial and 101 field samples. The molecular method was also conducted on coliforms isolated from positive MPN samples; a certified reference material standard used for verification of the microbial method; strains isolated from the certified reference material; and standard bacteria. The PCR and electrophoresis parameters were changed to reduce the operation time. Results PCR results for the lacZ and uidA genes were similar in all of the standard, operational and artificial samples, and multiplex PCR showed the 876 bp and 147 bp bands of the lacZ and uidA genes. PCR results were confirmed by the MPN culture method with a sensitivity of 86% (95% CI: 0.71-0.93). In addition, the total execution time, with a successful change of factors, was reduced to less than two and a half hours. Conclusions The multiplex PCR method with a shortened operation time was used for the simultaneous detection of total coliforms and Escherichia coli in the distribution system of Arak city. It is recommended to be used at least as an initial screening test; positive samples could then be randomly tested by MPN. PMID:25182727
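    For reference, a sensitivity and its confidence interval of this kind can be reproduced from a 2x2 comparison table with a Wilson score interval. The counts below are hypothetical, chosen only to give a sensitivity near 86%; the abstract does not report the raw table.

```python
import math

def sensitivity_wilson(tp, fn, z=1.96):
    """Sensitivity TP/(TP+FN) with a Wilson score 95% interval."""
    n = tp + fn
    p = tp / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, center - half, center + half

# Hypothetical counts: 37 of 43 MPN-positive samples also PCR-positive.
p, lo, hi = sensitivity_wilson(tp=37, fn=6)
print(f"sensitivity {p:.2f}, 95% CI ({lo:.2f}, {hi:.2f})")
```

    The Wilson interval is preferred over the naive normal interval at these small counts because it never extends past 0 or 1.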

  16. Estuarine water quality in parks of the Northeast Coastal and Barrier Network: Development and early implementation of vital signs estuarine nutrient-enrichment monitoring, 2003-06

    USGS Publications Warehouse

    Kopp, Blaine S.; Nielsen, Martha; Glisic, Dejan; Neckles, Hilary A.

    2009-01-01

    This report documents results of pilot tests of a protocol for monitoring estuarine nutrient enrichment for the Vital Signs Monitoring Program of the National Park Service Northeast Coastal and Barrier Network. Data collected from four parks during protocol development in 2003-06 are presented: Gateway National Recreation Area, Colonial National Historic Park, Fire Island National Seashore, and Assateague Island National Seashore. The monitoring approach incorporates several spatial and temporal designs to address questions at a hierarchy of scales. Indicators of estuarine response to nutrient enrichment were sampled using a probability design within park estuaries during a late-summer index period. Monitoring variables consisted of dissolved-oxygen concentration, chlorophyll a concentration, water temperature, salinity, attenuation of downwelling photosynthetically available radiation (PAR), and turbidity. The statistical sampling design allowed the condition of unsampled locations to be inferred from the distribution of data from a set of randomly positioned "probability" stations. A subset of sampling stations was sampled repeatedly during the index period, and stations were not rerandomized in subsequent years. These "trend stations" allowed us to examine temporal variability within the index period, and to improve the sensitivity of the monitoring protocol for detecting change through time. Additionally, one index site in each park was equipped for continuous monitoring throughout the index period. Thus, the protocol includes elements of probabilistic and targeted spatial sampling, and the temporal intensity ranges from snapshot assessments to continuous monitoring.

  17. Knot probabilities in random diagrams

    NASA Astrophysics Data System (ADS)

    Cantarella, Jason; Chapman, Harrison; Mastin, Matt

    2016-10-01

    We consider a natural model of random knotting—choose a knot diagram at random from the finite set of diagrams with n crossings. We tabulate diagrams with 10 and fewer crossings and classify the diagrams by knot type, allowing us to compute exact probabilities for knots in this model. As expected, most diagrams with 10 and fewer crossings are unknots (about 78% of the roughly 1.6 billion 10-crossing diagrams). For these crossing numbers, the unknot fraction is mostly explained by the prevalence of ‘tree-like’ diagrams, which are unknots for any assignment of over/under information at crossings. The data show a roughly linear relationship between the log of knot-type probability and the log of the frequency rank of the knot type, analogous to Zipf’s law for word frequency. The complete tabulation and all knot frequencies are included as supplementary data.
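    A Zipf-like relationship of this kind is checked with a linear fit of log probability against log frequency rank. The counts below are invented for illustration (the paper's actual tabulation is in its supplementary data); they are constructed to decay roughly as rank^-2.

```python
import numpy as np

# Hypothetical knot-type counts by frequency rank (illustrative only).
counts = np.array([1000000, 240000, 110000, 60000,
                   40000, 28000, 21000, 16000])
ranks = np.arange(1, counts.size + 1)

# Zipf-like law: log(probability) ~ a - s * log(rank).
probs = counts / counts.sum()
slope, intercept = np.polyfit(np.log(ranks), np.log(probs), 1)
print(f"fitted exponent s = {-slope:.2f}")
```

    A fitted exponent near 2 here simply reflects how the toy counts were built; the paper's empirical slope would come from its own tabulation.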

  18. The Estimation of Tree Posterior Probabilities Using Conditional Clade Probability Distributions

    PubMed Central

    Larget, Bret

    2013-01-01

    In this article I introduce the idea of conditional independence of separated subtrees as a principle by which to estimate the posterior probability of trees using conditional clade probability distributions rather than simple sample relative frequencies. I describe an algorithm for these calculations and software which implements these ideas. I show that these alternative calculations are very similar to simple sample relative frequencies for high probability trees but are substantially more accurate for relatively low probability trees. The method allows the posterior probability of unsampled trees to be calculated when these trees contain only clades that are in other sampled trees. Furthermore, the method can be used to estimate the total probability of the set of sampled trees which provides a measure of the thoroughness of a posterior sample. [Bayesian phylogenetics; conditional clade distributions; improved accuracy; posterior probabilities of trees.] PMID:23479066
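    The core calculation, estimating a tree's posterior probability as a product of conditional clade probabilities computed from a posterior sample, can be sketched as follows. The nested-tuple tree representation and the toy sample are illustrative assumptions, not the article's software.

```python
from collections import Counter

def clades(tree):
    """Return (clade, pairs) for a rooted binary tree given as nested
    tuples of taxon names; pairs lists (clade, split) decompositions."""
    if isinstance(tree, str):
        return frozenset([tree]), []
    left, right = tree
    lc, lp = clades(left)
    rc, rp = clades(right)
    clade = lc | rc
    return clade, lp + rp + [(clade, frozenset([lc, rc]))]

def ccd_probability(tree, split_counts, clade_counts):
    """Estimate P(tree) as a product of conditional clade probabilities."""
    p = 1.0
    for clade, split in clades(tree)[1]:
        p *= split_counts[(clade, split)] / clade_counts[clade]
    return p

# A toy "posterior sample" of 4-taxon trees (e.g. from an MCMC run).
sample = [(("A", "B"), ("C", "D"))] * 6 \
       + [(("A", "C"), ("B", "D"))] * 3 \
       + [(("A", "D"), ("B", "C"))] * 1
split_counts, clade_counts = Counter(), Counter()
for t in sample:
    for clade, split in clades(t)[1]:
        split_counts[(clade, split)] += 1
        clade_counts[clade] += 1

p = ccd_probability((("A", "B"), ("C", "D")), split_counts, clade_counts)
print(f"CCD estimate: {p:.2f}")
```

    On this tiny example the CCD estimate coincides with the simple relative frequency (0.60); the two diverge, and the CCD estimate gains accuracy, on larger trees where clades recombine across sampled topologies.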

  19. Effect of Therapeutic Hypothermia Initiated After 6 Hours of Age on Death or Disability Among Newborns With Hypoxic-Ischemic Encephalopathy: A Randomized Clinical Trial.

    PubMed

    Laptook, Abbot R; Shankaran, Seetha; Tyson, Jon E; Munoz, Breda; Bell, Edward F; Goldberg, Ronald N; Parikh, Nehal A; Ambalavanan, Namasivayam; Pedroza, Claudia; Pappas, Athina; Das, Abhik; Chaudhary, Aasma S; Ehrenkranz, Richard A; Hensman, Angelita M; Van Meurs, Krisa P; Chalak, Lina F; Khan, Amir M; Hamrick, Shannon E G; Sokol, Gregory M; Walsh, Michele C; Poindexter, Brenda B; Faix, Roger G; Watterberg, Kristi L; Frantz, Ivan D; Guillet, Ronnie; Devaskar, Uday; Truog, William E; Chock, Valerie Y; Wyckoff, Myra H; McGowan, Elisabeth C; Carlton, David P; Harmon, Heidi M; Brumbaugh, Jane E; Cotten, C Michael; Sánchez, Pablo J; Hibbs, Anna Maria; Higgins, Rosemary D

    2017-10-24

    Hypothermia initiated at less than 6 hours after birth reduces death or disability for infants with hypoxic-ischemic encephalopathy at 36 weeks' or later gestation. To our knowledge, hypothermia trials have not been performed in infants presenting after 6 hours. To estimate the probability that hypothermia initiated at 6 to 24 hours after birth reduces the risk of death or disability at 18 months among infants with hypoxic-ischemic encephalopathy. A randomized clinical trial was conducted between April 2008 and June 2016 among infants at 36 weeks' or later gestation with moderate or severe hypoxic-ischemic encephalopathy enrolled at 6 to 24 hours after birth. Twenty-one US Neonatal Research Network centers participated. Bayesian analyses were prespecified given the anticipated limited sample size. Targeted esophageal temperature was used in 168 infants. Eighty-three hypothermic infants were maintained at 33.5°C (acceptable range, 33°C-34°C) for 96 hours and then rewarmed. Eighty-five noncooled infants were maintained at 37.0°C (acceptable range, 36.5°C-37.3°C). The composite of death or disability (moderate or severe) at 18 to 22 months adjusted for level of encephalopathy and age at randomization. Hypothermic and noncooled infants were term (mean [SD], 39 [2] and 39 [1] weeks' gestation, respectively), and 47 of 83 (57%) and 55 of 85 (65%) were male, respectively. Both groups were acidemic at birth, predominantly transferred to the treating center with moderate encephalopathy, and were randomized at a mean (SD) of 16 (5) and 15 (5) hours for hypothermic and noncooled groups, respectively. The primary outcome occurred in 19 of 78 hypothermic infants (24.4%) and 22 of 79 noncooled infants (27.9%) (absolute difference, 3.5%; 95% CI, -1% to 17%). 
Bayesian analysis using a neutral prior indicated a 76% posterior probability of reduced death or disability with hypothermia relative to the noncooled group (adjusted posterior risk ratio, 0.86; 95% credible interval, 0.58-1.29). The probability that death or disability in cooled infants was at least 1%, 2%, or 3% less than noncooled infants was 71%, 64%, and 56%, respectively. Among term infants with hypoxic-ischemic encephalopathy, hypothermia initiated at 6 to 24 hours after birth compared with noncooling resulted in a 76% probability of any reduction in death or disability, and a 64% probability of at least 2% less death or disability at 18 to 22 months. Hypothermia initiated at 6 to 24 hours after birth may have benefit but there is uncertainty in its effectiveness. clinicaltrials.gov Identifier: NCT00614744.

  20. Data-Driven Lead-Acid Battery Prognostics Using Random Survival Forests

    DTIC Science & Technology

    2014-10-02

    Kogalur, Blackstone, & Lauer, 2008; Ishwaran & Kogalur, 2010). Random survival forest is a survival analysis extension of Random Forests (Breiman, 2001... Statistics & probability letters, 80(13), 1056-1064. Ishwaran, H., Kogalur, U. B., Blackstone, E. H., & Lauer, M. S. (2008). Random survival forests. The

  1. Multi-hazard Assessment and Scenario Toolbox (MhAST): A Framework for Analyzing Compounding Effects of Multiple Hazards

    NASA Astrophysics Data System (ADS)

    Sadegh, M.; Moftakhari, H.; AghaKouchak, A.

    2017-12-01

    Many natural hazards are driven by multiple forcing variables, and concurrent or consecutive extreme events significantly increase the risk of infrastructure/system failure. It is common practice to use univariate analysis, based upon a perceived ruling driver, to estimate design quantiles and/or return periods of extreme events. A multivariate analysis, however, permits modeling the simultaneous occurrence of multiple forcing variables. In this presentation, we introduce the Multi-hazard Assessment and Scenario Toolbox (MhAST), which comprehensively analyzes marginal and joint probability distributions of natural hazards. MhAST also offers a wide range of return-period and design-level scenarios and their likelihoods. The contribution of this study is four-fold: 1. comprehensive analysis of the marginal and joint probability of multiple drivers through 17 continuous distributions and 26 copulas; 2. multiple scenario analysis of concurrent extremes based upon the most likely joint occurrence, one ruling variable, and weighted random sampling of joint occurrences with similar exceedance probabilities; 3. weighted average scenario analysis based on an expected event; and 4. uncertainty analysis of the most likely joint occurrence scenario using a Bayesian framework.
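    As one hedged illustration of how return periods follow from a fitted copula (the toolbox fits 26 copula families; a Gaussian copula and all numbers below are assumptions made purely for this sketch), the "OR" and "AND" joint return periods of a bivariate extreme come directly from the copula CDF:

```python
from scipy.stats import norm, multivariate_normal

# Gaussian copula stand-in for whichever copula the toolbox selects.
rho = 0.6
def copula_cdf(u, v):
    return multivariate_normal.cdf(
        [norm.ppf(u), norm.ppf(v)], mean=[0, 0],
        cov=[[1, rho], [rho, 1]])

mu = 1.0       # mean interarrival time of events (years)
u = v = 0.99   # marginal non-exceedance probabilities of the two drivers

# "OR" hazard: either driver exceeds its quantile.
T_or = mu / (1 - copula_cdf(u, v))
# "AND" hazard: both drivers exceed simultaneously (compound event).
T_and = mu / (1 - u - v + copula_cdf(u, v))
print(f"OR return period {T_or:.0f} yr, AND return period {T_and:.0f} yr")
```

    Positive dependence (rho > 0) shortens the AND return period relative to the independence case, which is exactly why treating a compound hazard with a single ruling driver can understate risk.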

  2. Comet and asteroid hazard to the terrestrial planets

    NASA Astrophysics Data System (ADS)

    Ipatov, S. I.; Mather, J. C.

    2004-01-01

    We estimated the rate of comet and asteroid collisions with the terrestrial planets by calculating the orbits of 13,000 Jupiter-crossing objects (JCOs) and 1300 resonant asteroids and computing the probabilities of collisions based on random-phase approximations and the orbital elements sampled with a 500-year step. The Bulirsch-Stoer and a symplectic orbit integrator gave similar results for orbital evolution, but may give different collision probabilities with the Sun. A small fraction of former JCOs reached orbits with aphelia inside Jupiter's orbit, and some reached Apollo orbits with semi-major axes less than 2 AU, Aten orbits, and inner-Earth orbits (with aphelia less than 0.983 AU) and remained there for millions of years. Though less than 0.1% of the total, these objects were responsible for most of the collision probability of former JCOs with Earth and Venus. We conclude that a significant fraction of near-Earth objects could be extinct comets that came from the trans-Neptunian region, or that most such comets disintegrated during their motion in near-Earth-object orbits.

  3. An efficient reliability algorithm for locating design point using the combination of importance sampling concepts and response surface method

    NASA Astrophysics Data System (ADS)

    Shayanfar, Mohsen Ali; Barkhordari, Mohammad Ali; Roudak, Mohammad Amin

    2017-06-01

    Monte Carlo simulation (MCS) is a useful tool for computing the probability of failure in reliability analysis. However, the large number of required random samples makes it time-consuming. The response surface method (RSM) is another common method in reliability analysis. Although RSM is widely used for its simplicity, it cannot be trusted in highly nonlinear problems due to its linear nature. In this paper, a new efficient algorithm, employing the combination of importance sampling, as a class of MCS, and RSM, is proposed. In the proposed algorithm, the analysis starts with importance sampling concepts, using a proposed two-step rule for updating the design point; this part finishes after a small number of samples have been generated. Then RSM starts to work using Bucher's experimental design, with the last design point as the center point and a proposed effective length as the radius of Bucher's approach. Through illustrative numerical examples, the simplicity and efficiency of the proposed algorithm and the effectiveness of the proposed rules are shown.
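    A minimal sketch of the importance-sampling half of such an algorithm, with the sampling density recentered at a known design point of a linear limit state; the limit-state function and design point are illustrative choices, not the paper's examples.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

# Limit-state function in standard normal space: failure when g(x) < 0.
def g(x):
    return 3.0 - x[..., 0] - x[..., 1]

design_point = np.array([1.5, 1.5])   # closest failure point to origin

# Importance sampling density: standard-normal density shifted to the
# design point, so failures are sampled far more often than under MCS.
n = 20000
x = design_point + rng.normal(size=(n, 2))
w = (norm.pdf(x[:, 0]) * norm.pdf(x[:, 1])
     / (norm.pdf(x[:, 0] - design_point[0])
        * norm.pdf(x[:, 1] - design_point[1])))
pf = np.mean((g(x) < 0) * w)
print(f"estimated P_f = {pf:.5f}")
```

    For this linear limit state the exact answer is Phi(-3/sqrt(2)) ~ 0.017, which crude MCS would need far more samples to resolve with the same accuracy.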

  4. More than Just Convenient: The Scientific Merits of Homogeneous Convenience Samples

    PubMed Central

    Jager, Justin; Putnick, Diane L.; Bornstein, Marc H.

    2017-01-01

    Despite their disadvantaged generalizability relative to probability samples, non-probability convenience samples are the standard within developmental science, and likely will remain so because probability samples are cost-prohibitive and most available probability samples are ill-suited to examine developmental questions. In lieu of focusing on how to eliminate or sharply reduce reliance on convenience samples within developmental science, here we propose how to augment their advantages when it comes to understanding population effects as well as subpopulation differences. Although all convenience samples have less clear generalizability than probability samples, we argue that homogeneous convenience samples have clearer generalizability relative to conventional convenience samples. Therefore, when researchers are limited to convenience samples, they should consider homogeneous convenience samples as a positive alternative to conventional (or heterogeneous) convenience samples. We discuss future directions as well as potential obstacles to expanding the use of homogeneous convenience samples in developmental science. PMID:28475254

  5. Estimation of flock/herd-level true Mycobacterium avium subspecies paratuberculosis prevalence on sheep, beef cattle and deer farms in New Zealand using a novel Bayesian model.

    PubMed

    Verdugo, Cristobal; Jones, Geoff; Johnson, Wes; Wilson, Peter; Stringer, Lesley; Heuer, Cord

    2014-12-01

    The study aimed to estimate the national- and island-level flock/herd true prevalence (HTP) of Mycobacterium avium subsp. paratuberculosis (MAP) infection in pastoral farmed sheep, beef cattle and deer in New Zealand. A random sample of 238 single- or multi-species farms was selected from a postal surveyed population of 1940 farms. The sample included 162 sheep flocks, 116 beef cattle and 99 deer herds from seven of 16 geographical regions. Twenty animals from each species present on farm were randomly selected for blood and faecal sampling. Pooled faecal culture testing was conducted using a single pool (sheep flocks) or two pools (beef cattle/deer herds) of 20 and 10 samples per pool, respectively. To increase flock/herd-level sensitivity, sera from all 20 animals from culture negative flocks/herds were individually tested by Pourquier(®) ELISA (sheep and cattle) or Paralisa™ (deer). Results were adjusted for sensitivity and specificity of diagnostic tests using a novel Bayesian latent class model. Outcomes were adjusted by their sampling fractions to obtain HTP estimates at national level. For each species, the posterior probability (POPR) of HTP differences between New Zealand North (NI) and South (SI) Islands was obtained. Across all species, 69% of farms had at least one species test positive. Sheep flocks had the highest HTP estimate (76%, posterior probability interval (PPI) 70-81%), followed by deer (46%, PPI 38-55%) and beef herds (42%, PPI 35-50%). Differences were observed between the two main islands of New Zealand, with higher HTP in sheep and beef cattle flocks/herds in the NI. Sheep flock HTP was 80% in the NI compared with 70% (POPR=0.96) in the SI, while the HTP for beef cattle was 44% in the NI and 38% in the SI (POPR=0.80). Conversely, deer HTP was higher in the SI (54%) than the NI (33%, POPR=0.99). Infection with MAP is endemic at high prevalence in sheep, beef cattle and deer flocks/herds across New Zealand. Copyright © 2014 Elsevier B.V. 
All rights reserved.
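    The paper's Bayesian latent-class model is considerably richer, but its core adjustment, correcting apparent prevalence for imperfect test sensitivity and specificity, is the same one the classical Rogan-Gladen estimator makes explicit. The figures below are hypothetical, not the study's estimates.

```python
def rogan_gladen(apparent_prev, se, sp):
    """True prevalence from apparent prevalence, sensitivity, specificity:
    (AP + Sp - 1) / (Se + Sp - 1)."""
    return (apparent_prev + sp - 1.0) / (se + sp - 1.0)

# Hypothetical figures: 60% of flocks test positive with a herd-level
# testing scheme of 75% sensitivity and 98% specificity.
print(f"true prevalence ~ {rogan_gladen(0.60, 0.75, 0.98):.2f}")
```

    With an imperfectly sensitive test, true prevalence exceeds apparent prevalence, which is why the adjusted flock-level estimates in the study can sit well above the raw test-positive fractions.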

  6. Randomness and diversity matter in the maintenance of the public resources

    NASA Astrophysics Data System (ADS)

    Liu, Aizhi; Zhang, Yanling; Chen, Xiaojie; Sun, Changyin

    2017-03-01

    Most previous models of the public goods game assume two possible strategies, i.e., investing all or nothing; the real-life situation is rarely all or nothing. In this paper, we consider that multiple strategies are adopted in a well-mixed population, and each strategy represents an investment to produce the public goods. Past efforts have found that randomness matters in the evolution of fairness in the ultimatum game. In a framework involving no other mechanisms, we study how diversity and randomness influence the average investment of the population, defined as the mean value of all individuals' strategies. The level of diversity is increased by increasing the strategy number, and the level of randomness is increased by increasing the mutation probability, or by decreasing the population size or the selection intensity. We find that a higher level of diversity and a higher level of randomness lead to larger average investment and are more favorable to the evolution of cooperation. Under weak selection, the average investment changes very little with the strategy number, the population size, and the mutation probability. Under strong selection, the average investment changes very little with the strategy number and the population size, but changes a lot with the mutation probability. Under intermediate selection, the average investment increases significantly with the strategy number and the mutation probability, and decreases significantly with the population size. These findings are meaningful for studying how to maintain public resources.

  7. [Determination of wine original regions using information fusion of NIR and MIR spectroscopy].

    PubMed

    Xiang, Ling-Li; Li, Meng-Hua; Li, Jing-Mingz; Li, Jun-Hui; Zhang, Lu-Da; Zhao, Long-Lian

    2014-10-01

    Geographical origins of wine grapes are significant factors affecting wine quality and wine prices. Tasters' evaluation is a good method but has some limitations, so it is important to discriminate different wine original regions quickly and accurately. The present paper proposes a method to determine wine original regions based on Bayesian information fusion of near-infrared (NIR) transmission spectra and mid-infrared (MIR) ATR spectra of wines. This method improves the determination results by expanding the sources of analysis information. NIR and MIR spectra of 153 wine samples from four different grape-growing regions were collected by near-infrared and mid-infrared Fourier transform spectrometers, respectively. These four regions, Huailai, Yantai, Gansu and Changli, are all typical geographical origins for Chinese wines. NIR and MIR discriminant models for wine regions were established using partial least squares discriminant analysis (PLS-DA) on the NIR and MIR spectra separately. In PLS-DA, the regions of the wine samples are encoded in binary; as there are four wine regions in this paper, four output nodes stand for the categorical variables. The output node values for each sample in the NIR and MIR models were first normalized; these values stand for the probabilities of the sample belonging to each category. They served as inputs to the Bayesian discriminant formula as prior probability values and were substituted into the Bayesian formula to obtain posterior probabilities, from which the class of each sample can be judged. Considering the stability of the PLS-DA models, all wine samples were randomly divided into calibration and validation sets ten times. The results of the NIR and MIR discriminant models of the four wine regions were as follows: the average accuracy rates of the calibration sets were 78.21% (NIR) and 82.57% (MIR), and the average accuracy rates of the validation sets were 82.50% (NIR) and 81.98% (MIR). After using the method proposed in this paper, the accuracy rates of calibration and validation rose to 87.11% and 90.87%, respectively, better than either spectroscopy alone. These results suggest that Bayesian information fusion of NIR and MIR spectra is feasible for fast identification of wine original regions.
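    The fusion step can be sketched as follows: one model's normalized output is treated as prior probabilities, the other's as likelihoods, and the product is renormalized. The numbers are invented for illustration; the paper's PLS-DA outputs are not reproduced here.

```python
import numpy as np

def fuse(p_nir, p_mir):
    """Combine two classifiers' class-probability vectors: multiply
    prior (NIR) by likelihood (MIR), then renormalize (Bayes' rule)."""
    post = np.asarray(p_nir) * np.asarray(p_mir)
    return post / post.sum()

# Hypothetical normalized PLS-DA outputs for one wine sample over the
# four regions (Huailai, Yantai, Gansu, Changli).
p_nir = [0.40, 0.35, 0.15, 0.10]
p_mir = [0.50, 0.20, 0.20, 0.10]
print(fuse(p_nir, p_mir))
```

    When both models lean toward the same region, the fused posterior concentrates on it more strongly than either model alone, which is the mechanism behind the accuracy gain reported above.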

  8. Influence of level of education on disability free life expectancy by sex: the ILSA study.

    PubMed

    Minicuci, N; Noale, M

    2005-12-01

    To assess the effect of education on disability-free life expectancy among older Italians, using a hierarchical model as the indicator of disability, with estimates based on the multistate life table method and IMaCh software. Data were obtained from the Italian Longitudinal Study on Aging, which considered a random sample of 5632 individuals. Total life expectancy ranged from 16.5 years for men aged 65 years to 6 years for men aged 80; the corresponding values for women were 19.6 and 8.4 years. For both sexes, increasing age was associated with a lower probability of recovery from a mild state of disability, with a greater probability of worsening for all individuals presenting an independent state at baseline, and with a greater probability of dying, except for women in a mild state of disability. A medium/high educational level was associated with a greater probability of recovery only in men with a mild state of disability at baseline, and with a lower probability of worsening in both sexes, except for men with a mild state of disability at baseline. The positive effects of high education are well established in most research work and, education being a modifiable factor, strategies focused on increasing the level of education, and hence strengthening access to information and use of health services, would produce significant benefits.

  9. Phylogenic analysis and forensic genetic characterization of Chinese Uyghur group via autosomal multi STR markers

    PubMed Central

    Jin, Xiaoye; Wei, Yuanyuan; Chen, Jiangang; Kong, Tingting; Mu, Yuling; Guo, Yuxin; Dong, Qian; Xie, Tong; Meng, Haotian; Zhang, Meng; Li, Jianfei; Li, Xiaopeng; Zhu, Bofeng

    2017-01-01

    We investigated the allelic frequencies and forensic descriptive parameters of 23 autosomal short tandem repeat loci in a randomly selected sample of 1218 unrelated healthy Uyghur individuals residing in the Xinjiang Uyghur Autonomous Region, northwest China. A total of 281 alleles at these loci were identified, and their corresponding allelic frequencies ranged from 0.0004 to 0.5390. The combined match probability and combined probability of exclusion of all loci were 5.192 × 10−29 and 0.9999999996594, respectively. The results of the population genetic study showed that the Uyghur group had close relationships with contiguous populations, such as the Xibe and Hui groups. In summary, these autosomal short tandem repeat loci were highly informative in the Uyghur group, and the multiplex PCR system could be used as a valuable tool for forensic casework and population genetic analysis. PMID:29088750
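    For context, a combined match probability is the product over loci of per-locus match probabilities, each being the chance that two random individuals share a genotype under Hardy-Weinberg equilibrium. The three loci and allele frequencies below are hypothetical; the paper uses 23 loci with its own frequency estimates.

```python
import numpy as np
from itertools import combinations

def locus_match_probability(freqs):
    """P(two random individuals share a genotype) at one locus under
    Hardy-Weinberg equilibrium: sum of squared genotype frequencies."""
    homo = [p**2 for p in freqs]
    het = [2 * p * q for p, q in combinations(freqs, 2)]
    return sum(g**2 for g in homo + het)

# Hypothetical allele frequencies for three loci (each sums to 1).
loci = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.6, 0.2, 0.2]]
cmp_ = np.prod([locus_match_probability(f) for f in loci])
print(f"combined match probability = {cmp_:.4f}")
```

    Because each locus contributes a factor well below 1, the combined value shrinks multiplicatively with the number of loci, which is how 23 loci reach values as small as ~10^-29.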

  10. Measurement of the Bs0-Bs0 oscillation frequency.

    PubMed

    Abulencia, A; Acosta, D; Adelman, J; Affolder, T; Akimoto, T; Albrow, M G; Ambrose, D; Amerio, S; Amidei, D; Anastassov, A; Anikeev, K; Annovi, A; Antos, J; Aoki, M; Apollinari, G; Arguin, J-F; Arisawa, T; Artikov, A; Ashmanskas, W; Attal, A; Azfar, F; Azzi-Bacchetta, P; Azzurri, P; Bacchetta, N; Bachacou, H; Badgett, W; Barbaro-Galtieri, A; Barnes, V E; Barnett, B A; Baroiant, S; Bartsch, V; Bauer, G; Bedeschi, F; Behari, S; Belforte, S; Bellettini, G; Bellinger, J; Belloni, A; Ben Haim, E; Benjamin, D; Beretvas, A; Beringer, J; Berry, T; Bhatti, A; Binkley, M; Bisello, D; Blair, R E; Blocker, C; Blumenfeld, B; Bocci, A; Bodek, A; Boisvert, V; Bolla, G; Bolshov, A; Bortoletto, D; Boudreau, J; Boveia, A; Brau, B; Bromberg, C; Brubaker, E; Budagov, J; Budd, H S; Budd, S; Burkett, K; Busetto, G; Bussey, P; Byrum, K L; Cabrera, S; Campanelli, M; Campbell, M; Canelli, F; Canepa, A; Carlsmith, D; Carosi, R; Carron, S; Casal, B; Casarsa, M; Castro, A; Catastini, P; Cauz, D; Cavalli-Sforza, M; Cerri, A; Cerrito, L; Chang, S H; Chapman, J; Chen, Y C; Chertok, M; Chiarelli, G; Chlachidze, G; Chlebana, F; Cho, I; Cho, K; Chokheli, D; Chou, J P; Chu, P H; Chuang, S H; Chung, K; Chung, W H; Chung, Y S; Ciljak, M; Ciobanu, C I; Ciocci, M A; Clark, A; Clark, D; Coca, M; Compostella, G; Convery, M E; Conway, J; Cooper, B; Copic, K; Cordelli, M; Cortiana, G; Crescioli, F; Cruz, A; Cuenca Almenar, C; Cuevas, J; Culbertson, R; Cyr, D; DaRonco, S; D'Auria, S; D'Onofrio, M; Dagenhart, D; de Barbaro, P; De Cecco, S; Deisher, A; De Lentdecker, G; Dell'Orso, M; Delli Paoli, F; Demers, S; Demortier, L; Deng, J; Deninno, M; De Pedis, D; Derwent, P F; Di Giovanni, G P; Di Ruzza, B; Dionisi, C; Dittmann, J R; DiTuro, P; Dörr, C; Donati, S; Donega, M; Dong, P; Donini, J; Dorigo, T; Dube, S; Ebina, K; Efron, J; Ehlers, J; Erbacher, R; Errede, D; Errede, S; Eusebi, R; Fang, H C; Farrington, S; Fedorko, I; Fedorko, W T; Feild, R G; Feindt, M; Fernandez, J P; Field, R; Flanagan, G; 
Flores-Castillo, L R; Foland, A; Forrester, S; Foster, G W; Franklin, M; Freeman, J C; Frisch, H J; Furic, I; Gallinaro, M; Galyardt, J; Garcia, J E; Garcia Sciveres, M; Garfinkel, A F; Gay, C; Gerberich, H; Gerdes, D; Giagu, S; Giannetti, P; Gibson, A; Gibson, K; Ginsburg, C; Giokaris, N; Giolo, K; Giordani, M; Giromini, P; Giunta, M; Giurgiu, G; Glagolev, V; Glenzinski, D; Gold, M; Goldschmidt, N; Goldstein, J; Gomez, G; Gomez-Ceballos, G; Goncharov, M; González, O; Gorelov, I; Goshaw, A T; Gotra, Y; Goulianos, K; Gresele, A; Griffiths, M; Grinstein, S; Grosso-Pilcher, C; Group, R C; Grundler, U; Guimaraes da Costa, J; Gunay-Unalan, Z; Haber, C; Hahn, S R; Hahn, K; Halkiadakis, E; Hamilton, A; Han, B-Y; Han, J Y; Handler, R; Happacher, F; Hara, K; Hare, M; Harper, S; Harr, R F; Harris, R M; Hatakeyama, K; Hauser, J; Hays, C; Heijboer, A; Heinemann, B; Heinrich, J; Herndon, M; Hidas, D; Hill, C S; Hirschbuehl, D; Hocker, A; Holloway, A; Hou, S; Houlden, M; Hsu, S-C; Huffman, B T; Hughes, R E; Huston, J; Incandela, J; Introzzi, G; Iori, M; Ishizawa, Y; Ivanov, A; Iyutin, B; James, E; Jang, D; Jayatilaka, B; Jeans, D; Jensen, H; Jeon, E J; Jindariani, S; Jones, M; Joo, K K; Jun, S Y; Junk, T R; Kamon, T; Kang, J; Karchin, P E; Kato, Y; Kemp, Y; Kephart, R; Kerzel, U; Khotilovich, V; Kilminster, B; Kim, D H; Kim, H S; Kim, J E; Kim, M J; Kim, S B; Kim, S H; Kim, Y K; Kirsch, L; Klimenko, S; Klute, M; Knuteson, B; Ko, B R; Kobayashi, H; Kondo, K; Kong, D J; Konigsberg, J; Korytov, A; Kotwal, A V; Kovalev, A; Kraan, A; Kraus, J; Kravchenko, I; Kreps, M; Kroll, J; Krumnack, N; Kruse, M; Krutelyov, V; Kuhlmann, S E; Kusakabe, Y; Kwang, S; Laasanen, A T; Lai, S; Lami, S; Lammel, S; Lancaster, M; Lander, R L; Lannon, K; Lath, A; Latino, G; Lazzizzera, I; LeCompte, T; Lee, J; Lee, J; Lee, Y J; Lee, S W; Lefèvre, R; Leonardo, N; Leone, S; Levy, S; Lewis, J D; Lin, C; Lin, C S; Lindgren, M; Lipeles, E; Liss, T M; Lister, A; Litvintsev, D O; Liu, T; Lockyer, N S; Loginov, A; 
Loreti, M; Loverre, P; Lu, R-S; Lucchesi, D; Lujan, P; Lukens, P; Lungu, G; Lyons, L; Lys, J; Lysak, R; Lytken, E; Mack, P; MacQueen, D; Madrak, R; Maeshima, K; Maki, T; Maksimovic, P; Malde, S; Manca, G; Margaroli, F; Marginean, R; Marino, C; Martin, A; Martin, V; Martínez, M; Maruyama, T; Mastrandrea, P; Matsunaga, H; Mattson, M E; Mazini, R; Mazzanti, P; McFarland, K S; McIntyre, P; McNulty, R; Mehta, A; Menzemer, S; Menzione, A; Merkel, P; Mesropian, C; Messina, A; von der Mey, M; Miao, T; Miladinovic, N; Miles, J; Miller, R; Miller, J S; Mills, C; Milnik, M; Miquel, R; Mitra, A; Mitselmakher, G; Miyamoto, A; Moggi, N; Mohr, B; Moore, R; Morello, M; Movilla Fernandez, P; Mülmenstädt, J; Mukherjee, A; Muller, Th; Mumford, R; Murat, P; Nachtman, J; Naganoma, J; Nahn, S; Nakano, I; Napier, A; Naumov, D; Necula, V; Neu, C; Neubauer, M S; Nielsen, J; Nigmanov, T; Nodulman, L; Norniella, O; Nurse, E; Ogawa, T; Oh, S H; Oh, Y D; Okusawa, T; Oldeman, R; Orava, R; Osterberg, K; Pagliarone, C; Palencia, E; Paoletti, R; Papadimitriou, V; Paramonov, A A; Parks, B; Pashapour, S; Patrick, J; Pauletta, G; Paulini, M; Paus, C; Pellett, D E; Penzo, A; Phillips, T J; Piacentino, G; Piedra, J; Pinera, L; Pitts, K; Plager, C; Pondrom, L; Portell, X; Poukhov, O; Pounder, N; Prakoshyn, F; Pronko, A; Proudfoot, J; Ptohos, F; Punzi, G; Pursley, J; Rademacker, J; Rahaman, A; Rakitin, A; Rappoccio, S; Ratnikov, F; Reisert, B; Rekovic, V; van Remortel, N; Renton, P; Rescigno, M; Richter, S; Rimondi, F; Ristori, L; Robertson, W J; Robson, A; Rodrigo, T; Rogers, E; Rolli, S; Roser, R; Rossi, M; Rossin, R; Rott, C; Ruiz, A; Russ, J; Rusu, V; Saarikko, H; Sabik, S; Safonov, A; Sakumoto, W K; Salamanna, G; Saltó, O; Saltzberg, D; Sanchez, C; Santi, L; Sarkar, S; Sartori, L; Sato, K; Savard, P; Savoy-Navarro, A; Scheidle, T; Schlabach, P; Schmidt, E E; Schmidt, M P; Schmitt, M; Schwarz, T; Scodellaro, L; Scott, A L; Scribano, A; Scuri, F; Sedov, A; Seidel, S; Seiya, Y; Semenov, A; 
Sexton-Kennedy, L; Sfiligoi, I; Shapiro, M D; Shears, T; Shepard, P F; Sherman, D; Shimojima, M; Shochet, M; Shon, Y; Shreyber, I; Sidoti, A; Sinervo, P; Sisakyan, A; Sjolin, J; Skiba, A; Slaughter, A J; Sliwa, K; Smith, J R; Snider, F D; Snihur, R; Soderberg, M; Soha, A; Somalwar, S; Sorin, V; Spalding, J; Spezziga, M; Spinella, F; Spreitzer, T; Squillacioti, P; Stanitzki, M; Staveris-Polykalas, A; St Denis, R; Stelzer, B; Stelzer-Chilton, O; Stentz, D; Strologas, J; Stuart, D; Suh, J S; Sukhanov, A; Sumorok, K; Sun, H; Suzuki, T; Taffard, A; Takashima, R; Takeuchi, Y; Takikawa, K; Tanaka, M; Tanaka, R; Tanimoto, N; Tecchio, M; Teng, P K; Terashi, K; Tether, S; Thom, J; Thompson, A S; Thomson, E; Tipton, P; Tiwari, V; Tkaczyk, S; Toback, D; Tokar, S; Tollefson, K; Tomura, T; Tonelli, D; Tönnesmann, M; Torre, S; Torretta, D; Tourneur, S; Trischuk, W; Tsuchiya, R; Tsuno, S; Turini, N; Ukegawa, F; Unverhau, T; Uozumi, S; Usynin, D; Vaiciulis, A; Vallecorsa, S; Varganov, A; Vataga, E; Velev, G; Veramendi, G; Veszpremi, V; Vidal, R; Vila, I; Vilar, R; Vine, T; Vollrath, I; Volobouev, I; Volpi, G; Würthwein, F; Wagner, P; Wagner, R G; Wagner, R L; Wagner, W; Wallny, R; Walter, T; Wan, Z; Wang, S M; Warburton, A; Waschke, S; Waters, D; Wester, W C; Whitehouse, B; Whiteson, D; Wicklund, A B; Wicklund, E; Williams, G; Williams, H H; Wilson, P; Winer, B L; Wittich, P; Wolbers, S; Wolfe, C; Wright, T; Wu, X; Wynne, S M; Yagil, A; Yamamoto, K; Yamaoka, J; Yamashita, T; Yang, C; Yang, U K; Yang, Y C; Yao, W M; Yeh, G P; Yoh, J; Yorita, K; Yoshida, T; Yu, G B; Yu, I; Yu, S S; Yun, J C; Zanello, L; Zanetti, A; Zaw, I; Zetti, F; Zhang, X; Zhou, J; Zucchelli, S

    2006-08-11

    We present the first precise measurement of the B_s^0-B̄_s^0 oscillation frequency Δm_s. We use 1 fb^-1 of data from pp̄ collisions at √s = 1.96 TeV collected with the CDF II detector at the Fermilab Tevatron. The sample contains signals of 3600 fully reconstructed hadronic B_s decays and 37,000 partially reconstructed semileptonic B_s decays. We measure the probability as a function of proper decay time that the B_s decays with the same, or opposite, flavor as the flavor at production, and we find a signal consistent with B_s^0-B̄_s^0 oscillations. The probability that random fluctuations could produce a comparable signal is 0.2%. Under the hypothesis that the signal is due to B_s^0-B̄_s^0 oscillations, we measure Δm_s = 17.31 +0.33/-0.18 (stat) ± 0.07 (syst) ps^-1 and determine |V_td/V_ts| = 0.208 +0.001/-0.002 (expt) +0.008/-0.006 (theor).

  11. Survival analysis for the missing censoring indicator model using kernel density estimation techniques

    PubMed Central

    Subramanian, Sundarraman

    2008-01-01

    This article concerns asymptotic theory for a new estimator of a survival function in the missing censoring indicator model of random censorship. Specifically, the large sample results for an inverse probability-of-non-missingness weighted estimator of the cumulative hazard function, so far not available, are derived, including an almost sure representation with rate for a remainder term, and uniform strong consistency with rate of convergence. The estimator is based on a kernel estimate for the conditional probability of non-missingness of the censoring indicator. Expressions for its bias and variance, in turn leading to an expression for the mean squared error as a function of the bandwidth, are also obtained. The corresponding estimator of the survival function, whose weak convergence is derived, is asymptotically efficient. A numerical study, comparing the performances of the proposed and two other currently existing efficient estimators, is presented. PMID:18953423
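
    The weighting idea in this abstract can be illustrated with a toy computation. The sketch below is not the author's estimator: it applies inverse-probability-of-non-missingness weights to a Nelson-Aalen-type cumulative hazard, with hypothetical inputs (`times`, `deltas`, `pi`) and a plug-in `pi` standing in for the kernel estimate of the conditional non-missingness probability.

```python
def ipw_cumulative_hazard(times, deltas, pi):
    """Toy inverse-probability-weighted cumulative hazard estimate.

    times  -- observed (possibly censored) follow-up times
    deltas -- censoring indicator (1 = event, 0 = censored) when
              observed, None when the indicator is missing
    pi     -- estimated probability that each indicator is non-missing
              (in the paper this comes from a kernel estimate)

    Observed indicators are weighted by 1/pi; units with a missing
    indicator contribute zero to the jump at their time.
    """
    n = len(times)
    order = sorted(range(n), key=lambda i: times[i])
    hazard, path = 0.0, []
    for rank, i in enumerate(order):
        at_risk = n - rank                  # units still in the risk set
        if deltas[i] is not None:
            hazard += deltas[i] / (pi[i] * at_risk)
        path.append((times[i], hazard))
    return path
```

    With no missing indicators and pi identically 1, this reduces to the ordinary Nelson-Aalen estimator.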

  12. Survival analysis for the missing censoring indicator model using kernel density estimation techniques.

    PubMed

    Subramanian, Sundarraman

    2006-01-01

    This article concerns asymptotic theory for a new estimator of a survival function in the missing censoring indicator model of random censorship. Specifically, the large sample results for an inverse probability-of-non-missingness weighted estimator of the cumulative hazard function, so far not available, are derived, including an almost sure representation with rate for a remainder term, and uniform strong consistency with rate of convergence. The estimator is based on a kernel estimate for the conditional probability of non-missingness of the censoring indicator. Expressions for its bias and variance, in turn leading to an expression for the mean squared error as a function of the bandwidth, are also obtained. The corresponding estimator of the survival function, whose weak convergence is derived, is asymptotically efficient. A numerical study, comparing the performances of the proposed and two other currently existing efficient estimators, is presented.

  13. A statistical treatment of bioassay pour fractions

    NASA Astrophysics Data System (ADS)

    Barengoltz, Jack; Hughes, David

    A bioassay is a method for estimating the number of bacterial spores on a spacecraft surface for the purpose of demonstrating compliance with planetary protection (PP) requirements (Ref. 1). The details of the process may be seen in the appropriate PP document (e.g., for NASA, Ref. 2). In general, the surface is mechanically sampled with a damp sterile swab or wipe. The completion of the process is colony formation in a growth medium in a plate (Petri dish); the colonies are counted. Consider a set of samples from randomly selected, known areas of one spacecraft surface, for simplicity. One may calculate the mean and standard deviation of the bioburden density, which is the ratio of counts to area sampled. The standard deviation represents an estimate of the variation from place to place of the true bioburden density commingled with the precision of the individual sample counts. The accuracy of individual sample results depends on the equipment used, the collection method, and the culturing method. One aspect that greatly influences the result is the pour fraction, which is the quantity of fluid added to the plates divided by the total fluid used in extracting spores from the sampling equipment. In an analysis of a single sample's counts due to the pour fraction, one seeks to answer the question: if a certain number of spores is counted with a known pour fraction, what is the probability that an additional number of spores remains in the part of the rinse not poured? This is given for specific values by the binomial distribution density, where detection (of culturable spores) is success and the probability of success is the pour fraction. A special summation over the binomial distribution, equivalent to adding over all possible values of the true total number of spores, is performed. This distribution, when normalized, almost yields the desired quantity: the probability that the additional number of spores does not exceed a certain value. 
Of course, for a desired value of uncertainty, one must invert the calculation. However, this probability of finding exactly the number of spores in the poured part is correct only in the case where all values of the true number of spores greater than or equal to the adjusted count are equally probable. This is not realistic, of course, but the result can only overestimate the uncertainty. So it is useful. In probability speak, one has the conditional probability given any true total number of spores. Therefore one must multiply it by the probability of each possible true count, before the summation. If the counts for a sample set (of which this is one sample) are available, one may use the calculated variance and the normal probability distribution. In this approach, one assumes a normal distribution and neglects the contribution from spatial variation. The former is a common assumption. The latter can only add to the conservatism (overestimating the number of spores at some level of confidence). A more straightforward approach is to assume a Poisson probability distribution for the measured total sample set counts, and use the product of the number of samples and the mean number of counts per sample as the mean of the Poisson distribution. It is necessary to set the total count to 1 in the Poisson distribution when the actual total count is zero. Finally, even when the planetary protection requirements for spore burden refer only to the mean values, they require an adjustment for pour fraction and method efficiency (a PP specification based on independent data). The adjusted mean values are a 50/50 proposition (i.e., the probability of the true total counts in the sample set exceeding the estimate is 0.50). However, this is highly unconservative when the total counts are zero. No adjustment to the mean values occurs for either pour fraction or efficiency. The recommended approach is once again to set the total counts to 1, but now applied to the mean values. 
Then one may apply the corrections to the revised counts. It can be shown by the methods developed in this work that this change is usually conservative enough to increase the level of confidence in the estimate to 0.5. 1. NASA. (2005) Planetary protection provisions for robotic extraterrestrial missions. NPR 8020.12C, April 2005, National Aeronautics and Space Administration, Washington, DC. 2. NASA. (2010) Handbook for the Microbiological Examination of Space Hardware, NASA-HDBK-6022, National Aeronautics and Space Administration, Washington, DC.
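
    The summation described above can be sketched numerically. Under the flat-prior assumption that the text itself flags as unrealistic, the normalized sum reduces to a negative-binomial tail; the function below is an illustrative sketch with hypothetical argument names, not the authors' calculation.

```python
from math import comb

def prob_extra_at_most(n_counted, pour_fraction, m_max):
    """Probability that at most m_max additional spores remain in the
    unpoured part of the rinse, given n_counted spores observed with
    the stated pour fraction.  This is the normalized binomial
    summation from the text (a flat prior over the true total gives a
    negative-binomial distribution for the number of extra spores)."""
    f, q = pour_fraction, 1.0 - pour_fraction
    return sum(comb(n_counted + m, m) * f ** (n_counted + 1) * q ** m
               for m in range(m_max + 1))
```

    For example, with a pour fraction of 0.5 and zero counted spores, the probability that no additional spores were missed is only 0.5, which is why the text recommends against taking a zero count at face value.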

  14. Perfluorinated compounds in fish from U.S. urban rivers and the Great Lakes.

    PubMed

    Stahl, Leanne L; Snyder, Blaine D; Olsen, Anthony R; Kincaid, Thomas M; Wathen, John B; McCarty, Harry B

    2014-11-15

    Perfluorinated compounds (PFCs) have recently received scientific and regulatory attention due to their broad environmental distribution, persistence, bioaccumulative potential, and toxicity. Studies suggest that fish consumption may be a source of human exposure to perfluorooctane sulfonate (PFOS) or long-chain perfluorocarboxylic acids. Most PFC fish tissue literature focuses on marine fish and waters outside of the United States (U.S.). To broaden assessments in U.S. fish, a characterization of PFCs in freshwater fish was initiated on a national scale using an unequal probability design during the U.S. Environmental Protection Agency's (EPA's) 2008-2009 National Rivers and Streams Assessment (NRSA) and the Great Lakes Human Health Fish Tissue Study component of the 2010 EPA National Coastal Condition Assessment (NCCA/GL). Fish were collected from randomly selected locations: 164 urban river sites and 157 nearshore Great Lakes sites. The probability design allowed extrapolation to the sampled population of 17,059 km of urban rivers and a nearshore area of 11,091 km² in the Great Lakes. Fillets were analyzed for 13 PFCs using high-performance liquid chromatography tandem mass spectrometry. Results showed that PFOS dominated in frequency of occurrence, followed by three other longer-chain PFCs (perfluorodecanoic acid, perfluoroundecanoic acid, and perfluorododecanoic acid). Maximum PFOS concentrations were 127 and 80 ng/g in urban river samples and Great Lakes samples, respectively. The range of NRSA PFOS detections was similar to literature accounts from targeted riverine fish sampling. NCCA/GL PFOS levels were lower than those reported by other Great Lakes researchers, but generally higher than values in targeted inland lake studies. 
The probability design allowed development of cumulative distribution functions (CDFs) to quantify PFOS concentrations versus the sampled population, and the application of fish consumption advisory guidance to the CDFs resulted in an estimation of the proportion of urban rivers and the Great Lakes that exceed human health protection thresholds. Copyright © 2014. Published by Elsevier B.V.

  15. Randomized, controlled, two-arm, interventional, multicenter study on risk-adapted damage control orthopedic surgery of femur shaft fractures in multiple-trauma patients.

    PubMed

    Rixen, Dieter; Steinhausen, Eva; Sauerland, Stefan; Lefering, Rolf; Maegele, Marc G; Bouillon, Bertil; Grass, Guido; Neugebauer, Edmund A M

    2016-01-25

    Long bone fractures, particularly of the femur, are common in multiple-trauma patients, but their optimal management has not yet been determined. Although a trend exists toward the concept of "damage control orthopedics" (DCO), current literature is inconclusive. Thus, a need exists for a more specific controlled clinical study. The primary objective of this study was to clarify whether a risk-adapted procedure for treating femoral fractures, as opposed to an early definitive treatment strategy, leads to an improved outcome (morbidity and mortality). The study was designed as a randomized controlled multicenter study. Multiple-trauma patients with femur shaft fractures and a calculated probability of death of 20 to 60 % were randomized to either temporary fracture fixation with external fixation and defined secondary definitive treatment (DCO) or primary reamed nailing (early total care). The primary objective was to reduce the extent of organ failure as measured by the maximum sepsis-related organ failure assessment (SOFA) score. Thirty-four patients were randomized to two groups of 17 patients each. Both groups were comparable regarding sex, age, injury severity score, Glasgow Coma Scale, prothrombin time, base excess, calculated probability of death, and other physiologic variables. The maximum SOFA score was comparable (nonsignificant) between the groups. Regarding the secondary endpoints, the patients with external fixation required a significantly longer ventilation period (p = 0.049) and stayed in intensive care significantly longer (p = 0.037), whereas the in-hospital length of stay was balanced for both groups. Unfortunately, the study had to be terminated prior to reaching the anticipated sample size because of unexpectedly low patient recruitment. Thus, the results of this randomized study reflect the ambivalence in the literature. 
No advantage of the damage control concept could be detected in the treatment of femur fractures in multiple-trauma patients. The necessity for scientific evaluation of this clinically relevant question remains. Trial registration: Current Controlled Trials ISRCTN10321620; date assigned: 9 February 2007.

  16. [Respondent-Driven Sampling: a new sampling method to study visible and hidden populations].

    PubMed

    Mantecón, Alejandro; Juan, Montse; Calafat, Amador; Becoña, Elisardo; Román, Encarna

    2008-01-01

    The paper introduces a variant of chain-referral sampling: respondent-driven sampling (RDS). This sampling method shows that methods based on network analysis can be combined with the statistical validity of standard probability sampling methods. In this sense, RDS appears to be a mathematical improvement of snowball sampling oriented to the study of hidden populations. However, we try to prove its validity with populations that are not within a sampling frame but can nonetheless be contacted without difficulty. The basics of RDS are explained through our research on young people (aged 14 to 25) who go clubbing, consume alcohol and other drugs, and have sex. Fieldwork was carried out between May and July 2007 in three Spanish regions: Baleares, Galicia and Comunidad Valenciana. The presentation of the study shows the utility of this type of sampling when the population is accessible but the lack of a sampling frame poses a difficulty. However, the sample obtained is not, in statistical terms, a representative random sample of the target population. It must be acknowledged that the final sample is representative of a 'pseudo-population' that approximates the target population but is not identical to it.

  17. Randomness in Competitions

    NASA Astrophysics Data System (ADS)

    Ben-Naim, E.; Hengartner, N. W.; Redner, S.; Vazquez, F.

    2013-05-01

    We study the effects of randomness on competitions based on an elementary random process in which there is a finite probability that a weaker team upsets a stronger team. We apply this model to sports leagues and sports tournaments, and compare the theoretical results with empirical data. Our model shows that single-elimination tournaments are efficient but unfair: the number of games is proportional to the number of teams N, but the probability that the weakest team wins decays only algebraically with N. In contrast, leagues, where every team plays every other team, are fair but inefficient: the top √N teams remain in contention for the championship, while the probability that the weakest team becomes champion is exponentially small. We also propose a gradual elimination schedule that consists of a preliminary round and a championship round. Initially, teams play a small number of preliminary games, and subsequently, a few teams qualify for the championship round. This algorithm is fair and efficient: the best team wins with a high probability and the number of games scales as N^(9/5), whereas traditional leagues require N^3 games to fairly determine a champion.
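
    The single-elimination claim is easy to probe by simulation. A minimal sketch (not the authors' code), assuming teams are labeled by strength rank, brackets are seeded at random, and the weaker team of each pairing wins with a fixed upset probability q:

```python
import random

def single_elimination_champion(n_teams, q, rng):
    """Play one single-elimination tournament among teams 0..n_teams-1
    (lower label = stronger); the weaker team of each pairing wins
    with upset probability q.  n_teams is assumed a power of two.
    Returns the champion's label; n_teams - 1 games are played."""
    teams = list(range(n_teams))
    rng.shuffle(teams)                       # random initial bracket
    while len(teams) > 1:
        winners = []
        for a, b in zip(teams[::2], teams[1::2]):
            strong, weak = (a, b) if a < b else (b, a)
            winners.append(weak if rng.random() < q else strong)
        teams = winners
    return teams[0]
```

    With q = 0 the strongest team (label 0) always wins; as q grows, the champion distribution spreads over the field, consistent with the algebraic decay noted in the abstract.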

  18. Complex growing networks with intrinsic vertex fitness

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bedogne, C.; Rodgers, G. J.

    2006-10-15

    One of the major questions in complex network research is to identify the range of mechanisms by which a complex network can self-organize into a scale-free state. In this paper we investigate the interplay between a fitness linking mechanism and both random and preferential attachment. In our models, each vertex is assigned a fitness x, drawn from a probability distribution ρ(x). In Model A, at each time step a vertex is added and joined to an existing vertex, selected at random, with probability p, and an edge is introduced between vertices with fitnesses x and y, with a rate f(x,y), with probability 1-p. Model B differs from Model A in that, with probability p, edges are added with preferential attachment rather than randomly. The analysis of Model A shows that, for every fixed fitness x, the network's degree distribution decays exponentially. In Model B we recover instead a power-law degree distribution whose exponent depends only on p, and we show how this result can be generalized. The properties of a number of particular networks are examined.
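
    A short simulation makes Model A concrete. This is a hedged sketch of one reading of the rules (uniform fitness distribution ρ on [0, 1], linking rate f(x, y) = xy treated as an acceptance probability), not the authors' implementation:

```python
import random

def grow_model_a(steps, p, seed=0):
    """Sketch of Model A: with probability p a new vertex (fitness
    drawn uniformly on [0, 1]) attaches to a uniformly chosen existing
    vertex; otherwise an edge between a random pair of existing
    vertices is accepted with probability f(x, y) = x * y."""
    rng = random.Random(seed)
    fitness = [rng.random(), rng.random()]   # seed graph: two vertices
    degree = [1, 1]                          # joined by a single edge
    for _ in range(steps):
        if rng.random() < p:                 # random-attachment event
            target = rng.randrange(len(fitness))
            fitness.append(rng.random())
            degree.append(1)
            degree[target] += 1
        else:                                # fitness-linking event
            i, j = rng.sample(range(len(fitness)), 2)
            if rng.random() < fitness[i] * fitness[j]:
                degree[i] += 1
                degree[j] += 1
    return fitness, degree
```

    Binning the returned degrees by fitness would let one check the exponential decay the analysis predicts for each fixed fitness.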

  19. Properties of behavior under different random ratio and random interval schedules: A parametric study.

    PubMed

    Dembo, M; De Penfold, J B; Ruiz, R; Casalta, H

    1985-03-01

    Four pigeons were trained to peck a key under different values of a temporally defined independent variable (T) and different probabilities of reinforcement (p). Parameter T is a fixed repeating time cycle and p the probability of reinforcement for the first response of each cycle T. Two dependent variables were used: mean response rate and mean postreinforcement pause. For all values of p a critical value for the independent variable T was found (T = 1 sec) at which marked changes took place in response rate and postreinforcement pauses. Behavior typical of random ratio schedules was obtained at T < 1 sec and behavior typical of random interval schedules at T > 1 sec. Copyright © 1985. Published by Elsevier B.V.

  20. A Statistical Framework for Microbial Source Attribution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Velsko, S P; Allen, J E; Cunningham, C T

    2009-04-28

    This report presents a general approach to inferring transmission and source relationships among microbial isolates from their genetic sequences. The outbreak transmission graph (also called the transmission tree or transmission network) is the fundamental structure which determines the statistical distributions relevant to source attribution. The nodes of this graph are infected individuals or aggregated sub-populations of individuals in which transmitted bacteria or viruses undergo clonal expansion, leading to a genetically heterogeneous population. Each edge of the graph represents a transmission event in which one or a small number of bacteria or virions infects another node, thus increasing the size of the transmission network. Recombination and re-assortment events originate in nodes which are common to two distinct networks. In order to calculate the probability that one node was infected by another, given the observed genetic sequences of microbial isolates sampled from them, we require two fundamental probability distributions. The first is the probability of obtaining the observed mutational differences between two isolates given that they are separated by M steps in a transmission network. The second is the probability that two nodes sampled randomly from an outbreak transmission network are separated by M transmission events. We show how these distributions can be obtained from the genetic sequences of isolates obtained by sampling from past outbreaks combined with data from contact tracing studies. Realistic examples are drawn from the SARS outbreak of 2003, the FMDV outbreak in Great Britain in 2001, and HIV transmission cases. The likelihood estimators derived in this report, and the underlying probability distribution functions required to calculate them, possess certain compelling general properties in the context of microbial forensics. 
These include the ability to quantify the significance of a sequence 'match' or 'mismatch' between two isolates; the ability to capture non-intuitive effects of network structure on inferential power, including the 'small world' effect; the insensitivity of inferences to uncertainties in the underlying distributions; and the concept of rescaling, i.e. the ability to collapse sub-networks into single nodes and examine transmission inferences on the rescaled network.

  1. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design

    PubMed Central

    Lu, Tsui-Shan; Longnecker, Matthew P.; Zhou, Haibo

    2016-01-01

    An outcome-dependent sampling (ODS) scheme is a cost-effective sampling scheme in which one observes the exposure with a probability that depends on the outcome. Well-known examples are the case-control design for a binary response, the case-cohort design for failure-time data, and the general ODS design for a continuous response. While substantial work has been done for the univariate response case, statistical inference and design for ODS with multivariate responses remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (Multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the Multivariate-ODS design is semiparametric, with all the underlying distributions of covariates modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and establish its asymptotic normality. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the Multivariate-ODS or the estimator from a simple random sample with the same sample size. The Multivariate-ODS design, together with the proposed estimator, provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of PCB exposure with hearing loss in children in the Collaborative Perinatal Study. PMID:27966260
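
    The core idea, observing the exposure with a probability that depends on the outcome, can be sketched in a few lines. A toy single-response illustration with hypothetical names, not the proposed Multivariate-ODS machinery; the inclusion probability is kept with each unit so a later analysis can reweight:

```python
import random

def ods_sample(population, cutoffs, probs, rng):
    """Toy outcome-dependent sampling: each (exposure, outcome) pair is
    retained with a probability chosen by the outcome stratum that the
    outcome y falls into.  cutoffs defines len(probs) strata; the
    inclusion probability is kept with each retained unit so that an
    inverse-probability-weighted analysis can correct the bias."""
    kept = []
    for x, y in population:
        stratum = sum(y > c for c in cutoffs)   # index of y's stratum
        if rng.random() < probs[stratum]:
            kept.append((x, y, probs[stratum]))
    return kept
```

    Oversampling the extreme-outcome strata (large probs at the tails) is what buys the efficiency gain over a simple random sample of the same size.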

  2. On the Prediction of Ground Motion

    NASA Astrophysics Data System (ADS)

    Lavallee, D.; Schmedes, J.; Archuleta, R. J.

    2012-12-01

    Using a slip-weakening dynamic model of rupture, we generated earthquake scenarios that provided the spatio-temporal evolution of the slip on the fault and the radiated field at the free surface. We observed scenarios where the rupture propagates at a supershear speed on some parts of the fault while remaining subshear for other parts of the fault. For some scenarios with nearly identical initial conditions, the rupture speed was always subshear. For both types of scenarios (mixture of supershear and subshear speeds and only subshear), we compute the peak ground accelerations (PGA) regularly distributed over the Earth's surface. We then calculate the probability density functions (PDF) of the PGA. For both types of scenarios, the PDF curves are asymmetrically shaped and asymptotically attenuated according to power law. This behavior of the PDF is similar to that observed for the PDF curves of PGA recorded during earthquakes. The main difference between scenarios with a supershear rupture speed and scenarios with only subshear rupture speed is the range of PGA values. Based on these results, we investigate three issues fundamental for the prediction of ground motion. It is important to recognize that recorded ground motions during an earthquake sample a small fraction of the radiation field. It is not obvious that such sampling will capture the largest ground motion generated during an earthquake, nor that the number of stations is large enough to properly infer the statistical properties associated with the radiation field. To quantify the effect of under (or low) sampling of the radiation field, we design three experiments. For a scenario where the rupture speed is only subshear, we construct multiple sets of observations. Each set is comprised of 100 randomly selected PGA values from all of the PGA's calculated at the Earth's surface. In the first experiment, we evaluate how the distributions of PGA in the sets compare with the distribution of all the PGA. 
    For this experiment, we used different statistical tests (e.g. chi-square). This experiment quantifies the likelihood that a random set of PGA can be used to infer the statistical properties of all the PGA. In the second experiment, we fit the PDF of the PGA of every set with probability laws used in the literature to describe the PDF of recorded PGA: the lognormal law, the generalized maximum extreme value law, and the Levy law. For each set, the probability laws are then used to compute the probability of observing a PGA value that will cause "moderate to heavy" potential damage according to the Instrumental Intensity scale developed by the USGS. For each probability law, we compare predictions based on the set with the prediction estimated from all the PGA. This experiment quantifies the reliability and uncertainty in predicting an outcome due to under sampling the radiation field. The third experiment uses the same sets and repeats the two investigations, this time comparing with a scenario where the rupture has a supershear speed over part of the fault. The objective here is to assess the additional uncertainty in predicting PGA and damage resulting from ruptures that have supershear speeds.

  3. Average fidelity between random quantum states

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Zyczkowski, Karol

    2005-03-01

    We analyze mean fidelity between random density matrices of size N, generated with respect to various probability measures in the space of mixed quantum states: the Hilbert-Schmidt measure, the Bures (statistical) measure, the measure induced by the partial trace, and the natural measure on the space of pure states. In certain cases explicit probability distributions for the fidelity are derived. The results obtained may be used to gauge the quality of quantum-information-processing schemes.
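
    Although the paper treats mixed states under several measures, the simplest member of the family is easy to check numerically: for two independent Haar-random pure states of dimension N, the mean fidelity |⟨ψ|φ⟩|² is 1/N. A standard-library Monte Carlo sketch (function names are ours):

```python
import math
import random

def random_pure_state(n, rng):
    # Haar-random pure state: a normalized vector of complex Gaussians
    v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    norm = math.sqrt(sum(abs(z) ** 2 for z in v))
    return [z / norm for z in v]

def mean_pure_state_fidelity(n, trials, seed=0):
    """Monte Carlo estimate of the mean fidelity |<psi|phi>|^2 between
    two independent Haar-random pure states of dimension n; the known
    analytic value is 1/n."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        psi = random_pure_state(n, rng)
        phi = random_pure_state(n, rng)
        overlap = sum(a.conjugate() * b for a, b in zip(psi, phi))
        total += abs(overlap) ** 2
    return total / trials
```

    Replacing the pure states with partial traces of larger random pure states would probe the induced-measure mixed-state case the paper analyzes, at the cost of a matrix-square-root fidelity computation.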

  4. Probability of failure prediction for step-stress fatigue under sine or random stress

    NASA Technical Reports Server (NTRS)

    Lambert, R. G.

    1979-01-01

    A previously proposed cumulative fatigue damage law is extended to predict the probability of failure or fatigue life for structural materials with S-N fatigue curves represented as a scatterband of failure points. The proposed law applies to structures subjected to sinusoidal or random stresses and includes the effect of initial crack (i.e., flaw) sizes. The corrected cycle ratio damage function is shown to have physical significance.

  5. The difference between two random mixed quantum states: exact and asymptotic spectral analysis

    NASA Astrophysics Data System (ADS)

    Mejía, José; Zapata, Camilo; Botero, Alonso

    2017-01-01

    We investigate the spectral statistics of the difference of two density matrices, each of which is independently obtained by partially tracing a random bipartite pure quantum state. We first show how a closed-form expression for the exact joint eigenvalue probability density function for arbitrary dimensions can be obtained from the joint probability density function of the diagonal elements of the difference matrix, which is straightforward to compute. Subsequently, we use standard results from free probability theory to derive a relatively simple analytic expression for the asymptotic eigenvalue density (AED) of the difference matrix ensemble, and using Carlson’s theorem, we obtain an expression for its absolute moments. These results allow us to quantify the typical asymptotic distance between the two random mixed states using various distance measures; in particular, we obtain the almost sure asymptotic behavior of the operator norm distance and the trace distance.

  6. A Random Variable Transformation Process.

    ERIC Educational Resources Information Center

    Scheuermann, Larry

    1989-01-01

    Provides a short BASIC program, RANVAR, which generates random variates for various theoretical probability distributions. The seven variates include: uniform, exponential, normal, binomial, Poisson, Pascal, and triangular. (MVL)
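
    In the same spirit as RANVAR, two of the listed variates can be generated in a few lines of modern code. A sketch (not the original BASIC program) using the inverse-transform method for the exponential variate and Knuth's multiplication method for the Poisson:

```python
import math
import random

def exponential_variate(lam, rng):
    # inverse transform: if U ~ Uniform(0, 1), then
    # -ln(1 - U) / lam has an Exponential(lam) distribution
    return -math.log(1.0 - rng.random()) / lam

def poisson_variate(lam, rng):
    # Knuth's multiplication method: count how many uniforms can be
    # multiplied together before the product drops below exp(-lam)
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k
```

    Both generators take an explicit random.Random instance, so runs are reproducible under a fixed seed.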

  7. Simulations of Probabilities for Quantum Computing

    NASA Technical Reports Server (NTRS)

    Zak, M.

    1996-01-01

    It has been demonstrated that classical probabilities, and in particular, a probabilistic Turing machine, can be simulated by combining chaos and non-Lipschitz dynamics, without utilization of any man-made devices (such as random number generators). Self-organizing properties of systems coupling simulated and calculated probabilities and their link to quantum computations are discussed.

  8. Visualizing and Understanding Probability and Statistics: Graphical Simulations Using Excel

    ERIC Educational Resources Information Center

    Gordon, Sheldon P.; Gordon, Florence S.

    2009-01-01

    The authors describe a collection of dynamic interactive simulations for teaching and learning most of the important ideas and techniques of introductory statistics and probability. The modules cover such topics as randomness, simulations of probability experiments such as coin flipping, dice rolling and general binomial experiments, a simulation…

  9. Temporal fluctuation of the lead level in the cord blood of neonates in Taipei.

    PubMed

    Hwang, Y H; Wang, J D

    1990-01-01

    From August 1985 to September 1987, 9,502 cord blood samples were obtained from the Taipei Municipal Maternal and Child Hospital. A total of 205 cord blood samples chosen randomly from newborns without parental exposure to lead were analyzed by flameless atomic absorption spectrophotometry. The average blood lead level was 0.36 ± 0.11 μmol/l (7.48 ± 2.25 μg/dl). A similar analysis was performed on samples obtained from 160 newborns whose fathers had occupational lead exposure. In both groups, the average concentration of lead in cord blood in the summer was statistically greater than that in the winter. Air lead and total amount of lead in gasoline consumed in Taipei appeared to be associated with this seasonal fluctuation in the average lead level of cord blood. After considering alternative sources, we conclude that the seasonal fluctuation of cord blood lead is probably influenced by air lead produced from the combustion of gasoline.

  10. Random Evolutionary Dynamics Driven by Fitness and House-of-Cards Mutations: Sampling Formulae

    NASA Astrophysics Data System (ADS)

    Huillet, Thierry E.

    2017-07-01

    We first revisit the multi-allelic mutation-fitness balance problem, especially when mutations obey a house-of-cards condition, where the discrete-time deterministic evolutionary dynamics of the allelic frequencies derives from a Shahshahani potential. We then consider multi-allelic Wright-Fisher stochastic models whose deviation from neutrality derives from the Shahshahani mutation/selection potential. We next focus on the weak selection, weak mutation cases and, making use of a Gamma calculus, we compute the normalizing partition functions of the invariant probability densities appearing in their Wright-Fisher diffusive approximations. Using these results, generalized Ewens sampling formulae (ESF) from the equilibrium distributions are derived. We first treat the ESF in the mixed mutation/selection potential case and then restrict ourselves to the ESF in the simpler situation with house-of-cards mutations only. We also address some issues concerning sampling problems from infinitely-many-alleles weak limits.

  11. Efficiency of exchange schemes in replica exchange

    NASA Astrophysics Data System (ADS)

    Lingenheil, Martin; Denschlag, Robert; Mathias, Gerald; Tavan, Paul

    2009-08-01

    In replica exchange simulations a fast diffusion of the replicas through the temperature space maximizes the efficiency of the statistical sampling. Here, we compare the diffusion speed as measured by the round trip rates for four exchange algorithms. We find different efficiency profiles with optimal average acceptance probabilities ranging from 8% to 41%. The best performance is determined by benchmark simulations for the most widely used algorithm, which alternately tries to exchange all even and all odd replica pairs. By analytical mathematics we show that the excellent performance of this exchange scheme is due to the high diffusivity of the underlying random walk.
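
    The alternating even-odd exchange scheme the authors benchmark can be sketched as follows. The temperature ladder, the toy energy model, and the Metropolis acceptance rule below are illustrative assumptions, not the paper's benchmark systems.

    ```python
    import math
    import random

    def exchange_round(energies, betas, rng, parity):
        """One sweep of the alternating even-odd exchange scheme: attempt
        to swap neighboring replicas (i, i+1) for i of the given parity,
        accepting with the Metropolis probability
        min(1, exp((beta_i - beta_{i+1}) * (E_i - E_{i+1})))."""
        n_accepted = 0
        for i in range(parity, len(betas) - 1, 2):
            delta = (betas[i] - betas[i + 1]) * (energies[i] - energies[i + 1])
            if delta >= 0 or rng.random() < math.exp(delta):
                # Swap configurations (represented here only by their energies).
                energies[i], energies[i + 1] = energies[i + 1], energies[i]
                n_accepted += 1
        return n_accepted

    rng = random.Random(1)
    betas = [1.0, 1.0 / 1.2, 1.0 / 1.44, 1.0 / 1.73]     # inverse temperatures
    energies = [rng.gauss(1.0 / b, 1.0) for b in betas]  # toy replica energies
    accepted = exchange_round(energies, betas, rng, parity=0)  # even pairs
    ```

    Alternating `parity` between 0 and 1 on successive sweeps reproduces the even-odd scheme; tracking how long each replica takes to travel from the lowest to the highest temperature and back gives the round trip rate used as the efficiency measure.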

  12. The competing risks Cox model with auxiliary case covariates under weaker missing-at-random cause of failure.

    PubMed

    Nevo, Daniel; Nishihara, Reiko; Ogino, Shuji; Wang, Molin

    2017-08-04

    In the analysis of time-to-event data with multiple causes using a competing risks Cox model, often the cause of failure is unknown for some of the cases. The probability of a missing cause is typically assumed to be independent of the cause given the time of the event and covariates measured before the event occurred. In practice, however, the underlying missing-at-random assumption does not necessarily hold. Motivated by colorectal cancer molecular pathological epidemiology analysis, we develop a method to conduct valid analysis when additional auxiliary variables are available for cases only. We consider a weaker missing-at-random assumption, with missing pattern depending on the observed quantities, which include the auxiliary covariates. We use an informative likelihood approach that will yield consistent estimates even when the underlying model for missing cause of failure is misspecified. The superiority of our method over naive methods in finite samples is demonstrated by simulation study results. We illustrate the use of our method in an analysis of colorectal cancer data from the Nurses' Health Study cohort, where, apparently, the traditional missing-at-random assumption fails to hold.

  13. Enumerative and binomial sequential sampling plans for the multicolored Asian lady beetle (Coleoptera: Coccinellidae) in wine grapes.

    PubMed

    Galvan, T L; Burkness, E C; Hutchison, W D

    2007-06-01

    To develop a practical integrated pest management (IPM) system for the multicolored Asian lady beetle, Harmonia axyridis (Pallas) (Coleoptera: Coccinellidae), in wine grapes, we assessed the spatial distribution of H. axyridis and developed eight sampling plans to estimate adult density or infestation level in grape clusters. We used 49 data sets collected from commercial vineyards in 2004 and 2005, in Minnesota and Wisconsin. Enumerative plans were developed using two precision levels (0.10 and 0.25); the six binomial plans reflected six unique action thresholds (3, 7, 12, 18, 22, and 31% of cluster samples infested with at least one H. axyridis). The spatial distribution of H. axyridis in wine grapes was aggregated, independent of cultivar and year, but it was more randomly distributed as mean density declined. The average sample number (ASN) for each sampling plan was determined using resampling software. For research purposes, an enumerative plan with a precision level of 0.10 (SE/X) resulted in a mean ASN of 546 clusters. For IPM applications, the enumerative plan with a precision level of 0.25 resulted in a mean ASN of 180 clusters. In contrast, the binomial plans resulted in much lower ASNs and provided high probabilities of arriving at correct "treat or no-treat" decisions, making these plans more efficient for IPM applications. For a tally threshold of one adult per cluster, the operating characteristic curves for the six action thresholds provided binomial sequential sampling plans with mean ASNs of only 19-26 clusters, and probabilities of making correct decisions between 83 and 96%. The benefits of the binomial sampling plans are discussed within the context of improving IPM programs for wine grapes.
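
    Binomial sequential classification of this kind is often built on Wald's sequential probability ratio test. The sketch below is a generic SPRT with hypothetical thresholds, not the authors' resampling-validated plans, but it illustrates how a "treat / no-treat" decision can be reached after sampling relatively few clusters.

    ```python
    import math

    def wald_sprt(samples, p0, p1, alpha=0.1, beta=0.1):
        """Wald's sequential probability ratio test on 0/1 observations
        (cluster infested or not): stop with 'treat' once the infestation
        proportion looks >= p1, 'no-treat' once it looks <= p0."""
        upper = math.log((1 - beta) / alpha)   # treat boundary
        lower = math.log(beta / (1 - alpha))   # no-treat boundary
        llr, n = 0.0, 0
        for x in samples:
            n += 1
            llr += x * math.log(p1 / p0) + (1 - x) * math.log((1 - p1) / (1 - p0))
            if llr >= upper:
                return "treat", n
            if llr <= lower:
                return "no-treat", n
        return "undecided", n

    # A run of uninfested clusters reaches a no-treat decision early.
    decision, n_used = wald_sprt([0] * 30, p0=0.07, p1=0.20)
    ```

    The stopping sample size here plays the role of the ASN: the wider the gap between `p0` and `p1`, the fewer clusters are needed before a boundary is crossed.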

  14. Students' Misconceptions about Random Variables

    ERIC Educational Resources Information Center

    Kachapova, Farida; Kachapov, Ilias

    2012-01-01

    This article describes some misconceptions about random variables and related counter-examples, and makes suggestions about teaching initial topics on random variables in general form instead of doing it separately for discrete and continuous cases. The focus is on post-calculus probability courses. (Contains 2 figures.)

  15. Almost all quantum channels are equidistant

    NASA Astrophysics Data System (ADS)

    Nechita, Ion; Puchała, Zbigniew; Pawela, Łukasz; Życzkowski, Karol

    2018-05-01

    In this work, we analyze properties of generic quantum channels in the case of large system size. We use random matrix theory and free probability to show that the distance between two independent random channels converges to a constant value as the dimension of the system grows larger. As a measure of the distance we use the diamond norm. In the case of a flat Hilbert-Schmidt distribution on quantum channels, we obtain that the distance converges to 1/2 + 2/π, giving also an estimate for the maximum success probability for distinguishing the channels. We also consider the problem of distinguishing two random unitary rotations.

  16. Short assessment of the Big Five: robust across survey methods except telephone interviewing.

    PubMed

    Lang, Frieder R; John, Dennis; Lüdtke, Oliver; Schupp, Jürgen; Wagner, Gert G

    2011-06-01

    We examined measurement invariance and age-related robustness of a short 15-item Big Five Inventory (BFI-S) of personality dimensions, which is well suited for applications in large-scale multidisciplinary surveys. The BFI-S was assessed in three different interviewing conditions: computer-assisted or paper-assisted face-to-face interviewing, computer-assisted telephone interviewing, and a self-administered questionnaire. Randomized probability samples from a large-scale German panel survey and a related probability telephone study were used in order to test method effects on self-report measures of personality characteristics across early, middle, and late adulthood. Exploratory structural equation modeling was used in order to test for measurement invariance of the five-factor model of personality trait domains across different assessment methods. For the short inventory, findings suggest strong robustness of self-report measures of personality dimensions among young and middle-aged adults. In old age, telephone interviewing was associated with greater distortions in reliable personality assessment. It is concluded that the greater mental workload of telephone interviewing limits the reliability of self-report personality assessment. Face-to-face surveys and self-administered questionnaire completion are clearly better suited than phone surveys when personality traits in age-heterogeneous samples are assessed.

  17. Empirical likelihood method for non-ignorable missing data problems.

    PubMed

    Guan, Zhong; Qin, Jing

    2017-01-01

    The missing response problem is ubiquitous in survey sampling, medical, social science and epidemiology studies. It is well known that non-ignorable missingness is the most difficult missing data problem, where the missingness of a response depends on its own value. In the statistical literature, unlike the ignorable missing data problem, not many papers on non-ignorable missing data are available except for the fully parametric model based approach. In this paper we study a semiparametric model for non-ignorable missing data in which the missing probability is known up to some parameters, but the underlying distributions are not specified. By employing Owen's (1988) empirical likelihood method we can obtain the constrained maximum empirical likelihood estimators of the parameters in the missing probability and the mean response, which are shown to be asymptotically normal. Moreover, the likelihood ratio statistic can be used to test whether the missingness of the responses is non-ignorable or completely at random. The theoretical results are confirmed by a simulation study. As an illustration, the analysis of a real AIDS trial data set shows that the missingness of CD4 counts around two years is non-ignorable and the sample mean based on observed data only is biased.

  18. Prevalence of isolated non-albumin proteinuria in the US population tested for both, urine total protein and urine albumin: An unexpected discovery.

    PubMed

    Katayev, Alexander; Zebelman, Arthur M; Sharp, Thomas M; Flynn, Samantha; Bernstein, Richard K

    2017-04-01

    Isolated non-albumin proteinuria (NAP) is a condition in which urine total protein concentrations are elevated without elevation of urine albumin. The prevalence of NAP in the US population tested for both urine total protein and albumin was assessed in this study. The database of a US nationwide laboratory network was queried for test results when random urine albumin was ordered together with urine total protein and also when timed 24-hour urine albumin was ordered together with urine total protein. The total prevalence of NAP in the US population tested for both urine total protein and albumin was calculated for patient groups having normal and low-normal urine albumin (random and timed) with elevated and severely increased urine total protein (random and timed). Also, the prevalence of NAP was calculated for patients with normal urine albumin to assess the probability of missing proteinuria if only urine albumin is measured. The prevalence of NAP in the random samples group was 10.1% (15.2% for females and 4.7% for males). Among patients with normal random albumin, 20.0% (27.3% of females and 10.7% of males) had NAP. The prevalence of NAP in the timed samples group was 24.6% (29.8% for females and 18.5% for males). Among patients with normal timed urine albumin, 36.2% (40.0% of females and 30.8% of males) had NAP. There was a strong positive association between female gender and NAP in most patient groups. Testing for only urine (micro)albumin can miss up to 40% of females and 30.8% of males with gross proteinuria. Copyright © 2016 The Canadian Society of Clinical Chemists. Published by Elsevier Inc. All rights reserved.

  19. Probability distribution of the entanglement across a cut at an infinite-randomness fixed point

    NASA Astrophysics Data System (ADS)

    Devakul, Trithep; Majumdar, Satya N.; Huse, David A.

    2017-03-01

    We calculate the probability distribution of entanglement entropy S across a cut of a finite one-dimensional spin chain of length L at an infinite-randomness fixed point using Fisher's strong randomness renormalization group (RG). Using the random transverse-field Ising model as an example, the distribution is shown to take the form p(S|L) ∼ L^(−ψ(k)), where k ≡ S/ln[L/L0], the large deviation function ψ(k) is found explicitly, and L0 is a nonuniversal microscopic length. We discuss the implications of such a distribution on numerical techniques that rely on entanglement, such as matrix-product-state-based techniques. Our results are verified with numerical RG simulations, as well as the actual entanglement entropy distribution for the random transverse-field Ising model, which we calculate for large L via a mapping to Majorana fermions.

  20. A scaling law for random walks on networks

    PubMed Central

    Perkins, Theodore J.; Foxall, Eric; Glass, Leon; Edwards, Roderick

    2014-01-01

    The dynamics of many natural and artificial systems are well described as random walks on a network: the stochastic behaviour of molecules, traffic patterns on the internet, fluctuations in stock prices and so on. The vast literature on random walks provides many tools for computing properties such as steady-state probabilities or expected hitting times. Previously, however, there has been no general theory describing the distribution of possible paths followed by a random walk. Here, we show that for any random walk on a finite network, there are precisely three mutually exclusive possibilities for the form of the path distribution: finite, stretched exponential and power law. The form of the distribution depends only on the structure of the network, while the stepping probabilities control the parameters of the distribution. We use our theory to explain path distributions in domains such as sports, music, nonlinear dynamics and stochastic chemical kinetics. PMID:25311870
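
    Steady-state probabilities, one of the standard random-walk quantities the abstract mentions, can be computed by simple power iteration. The three-node lazy walk below is an illustrative toy network, unrelated to the paper's data.

    ```python
    def stationary_distribution(P, n_iter=1000, tol=1e-12):
        """Power iteration for the stationary distribution pi of a
        row-stochastic transition matrix P, i.e. pi = pi P."""
        n = len(P)
        pi = [1.0 / n] * n
        for _ in range(n_iter):
            new = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
            if max(abs(a - b) for a, b in zip(pi, new)) < tol:
                return new
            pi = new
        return pi

    # Lazy random walk on a 3-node path graph (self-loops avoid periodicity);
    # the stationary probability of each node is proportional to its degree.
    P = [[0.50, 0.50, 0.00],
         [0.25, 0.50, 0.25],
         [0.00, 0.50, 0.50]]
    pi = stationary_distribution(P)
    ```

    The paper's contribution concerns a different quantity, the distribution over entire paths, but the same transition matrix is the input in both cases: the network structure fixes the form of the path distribution, while the stepping probabilities fix its parameters.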

  1. A scaling law for random walks on networks

    NASA Astrophysics Data System (ADS)

    Perkins, Theodore J.; Foxall, Eric; Glass, Leon; Edwards, Roderick

    2014-10-01

    The dynamics of many natural and artificial systems are well described as random walks on a network: the stochastic behaviour of molecules, traffic patterns on the internet, fluctuations in stock prices and so on. The vast literature on random walks provides many tools for computing properties such as steady-state probabilities or expected hitting times. Previously, however, there has been no general theory describing the distribution of possible paths followed by a random walk. Here, we show that for any random walk on a finite network, there are precisely three mutually exclusive possibilities for the form of the path distribution: finite, stretched exponential and power law. The form of the distribution depends only on the structure of the network, while the stepping probabilities control the parameters of the distribution. We use our theory to explain path distributions in domains such as sports, music, nonlinear dynamics and stochastic chemical kinetics.

  2. A scaling law for random walks on networks.

    PubMed

    Perkins, Theodore J; Foxall, Eric; Glass, Leon; Edwards, Roderick

    2014-10-14

    The dynamics of many natural and artificial systems are well described as random walks on a network: the stochastic behaviour of molecules, traffic patterns on the internet, fluctuations in stock prices and so on. The vast literature on random walks provides many tools for computing properties such as steady-state probabilities or expected hitting times. Previously, however, there has been no general theory describing the distribution of possible paths followed by a random walk. Here, we show that for any random walk on a finite network, there are precisely three mutually exclusive possibilities for the form of the path distribution: finite, stretched exponential and power law. The form of the distribution depends only on the structure of the network, while the stepping probabilities control the parameters of the distribution. We use our theory to explain path distributions in domains such as sports, music, nonlinear dynamics and stochastic chemical kinetics.

  3. Comparison of genetic algorithm methods for fuel management optimization

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    DeChaine, M.D.; Feltus, M.A.

    1995-12-31

    The CIGARO system was developed for genetic algorithm fuel management optimization. Tests are performed to find the best fuel location swap mutation operator probability and to compare genetic algorithm to a truly random search method. Tests showed the fuel swap probability should be between 0% and 10%, and a 50% definitely hampered the optimization. The genetic algorithm performed significantly better than the random search method, which did not even satisfy the peak normalized power constraint.
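
    A fuel-location swap mutation operator of the kind whose probability the authors tune can be sketched as follows. This is a generic toy version with made-up assembly labels, not the CIGARO implementation.

    ```python
    import random

    def swap_mutation(layout, p_swap, rng):
        """With probability p_swap, exchange two randomly chosen fuel
        assembly positions in the core layout; otherwise return the
        layout unchanged."""
        layout = list(layout)
        if rng.random() < p_swap:
            i, j = rng.sample(range(len(layout)), 2)
            layout[i], layout[j] = layout[j], layout[i]
        return layout

    rng = random.Random(0)
    child = swap_mutation(["A1", "B2", "C3", "D4"], p_swap=0.05, rng=rng)
    ```

    A swap rate in the 0-10% range, as the tests above suggest, perturbs a candidate layout occasionally without destroying the good orderings that selection has already found; at 50% the operator degenerates toward random shuffling.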

  4. Continuous-time random-walk model for financial distributions

    NASA Astrophysics Data System (ADS)

    Masoliver, Jaume; Montero, Miquel; Weiss, George H.

    2003-02-01

    We apply the formalism of the continuous-time random walk to the study of financial data. The entire distribution of prices can be obtained once two auxiliary densities are known. These are the probability densities for the pausing time between successive jumps and the corresponding probability density for the magnitude of a jump. We have applied the formalism to data on the U.S. dollar deutsche mark future exchange, finding good agreement between theory and the observed data.
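
    A minimal simulation of such a continuous-time random walk, with illustrative exponential pausing times and Gaussian jump magnitudes standing in for the two densities the authors estimate from market data:

    ```python
    import random

    def ctrw_endpoint(t_max, mean_wait, jump_scale, rng):
        """One continuous-time random walk: draw an exponential pausing
        time before each jump and a Gaussian jump magnitude, and return
        the position reached by time t_max."""
        t, x = 0.0, 0.0
        while True:
            t += rng.expovariate(1.0 / mean_wait)
            if t > t_max:
                return x
            x += rng.gauss(0.0, jump_scale)

    rng = random.Random(42)
    endpoints = [ctrw_endpoint(t_max=100.0, mean_wait=1.0, jump_scale=0.5, rng=rng)
                 for _ in range(2000)]
    mean_end = sum(endpoints) / len(endpoints)  # near zero for symmetric jumps
    ```

    Replacing the two sampling distributions with empirical densities fitted to tick data is the step that connects this sketch to the price-distribution analysis in the paper.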

  5. Hoeffding Type Inequalities and their Applications in Statistics and Operations Research

    NASA Astrophysics Data System (ADS)

    Daras, Tryfon

    2007-09-01

    Large Deviation theory is the branch of Probability theory that deals with rare events. Sometimes, these events can be described by the sum of random variables that deviates from its mean more than a "normal" amount. A precise calculation of the probabilities of such events turns out to be crucial in a variety of different contexts (e.g. in Probability Theory, Statistics, Operations Research, Statistical Physics, Financial Mathematics etc.). Recent applications of the theory deal with random walks in random environments, interacting diffusions, heat conduction, polymer chains [1]. In this paper we prove an inequality of exponential type, namely theorem 2.1, which gives a large deviation upper bound for a specific sequence of r.v.s. Inequalities of this type have many applications in Combinatorics [2]. The inequality generalizes already proven results of this type in the case of symmetric probability measures. As consequences of the inequality we obtain: (a) large deviation upper bounds for exchangeable Bernoulli sequences of random variables, generalizing results proven for independent and identically distributed Bernoulli sequences of r.v.s., and (b) a general form of Bernstein's inequality. We compare the inequality with large deviation results already proven by the author and try to see its advantages. Finally, using the inequality, we solve one of the basic problems of Operations Research (the bin packing problem) in the case of exchangeable r.v.s.
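
    The flavor of such exponential bounds can be illustrated with the classical Hoeffding inequality for i.i.d. variables in [0, 1]; the simulation below checks the bound empirically for uniform variables (an illustrative i.i.d. special case, not the paper's exchangeable setting).

    ```python
    import math
    import random

    def hoeffding_bound(n, t):
        """Hoeffding's inequality for the mean of n i.i.d. variables in
        [0, 1]: P(mean - E[mean] >= t) <= exp(-2 * n * t**2)."""
        return math.exp(-2 * n * t * t)

    rng = random.Random(0)
    n, t, trials = 200, 0.1, 5000
    # Empirical frequency with which the sample mean exceeds 0.5 + t.
    exceed = sum(
        (sum(rng.random() for _ in range(n)) / n - 0.5) >= t
        for _ in range(trials)
    ) / trials
    bound = hoeffding_bound(n, t)  # exp(-4), about 0.018
    ```

    The empirical exceedance frequency sits well below the bound, as expected; exponential inequalities of this kind are deliberately conservative so that they hold uniformly over the whole class of distributions.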

  6. A quantum-like model of homeopathy clinical trials: importance of in situ randomization and unblinding.

    PubMed

    Beauvais, Francis

    2013-04-01

    The randomized controlled trial (RCT) is the 'gold standard' of modern clinical pharmacology. However, for many practitioners of homeopathy, blind RCTs are an inadequate research tool for testing complex therapies such as homeopathy. Classical probabilities used in biological sciences and in medicine are only a special case of the generalized theory of probability used in quantum physics. I describe homeopathy trials using a quantum-like statistical model, a model inspired by quantum physics and taking into consideration superposition of states, non-commuting observables, probability interferences, contextuality, etc. The negative effect of blinding on the success of homeopathy trials and the 'smearing effect' ('specific' effects of homeopathy medicine occurring in the placebo group) are described by quantum-like probabilities without supplementary ad hoc hypotheses. The difference in positive outcome rates between placebo and homeopathy groups frequently vanishes in centralized blind trials. The model proposed here suggests a way to circumvent such problems in masked homeopathy trials by incorporating in situ randomization/unblinding. In this quantum-like model of homeopathy clinical trials, success in an open-label setting and failure with centralized blind RCTs emerge logically from the formalism. This model suggests that significant differences between placebo and homeopathy in blind RCTs would be found more frequently if in situ randomization/unblinding were used. Copyright © 2013. Published by Elsevier Ltd.

  7. A comparison of numerical solutions of partial differential equations with probabilistic and possibilistic parameters for the quantification of uncertainty in subsurface solute transport.

    PubMed

    Zhang, Kejiang; Achari, Gopal; Li, Hua

    2009-11-03

    Traditionally, uncertainty in parameters is represented as a probabilistic distribution and incorporated into groundwater flow and contaminant transport models. With the advent of newer uncertainty theories, it is now understood that stochastic methods cannot properly represent non-random uncertainties. In the groundwater flow and contaminant transport equations, uncertainty in some parameters may be random, whereas that of others may be non-random. The objective of this paper is to develop a fuzzy-stochastic partial differential equation (FSPDE) model to simulate conditions where both random and non-random uncertainties are involved in groundwater flow and solute transport. Three potential solution techniques, namely (a) transforming a probability distribution to a possibility distribution (Method I), whereby a FSPDE becomes a fuzzy partial differential equation (FPDE); (b) transforming a possibility distribution to a probability distribution (Method II), whereby a FSPDE becomes a stochastic partial differential equation (SPDE); and (c) the combination of Monte Carlo methods and FPDE solution techniques (Method III), are proposed and compared. The effects of these three methods on the predictive results are investigated by using two case studies. The results show that the predictions obtained from Method II are a special case of those obtained from Method I. When an exact probabilistic result is needed, Method II is suggested. As the loss or gain of information during a probability-possibility (or vice versa) transformation cannot be quantified, its influence on the predictive results is not known. Thus, Method III should probably be preferred for risk assessments.

  8. On the error probability of general tree and trellis codes with applications to sequential decoding

    NASA Technical Reports Server (NTRS)

    Johannesson, R.

    1973-01-01

    An upper bound on the average error probability for maximum-likelihood decoding of the ensemble of random binary tree codes is derived and shown to be independent of the length of the tree. An upper bound on the average error probability for maximum-likelihood decoding of the ensemble of random L-branch binary trellis codes of rate R = 1/n is derived which separates the effects of the tail length T and the memory length M of the code. It is shown that the bound is independent of the length L of the information sequence. This implication is investigated by computer simulations of sequential decoding utilizing the stack algorithm. These simulations confirm the implication and further suggest an empirical formula for the true undetected decoding error probability with sequential decoding.

  9. A Mainly Circum-Mediterranean Origin for West Eurasian and North African mtDNAs in Puerto Rico with Strong Contributions from the Canary Islands and West Africa.

    PubMed

    Díaz-Zabala, Héctor J; Nieves-Colón, María A; Martínez-Cruzado, Juan C

    2017-04-01

    Maternal lineages of West Eurasian and North African origin account for 11.5% of total mitochondrial ancestry in Puerto Rico. Historical sources suggest that this ancestry arrived mostly from European migrations that took place during the four centuries of the Spanish colonization of Puerto Rico. This study analyzed 101 mitochondrial control region sequences and diagnostic coding region variants from a sample set randomly and systematically selected using a census-based sampling frame to be representative of the Puerto Rican population, with the goal of defining West Eurasian-North African maternal clades and estimating their possible geographical origin. Median-joining haplotype networks were constructed using hypervariable regions 1 and 2 sequences from various reference populations in search of shared haplotypes. A posterior probability analysis was performed to estimate the percentage of possible origins across wide geographic regions for the entire sample set and for the most common haplogroups on the island. Principal component analyses were conducted to place the Puerto Rican mtDNA set within the variation present among all reference populations. Our study shows that up to 38% of West Eurasian and North African mitochondrial ancestry in Puerto Rico most likely migrated from the Canary Islands. However, most of those haplotypes had previously migrated to the Canary Islands from elsewhere, and there are substantial contributions from various populations across the circum-Mediterranean region and from West African populations related to the modern Wolof and Serer peoples from Senegal and the nomad Fulani who extend up to Cameroon. In conclusion, the West Eurasian mitochondrial ancestry in Puerto Ricans is geographically diverse. However, haplotype diversity seems to be low, and frequencies have been shaped by population bottlenecks, migration waves, and random genetic drift. Consequently, approximately 47% of mtDNAs of West Eurasian and North African ancestry in Puerto Rico probably arrived early in its colonial history.

  10. On the synchronizability and detectability of random PPM sequences

    NASA Technical Reports Server (NTRS)

    Georghiades, Costas N.; Lin, Shu

    1987-01-01

    The problem of synchronization and detection of random pulse-position-modulation (PPM) sequences is investigated under the assumption of perfect slot synchronization. Maximum-likelihood PPM symbol synchronization and receiver algorithms are derived that make decisions based on both soft and hard data; these algorithms are seen to be easily implementable. Bounds derived on the symbol error probability as well as the probability of false synchronization indicate the existence of a rather severe performance floor, which can easily be the limiting factor in the overall system performance. The performance floor is inherent in the PPM format and random data and becomes more serious as the PPM alphabet size Q is increased. A way to eliminate the performance floor is suggested by inserting special PPM symbols into the random data stream.

  11. On the synchronizability and detectability of random PPM sequences

    NASA Technical Reports Server (NTRS)

    Georghiades, Costas N.

    1987-01-01

    The problem of synchronization and detection of random pulse-position-modulation (PPM) sequences is investigated under the assumption of perfect slot synchronization. Maximum-likelihood PPM symbol synchronization and receiver algorithms are derived that make decisions based on both soft and hard data; these algorithms are seen to be easily implementable. Bounds were derived on the symbol error probability as well as the probability of false synchronization that indicate the existence of a rather severe performance floor, which can easily be the limiting factor in the overall system performance. The performance floor is inherent in the PPM format and random data and becomes more serious as the PPM alphabet size Q is increased. A way to eliminate the performance floor is suggested by inserting special PPM symbols into the random data stream.

  12. An Asymptotically-Optimal Sampling-Based Algorithm for Bi-directional Motion Planning

    PubMed Central

    Starek, Joseph A.; Gomez, Javier V.; Schmerling, Edward; Janson, Lucas; Moreno, Luis; Pavone, Marco

    2015-01-01

    Bi-directional search is a widely used strategy to increase the success and convergence rates of sampling-based motion planning algorithms. Yet, few results are available that merge both bi-directional search and asymptotic optimality into existing optimal planners, such as PRM*, RRT*, and FMT*. The objective of this paper is to fill this gap. Specifically, this paper presents a bi-directional, sampling-based, asymptotically-optimal algorithm named Bi-directional FMT* (BFMT*) that extends the Fast Marching Tree (FMT*) algorithm to bidirectional search while preserving its key properties, chiefly lazy search and asymptotic optimality through convergence in probability. BFMT* performs a two-source, lazy dynamic programming recursion over a set of randomly-drawn samples, correspondingly generating two search trees: one in cost-to-come space from the initial configuration and another in cost-to-go space from the goal configuration. Numerical experiments illustrate the advantages of BFMT* over its unidirectional counterpart, as well as a number of other state-of-the-art planners. PMID:27004130

  13. Ultrasensitive Detection of Shigella Species in Blood and Stool.

    PubMed

    Luo, Jieling; Wang, Jiapeng; Mathew, Anup S; Yau, Siu-Tung

    2016-02-16

    A modified immunosensing system with voltage-controlled signal amplification was used to detect Shigella in stool and blood matrixes at the single-digit CFU level. Inactivated Shigella was spiked into these matrixes and detected directly. The detection was completed in 78 min. Detection limits of 21 CFU/mL and 18 CFU/mL were achieved in stool and blood, respectively, corresponding to 2-7 CFUs immobilized on the detecting electrode. The outcomes of detection in blood samples with extremely low bacterial concentrations, i.e., below 100 CFU/mL, show a random nature. An analysis of the detection probabilities indicates the correlation between the sample volume and the success of detection and suggests that sample volume is critical for ultrasensitive detection of bacteria. The calculated detection limit is qualitatively in agreement with the empirically determined detection limit. The demonstrated ultrasensitive detection of Shigella at the single-digit CFU level suggests the feasibility of direct detection of the bacterium in such samples without performing a culture.
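
    The stated link between sample volume and detection success follows from a Poisson model of how organisms land in an aliquot. The sketch below is a standard textbook calculation, not the authors' analysis; the concentrations and volumes are illustrative.

    ```python
    import math

    def detection_probability(concentration_cfu_per_ml, volume_ml):
        """Probability that a sample of the given volume contains at least
        one organism, assuming organisms are Poisson-distributed in the
        specimen: P(detect) = 1 - exp(-c * V)."""
        return 1.0 - math.exp(-concentration_cfu_per_ml * volume_ml)

    # At 5 CFU/mL, a 0.1 mL aliquot often contains no organism at all,
    # while a 1 mL aliquot almost always captures at least one.
    p_small = detection_probability(5, 0.1)  # 1 - exp(-0.5), about 0.39
    p_large = detection_probability(5, 1.0)  # 1 - exp(-5),   about 0.99
    ```

    At single-digit CFU/mL concentrations, even a perfect sensor therefore produces apparently random outcomes unless the sampled volume is large enough that the expected count c·V comfortably exceeds one.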

  14. Remanent magnetization of lunar samples.

    NASA Technical Reports Server (NTRS)

    Strangway, D. W.; Pearce, G. W.; Gose, W. A.; Timme, R. W.

    1971-01-01

    The remanent magnetization of samples returned from the moon by the Apollo 11 and 12 missions consists, in most cases, of two distinct components. An unstable component is readily removed upon alternating field (AF) demagnetization in fields less than 100 Oe and is considered to be an isothermal remanence acquired during or after return to earth. The second component is unaltered by demagnetization in fields up to 400 Oe. It is probably a thermoremanent magnetization due to cooling from above 800 C in the presence of a field of a few thousand gammas. Chips from individual rocks have the same direction of magnetization after demagnetization, while the directions of different samples are random. This again demonstrates the high stability. Our data imply that the moon experienced a magnetic field that lasted at least from about 3.0 to 3.8 b.y., which is the age of Apollo 11 and 12 samples. One explanation of the origin of this field is that the moon had a liquid core and a self-exciting dynamo early in its history.

  15. On the Determinants of the Conjunction Fallacy: Probability versus Inductive Confirmation

    ERIC Educational Resources Information Center

    Tentori, Katya; Crupi, Vincenzo; Russo, Selena

    2013-01-01

    Major recent interpretations of the conjunction fallacy postulate that people assess the probability of a conjunction according to (non-normative) averaging rules as applied to the constituents' probabilities or represent the conjunction fallacy as an effect of random error in the judgment process. In the present contribution, we contrast such…

  16. Introducing Perception and Modelling of Spatial Randomness in Classroom

    ERIC Educational Resources Information Center

    De Nóbrega, José Renato

    2017-01-01

    A strategy to facilitate understanding of spatial randomness is described, using student activities developed in sequence: looking at spatial patterns, simulating approximate spatial randomness using a grid of equally-likely squares, using binomial probabilities for approximations and predictions and then comparing with given Poisson…

  17. Device-independent randomness generation from several Bell estimators

    NASA Astrophysics Data System (ADS)

    Nieto-Silleras, Olmo; Bamps, Cédric; Silman, Jonathan; Pironio, Stefano

    2018-02-01

    Device-independent randomness generation and quantum key distribution protocols rely on a fundamental relation between the non-locality of quantum theory and its random character. This relation is usually expressed in terms of a trade-off between the probability of guessing correctly the outcomes of measurements performed on quantum systems and the amount of violation of a given Bell inequality. However, a more accurate assessment of the randomness produced in Bell experiments can be obtained if the value of several Bell expressions is simultaneously taken into account, or if the full set of probabilities characterizing the behavior of the device is considered. We introduce protocols for device-independent randomness generation secure against classical side information, that rely on the estimation of an arbitrary number of Bell expressions or even directly on the experimental frequencies of measurement outcomes. Asymptotically, this results in an optimal generation of randomness from experimental data (as measured by the min-entropy), without having to assume beforehand that the devices violate a specific Bell inequality.

  18. On the joint spectral density of bivariate random sequences. Thesis Technical Report No. 21

    NASA Technical Reports Server (NTRS)

    Aalfs, David D.

    1995-01-01

    For univariate random sequences, the power spectral density acts like a probability density function of the frequencies present in the sequence. This dissertation extends that concept to bivariate random sequences. For this purpose, a function called the joint spectral density is defined that represents a joint probability weighing of the frequency content of pairs of random sequences. Given a pair of random sequences, the joint spectral density is not uniquely determined in the absence of any constraints. Two approaches to constraining the sequences are suggested: (1) assume the sequences are the margins of some stationary random field, (2) assume the sequences conform to a particular model that is linked to the joint spectral density. For both approaches, the properties of the resulting sequences are investigated in some detail, and simulation is used to corroborate theoretical results. It is concluded that under either of these two constraints, the joint spectral density can be computed from the non-stationary cross-correlation.

  19. Effect of separate sampling on classification accuracy.

    PubMed

    Shahrokh Esfahani, Mohammad; Dougherty, Edward R

    2014-01-15

    Measurements are commonly taken from two phenotypes to build a classifier, where the number of data points from each class is predetermined, not random. In this 'separate sampling' scenario, the data cannot be used to estimate the class prior probabilities. Moreover, predetermined class sizes can severely degrade classifier performance, even for large samples. We employ simulations using both synthetic and real data to show the detrimental effect of separate sampling on a variety of classification rules. We establish propositions quantifying the effect on the expected classifier error of a sampling ratio that differs from the population class ratio. From these we derive a sample-based minimax sampling ratio and provide an algorithm for approximating it from the data. We also extend to arbitrary distributions the classical population-based Anderson linear discriminant analysis minimax sampling ratio derived from the discriminant form of the Bayes classifier. All the codes for synthetic data and real data examples are written in MATLAB. A function called mmratio, whose output is an approximation of the minimax sampling ratio of a given dataset, is also written in MATLAB. All the codes are available at: http://gsp.tamu.edu/Publications/supplementary/shahrokh13b.

  20. Sampling methods to the statistical control of the production of blood components.

    PubMed

    Pereira, Paulo; Seghatchian, Jerard; Caldeira, Beatriz; Santos, Paula; Castro, Rosa; Fernandes, Teresa; Xavier, Sandra; de Sousa, Gracinda; de Almeida E Sousa, João Paulo

    2017-12-01

    The control of blood component specifications is a requirement generalized in Europe by the European Commission Directives and in the US by the AABB standards. The use of a statistical process control methodology is recommended in the related literature, including the EDQM guideline. The reliability of the control depends on the sampling. However, a correct sampling methodology does not seem to be systematically applied. Commonly, the sampling is intended solely to comply with the 1% specification for the produced blood components. Nevertheless, from a purely statistical viewpoint, this model is arguably not grounded in a consistent sampling technique. This could be a severe limitation in detecting abnormal patterns and in assuring that the production has a non-significant probability of producing nonconforming components. This article discusses what is happening in blood establishments. Three statistical methodologies are proposed: simple random sampling, sampling based on the proportion of a finite population, and sampling based on the inspection level. The empirical results demonstrate that these models are practicable in blood establishments, contributing to the robustness of sampling and of the related statistical process control decisions for the purpose for which they are suggested. Copyright © 2017 Elsevier Ltd. All rights reserved.
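    As a sketch of the second methodology (sampling based on the proportion of a finite population), the classical Cochran sample-size formula with a finite-population correction can be applied to a production lot; the lot size, margin, and confidence level below are illustrative, not the paper's:

```python
import math

def sample_size_proportion(N: int, p: float = 0.01, margin: float = 0.01,
                           z: float = 1.96) -> int:
    """Sample size to estimate a nonconforming proportion p within +/-margin
    at ~95% confidence, with finite-population correction for a lot of N units
    (Cochran's formula)."""
    n0 = z ** 2 * p * (1 - p) / margin ** 2   # infinite-population sample size
    n = n0 / (1 + (n0 - 1) / N)               # finite-population correction
    return math.ceil(n)

# Hypothetical monthly production of 3000 components and the 1% specification:
n = sample_size_proportion(3000)
```

    The correction matters for realistic lot sizes: the required sample shrinks as the lot shrinks, rather than staying fixed at the infinite-population value.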

  1. Gossip in Random Networks

    NASA Astrophysics Data System (ADS)

    Malarz, K.; Szvetelszky, Z.; Szekf, B.; Kulakowski, K.

    2006-11-01

    We consider the average probability X of being informed on a gossip in a given social network. The network is modeled within the random graph theory of Erdős and Rényi. In this theory, a network is characterized by two parameters: the size N and the link probability p. Our experimental data suggest three levels of social inclusion of friendship. The critical value p_c, for which half of the agents are informed, scales with the system size as N^(-γ) with γ ≈ 0.68. Computer simulations show that the probability X varies with p as a sigmoidal curve. The influence of correlations between neighbors is also evaluated: with increasing clustering coefficient C, X decreases.
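    As a sketch of the setup (our own minimal reading, not the paper's code): generate an Erdős-Rényi graph G(N, p) and let a gossip started at one agent eventually reach its whole connected component; sweeping p then traces out the sigmoidal X(p) curve.

```python
import random
from collections import deque

def informed_fraction(N: int, p: float, seed: int = 0) -> float:
    """Fraction of agents that eventually hear a gossip started at node 0,
    i.e. the relative size of node 0's component in an Erdos-Renyi G(N, p)."""
    rng = random.Random(seed)
    adj = [[] for _ in range(N)]
    for i in range(N):
        for j in range(i + 1, N):
            if rng.random() < p:          # each link present with probability p
                adj[i].append(j)
                adj[j].append(i)
    seen, queue = {0}, deque([0])         # BFS from the initial gossiper
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) / N

# Well above the connectivity threshold almost everyone is informed;
# far below it the gossip stays in a tiny component.
x_dense = informed_fraction(200, 0.05)
x_sparse = informed_fraction(200, 0.001)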

  2. Estimating Model Probabilities using Thermodynamic Markov Chain Monte Carlo Methods

    NASA Astrophysics Data System (ADS)

    Ye, M.; Liu, P.; Beerli, P.; Lu, D.; Hill, M. C.

    2014-12-01

    Markov chain Monte Carlo (MCMC) methods are widely used to evaluate model probability for quantifying model uncertainty. In a general procedure, MCMC simulations are first conducted for each individual model, and the MCMC parameter samples are then used to approximate the marginal likelihood of the model by calculating the geometric mean of the joint likelihood of the model and its parameters. It has been found that this geometric-mean approach suffers from the numerical problem of a low convergence rate. A simple test case shows that even millions of MCMC samples are insufficient to yield an accurate estimate of the marginal likelihood. To resolve this problem, a thermodynamic method is used that performs multiple MCMC runs with different values of a heating coefficient between zero and one. When the heating coefficient is zero, the MCMC run is equivalent to a random walk MC in the prior parameter space; when the heating coefficient is one, the MCMC run is the conventional one. For a simple case with an analytical form of the marginal likelihood, the thermodynamic method yields a more accurate estimate than the geometric-mean method. This is also demonstrated for a case of groundwater modeling in which four alternative models are postulated based on different conceptualizations of a confining layer. This groundwater example shows that model probabilities estimated using the thermodynamic method are more reasonable than those obtained using the geometric method. The thermodynamic method is general and can be used for a wide range of environmental problems for model uncertainty quantification.
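    A minimal sketch of thermodynamic integration on a toy conjugate-normal model, where every power posterior can be sampled exactly and the marginal likelihood has a closed form for comparison (our illustration; the paper's groundwater models are far more involved):

```python
import math, random

random.seed(1)
y = [0.8, 1.3, 0.2, 1.1, 0.6]                    # toy data: y_i ~ N(theta, 1)
n, s, s2 = len(y), sum(y), sum(v * v for v in y)

def log_lik(theta):
    """log p(y | theta) for the normal model."""
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * (s2 - 2 * theta * s + n * theta ** 2)

# Thermodynamic integration: log Z = integral over beta in [0, 1] of E_beta[log L],
# where E_beta averages over the power posterior ~ prior(theta) * L(theta)^beta.
# With a N(0, 1) prior the power posterior is N(beta*s/(1+beta*n), 1/(1+beta*n)),
# so each tempered chain can be sampled exactly here (no Metropolis needed).
betas = [k / 20 for k in range(21)]
means = []
for b in betas:
    prec = 1.0 + b * n
    mu, sd = b * s / prec, math.sqrt(1.0 / prec)
    draws = [random.gauss(mu, sd) for _ in range(5000)]
    means.append(sum(log_lik(t) for t in draws) / len(draws))
log_z_ti = sum((betas[i + 1] - betas[i]) * (means[i] + means[i + 1]) / 2
               for i in range(len(betas) - 1))   # trapezoid rule over beta

# Closed-form check: marginally y ~ N(0, I + 11^T), so the exact value is
log_z_exact = -0.5 * (n * math.log(2 * math.pi) + math.log(1 + n)
                      + (s2 - s * s / (1 + n)))
```

    The heating coefficient beta plays exactly the role described in the abstract: beta = 0 samples the prior, beta = 1 samples the conventional posterior.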

  3. 2011 Joplin, Missouri Tornado Experience, Mental Health Reactions, and Service Utilization: Cross-Sectional Assessments at Approximately 6 Months and 2.5 Years Post-Event.

    PubMed

    Houston, J Brian; Spialek, Matthew L; Stevens, Jordan; First, Jennifer; Mieseler, Vicky L; Pfefferbaum, Betty

    2015-10-26

    Introduction. On May 22, 2011 the deadliest tornado in the United States since 1947 struck Joplin, Missouri killing 161 people, injuring approximately 1,150 individuals, and causing approximately $2.8 billion in economic losses. Methods. This study examined the mental health effects of this event through a random digit dialing sample (N = 380) of Joplin adults at approximately 6 months post-disaster (Survey 1) and a purposive convenience sample (N = 438) of Joplin adults at approximately 2.5 years post-disaster (Survey 2). For both surveys we assessed tornado experience, posttraumatic stress, depression, mental health service utilization, and sociodemographics. For Survey 2 we also assessed social support and parent report of child strengths and difficulties. Results. Probable PTSD prevalence was 12.63% at Survey 1 and 26.74% at Survey 2, while current depression prevalence was 20.82% at Survey 1 and 13.33% at Survey 2. Less education and more tornado experience were generally related to greater likelihood of experiencing probable PTSD and current depression for both surveys. Men and younger participants were more likely to report current depression at Survey 1. Low levels of social support (assessed only at Survey 2) were related to more probable PTSD and current depression. For both surveys, we observed low rates of mental health service utilization, and these rates were also low for participants reporting probable PTSD and current depression. At Survey 2 we assessed parent report of child (ages 4 to 17) strengths and difficulties and found that child difficulties were more frequent for younger children (ages 4 to 10) than older children (ages 11 to 17), and that parents reporting probable PTSD reported a greater frequency of children with borderline or abnormal difficulties. Discussion.
Overall our results indicate that long-term (multi-year) community disaster mental health monitoring, assessment, referral, outreach, and services are needed following a major disaster like the 2011 Joplin tornado.

  4. 2011 Joplin, Missouri Tornado Experience, Mental Health Reactions, and Service Utilization: Cross-Sectional Assessments at Approximately 6 Months and 2.5 Years Post-Event

    PubMed Central

    Houston, J. Brian; Spialek, Matthew L.; Stevens, Jordan; First, Jennifer; Mieseler, Vicky L.; Pfefferbaum, Betty

    2015-01-01

    Introduction. On May 22, 2011 the deadliest tornado in the United States since 1947 struck Joplin, Missouri killing 161 people, injuring approximately 1,150 individuals, and causing approximately $2.8 billion in economic losses. Methods. This study examined the mental health effects of this event through a random digit dialing sample (N = 380) of Joplin adults at approximately 6 months post-disaster (Survey 1) and a purposive convenience sample (N = 438) of Joplin adults at approximately 2.5 years post-disaster (Survey 2). For both surveys we assessed tornado experience, posttraumatic stress, depression, mental health service utilization, and sociodemographics. For Survey 2 we also assessed social support and parent report of child strengths and difficulties. Results. Probable PTSD prevalence was 12.63% at Survey 1 and 26.74% at Survey 2, while current depression prevalence was 20.82% at Survey 1 and 13.33% at Survey 2. Less education and more tornado experience were generally related to greater likelihood of experiencing probable PTSD and current depression for both surveys. Men and younger participants were more likely to report current depression at Survey 1. Low levels of social support (assessed only at Survey 2) were related to more probable PTSD and current depression. For both surveys, we observed low rates of mental health service utilization, and these rates were also low for participants reporting probable PTSD and current depression. At Survey 2 we assessed parent report of child (ages 4 to 17) strengths and difficulties and found that child difficulties were more frequent for younger children (ages 4 to 10) than older children (ages 11 to 17), and that parents reporting probable PTSD reported a greater frequency of children with borderline or abnormal difficulties. Discussion.
Overall our results indicate that long-term (multi-year) community disaster mental health monitoring, assessment, referral, outreach, and services are needed following a major disaster like the 2011 Joplin tornado. PMID:26579331

  5. The usefulness of administrative databases for identifying disease cohorts is increased with a multivariate model.

    PubMed

    van Walraven, Carl; Austin, Peter C; Manuel, Douglas; Knoll, Greg; Jennings, Allison; Forster, Alan J

    2010-12-01

    Administrative databases commonly use codes to indicate diagnoses. These codes alone are often inadequate to accurately identify patients with particular conditions. In this study, we determined whether we could quantify the probability that a person has a particular disease-in this case renal failure-using other routinely collected information available in an administrative data set. This would allow the accurate identification of a disease cohort in an administrative database. We determined whether patients in a randomly selected 100,000 hospitalizations had kidney disease (defined as two or more sequential serum creatinines or the single admission creatinine indicating a calculated glomerular filtration rate less than 60 mL/min/1.73 m²). The independent association of patient- and hospitalization-level variables with renal failure was measured using a multivariate logistic regression model in a random 50% sample of the patients. The model was validated in the remaining patients. Twenty thousand seven hundred thirteen patients had kidney disease (20.7%). A diagnostic code of kidney disease was strongly associated with kidney disease (relative risk: 34.4), but the accuracy of the code was poor (sensitivity: 37.9%; specificity: 98.9%). Twenty-nine patient- and hospitalization-level variables entered the kidney disease model. This model had excellent discrimination (c-statistic: 90.1%) and accurately predicted the probability of true renal failure. The probability threshold that maximized sensitivity and specificity for the identification of true kidney disease was 21.3% (sensitivity: 80.0%; specificity: 82.2%). Multiple variables available in administrative databases can be combined to quantify the probability that a person has a particular disease. This process permits accurate identification of a disease cohort in an administrative database. 
These methods may be extended to other diagnoses or procedures and could both facilitate and clarify the use of administrative databases for research and quality improvement. Copyright © 2010 Elsevier Inc. All rights reserved.
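    The threshold reported above (21.3%, maximizing sensitivity and specificity) corresponds to maximizing Youden's J; a minimal sketch of that selection step on synthetic predicted probabilities (the score distributions and class mix below are hypothetical, not the paper's data):

```python
import random

random.seed(0)
# Hypothetical predicted disease probabilities: cases score higher on average.
cases    = [min(1.0, max(0.0, random.gauss(0.60, 0.2))) for _ in range(1000)]
controls = [min(1.0, max(0.0, random.gauss(0.30, 0.2))) for _ in range(4000)]

def best_threshold(cases, controls):
    """Probability cutoff maximizing sensitivity + specificity (Youden's J)."""
    best_t, best_j = 0.0, -1.0
    for k in range(1, 100):
        t = k / 100
        sens = sum(p >= t for p in cases) / len(cases)
        spec = sum(p < t for p in controls) / len(controls)
        if sens + spec - 1 > best_j:
            best_t, best_j = t, sens + spec - 1
    return best_t

t = best_threshold(cases, controls)   # lands between the two score means
```

    Patients whose predicted probability exceeds the chosen cutoff are then included in the disease cohort.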

  6. Evaluating a Modular Design Approach to Collecting Survey Data Using Text Messages

    PubMed Central

    West, Brady T.; Ghimire, Dirgha; Axinn, William G.

    2015-01-01

    This article presents analyses of data from a pilot study in Nepal that was designed to provide an initial examination of the errors and costs associated with an innovative methodology for survey data collection. We embedded a randomized experiment within a long-standing panel survey, collecting data on a small number of items with varying sensitivity from a probability sample of 450 young Nepalese adults. Survey items ranged from simple demographics to indicators of substance abuse and mental health problems. Sampled adults were randomly assigned to one of three different modes of data collection: 1) a standard one-time telephone interview, 2) a “single sitting” back-and-forth interview with an interviewer using text messaging, and 3) an interview using text messages within a modular design framework (which generally involves breaking the survey response task into distinct parts over a short period of time). Respondents in the modular group were asked to respond (via text message exchanges with an interviewer) to only one question on a given day, rather than complete the entire survey. Both bivariate and multivariate analyses demonstrate that the two text messaging modes increased the probability of disclosing sensitive information relative to the telephone mode, and that respondents in the modular design group, while responding less frequently, found the survey to be significantly easier. Further, those who responded in the modular group were not unique in terms of available covariates, suggesting that the reduced item response rates only introduced limited nonresponse bias. Future research should consider enhancing this methodology, applying it with other modes of data collection (e. g., web surveys), and continuously evaluating its effectiveness from a total survey error perspective. PMID:26322137

  7. A New Random Walk for Replica Detection in WSNs.

    PubMed

    Aalsalem, Mohammed Y; Khan, Wazir Zada; Saad, N M; Hossain, Md Shohrab; Atiquzzaman, Mohammed; Khan, Muhammad Khurram

    2016-01-01

    Wireless Sensor Networks (WSNs) are vulnerable to Node Replication attacks or Clone attacks. Among all the existing clone detection protocols in WSNs, RAWL shows the most promising results by employing Simple Random Walk (SRW). More recently, RAND outperforms RAWL by incorporating Network Division with SRW. Both RAND and RAWL use SRW for the random selection of witness nodes, which is problematic because the walk frequently revisits previously passed nodes, leading to longer delays and higher energy expenditure, with a lower probability that witness nodes intersect. To circumvent this problem, we propose to employ a new kind of constrained random walk, namely the Single Stage Memory Random Walk, and present a distributed technique called SSRWND (Single Stage Memory Random Walk with Network Division). In SSRWND, the single stage memory random walk is combined with network division, aiming to decrease the communication and memory costs while keeping the detection probability high. Intensive simulations verify that SSRWND guarantees higher witness-node security with moderate communication and memory overheads. SSRWND is expedient for security-oriented application fields of WSNs such as military and medical domains.
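    The abstract does not specify the single stage memory random walk, but a plausible minimal reading is a one-step-memory (non-backtracking) walk; a sketch contrasting it with a simple random walk on a toy topology (the graph and all parameters are hypothetical):

```python
import random

def simple_random_walk(adj, start, steps, rng):
    """SRW: each step picks a uniformly random neighbor."""
    path = [start]
    for _ in range(steps):
        path.append(rng.choice(adj[path[-1]]))
    return path

def memory_walk(adj, start, steps, rng):
    """One-step-memory walk: avoid the node just visited whenever possible."""
    path, prev = [start], None
    for _ in range(steps):
        cur = path[-1]
        choices = [v for v in adj[cur] if v != prev] or adj[cur]
        prev = cur
        path.append(rng.choice(choices))
    return path

# Tiny ring-with-chords topology standing in for a sensor network.
n = 30
adj = {i: sorted({(i - 1) % n, (i + 1) % n, (i + 7) % n, (i - 7) % n})
       for i in range(n)}
rng = random.Random(42)
srw = simple_random_walk(adj, 0, 200, rng)
mem = memory_walk(adj, 0, 200, rng)
```

    Because the memory walk never immediately backtracks, it tends to disperse faster, which is the intuition behind spreading witness nodes more widely at the same walk length.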

  8. A New Random Walk for Replica Detection in WSNs

    PubMed Central

    Aalsalem, Mohammed Y.; Saad, N. M.; Hossain, Md. Shohrab; Atiquzzaman, Mohammed; Khan, Muhammad Khurram

    2016-01-01

    Wireless Sensor Networks (WSNs) are vulnerable to Node Replication attacks or Clone attacks. Among all the existing clone detection protocols in WSNs, RAWL shows the most promising results by employing Simple Random Walk (SRW). More recently, RAND outperforms RAWL by incorporating Network Division with SRW. Both RAND and RAWL use SRW for the random selection of witness nodes, which is problematic because the walk frequently revisits previously passed nodes, leading to longer delays and higher energy expenditure, with a lower probability that witness nodes intersect. To circumvent this problem, we propose to employ a new kind of constrained random walk, namely the Single Stage Memory Random Walk, and present a distributed technique called SSRWND (Single Stage Memory Random Walk with Network Division). In SSRWND, the single stage memory random walk is combined with network division, aiming to decrease the communication and memory costs while keeping the detection probability high. Intensive simulations verify that SSRWND guarantees higher witness-node security with moderate communication and memory overheads. SSRWND is expedient for security-oriented application fields of WSNs such as military and medical domains. PMID:27409082

  9. Approximation of Failure Probability Using Conditional Sampling

    NASA Technical Reports Server (NTRS)

    Giesy, Daniel P.; Crespo, Luis G.; Kenney, Sean P.

    2008-01-01

    In analyzing systems which depend on uncertain parameters, one technique is to partition the uncertain parameter domain into a failure set and its complement, and judge the quality of the system by estimating the probability of failure. If this is done by a sampling technique such as Monte Carlo and the probability of failure is small, accurate approximation can require so many sample points that the computational expense is prohibitive. Previous work of the authors has shown how to bound the failure event by sets of such simple geometry that their probabilities can be calculated analytically. In this paper, it is shown how to make use of these failure bounding sets and conditional sampling within them to substantially reduce the computational burden of approximating failure probability. It is also shown how the use of these sampling techniques improves the confidence intervals for the failure probability estimate for a given number of sample points and how they reduce the number of sample point analyses needed to achieve a given level of confidence.
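    The bounding-set idea can be sketched in two dimensions: enclose a small circular failure region in a square whose probability is analytic, then spend every sample inside that square (the geometry and numbers are our illustration, not the paper's):

```python
import math, random

random.seed(7)
# Uncertain parameters uniform on the unit square; "failure" is a small disk.
CX, CY, R = 0.5, 0.5, 0.01
p_fail_exact = math.pi * R * R                 # ~3.14e-4, known here for reference

# Failure-bounding set: the tight axis-aligned square around the disk,
# whose probability is available analytically.
LO, HI = CX - R, CX + R
p_box = (2 * R) ** 2

# Conditional sampling: draw points uniformly inside the bounding square only.
n = 100_000
hits = sum((random.uniform(LO, HI) - CX) ** 2
           + (random.uniform(LO, HI) - CY) ** 2 <= R * R
           for _ in range(n))
p_fail_est = p_box * hits / n                  # P(box) * P(fail | box)
```

    A crude Monte Carlo over the whole domain would need on the order of millions of points to see even a handful of failures; conditioning concentrates every sample where failure is possible, which is exactly the variance reduction the abstract describes.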

  10. Probability Analysis of the Wave-Slamming Pressure Values of the Horizontal Deck with Elastic Support

    NASA Astrophysics Data System (ADS)

    Zuo, Weiguang; Liu, Ming; Fan, Tianhui; Wang, Pengtao

    2018-06-01

    This paper presents the probability distribution of the slamming pressure from an experimental study of regular wave slamming on an elastically supported horizontal deck. The time series of the slamming pressure during the wave impact were first obtained through statistical analyses on experimental data. The exceeding probability distribution of the maximum slamming pressure peak and distribution parameters were analyzed, and the results show that the exceeding probability distribution of the maximum slamming pressure peak accords with the three-parameter Weibull distribution. Furthermore, the range and relationships of the distribution parameters were studied. The sum of the location parameter D and the scale parameter L was approximately equal to 1.0, and the exceeding probability was more than 36.79% when the random peak was equal to the sample average during the wave impact. The variation of the distribution parameters and slamming pressure under different model conditions were comprehensively presented, and the parameter values of the Weibull distribution of wave-slamming pressure peaks were different due to different test models. The parameter values were found to decrease due to the increased stiffness of the elastic support. The damage criterion of the structure model caused by the wave impact was initially discussed, and the structure model was destroyed when the average slamming time was greater than a certain value during the duration of the wave impact. The conclusions of the experimental study were then described.
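    The numbers in the abstract are consistent with the three-parameter Weibull exceedance function P(X > x) = exp(-((x - D)/L)^k): if the peaks are normalized by the sample average (so the average is 1) and D + L = 1, the exceedance probability at the average is exp(-1) ≈ 36.79%. A sketch with illustrative parameter values:

```python
import math

def weibull_exceedance(x: float, loc: float, scale: float, shape: float) -> float:
    """Exceedance probability P(X > x) of a three-parameter Weibull distribution."""
    if x <= loc:
        return 1.0
    return math.exp(-((x - loc) / scale) ** shape)

# With peaks normalized by the sample mean and loc + scale = 1,
# the exceedance at x = 1 is exp(-1), matching the 36.79% in the abstract.
p_at_mean = weibull_exceedance(1.0, loc=0.4, scale=0.6, shape=1.8)
```

    The shape parameter (1.8 here) is the value that would vary with the test model and support stiffness.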

  11. Confidence intervals for the between-study variance in random-effects meta-analysis using generalised heterogeneity statistics: should we use unequal tails?

    PubMed

    Jackson, Dan; Bowden, Jack

    2016-09-07

    Confidence intervals for the between study variance are useful in random-effects meta-analyses because they quantify the uncertainty in the corresponding point estimates. Methods for calculating these confidence intervals have been developed that are based on inverting hypothesis tests using generalised heterogeneity statistics. Whilst, under the random effects model, these new methods furnish confidence intervals with the correct coverage, the resulting intervals are usually very wide, making them uninformative. We discuss a simple strategy for obtaining 95 % confidence intervals for the between-study variance with a markedly reduced width, whilst retaining the nominal coverage probability. Specifically, we consider the possibility of using methods based on generalised heterogeneity statistics with unequal tail probabilities, where the tail probability used to compute the upper bound is greater than 2.5 %. This idea is assessed using four real examples and a variety of simulation studies. Supporting analytical results are also obtained. Our results provide evidence that using unequal tail probabilities can result in shorter 95 % confidence intervals for the between-study variance. We also show some further results for a real example that illustrates how shorter confidence intervals for the between-study variance can be useful when performing sensitivity analyses for the average effect, which is usually the parameter of primary interest. We conclude that using unequal tail probabilities when computing 95 % confidence intervals for the between-study variance, when using methods based on generalised heterogeneity statistics, can result in shorter confidence intervals. We suggest that those who find the case for using unequal tail probabilities convincing should use the '1-4 % split', where greater tail probability is allocated to the upper confidence bound. The 'width-optimal' interval that we present deserves further investigation.

  12. Sino-implant (II) - a levonorgestrel-releasing two-rod implant: systematic review of the randomized controlled trials

    PubMed Central

    Steiner, Markus J.; Lopez, Laureen M.; Grimes, David A.; Cheng, Linan; Shelton, Jim; Trussell, James; Farley, Timothy M.M.; Dorflinger, Laneta

    2013-01-01

    Background Sino-implant (II) is a subdermal contraceptive implant manufactured in China. This two-rod levonorgestrel-releasing implant has the same amount of active ingredient (150 mg levonorgestrel) and mechanism of action as the widely available contraceptive implant Jadelle. We examined randomized controlled trials of Sino-implant (II) for effectiveness and side effects. Study design We searched electronic databases for studies of Sino-implant (II), and then restricted our review to randomized controlled trials. The primary outcome of this review was pregnancy. Results Four randomized trials with a total of 15,943 women assigned to Sino-implant (II) had first-year probabilities of pregnancy ranging from 0.0% to 0.1%. Cumulative probabilities of pregnancy during the four years of the product's approved duration of use were 0.9% and 1.06% in the two trials that presented data for four-year use. Five-year cumulative probabilities of pregnancy ranged from 0.7% to 2.1%. In one trial, the cumulative probability of pregnancy more than doubled during the fifth year (from 0.9% to 2.1%), which may be why the implant is approved for four years of use in China. Five-year cumulative probabilities of discontinuation due to menstrual problems ranged from 12.5% to 15.5% for Sino-implant (II). Conclusions Sino-implant (II) is one of the most effective contraceptives available today. These available clinical data, combined with independent laboratory testing, and the knowledge that 7 million women have used this method since 1994, support the safety and effectiveness of Sino-implant (II). The lower cost of Sino-implant (II) compared with other subdermal implants could improve access to implants in resource-constrained settings. PMID:20159174

  13. Modeling Achievement Trajectories when Attrition Is Informative

    ERIC Educational Resources Information Center

    Feldman, Betsy J.; Rabe-Hesketh, Sophia

    2012-01-01

    In longitudinal education studies, assuming that dropout and missing data occur completely at random is often unrealistic. When the probability of dropout depends on covariates and observed responses (called "missing at random" [MAR]), or on values of responses that are missing (called "informative" or "not missing at random" [NMAR]),…

  14. An extended car-following model considering random safety distance with different probabilities

    NASA Astrophysics Data System (ADS)

    Wang, Jufeng; Sun, Fengxin; Cheng, Rongjun; Ge, Hongxia; Wei, Qi

    2018-02-01

    Because of differences in vehicle type or driving skill, driving strategies are not exactly the same, and different vehicles may travel at different speeds for the same headway. Since the optimal velocity function is determined by the safety distance in addition to the maximum velocity and headway, an extended car-following model accounting for a random safety distance with different probabilities is proposed in this paper. The linear stability condition for this extended traffic model is obtained by using linear stability theory. Numerical simulations are carried out to explore the complex phenomena resulting from multiple safety distances in the optimal velocity function. The cases of multiple types of safety distances selected with different probabilities are presented. Numerical results show that traffic flow with multiple safety distances selected with different probabilities is more unstable than that with a single type of safety distance, and results in more stop-and-go phenomena.
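    The abstract does not give the optimal velocity function; a common choice in car-following work is the Bando hyperbolic-tangent form, shown here with the safety distance drawn from two values with different probabilities (all functional forms and numbers are illustrative assumptions):

```python
import math, random

V_MAX = 2.0

def optimal_velocity(headway: float, safety_distance: float) -> float:
    """Bando-type optimal velocity function (a standard choice, assumed here)."""
    return (V_MAX / 2) * (math.tanh(headway - safety_distance)
                          + math.tanh(safety_distance))

# Safety distance takes one of two values with probabilities 0.7 / 0.3,
# mimicking a mix of driver types or vehicle classes.
rng = random.Random(3)
def random_safety_distance() -> float:
    return 2.0 if rng.random() < 0.7 else 3.0

speeds = [optimal_velocity(h, random_safety_distance())
          for h in [1.0, 2.5, 4.0, 6.0]]
```

    Mixing safety distances makes the effective optimal-velocity curve multivalued at a given headway, which is the source of the extra instability the simulations report.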

  15. Decomposition of conditional probability for high-order symbolic Markov chains.

    PubMed

    Melnik, S S; Usatenko, O V

    2017-07-01

    The main goal of this paper is to develop an estimate for the conditional probability function of random stationary ergodic symbolic sequences with elements belonging to a finite alphabet. We elaborate on a decomposition procedure for the conditional probability function of sequences considered to be high-order Markov chains. We represent the conditional probability function as the sum of multilinear memory function monomials of different orders (from zero up to the chain order). This allows us to introduce a family of Markov chain models and to construct artificial sequences via a method of successive iterations, taking into account at each step increasingly high correlations among random elements. At weak correlations, the memory functions are uniquely expressed in terms of the high-order symbolic correlation functions. The proposed method fills the gap between two approaches, namely the likelihood estimation and the additive Markov chains. The obtained results may have applications for sequential approximation of artificial neural network training.
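    The additive decomposition can be sketched for a binary, order-2 chain with a hand-picked memory function (all values below are illustrative; the paper treats general finite alphabets and orders): write the conditional probability as a mean plus memory-function terms, generate a sequence by successive iterations, and recover the conditional law empirically.

```python
import random

random.seed(11)
P_BAR = 0.5
F = {1: 0.2, 2: 0.1}          # memory function values (hand-picked, illustrative)

def cond_prob(prev1: int, prev2: int) -> float:
    """P(a_n = 1 | a_{n-1}, a_{n-2}) as a sum of memory-function monomials."""
    return P_BAR + F[1] * (prev1 - P_BAR) + F[2] * (prev2 - P_BAR)

# Construct the artificial sequence by successive iterations.
seq = [0, 1]
for _ in range(200_000):
    seq.append(1 if random.random() < cond_prob(seq[-1], seq[-2]) else 0)

# Empirically recover P(1 | 1, 1); by construction it should be near
# 0.5 + 0.2 * 0.5 + 0.1 * 0.5 = 0.65.
num = den = 0
for i in range(2, len(seq)):
    if seq[i - 1] == 1 and seq[i - 2] == 1:
        den += 1
        num += seq[i]
p_est = num / den
```

    At weak correlations like these, the memory-function values map directly onto the pair correlation functions of the generated sequence, which is the regime the abstract highlights.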

  16. Decomposition of conditional probability for high-order symbolic Markov chains

    NASA Astrophysics Data System (ADS)

    Melnik, S. S.; Usatenko, O. V.

    2017-07-01

    The main goal of this paper is to develop an estimate for the conditional probability function of random stationary ergodic symbolic sequences with elements belonging to a finite alphabet. We elaborate on a decomposition procedure for the conditional probability function of sequences considered to be high-order Markov chains. We represent the conditional probability function as the sum of multilinear memory function monomials of different orders (from zero up to the chain order). This allows us to introduce a family of Markov chain models and to construct artificial sequences via a method of successive iterations, taking into account at each step increasingly high correlations among random elements. At weak correlations, the memory functions are uniquely expressed in terms of the high-order symbolic correlation functions. The proposed method fills the gap between two approaches, namely the likelihood estimation and the additive Markov chains. The obtained results may have applications for sequential approximation of artificial neural network training.

  17. Unsupervised Bayesian linear unmixing of gene expression microarrays.

    PubMed

    Bazot, Cécile; Dobigeon, Nicolas; Tourneret, Jean-Yves; Zaas, Aimee K; Ginsburg, Geoffrey S; Hero, Alfred O

    2013-03-19

    This paper introduces a new constrained model and the corresponding algorithm, called unsupervised Bayesian linear unmixing (uBLU), to identify biological signatures from high-dimensional assays like gene expression microarrays. The basis for uBLU is a Bayesian model for the data samples, which are represented as an additive mixture of random positive gene signatures, called factors, with random positive mixing coefficients, called factor scores, that specify the relative contribution of each signature to a specific sample. The particularity of the proposed method is that uBLU constrains the factor loadings to be non-negative and the factor scores to be probability distributions over the factors. Furthermore, it also provides estimates of the number of factors. A Gibbs sampling strategy is adopted here to generate random samples according to the posterior distribution of the factors, factor scores, and number of factors. These samples are then used to estimate all the unknown parameters. Firstly, the proposed uBLU method is applied to several simulated datasets with known ground truth and compared with previous factor decomposition methods, such as principal component analysis (PCA), non-negative matrix factorization (NMF), Bayesian factor regression modeling (BFRM), and the gradient-based algorithm for general matrix factorization (GB-GMF). Secondly, we illustrate the application of uBLU on a real time-evolving gene expression dataset from a recent viral challenge study in which individuals have been inoculated with influenza A/H3N2/Wisconsin. The results obtained on synthetic and real data illustrate the accuracy of the proposed uBLU method, which significantly outperforms the other factor decomposition methods (PCA, NMF, BFRM, and GB-GMF) on the data sets considered here. The uBLU method identifies an inflammatory component closely associated with clinical symptom scores collected during the study. Using a constrained model allows recovery of all the inflammatory genes in a single factor.

  18. Tin Whisker Electrical Short Circuit Characteristics. Part 2

    NASA Technical Reports Server (NTRS)

    Courey, Karim J.; Asfour, Shihab S.; Onar, Arzu; Bayliss, Jon A.; Ludwig, Lawrence L.; Wright, Maria C.

    2009-01-01

    Existing risk simulations make the assumption that when a free tin whisker has bridged two adjacent exposed electrical conductors, the result is an electrical short circuit. This conservative assumption is made because shorting is a random event with an unknown probability. Note, however, that due to contact resistance, electrical shorts may not occur at lower voltage levels. In our first article we developed an empirical probability model for tin whisker shorting. In this paper, we develop a more comprehensive empirical model using a refined experiment with a larger sample size, in which we studied the effect of varying voltage on the breakdown of the contact resistance that leads to a short circuit. From the resulting data we estimated the probability distribution of an electrical short as a function of voltage. In addition, the unexpected polycrystalline structure seen in the focused ion beam (FIB) cross section in the first experiment was confirmed in this experiment using transmission electron microscopy (TEM). The FIB was also used to cross section two card guides to facilitate measurement of the grain size of each card guide's tin plating and determine its finish.
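
    The kind of voltage-dependent short-circuit probability model described here can be sketched as a logistic fit; the bench data below are synthetic and the 5 V transition point is a hypothetical assumption, not a value from the experiment:

```python
import math
import random

def fit_logistic(volts, shorted, lr=0.1, steps=5000):
    """Fit P(short | V) = 1/(1 + exp(-(a + b*V))) by gradient ascent
    on the log-likelihood, using the averaged gradient at each step."""
    a = b = 0.0
    n = len(volts)
    for _ in range(steps):
        ga = gb = 0.0
        for v, y in zip(volts, shorted):
            p = 1.0 / (1.0 + math.exp(-(a + b * v)))
            ga += y - p
            gb += (y - p) * v
        a += lr * ga / n
        b += lr * gb / n
    return a, b

# Synthetic bench data: shorts become likely above roughly 5 V (hypothetical)
rng = random.Random(1)
volts = [rng.uniform(0.0, 10.0) for _ in range(400)]
shorted = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(v - 5.0))) else 0
           for v in volts]

a, b = fit_logistic(volts, shorted)
p_short_8v = 1.0 / (1.0 + math.exp(-(a + b * 8.0)))  # estimated P(short) at 8 V
p_short_2v = 1.0 / (1.0 + math.exp(-(a + b * 2.0)))  # estimated P(short) at 2 V
```

    The fitted curve then gives a shorting probability at any voltage of interest, which is what a risk simulation would consume in place of the conservative "bridge implies short" assumption.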

  19. Assessing the completeness of the fossil record using brachiopod Lazarus taxa

    NASA Astrophysics Data System (ADS)

    Gearty, W.; Payne, J.

    2012-12-01

    Lazarus taxa, organisms that disappear from the fossil record only to reappear later, provide a unique opportunity to assess the completeness of the fossil record. In this study, we apply logistic regression to quantify the associations of body size, geographic extent, and species diversity with the probability of being a Lazarus genus using the Phanerozoic fossil record of brachiopods. We find that both the geographic range and species diversity of a genus are inversely associated with the probability of being a Lazarus taxon in the preceding or succeeding stage. In contrast, body size exhibits little association with the probability of becoming a Lazarus taxon. A model including species diversity and geographic extent as predictors performs best among all combinations examined, whereas a model including only shell size as a predictor performs the worst - even worse than a model that assumes Lazarus taxa are randomly drawn from all available genera. These findings suggest that geographic range and species richness data can be used to improve estimates of extensions on the observed fossil ranges of genera and, thereby, better correct for sampling effects in estimates of taxonomic diversity change through the Phanerozoic.

  20. Probabilities and statistics for backscatter estimates obtained by a scatterometer

    NASA Technical Reports Server (NTRS)

    Pierson, Willard J., Jr.

    1989-01-01

    Methods for the recovery of winds near the surface of the ocean from measurements of the normalized radar backscattering cross section must recognize and make use of the statistics (i.e., the sampling variability) of the backscatter measurements. Radar backscatter values from a scatterometer are random variables with expected values given by a model. A model relates backscatter to properties of the waves on the ocean, which are in turn generated by the winds in the atmospheric marine boundary layer. The effective wind speed and direction at a known height for a neutrally stratified atmosphere are the values to be recovered from the model. The probability density function for the backscatter values is a normal probability distribution with the notable feature that the variance is a known function of the expected value. The sources of signal variability, the effects of this variability on the wind speed estimation, and criteria for the acceptance or rejection of models are discussed. A modified maximum likelihood method for estimating wind vectors is described. Ways to make corrections for the kinds of errors found for the Seasat SASS model function are described, and applications to a new scatterometer are given.

  1. Statistical inferences for data from studies conducted with an aggregated multivariate outcome-dependent sample design.

    PubMed

    Lu, Tsui-Shan; Longnecker, Matthew P; Zhou, Haibo

    2017-03-15

    Outcome-dependent sampling (ODS) is a cost-effective sampling scheme in which one observes the exposure with a probability that depends on the outcome. Well-known examples are the case-control design for a binary response, the case-cohort design for failure-time data, and the general ODS design for a continuous response. While substantial work has been carried out for the univariate-response case, statistical inference and design for ODS with multivariate responses remain under-developed. Motivated by the need in biological studies to take advantage of the available responses for subjects in a cluster, we propose a multivariate outcome-dependent sampling (multivariate-ODS) design that is based on a general selection of the continuous responses within a cluster. The proposed inference procedure for the multivariate-ODS design is semiparametric, with all the underlying distributions of covariates modeled nonparametrically using empirical likelihood methods. We show that the proposed estimator is consistent and establish its asymptotic normality. Simulation studies show that the proposed estimator is more efficient than the estimator obtained using only the simple-random-sample portion of the multivariate-ODS or the estimator from a simple random sample of the same size. The multivariate-ODS design, together with the proposed estimator, provides an approach to further improve study efficiency for a given fixed study budget. We illustrate the proposed design and estimator with an analysis of the association of polychlorinated biphenyl exposure with hearing loss in children born into the Collaborative Perinatal Study. Copyright © 2016 John Wiley & Sons, Ltd.

  2. Random Variables: Simulations and Surprising Connections.

    ERIC Educational Resources Information Center

    Quinn, Robert J.; Tomlinson, Stephen

    1999-01-01

    Features activities for advanced second-year algebra students in grades 11 and 12. Introduces three random variables and considers an empirical and theoretical probability for each. Uses coins, regular dice, decahedral dice, and calculators. (ASK)

  3. Method- and species-specific detection probabilities of fish occupancy in Arctic lakes: Implications for design and management

    USGS Publications Warehouse

    Haynes, Trevor B.; Rosenberger, Amanda E.; Lindberg, Mark S.; Whitman, Matthew; Schmutz, Joel A.

    2013-01-01

    Studies examining species occurrence often fail to account for false absences in field sampling. We investigate detection probabilities of five gear types for six fish species in a sample of lakes on the North Slope, Alaska. We used an occupancy modeling approach to provide estimates of detection probabilities for each method. Variation in gear- and species-specific detection probability was considerable. For example, detection probabilities for the fyke net ranged from 0.82 (SE = 0.05) for least cisco (Coregonus sardinella) to 0.04 (SE = 0.01) for slimy sculpin (Cottus cognatus). Detection probabilities were also affected by site-specific variables such as depth of the lake, year, day of sampling, and lake connection to a stream. With the exception of the dip net and shore minnow traps, each gear type provided the highest detection probability of at least one species. Results suggest that a multimethod approach may be most effective when attempting to sample the entire fish community of Arctic lakes. Detection probability estimates will be useful for designing optimal fish sampling and monitoring protocols in Arctic lakes.
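
    The multimethod recommendation follows from how per-gear detection probabilities combine when methods are assumed independent; in the sketch below only the fyke-net value (0.82, least cisco) comes from the abstract, and the other two gear values are hypothetical:

```python
def combined_detection(p_by_gear):
    """P(species detected by at least one gear type), assuming independent methods."""
    miss = 1.0
    for p in p_by_gear.values():
        miss *= 1.0 - p
    return 1.0 - miss

# Fyke-net value from the abstract; the other gear values are hypothetical
p = {"fyke_net": 0.82, "minnow_trap": 0.30, "gill_net": 0.45}
p_any = combined_detection(p)   # about 0.93: adding gears sharply reduces false absences
```

    The same calculation with repeat visits of a single gear shows why occupancy designs trade the number of methods against the number of survey occasions.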

  4. A Probabilistic Design Method Applied to Smart Composite Structures

    NASA Technical Reports Server (NTRS)

    Shiao, Michael C.; Chamis, Christos C.

    1995-01-01

    A probabilistic design method is described and demonstrated using a smart composite wing. Probabilistic structural design incorporates naturally occurring uncertainties including those in constituent (fiber/matrix) material properties, fabrication variables, structure geometry and control-related parameters. Probabilistic sensitivity factors are computed to identify those parameters that have a great influence on a specific structural reliability. Two performance criteria are used to demonstrate this design methodology. The first criterion requires that the actuated angle at the wing tip be bounded by upper and lower limits at a specified reliability. The second criterion requires that the probability of ply damage due to random impact load be smaller than an assigned value. When the relationship between reliability improvement and the sensitivity factors is assessed, the results show that a reduction in the scatter of the random variable with the largest sensitivity factor (absolute value) provides the lowest failure probability. An increase in the mean of the random variable with a negative sensitivity factor will reduce the failure probability. Therefore, the design can be improved by controlling or selecting distribution parameters associated with random variables. This can be implemented during the manufacturing process to obtain maximum benefit with minimum alterations.

  5. Utility-based designs for randomized comparative trials with categorical outcomes

    PubMed Central

    Murray, Thomas A.; Thall, Peter F.; Yuan, Ying

    2016-01-01

    A general utility-based testing methodology for design and conduct of randomized comparative clinical trials with categorical outcomes is presented. Numerical utilities of all elementary events are elicited to quantify their desirabilities. These numerical values are used to map the categorical outcome probability vector of each treatment to a mean utility, which is used as a one-dimensional criterion for constructing comparative tests. Bayesian tests are presented, including fixed sample and group sequential procedures, assuming Dirichlet-multinomial models for the priors and likelihoods. Guidelines are provided for establishing priors, eliciting utilities, and specifying hypotheses. Efficient posterior computation is discussed, and algorithms are provided for jointly calibrating test cutoffs and sample size to control overall type I error and achieve specified power. Asymptotic approximations for the power curve are used to initialize the algorithms. The methodology is applied to re-design a completed trial that compared two chemotherapy regimens for chronic lymphocytic leukemia, in which an ordinal efficacy outcome was dichotomized and toxicity was ignored to construct the trial’s design. The Bayesian tests also are illustrated by several types of categorical outcomes arising in common clinical settings. Freely available computer software for implementation is provided. PMID:27189672
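
    The core step of mapping a categorical outcome probability vector to a mean utility, and comparing two arms under Dirichlet-multinomial posteriors, can be sketched as follows; the counts, utilities, and flat prior are hypothetical, and this omits the group-sequential machinery of the paper:

```python
import random

def mean_utility(probs, utilities):
    """Map a categorical outcome probability vector to a one-dimensional mean utility."""
    return sum(p * u for p, u in zip(probs, utilities))

def prob_superior(counts_a, counts_b, utilities, prior=1.0, n=20000, seed=0):
    """Pr(mean utility of arm A > arm B) under independent Dirichlet posteriors."""
    rng = random.Random(seed)
    def draw(counts):   # one Dirichlet sample via normalized gamma variates
        g = [rng.gammavariate(c + prior, 1.0) for c in counts]
        s = sum(g)
        return [x / s for x in g]
    wins = sum(mean_utility(draw(counts_a), utilities) >
               mean_utility(draw(counts_b), utilities) for _ in range(n))
    return wins / n

# Hypothetical 3-category outcome (response, stable disease, toxicity)
utilities = [100, 50, 0]
prob = prob_superior([30, 15, 5], [18, 17, 15], utilities)   # close to 1 here
```

    A fixed-sample Bayesian test of the kind described would declare superiority when this posterior probability crosses a cutoff calibrated to control type I error.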

  6. Status report on the Zagreb Radiocarbon Laboratory - AMS and LSC results of VIRI intercomparison samples

    NASA Astrophysics Data System (ADS)

    Sironić, Andreja; Krajcar Bronić, Ines; Horvatinčić, Nada; Barešić, Jadranka; Obelić, Bogomil; Felja, Igor

    2013-01-01

    A new line for preparation of graphite samples for 14C dating by Accelerator Mass Spectrometry (AMS) in the Zagreb Radiocarbon Laboratory has been validated by preparing graphite from various materials distributed within the Fifth International Radiocarbon Intercomparison (VIRI) study. The 14C activity of the prepared graphite was measured at the SUERC AMS facility. The results are statistically evaluated by means of z-score and u-score values. The mean z-score value of the 28 prepared VIRI samples is (0.06 ± 0.23), showing excellent agreement with the consensus VIRI values. Only one sample gave a u-score value above the limit of acceptability (defined for the 99% confidence interval), and this was probably caused by random contamination of the graphitization rig. After the rig had been moved to a new, adapted and isolated room, all u-score values lay within the acceptable limits. Our LSC results for the VIRI intercomparison samples are also presented, and all are accepted according to their u-score values.

  7. Modeling of chromosome intermingling by partially overlapping uniform random polygons.

    PubMed

    Blackstone, T; Scharein, R; Borgo, B; Varela, R; Diao, Y; Arsuaga, J

    2011-03-01

    During the early phase of the cell cycle the eukaryotic genome is organized into chromosome territories. The geometry of the interface between any two chromosomes remains a matter of debate and may have important functional consequences. The Interchromosomal Network model (introduced by Branco and Pombo) proposes that territories intermingle along their periphery. In order to partially quantify this concept we here investigate the probability that two chromosomes form an unsplittable link. We use the uniform random polygon as a crude model for chromosome territories and we model the interchromosomal network as the common spatial region of two overlapping uniform random polygons. This simple model allows us to derive some rigorous mathematical results as well as to perform computer simulations easily. We find that the probability that a uniform random polygon of length n links a fixed polygon it partially overlaps is bounded below by 1 − O(1/√n). We use numerical simulations to estimate the dependence of the linking probability of two uniform random polygons (of lengths n and m, respectively) on the amount of overlapping. The degree of overlapping is parametrized by a parameter ε such that ε = 0 indicates no overlapping and ε = 1 indicates total overlapping. We propose that this dependence relation may be modeled as f (ε, m, n) = [Formula: see text]. Numerical evidence shows that this model works well when ε is relatively large (ε ≥ 0.5). We then use these results to model the data published by Branco and Pombo and observe that for the amount of overlapping observed experimentally the URPs have a non-zero probability of forming an unsplittable link.

  8. Adaptive Randomization of Neratinib in Early Breast Cancer

    PubMed Central

    Park, John W.; Liu, Minetta C.; Yee, Douglas; Yau, Christina; van 't Veer, Laura J.; Symmans, W. Fraser; Paoloni, Melissa; Perlmutter, Jane; Hylton, Nola M.; Hogarth, Michael; DeMichele, Angela; Buxton, Meredith B.; Chien, A. Jo; Wallace, Anne M.; Boughey, Judy C.; Haddad, Tufia C.; Chui, Stephen Y.; Kemmer, Kathleen A.; Kaplan, Henry G.; Liu, Minetta C.; Isaacs, Claudine; Nanda, Rita; Tripathy, Debasish; Albain, Kathy S.; Edmiston, Kirsten K.; Elias, Anthony D.; Northfelt, Donald W.; Pusztai, Lajos; Moulder, Stacy L.; Lang, Julie E.; Viscusi, Rebecca K.; Euhus, David M.; Haley, Barbara B.; Khan, Qamar J.; Wood, William C.; Melisko, Michelle; Schwab, Richard; Lyandres, Julia; Davis, Sarah E.; Hirst, Gillian L.; Sanil, Ashish; Esserman, Laura J.; Berry, Donald A.

    2017-01-01

    Background I-SPY2, a standing, multicenter, adaptive phase 2 neoadjuvant trial ongoing in high-risk clinical stage II/III breast cancer, is designed to evaluate multiple, novel experimental agents added to standard chemotherapy for their ability to improve the rate of pathologic complete response (pCR). Experimental therapies are compared against a common control arm. We report efficacy for the tyrosine kinase inhibitor neratinib. Methods Eligible women had ≥2.5 cm stage II/III breast cancer, categorized into 8 biomarker subtypes based on HER2, hormone-receptor status (HR), and MammaPrint. Neratinib was evaluated for 10 signatures (prospectively defined subtype combinations), with pCR as the primary endpoint. MR volume changes inform the likelihood of pCR for each patient prior to surgery. Adaptive assignment to experimental arms within disease subtype was based on current Bayesian probabilities of superiority over control. Accrual to an experimental arm can stop at any time for futility, or for graduation within a particular signature based on the Bayesian predictive probability of success in a confirmatory trial. The maximum sample size in any experimental arm is 120 patients. Results With 115 patients and 78 concurrently randomized controls, neratinib graduated in the HER2+/HR− signature, with a mean pCR rate of 56% (95% PI: 37 to 73%) vs 33% for controls (11 to 54%). The final predictive probability of success, updated when all pathology data were available, was 79%. Conclusion Adaptive, multi-armed trials can efficiently identify responding tumor subtypes. Neratinib added to standard therapy is highly likely to improve pCR rates in HER2+/HR− breast cancer. Confirmation in I-SPY 3, a phase 3 neoadjuvant registration trial, is planned. PMID:27406346

  9. Deep convolutional networks for pancreas segmentation in CT imaging

    NASA Astrophysics Data System (ADS)

    Roth, Holger R.; Farag, Amal; Lu, Le; Turkbey, Evrim B.; Summers, Ronald M.

    2015-03-01

    Automatic organ segmentation is an important prerequisite for many computer-aided diagnosis systems. The high anatomical variability of organs in the abdomen, such as the pancreas, prevents many segmentation methods from achieving high accuracies when compared to state-of-the-art segmentation of organs like the liver, heart or kidneys. Recently, the availability of large annotated training sets and the accessibility of affordable parallel computing resources via GPUs have made it feasible for "deep learning" methods such as convolutional networks (ConvNets) to succeed in image classification tasks. These methods have the advantage that the classification features used are trained directly from the imaging data. We present a fully-automated bottom-up method for pancreas segmentation in computed tomography (CT) images of the abdomen. The method is based on hierarchical coarse-to-fine classification of local image regions (superpixels). Superpixels are extracted from the abdominal region using Simple Linear Iterative Clustering (SLIC). An initial probability response map is generated, using patch-level confidences and a two-level cascade of random forest classifiers, from which superpixel regions with probabilities larger than 0.5 are retained. These retained superpixels serve as a highly sensitive initial input of the pancreas and its surroundings to a ConvNet that samples a bounding box around each superpixel at different scales (and random non-rigid deformations at training time) in order to assign a more distinct probability of each superpixel region being pancreas or not. We evaluate our method on CT images of 82 patients (60 for training, 2 for validation, and 20 for testing). Using ConvNets we achieve maximum Dice scores averaging 68% +/- 10% (range, 43-80%) in testing. This shows promise for accurate pancreas segmentation using a deep learning approach and compares favorably to state-of-the-art methods.

  10. Deciphering the Routes of invasion of Drosophila suzukii by Means of ABC Random Forest.

    PubMed

    Fraimout, Antoine; Debat, Vincent; Fellous, Simon; Hufbauer, Ruth A; Foucaud, Julien; Pudlo, Pierre; Marin, Jean-Michel; Price, Donald K; Cattel, Julien; Chen, Xiao; Deprá, Marindia; François Duyck, Pierre; Guedot, Christelle; Kenis, Marc; Kimura, Masahito T; Loeb, Gregory; Loiseau, Anne; Martinez-Sañudo, Isabel; Pascual, Marta; Polihronakis Richmond, Maxi; Shearer, Peter; Singh, Nadia; Tamura, Koichiro; Xuéreb, Anne; Zhang, Jinping; Estoup, Arnaud

    2017-04-01

    Deciphering invasion routes from molecular data is crucial to understanding biological invasions, including identifying bottlenecks in population size and admixture among distinct populations. Here, we unravel the invasion routes of the invasive pest Drosophila suzukii using a multi-locus microsatellite dataset (25 loci on 23 worldwide sampling locations). To do this, we use approximate Bayesian computation (ABC), which has improved the reconstruction of invasion routes, but can be computationally expensive. We use our study to illustrate the use of a new, more efficient, ABC method, ABC random forest (ABC-RF) and compare it to a standard ABC method (ABC-LDA). We find that Japan emerges as the most probable source of the earliest recorded invasion into Hawaii. Southeast China and Hawaii together are the most probable sources of populations in western North America, which then in turn served as sources for those in eastern North America. European populations are genetically more homogeneous than North American populations, and their most probable source is northeast China, with evidence of limited gene flow from the eastern US as well. All introduced populations passed through bottlenecks, and analyses reveal five distinct admixture events. These findings can inform hypotheses concerning how this species evolved between different and independent source and invasive populations. Methodological comparisons indicate that ABC-RF and ABC-LDA show concordant results if ABC-LDA is based on a large number of simulated datasets but that ABC-RF out-performs ABC-LDA when using a comparable and more manageable number of simulated datasets, especially when analyzing complex introduction scenarios. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
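
    As a sketch of the general ABC idea (plain rejection ABC, not the random-forest variant the paper develops), the toy example below infers the mean of a normal distribution from a sample-mean summary statistic; the prior, tolerance, and simulation counts are arbitrary choices:

```python
import random

def abc_rejection(observed, prior_draw, simulate, tol, n_sims, seed=0):
    """Keep parameter draws whose simulated summary lands within tol of the observed one."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_sims):
        theta = prior_draw(rng)
        if abs(simulate(theta, rng) - observed) < tol:
            accepted.append(theta)
    return accepted

# Toy model: infer the mean of a unit-variance normal from a sample mean of 50 draws
def prior_draw(rng):
    return rng.uniform(-5.0, 5.0)

def simulate(theta, rng):
    return sum(rng.gauss(theta, 1.0) for _ in range(50)) / 50

post = abc_rejection(observed=2.0, prior_draw=prior_draw, simulate=simulate,
                     tol=0.2, n_sims=20000)
post_mean = sum(post) / len(post)   # concentrates near the true mean 2.0
```

    ABC-RF replaces the rejection step with a random forest trained on simulated (parameter, summary) pairs, which is why it needs far fewer simulated datasets for complex scenarios.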

  11. Personality and medication non-adherence among older adults enrolled in a six-year trial

    PubMed Central

    Jerant, Anthony; Chapman, Benjamin; Duberstein, Paul; Robbins, John; Franks, Peter

    2011-01-01

    Objectives Personality factors parsimoniously capture the variation in dispositional characteristics that affect behaviours, but their value in predicting medication non-adherence is unclear. We investigated the relationship between five-factor model personality factors (Conscientiousness, Neuroticism, Agreeableness, Extraversion, and Openness) and medication non-adherence among older participants during a six-year randomized placebo-controlled trial (RCT). Design Observational cohort data from 771 subjects aged ≥72 years enrolled in the Ginkgo Evaluation of Memory study, an RCT of Ginkgo biloba for prevention of dementia. Methods Random effects logistic regression analyses examined effects of NEO Five-Factor Inventory scores on medication non-adherence, determined via pill counts every 6 months (median follow-up 6.1 years) and defined as taking <80% of prescribed pills. Analyses adjusted for covariates linked with non-adherence in prior studies. Results Each 5 year increment in participant age was associated with a 6.7% greater probability of non-adherence (95% confidence interval, CI [2.4, 11.0]). Neuroticism was the only personality factor associated with non-adherence: a 1 SD increase was associated with a 3.8% increase in the probability of non-adherence (95% CI [0.4, 7.2]). Lower cognitive function was also associated with non-adherence: a 1 SD decrease in mental status exam score was associated with a 3.0% increase in the probability of non-adherence (95% CI [0.2, 5.9]). Conclusions Neuroticism was associated with medication non-adherence over 6 years of follow-up in a large sample of older RCT participants. Personality measurement in clinical and research settings might help to identify and guide interventions for older adults at risk for medication non-adherence. PMID:21226789

  12. Are all data created equal?--Exploring some boundary conditions for a lazy intuitive statistician.

    PubMed

    Lindskog, Marcus; Winman, Anders

    2014-01-01

    The study investigated potential effects of the presentation order of numeric information on retrospective subjective judgments of descriptive statistics of this information. The studies were theoretically motivated by the assumption in the naïve sampling model of independence between temporal encoding order of data in long-term memory and retrieval probability (i.e. as implied by a "random sampling" from memory metaphor). In Experiment 1, participants experienced Arabic numbers that varied in distribution shape/variability between the first and the second half of the information sequence. Results showed no effects of order on judgments of mean, variability or distribution shape. To strengthen the interpretation of these results, Experiment 2 used a repeated judgment procedure, with an initial judgment occurring prior to the change in distribution shape of the information half-way through data presentation. The results of Experiment 2 were in line with those from Experiment 1, and in addition showed that the act of making explicit judgments did not impair accuracy of later judgments, as would be suggested by an anchoring and insufficient adjustment strategy. Overall, the results indicated that participants were very responsive to the properties of the data while at the same time being more or less immune to order effects. The results were interpreted as being in line with the naïve sampling models in which values are stored as exemplars and sampled randomly from long-term memory.

  13. Classical verification of quantum circuits containing few basis changes

    NASA Astrophysics Data System (ADS)

    Demarie, Tommaso F.; Ouyang, Yingkai; Fitzsimons, Joseph F.

    2018-04-01

    We consider the task of verifying the correctness of quantum computation for a restricted class of circuits which contain at most two basis changes. This contains circuits giving rise to the second level of the Fourier hierarchy, the lowest level for which there is an established quantum advantage. We show that when the circuit has an outcome with probability at least the inverse of some polynomial in the circuit size, the outcome can be checked in polynomial time with bounded error by a completely classical verifier. This verification procedure is based on random sampling of computational paths and is only possible given knowledge of the likely outcome.

  14. Linear regression analysis of survival data with missing censoring indicators.

    PubMed

    Wang, Qihua; Dinse, Gregg E

    2011-04-01

    Linear regression analysis has been studied extensively in a random censorship setting, but typically all of the censoring indicators are assumed to be observed. In this paper, we develop synthetic data methods for estimating regression parameters in a linear model when some censoring indicators are missing. We define estimators based on regression calibration, imputation, and inverse probability weighting techniques, and we prove all three estimators are asymptotically normal. The finite-sample performance of each estimator is evaluated via simulation. We illustrate our methods by assessing the effects of sex and age on the time to non-ambulatory progression for patients in a brain cancer clinical trial.
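
    Inverse probability weighting, one of the three techniques used here, can be illustrated on a simpler problem: estimating a mean when observations are missing at random with known observation probabilities. The missingness mechanism below is hypothetical, and this is a generic sketch rather than the authors' regression estimator:

```python
import random

def ipw_mean(values, observed, obs_prob):
    """Horvitz-Thompson estimate of the mean: weight each observed value by 1/p."""
    n = len(values)
    return sum(v / p for v, o, p in zip(values, observed, obs_prob) if o) / n

rng = random.Random(42)
# Hypothetical mechanism: larger values are less likely to be observed
values = [rng.uniform(0.0, 10.0) for _ in range(50000)]
obs_prob = [0.9 - 0.06 * v for v in values]       # known observation probabilities
observed = [rng.random() < p for p in obs_prob]

naive = sum(v for v, o in zip(values, observed) if o) / sum(observed)
ipw = ipw_mean(values, observed, obs_prob)
# true mean is 5.0; the naive mean is biased low (~4.2), the IPW mean is not
```

    In the censoring-indicator setting the same reweighting is applied to cases whose indicator is observed, with the observation probability estimated from covariates.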

  15. Root location in random trees: a polarity property of all sampling consistent phylogenetic models except one.

    PubMed

    Steel, Mike

    2012-10-01

    Neutral macroevolutionary models, such as the Yule model, give rise to a probability distribution on the set of discrete rooted binary trees over a given leaf set. Such models can provide a signal as to the approximate location of the root when only the unrooted phylogenetic tree is known, and this signal becomes relatively more significant as the number of leaves grows. In this short note, we show that among models that treat all taxa equally, and are sampling consistent (i.e. the distribution on trees is not affected by taxa yet to be included), all such models, except one (the so-called PDA model), convey some information as to the location of the ancestral root in an unrooted tree. Copyright © 2012 Elsevier Inc. All rights reserved.

  16. Joint modeling and registration of cell populations in cohorts of high-dimensional flow cytometric data.

    PubMed

    Pyne, Saumyadipta; Lee, Sharon X; Wang, Kui; Irish, Jonathan; Tamayo, Pablo; Nazaire, Marc-Danie; Duong, Tarn; Ng, Shu-Kay; Hafler, David; Levy, Ronald; Nolan, Garry P; Mesirov, Jill; McLachlan, Geoffrey J

    2014-01-01

    In biomedical applications, an experimenter encounters different potential sources of variation in data such as individual samples, multiple experimental conditions, and multivariate responses of a panel of markers such as from a signaling network. In multiparametric cytometry, which is often used for analyzing patient samples, such issues are critical. While computational methods can identify cell populations in individual samples, without the ability to automatically match them across samples, it is difficult to compare and characterize the populations in typical experiments, such as those responding to various stimulations or distinctive of particular patients or time-points, especially when there are many samples. Joint Clustering and Matching (JCM) is a multi-level framework for simultaneous modeling and registration of populations across a cohort. JCM models every population with a robust multivariate probability distribution. Simultaneously, JCM fits a random-effects model to construct an overall batch template, which is used for registering populations across samples and classifying new samples. By tackling systems-level variation, JCM supports practical biomedical applications involving large cohorts. Software for fitting the JCM models has been implemented in the R package EMMIX-JCM, available from http://www.maths.uq.edu.au/~gjm/mix_soft/EMMIX-JCM/.

  17. Score distributions of gapped multiple sequence alignments down to the low-probability tail

    NASA Astrophysics Data System (ADS)

    Fieth, Pascal; Hartmann, Alexander K.

    2016-08-01

    Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via the knowledge of the score distribution of random sequences. But this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences this distribution is known analytically to follow a Gumbel distribution. Distributions for gapped local alignments and global alignments of finite lengths can only be obtained numerically. To obtain results in the small-probability region, specific statistical mechanics-based rare-event algorithms can be applied. In previous studies, this was achieved for pairwise alignments. They showed that, contrary to results from previous simple sampling studies, strong deviations from the Gumbel distribution occur in the case of finite sequence lengths. Here we extend the studies to multiple sequence alignments with gaps, which are much more relevant for practical applications in molecular biology. We study the distributions of scores over a large range of the support, reaching probabilities as small as 10^-160, for global and local (sum-of-pair scores) multiple alignments. We find that even after suitable rescaling, eliminating the sequence-length dependence, the distributions for multiple alignments differ from the pairwise alignment case. Furthermore, we also show that the previously discussed Gaussian correction to the Gumbel distribution needs to be refined, also for the case of pairwise alignments.
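    The rare-event idea used in such studies can be sketched in miniature: sample sequence pairs under a Metropolis chain biased by exp(theta*S), so high-scoring (otherwise exponentially rare) configurations are visited often, then reweight by exp(-theta*S) to recover unbiased probabilities. The toy score below (count of matching positions, i.e. a gapless identity score) and all function names are illustrative stand-ins, not the alignment scores or algorithm of the paper:

```python
import math
import random

def biased_score_histogram(L=40, theta=1.0, steps=200000, seed=1):
    """Metropolis sampling of random sequence pairs with weight
    exp(theta * S), where S = number of matching positions."""
    rng = random.Random(seed)
    alphabet = "ACGT"
    a = [rng.choice(alphabet) for _ in range(L)]
    b = [rng.choice(alphabet) for _ in range(L)]
    score = sum(x == y for x, y in zip(a, b))
    hist = {}
    for _ in range(steps):
        # propose mutating one random position of one of the two sequences
        seq = a if rng.random() < 0.5 else b
        other = b if seq is a else a
        pos = rng.randrange(L)
        old, new = seq[pos], rng.choice(alphabet)
        delta = (new == other[pos]) - (old == other[pos])
        if delta >= 0 or rng.random() < math.exp(theta * delta):
            seq[pos] = new
            score += delta
        hist[score] = hist.get(score, 0) + 1  # record current score each step
    return hist

def tail_estimate(hist, theta, s_min):
    """Reweight biased counts by exp(-theta*S) and normalise over the
    observed scores (valid only within the well-sampled window)."""
    w = {s: c * math.exp(-theta * s) for s, c in hist.items()}
    z = sum(w.values())
    return sum(v for s, v in w.items() if s >= s_min) / z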

  18. Forecasting Solar Flares Using Magnetogram-based Predictors and Machine Learning

    NASA Astrophysics Data System (ADS)

    Florios, Kostas; Kontogiannis, Ioannis; Park, Sung-Hong; Guerra, Jordan A.; Benvenuto, Federico; Bloomfield, D. Shaun; Georgoulis, Manolis K.

    2018-02-01

    We propose a forecasting approach for solar flares based on data from Solar Cycle 24, taken by the Helioseismic and Magnetic Imager (HMI) on board the Solar Dynamics Observatory (SDO) mission. In particular, we use the Space-weather HMI Active Region Patches (SHARP) product that facilitates cut-out magnetograms of solar active regions (AR) in the Sun in near-realtime (NRT), taken over a five-year interval (2012 - 2016). Our approach utilizes a set of thirteen predictors, which are not included in the SHARP metadata, extracted from line-of-sight and vector photospheric magnetograms. We exploit several machine learning (ML) and conventional statistics techniques to predict flares of peak magnitude {>} M1 and {>} C1 within a 24 h forecast window. The ML methods used are multi-layer perceptrons (MLP), support vector machines (SVM), and random forests (RF). We conclude that random forests could be the prediction technique of choice for our sample, with the second-best method being multi-layer perceptrons, subject to an entropy objective function. A Monte Carlo simulation showed that the best-performing method gives accuracy ACC=0.93(0.00), true skill statistic TSS=0.74(0.02), and Heidke skill score HSS=0.49(0.01) for {>} M1 flare prediction with probability threshold 15% and ACC=0.84(0.00), TSS=0.60(0.01), and HSS=0.59(0.01) for {>} C1 flare prediction with probability threshold 35%.

  19. Directly administered antiretroviral therapy for HIV-infected drug users does not have an impact on antiretroviral resistance: results from a randomized controlled trial.

    PubMed

    Maru, Duncan Smith-Rohrberg; Kozal, Michael J; Bruce, R Douglas; Springer, Sandra A; Altice, Frederick L

    2007-12-15

    Directly administered antiretroviral therapy (DAART) is an effective intervention that improves clinical outcomes among HIV-infected drug users. Its effects on antiretroviral drug resistance, however, are unknown. We conducted a community-based, prospective, randomized controlled trial of DAART compared with self-administered therapy (SAT). We performed a modified intention-to-treat analysis among 115 subjects who provided serum samples for HIV genotypic resistance testing at baseline and at follow-up. The main outcomes measures included total genotypic sensitivity score, future drug options, number of new drug resistance mutations (DRMs), and number of new major International AIDS Society (IAS) mutations. The adjusted probability of developing at least 1 new DRM did not differ between the 2 arms (SAT: 0.41 per person-year [PPY], DAART: 0.49 PPY; adjusted relative risk [RR] = 1.04; P = 0.90), nor did the number of new mutations (SAT: 0.76 PPY, DAART: 0.83 PPY; adjusted RR = 0.99; P = 0.99) or the probability of developing new major IAS new drug mutations (SAT: 0.30 PPY, DAART: 0.33 PPY; adjusted RR = 1.12; P = 0.78). On measures of GSS and FDO, the 2 arms also did not differ. In this trial, DAART provided on-treatment virologic benefit for HIV-infected drug users without affecting the rate of development of antiretroviral medication resistance.

  20. Random Photon Absorption Model Elucidates How Early Gain Control in Fly Photoreceptors Arises from Quantal Sampling

    PubMed Central

    Song, Zhuoyi; Zhou, Yu; Juusola, Mikko

    2016-01-01

    Many diurnal photoreceptors encode vast real-world light changes effectively, but how this performance originates from photon sampling is unclear. A 4-module biophysically-realistic fly photoreceptor model, in which information capture is limited by the number of its sampling units (microvilli) and their photon-hit recovery time (refractoriness), can accurately simulate real recordings and their information content. However, sublinear summation in quantum bump production (quantum-gain-nonlinearity) may also cause adaptation by reducing the bump/photon gain when multiple photons hit the same microvillus simultaneously. Here, we use a Random Photon Absorption Model (RandPAM), which is the 1st module of the 4-module fly photoreceptor model, to quantify the contribution of quantum-gain-nonlinearity in light adaptation. We show how quantum-gain-nonlinearity already results from photon sampling alone. In the extreme case, when two or more simultaneous photon-hits reduce to a single sublinear value, quantum-gain-nonlinearity is preset before the phototransduction reactions adapt the quantum bump waveform. However, the contribution of quantum-gain-nonlinearity in light adaptation depends upon the likelihood of multi-photon-hits, which is strictly determined by the number of microvilli and light intensity. Specifically, its contribution to light-adaptation is marginal (≤ 1%) in fly photoreceptors with many thousands of microvilli, because the probability of simultaneous multi-photon-hits on any one microvillus is low even during daylight conditions. However, in cells with fewer sampling units, the impact of quantum-gain-nonlinearity increases with brightening light. PMID:27445779

Top